Python’s import structure is freaking confusing. Learning by example (i.e. imitating example code) does not help you understand the logic behind it, and there are a lot of invalid combinations that are dead ends. You need to understand the concepts below to use it confidently!
Just like C++ quirks, there is very often valid reasoning behind a confusing Python design choice, and it’s not immediately obvious. Each language caters to a certain set of use cases at the expense of making other scenarios miserable. That’s why there’s no universally best language for all projects. Know the trade-offs of the languages so you can pick the right tool for the job.
MATLAB’s one file per function/script design
MATLAB made the choice of having one file describe one exposed object/function/class/script so it maps directly onto the mental model of file systems. This is good for the user’s sanity and also has behavioral advantages for MATLAB’s interpreter:
- Users can reason about code the same way as they do with files, which is less mental gymnastics
- Users can keep track of what’s available to them simply by browsing the directory tree and filenames because file names are function names, which should be sensibly chosen.
- Just like users, MATLAB also leverages the file system for indexing available functions and defers loading the contents into memory until they are called at runtime, which means changes are reflected automatically.
Package/module namespace models in MATLAB vs Python
MATLAB traditionally dumps all free functions (`.m` files) available in its search paths into the root workspace. Users are responsible for not picking colliding names. Classes, namespaces and packages are afterthoughts in MATLAB while the OOP dogma is the central theme of Python, so obviously the flat-namespace practice is frowned upon there.
RANT: OOP is basically a worldview formed by adding artificial man-made constructs (meanings such as agents, hierarchy, relationships) to the idea of bundling code (programs) and data (variables) in isolated packages controlled (scoped) by namespaces (which is just the lexer in your compiler enforcing man-made rules). The idea of code and data being the same thing came from the von Neumann architecture: your hard drive or RAM doesn’t care what the bits stand for; it’s up to your processor and OS to exercise self-restraint. People are often tempted either to follow rules too rigidly or not to take them seriously, when what really matters is understanding where the rules came from, why they are useful in certain contexts and where they do not apply.
Package namespaces are pretty much the skeleton of classes, so the structure and syntax are the same for both. From my memory, it was around 2015 that MATLAB started actively encouraging users (and their own internal development) to move away from the flat root workspace model and use packages to tuck away function names that are not immediately relevant to their interests and summon them through `import` syntax as needed. This practice is mandatory (enforced) in Python!
However, there are a few subtle differences between the two package/module systems:
- MATLAB does not have a `from` statement because its `import` does not have the option to expose the (nested tree of) package names to the workspace. It always dumps the leaf node into the current workspace, the same way the `from ... import` syntax does in Python.
- MATLAB does not have an optional `as` clause for giving an alternative name to the package you just imported. In my opinion, Python has to provide `as` as an option to shorten package/module names because it tucks away commonly used packages (such as `numpy`) so aggressively that forcing people to spell the informative names in full would cause an outcry.
- Unlike free functions (`.m` files), MATLAB classes are cached once an object is instantiated, until `clear classes` or the like gets rid of all instances in the workspace. Python modules are cached in a similar way (in `sys.modules`). Note that `del` (which is like MATLAB’s `clear`) only removes the name from your namespace; the module itself stays cached, so importing it again does not re-read the file.
- Python’s modules are not classes, though most of the time they behave like MATLAB’s static classes. Because there are no instantiated instances, you can reload Python modules with `importlib.reload()`. On the other hand, since MATLAB packages merely manage when the `.m` files can get into the current scope (with the `import` command), the file system still indexes the list of available functions. Changes in `.m` function files are reflected immediately on the next call in MATLAB, yet Python has to reload the module to update the index of function names because the only way to see what functions are available is to revisit the contents of the updated `.py` file!
- MATLAB abstracts folders (whose names start with a `+` symbol) as packages and `.m` files as functions, while Python abstracts the `.py` file as a module (like MATLAB’s package) and the objects are the contents inside it. Therefore a Python package is analogous to the outer level of a double-packed (nested) MATLAB package. I’ll explain this in detail in the next sections.
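The caching and reloading behavior described in the list above can be tried out with a throwaway module. This is only a minimal sketch: the module name `reload_demo` and its `greet()` function are made up for the demo, and the file lives in a temporary folder that is put on `sys.path`.

```python
import importlib
import pathlib
import sys
import tempfile

# Skip bytecode caching so a quick rewrite of the file is never masked by a stale cache.
sys.dont_write_bytecode = True

# Hypothetical throwaway module written to a temporary folder.
workdir = pathlib.Path(tempfile.mkdtemp())
source = workdir / "reload_demo.py"
source.write_text("def greet():\n    return 'old'\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import reload_demo
first = reload_demo.greet()

# Edit the file on disk: the already-imported module does not notice.
source.write_text("def greet():\n    return 'new version'\n")
stale = reload_demo.greet()

# `del` only removes the name; the module object stays in sys.modules.
del reload_demo
still_cached = "reload_demo" in sys.modules
import reload_demo  # hands back the cached module, the file is not re-read
after_reimport = reload_demo.greet()

# importlib.reload() re-executes the file and refreshes the module contents.
importlib.reload(reload_demo)
fresh = reload_demo.greet()

print(first, stale, still_cached, after_reimport, fresh)
```

Only the `importlib.reload()` call picks up the edited function; deleting the name and importing again does not.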
Files AND directories are treated the same way in module hierarchy!
This comes with a few implications:
- If you name your project `/myproj/myproj.py` with a function `def myproj()`, which is a very usual thing most MATLAB users would do, your module is called `myproj.myproj`, and after `import myproj.myproj` you will call your function as `myproj.myproj.myproj()`! (A bare `import myproj` only loads the folder-level package; it does not load `myproj.py` unless the package’s `__init__.py` imports it.)
- You can confuse the Python module loader if you have a subfolder named the same as a `.py` file at the same level. The subfolder (when it contains an `__init__.py`) will prevail and the `.py` file with the same name is shadowed!
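Both implications can be reproduced in a temporary folder. The sketch below assumes the folders carry an `__init__.py` (an assumption of this demo, not something the layout above requires), and the `clash` names are invented for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True

root = pathlib.Path(tempfile.mkdtemp())

# Layout 1: <root>/myproj/myproj.py containing def myproj()
(root / "myproj").mkdir()
(root / "myproj" / "__init__.py").write_text("")
(root / "myproj" / "myproj.py").write_text("def myproj():\n    return 'called'\n")

# Layout 2 (hypothetical names): <root>/clash.py next to <root>/clash/__init__.py
(root / "clash.py").write_text("origin = 'file'\n")
(root / "clash").mkdir()
(root / "clash" / "__init__.py").write_text("origin = 'folder'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import myproj.myproj
result = myproj.myproj.myproj()  # package . module . function

import clash
winner = clash.origin  # shows whether the folder or the file was loaded

print(result, winner)
```

Here `winner` comes back as `'folder'`, so `clash.py` is never reached through `import clash`.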
The reason is that Python allows users to mix scripts, functions and classes in the same file, and the classes or functions do not need to match the filename in order for Python to find them. Therefore the filename itself serves as the label for the collection (module) of functions, classes and other (script) objects inside! The directory is a collection of these files, each of which is itself a collection, so it’s a two-level nest: a directory containing a `.py` file is a collection of collections!
On the other hand, in MATLAB, it’s one `.m` file per (publicly exposed) function, class or script, so the system registers and calls them by the filename, not really by how you named them inside. If you have a typo in your function name so that it doesn’t match your filename, the filename will prevail if there’s only one function there. Helper functions not matching the filename will not be exposed and have a static/file-local scope.
Packages in MATLAB are folders whose names start with a `+` symbol. Packages by default are not exposed to the global namespace in your MATLAB path. They work like Python’s modules, so you also get them into your current workspace with `import`. This means it’s not possible to define a module in a single file as in Python. Each filename exclusively represents one accessible function or class in the package (no script variables though).
So in other words, there is no such thing as a module in MATLAB because the concept is called a package. Python separates the two concepts because a `.py` file, by allowing a mixture of scripts, classes and loose functions, forms a logical unit with the same structure as a package itself, so it needs another name, module, to distinguish file-based collections from folder-based collections (packages).
This is very counterintuitive on the surface (because it defeats the point of directories) unless you know that Python allowing users to mix scripts, functions and classes in a file means the file itself is a module/collection of executable contents.
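To make the folder-versus-file distinction concrete, here is a small sketch that builds a package folder holding one `.py` file and inspects both. The names `shapes_pkg`, `circle`, `PI`, `area` and `Circle` are all made up for the demo.

```python
import importlib
import pathlib
import sys
import tempfile
import types

sys.dont_write_bytecode = True

root = pathlib.Path(tempfile.mkdtemp())
pkg_dir = root / "shapes_pkg"  # hypothetical package folder
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
# One file mixing a variable, a loose function and a class.
(pkg_dir / "circle.py").write_text(
    "PI = 3.14159\n\n"
    "def area(r):\n    return PI * r * r\n\n"
    "class Circle:\n    pass\n"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import shapes_pkg.circle

# The folder and the file are both module objects.
both_are_modules = (
    isinstance(shapes_pkg, types.ModuleType)
    and isinstance(shapes_pkg.circle, types.ModuleType)
)
# Only the folder-based one (the package) carries a __path__ attribute.
package_has_path = hasattr(shapes_pkg, "__path__")
file_module_has_path = hasattr(shapes_pkg.circle, "__path__")
# The file is the collection: its public contents are plain attributes.
contents = sorted(n for n in vars(shapes_pkg.circle) if not n.startswith("_"))

print(both_are_modules, package_has_path, file_module_has_path, contents)
```

The `__path__` attribute is what marks a module as a package, which is why a package can be described as the folder form of a module.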
from (package/module) import (package/module or objectS) <as (namespace)>
This syntax is super confusing, especially before we understand that
- packages have to be folders (the folder form of modules)
- modules can be `.py` files as well as packages
- packages/modules are technically objects
The hierarchy for the `from import as` syntax looks like this:
package_folder > file.py > (obj1, obj2, ... )
This has the following implications:
- `from` strips the specified namespace so `import` dumps the node contents into the root workspace
- `import` without `from` exposes the entire hierarchy to the root workspace.
- functions, classes and variables in the scripts are ALL OBJECTS.
- if you do `import mymodule`, a function `f` in `mymodule.py` can only be accessed through `mymodule.f()`; if you want to just call `f()` in the workspace, do `from mymodule import f`
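The last point in the list can be checked with a one-function module. In this sketch the file is written to a temporary folder, and the module name `calc_demo` (standing in for `mymodule`) and the return value are arbitrary choices for the demo.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True

workdir = pathlib.Path(tempfile.mkdtemp())
# Hypothetical module with a single function f.
(workdir / "calc_demo.py").write_text("def f():\n    return 42\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# Plain import: only the module name lands in the workspace.
import calc_demo
via_module = calc_demo.f()

# from ... import: the function itself lands in the workspace.
from calc_demo import f
direct = f()

# Both routes reach the very same function object.
same_function = f is calc_demo.f

print(via_module, direct, same_function)
```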
These properties also shape the rules for where wildcards are used in the statement:
- `from` cannot have wildcards because its target is either a folder (package) or a file (module)
- `import` is the only place that can have the wildcard `*` because it is only possible to load multiple objects from one `.py` file.
- `import *` cannot be used without a `from` statement because you need to load a `.py` file at some point
- it’s usually a dead end to do `from package import *` because it tries to load the files into the root workspace, where they are not callable. (It only picks up whatever the package’s `__init__.py` defines or lists in `__all__`; submodule files are not loaded automatically.)
- it also does not make sense (nor is it possible) to follow `import *` with an `as` clause because there is no mechanism to map multiple objects onto one object name
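The wildcard rules above are enforced by the parser, so they can be checked without importing anything. The sketch below feeds a few statements to the built-in `compile()`, which only parses them; the standard `os` module is used purely as a familiar name, and the helper `is_valid` is made up for the demo.

```python
def is_valid(statement):
    """Return True if Python accepts the statement at module level."""
    try:
        compile(statement, "<demo>", "exec")  # parse only, nothing is imported
        return True
    except SyntaxError:
        return False

statements = [
    "from os import *",                # wildcard after import, with from
    "import *",                        # wildcard without from
    "from os.* import path",           # wildcard inside from
    "from os import * as everything",  # wildcard followed by as
]

results = {s: is_valid(s) for s in statements}
for statement, ok in results.items():
    print(f"{statement!r}: {'accepted' if ok else 'SyntaxError'}")
```

Only the first statement gets past the parser; the other three are rejected as syntax errors before any file is looked up.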
So the bottom line is that your `from import as` statement has to somehow load a `.py` file in order to be valid. You can only choose between these two usages:
- load the `.py` file with the `from` statement and pick the objects at `import`, or
- skip the `from` statement and `import` the `.py` file, not getting to choose the objects inside it.
An `as` clause renames exactly one item, whether it’s the `.py` file or an object inside it, so every item you want to rename in `import` needs its own `as`. Also, if you understand the rationale above, you’ll see that these two are equivalent:
from package_A import module_file_B as namespace_C
import package_A.module_file_B as namespace_C
because with the `as` clause, whatever node you have selected is accessed through the output namespace you have specified, so whether you choose to strip the path name structure in the extracted output (i.e. use the `from` statement) is irrelevant since you are not using the package and module names in the root namespace anymore.
The behavior of `from import as` is very similar to the choices you have to make when extracting a zip file with nested folder structures, except that you have to make the mental substitution that a `.py` file is analogous to a subfolder while the objects described in the `.py` file are analogous to files in said subfolder. Aargh!