Python Cheat Sheet

Built-in functions

  • breakpoint(): Python’s version of MATLAB’s keyboard() command
  • callable(): Like MATLAB’s isfunction() but it really checks if there’s a __call__ method
  • getattr()/hasattr(): MATLAB’s getfield()/isfield(). The 3rd parameter of getattr() is a shortcut that returns a default if there’s no such field/attribute, which MATLAB’s getfield() doesn’t have
  • globals()/locals(): more convenient than MATLAB because the whole workspace (current variables) is accessible as a dictionary in Python by calling locals() or globals()
  • id(): memory address (in CPython) of the object that the variable (reference) points to. Think of it as &x in C.
  • isinstance(): MATLAB’s isa()
  • next(): Python favors not computing values until they are needed, so it offers generators (forward-only iterators) that hand out one value each time you kick them with next(), and you can’t go back.
  • chr()/ord(): analogous to MATLAB’s char()/double() cast for characters
  • Python’s exponentiation operator is **, not ^ as in MATLAB (C has no exponentiation operator at all, and ^ means xor there, as it does in Python)
  • print(…, flush=True) forces a courtesy flush (the default is flush=False)
  • repr(): Python’s counterpart of MATLAB’s disp(), also an overloadable standard interface (through __repr__)
  • slice(): MATLAB’s equivalent of colon() special interface
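To make a few of the built-ins above concrete, here is a minimal sketch. The names cfg, countdown and every_other are made up for illustration and do not come from any library.

```python
from types import SimpleNamespace

cfg = SimpleNamespace(rate=44100)      # hypothetical object with a single attribute

# getattr() with a 3rd argument returns a default instead of raising AttributeError
rate = getattr(cfg, "rate", None)
depth = getattr(cfg, "depth", 16)
has_depth = hasattr(cfg, "depth")

# A generator computes values lazily; next() pulls them out one at a time
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
first = next(gen)
second = next(gen)

# chr()/ord() convert between a character and its code point
code = ord("A")
letter = chr(code + 1)

# slice() is the object behind the start:stop:step indexing syntax
every_other = slice(0, None, 2)
picked = list(range(10))[every_other]

# callable() only checks that the object can be called
is_func = callable(countdown)
```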

Context Manager

The @contextlib.contextmanager decorator basically splits a setup-try-yield-finally boilerplate generator function into 3 parts: __enter__ (everything above yield), BODY (where the yield goes to) and __exit__ (everything below yield), since a with-as statement is a rigidly defined try-finally block, roughly like this:

with EXPR as f:
    BODY(using f)

__enter__: f = EVAL(EXPR)
    # f isn't evaluated till yield
    yield f  # Goes to BODY
__exit__: cleanup(f)
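The outline above can be sketched as runnable code. This is only an illustration: managed() and the dict standing in for a resource are hypothetical, and the log list exists just to show the order of events.

```python
import contextlib

log = []  # records the order of events, for illustration only

@contextlib.contextmanager
def managed(name):
    # __enter__ part: everything above the yield
    log.append(f"enter {name}")
    resource = {"name": name, "open": True}
    try:
        # The yielded value is bound to the `as` target; BODY runs here
        yield resource
    finally:
        # __exit__ part: runs even if BODY raises
        resource["open"] = False
        log.append(f"exit {name}")

with managed("demo") as f:
    log.append(f"body sees open={f['open']}")
```

After the with block, log reads enter, body, exit in that order, and f["open"] is False.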


What Python does not do, and what are the alternatives

RAII: Using stack unwind to manage program flow

Python uses a garbage collector, so destructor-based cleanup (the equivalent of MATLAB’s onCleanup()) might often work, but it’s not guaranteed to run at a predictable time. Any code that relies on it should not be in production

Answer: Context Manager, a glorified try-catch (more specifically try-finally) block with a rigid structure. It’s a pain in the butt and not fun to deal with if you want to deviate from the native ContextManager that came with the resource opener

‘switch’-case is back as ‘match’-case (the advanced uses are different)

Python 3.10 and later finally supports it: ‘switch’ in C is called ‘match’ in Python, and there is plenty of handy and intuitive syntax just like in MATLAB! Hooray!

If you try to do anything fancy with mutables in the cases, be careful about the side effects!

Pass by Variable (Copy-on-Write) like MATLAB

Variables are by and large references in Python. Everything, including integers, is an instance of some class (classes are in turn dictionaries with special treatment for certain key names). The garbage collector waits until the last user of a piece of memory stops referencing it before cleaning it up.

Python even has a single ‘None’ for the entire universe, with a gazillion things pointing to the same memory address where None is stored (that’s why None is idiomatically checked with the ‘is’ keyword, which compares addresses for speed, instead of ‘==’, which actually compares the contents). If you look up the reference count (see garbage collector) for commonly used numbers like 1 and 0, there are thousands of ‘users’ of them!

With C++, it’s a mixture. Complex objects are usually passed by reference for performance reasons, but simple structs and data types can be passed as variables that get cloned and destroyed when they move across function (stack) boundaries. (C/C++ nomenclature calls the non-reference/non-pointer kind ‘variables’, though technically references/pointers are just integers interpreted as addresses.)

In MATLAB, they want it industrial strength, so they’d rather not allow anything insidious/non-transparent to happen in your code by keeping it all pass-by-variable, that is, everything is supposed to be treated as a separate copy as it crosses function boundaries. For performance reasons, they figured that if you pass a big matrix just for the function to read, MATLAB doesn’t really have to clone it, so under the hood you peek at the same matrix that belongs to the caller. Once your function changes the contents (they are pretending that it’s a separate copy, so of course you can), MATLAB painfully makes a whole copy of it (copy-on-write), and you then have to lug the 2nd (modified) copy around when it travels past function boundaries.

Python takes this idea a lot further by letting things that are exactly the same (such as None or identical string literals) point to the same object. When you ask to change an immutable value, Python makes a new object and points the variable name you are using at that new object.

Answer: The way Python prevents variables passed as parameters into a function from getting modified is to separate types into mutable (lists, dicts, sets) and immutable (tuples, frozenset, numerics, strings; frozendict is a package right now). An immutable object that crosses the function boundary cannot be changed in place, so any ‘modification’ inside the function only rebinds the local name to a new object and the caller’s value stays intact.
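A quick sketch of the difference, using made-up helper names (append_item, bump): the list is changed for the caller, the integer is not, and None is the same object wherever you look.

```python
def append_item(items):
    items.append(99)     # mutates the caller's list in place

def bump(n):
    n += 1               # rebinds the local name; the caller's int is untouched
    return n

data = [1, 2]
append_item(data)

count = 5
bumped = bump(count)

a = None
b = None
same_none = a is b       # both names refer to the one None object
```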

Classes Boundaries

MATLAB and C++ have stringent access control, but Python does not. There’s not even const correctness. Just signal your intention with variable naming schemes like all caps (constants) and __ prefixes (private members).

C++ does not separate helper functions from class (non-instance) methods. What C++ calls static members are really just glorified free functions and global variables tucked under the namespace that happens to be the class (classes started off as namespaces for structs, then people added features like overloading and dispatch mechanisms), partitioning the global workspace. Whether a scoped helper function calls a scoped variable that happens to be in the same namespace is nothing special to C++.

Python does separate these two concepts though. Class methods in Python (decorated by @classmethod) are the equivalent of C++’s static methods: they are not allowed to touch anything instance-specific, but they can access anything class-specific. Helper functions, which are called ‘static methods’ in Python (decorated by @staticmethod), cannot even touch anything specific to the class.
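As a minimal illustration of the two decorators (Counter and its members are hypothetical names): the class method receives the class as cls and can read class-level state, while the static method receives neither self nor cls.

```python
class Counter:
    created = 0                          # class-level state shared by all instances

    def __init__(self):
        Counter.created += 1

    @classmethod
    def how_many(cls):
        return cls.created               # sees class state; no instance required

    @staticmethod
    def clamp(value, lo, hi):
        return max(lo, min(value, hi))   # plain helper tucked under the class name

Counter()
Counter()
total = Counter.how_many()
clamped = Counter.clamp(15, 0, 10)
```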

Variable arguments

C++ doesn’t generally do variable arguments because it defeats polymorphism, which uses a function’s signature (the list of your argument types) to figure out which function to dispatch.

MATLAB uses cells to pack variable arguments. The common idioms are varargin{:} and [varargout{1:nargout}]. To accommodate variable arguments, MATLAB has to give up polymorphism, but there is still a little bit of it left: it does dispatch based on the first argument’s type, and that’s very useful for avoiding a lot of stupid switching on data types: just use a consistent function name interface and have each data type implement its own method with the same name.

In Python, there’s no such thing as multiple outputs (return variables): you return a single tuple, and it gets unpacked (just like MATLAB’s deal() function) when you write out a list of names on the left hand side. If the left hand side is a single name, it gets the whole tuple, still packed. If you write out several names, the elements of the returned tuple are assigned to them according to your syntax.
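A short sketch of the packing and unpacking described above (min_max is a made-up function):

```python
def min_max(values):
    return min(values), max(values)   # the comma builds one tuple

packed = min_max([3, 1, 4])           # single name: receives the whole tuple
lo, hi = min_max([3, 1, 4])           # two names: the tuple is unpacked
first, *rest = [10, 20, 30]           # a starred name collects the remainder as a list
```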

Deciding on the output format is often a point of agony when I develop MATLAB code. Apparently TMW wondered the same thing, because their own factory code is all over the place on this too. Most of the time it’s not a good idea to have a context-dependent output format (depending on how many outputs the user asked for), even if you can technically do that by detecting nargout in MATLAB.

Answer: My recommendation is to make sure the simple, most common case gets priority, stuff the juicy side info into packaged data structures (such as arrays or cells/monads/lists), and stick to a fixed output format whether you are in Python or MATLAB

Late Binding in Lambda / Anonymous Functions: Capture it!

This often throws people off in Python. MATLAB uses early binding: when you create an anonymous function (aka lambda), the free variables (the parameters that are not the running ones, i.e. not the input arguments of the lambda/anonymous function) capture a snapshot of the local workspace at the moment the lambda/anonymous function is created!

Python, on the other hand, uses the same approach as C++: late binding. This means you have to explicitly capture (make a snapshot copy of) the free variables if you want them tied to the values they had when the anonymous function (lambda) was created, rather than waiting until the lambda is actually called/used to look up what values to use for the free variables.

P = 612
# P was not captured, thus late binding
f = lambda x     : print(f'input/running: {x}, param/free: {P}')
# P was captured as p, thus early binding
g = lambda x, p=P: print(f'input/running: {x}, param/free: {p}')
P = 721
f(8964) # shows "input/running: 8964, param/free: 721"
g(8964) # shows "input/running: 8964, param/free: 612"
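The place where late binding bites most often is lambdas created in a loop. Here is a minimal sketch of the trap and of the same default-argument capture trick used above (the variable names are arbitrary):

```python
# Every lambda looks up i when it is called, after the loop has already finished
late = [lambda: i for i in range(3)]

# The default argument snapshots the current value of i on each pass
early = [lambda i=i: i for i in range(3)]

late_results = [f() for f in late]     # [2, 2, 2]
early_results = [f() for f in early]   # [0, 1, 2]
```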


Useful Python tricks

This section is not about Pythonisms (like ContextManager), i.e. the way things should normally be done in Python, but about the less obvious ways to solve problems and the features that are available specifically in Python.

Use / and * argument as separator for different forms of parameter entries!

There is special syntax to separate positional-only, positional/keyword, and keyword-only parameters.
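A minimal sketch of the separators, assuming Python 3.8 or later (where / became available). The resize() function and its parameters are invented for illustration: everything before / is positional-only, everything after * is keyword-only, and what sits in between accepts both forms.

```python
def resize(width, height, /, scale=1.0, *, keep_aspect=True):
    # width, height: positional-only; scale: either form; keep_aspect: keyword-only
    return (width * scale, height * scale, keep_aspect)

a = resize(4, 3)
b = resize(4, 3, 2.0)
c = resize(4, 3, scale=2.0, keep_aspect=False)

try:
    resize(width=4, height=3)          # positional-only names cannot be keywords
except TypeError:
    positional_only_enforced = True
else:
    positional_only_enforced = False

try:
    resize(4, 3, 2.0, False)           # keyword-only value cannot be positional
except TypeError:
    keyword_only_enforced = True
else:
    keyword_only_enforced = False
```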

Make a variant of existing class/object

This ninja technique is useful when you want to keep the object mostly the way it is but add/override a few things you think it didn’t do right, without inheriting from it or using composition (hiding a copy of the object as a member) and writing a proxy mirror for every member of it.

In C++, this situation often comes up when you want to modify a concrete class that doesn’t have a virtual destructor, most notoriously the STL, which you are not supposed to inherit from (or else the client might pass a pointer to the parent/base, so the child object’s destructors are not called, as there are no vtables to keep track of which method to dispatch). In C++, this is one of the few use cases that call for private inheritance.

In Python, because everything is a recursive dict in which specially named (magic) functions are recognized, there is a __getattr__ method that gets called whenever a member is accessed and normal lookup cannot find it, which includes the case when the member is called (in Python you simply get the functor as an attribute, aka the value in a key-value pair of the dictionary, and add brackets to call it). This means you can re-route what attributes (members) are returned simply by overloading this method!

If you can overload __getattr__, it also means you can redefine the member interface of your entire class! So a strategy to make a class have the exact same interface as another class is to hide a copy of/reference to the underlying object and re-route __getattr__ to the underlying object’s attribute lookup! Here’s the gist of it:

class M:
    def __init__(self, underlying_class):
        self.__obj = underlying_class
    def __getattr__(self, attr):
        return getattr(self.__obj, attr)

This can be improved a little bit. Just pick a member name that won’t clash with the underlying object’s attributes/members and simply return the ‘hidden’ object when specifically requested, not self.__obj.__obj.

class M:
    def __init__(self, underlying_class):
        self.__obj = underlying_class
    def __getattr__(self, attr):
        # Prevent self-calling
        if attr != '__obj':
            return getattr(self.__obj, attr)
        return self.__obj
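To show the variant idea in action, here is a hedged sketch with a hypothetical LoudList class that wraps a plain list: it overrides append() and lets everything else fall through to the wrapped object. It uses a single-underscore member name (_obj) purely to keep the example short.

```python
class LoudList:
    """Hypothetical proxy: behaves like the wrapped list except for append()."""

    def __init__(self, underlying):
        self._obj = underlying

    def append(self, item):
        # The one behavior we chose to change
        self._obj.append(str(item).upper())

    def __getattr__(self, attr):
        # Only reached when normal lookup on LoudList itself fails
        return getattr(self._obj, attr)

words = LoudList(["a"])
words.append("b")          # overridden: stores "B"
words.extend(["c"])        # not defined on LoudList, forwarded to the list
n = words.count("B")       # forwarded as well
```

One caveat with this sketch: implicit special-method calls such as len(words) or words[0] are looked up on the class and bypass __getattr__, so those would need to be forwarded explicitly if you want them.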

This is a very powerful luxury that makes Python so lovable if you are not here for industrial strength programming. MATLAB’s classes are hard-wired to your class definition .m file, not something you can update on the fly as you please. Any attempts to do so (aka breaking the safeguards) are Undocumented MATLAB territory where you mess with the Java under the hood and change the metadata property to fool the objects to do what you want.


Python packages, modules and imports

Python’s import structure is freaking confusing. Learning by examples (i.e. imitating example code) does not help understanding the logic of it, and there are a lot of possible invalid combinations that are dead ends. You need to understand the concepts below to use it confidently!

Just like with C++ quirks, very often there’s valid reasoning behind a confusing Python design choice and it’s not immediately obvious. Each language caters to a certain set of use cases at the expense of making other scenarios miserable. That’s why there’s no universally best language for all projects. Know the trade-offs of the languages so you can pick the right tool for the job.

MATLAB’s one file per function/script design

MATLAB made the choice of having one file describe one exposed object/function/class/script so it maps directly onto the mental model of file systems. This is good for the user’s sanity and also has behavioral advantages for MATLAB’s interpreter

  1. Users can reason the same way as they do with files, which is less mental gymnastics
  2. Users can keep track of what’s available to them simply by browsing the directory tree and filenames because file names are function names, which should be sensibly chosen.
  3. Just like users, MATLAB also leverages the file system for indexing available functions and defers loading the contents into memory until they are called at runtime, which means changes are reflected automatically.

Package/modules namespace models in MATLAB vs Python

MATLAB traditionally dumps all free functions (.m files) available in its search paths into the root workspace. Users are responsible for not picking colliding names. Classes, namespaces and packages are after-thoughts in MATLAB while the OOP dogma is the central theme of Python, so obviously such practices are frowned upon.

RANT: OOP is basically a worldview formed by adding artificial man-made constructs (meanings such as agents, hierarchy, relationships) to the idea of bundling code (programs) and data (variables) in isolated packages controlled (scoped) by namespaces (which is just the lexer in your compiler enforcing man-made rules). The idea of code and data being the same thing came from the Von Neumann architecture: your hard drive or RAM doesn’t care what the bits stand for; it’s up to your processor and OS to exercise self-restraint. People are often tempted to follow rules too rigidly or not to take them seriously when what really matters is understanding where the rules came from, why they are useful in certain contexts and where they do not apply.

Package namespaces are pretty much the skeleton of classes, so the structure and syntax are the same for both. From my memory, it was around 2015 that MATLAB started actively encouraging users (and their own internal development) to move away from the flat root workspace model and use packages to tuck away function names that are not immediately relevant to their interests, summoning them through import syntax as needed. This practice is mandatory (enforced) in Python!

However, there are a few subtle differences between the two in terms of the package/module systems:

  • MATLAB does not have a from statement because its import does not have the option to expose the (nested tree of) package names to the workspace. It always dumps the leaf node into the current workspace, the same way the from ... import syntax does in Python.
  • MATLAB does not have an optional as statement for you to give an alternative name to the package you just imported. In my opinion, Python has to provide the as statement as an option to shorten package/module names because it tucks away commonly used packages (such as numpy) so aggressively that forcing people to spell the informative names in full would cause an outcry.
  • Unlike free functions (.m files), MATLAB classes are cached once an object is instantiated, until clear classes or the like gets rid of all instances in the workspace. Python’s modules are cached in the same way: del (which is like MATLAB’s clear) removes the name from your workspace, but the loaded module itself stays cached in sys.modules.
  • Python’s modules are not classes, though most of the time they behave like MATLAB’s static classes. Because there are no instantiated instances, you can reload Python modules with importlib.reload(). On the other hand, since MATLAB packages merely manage when the .m files can get into the current scope (with the import command), the file system still indexes the available function list. Changes in .m file functions are reflected immediately on the next call in MATLAB, yet Python has to reload the module to update the function name index, because the only way to see what functions are available is to revisit the contents of the updated .py file!
  • MATLAB abstracts folder names (that start with a + symbol) as packages and .m files as functions, while Python abstracts the .py file as a module (like MATLAB’s package) and the objects are the contents inside it. Therefore a Python package is analogous to the outer level of a double-packed (nested) MATLAB package. I’ll explain this in detail in the next sections.
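The reload point above can be tried out with a throwaway module. This is only a sketch: scratch_mod and greet() are made-up names, the module lives in a temporary directory, and bytecode caching is switched off so the demo does not trip over a stale .pyc file.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True   # assumption for the demo: avoid stale .pyc caches

with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp) / "scratch_mod.py"      # hypothetical module file
    src.write_text("def greet():\n    return 'old'\n")
    sys.path.insert(0, tmp)
    try:
        import scratch_mod
        before = scratch_mod.greet()

        # Edit the file on disk: the already-imported module does not notice
        src.write_text("def greet():\n    return 'new and improved'\n")
        unchanged = scratch_mod.greet()

        # Re-read the file and update the module object in place
        importlib.invalidate_caches()
        importlib.reload(scratch_mod)
        after = scratch_mod.greet()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("scratch_mod", None)
```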

Files AND directories are treated the same way in module hierarchy!

This comes with a few implications

  • if you name your project folder /myproj/ and put a function def myproj() in a file myproj.py inside it, which is a very usual thing most MATLAB users would do, your module is called myproj.myproj, and if you just import myproj.myproj, you will call your function as myproj.myproj.myproj()!
  • you can confuse Python module loader if you have a subfolder named the same as a .py file at the same level. The subfolder will prevail and the .py file with the same name is shadowed!

The reason is that Python allows users to mix scripts, functions and classes in the same file, and the classes or functions do not need to match the filename in order for Python to find them. Therefore the filename itself serves as the label for the collection (module) of functions, classes and other (script) objects inside! The directory is a collection of these files, each of which is itself a collection, so it’s a two-level nest: a directory containing a .py file is a collection of collections!

On the other hand, in MATLAB, it’s one .m file per (publicly exposed) function, classes or scripts, so the system registers and calls them by the filename, not really by how you named it inside. If you have a typo in your function name that doesn’t match your filename, your filename will prevail if there’s only one function there. Helper functions not matching the filename will not be exposed and it will have a static/file-local scope.

Packages in MATLAB are done in folders that start with a + symbol. Packages by default are not exposed to the global namespace in your MATLAB path. They work like Python’s modules, so you also get them into your current workspace with import. This means it’s not possible to define a module in a single file as in Python. Each filename exclusively represents one accessible function or class in the package (no script variables though).

So in other words, there is no such thing as a module in MATLAB because the concept is called a package. Python separated the two concepts because a .py file, which allows a mixture of scripts, classes and loose functions, forms a logical unit with the same structure as a package itself, so it needs another name, module, to separate folder-based collections (logical units) from file-based collections (logical units).

This is very counterintuitive on the surface (because it defeats the point of directories) if you don’t know that Python allowing users to mix scripts, functions and classes in a file means the file itself is a module/collection of executable contents.

from (package/module) import (package/module or objectS) <as (namespace)>

This syntax is super confusing, especially before we understand that

  1. packages have to be folders (the folder form of modules)
  2. modules can be .py files as well as packages
  3. packages/modules are technically objects

The hierarchy for the from import as syntax looks like this:

package_folder > module_file.py > (obj1, obj2, ... )

This has the following implications:

  • from strips the specified namespace so import dumps the node contents to root workspace
  • import without from exposes the entire hierarchy to the root workspace.
  • functions, classes and variables in the scripts are ALL OBJECTS.
  • if you do import mymodule, a function f in it can only be accessed through mymodule.f(). If you want to just call f() in the workspace, do from mymodule import f
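The last point can be checked with any standard library module. A small sketch using json in place of the hypothetical mymodule:

```python
import json               # binds the name `json`; contents are reached as json.dumps
from json import dumps    # binds only `dumps` in the current namespace

same_function = json.dumps is dumps   # both routes lead to the same object
text = dumps({"a": 1})
```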

These properties also shapes the rules for where wildcards are used in the statement:

  • from cannot have wildcards because its target is either a folder (package) or a file (module)
  • import is the only place that can have wildcards * because it is only possible to load multiple objects from one .py file.
  • import * cannot be used without from statement because you need to at some point load a .py file
  • it’s usually a dead end to do from package import * because it only pulls in what the package’s __init__.py exposes (or lists in __all__); the module files themselves are not loaded into the root workspace, so their contents remain uncallable.
  • it also does not make sense (nor is it possible) to follow import * with an as statement because there is no mechanism to map multiple objects into one object name

So the bottom line is that your from import as statement has to somehow load a .py file in order to be valid. You can only choose between these two usages:

  • load the .py file with from statement and pick the objects at import, or
  • skip the from statement and import the .py file, not getting to choose the objects inside it.

as statement can only work if you have only one item specified in import, whether it’s the .py file or the objects inside it. Also, if you understand the rationales above, you’ll see that these two are equivalent:

from package_A import module_file_B as namespace_C
import package_A.module_file_B as namespace_C
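This equivalence is easy to verify with a standard library package. In this sketch json stands in for package_A and its decoder module for module_file_B; the alias names are arbitrary.

```python
import json.decoder as via_import
from json import decoder as via_from

same_module = via_import is via_from   # both aliases point at the same module object
module_name = via_import.__name__
```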

because with as statement, whatever node you have selected is accessed through the output namespace you have specified, so whether you choose to strip the path name structure in the extracted output (i.e. use from statement) is irrelevant since you are not using the package and module names in the root namespace anymore.

The behavior of from import as is very similar to the choices you have to make when extracting a zip file with nested folder structures, except that you have to make the mental substitution that a .py file is analogous to a subfolder while the objects described in the .py file are analogous to the files in said subfolder. Aargh!


Windows 10 Python Smart Aleck

Windows 10 comes with a default alias so that if you type python anywhere in a terminal, PowerShell, Run, etc., it will run a stub that points you to getting it from the Windows Store. WTF man! I hate these stubs that are nothing but advertising! People will know there’s Python available in the store if the Python Software Foundation’s website announces it. There’s no need to hijack the namespace with a useless stub!

After I installed Spyder 5.3.0, it started with a Windows console instead of a Python interpreter console, so when I typed python (Spyder 5.3.0 came with Python 3.8.10 in its subfolder), this damn App Store stub came up:

When I tried to force a .exe execution in PowerShell, I saw this:

So there’s a way to turn this bugger off!

It’s not the first time Spyder hasn’t worked as intended out of the box, but Microsoft’s overzealous promotion of their ‘good ideas’ causes grief and agony to people who simply want to get things done.