One thing I find less than intuitive since switching to Python is that I constantly have to look up whether a method updates the object in place or returns a different object (of the same type) that I have to assign back to the input variable myself.
An example is the str and bytes types. replace() sounds like an updating method, but it’s actually a ‘const’ method (a term borrowed from C++ for a method without side effects): it does not change the state of the object.
I initially thought this had to do with whether the object is immutable or not, but when I tried it on bytearray objects (which are mutable), replace() behaved consistently with the identically named methods of the immutable types (bytes, str): you need to assign the output back to the same name (basically bind the name to the temporary and throw away the original).
bts = b'test'
bts.replace(b'es', b'oas') # in the REPL, echoes the result (can be accessed by _) and does nothing else
bts = bts.replace(b'es', b'oas') # actually rebinds bts to the new object
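As a quick check of the observation above, here is a minimal sketch showing that replace() hands back a new object for str, bytes and bytearray alike and leaves the original untouched. The variable names are only illustrative.

```python
# replace() never modifies the object it is called on, mutable or not.
s = 'test'
b = b'test'
ba = bytearray(b'test')

s_new = s.replace('es', 'oas')
b_new = b.replace(b'es', b'oas')     # bytes objects need bytes arguments
ba_new = ba.replace(b'es', b'oas')   # returns a new bytearray

print(s, s_new)       # test toast
print(b, b_new)       # b'test' b'toast'
print(ba, ba_new)     # bytearray(b'test') bytearray(b'toast')
print(ba_new is ba)   # False: the mutable original was not updated in place
```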
Lambdas in Python do not play by the same rules as anonymous functions in MATLAB
MATLAB takes a snapshot of (captures) the workspace variables involved in the anonymous function AT the time the anonymous function handle is created, so the captured values live on afterwards (by definition a proper closure).
A lambda in Python does NOT take such a snapshot! Strictly speaking it is still a closure, but one that closes over the variable itself rather than its value: the free variables in a lambda expression are simply read on the fly (i.e. in their latest state) when the function is executed.
It’s kind of a mixed love-and-hate situation for both. Either design choice will be confusing for some use cases. I was at first thrown off by the full variable capture of MATLAB’s anonymous functions; then, after I got used to it, the late binding of Python’s lambdas tripped me up. Even the official Python FAQ addresses the surprise of people not getting what they expected when creating lambdas in a for-loop.
To enable capture in Python, you assign the value you want to capture to a default argument of the lambda (i.e. use a bound variable as an intermediary, initialized with the free variable that needs to be captured), then use the intermediary in the expression. For example:
lambda: ser.close() # does not capture 'ser'
lambda s=ser: s.close() # 'ser' is captured by s.
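The for-loop surprise and the default-argument workaround can be seen side by side in a small sketch. The helper name make_callbacks is hypothetical and exists only to keep the demo self-contained.

```python
def make_callbacks():
    """Build two lists of lambdas inside the same loop."""
    late, bound = [], []
    for i in range(3):
        late.append(lambda: i)        # 'i' is looked up when the lambda is called
        bound.append(lambda i=i: i)   # default argument is evaluated right now
    return late, bound

late, bound = make_callbacks()
print([f() for f in late])    # [2, 2, 2]
print([f() for f in bound])   # [0, 1, 2]
```

The first list sees only the final value of i because all three lambdas share the same variable; the second list keeps one value per lambda because each default argument was evaluated at creation time.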
I usually keep the use of nested functions to a minimum, even in MATLAB, because effectively it’s kind of a compromised ‘global’ between nested levels, or a little bit like protected classes in C++. It breaks encapsulation (intentionally) for functions in your inner circle (nest).
It’s often useful for quickly coding up a GUI in MATLAB because you need to share access to the UI controls within the same group. For GUIs that get more complicated, I actually avoided nested functions altogether and used *appdata() to share UI object handles.
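Function handles to nested functions are closures in both MATLAB and Python: they share the enclosing variables rather than snapshotting them. Python lambdas follow the same late-binding rule as Python nested functions; it is MATLAB’s anonymous functions that take a snapshot.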
The startup script is simply startup.m in whatever folder MATLAB starts in.
Now how about Python? For plain Python (anything that you launch from the command line, NOT Spyder though), you’ll need to ADD a new environment variable, PYTHONSTARTUP, that points to your startup script (same drill on Windows and Linux).
For Spyder, it’s Tools > Preferences > IPython console > Startup > “Run a file”, but you don’t need that if you already have the PYTHONSTARTUP environment variable set up correctly.
To add paths in MATLAB, use the obviously named function addpath(). Supply the optional '-end' argument if you don’t want any potential shadowing (i.e. the folder being added has lower priority if there’s an existing function with the same name).
I generally avoid userpath() or the graphical tools because the results are sticky (they persist between sessions). The best way is to manage your paths exclusively with startup.m so you always know what you are getting. If you want full certainty, you can start with restoredefaultpath() in MATLAB.
The commonly suggested Python equivalents of MATLAB’s addpath() are sys.path.insert() and sys.path.append(),
but just like MATLAB’s addpath(), which works with strings only (not cellstr), these Python options do not work correctly with Python lists, because the list methods behind sys.path are as primitive as doing [sys.path, new_stuff].
This means you’ll end up with a list of lists if you supply Python lists as inputs to the above (MATLAB will throw an exception if you try to feed it a cellstr instead of polluting your path space with garbage).
This also means it doesn’t check for duplicates! It’ll keep stacking entries!
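Both problems are easy to reproduce. The sketch below uses an ordinary list as a stand-in for sys.path so that the real search path is left alone; the folder names are hypothetical.

```python
# Stand-in lists for sys.path; the folder names are made up for illustration.
new_dirs = ['/proj/lib', '/proj/tools']

nested = ['/usr/lib/python']
nested.append(new_dirs)           # the whole list becomes ONE nested entry
print(nested)                     # ['/usr/lib/python', ['/proj/lib', '/proj/tools']]

stacked = ['/usr/lib/python']
stacked.append('/proj/lib')
stacked.append('/proj/lib')       # no duplicate check: the entry is added again
print(stacked.count('/proj/lib')) # 2
```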
To address the first problem, we use sys.path.extend() instead. It’s like doing addpath(..., '-end') in MATLAB. If you want the new paths inserted at the front (higher priority, shadowing existing entries), you’ll need sys.path = list_of_new_paths + sys.path. For MATLAB, you can build a single DOS-style path string by joining with pathsep:
addpath(strjoin(cellstr_of_paths, pathsep))
Note that sys.path.extend() expects an iterable, so if you feed it a string, which Python treats as a sequence of characters, you will get a bunch of one-character paths inserted!
On the other hand, DO NOT TRY to get around it in Python with the same trick as in MATLAB, i.e. sys.path.append(';'.join(path_list)). Python treats sys.path as a list of strings, NOT as one long string like the MATLAB/Windows path, even though insert() and append() are meant to take one path string at a time!
Aargh!
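The pitfalls just described can be demonstrated on plain lists standing in for sys.path. This is only a sketch, with made-up folder names.

```python
# Plain lists stand in for sys.path; folder names are hypothetical.
chars = []
chars.extend('/proj')             # a string is iterated character by character
print(chars)                      # ['/', 'p', 'r', 'o', 'j']

joined = []
joined.append(';'.join(['/proj/lib', '/proj/tools']))
print(joined)                     # ['/proj/lib;/proj/tools'] (one bogus entry)

# Prepending (higher priority) needs list concatenation.
front = ['/proj/lib', '/proj/tools'] + ['/usr/lib/python']
print(front)                      # ['/proj/lib', '/proj/tools', '/usr/lib/python']
```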
The second problem (which does NOT happen in MATLAB) is slightly more work. You’ll need to subtract out the existing paths before you add to it so that you won’t drag your system down by casually adding paths as you see fit. One way to do it:
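A minimal sketch of one such approach, assuming you want to keep the order of the new entries: filter the candidates against what is already on the path before extending. The function name only_new_paths is hypothetical, and a plain list stands in for sys.path.

```python
def only_new_paths(candidates, existing):
    """Return the candidates not already in 'existing', in order, without repeats."""
    seen = set(existing)
    fresh = []
    for p in candidates:
        if p not in seen:
            seen.add(p)        # also guards against repeats within 'candidates'
            fresh.append(p)
    return fresh

search_path = ['/usr/lib/python', '/proj/lib']   # stand-in for sys.path
search_path.extend(only_new_paths(['/proj/lib', '/proj/tools', '/proj/tools'],
                                  search_path))
print(search_path)   # ['/usr/lib/python', '/proj/lib', '/proj/tools']
```

With the real search path, the call would be sys.path.extend(only_new_paths(new_dirs, sys.path)).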
You should organize your programs and libraries in a directory tree and use code to crawl the right branch into a path list! Don’t let the lack of built-in support tempt you into organizing files in a mess. Keep the visuals clean, as mental gymnastics/overheads can seriously distract you from the real work, such as thinking through the requirements and coming up with the right architecture and data structures. If you constantly need to jump through a few hoops to do something, do it only once or twice the proper way (i.e. NOT copying-and-pasting boilerplate code), and reuse the infrastructure.
At my previous workplaces, there were dozens and dozens of MATLAB files all lying flat in one folder. The first thing I did when I joined a new team was to show everybody this idiom, which recursively adds everything under a folder to the MATLAB path:
addpath(genpath())
Actually, the built-in support for recursive directory search sucks in both MATLAB and Python. Most often what we need is just a list of full paths matching a pattern that we search recursively, basically dir/w/s *. Neither has this right out of the box. Both make you go through the comprehensive data structure returned (be it the tuples from os.walk() in Python or the struct array from dir() in MATLAB) and do some manipulation to get to this form.
genpath() itself is slow and ugly. It’s basically a recursive wrapper around dir() that cleans up garbage like '.' and '..'. Instead of a newline character, a separate row (as a char array) or a separate cell (as a cellstr) between entries, you get semicolons (;) as pathsep. Nonetheless, I still use it: although I have recursive path tools in my own libraries, I need to load the library first in my startup file, which itself requires a recursive path tool like genpath(). This bootstraps me out of a chicken-and-egg problem without too much ugly syntax.
Most people will tell you to do an os.walk() and use a list comprehension to get the typical full-path form, but I’m not settling for distracting syntax like that. People in the community suggested glob as a relatively simple alternative to genpath().
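For comparison, the os.walk() route mentioned above looks roughly like the sketch below. The function name subfolders_via_walk and the temporary folder layout are assumptions made for the demo.

```python
import os
import tempfile

def subfolders_via_walk(root):
    """List root and every folder below it using os.walk()."""
    # os.walk() yields (dirpath, dirnames, filenames); dirpath already includes root
    return [dirpath for dirpath, _dirnames, _filenames in os.walk(root)]

# Build a small throwaway tree: root/a/b and root/c
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b'))
    os.makedirs(os.path.join(root, 'c'))
    folders = subfolders_via_walk(root)
    relative = sorted(os.path.relpath(f, root) for f in folders)

print(relative)   # the root itself shows up as '.'
```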
Here’s a cleaner way:
import glob

def list_subfolders_recursively(p):
    p = p + '/**/'  # trailing separator: match directories only
    return glob.glob(p, recursive=True)
It’s also worth noting that Python follows the Linux file search pattern, where a directory pattern ends with a filesep (/), while MATLAB’s dir() command follows the OS, which on Windows means '*.'.
Both MATLAB and Python use ** to mean any number of levels, but in glob you have to turn it on yourself with recursive=True. ** is already recursive in MATLAB’s dir() command.
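The effect of recursive=True can be checked on a throwaway folder tree. This is a sketch under the assumption of a two-level layout (root/a/b) created just for the demo.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b'))
    pattern = os.path.join(root, '**', '')   # ends with a separator: directories only

    flat = glob.glob(pattern)                  # '**' behaves like a plain '*'
    deep = glob.glob(pattern, recursive=True)  # '**' spans zero or more levels

    flat_rel = sorted(os.path.relpath(p, root) for p in flat)
    deep_rel = sorted(os.path.relpath(p, root) for p in deep)

print(flat_rel)   # only the first level
print(deep_rel)   # the root itself ('.') plus every level below it
```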
Considering there’s quite a bit of plumbing associated with the weak set of sys.path methods provided in Python, I created a qpath.py next to my startup.py:
''' This is the quick and dirty version to bootstrap startup.py
Should use files.py that issue direct OS calls for speed'''
import sys
import glob

def list_subfolders_recursively(p):
    p = p + '/**/'
    return glob.glob(p, recursive=True)

def keep_only_new_set_of_paths(p):
    return set(p) - set(sys.path)

def set_of_new_subfolders_recursively(p):
    return keep_only_new_set_of_paths(list_subfolders_recursively(p))

def add_paths_recursively_bottom(p):
    sys.path.extend(set_of_new_subfolders_recursively(p))

def add_paths_recursively_top(p):
    # operator+() does not take sets
    sys.path = list(set_of_new_subfolders_recursively(p)) + sys.path
To be able to import my qpath module in startup.py before it adds the paths, I have to put qpath.py in the same folder as startup.py and have startup.py add the folder where it lives to the system path (because your current Python working folder might differ from the folder PYTHONSTARTUP points to) so that it recognizes qpath.py.
This is the same technique I came up with for managing localized dependencies in MATLAB: I put the dependencies under the calling function’s folder, and use the path of the .m file for the function as the anchor-path to add paths inside the function. In MATLAB, it’s done this way:
function varargout = f(varargin)
anchor_path = fileparts( mfilename('fullpath') );
addpath( genpath(fullfile(anchor_path, 'dependencies')) );
% Body code goes here
Analogously,
Python has the __file__ variable (like the good old preprocessor days in C) in place of mfilename().
MATLAB’s mfilename('fullpath') always gives the absolute path, but Python’s __file__ is absolute if the file’s folder is not in sys.path yet, and relative if it is already there.
So to ensure absolute path in Python, apply os.path.realpath(__file__). Actually this is a difficult feature to implement in MATLAB. It’s solved by a MATLAB FEX entry called GetFullPath().
Python’s os.path.dirname() is the direct equivalent of fileparts() if you just take the first output.
and in my startup.py (must be in the same folder as qpath.py):
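A minimal sketch of what such a startup.py might contain, under stated assumptions: the helper names folder_of and ensure_on_path are hypothetical, and the lines that depend on __file__ and on qpath.py are left as comments so the sketch runs anywhere.

```python
import os
import sys

def folder_of(file_path):
    """Absolute folder containing file_path (symlinks resolved)."""
    return os.path.dirname(os.path.realpath(file_path))

def ensure_on_path(folder, search_path=sys.path):
    """Put folder at the front of search_path unless it is already listed."""
    if folder not in search_path:
        search_path.insert(0, folder)
    return search_path

# In startup.py itself one would then write something like:
#   ensure_on_path(folder_of(__file__))   # makes qpath.py importable
#   import qpath
#   qpath.add_paths_recursively_bottom(<your library root>)

# Demo on a stand-in list instead of the real sys.path.
demo = []
ensure_on_path(folder_of('startup.py'), demo)
ensure_on_path(folder_of('startup.py'), demo)   # second call adds nothing
print(len(demo))   # 1
```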
This way I can make sure all the paths are deterministic and none of them depends on where I start Python.
Now I feel like Python is about as mature as Octave. It’s usable, but it’s missing a lot of thoughtful features compared to MATLAB. Python’s entire ecosystem feels at least 10 years behind MATLAB in terms of user friendliness. Python makes up for it with some pretty advanced language features that MATLAB doesn’t have; nonetheless, we are still stuck with quite a bit of boilerplate code in Python, which decreases the expressiveness of the language (I’m a proponent of self-documenting code: variable and function names and their organization should be carefully designed to tell the story; comments are reserved for non-obvious tricks).