MATLAB and Python paths

Posted on May 3, 2019 by admin

EDIT (2025-12-31): I finally bit the bullet and wrote an OS system call version which is faster. Please see the link to the Github page. It’s no longer called qpath but I left the code listing in the blog page to illustrate the thought process.

MATLAB’s path() is the analog to Python’s sys.path.

However, MATLAB’s path is stored as strings while Python’s sys.path is stored as lists (MATLAB’s analog of cells). The tradeoffs between strings vs list/cells (hetereogenous wrappers)

To add paths in MATLAB, use the obviously named function addpath(). Supply the optional -end argument if you don’t want any potential shadowing (i.e. the folder to import has lower priority if there’s an existing function with the same name).

I generally avoid userpath() or the graphical tools because the results are sticky (persists between sessions). The best way is to exclusively manage your paths with startup.m so you always know what you are getting. If you want full certainty, you can start with restoredefaultpath() in MATLAB.

Python’s suggested these as equivalents of MATLAB’s addpath():

sys.path.insert(0, folder_to_add_to_path)  # Add to the front
sys.path.append(folder_to_add_to_path)     # Add to the end

but just like MATLAB’s addpath() which works with strings only (not cellstr), these Python options do not work correctly with Python lists as inputs (typically needed for multiple paths) because lists’ methods in sys.path are as primitive as doing [sys.path, new_stuff]:

This means you’ll end up with list of lists if you supplied Python lists as inputs to the above
(MATLAB will throw an exception if you try to feed it with cellstr instead of polluting your path space with garbage)
This also means it doesn’t check for duplicates! It’ll keep stacking entries!

To address the first problem, we use sys.path.extend() instead. It’s like doing addpath(..., '-end') in MATLAB. If you want it to be inserted at the front (higher priority, shadows existing), you’ll need sys.path = list_of_new_paths + sys.path.

MATLAB is the other way round: if you have a cell (python’s version of list) of paths to enter, you can make a path string by using pathsep:

addpath(strjoin(cellstr_of_paths, pathsep)))

Note that sys.path.extend() expect list, sets or tuples so if you feed it a string, which Python will consider it a list of characters, you will get a bunch of one character paths inserted!

On the other hand, DO NOT TRY to get multipe path entries to work with insert/append by flattening the list of path into one string becaust sys.path is not a string and it’s not smart enough to auto switch depending on data types! In other words, sys.path.append( ';'.join(path_list)) is INCORRECT as it’ll add an invalid path when more than 1 path is in the path_list. Python recognize sys.path as a list, NOT a one long string like MATLAB/Windows path, despite insert() and append() accepts only strings!

MATLAB’s uses string, which is a much more efficient path representation, so many duplicate entries won’t slow down the system much. Path order do affect shadowing (which functions with the same name gets exposed) though.

The second problem (which does NOT happen in MATLAB) is slightly more work. You’ll need to subtract out the existing paths before you add to it so that you won’t drag your system down by casually adding paths as you see fit. One way to do it:

def keep_only_new_set_of_paths(p):
    return set(p)-set(sys.path)

You should organize your programs and libraries in a directory tree structure and use code to crawl the right branch into a path list! Don’t let the lack of built-in support to tempt you to organize files in a mess. Keep the visuals clean as mental gymnastics/overheads can seriously distract you from the real work such as thinking through the requirements and coming up with the right architecture and data structures. If you constantly need to jump a few hoops to do something, do it only once or twice using the proper way (aka, NOT copying-and-pasting boilerplate code), and reuse the infrastructure.

At my previous workplaces, they had dozens and dozens of MATLAB files including all laying flat in one folder. The first thing I did when I join a new team is showing everybody this idiom that recursively adds everything under the folder into MATLAB paths:

addpath(genpath())

Actually the built-in support for recursive directory search sucks for both MATLAB and Python. Most often what we need is just a list of full paths for a path pattern that we search recursively, basically dir/w/s *. None of them has this right out of the box. They both make you go through the comprehensive data structure returned (let it be tuples from os.walk() in Python or dir() in MATLAB) and massage the output to a simple path list.

genpath() itself is slow and ugly. It’s basically a recursive wrapper around dir() that cleans up garbage like '.' and '..'. Instead of getting a newline character, a different row (as a char array) or a different cell (as cellstr), you get semi-colons (;) as pathsep in between. Nonetheless, I still use genpath() because despite I have written faster recursive path tools in my own libraries, I’ll need to load the library first in my startup file, which requires a recursive path tool like genpath(). This bootstraps me out of a chicken-and-egg problem without too much ugly syntax.

Most people will tell you to do a os.walk() and use listcomp to get it in the typical full path form, but I’m not settling for distracting syntax like this. People in the community suggested using glob for a relatively simple alternative to genpath()

Here’s a cleaner way:

def list_subfolders_recursively(p):
    p = p + '/**/' 
    return glob.glob(p, recursive=True);

It’s also worth noting that Python follows Linux’s file search pattern where directory terminates with a filesep (/) while MATLAB’s dir() command follows the OS, which in Windows, it’s *..

Both MATLAB and Python uses ** to mean regardless of levels, but you’ll have to turn on the recursive=True in glob manually. ** is already implied to be recursive in MATLAB’s dir() command.

Considering there’s quite a bit of plumbing associated with weak set of sys.path methods provided in Python, I created a qpath.py next to my startup.py:

''' This is the quick and dirty version to bootstrap startup.py
Should use files.py that issue direct OS calls for speed'''

import sys
import glob

def list_subfolders_recursively(p):
    p = p + '/**/' 
    return glob.glob(p, recursive=True);

def keep_only_new_set_of_paths(p):
    return set(p)-set(sys.path)

def set_of_new_subfolders_recursively(p):
    return keep_only_new_set_of_paths( list_subfolders_recursively(p) )

def add_paths_recursively_bottom(p):
    sys.path.extend(set_of_new_subfolders_recursively(p));

def add_paths_recursively_top(p):
    # operator+() does not take sets
    sys.path = list(set_of_new_subfolders_recursively(p)) + sys.path;

In order to be able to import my qpath module at startup.py before it adds the path, I’ll have put qpath.py in the same folder as startup.py, and request startup.py to add the folder where it lives to the system path (because your current Python working folder might be different from PYTHONSTARTUP) so it recognizes qpath.py.

This is the same technique I came up with for managing localized dependencies in MATLAB: I put the dependencies under the calling function’s folder, and use the path of the .m file for the function as the anchor-path to add paths inside the function. In MATLAB, it’s done this way:

function varargout = f(varargin)
  anchor_path = fileparts( mfilename('fullpath') );
  addpath( genpath(fullfile(anchor_path, 'dependencies')) );
  % Body code goes here

Analogously,

Python has __file__ variable (like the good old preprocessor days in C) in place of mfilename().
MATLAB’s mfilename('fullpath') always gives the absolute path, but Python’s __file__ is absolute if it’s is not in sys.path yet, and relative if it’s already in it.
So to ensure absolute path in Python, apply os.path.realpath(__file__). Actually this is a difficult feature to implement in MATLAB. It’s solved by a MATLAB FEX entry called GetFullPath().
Python os.path.dirname is the direct equivalent of fileparts() if you just take the first output argument. os.path.basename is the 2nd output argument.

and in my startup.py (must be in the same folder as qpath.py):

import os
import sys

sys.path.append(os.path.dirname(os.path.realpath(__file__)))

import qpath

user_library_path = 'D:/Python/Libraries';
qpath.add_paths_recursively_bottom(user_library_path)

This way I can make sure all the paths are deterministic and none of the depends on where I start Python.

Getting pyinstaller 3.4 to work with Python 3.7

Posted on April 30, 2019 by admin

Python is an excellent language, but given that it’s free, it also comes with a lot of conspicuous loose-ends that you will not expect in commercially supported platforms like MATLAB.

Don’t expect everything to work right out of the box in Python. Everything is like 98% there, with the last 2% frustrate the heck out of you when you are rushing to get from point A to point B and you have to iron out a few dozen kinks before you can really start working.

When I tried use pyinstaller (v3.4) to compile my Python (v3.7) program into an executable, I ended up having to jump through a bunch of hoops:

pip install pyinstaller gives:

ModuleNotFoundError: No module named 'cffi'

Then I looked up and installed cffi
```
pip install cffi
```
After the dependency was addressed manually (it shouldn’t ) pip install pyinstaller worked

Then I tried to compile my first Python executable with pyinstaller, and I got this exception:

File "C:\Python37\lib\site-packages\win32ctypes\core\cffi\_advapi32.py", line 198

        ^
    SyntaxError: invalid syntax

I searched the exact string and learned that pyinstaller (v3.4) is not ready for Python 3.7 yet! How come pip installer didn’t check for it? I opened up the offending file and looked for line 198 and saw this:
```
c_creds.CredentialBlobSize = \

    ffi.sizeof(blob_data) - ffi.sizeof('wchar_t')
```
It’s a freaking line continuation character \ (actually the extraneous CR before CRLF) that rooster-blocked it.
I just deleted the line continuation and merged the two lines, and saved _advapi32.py, then I was able to compile my Python v3.7 code (using pyinstaller 3.4) with no issues.

This is not something you’ll experience as a MATLAB user. The same company, TMW, wrote the MATLAB compiler as well as the rest. The toolbox/packages are released together in one piece so breaking changes that causes failure for the most obvious use case are caught before they get out of the door.

Another example of breaking changes that I ran into: ipdb does not allow you to move cursor backward.

Again, this is the cost associated with free software and access to the latest updates and new features without waiting for April/October (it’s the MATLAB regular release cycle). If hassle and the extra engineering time far exceed licensing MATLAB licensing costs, MATLAB is a better choice, especially if software is just a chore to get your company from point A to point B, and you are willing to pay big bucks to get there quickly and reliably.

Even with free software on the table, your platform choice is always determined by:

how much your time is worth wrestling problems
how much flexibility do you need (for customizing to your needs)
how much you are willing to pay for the licenses and support

In any case, Python community did good work. Please consider sponsoring PyInstaller and PSF if you profit immensely from their work. But if your company already paid for MATLAB and has real MATLAB experts on it*, don’t switch completely to Python just for the sake of chasing the trendy thing. You will loose time. Lots of time! Precious engineering hours! Over really stupid things like the ones above that is totally not your fault! It’s better to dedicate a few days dealing with MATLAB-Python interface than wasting months over the clumsiness of Python if you have a long, complex workflow

Python’s largest value over MATLAB is that there’s a function developed for nearly every less than common situations (like all different variations of path and file management tools) so you don’t need MATLAB File Exchange to fill in the gap, but the downside is that Python, just like the rest of FOSS world (or Tektronix), has absolutely no sense of user experience studies: common operations are tucked under clumsy maneuvers such as making user jump through hoops to run a Python script in the interpreter:import do not execute the script after its cached, execfile was lile eval in MATLAB so they made it hell, or you need to import a library like runpy or runfile to do something basic like this. Why can’t they just add a native root-level function like runfile() and call it a day without waiting for IPhyton to provide something this basic? I know Python was is too eager to protect the namespace, but this comes at the expense of adding a lot of stupid maneuvers for really basic bread-and-butter stuff). Python is productive in the sense that there are many sophisticated features that are readily available that you don’t have to write your own library for the advanced cases, but the edge is like 1% of the use cases which I invest the time when I run into it: it’s not worth making 99% of my workflow miserable!

* Many people those who claimed decades of MATLAB under the belt didn’t really learn the advanced concepts through studying documentation, MATLAB blogs or get training to take it to the next level of super-productivity! You can tell if they have a lot of for-loops in situations that don’t absolutely need it (memory saving, injecting elements to LHS assignment, parallel computing, not loading too many files at the same time, drawing graphics (order matters), big tasks that needs to be resumed if it throws an error) and using cells everywhere when they should have used structs or more modern types (heterogeneous data structures) such as table(). Most of the modern, super-productive MATLAB language constructs happens since 2013, so I can safely say that MATLAB experiences accumulated before that aren’t too valuable for developing business logic part of the code (might be useful if you are writing tools and lower level MATLAB code like those in TMW).

MATLAB Techniques: variadic (variable input and output) arguments

Posted on December 5, 2017 by admin

I’ve seen a lot of ugly implementations from people trying to deal with variable number of input and output arguments. The most horrendous one I’ve seen so far came from MIT’s Physionet’s WFDB MATLAB Toolbox. Here’s a snippet showing how wfdbdesc.m handles variable input and output arguments:

function varargout=wfdbdesc(varargin)
% [siginfo,Fs,sigClass]=wfdbdesc(recordName)
...
%Set default pararamter values
inputs={'recordName'};
outputs={'siginfo','Fs','sigClass'};

for n=1:nargin
    if(~isempty(varargin{n}))
        eval([inputs{n} '=varargin{n};'])
    end
end
...
if(nargout>2)
    %Get signal class
    sigClass=getSignalClass(siginfo,config);
end
for n=1:nargout
    eval(['varargout{n}=' outputs{n} ';'])
end

The code itself reeks a very ‘smart’ beginner who didn’t RTFM. The code is so smart (shows some serious thoughts):

Knows to use nargout to control varargout to avoid the side effects when no output is requested
[Cargo cult practice]: (unnecessarily) track your variable names
[Cargo cult practice]: using varargin so it can be symmetric to varargout (also handled similarly). varargout might have a benefit mentioned above, but there is absolutely no benefit to use varargin over direct variable names when you are not forwarding or use inputParser().
[Cargo cult practice]: tries to be efficient to skip processing empty inputs. Judicially non-symmetric this time (not done to output variables)!

but yet so dumb (hell of unwise, absolutely no good reason for almost every ‘thoughtful’ act put in) at the same time. Definitely MIT: Make it Tough!

This code pattern is so wrong in many levels:

Unnecessarily obscuring the names by using varargin/varargout
Managing a list of variable names manually.
Loop through each item of varargin and varargout cells unnecessarily
Use eval() just to do simple cell assignments! Makes me cringe!

Actually, eval() is not even needed to achieve all the remaining evils above. Could have used S_in = cell2struct(varargin) and varargout=struct2cell(S_out) instead if one really wants to control the list of variable names manually!

The hurtful sins above came from not knowing a few common cell packing/unpacking idioms when dealing with varargin and varargout, which are cells by definition. Here are the few common use cases:

Passing variable arguments to another function (called perfect forwarding in C++): remember C{:} unpacks to comma separated lists!
```
function caller(varargin)
   callee(varargin{:});
```
Limiting the number of outputs to what is actually requested: remember [C{:}] on left hand side (assignment) means the outputs are distributed as components of C that would have been unpacked as comma separated lists, i.e. [C{:}] = f(); means [C{1}, C{2}, C{3}, ...] = f();
```
function varargout = f()
// This will output no arguments when not requested,
// avoiding echoing in command prompt when the call is not terminated by a semicolon
    [varargout{1:nargout}] = eig(rand(3));
```

You can directly modify varargin and varargout by cells without de-referencing them with braces!

function varargout = f(varargin)
// This one is effectively deal()
    varargout = varargin(1:nargout);
end

function varargout = f(C)
// This one unpacks each cell's content to each output arguments
    varargout = C(1:nargout);
end

One good example combining all of the above is to achieve the no-output argument example in #2 yet neatly return the variables in the workspace directly by name.

function [a, b] = f()
// Original way to code: will return a = 4 when "f()" is called without a semicolon
    a = 4;
    b = 2;
end

function varargout = f()
// New way: will not return anything even when "f()" is called without a semicolon
    a = 4;
    b = 2;
    varargout = manage_return_arguments(nargout, a, b);
end

function C = manage_return_arguments(nargs, varargin)
    C = varargin(1:nargs);
end

I could have skipped nargs in manage_return_arguments() and use evalin(), but this will make the code nastily non-transparent. As a bonus, nargs can be fed with min(nargout, 3) instead of nargout for extra flexibility.

With the technique above, wfdbdesc.m can be simply rewritten as:

function varargout = wfdbdesc(recordName)
% varargout: siginfo, Fs, sigClass
...
varargout = manage_return_arguments(nargout, siginfo, Fs, sigClass);

Unless you are forwarding variable arguments (with technique#1 mentioned above), input arguments can be (and should be) named explicitly. Using varargin would not help you avoid padding the unused input arguments anyway, so there is absolutely no good reason to manage input variables with a flexible list. MATLAB already knows to skip unused arguments at the end as long as the code doesn’t need it. Use exist('someVariable', 'var') instead.

MATLAB Practices: Code and variable transparency. eval() is one letter away from evil() Transparency means that all references to variables must be visible in the text of the code

Posted on December 5, 2017 by admin

The general wisdom about eval() is: do not use it! At least not until you are really out of reasonable options after consulting more than 3 experts on the newsgroups, forums and support@mathworks.com (if your SMS is current)!

Abusing eval() turns it into evil()!

The elves running inside MATLAB needs to be able to track your variables to reason through your code because:

it helps your code run much faster (eval() cannot be pre-compiled)
able to use parallel computing toolbox (it has to know absolutely for sure about any shared writes)
mlint can warn you about potentially pitfalls through code smell.
it keeps you sane while debugging!

This is called ‘transparency’: MATLAB has to see what you are doing every step of the way. According to MATLAB’s parallel computing toolbox’s documentation,

Transparency means that all reference to variables must be visible in the text of the code

which I used as a subtitle of this post.

The 3 major built-in functions that breaks transparency are:

eval(), evalc(): arbitrary code execution resulting in read and write access to its caller workspace.
assignin(): it poofs variables in its caller’s workspace!
evalin(): it breaks open the stack and read the variables in its caller’s workspace!

They should have been replaced by skillful use of dynamic field names, advanced uses of left assignment techniques, and freely passing variables as input arguments (remember MATLAB uses copy-on-write: nothing is copied if you just read it).

There are other frequently used native MATLAB functions (under certain usages) that breaks transparency:

load(): poof variables from the file to the workspace. The best practice is to load the file as a struct, like S=load('file.mat'); , which is fully transparent. Organizing variables into structs actually reduces mental clutter (namespace pollution)!

save(), who(), whos(): basically anything that takes variable names as input and act on the requested variable violates transparency because it’s has the effect of evalin(). I guess the save() function chose to use variable names instead of the actual variables as input because copy-on-write wasn’t available in early days of MATLAB. A example workaround would be:

function save_transparent(filename, varargin)
    VN = arrayfun(@inputname, (2:nargin)', 'UniformOutput', false);     
    if( any( cellfun(@isempty, VN) ) )
        error('All variables to save must be named. Cannot be temporaries');
    end
    
    S = cell2struct(varargin(:), VN, 1);
    save(filename, '-struct', 'S');
end

function save_struct_transparent(filename, S)
    save(filename, '-struct', 'S');
end

The good practices to avoid non-transparent load/save also reduces namespace pollution. For example, inputname() peeks into the caller to see what the variable names are, which should not be used lightly. The example above is one of the few uses that I consider justified. I’ve seen novice abusing inputname() because they were not comfortable with cells and structs yet, making it a total mindfuck to reason through.

Switch between 32-bit and 64-bit user written software like CVX

Posted on December 26, 2016 by admin

CVX is a very convenient convex optimization package that allows the user to specify the optimization objective and constraints directly instead of manually manipulating them (by various transformations) into forms that are accepted by commonly available software like quadprog().

What I want to show today is not CVX, but a technique to handle the many different versions of the same program targeted at each system architecture (32/64-bit, Windows/Mac/Linux). Here’s a snapshot of what’s available with cvx:

OS	32/64	mexext	Download links
Linux	32-bit	mexglx	cvx-glx.zip
Linux	64-bit	mexa64	cvx-a64.zip
Mac	32-bit	mexmaci	cvx-maci.zip
Mac	64-bit	mexmaci64	cvx-maci64.zip
Windows	32-bit	mexw32	cvx-w32.zip
Windows	64-bit	mexw64	cvx-w64.zip

You can download all packages for different architectures, but make a folder for each of them by their mexext() name. For example, 32-bit Windows’ implementation can go under /mexw32/cvx. Then you can programmatically initialize the right package for say, your startup.m file:

run( fullfile(yourLibrary, mexext(), 'cvx', 'cvx_startup.m') );

I intentionally put the /[mexext()] above /cvx, not the other way round because if you have many different software packages and want to include them in the path, you can do it in one shot without filtering for the platform names:

addpath( genpath( fullfile(yourLibrary, mexext()) ) );

You can consider using computer(‘arch’) in place of mexext(), but the names are different and you have to name your folders accordingly. For CVX, it happens to go by mexext(), so I naturally used mexext() instead.

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30
31

Rambling Nerd with a Plan

Hoi Wong's blog

Category Archives: MATLAB

MATLAB and Python paths

Getting pyinstaller 3.4 to work with Python 3.7

MATLAB Techniques: variadic (variable input and output) arguments

MATLAB Practices: Code and variable transparency. eval() is one letter away from evil() Transparency means that all references to variables must be visible in the text of the code

Switch between 32-bit and 64-bit user written software like CVX