Watch out for ‘const’ method in Python

One thing I feel a little bit not quite as intuitive when I switch to Python is I constantly have to look up whether the method directly updates the contents or it’ll return a different object (of the same type) that I’ll have to overwrite the input variable myself.

An example would be strings and bytes object. replace() sounded like an updating method, but it’s actually a ‘const’ method (a term borrowed from C++ to say that the method does not have side-effects) that does not change the state of the object.

I initially thought this has to do with whether the object is immutable or not, but I tried it on bytearray objects (which is mutable), replace() behaves consistently with the identically named methods in other immutable objects (bytes object, string object): you’ll need to assign the output to self (basically bind the name to the temporary and throw away the original).

bts = b'test'
bts.replace('es', 'oas')       # dumps the output to workspace (can be accessed by _) and do nothing else
bts = bts.replace('es', 'oas') # actually updates bts

 

Loading

Python startup management

The startup script is simply startup.m in whatever folder MATLAB start with.

Now how about Python? For plain Python (anything that you launch in command line, NOT Spyder though), you’ll need to ADD a new environment variable PYTHONSTARTUP to point to your startup script (same drill for Windows and Linux).

For Spyder, it’s Tools>Preferences>IPython console>Startup>”Run a file”:

but you don’t need that if you already have new environment variable PYTHONSTARTUP correctly setup.

 

Loading

MATLAB and Python paths

MATLAB’s path() is equal to Python’s sys.path().


To add paths in MATLAB, use the obviously named function addpath(). Supply the optional -end argument if you don’t want any potential shadowing (i.e. the folder to import has lower priority if there’s an existing function with the same name).

I generally avoid userpath() or the graphical tools because the results are sticky (persists between sessions). The best way is to exclusively manage your paths with startup.m so you always know what you are getting. If you want full certainty, you can start with restoredefaultpath() in MATLAB.


Python’s suggested these as equivalents of MATLAB’s addpath():

sys.path.insert(0, folder_to_add_to_path)
sys.path.append(folder_to_add_to_path)

but just like MATLAB’s addpath() which works with strings only (not cellstr), these Python options do not work correctly  with Python lists because the methods in sys.path are as primitive as doing [sys.path, new_stuff]:

  1. This means you’ll end up with list of lists if you supplied Python lists as inputs to the above
    (MATLAB will throw an exception if you try to feed it with cellstr instead of polluting your path space with garbage)
  2. This also means it doesn’t check for duplicates! It’ll keep stacking entries!

To address the first problem, we use sys.path.extend() instead. It’s like doing addpath(..., '-end') in MATLAB. If you want it to be inserted at the front (higher priority, shadows existing), you’ll need sys.path = list_of_new_paths + sys.path. For MATLAB, you can make a path string like DOS by using pathsep:

addpath(strjoin(cellstr_of_paths, pathsep)))

Note that  sys.path.extend() expect iterables so if you feed it a string, which Python will consider it a list of characters, you will get a bunch of one character paths inserted!

On the other hand, DO NOT TRY to get around it in Python with the same trick like MATLAB by doing sys.path.append( ';'.join(path_list)). Python recognize sys.path as a list, NOT a one long string like MATLAB/Windows path, despite insert() and append() accepts only strings!

Aargh!

The second problem (which does NOT happen in MATLAB) is slightly more work. You’ll need to subtract out the existing paths before you add to it so that you won’t drag your system down by casually adding paths as you see fit. One way to do it:

def keep_only_new_set_of_paths(p):
    return set(p)-set(sys.path)

You should organize your programs and libraries in a directory tree structure and use code to crawl the right branch into a path list! Don’t let the lack of built-in support to tempt you to organize files in a mess. Keep the visuals clean as mental gymnastics/overheads can seriously distract you from the real work such as thinking through the requirements and coming up with the right architecture and data structures. If you constantly need to jump a few hoops to do something, do it only once or twice using the proper way (aka, NOT copying-and-pasting boilerplate code), and reuse the infrastructure.

At my previous workplaces, they had dozens and dozens of MATLAB files including all laying flat in one folder. The first thing I did when I join a new team is showing everybody this idiom that recursively adds everything under the folder into MATLAB paths:

addpath(genpath())

Actually the built-in support for recursive directory search sucks for both MATLAB and Python.  Most often what we need is just a list of full paths for a path pattern that we search recursively, basically dir/w/s *. None of them has this right out of the box. They both make you go through the comprehensive data structure returned (let it be tuples from os.walk() in Python or dir() in MATLAB) and so some manipulations to get to this form.

genpath() itself is slow and ugly. It’s basically a recursive wrapper around dir() that cleans up garbage like '.' and '..'.  Instead of getting a newline character, a different row (as a char array) or a different cell (as cellstr), you get semi-colons (;) as pathsep in between. Nonetheless, I still use it because despite I have recursive path tools in my own libraries, I’ll need to load the library first in my startup file, which requires a recursive path tool like genpath(). This bootstraps me out of a chicken-and-egg problem without too much ugly syntax.


Most people will tell you to do a os.walk() and use listcomp to get it in the typical full path form, but I’m not settling for distracting syntax like this. People in the community suggested using glob for a relatively simple alternative to genpath()

Here’s a cleaner way:

def list_subfolders_recursively(p):
    p = p + '/**/' 
    return glob.glob(p, recursive=True);

It’s also worth noting that Python follows Linux’s file search pattern where directory terminates with a filesep (/) while MATLAB’s dir() command follows the OS, which in Windows, it’s *..

Both MATLAB and Python uses ** to mean regardless of levels, but you’ll have to turn on the recursive=True in glob manually. ** is already implied to be recursive in MATLAB’s dir() command.


Considering there’s quite a bit of plumbing associated with weak set of sys.path methods provided in Python, I created a qpath.py next to my startup.py:

''' This is the quick and dirty version to bootstrap startup.py
Should use files.py that issue direct OS calls for speed'''

import sys
import glob

def list_subfolders_recursively(p):
    p = p + '/**/' 
    return glob.glob(p, recursive=True);

def keep_only_new_set_of_paths(p):
    return set(p)-set(sys.path)

def set_of_new_subfolders_recursively(p):
    return keep_only_new_set_of_paths( list_subfolders_recursively(p) )

def add_paths_recursively_bottom(p):
    sys.path.extend(set_of_new_subfolders_recursively(p));

def add_paths_recursively_top(p):
    # operator+() does not take sets
    sys.path = list(set_of_new_subfolders_recursively(p)) + sys.path;

In order to be able to import my qpath module at startup.py before it adds the path, I’ll have put qpath.py in the same folder as startup.py, and request startup.py to add the folder where it lives to the system path (because your current Python working folder might be different from PYTHONSTARTUP) so it recognizes qpath.py.

This is the same technique I came up with for managing localized dependencies in MATLAB: I put the dependencies under the calling function’s folder, and use the path of the .m file for the function as the anchor-path to add paths inside the function. In MATLAB, it’s done this way:

function varargout = f(varargin)
  anchor_path = fileparts( mfilename('fullpath') );
  addpath( genpath(fullfile(anchor_path, 'dependencies')) );
  % Body code goes here

Analogously,

  • Python has __file__ variable (like the good old preprocessor days in C) in place of mfilename().
  • MATLAB’s  mfilename('fullpath') always gives the absolute path, but Python’s  __file__ is absolute if it’s is not in sys.path yet, and relative if it’s already in it.
  • So to ensure absolute path in Python, apply os.path.realpath(__file__). Actually this is a difficult feature to implement in MATLAB. It’s solved by a MATLAB FEX entry called GetFullPath().
  • Python os.path.dirname is the direct equivalent of fileparts() if you just take the first argument.

and in my startup.py (must be in the same folder as pathtools.py):

import os
import sys

sys.path.append(os.path.dirname(os.path.realpath(__file__)))

import pathtool

user_library_path = 'D:/Python/Libraries';
pathtool.add_paths_recursively_bottom(user_library_path)

This way I can make sure all the paths are deterministic and none of the depends on where I start Python.


Now I feel like Python is as mature as Octave. It’s usable, but it’s missing a lot of thoughtful features compared to MATLAB. Python’s entire ecosystem like at least 10 years behind MATLAB in terms of user friendliness. However, Python made it up with some pretty advanced language features that MATLAB doesn’t have, but nonetheless, we are still stuck with quite a bit of boilerplate code in Python, which decreases the expressiveness of the language (I’m a proponent of self-documenting code: variable and function names and their organization should be carefully designed to tell the story; comments are reserved for non-obvious tricks)

Loading

Getting pyinstaller 3.4 to work with Python 3.7

Python is an excellent language, but given that it’s free, it also comes with a lot of conspicuous loose-ends that you will not expect in commercially supported platforms like MATLAB.

Don’t expect everything to work right out of the box in Python. Everything is like 98% there, with the last 2% frustrate the heck out of you when you are rushing to get from point A to point B and you have to iron out a few dozen kinks before you can really start working.

When I tried use pyinstaller (v3.4) to compile my Python (v3.7) program into an executable, I ended up having to jump through a bunch of hoops:

  • pip install pyinstaller gives:
    ModuleNotFoundError: No module named 'cffi'
  • Then I looked up and installed cffi
    pip install cffi
  • After the dependency was addressed manually (it shouldn’t )  pip install pyinstaller worked
  • Then I tried to compile my first Python executable with pyinstaller, and I got this exception:
    File "C:\Python37\lib\site-packages\win32ctypes\core\cffi\_advapi32.py", line 198
    
            ^
        SyntaxError: invalid syntax
  • I searched the exact string and learned that pyinstaller (v3.4) is not ready for Python 3.7 yet! How come pip installer didn’t check for it? I opened up the offending file and looked for line 198 and saw this:
    c_creds.CredentialBlobSize = \
    
        ffi.sizeof(blob_data) - ffi.sizeof('wchar_t')

    It’s a freaking line continuation character \ (actually the extraneous CR before CRLF) that rooster-blocked it.

  • I just deleted the line continuation and merged the two lines, and saved _advapi32.py, then I was able to compile my Python v3.7 code (using pyinstaller 3.4) with no issues.

This is not something you’ll experience as a MATLAB user. The same company, TMW, wrote the MATLAB compiler as well as the rest. The toolbox/packages are released together in one piece so breaking changes that causes failure for the most obvious use case are caught before they get out of the door.

Another example of breaking changes that I ran into: ipdb does not allow you to move cursor backward.

Again, this is the cost associated with free software and access to the latest updates and new features without waiting for April/October (it’s the MATLAB regular release cycle). If hassle and the extra engineering time far exceed licensing MATLAB licensing costs, MATLAB is a better choice, especially if software is just a chore to get your company from point A to point B, and you are willing to pay big bucks to get there quickly and reliably.

Even with free software on the table, your platform choice is always determined by:

  • how much your time is worth wrestling problems
  • how much flexibility do you need (for customizing to your needs)
  • how much you are willing to pay for the licenses and support

In any case, Python community did good work. Please consider sponsoring PyInstaller and PSF if you profit immensely from their work. But if your company already paid for MATLAB and has real MATLAB experts on it*, don’t switch to Python just for the sake of chasing the trendy thing. You will loose time. Lots of time! Precious engineering hours! Over really stupid things like the ones above that is totally not your fault!  It’s better to dedicate a few days dealing with MATLAB-Python interface than wasting months over the clumsiness of Python if you have a long, complex workflow

Python’s largest value over MATLAB is that there’s a function developed for nearly every less than common situations (like all different variations of path and file management tools) so you don’t need MATLAB File Exchange to fill in the gap, but the downside is that Python, just like the rest of FOSS world (or Tektronix), has absolutely no sense of user experience studies: common operations are tucked under clumsy maneuvers such as making user jump through hoops to run a Python script in the interpreter (import do not execute the script after its cached, execfile was lile eval in MATLAB so they made it hell, or you need to import a freaking library like runpy to do something basic like this. I understand they cannot Why can’t they just add a root-level function like run() and call it a day? Python was just too eager to protect the namespace, and this comes at the expense of adding a lot of stupid maneuvers for really basic bread-and-butter stuff). Python is productive in the sense that there are many sophisticated features that are readily available that you don’t have to write your own library for the advanced cases, but the edge is like 1% of the use cases which I invest the time when I run into it: it’s not worth making 99% of my workflow miserable!

If you are into test instruments, MATLAB is like HP/Agilent/Keysight’s user interface while Python is like Tektronix’s UI where you’ll need to get to 4 levels of context menus to measure a freaking peak-to-peak voltage! Spending your smart people’s time to figure these stupid shit that came from poor UX (user experience) design is a poor investment! At least Python is free and has more advanced features; Tek charges similar to HP and offer nearly the same features yet it has absolutely no respect for the users’ time: they think all people have Stockholm syndrome after going through their steep learning curve for absolutely no benefits.

Why pay a junior engineer $50/hr for a whole week to go through the learning curve, plus months of maintenance work because they’ve used clumsy tools and clumsy methods, when I can code the same thing up in MATLAB in one hour for $500 and give you one page of easy to read code that you likely don’t have to debug through it down the line? If I designed a moderately complex system, most often I can get to the root cause of any bugs in around 15 minutes: the reason is that I spent my time thinking through the abstractions instead of diving into writing for-loops to hack through any common scenarios without first researching for the neatest mechanism. This way I don’t spend an afternoon writing for loops extracting a strings in the object properties of a categorical data type (a efficient way of marking complex repeating data by indexing unique items) and check if it’s really correct when I could have done some research and realize I can just cast it to a cell/monad/array of strings. The value of my experiences comes in knowing the right words and common lingo in many forms to describe many problems, like spotting a low level database operation implementation is just a RLE (run length encoding) in disguise. The hardest part of using search engines to find what you want quickly is knowing the right keywords. If you don’t know the concepts, you’d have jump into reinventing the wheel poorly when you could have used off-the-shelf proven code had you been able to frame the problem succinctly.

* Don’t be mislead by how many years of MATLAB one has! Majority of people those who claimed decades of MATLAB under the belt didn’t really learn the advanced concepts through studying documentation, MATLAB blogs or get training to take it to the next level of super-productivity! You can tell if they have a lot of for-loops for situations that doesn’t absolutely need it (memory saving, injecting elements to LHS assignment, parallel computing, not loading too many files at the same time, drawing graphics (order matters), big tasks that needs to be resumed if it throws an error) and using cells everywhere when they should have used structs or more modern types such as table(). Most of the modern, super-productive MATLAB language constructs happens since 2013, so I can safely say that MATLAB experiences accumulated before that aren’t too valuable for developing business logic part of the code (might be useful if you are writing tools and lower level MATLAB code like those in TMW).

Loading

Picking an IDE for Python

The native features in MATLAB are often very good most of the time, as I’ve yet to hear anybody spending time to shop for a IDE outside the official one.

Atom has the feel of Maple/MathCAD, and Jupyter Notebook has the feel of Mathematica. Spyder feels like MATLAB the most, but it’s hugely primitive.

IDLE is more miserable than a command prompt. It doesn’t even have the decency to recall command history with up arrow. It’s like freaking DOS before loading doskey.com. Not to mention that single clicking on the window won’t set the cursor to the active command line, which you have to scroll all the way down to click on the bottom line. WTF! I’d rather use the command prompt and give up meaningless syntax coloring.

IPython (in Spyder) is unbearably slow (compare to MATLAB’s editor which I consider slow to the extent that it’s marginally bearable for the interactive features it offers), but at least usable unlike IDLE, and most importantly the output display is pprint (pretty printer) formatted so it’s legible. Just type locals() and see what kind of sh*t Python spits out in IDLE/cmd.exe and you’ll see what I meant.

I simply cannot live without who/whos provided in IPython, but I still don’t like it showing the accessible functions/modules along with the variables (I know, Python doesn’t tell them apart). Nonetheless it’s still weak because these are automagics that doesn’t return the results as Python data (just print). Spyder’s ‘variable explorer’ is the only place I can find that doesn’t include loaded functions/modules. Python should have provided facilities to get the user-introduced variables exclusively and leave the modules to a different function like MATLAB’s import command that shows imported packages/classes.

However, pretty printer doesn’t even come close to MATLAB in terms of the amount of dirty work disp() did to format the text to make it easy to read. Keys in the dictionary shown in pretty printer in Python are not right-aligned like MATLAB struct so we can easily tell keys and values apart. For example:

MATLAB struct shows:
          name: 'S'
          size: [9 1]
         bytes: 7765
         class: 'struct'
        global: 0
        sparse: 0
       complex: 0
       nesting: [1×1 struct]
    persistent: 0

Python with Pretty Printer shows:
{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['', 'locals()'],
 '_oh': {},
 '_dh': ['C:\\Users\\Administrator'],
 'In': ['', 'locals()'],
 'Out': {},
 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x00000000059B7828>>,
 'exit': <IPython.core.autocall.ZMQExitAutocall at 0x5a3b198>,
 'quit': <IPython.core.autocall.ZMQExitAutocall at 0x5a3b198>,
 '_': '',
 '__': '',
 '___': '',
 '_i': '',
 '_ii': '',
 '_iii': '',
 '_i1': 'locals()'}

I often convert things to MATLAB dataset() because the disp() method is excellent, such as struct2dataset(ver()). table/disp() is nice, but I think they overdid it by defaulting to fancy rich-text that bold the header, which makes it a magnitude of orders slower, and it’s not using the limited visual space effectively to show more data. Python still has a lot more to do in the user-friendly department.

Loading