MATLAB features/conveniences not available in Python/C++

A lot of MATLAB’s conveniences over Python and vice versa stem from their design choices of what are first class citizens and what are afterthoughts.

MATLAB has its roots in scientific computing, so operations (use cases) that are natural to scientists come first. Python is a great general-purpose language, but ultimately its motivation came from a computer science point of view. Python eventually gets clumsy when it tries to do everything MATLAB does.

Matrix (a stronger form of array) takes center stage in MATLAB

In MATLAB, the matrix is a first-class citizen. Unlike other languages that start with singletons and build containers to extend them to arrays (then to matrices), nearly everything is assumed to be a matrix in MATLAB, so a singleton is simply seen as a 1×1 matrix.

This is a huge reason why I love MATLAB over Python based on the language design.

Lists in Python are cell containers in MATLAB, so a list of numbers in Python is not the same as an array of doubles in MATLAB, because the contents of [1, 2, 3.14] must be the same type in MATLAB.

Non-uniform arrays like cells/lists are much slower because algorithms cannot take advantage of a uniform data structure packed very locally (the underlying contents sitting right next to each other) and need extra logic to make sure different types are handled correctly.

np.array() is an afterthought, so the syntax for specifying a matrix is clumsy! The syntax is built on lists of lists (composition, like arr[r][c] in C/C++). There used to be a way to use MATLAB's syntax of separating rows of a matrix with a semicolon ';' through np.matrix('') with a string (which clearly is neither native nor code transparent).

Given that np.matrix is deprecated, this option is out of the window. One might think np.array would take similar syntax, but fuck no! If you type a string of MATLAB-style matrix syntax, np.array will treat it as if you entered an arbitrary (Unicode) string, which is a scalar (0 dimensions).

For my own use, I extracted the routine numpy.matrix uses to parse MATLAB-style 2D matrix definition strings as a free function. But the effort wasted getting over this drivel is one of the hidden costs of using Python over MATLAB.

import ast

def mat2list(data):
    '''
    mat2list() is _convert_from_string() copied right off
    https://github.com/numpy/numpy/blob/main/numpy/matrixlib/defmatrix.py
    because Numpy decided to phase out np.matrix yet chose not to
    transplant this important convenience feature to ndarray.

    Parses a MATLAB-style matrix string like '1 2; 3 4' into a list of lists.
    '''
    # strip the (optional) outer MATLAB-style square brackets
    for char in '[]':
        data = data.replace(char, '')

    rows = data.split(';')            # MATLAB row separator
    newdata = []
    for count, row in enumerate(rows):
        trow = row.split(',')         # entries may be separated by commas...
        newrow = []
        for col in trow:
            temp = col.split()        # ...and/or by whitespace
            newrow.extend(map(ast.literal_eval, temp))
        if count == 0:
            Ncols = len(newrow)
        elif len(newrow) != Ncols:
            raise ValueError("Rows not the same size.")
        newdata.append(newrow)
    return newdata
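
For example, a quick sanity check (assuming numpy is installed for the np.array() step):

import numpy as np   # only needed for the np.array() step

print(mat2list('[1 2 3; 4 5 6]'))      # [[1, 2, 3], [4, 5, 6]]
A = np.array(mat2list('1, 2; 3, 4'))   # finally a 2D ndarray from MATLAB-style syntax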

Numpy arrays really corner users into keeping track of the dimensions by requiring at least 2 pairs of brackets for matrices! No brackets = singleton, 1 pair of brackets [...] = array (1D), 2 pairs/levels of brackets [ [row1], [row2], ... [rowN] ] = matrix (2D). Python earns an expletive from me every time I type in a matrix!

Slices are not first class citizens in Python

Slices in Python are roughly equivalent to the colon operator in MATLAB.

However, in MATLAB, the colon operator is native down to the core, so you can create a row matrix of equally spaced numbers without surrounding context. The end keyword, as a shortcut for the length of the dimension being indexed (which happens to be the last index due to 1-based indexing), obviously does not make sense (and is therefore invalid) for a colon in free form.

Python, on the other hand, uses slice objects for indexing. A slice object can be instantiated anywhere (free form), but building it from the colon syntax is exclusively handled inside the square bracket [] access operator, known as the __getitem__ dunder method. Slice objects are simpler than range as they're not iterable, so they're not useful for generating a list of numbers like the colon operator in MATLAB. In other words, Python reserved the colon syntax yet does not have the convenience of generating equally spaced numbers like MATLAB does. Yuck!
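
A quick illustration of the asymmetry (plain standard-library Python):

a = list(range(10))
s = slice(3, 6)      # a free-standing slice object: fine anywhere
print(a[s])          # [3, 4, 5], same as a[3:6], which builds slice(3, 6) inside __getitem__
# list(slice(3, 6))  # TypeError: 'slice' object is not iterable, so no number generation
# and a bare 3:6 outside [] is simply a syntax error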

Since everything is a matrix (dimension >= 2) in MATLAB, there’s no such thing as 0 dimension (scalar) and 1 dimension (array/vector) as in Numpy/Python.

Transposes in Python make no sense for 1D arrays, so .T is a no-op there. A 1D array is promoted into a row vector when interacting with 2D arrays/matrices, while slices make no sense for singletons.

Because of this, you don't get to just say 3:6 in Python and get [3, 4, 5] (which in MATLAB is really {3, 4, 5}, because lists in Python are heterogeneous containers like cells; 3:5 in MATLAB gives out a genuine numeric matrix like the ones used in numpy).

You will have to cast range(3,6), which is a lazy iterable, into a list, aka list(range(3,6)), if the function you call with it does not recognize iterables but instead wants a generated list stored in memory.
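
For example:

r = range(3, 6)     # a lazy range object, not a materialized list
print(list(r))      # [3, 4, 5]: MATLAB's 3:5, but only after the explicit cast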

This is one of the big conveniences (compact syntax) that are lost with Python.

More Operator Overloading

Transposes in Numpy are an example of CS people not getting exposed to scientific computing enough to know which use case is more common:

MATLAB    Numpy                                 Meaning
a.'       a.transpose() or a.T                  transpose of a
a'        a.conj().transpose() or a.conj().T    conjugate transpose (Hermitian) of a

(source: https://numpy.org/devdocs/user/numpy-for-matlab-users.html)
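
A quick numpy illustration of the two flavors (just the table above in code):

import numpy as np

a = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, 2 + 0j]])

print(a.T)           # plain transpose, MATLAB's a.'
print(a.conj().T)    # conjugate (Hermitian) transpose, MATLAB's a'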

Complex numbers are often not CS people's strong suit. Whenever we do a 'transpose' with a physical meaning or context attached to it, we almost always mean the Hermitian (conjugate transpose)! Most often the matrix is real anyway, so many of us get lazy and simply call it a transpose (a special case), which makes it easy to overlook this if whoever designs/implements it does not have a lot of firsthand experience with complex matrices in their math.

MATLAB is not cheap on symbols and overloaded operators for transposes, with the shorter version being the most frequent use case (Hermitian). In Python you are stuck with calling methods instead of typing these commonly used scientific-computing operators as if they were equations.

At least Python could do better by implementing an a.hermitian() or a.H method. But judging from the fact that the foresight isn't there, the community that developed it is likely not the kind of people sophisticated enough in complex numbers to call conjugate transposes Hermitians.

Conventions that are more natural to scientific computing than programming

Slice notation in Python puts the step as the last (3rd) parameter, which makes perfect sense through the eyes of a programmer, because it's messy to have the second parameter mean either the step or the end point depending on whether there's one colon or two. By placing the step parameter consistently as the 3rd argument, the optional case is easier to program.

To people who think in math, it’s more intuitive when you specify a slice/range in the order you draw the dots on a numbered line: you start with a starting point, then you’ll need the step-size to move onto the next point, then you’ll need to know when to stop. This is why it’s start:step:stop in MATLAB.

Python's slice start:stop_exclusive:step convention reads like "let's draw a line with a starting point and an end point, then we figure out what points to put in between". It's usually mildly unpleasant to people who parse what they read on the fly (not buffering until the whole sentence is complete), because a 180-degree turn can appear at the end of a sentence (which happens a lot with Japanese or Reverse Polish Notation).

Be careful that the end points in Python's slices and C++'s STL .end() are exclusive (open), which means the exact endpoint is not included. 0-based index systems (Python and C++) love to specify "one past last" instead of the included end point because it happens to align with the total count N: there are N points in [0, N-1] (note N-1 is inclusive, a closed end), which is equivalent to [0, N), where N is an open end, for integers. This half-open (open-end) convention avoids painfully typing -1 all over the place in most use cases in a 0-based indexing system.
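
A quick side-by-side (the MATLAB equivalents are shown in the comments):

print(list(range(3, 10, 2)))   # [3, 5, 7, 9]: start, stop (excluded), step -- MATLAB's 3:2:9

a = list(range(10))            # 10 items, indices 0..9
print(a[0:10])                 # the whole list: the slice stop 10 is one past the last index
print(len(a[0:10]))            # 10: the half-open [0, 10) conveniently matches the count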

0-based indexing is convenient when doing modulo arithmetic (which is more common among programmers), while 1-based indexing matches our intuition of natural numbers (which start from 1, bite me. lol), so when we count to 5, there are 5 items total. My oscilloscope doesn't call the first channel Channel 0, and I work with floats more than I work with modulo, so 1-based indexing has a slight edge for my use cases.

MATLAB autoextends when you assign to an out-of-range index; Python doesn't

This is one behavior I really hate Python for, with a passion. Enough for me to keep leaning towards MATLAB.

In MATLAB, I simply assign the result with x{3} = 4 even when the cell array x starts out empty (x = {}), and MATLAB is smart enough to autoextend the list. Python will give you a nasty IndexError: list assignment index out of range.

I pretty much have to preallocate my list with [None] * target_list_size. MATLAB is pretty tight-assed when it comes to not allowing syntax/behaviors that let users hurt themselves in insidious ways, yet they figured that if you expanded a matrix you didn't intend to, you'll soon find out when the dimensions mismatch.
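
A minimal sketch of the annoyance and the workaround (target_list_size is whatever size you happen to know ahead of time; note MATLAB's x{3} is Python's x[2] because of 0-based indexing):

x = []
# x[2] = 4                     # IndexError: list assignment index out of range

target_list_size = 5
x = [None] * target_list_size  # preallocate, then assign
x[2] = 4                       # works: x is now [None, None, 4, None, None]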

Note that numpy arrays have the same behavior (they refuse to autoexpand when you assign to an index outside the current range).

No consistent interface for concatenation in Python

In MATLAB, if you have a cell of tables C, you can vertically concatenate them simply with vertcat(C{:}), because MATLAB has a consistent interface for vertical concatenation, which is what the operator [;] calls.

Note that the cell unpack {:} in MATLAB translates to a comma-separated list, and putting square brackets over the commas, as in [C{:}], is horzcat(C{:}) because it's [,].

Python doesn't have such a consistent interface. Lists are concatenated with the + operator while DataFrames are concatenated with pd.concat(list_or_tuple_of_dataframes, ...), as + on DataFrames means elementwise application of + (whatever + means to the pair of elements involved).
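
A small illustration (the column names are made up for this example):

import pandas as pd

print([1, 2] + [3, 4])            # [1, 2, 3, 4]: + concatenates lists

df1 = pd.DataFrame({'ch': [1], 'reading': [0.5]})
df2 = pd.DataFrame({'ch': [2], 'reading': [0.7]})
stacked = pd.concat([df1, df2])   # vertical stack; df1 + df2 would add elementwise instead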

I just had a simple use case where I have a list containing DataFrames corresponding to tests on each channel (index) that I'll run the experiments on one by one. They don't need to be run in order, nor do all the tests need to be completed before I collect (vertically stack) the DataFrames into one big DataFrame!

Vertically stacking such a list of DataFrames is a nightmare. The developers haphazardly added a check to pandas.concat() that throws "ValueError: All objects passed were None" if everything in the list is None!

If I haven't run any tests yet, attempting to collect an aggregate table should return None instead of throwing a fucking ValueError! Checks for attempting a no-op depending on the source data should be left to users! It's safer to do less and let users expand on it than to nanny them and have them painfully undo your unwanted goodwill!

How empties are handled in each data type, like cell or table(), is an important part of a consistent generic interface, to make sure different data types work together (cast or overload automatically) seamlessly. TMW support showed me a very detailed thought process on what to do when a row is empty (length 0) or a column is empty (length 0) in our discussion getting into the implementation details of dataset/table (heterogeneous data types). I just haven't seen that kind of thoughtfulness in Python (lists), Numpy (arrays) or Pandas (DataFrames) yet.

Now, with the poorly thought out extra check in pd.concat(), I have to check whether the list is all None. I often do not jump to listcomps or maps if there's a more intuitive way, as listcomps/maps are shorthands for writing for-loops rather than expressions of a specific concept, such as list.count(None) == len(list) or set(list) == {None}.

DataFrame broke set() with TypeError: unhashable type: 'DataFrame', because DataFrame has no __hash__, because it's mutable (it can do in-place operations through references).

Then DataFrame breaks list.count() with ValueError: The truth value of a DataFrame is ambiguous..., because the meaning of == changed from object comparison (which returns a simple boolean) to elementwise comparison (which returns a non-singleton structure with the same shape/frame as the DataFrame itself).

Aargh!
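
For what it's worth, here is the boring check I ended up with (a sketch; collect() is my own helper name, not a pandas API, and yes, it's a generator expression after all):

import pandas as pd

def collect(frames):
    '''Vertically stack a list of DataFrames; return None if nothing has been run yet.'''
    if all(df is None for df in frames):   # also True for an empty list
        return None
    return pd.concat([df for df in frames if df is not None])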


Concepts in C++ that do not apply to Python or MATLAB

Static Data Members

In Python/MATLAB, data members are called properties. They're called (data) members in C++. I'll use these names interchangeably when comparing these languages.

Python and MATLAB have the concept of static methods, but static properties (data members) don't really exist in either language.

Python's properties have a split personality (class AND instance), so it's not like C++ where you choose between class XOR instance. That's why I call them class variables: in C++, static variables do not have a split personality, you are either classwise (static) or you are not. In C++ (or MATLAB) you never have both cases sharing the same variable name, whereas in Python a class variable can be shadowed by an instance variable of the same name.
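
A minimal sketch of the split personality (the class and variable names are made up):

class Gadget:
    limit = 10                 # class variable, shared classwise

    def __init__(self):
        self.limit = 99        # instance variable of the same name shadows it

g = Gadget()
print(Gadget.limit)   # 10 (classwise)
print(g.limit)        # 99 (the instance variable shadows the class variable)
del g.limit
print(g.limit)        # 10 (falls back to the class variable)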

As for MATLAB, there's no Static property. The only classwise property allowed is the Constant property. TMW (creator of MATLAB) decided not to allow non-Constant classwise/static properties because of this web of rules:

  • A.B = C can either mean
    1) [Variable] write a new struct A with field B, or
    2) [Class] attempt to write to property B of class (not instance of) A.
  • If a class (definition) A is loaded into the current workspace, allowing case #2 might throw off users who intended to make a struct A with field B out of nowhere.
  • MATLAB gives variables higher priority than functions/classes, so case #2 has to be struck down to make it unambiguous.
  • By making A.B, the classwise access to property B, either read-only (Constant) or tied to instances (a = A(); a.B = C), MATLAB avoided the situation of A.B = C while A is a class (case #2), so A.B = C is unambiguously writing to a struct field (case #1).

I know. This is quite lame. Matrices are first-class citizens in MATLAB while classes are an afterthought that wasn't really a thing until 2008. You win some and you lose some.

The official TMW workaround is to use Static method getters/setters (not Dependent properties, because those only work for instances, or when what they depend on are exclusively Constant properties) with a persistent variable holding the internal data that's meant to be static. This is very convoluted and it sucks. I'd call it a weakness of MATLAB.

Constant properties in MATLAB are static const (classwise-only)

In C++, const properties (data members) can be either instance-bound or classwise (static), which means instances can be initialized with different sets of constants.

Instance-wise constant data members are possible in C++ through (member) initializer lists, which happen to be the only route for private constants, as public constants could also be list (including brace) initialized (in newer C++ such as C++11).

The constructor is not a first-hand initializer in C++ (one that directly assigns memory with predefined values). Members are already fully constructed in C++ (just not necessarily with the values you wanted) by the time you get inside the constructor body. Therefore, in C++, constants and references (things that cannot be changed) for instances must be initialized in the (member) initializer list that runs right before the constructor body.

Only static const members can optionally be defined directly at the class with an = sign since C++11. Before that it only worked for integral types, as it was essentially the enum hack under the hood with primitive compiler designs. Otherwise you declare the data member (without specifying the value) in the class C, such as static const T x;, then define it (assign the value), like const T C::x = 42;, outside the class definition.

In MATLAB, Constant properties are classwise-only (in fact, they're the only kind of static property allowed, as discussed in the first section). You simply declare the value at the class definition with an = sign, just like the fast way to type static const in C++11, because there is no concept of a (member) initializer list in MATLAB where you could modify how consts and refs are stamped out; you cannot remodel them in the constructor after they are already made.

Accessing a Constant property through an object instance is just a shortcut for the classwise constant in MATLAB, while in C++ it depends on whether the const is static.

In Python, everybody is a consenting adult. Scream your constants in ALL CAPS and hope everybody acts like a gentleman and doesn't touch them. Lol

Static Native Getters/Setters

C++ does not have native getters or setters that use a data member's name, where the said data member does not store state but instead acts as a proxy (potentially interacting with other state-holding data members).

Let’s say the variable with a native getter/setter in question is x.

In MATLAB, this feature is called Dependent properties, where you define members under properties (Dependent) and the getter function is named function get.x(self, ...).

In Python, the @property decorator mangles the function with the same name as the dependent member, aka def x(self, ...), so callers can invoke the corresponding getter/setter without the function-call round brackets ().
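
A minimal sketch of the Python side (the class and property names are made up):

class Channel:
    def __init__(self, volts):
        self._volts = volts        # the real state lives here

    @property
    def millivolts(self):          # getter: read as ch.millivolts, no ()
        return self._volts * 1000

    @millivolts.setter
    def millivolts(self, value):   # setter: written as ch.millivolts = ...
        self._volts = value / 1000

ch = Channel(1.2)
print(ch.millivolts)   # 1200.0
ch.millivolts = 500
print(ch._volts)       # 0.5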

However, both in MATLAB and Python, dependent data members are mainly aimed at instances (objects already created), not at the class itself!

In Python, it's simply impossible to stack @staticmethod or @classmethod with @property. I tried doing that and Python just declared the function name (the dependent property name) a property object, so when I access it (without round brackets for function calls, of course), it merely shows '<property at 0x...>'.
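
A minimal sketch of what I believe happens when you stack them (the exact behavior of chaining @classmethod with @property has varied across Python versions, so take this as illustrative only):

class A:
    @staticmethod
    @property
    def x():
        return 42

print(A.x)     # <property object at 0x...>, not 42
print(A().x)   # same property object; the getter never runs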

The case of MATLAB is a lot weirder. It's a web of rules based on seemingly unrelated reasons, which creates a catch-22: Dependent properties are instance-bound with no means to access them classwise:

  1. MATLAB doesn’t have classwise properties that are not Const properties (aka no Static data members/properties) to avoid breaking backward compatibility (see the first section of this post for the web of rules that caused it).
  2. In C++’s lingo, MATLAB has no (mutable) static data member. The only nearest thing is (immutable) static const data member (which is called Constant property in MATLAB).
  3. MATLAB does not have instance-wise constants (see above)
  4. A Dependent property in MATLAB is neither classwise nor instance-wise, because the concept itself is a method (function member) pretending to be a property (data member). More precisely, it's an extra level of indirection that dispatches getters or setters depending on whether it's 'read' or 'written'.
  5. MATLAB not treating properties (data members) and methods (function members) the same way (when it comes to whether they're class-bound or instance-bound) created a dilemma (identity crisis) for the Dependent property.
  6. Even though a Dependent property is really a method in disguise, its getters/setters are not allowed to take on a Static role like other methods, because it claims to be a property (data member), so it's stuck following the more restrictive rules that apply to properties (aka no static).

Without knowing the underlying rules above, the implications are less than intuitive (they make perfect sense once you know what constraints MATLAB is working under, yet those constraints make it nearly impossible to serve a sensible use case where the user simply wants a shortcut for computing with classwise/Constant properties):

  1. MATLAB's rules require any property that is not Constant to be accessed exclusively through instances.
  2. A Dependent property is not a Const property, so it’s considered an instance-based member.
  3. This implies Dependent data member can only be called from an instance (regardless of whether you’re trying to do it from within the same class or outside the class).
  4. Since Dependent property is defined to be instance-bound, the instance object (self) is passed to getters/setters as the first argument and the definition of the getters/setters has to expect it so it’d be like function get.x(self, ...), just like all instance-bound methods.
  5. There's nothing that says such getters/setters must use the self passed to them. However, if you only use Constant properties inside, it doesn't matter whether you call them through self (the object instance) or the class name, since constants are classwise-only in MATLAB, so both syntaxes refer to the same thing.

RAII available in C++ and MATLAB but not Python

A garbage collector is only a concept when you are allowed to make multiple aliases of the same underlying data and don't want to meticulously keep track of who's ultimately responsible for winding it down, and when.

In particular, a garbage collector has the latitude not to clean up the object right away when the reference count goes down to zero, whereas shared_ptr (an automatic memory management technique) promptly cleans up the object the moment the reference count touches zero. A garbage collector can therefore procrastinate the painful release (cleanup) process so the program doesn't have to frequently stumble to do the cleanup.

C++ chose not to use garbage collectors (as class destructors running at a non-deterministic time break RAII). Python chose to embrace it at the expense of breaking RAII. MATLAB is kind of in between, yet the way MATLAB does it does not break RAII, though it's not obvious.

The MATLAB language has a very unique design choice (mental model) in which users see/reason about variables as deep copies of stack objects, so there is no concept of aliases (let alone reassignable aliases) in the first place that would need garbage collection in the user-serviceable mental model of memory management.

There are people talking about the JVM's garbage collection in MATLAB, but that garbage collector only handles Java objects (which I almost never use unless it's through Yair Altman's code). Everything else is handled by MATLAB's engine.

Since MATLAB's engine manages how the underlying data are shared, along with allocating and freeing them, some people argue that it is garbage collection.

The popular mental picture of garbage collection is reference counting WITHOUT tracking the real owner (often the first creator): when the last guy drops the link (reference) to the underlying data, the data becomes orphaned and is ready to be garbage collected.

For conventional garbage collectors, timing of object destruction (when the destructor is called) is not deterministic because the object does not die with its original creator. The ‘ownership’ is effectively transferred to the last user of the underlying data.

TMW specifically said they are not garbage collecting (at least for classes, since the article is about classes), so whatever they are doing under the hood is not exactly garbage collection in the conventional sense (https://www.mathworks.com/company/technical-articles/inside-matlab-objects-in-r2008a.html).

Given that MATLAB uses copy-on-write, and Loren dropped a clue that copy-on-write won't be triggered for the entire struct when only one field is changed (only the changed field is copied), this would imply that if MATLAB does anything close to garbage collecting, or to managing allocation and deallocation, it can only be done on whatever pieces (my guess would be PODs, simple native data types) don't involve a user-defined destructor.

Decoupling automatic memory management from classes has the advantage of keeping RAII, because user-defined destructors are called deterministically. It's only after the destructors are unwound down to the simple data types (some classes contain other complex objects, so their destructors are chain-called deterministically), with no more user-defined destructors attached (down to the leaves of the tree), that the automatic memory management mechanism gets to decide how long it keeps these simple chunks around (if somebody else is still using them, an extra copy doesn't need to be made).

In Python, everything is perceived as a (class) object, even the PODs. MATLAB and C++ distinguish between PODs and user-defined classes. This means that if Python chooses to do garbage collection, it has to do it to classes with user-defined destructors as well, thus breaking RAII.


‘Only in Python’ Series (1): ClassMethod

The 'Only in Python' series talks about constructs that are more specific to Python than to other mainstream languages. Of course there is some overlap with other languages, so my picks are the language behaviors/characteristics that scream Python whenever I see them.


Python also accesses the class (not the instance) by its own name (just like MATLAB) followed by the dot operator. In C++, class-wise (not instance-wise) members are always accessed through the SRO, aka the Scope Resolution Operator ::.

There isn't really a concept of a class method even though Python has a @classmethod decorator. The syntactic sugar is merely for the use case where a static function happens to access static members of its own class through the class's name.

Static methods and static data members are pretty much glorified namespaces (just like packages and modules in Python) that aren't tied to any object instance. If a function wants to access class-wise members (attributes/properties), it simply calls them by the name of the class instead of through object instances of it.

Btw, C++ implicitly passes this internally and does not rely on the function prototype (the first argument) to pass self around, so you don't have this problem.

MATLAB does not have native support for static data members but it has static methods. (Btw, in MATLAB, attributes mean the access modifiers on methods and properties blocks, not the members of a class as in Python.) It's smart enough to figure out whether a method is marked Static, so such a method is exempt from receiving the object itself as the first argument.

Python is a little primitive in this department. Calling a method from an object instance always implies passing self as the first argument of the call, whether you want it (instance methods) or not (static/class-wise methods).

If you call a method by the class name, no implicit first argument is passed.


To address this mandatory self issue, Python simply uses decorators (function wrappers) to wrangle the passed self to fit your purposes.

@staticmethod effectively tosses out the implied first argument (self) passed by method calls from object instances since it’s by definition not needed/wanted.

@classmethod effectively takes the implied first argument (self) and replaces it with its class, aka type(self), as cls, which your function prototype expects as the first argument. I said effectively because if you call the @classmethod-decorated method by class name, where no self is passed, the decorator still produces the cls variable to match the first argument in your function prototype.

Here’s an example:

class A:
  a = 3
  def __init__(self):
    self.a = 42

  @classmethod
  def print_a_cls(cls):
    print(A.a)
    print(cls.a)
  # Both obj_a.print_a_cls() and A.print_a_cls() prints 3 twice
  # @classmethod basically swap the passed 'self' argument and replace the first arg
  # with type(self) which is the class's name A here

  @staticmethod
  def print_a_static():
    print(A.a)
  # Both obj_a.print_a_static() and A.print_a_static() prints 3 
  # @staticmethod basically absorbs and discard the first argument 'self' if passed

  def print_a_instance(self):
    print(self.a)
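  # obj_a.print_a_instance() prints 42: the instance attribute self.a set in
  # __init__ shadows the class attribute A.a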

  def print_a_classwise():
    print(A.a)
  # obj_a.print_a_classwise()
  # "TypeError: A.print_a_classwise() takes 0 positional arguments but 1 was given"
  # A.print_a_classwise() prints 3

If you don't use these decorators, methods without self as the first argument will work only if you call them by the class's name. If you try to call one through an object instance, it'll rightfully refuse to run because the call supplies self as an argument which your function isn't ready to accept.


How missing keys are handled in Dictionary (Hashtables) in C++, Python and MATLAB

C++

In C++ (STL), the default behavior of touching (even just reading) a missing key through operator[] (whether it's map or unordered_map, aka a hashtable) is that the new key will be automatically inserted with the value-initialized default for the type (say 0 for integers).

I have an example MapWithDefault template wrapper implementation that allows you to endow the map (during construction) with a default value to be returned when the key you're trying to read does not exist.

C++ has the at() method, but it involves throwing an exception when the key is not found. However, enabling exceptions is a material performance overhead in C++.

MATLAB

MATLAB's hashtables are done with containers.Map() and, in more modern MATLAB (R2022b and on), dictionary objects, unless you want to stick to valid MATLAB variable names as keys and exploit dynamic fieldnames.

Neither containers.Map() nor dictionary has a default-value mechanism for when a key is not found. They will just throw an error if the key you are trying to read does not exist. Use the isKey() method to check whether the key exists first, then decide to read it or spit out a default value.

Python

Natively, Python dictionaries will throw a KeyError exception if the key requested through the [] operator (aka __getitem__()) does not already exist.

Use the .get(key, default) method if you want to assume a default value when the key is not found. The .get() method does not throw an exception; the default is None if not specified.
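
For example:

d = {'ch1': 0.5, 'ch2': 0.7}
print(d.get('ch3'))         # None: missing key, no exception
print(d.get('ch3', 0.0))    # 0.0: the supplied default
# print(d['ch3'])           # KeyError: 'ch3'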

If you want C++'s default behavior where reading a new key means inserting the said key with a default, you have to explicitly import the collections package and use defaultdict. I wouldn't recommend this, as the behavior is not intuitive and likely confusing in insidious ways.
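
A minimal sketch of why I find it insidious:

from collections import defaultdict

counts = defaultdict(int)   # missing keys spring into existence as int() == 0
counts['ch1'] += 1          # no KeyError: 'ch1' was silently inserted as 0 first
print(counts['ch2'])        # 0, and merely reading has now made 'ch2' a key too
print(dict(counts))         # {'ch1': 1, 'ch2': 0}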

There's an approach similar to my MapWithDefault for Python dictionaries: subclass dict and define your own __missing__ dunder/magic method that returns a default when a key is missing, then use the parent's (aka dict's) constructor to do an explicit (type/class) conversion of an existing dict object into your child class object that has __missing__ implemented.

While this approach is a little like my MapWithDefault, the __missing__ approach has more flexibility, such as allowing the default value to depend on the queried key, but it comes at the expense of making up a different class (not instance) per default value.
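
A minimal sketch of the __missing__ route (DictWithDefault is my own name, not a standard class):

class DictWithDefault(dict):
    '''dict that returns a fixed default (baked into the class) for missing keys.'''
    default = 0.0

    def __missing__(self, key):    # called by dict.__getitem__ on a miss
        return self.default        # unlike defaultdict, the key is NOT inserted

settings = DictWithDefault({'gain': 2.5})   # dict's constructor converts an existing dict
print(settings['gain'])      # 2.5
print(settings['offset'])    # 0.0, and 'offset' is still not a key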

Monkey-patching instance methods is frowned upon in Python, so if you want the default value to be tied to instances, the mechanism needs to be redesigned.


We use ContextManager (“with … as” statement) in Python because Python’s fundamental language design (garbage collecting objects) broke RAII

[TLDR] Python doesn't have RAII. C++ and MATLAB allow RAII. You can have proper RAII only if the destructor timing is 100% controllable by the programmer.

Python uses the Context Manager (the with ... as idiom) to address the old issue of opening a resource handle (say a file or a network socket) and automatically closing (freeing) it regardless of whether the program quits abruptly or gracefully terminates after it's done with the resource.

Unlike destructors in C++ and MATLAB, which register what to do (such as closing the resource) when the program quits or right before the resource (object) is gone, Python's Context Manager is basically rehashing the old try-block idea by creating a rigid framework around it.

It's not that Python doesn't know the RAII mechanism (which is much cleaner), but Python's fundamental language design choices drove it into a corner, so it's stuck micro-optimizing the try-except/catch-finally approach to managing opened resources:

  • Everything is seen as an object in Python. Even integers have a ton of methods.
    MATLAB and C++ treat POD, Plain Old Data, such as integers separately from classes.
  • Python's garbage collector controls the timing of when the destructor of any object is called (del merely decrements the reference count).
  • MATLAB does not garbage-collect objects, so the destructor timing is guaranteed.
  • C++ has no garbage collection, so the destructor timing is guaranteed and managed by the programmer.

Python cannot easily exclude classes from garbage collection (which breaks RAII) because fundamentally everything is a class (a dictionary potentially with callables) in Python.

This is one of the reasons why I have a lot of respect for MATLAB: they give a lot of consideration to corner cases (like what 'empty' means) in their language design decisions. Python has many excellent ideas, but not enough thought was given to how these ideas interact and what side effects they produce.


Python's documentation says out loud what it does: with ... as ... is effectively a rigidly defined try-except-finally block:
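
Roughly, for the common file case ('data.txt' is a placeholder; this is a simplified hand-written equivalent, while the documentation's full expansion also forwards the exception info to __exit__ and lets it suppress the exception):

# with open('data.txt') as f:
#     data = f.read()
# is more or less:
f = open('data.txt')
try:
    data = f.read()
finally:
    f.close()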

The Context Manager heavily depends on the resource-opener function (EXPR) returning a constructed class instance that implements __enter__ and __exit__, so if you have an external C library imported into Python, like python-ft4222, you likely have to write your context manager in full when you write your wrapper.
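
A minimal sketch of such a hand-rolled wrapper (open_device()/close_device() are made-up stand-ins for whatever the wrapped C library actually exposes, not real python-ft4222 calls):

def open_device(port):    # stand-in for the real C-library opener
    print(f'opening {port}')
    return object()

def close_device(dev):    # stand-in for the real C-library closer
    print('closing device')

class DeviceHandle:
    def __init__(self, port):
        self._port = port
        self._dev = None

    def __enter__(self):
        self._dev = open_device(self._port)
        return self._dev

    def __exit__(self, exc_type, exc_value, traceback):
        if self._dev is not None:
            close_device(self._dev)
        return False    # don't swallow exceptions

# with DeviceHandle('COM3') as dev:
#     ...talk to dev...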


Typically the destructor should check if the resource is already closed first, then close it if it wasn’t already closed. Take io.IOBase as an example:
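
A sketch of the same idea in a user-defined class (io.IOBase's documented behavior: close() has no effect on an already-closed stream, and the destructor calls close()):

class Resource:
    def __init__(self):
        self.closed = False

    def close(self):
        if not self.closed:    # no effect if already closed
            # ...release the underlying handle here...
            self.closed = True

    def __del__(self):
        self.close()           # backup guard only: the timing is not guaranteed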

However, this is only a convenience when you are at the interpreter and can live with the destructor being called with a slight delay.

To make sure your code works reliably without timing bugs, you'll need to explicitly close the resource somewhere rather than rely on a destructor or object lifecycle timing. The destructor can act as a backup guard that closes it if it hasn't been closed, but it should not be relied on.


The with ... as construct is extremely ugly, but it's one of the downsides of Python that cannot be worked around easily. It also makes it difficult for users to retry acquiring a resource, because one way or another retrying involves injecting the retry logic into __enter__. It's also not that much of a typographic saving to use with ... as over a try-except-finally block if you don't plan to recycle the context manager and the cleanup code is a one-liner.
