Pandas DataFrame in Python (1): Disadvantage of using attributes (dot notation) to access columns. Use `[]` (getitem) operator instead

There are two ways to access columns in a DataFrame. The preferred way is square brackets (indexing into it like a dictionary). It’s tempting to use the neater dot notation (treating columns like attributes), but my recommendation is: don’t!

Python has dictionaries that handle arbitrary labels well, but it doesn’t have dynamic field names like MATLAB does. This puts DataFrame at a disadvantage in developing a dot notation syntax, while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes dot notation very half-baked in Python, and it’s better to avoid it altogether.

Reason 1: Cannot create new columns with dot notation

UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access

Reason 2: Only column names that happen to be valid Python attribute names AND do not collide with an existing DataFrame attribute or method can be accessed through dot notation.

Take the example of a DataFrame constructed from device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The table T looks like this:

I tried dir() on the table and noticed:

  • The column name "test me" did not appear anywhere, not even mangled. It has a space in it, so it’s not a valid attribute or variable name, and this column is effectively hidden from dot notation
  • flags is an internal attribute of DataFrame, and it was not overridden by the data column flags when accessed through dot notation. This means the flags column was also shadowed in (aka hidden from) dot notation, as there was no mangled name for it either

Even weirder, getattr() works for columns whose names are not valid attribute names, like test me, even though dot notation cannot reach that column (Python lacks a dynamic field name syntax) and test me doesn’t show up in dir(). Meanwhile, getattr(T, 'flags') still gets the DataFrame’s internal attribute flags instead of the column called flags, consistent with what dot notation does.
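The behaviors above can be reproduced with a small stand-in table. This is only a sketch: the column contents are made up (they are not the real pyft4222 device info), and it assumes a pandas version recent enough to have the DataFrame.flags attribute (1.2 or later, as far as I know).

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the device-info table T described above.
T = pd.DataFrame({
    "serial": ["A", "B"],
    "flags": [0, 2],
    "test me": [10, 20],
})

# Square brackets always reach the column, whatever its name is.
flags_column = T["flags"]
spaced_column = T["test me"]

# Dot notation only works when the name is a valid identifier that does
# not collide with an existing DataFrame attribute.
serial_via_dot = T.serial      # the column
flags_via_dot = T.flags        # the DataFrame.flags attribute, NOT the column

# dir() lists only identifier-like column names; getattr() still finds the rest.
listed_in_dir = "test me" in dir(T)
spaced_via_getattr = getattr(T, "test me")

# Assigning a list to a new attribute does not create a column.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the UserWarning quoted in Reason 1
    T.new_col = [1, 2]
created_column = "new_col" in T.columns

print(type(flags_via_dot).__name__, listed_in_dir, created_column)
```

Sticking to T["name"] avoids every one of these special cases.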


Dictionary of equivalent/analogous concepts in programming languages

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Variable arguments | <stdarg.h>; T f(...); packed in va_list (read with va_arg) | Very BAD! Cannot overload when signatures are uncertain. | varargin, varargout; both packed as cells. MATLAB does not have named arguments | *args (simple, stored as tuples); **kwargs (specify input by keyword, stored as a dictionary) |
| Referencing | N/A | operator[](_) is for references | subsref, subsasgn; [_] is for concat; {_} is for (un)pack | __getitem__(), __setitem__() |
| Default values | N/A | Supported | Not supported. Manage with inputParser() or the newer arguments block | Non-intuitive static data behavior. Stick to None or immutables. |
| Name-Value Argument Matching | | | Old way: .., 'PropName', Value and parse varargin. Since R2021a: Name=Value options in arguments | Name=Value; **kwargs |
| Major Dimension | Row | Row | Column | Row (Native/Numpy); Column for Pandas |
| Constness | const | const | Only in classes | N/A (Consenting adults) |
| Variable Aliasing | Pointers | References | NO! Rely on Copy-on-write (No in-place functions*). Handle classes under limited circumstances | References |
| = assignment | Copy one element | Values: Copy; References: Bind | New Copy; Copy-on-write | NO VALUES; Bind references only (could be to unnamed objects) |
| Chained access operators | N/A | Difficult to operator overload it right | Difficult to get it right. MATLAB had some chaining bugs with dataset() as well. | Chains correctly natively |
| Assignment expressions (assignment evaluates to assigned lvalue) | = | = | N/A | Named Expression := |
| Version Management | | | verLessThan(); isMATLABReleaseOlderThan | venv (Virtual Environment) |
| Exponentiation | <math.h> pow() | <cmath> pow() | ^ | ** |
| Stream (Conveyor belt mechanism. Saves memory) | I/O (std, file, sockets) | iterator in STL containers | MATLAB doesn’t do references. Just increment indices. | iterators (uni-directional only); iter(): __iter__(); next(): __next__() |
| Looping | for(init; cont_cond; next) | C-style; for(auto running: iterable) | for k = array to iterate | list-comp; for (index, thing) in enumerate(lists) |
Since MATLAB doesn’t do references, iterators (by extension generators) and functions that do in-place operations do not make sense (unless you bend it very hard with anti-patterns such as handles and dbstack).

Data Types

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Sets | N/A | std::set | Only set operations, not set data type | { , , ...} |
| Dictionaries | | std::unordered_map | Dynamic fieldnames (qualified varnames as keys); containers.Map() or dictionary() since R2022b | Dictionaries {key:value} (Native) |
| Heterogeneous containers | | | cells {} | lists (mutable); tuples (immutable) |
| Structured Heterogeneous containers | | | table(); dataset() [Old]; Mix in classes | Pandas Dataframe |
| Array, Matrices & Tensors | | | Native [ , ; , ] | Numpy/PyTorch |
| Records | struct | class (members) | dynamic field (structs); properties (class); getfield()/setfield() | No structs (use dicts); attribute (class); getattr()/setattr() |
| Type deduction | N/A | auto | Native | Native |
| Type extraction | N/A | decltype() for compile time (static); typeid() for RTTI (runtime) | class() | type() |
Native set operations in Python are not stable (the result does not preserve the order in which elements first appeared), and there’s no option to request a stable algorithm like there is in MATLAB. Consider installing the orderly-set package.
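If installing a package is not an option, one possible workaround (my own sketch, not something the orderly-set package prescribes) is to lean on the fact that dict keys keep insertion order in Python 3.7+. The fruit lists are made-up sample data.

```python
a = ["pear", "apple", "fig", "apple"]
b = ["fig", "kiwi", "pear"]

# dict keys keep insertion order, so dict.fromkeys() acts as a stable unique().
stable_unique = list(dict.fromkeys(a))

# Stable set operations, ordered by first appearance.
b_lookup = set(b)  # plain set used only for fast membership tests
stable_intersection = [x for x in dict.fromkeys(a) if x in b_lookup]
stable_difference = [x for x in dict.fromkeys(a) if x not in b_lookup]
stable_union = list(dict.fromkeys(a + b))

print(stable_unique, stable_intersection, stable_difference, stable_union)
```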

Array Operations

| Common | MATLAB | Python |
|---|---|---|
| Repeat | repmat() | [] * N; np.repeat() |
| Logical Indexing | Native | List comprehension; Boolean Indexing (Numpy) |
| Equally spaced numbers | Internally colon(): start:step:end; linspace/logspace | range(begin, past_end, step) produces a lazy range object; list(range()) or tuple(range()) iterates to realize the vector |
| Equally spaced indexing | MATLAB has no generators, so produced vector only | [start:past_end:step] is internally slice() which produces a slice object, not range/lists/tuple. Faster but not iterable |
| Shallow copy | Deep copy-on-write | Slice: x = y[:]; copy.copy() |
| Deep copy | Deep copy-on-write | copy.deepcopy() |
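A few rows of the table can be checked directly in the interpreter. The following is a minimal sketch with made-up values, using only the standard library.

```python
import copy

# range() is lazy; list() iterates it to realize the vector.
r = range(2, 10, 3)
realized = list(r)

# The [start:past_end:step] syntax builds a slice object under the hood.
data = list(range(20))
s = slice(2, 10, 3)
same_selection = data[s] == data[2:10:3]

# Shallow copy shares the nested objects; deep copy clones them.
nested = [[1, 2], [3]]
shallow = nested[:]          # same effect as copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0].append(99)

print(realized, same_selection, shallow[0], deep[0])
```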

Editor Syntax

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Commenting | /* ... */; // (only for newer C) | // (single line); /* ... */ (block) | % (single line); block: %{ ... %} | # (single line); """ or ''' is a docstring, which might be undesirably picked up |
| Reliable multi-line commenting (IDE) | | | Ctrl+(Shift)+R (Windows), / (Mac or Linux) | [Spyder]: Ctrl+1 (toggle), 4 (comment), 5 (uncomment) |
| Code cell (IDE) | | | %% | [Spyder]: # %% |
| Line Continuation | \ | \ | ... | \ |
| Console Precision | | | format | %precision (IPython) |
| Clear variables | | | clear / clearvars | %reset -sf (IPython) |
Macros only make sense in C/C++. This makes code less transparent and is frowned upon in higher level programming languages. Even its use in C++ should be limited. Use inline functions whenever possible.

Python is messy about the workspace, so if you just delete

Object Oriented Programming Constructs

| Common | C++ | MATLAB | Python |
|---|---|---|---|
| Getters, Setters | No native syntax. Name mangle (prefix or suffix) yourself to manage | Define methods: get.x, set.x | Getter: @property def x(self): ... Setter: @x.setter def x(self, value): ... |
| Deleters | Members can’t be changed on the fly | Members can’t be changed on the fly | Deleter (removing attributes dynamically by del) |
| Overloading (Dispatch function by signature) | Overloading | Overload only by first argument | @overload (Static type); @singledispatch; @multipledispatch |
| Initializing class variables | Initializer Lists; Constructor | Constructor | Constructor |
| Constructor | ClassName(); does not return (*this is implicit) | obj=ClassName(...); MUST output the constructed object | __init__(self, ...); object to be constructed is 1st argument |
| Destructor | ~ClassName() | delete() | __del__() |
| Special methods | Special member functions | (no name) methods that control specific behaviors | Magic/Dunder methods |
| Operator overloading | operator | operator methods to define | Dunder methods |
| Resource Self-cleanup | RAII | onCleanup(): make a dummy object with the cleanup operation as destructor, to be removed when it goes out of scope | with Context Managers |
| Naming for the object itself | Class: (class’s own name by the scope resolution operator ::); Instance: *this | Class: (class’s own name); Instance: obj (or any output name defined in constructor) | Class: cls; Instance: self (Recommended PEP8 names) |
Python allows adding members (attributes) on the fly with setattr(), which includes methods. MATLAB’s dynamicprops allows adding properties (data members) on the fly with addprop

The onCleanup() idiom does not work reliably in Python: MATLAB’s object destructor timing is deterministic (MATLAB specifically does not garbage collect user objects to avoid this mess; it only garbage collects PODs), while Python leaves it up to the garbage collector.

*this is implicitly passed in C++ and not spelled out in the method declaration. The self object must be the first argument in the instance method’s signature/prototype for both MATLAB and Python.

Functional Programming Constructs

| Common | C++ | MATLAB | Python |
|---|---|---|---|
| Function as variable | Functors (Function Objects); operator() | Function Handle | Callables (Function Objects); __call__() |
| Lambda Syntax | Lambda: [capture](inputs) -> optional trailing return type {expr} | Anonymous Function: @(inputs) expr | Lambda: lambda inputs: expr |
| Closure (Early binding): an instance of function objects | Capture [] only as necessary. Early binding [=] is capture all. | Early binding ONLY for anonymous functions (lambda). Late binding for function handles to loose or nested functions. | Late binding* by default, even for Lambdas. Can capture Po through default values: lambda x,P=Po: x+P (we’re relying on users not to enter the captured/optional input argument) |
The concepts of early/late binding also apply to non-lambda functions. It’s about when the function accesses (usually reads) the ‘global’ or broader-scope variables (such as those of an enclosing function, for nested functions) that get recruited as non-input variables local to the function itself.

An instance of a function object is not a closure if there’s any parameter that’s late bound. All lambdas (anonymous functions) in MATLAB are early bound (at creation).

The more proper way (without creating an extra optional argument that’s not supposed to be used, aka defaults overridden) to convert late binding to early binding (by capturing variables) is called partial application, where you freeze the parameters (to be captured) by making them inputs to an outer layer function and return a function object (could be lambda) that uses these parameters.
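As a rough sketch of partial application in Python, here are two ways of freezing a parameter: an outer factory function and functools.partial. The names make_adder and add are made up for illustration.

```python
from functools import partial


def make_adder(p):
    # p is an input of the outer function, so it is frozen when
    # make_adder() is called, not when the returned lambda runs.
    return lambda x: x + p


def add(p, x):
    return x + p


P = 612
late = lambda x: x + P            # P is looked up at call time (late binding)
early_factory = make_adder(P)     # P captured through the outer function
early_partial = partial(add, P)   # same idea, using the standard library

P = 721
print(late(1), early_factory(1), early_partial(1))
```

Neither early-bound version exposes an extra optional argument that the caller could accidentally override.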

The same trick (partial application) can be used to bind (capture) variables in simple/nested function handles in MATLAB, which do not behave the same way (early binding) as anonymous functions (lambdas).

Currying is partial application one parameter at a time, which is a tedious way to stay faithful to pure functional programming.

List comprehension is a shorthand syntax for transform/map() and copy_if/remove_if/filter() in one shot, but not accumulate/reduce(). MATLAB and C/C++ do not have listcomp, but listcomp is not specific to Python. Even PowerShell has it.

Listcomp syntax, if wrapped in round brackets like (x**x for x in range(5)), gives a generator. Wrapping in square bracket is the shortcut of casting the generator into a list, so [x**x for x in range(5)] is the same as list(x**x for x in range(5)).
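The relationship between listcomp, map()/filter() and generators can be sketched as follows (the input list is made up; reduce() is shown only to underline that it is a separate step):

```python
from functools import reduce

xs = [1, 2, 3, 4, 5]

# One listcomp does the filter (odd values) and the map (double) in one shot.
doubled_odds = [x * 2 for x in xs if x % 2]
same_with_map_filter = list(map(lambda x: x * 2, filter(lambda x: x % 2, xs)))

# Accumulation is NOT part of listcomp; it needs reduce() (or sum(), etc.).
total = reduce(lambda acc, x: acc + x, doubled_odds, 0)

# Round brackets give a generator; square brackets realize it into a list.
gen = (x**x for x in range(5))
as_list = [x**x for x in range(5)]
realized = list(gen)
exhausted = list(gen)   # a generator can only be consumed once

print(doubled_odds, total, as_list, exhausted)
```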

Coroutines / Asynchronous Programming

MATLAB natively does not support coroutines.

| Common | C++20 | Python |
|---|---|---|
| Generators | Input Iterators | Functions that yield value_to_spit_out_on_next (implicitly return a generator/functor with iter and next) |
| Coroutines | | Functions that do value_accepted_from_outside = yield; send a value to the continuation by g.send(user_input); async/await (native coroutines) |
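A minimal sketch of both rows on the Python side, with made-up function names: countdown() only yields values out, while running_total() also accepts values from outside through send().

```python
def countdown(n):
    # Generator: spits out one value per next() and cannot go back.
    while n > 0:
        yield n
        n -= 1


def running_total():
    # Generator-based coroutine: "received = yield total" hands total to the
    # caller and then waits for the caller to send() the next value in.
    total = 0
    while True:
        received = yield total
        total += received


counted = list(countdown(3))

g = running_total()
first = next(g)        # prime the coroutine: runs up to the first yield
after_5 = g.send(5)
after_7 = g.send(7)
g.close()

print(counted, first, after_5, after_7)
```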

Matrix Arrays

The way Numpy requires users to specify matrices with a bracket for every row drives me nuts. Not only is there a lot of typing, the superfluous brackets also reinforce C’s idea of row-major, which is horrendous to people with a proper math background who see matrices as column-major \mathbf{A}_{r,c}. PyTorch is the same.

Once you are trained in APL/MATLAB’s matrix world-view, you’ll discover going back to the world where matrices aren’t first class citizens is clumsy AF.

With Python, you lose the clutter-free readability where your MATLAB code is one step away from the matrix equations in your scientific computing work, even though a lot of the features that address frequent use patterns were implemented earlier in Python than in MATLAB.

Don’t believe those who haven’t lived and breathed MATLAB when they tell you Python is strictly superior. No, it isn’t. They just don’t know what they are missing, as they haven’t made the intellectual leap in MATLAB yet. Python is very convenient as a Swiss Army knife, but scientific computing is an afterthought in Python’s language design.

MATLAB-like semicolons to separate rows only work with the np.matrix() type, which they plan to deprecate. For now one can cast the matrix into an array, like np.array(np.matrix(matrix_string)).
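For illustration, the cast looks like this. It is a sketch that assumes an installed NumPy that still ships np.matrix (it emits a PendingDeprecationWarning in recent versions, which is silenced here).

```python
import warnings

import numpy as np

with warnings.catch_warnings():
    # np.matrix is slated for deprecation; hide the warning for this demo.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    from_string = np.array(np.matrix("8 9; 6 4"))

# The bracket-per-row spelling that Numpy normally expects.
from_brackets = np.array([[8, 9], [6, 4]])

print(type(from_string).__name__, np.array_equal(from_string, from_brackets))
```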

Even numpy’s ndarray (or matrix, to be deprecated) is CONCEPTUALLY equivalent to a matrix of cells in MATLAB. There are no native numerical matrices like in MATLAB that avoid the overhead of unpacking arbitrary data types. You don’t want to do numerical matrices in MATLAB with cell matrices, as it’s insanely slow.

You get away without the unpacking penalty in Numpy if all the contents of the ndarray happen to have the same dtype (such as numerical), aka known to be uniform. In other words, MATLAB’s matrices are uniform if formed by [] and heterogeneous if formed by {}, while for Python [] is context-dependent, kept track of by dtype.

| Concept | MATLAB | Numpy |
|---|---|---|
| Construction | [8,9;6,4] | np.array([[8,9],[6,4]]) |
| Size by dimension | size() | A.shape |
| Concatenate within existing dimensions | [A;B] or vertcat(); [A,B] or horzcat(); cat(dim, A, B, ...) | np.vstack(); np.hstack(); np.concatenate(list, axis) |
| Concatenate expanding to 3D (expand in last dimension) | cat(3, A, B, ...) | np.dstack(); ‘d’ for depth (3rd dimension) |
| Concatenate expanding dimensions | cat(newdim, A, B, ...) then permute() | np.stack([A, ..], expand_at_axis); np.array([A, ..]) expands at first dimension, as the outermost bracket refers to the first dimension |
| Tiling | repmat() | np.tile() |
| Fill with same value | repmat() | np.full() |
| Fill with ones/zeros | ones(), zeros() | np.ones(), np.zeros() |
| Fill mimicking another array’s size | repmat(x, size(B)); ones(size(B)); zeros(size(B)) | np.full_like(B, x); np.ones_like(B); np.zeros_like(B) |
| Preallocate | Any of the above (must be initialized) | np.empty(); np.empty_like(); UNINITIALIZED |
repelem() is just repmat() with the repetition by axes vector expanded out as variable input arguments one per dimension. Using ones vector to broadcast a singleton instead of repmat() is horrendously inefficient and non-intuitive.
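The Numpy column of the table can be sanity-checked by looking at the resulting shapes. This is a small sketch with made-up 2x2 arrays, assuming NumPy is installed.

```python
import numpy as np

A = np.array([[8, 9], [6, 4]])
B = np.zeros_like(A)                    # same shape and dtype as A, all zeros

# Concatenate within existing dimensions.
v = np.vstack([A, B])                   # like [A;B]
h = np.hstack([A, B])                   # like [A,B]
c = np.concatenate([A, B], axis=0)      # like cat(1, A, B)

# Concatenate while adding a dimension.
d = np.dstack([A, B])                   # new LAST axis, like cat(3, A, B)
s0 = np.stack([A, B], axis=0)           # new FIRST axis
outer = np.array([A, B])                # outermost bracket is the first axis

tiled = np.tile(A, (2, 3))              # like repmat(A, 2, 3)
sevens = np.full_like(A, 7)             # like repmat(7, size(A))

print(v.shape, h.shape, d.shape, s0.shape, tiled.shape)
```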

Heterogeneous Data Structures

Heterogeneous Data Structures are typically column major as it is a concept that derives from Structs of Arrays (SoA) and people typically expect columns to have the same data type from spreadsheets.

While Pandas offers a lot of useful features that I’ve easily implemented with wrappers in MATLAB, the indexing syntax of Pandas/Python is awkward and confusing. This is because the matrix is a first-class citizen in MATLAB while it’s an afterthought in Python.

Python does not have MATLAB’s { } cell pack/unpack operator, so in Pandas you select the Series object (think of it as a supercharged list with conveniences such as handling missing values and keeping track of row/column labels) and then read its .values attribute.

However, Pandas is a lot more advanced than MATLAB in terms of using multiple columns as keys, and it has more tools to exploit multi-key row names (row names are not mandatory in MATLAB but mandatory in Pandas). In the old days I had to write my own MATLAB function with unique(.., 'rows') and exploit its index output to build unique keys under the hood.

| Concept | MATLAB | Python (Pandas Dataframe) |
|---|---|---|
| Rows | Observations (dataset()); Row (table()) | Rows; index |
| Columns | Variables | Columns |
| Select rows/columns | T(rows, cols) | T.loc[r, col_name]; T.iloc[r,c]. Caveats: a single index (not wrapped in a list) has its content extracted; iloc on LHS cannot expand the table but loc can, though it can only inject 1 row; can get the index number of names by T.columns.get_loc() to use with T.iloc[] |
| Remove rows/columns | T(rows, cols) = [] | T.drop(index=rows, columns=cols); optionally: inplace=True; del T[rows, cols] does NOT work |
| Extract one column | T{:, c} | T[c].values |
| Extract one entry | T{r, c} | T.at[r,col_name]; T.iat[r,c]; faster than loc/iloc |
| Show first few rows | T(1:5, :) | T.head() |
| Drop duplicate rows | unique(T, 'stable') | T.drop_duplicates() |
| Ordinal | categorical(); ordinal() | Categorical(); Index() |
| Getting column names/labels | T.Properties.VariableNames (returns cellstr() only) | T.columns (returns Index() or RangeIndex()) |
| Getting row names/labels | T.Properties.RowNames | T.index |
| Transpose table | rows2vars() | T.transpose() |
| Move columns by name | movevars() since R2023a | |
| Rename columns | renamevars() since R2020a | T.rename(columns={source:target}) |
| Rename rows | Modify T.Properties.RowNames | T.rename(index={source:target}) |
| Use column as row indices | T.Properties.RowNames = T.cellstr_variablename; if multiple columns are needed, need to combine them into one column using some user rules | T.set_index(column_to_use); Dataframe allows multiple columns as row index keys |
| Reorder or partial selection | T(rows, cols) | T.reindex(columns=..., index=...); new labels will autofill by NaN |
| Select columns | T(:, cols) | T[list_of_cols] |
| Pick column by data type | T(:, varfun(...)) | T.select_dtypes(include=[list of type names]) |
| Pick column by string match | T(:, varfun(...)) | T.filter(like=str_to_match) |
| Blindly concatenate columns of 2 tables | [T1, T2]; if you defined optional rownames, they must match. You can delete them with T.Properties.RowNames = {} | Pandas assigns row indices (labels) by default. Mismatched row labels do not combine in the same row. Consider reset_index() or overwrite the row indices of one table with another, like pd.concat([T1, T2.set_index(T1.index)], axis=1) |
| Blindly concatenate rows of 2 tables | [T1; T2] | pd.concat([T1, T2], ignore_index=True) |
| Format export | writetable() | .to_*() |
MATLAB tables do not support ranging through column names (such as 'apple':'grapes'), yet Pandas DataFrame supports it. I think it’s fine to use it in the interpreter to poke around, but in a program this is just asking for confusing logic bugs when the columns are moved around and the programmer has a false sense of security of knowing exactly what’s where because they are using only names.

Dataframe is a little smarter than MATLAB’s table() in terms of managing column names and indices, as they are tracked with the Index() type, which is the same idea as MATLAB’s ordinal() ordered categorical type: unique names are mapped to unique indices, and it’s the indices that are used under the hood. This is how 'apple':'grapes' can work in Python but not MATLAB.

MATLAB T.Properties.VariableNames is a little clumsy. I usually implement a consistent interface called varnames() that’d output the same cellstr() headings whether it’s struct, dataset or table objects.

MATLAB’s table() by default does not make up row names. Pandas makes up row names sequentially by default.

MATLAB’s table() does require qualified string characters as variable names. Dataframe doesn’t care what labels you use as long as Index() takes them. It can get confusing because you can have the number 1 and ‘1’ as column headers at the same time, and they look the same when displayed in the console.
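Several of the Pandas idioms discussed in this section can be tried on a toy table. The fruit columns and row labels below are made up, and the sketch assumes pandas is installed.

```python
import pandas as pd

T = pd.DataFrame(
    {"apple": [1, 2, 3], "fig": [4.0, 5.0, 6.0], "grapes": ["x", "y", "z"]},
    index=["r1", "r2", "r3"],
)

by_label = T.loc["r2", "fig"]              # select by names
by_position = T.iloc[1, 1]                 # select by integer positions
one_entry = T.at["r2", "fig"]              # scalar access
fig_position = T.columns.get_loc("fig")    # name -> position, for iloc

# Ranging through column names: BOTH end labels are included.
label_range = T.loc[:, "apple":"fig"]

dropped = T.drop(index=["r1"], columns=["grapes"])
numeric_only = T.select_dtypes(include=["number"])

# The integer 1 and the string '1' are two different column labels,
# even though they print the same way.
M = pd.DataFrame({1: [0], "1": [9]})

print(by_label, fig_position, list(label_range.columns), len(M.columns))
```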


Spyder traps for MATLAB users (1): By default, Spyder’s F5/Run executes the script from clean workspace.

This is another example of open source projects not going through a comprehensive use case study before changing the default behavior, which ends up pulling the rug out from under some users.

This time it’s Spyder’s good intentions of proactively preventing user mistakes (such as not keeping track of the workspace) that throw off the people who meticulously understand their workspace.

I was working on an FT4222 device which should not be opened again if it’s already opened, aka the ft4222 class object exists. So naturally, like in MATLAB, at the top of the script I check if the device object already exists and only create/open it when it’s not already there, like this:

if 'dev' in locals():
    pass
else:
    print('Branch')
    dev = ft4222.openByDescription('FT4222 A')

To my surprise, it doesn’t work. 'dev' in locals() returns False every time I press F5, even though when I check again after the script runs, the variable is indeed there and 'dev' in locals() returns True. WTF?!

Turns out I was not alone! Somebody had the exact same idiom as I did. Spyder 4 changed the default behavior, and we are supposed to manually check this dialog box entry so the scripts do not run off a clean slate when we press F5!

Spyder 5
Spyder 6

It’s an extremely terrible idea to have the IDE muck with the state by default. In MATLAB, if we want the script to start with clean state, we either put clear at the top of the script or clearvars -except to keep the variable.

It’s even harder to catch Spyder’s insidious new default behavior given that it runs the script from a clean slate on F5/Run and then dumps the values into the workspace. The result is a merge between the pre-existing variables in the locals() workspace and the results of the script run from a blank state!

The people who decided to change this default behavior certainly didn’t think it through and rushed to do the obvious to please careless programmers. If a programmer made a mistake by re-running the script without clearing the workspace and was bitten by the dirty variables, they can always reset everything and get out of it (and learn from the experience that they should clean up the dirty state). However, somebody who knows what they are doing will not be able to figure out what they did wrong until they search for a behavior that looks more like a bug in Spyder/Python! It’s just a horrible design choice! MATLAB doesn’t casually throw users off like this. Damn!


Also, I looked into code cells # %% (MATLAB has the equivalent %%), but there’s another annoyance in Spyder: block commenting through """ or ''' pairs is interpreted as an output string by runcell()! In other words, runcell() outputs docstrings! So every time you execute the cell, the code you commented out will be concatenated into one long raw string with escape characters and pollute your console screen! Damn!


Spyder annoyances (3): The shortcut key Ctrl+D to reset console doesn’t work unless there’s nothing half typed in the console.


Python Cheat Sheet

Built-in functions

  • breakpoint(): Python’s version of MATLAB’s keyboard() command
  • callable(): Like MATLAB’s isfunction() but it really checks if there’s a __call__ method
  • getattr()/hasattr(): MATLAB’s getfield()/isfield(). The 3rd parameter of getattr() is a shortcut to spit out a default if there’s no such field/attribute, which MATLAB doesn’t have
  • globals()/locals(): more convenient than MATLAB because the whole workspace (current variables) are accessed as a dictionary in Python by calling locals() and globals()
  • id(): memory address of the item where the variable (reference) is pointing to. Think of it as &x in C.
  • isinstance(): MATLAB’s isa()
  • next(): Python favors not actually computing the values until needed so instead it offers a generator (forward iterable) function that spits out one value at each time you kick it with next() and you can’t go back.
  • chr()/ord(): analogous to MATLAB’s char()/double() cast for characters
  • Python’s exponentiation is **, not ^ like most other languages (C does not have exponentiation symbol, and ^ was used for xor)
  • print(…, flush=True) allows a courtesy flush
  • repr(): Python’s version of MATLAB’s disp(), also an overloadable standard interface
  • slice(): Python’s equivalent of MATLAB’s colon() special interface
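A quick tour of several of the built-ins listed above, as a sketch. The Device class and its serial value are made up purely for illustration.

```python
class Device:
    serial = "FT4222 A"          # hypothetical attribute

    def __call__(self):
        return "called"


dev = Device()

# getattr()/hasattr(), including the default-value 3rd argument.
serial = getattr(dev, "serial")
fallback = getattr(dev, "missing", "fallback")
has_missing = hasattr(dev, "missing")

# callable() checks for __call__; isinstance() is like MATLAB's isa().
dev_callable = callable(dev)
str_callable = callable(dev.serial)
is_device = isinstance(dev, Device)

# next() pulls one value at a time; the 2nd argument is a default at the end.
it = iter([1, 2])
pulled = [next(it), next(it), next(it, "done")]

# chr()/ord(), ** versus ^ (xor), slice() and repr().
letter, code = chr(65), ord("A")
power, xor = 2 ** 3, 2 ^ 3
every_other = "abcdef"[slice(1, None, 2)]
shown = repr("hi")

print(serial, fallback, pulled, power, xor, every_other, shown)
```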

Context Manager

The @contextlib.contextmanager decorator basically splits a set-try-yield-finally boilerplate function into 3 parts: __enter__ (everything above yield), BODY (where the yield goes to) and __exit__ (everything below yield), since a with-as statement is a rigidly defined try-finally block, roughly like this:

with EXPR as f:
  BODY(using f)
__enter__: f=EVAL(EXPR)
try:
  # f isn't evaluated till yield  
  yield f  # Goes to BODY
finally:
  __exit__: cleanup(f)
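A runnable version of the schematic above might look like the following. The opened() function and the dictionary standing in for a resource are made up; a real resource opener would replace them.

```python
from contextlib import contextmanager

log = []


@contextmanager
def opened(name):
    # Everything above the yield plays the role of __enter__.
    log.append(f"enter {name}")
    resource = {"name": name, "open": True}   # stand-in for a real resource
    try:
        yield resource        # bound to the name after "as"; BODY runs here
    finally:
        # Everything here plays the role of __exit__; it runs even if
        # BODY raises an exception.
        resource["open"] = False
        log.append(f"exit {name}")


with opened("dev") as f:
    log.append(f"body sees open={f['open']}")

try:
    with opened("dev2") as f2:
        raise RuntimeError("boom")
except RuntimeError:
    pass

print(log)
```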


What Python does not do, and what are the alternatives

RAII: Using stack unwind to manage program flow

Python uses a garbage collector, so an onCleanup()-style trick that relies on a destructor might often work, but it’s not guaranteed to. So any code based on that should not be in production.

Answer: Context Manager, a glorified try-catch (more specifically try-finally) block with a rigid structure. It’s a pain in the butt and not fun to deal with if you want to deviate from the native ContextManager that came with the resource opener

‘switch’-case is back as ‘match’-case (the advanced uses are different)

Python 3.10 and newer finally support it. ‘switch’ in C is called ‘match’ in Python, and it has many handy and intuitive syntaxes, just like in MATLAB! Hooray!

If you try to do anything fancy with mutables in the cases, be careful about the side effects!

Pass by Variable (Copy-on-Write) like MATLAB

Variables are by and large references in Python. Everything, including integers, is an instance of some class (which are in turn dictionaries with special treatment for certain key names). The garbage collector waits for the last user of a piece of memory to stop referencing it before cleaning it up.

Python even has one ‘None’ for the entire universe, with a gazillion things pointing to the same memory address where None is stored (that’s why None is idiomatically checked with the ‘is’ keyword, which compares addresses for speed, instead of ‘==’, which actually compares the contents). If you look up the reference count (see garbage collector) for commonly used numbers like 1 and 0, there are thousands of ‘users’ of them!

With C++, it’s a mixture. Complex objects are usually passed by reference for performance reasons but simple structs and data types can be passed as variables (C/C++’s nomenclature calls the non-reference/pointers variables though technically references/pointers are just the same integers identified as addresses) that gets cloned and destroyed when they move across function (stack) boundaries.

In MATLAB, they want it industrial strength, so they’d rather not allow anything insidious/non-transparent to happen in your code and keep it all pass-by-variable: everything is supposed to be treated as a different copy as it crosses function boundaries. For performance reasons, they figured that if you pass a big matrix just for the function to read, MATLAB doesn’t really have to clone it, so under the hood you peek at the same matrix that belongs to the caller. Once your function changes the contents (they are pretending that it’s a separate copy, so of course you can), MATLAB painfully makes a whole copy of it (copy-on-write), and you then have to lug the 2nd (modified) copy around when it travels past function boundaries.

Python takes this idea a lot further by letting many names point to the same object (None, small integers and string literals are typical examples). When you ‘change’ an immutable value, Python creates a new object and rebinds only the variable name you are using to it.

Answer: The way Python prevents variables passed as parameters into a function from getting modified is to separate types into mutable (lists, dicts, sets) and immutable (tuples, frozenset, numerics, strings; frozendict is a package right now). An immutable object cannot be changed in place, so anything the function does to it only rebinds the function’s local name to a new object and the caller’s object stays untouched.
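The difference is easy to see side by side. In this sketch the function names are made up; the last function shows one defensive option (copying first) when a mutable argument must not be touched.

```python
def append_item(items):
    items.append(99)        # mutates the caller's list in place


def bump(n):
    n += 1                  # rebinds the local name only; caller unaffected
    return n


def safe_append(items):
    items = list(items)     # work on a private copy instead
    items.append(99)
    return items


caller_list = [1, 2]
append_item(caller_list)

caller_int = 5
bumped = bump(caller_int)

original = (1, 2)           # tuples are immutable
extended = safe_append(original)

a = None
b = None
print(caller_list, caller_int, bumped, extended, a is b)
```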

Classes Boundaries

MATLAB and C++ have stringent access control, but Python does not. There’s not even const correctness. Just signal your intention with variable naming schemes like all caps and __ prefixes.

C++ does not separate helper functions and class (non-instance) methods. What C++ calls static members are really just glorified free functions and global variables tucked under the namespace that happens to be the class, partitioning the global workspace (classes started off as namespaces for structs, then people added features like overloading and dispatch mechanisms). Whether a scoped helper function calls a scoped variable that happens to be in the same namespace is nothing special to C++.

Python does separate these two concepts though. A class method in Python (decorated by @classmethod) is the equivalent of C++’s static method: it is not allowed to touch anything instance-specific, but it can access anything class-specific. A helper function, which is called a ‘static method’ (decorated by @staticmethod) in Python, cannot even touch anything specific to the class.
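A minimal sketch of the three kinds of methods, using a made-up Device class:

```python
class Device:
    count = 0                        # class-specific state

    def __init__(self, name):
        self.name = name             # instance-specific state
        Device.count += 1

    @classmethod
    def how_many(cls):
        # Receives the class as cls: can read class state, but has no self.
        return cls.count

    @staticmethod
    def normalize(name):
        # Receives neither self nor cls: a plain helper parked in the class.
        return name.strip().upper()


first = Device("a")
second = Device("b")

print(Device.how_many(), Device.normalize("  ft4222 a "))
```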

Variable arguments

C++ doesn’t generally do variable arguments because they defeat signature-based method overloading, which uses the function signature (a list of your argument types) to figure out which function to dispatch.

MATLAB uses cells to pack variable arguments. The common idioms are varargin{:} and [varargout{1:nargout}]. To accommodate variable arguments, MATLAB has to give up method overloading, but it still has a little bit of it left: it does dispatch based on the first argument type, and that’s very useful in avoiding a lot of stupid switching by detecting data types: just use a consistent function name interface and have each data type implement its own method with the same name.

In Python, there’s no such thing as multiple outputs (return variables): you return a single tuple, and it gets unpacked (just like MATLAB’s deal() function) when you write several names on the left-hand side. If the left-hand side is a single name, it gets the whole tuple, still packed. If you write out the elements (which makes the left-hand side a sequence of targets), the elements of the returned tuple are assigned to the names on the left-hand side depending on your syntax.

This is often a point of agony when deciding on an output format as I develop MATLAB code. Apparently TMW wondered the same thing, because their own factory code is all over the place on this too. Most of the time it’s not a good idea to have a context-dependent output format (depending on how many outputs the user supplied), even if you can technically do that by detecting nargout in MATLAB.

Answer: My recommendation is to make sure the simple, most common case gets priority, stuff the juicy side info in packaged data structures (such as arrays or cells/monads/lists), and stick to a fixed output format whether you are in Python or MATLAB.
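Here is a short sketch of how the packed return value behaves (min_max() is a made-up example function):

```python
def min_max(values):
    # "Multiple outputs" are really ONE tuple.
    return min(values), max(values)


packed = min_max([3, 1, 2])          # a single name gets the whole tuple
lo, hi = min_max([3, 1, 2])          # several names unpack it

# Starred targets soak up the remaining elements as a list.
first, *rest = (1, 2, 3)

print(packed, lo, hi, first, rest)
```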

Late Binding in Lambda / Anonymous Functions: Capture it!

This often throws people off in Python. MATLAB uses early binding, which means that when you create an anonymous function (aka lambda), the free variables (parameters that are not running variables, i.e. not the input arguments to the lambda/anonymous function) capture a snapshot of the local workspace at the moment the lambda/anonymous function is created!

Python, on the other hand, uses late binding (comparable to capturing by reference in C++). This means you have to explicitly capture (make a snapshot copy of) the free variables if you want them tied to the values they had when the anonymous function (lambda) was created; otherwise Python waits until the lambda is actually called/used to look up what values to use for the free variables.

P = 612
# P was not captured, thus late binding
f = lambda x     : print(f'input/running:{x}, param/free:{P}')
# P was captured as p, thus early binding
g = lambda x, p=P: print(f'input/running:{x}, param/free:{p}')
P = 721
f(8964) # shows "input/running:8964, param/free:721"
g(8964) # shows "input/running:8964, param/free:612"
