Librarize! Free variables/functions school of thought (as compared to OOP)

Posted on April 19, 2025 by admin

When programming in C++, I prefer to stick to free functions and refactor everything generic into libraries. That doesn’t seem to be the norm for now, so after watching this video I’m glad that I’m not the only one who prefers free functions.

This lecture explains why preferring free functions, instead of jumping to cram everything into classes, aligns with OOP doctrines, but that’s not how I came up with this idea on my own.

TLDR: My whole thesis of preferring free functions is based on

there’s no reason to reinvent the wheel badly by failing to identify generic operations and factor them out as calls to generic libraries not tied to the business logic!
my observation that data members are a globalist (global variable) style of programming, sugarcoated by containing the namespace pollution within class scopes!

The lecture suggested putting some part of business logic code as free functions too, right after class definitions. I didn’t really think of that because I often refactor aggressively enough that there’s not much left to pollute the class’s namespace.

If you can refactor your code very well, the top level code should be so succinct that it pretty much reads the business logic without the noise from implementation details, to the extent that non-programmers can develop a picture of what your code does without knowing the intricate mechanics of the programming language.

Background

A class is a mental model built on the von Neumann architecture, which suggests that data (variables) and programs (functions) aren’t very different after all.

Structs provide a compact way to bundle different variables into one logical unit. It’s more of an eye candy.

Given von Neumann’s view that a program (function pointers in reality) is treated as data (a variable), a struct can bundle programs and data.

Then people naturally made up the fabric of classes, calling a bundle of actions (program) and a state (data) an object to mimic our daily observations.

Making certain variables (fields) in a struct callable requires a little special treatment that could be done in the compiler. This is the very primitive form of classes.

I said state here because data members in OOP naturally encourage people to frame the program in terms of shared states instead of tightly controlled data flow by passing arguments in function calls.

Functional programming avoids state, which makes it the polar opposite of OOP in how you structure your program and data structures. It’s not even passing data around, but chaining actions on data.

Direct delivery (local variables) vs Sharing access (data members)

Shared state is what globalist programmers (pun intended) are doing with non-local variables.

With local variables passed through arguments in function calls, you hand the item (say a letter) to the intended recipient (the function you call) directly. It’s point-to-point delivery: simple and predictable.

With data members or nonlocal variables, you leave your message in the dropbox (non-local scope or data member scope) and hope for the best (that the right people will pick it up and nobody messes with it in between).

People say globals are evil, but they are missing the point by thinking it’s just the breadth of namespace pollution that makes them evil. It’s actually the dropbox mechanism that makes your program fragile and defenseless against domestic (namescope) violence (unwanted data mucking).

Enclosing globals into data members only downgrades potential public violence to potential domestic violence (pun intended).

Classes merely put a lock in the dropbox and give ALL people in the same family/Class (methods) the same key for private members. Public means unlocked.

Think of the data member scope as a fridge at home. Methods are the people at home. Whoever has the home key can put food in the fridge, mess with it, or eat it.

Within the same class/family, there are no finer controls over who (methods) can touch which food item (data member) as long as they are at the same trust level.

So if you didn’t refactor your class composition (not hierarchy, as a hierarchy likely exposes more data members to more methods that don’t need them) to allow intended methods to access only the data members they need, you are not encapsulating tightly.

“Only allow intended methods to access only the data members they need” is hellishly hard. It is the same as saying that you need to devise a complicated composition hierarchy and manage the interfaces between the classes so that, within every class involved, the methods and data members form a complete bipartite graph!

For example, if method R only needs (A,B,C), it should not have access to D, so D needs to be factored out into another class. If data member B is needed only by methods (P,Q), it should not be in the same family as R. Then you have to manage the interfaces between these classes. Yikes!

If you go this far to encapsulate properly, you might as well factor as many methods as you can into free functions that are passed their variables directly. In reality we don’t go this far; we stop somewhere when the bipartite graph is dense enough that our mind can keep track of all possible parties (which methods are involved with which data members) within the class.

Compromise / Solution?

There are applications like GUI objects where it’s just a total pain in the ass to explicitly pass data through arguments on every event (callback), so state holding (data members) makes more sense. Eventually we need to leave something to data members even if we did our best to factor free functions out of the class. GUI is one of the use cases where I won’t shame myself for abusing non-local scopes.

Hopefully C++ will eventually come up with a compile-time contract syntax (method access control) that allows users to define which methods are allowed to touch which data members (potentially with read/write distinctions as well), such as:

private:
   bool a : method1, method2(r), method3(w)
   char b : method2(rw), method3(r)

I don’t think we need to go so far as to micromanage the other direction, where methods declare which data members they may touch, as it takes extra work to check that the two directions agree. It’s methods that could go rogue on data. Data could not go rogue on methods unless they were abused/poisoned.

When to use classes

OOP is a useful idea but I would not over-objectify, like writing a class to hold 5 constants or organizing a collection of loosely coupled or unrelated generic helper functions into a class (they should be organized into packages or namespaces). Over-objectifying reeks of cargo cult programming.

My primary goal in program design is self-documenting code. I prefer presenting the code in a way (not just syntax, but the architecture and data structure) that’s the easiest to understand without material sacrifices to performance or maintainability. I use classes when the problem statement happens to align naturally with the features classes have to offer, without a lot of mental gymnastics to frame it as classes.

My decision process goes roughly like this:

If a problem naturally screams data type (like matrices), which is heavy on operator overloading, I’d use classes in a heartbeat as data types are mathematical objects.
Then I’ll look into whether the problem is heavy on states. In other words, if it’s necessary for one method to drop something in the mailbox for another method to pick it up without meeting each other (through parameter passing calls), I’ll consider classes.
If the problem statement screams natural interactions between objects, like chess pieces on a chessboard, I’d consider classes even if I don’t need OOP-specific features.

Do not abuse OOP to hide bad practices

The last things I want to use OOP as a tool for:

Hiding sloppy generic methods that are correct only given implicit assumptions (specific to the class) that are not spelt out, like sorting unique 7-digit phone numbers.
Abusing data members to sugarcoat the practice of casually using globals and statics all over the place (poor encapsulation), as you would have done in a C program.

1) Free functions for generic operations

The first one is an example that calls for free functions. Instead of writing a special sort function that assumes the numbers are unique and 7 decimal digits long, a free function bitmap_sort() should be written and put in a sorting library if there isn’t an off-the-shelf package out there.

In the process of refactoring parts of your program out into generic free functions (I just made up the word ‘librarize’ to mean this), you go through the immensely useful mental process of

Explicitly understanding the exact assumptions specific to your problem that you want to take advantage of, or the restrictions you want to work around. You can’t be sure that your code is correct, no matter how many tests you’ve written, if you aren’t even clear about which inputs your code is correct for and which unexpected inputs will break the assumptions.
Discovering the nomenclature for the concept you are using.
Knowing the nomenclature, you have a very good chance of finding work already done on it so you don’t have to reinvent the wheel … poorly.
If the concept hasn’t been implemented yet, you can contribute to code reuse that others and your future self can take advantage of.
Decoupling the generic operation from the business logic (the class itself) allows you to focus on your problem statement and easily swap out the generic implementation, whether it’s for debugging or performance improvement, or hand the work over to others without spending more time explaining what you wanted than writing the code yourself.

This is much better than jumping into writing a half-assed implementation of an idea whose quirks (assumptions) you haven’t fully understood and hiding it as an internal method.

Decouple, decouple, decouple

By hiding the good stuff (a generic piece of code which is a useful algorithm) inside the class, you are merely luring the people dealing with your code base to go out of their way to break your intended access controls in order to get to the juicy implementation.

Don’t be embarrassed if your first attempt at the generic code/algorithm is primitive and applies only to very narrow input conditions! Instead of hiding the ‘embarrassing’ implementation inside a class that tolerates it due to the assumptions given the class’s context, you can simply add a suffix to your function names to explain the limitations.

Making the name longer discourages people from expanding its usage beyond what you intended. It also helps code reviewers catch people using your function when the use case clearly doesn’t satisfy the restrictions denoted by the suffix of the function name.

Later on when people develop a more generic version (likely by learning from you), they can make the function name shorter after removing the restriction suffix. If your function turned out to be more generic than you’ve planned for, they can always write a wrapper that calls your function under the hood.

For example, during your exploration, you could start with sort_unique_phone_numbers() in your library. Later on somebody expands on it and calls it sort_unique_numbers(). Eventually some of you realize it’s bitmap_sort(). This way you don’t have to worry about colliding with the name sort() which is way more general.

If you factor the juicy implementation (of a generic idea) out of your class, others can happily use your library of free functions, leaving your internal implementations alone instead of trying questionable maneuvers to exploit them, which leads to code cruft and debugging hell.

You learn a new concept well rather than repeating similar gruntwork over and over that benefits nobody else. Otherwise you will likely have to debug the tangled mess when you run into a corner case, because you didn’t understand the assumptions well enough to decouple the generic operation from the class.

Overload free functions!

Polymorphism in OOP is a lot broader than just function overloading.

Virtual functions have to be an OOP thing because they are run-time polymorphism, which makes sense only with inheritance.

Templates and function overloading, both of which also apply to free functions, are compile-time polymorphism.

Polymorphism isn’t exclusive to OOP the way Bjarne defined it. C++ can overload free functions! You don’t need to put things into classes just because you want a context (signature) dependent dispatch (aka the compiler figuring out which version of the function with the same name to call).

I frequently overload free functions in MATLAB (by the type of the first argument) so it sounds natural to me. You can’t do this natively in Python without wrestling with decorators.

Program organization strategies

If you are really edgy about namespace pollution, just use namespace (but not class) for your free functions.

Here’s an example where I put half-baked generic library functions in C_tools and free functions that exclusively deal with the said class in C_free:

#include <iostream>

namespace C_tools {
  // Half-baked generic helper: doubles a single int
  int double_singleton(int x) { return 2*x; }
}

class C {
  public:
    C() : a(64) {}
    void update(int x) { a = x; }
    int a_doubled() const { return C_tools::double_singleton(a); }
    int get_a() const { return a; }
  private:
    int a;
};

namespace C_free {
  // It's a reference, so this changes the state
  void double_a(C& c) {
    c.update(c.a_doubled());
  }
}

int main() {
    C c; // Internally a is 64
    // a_doubled() returns the 2*a, giving 128
    std::cout << c.a_doubled() << std::endl;

    // Free generic helpers for others to use too
    // 2*89 = 178
    std::cout << C_tools::double_singleton(89) << std::endl;

    // Free functions acting on instances
    C_free::double_a(c);  // a=2*64=128
    std::cout << c.get_a() << std::endl;
    C_free::double_a(c);  // a=2*128=256
    std::cout << c.get_a() << std::endl;
    return 0;
}

The same namespace can be reopened in several places in your code, so you can order the blocks by dependency. There’s no need to forward declare things just to keep a namespace in one block.

I intentionally didn’t overload a free function in the example above to show that if your code is not really up to par to be generalized alongside other code with the same idea and same name, it’s better to just use different names and not let C++ dispatch by context/signature.

In the following example,

I merged C_tools into C_free to demonstrate that the same namespace does not have to stay in one block. (Keeping the two namespaces separate is still the better practice, since they are intended for different audiences.)
Now that C_tools is merged into C_free, I’ll take advantage of it to demonstrate free function overloading. There are now two candidates for C_free::square(): one takes an int while the other takes a reference to class C, aka C&.

#include <iostream>

namespace C_free {
  // Generic helper: squares a plain int
  int square(int x) { return x*x; }
}

class C {
  public:
    C() : a(64) {}
    void update(int x) { a = x; }
    int a_squared() const { return C_free::square(a); }
    int get_a() const { return a; }
  private:
    int a;
};

namespace C_free {
  // Overload: it's a reference, so this changes the state
  void square(C& c) {
    c.update(c.a_squared());
  }
}

int main() {
    C c; // Internally a is 64
    std::cout << c.a_squared() << std::endl; // 64*64=4096

    // Free generic helpers for others to use too
    std::cout << C_free::square(89) << std::endl; // 89*89=7921

    // Free functions acting on instances
    C_free::square(c);  // c.a=64*64=4096
    std::cout << c.get_a() << std::endl;
    C_free::square(c);  // c.a=4096*4096=16777216
    std::cout << c.get_a() << std::endl;
    return 0;
}

2) Classes are not excuses to hide unnecessary uses of global/statics

Data members in classes are a namespace-scoped version of global/static variables that can optionally be localized/bound to instances. The private/public access specifiers in C++ expand on the global vs. file scope distinction that was switched through the static modifier (file scope) back in the C days.

If you don’t think it’s a good habit to sprinkle global scope all over the place in C, try not to go wild using more data members than necessary either.

Data members give the illusion that you are encapsulating better and end up incentivising less defensive programming practices. Instead of not polluting in the first place, you are still designing your data flow with the mentality of global variables and merely containing the pollution with namespace/class scopes.

For example, if you want to pass a message (say an error code) directly from one method to another and NOBODY else (other methods) is involved, you simply pass the message as an input argument.

Globals or data members are more like dropping a letter in a mailbox instead of handing it to your intended recipient, and hoping that somehow the right recipient will get it. Tons of things can go wrong with this uncontrolled approach: somebody else could intercept it, or the intended recipient may never know the message is waiting for them.

With data members, even if you marked them as private, you are polluting the namespace of your class’s scope (therefore not encapsulating properly) if there’s any method that can easily access data members that it doesn’t need.

How I figured this out on my own based on my experience in MATLAB

Speaking of insidious ways to litter your program design with the globalist mentality (pun intended), data members are not the only offenders. Nested functions (not available in C++ but available in modern MATLAB and Python) are another hack that makes you FEEL less guilty about structuring your program in terms of global variables. Everything visible one level above the nested function is relatively global to the nested function. You are literally polluting the variable space of the nested function with the local variables of the function one level above, which is a lot more disorganized than data members, where you at least acknowledge what you’ve signed up for.

Librarize is the approach I came up with for MATLAB development: keep a folder tree of user MATLAB classes and free functions organized under sensible names. Every time I am tempted to reinvent the wheel, I try to think of the best name for it. If a folder with the same name exists, chances are I already did something similar before and just needed a little reminder. This way I always have high quality in-house generic functions (whose use cases I can expand with backward compatibility as needed).

This approach works because I’m confident with my ability to naturally come up with sensible names consistently. When I did research in undergrad, the new terminologies I came up with happened to coincide with wavelets before I studied wavelets, as in hindsight what I was doing was pretty much the same idea as wavelets except it doesn’t have the luxury of an orthogonal basis.

If a concept has multiple names, I often drop breadcrumbs with dummy text files suggesting the synonym or write a wrapper function with a synonymous name to call the implemented function.

C++ can simply overload free functions by signature, but not too many people know MATLAB can overload free functions too! MATLAB’s function overloading dispatches (decides which version to call) ONLY BY THE FIRST ARGUMENT.

MATLAB supports variable arguments which defeats the concept of signatures. So be grateful that at least it can overload by the first argument. Python doesn’t even overload by the first argument.

It’s a very advanced technique I came up with that allows the same code to work for many different data types, doing generics without the templates available in C++.

I also understand that commercial development is often rushed so not everybody can afford the mental energy to do things properly (like considering free functions first). All I’m saying is that there’s a better way than casually relying on data members more than needed, and using data members should have the same stench as using global variables: it might be the right thing to do in some cases, but most often not.

If you see people sticking to free functions, please consider the merits of it before jumping to judge them. It’s easy to tell based on the application (based on how ‘hard’ they try, aka bending things to fit a paradigm) if people are doing things one way or the other religiously or they adapt to the problem they are solving.

Dictionary of equivalent/analogous concepts in programming languages

Posted on February 19, 2025 by admin

Common	C	C++	MATLAB	Python
Variable arguments	`<stdarg.h>` `T f(...)` Packed in `va_list`	Very BAD! Cannot overload when signatures are uncertain.	`varargin` `varargout` Both packed as cells. MATLAB does not have named arguments	`*args` (simple, stored as tuples) `**kwargs` (specify input by keyword, stored as a dictionary)
Referencing	N/A	`operator[]`	`(_)` is for references `subsref` `subsasgn` `[_]` is for concat `{_}` is for (un)pack	`__getitem__() __setitem__()`
Default values	N/A	Supported	Not supported. Manage with `inputParser()` or newer `arguments`	Non-intuitive static data behavior. Stick to `None` or immutables.
Name-Value Argument Matching			Old way: `.., 'PropName', Value` and parse `varargin` Since R2021a: `Name=Value` `options` in `arguments`	`Name=Value` `**kwargs`
Major Dimension	Row	Row	Column	Row (Native/Numpy) Column for Pandas
Constness	`const`	`const`	Only in classes	N/A (Consenting adults)
Variable Aliasing	Pointers	References	NO! Rely on Copy-on-write (No in-place functions*) Handle classes under limited circumstances	References
`=` assignment	Copy one element	Values: Copy References: Bind	New Copy Copy-on-write	NO VALUES Bind references only (could be to unnamed objects)
Chained access operators	N/A	Difficult to operator overload it right	Difficult to get it right. MATLAB had some chaining bugs with `dataset()` as well.	Chains correctly natively
Assignment expressions (assignment evaluates to assigned lvalue)	`=`	`=`	N/A	Named Expression `:=`
Version Management			`verLessThan()` `isMATLABReleaseOlderThan`	`venv` / `virtualenv` (Virtual Environment)
Exponentiation	`<math.h>` `pow()`	`<cmath>` `pow()`	`^`	`**`
Stream (Conveyor belt mechanism. Saves memory)	I/O (std, file, sockets)	`iterator` in STL containers	MATLAB doesn’t do references. Just increment indices.	iterators (uni-directional only) `iter(): __iter__()` `next(): __next__()`
Looping	`for (init; cont_cond; next)`	C-style `for (auto running : iterable)`	`for k = array_to_iterate`	list-comp `for (index, thing) in enumerate(lists)`

Since MATLAB doesn’t do references, iterators (by extension generators) and functions that do in-place operations do not make sense (unless you bend it very hard with anti-patterns such as handles and dbstack).

Data Types

Common	C	C++	MATLAB	Python
Sets	N/A	`std::set`	Only set operations, not set data type	`{ , , ...}`
Dictionaries		`std::unordered_map`	– Dynamic fieldnames (qualified varnames as keys) – `containers.Map()` or `dictionary()` since R2022b	Dictionaries `{key:value}` (Native)
Heterogeneous containers			cells `{}`	lists (mutable) tuples (immutable)
Structured Heterogeneous containers			`table()` `dataset()` [Old] Mix in classes	Pandas Dataframe
Array, Matrices & Tensors			Native `[ , ; , ]`	Numpy/PyTorch
Records	struct	class (members)	dynamic field (structs) properties (class) `getfield()/setfield()`	No structs (use dicts) attribute (class) `getattr()/setattr()`
Type deduction	N/A	`auto`	Native	Native
Type extraction	N/A	`decltype()` for compile time (static) `typeid()` for RTTI (runtime)	`class()`	`type()`
Categorical Arrays			`categorical()` Previously `ordinal()/nominal()`	`pd.cut(x, bins, labels)`

Native set operations in Python are not stable (they do not preserve element order), and there’s no option to use a stable algorithm like MATLAB has. Consider installing the orderly-set package.

Array Operations

Common	MATLAB	Python
Repeat	`repmat()`	`[] * N` `np.repeat()`
Logical Indexing	Native	List comprehension Boolean Indexing (Numpy)
Equally spaced numbers	Internally `colon()`: `start:step:end` `linspace`/`logspace`	`range(begin, past_end, step)` produces a lazy `range` object `list(range())` or `tuple(range())` iterates to realize the vector
Equally spaced indexing	MATLAB has no generators, so produced vector only	`[start:past_end:step]` is internally `slice()` which produces a slice object, not range/lists/tuple. Faster but not iterable
Shallow copy	Deep copy-on-write	Slice: `x = y[:]` `copy.copy()`
Deep copy	Deep copy-on-write	`copy.deepcopy()`

Editor Syntax

Common	C	C++	MATLAB	Python
Commenting	`/* ... */` `//` (only for newer C)	`//` (single line) `/* ... */` (block)	`%` (single line) (Block): `%{ ... %}`	`#` (single line) `"""` or `'''` is docstring which might be undesirably picked up
Reliable multi-line commenting (IDE)			Ctrl+(Shift)+`R`(Windows), `/` (Mac or Linux)	[Spyder]: Ctrl+`1`(toggle), `4`(comment), `5`(uncomment)
Code cell (IDE)			`%%`	[Spyder]: `# %%`
Line Continuation	`\`	`\`	`...`	`\`
Console Precision			`format`	`%precision` (IPython)
Clear variables			`clear` / `clearvars`	`%reset -sf` (IPython)

Macros only make sense in C/C++. They make code less transparent and are frowned upon in higher level programming languages. Even their use in C++ should be limited. Use inline functions whenever possible.

Python is messy about the workspace, so if you just delete

Object Oriented Programming Constructs

Common	C++	MATLAB	Python
Getters Setters	No native syntax. Name mangle (prefix or suffix) yourself to manage	Define methods: `get.x` `set.x`	Getter: `@property def x(self): ...` Setter: `@x.setter def x(self, value): ...`
Deleters	Members can’t be changed on the fly	Members can’t be changed on the fly	Deleter (removing attributes dynamically by `del`)
Overloading (Dispatch function by signature)	Overloading	Overload only by first argument	`@overload` (Static type) `@singledispatch @multipledispatch`
Initializing class variables	Initializer Lists Constructor	Constructor	Constructor
Constructor	`ClassName()` Does not return (`*this` is implicit)	`obj=ClassName(...)` MUST output the constructed object	`__init__(self, ...)` Object to be constructed is 1st argument
Destructor	`~ClassName()`	`delete()`	`__del__()`
Special methods	Special member functions	(no name) method that control specific behaviors	Magic/Dunder methods
Operator overloading	`operator`	operator methods to define	Dunder methods
Resource Self-cleanup	RAII	`onCleanup()`: make a dummy object with cleanup operation as destructor to be removed when it goes out of scope	`with` Context Managers
Naming for the object itself	Class: (class’s own name by scope resolution operator `::`) Instance: `*this`	Class: (class’s own name) Instance: `obj` (or any output name defined in constructor)	Class: `cls` Instance: `self` (Recommended PEP8 names)

Python allows adding members (attributes) on the fly with setattr(), which includes methods. MATLAB’s dynamicprops allows adding properties (data members) on the fly with addprop

The onCleanup() pattern does not work reliably in Python because MATLAB’s object destruction time is deterministic (MATLAB specifically does not garbage collect user objects to avoid this mess; it only garbage collects PODs), while Python leaves it up to the garbage collector.

*this is implicitly passed in C++ and not spelled out in the method declaration. The self object must be the first argument in the instance method’s signature/prototype for both MATLAB and Python.

Functional Programming Constructs

Common	C++	MATLAB	Python
Function as variable	Functors (Function Objects) `operator()`	Function Handle	Callables (Function Objects) `__call__()`
Lambda Syntax	Lambda `[capture](inputs) -> optional trailing return type {expr}`	Anonymous Function `@(inputs) expr`	Lambda `lambda inputs: expr`
Closure (Early binding): an instance of function objects	Capture `[]` only as necessary. Early binding `[=]` is capture all.	Early binding ONLY for anonymous functions (lambda). Late binding for function handles to loose or nested functions.	Late binding* by default, even for Lambdas. Can capture `Po` through default values `lambda x,P=Po: x+P` (We’re relying on users to not enter the captured/optional input argument)

Concepts of Early/Late Binding also apply to non-lambda functions. It’s about when to access (usually read) the ‘global’ or broader scope (such as in nested functions) variables that get recruited as non-input variables local to the function itself.

An instance of a function object is not a closure if there’s any parameter that’s late bound. All lambdas (anonymous functions) in MATLAB are early bound (at creation).

The more proper way (without creating an extra optional argument that’s not supposed to be used, aka defaults overridden) to convert late binding to early binding (by capturing variables) is called partial application, where you freeze the parameters (to be captured) by making them inputs to an outer layer function and return a function object (could be lambda) that uses these parameters.

The same trick (partial application) can be used to bind (capture) variables in simple/nested function handles in MATLAB, which then behave the same way (early binding) as anonymous functions (lambda).

Currying is partial application one parameter at a time, which is a tedious way to stay faithful to pure functional programming.

List comprehension is a shorthand syntax for transform/map() and copy_if/remove_if/filter() in one shot, but not accumulate/reduce(). MATLAB and C/C++ do not have listcomp, but listcomp is not specific to Python. Even Powershell has it.

Listcomp syntax, if wrapped in round brackets like (x**x for x in range(5)), gives a generator. Wrapping in square bracket is the shortcut of casting the generator into a list, so [x**x for x in range(5)] is the same as list(x**x for x in range(5)).

Coroutines / Asynchronous Programming

MATLAB natively does not support coroutines.

Common	C++20	Python
Generators	Input Iterators	Functions that `yield value_to_spit_out_on_next` (Implicitly return a generator/functor with `iter` and `next`)
Coroutines		Functions that `value_accepted_from_outside = yield` Send value to the continuation by `g.send(user_input)` `async`/`await` (native coroutines)

Matrix Arrays

The way Numpy requires users to specify matrices with a bracket for every row drives me nuts. Not only is there a lot of typing, the superfluous brackets reinforce C’s idea of row-major which is horrendous to people with a proper math background who see matrices as column-major $\mathbf{A}_{r,c}$ . Pytorch is the same.

Once you are trained in APL/MATLAB’s matrix world-view, you’ll discover going back to the world where matrices aren’t first class citizens is clumsy AF.

With Python, you lose the clutter free readability where your MATLAB code is one step away from the matrix equations in your scientific computing work, despite a lot of the features that addresses frequent use patterns are implemented earlier in Python than MATLAB.

Don’t believe those who haven’t lived and breathed MATLAB tell you Python is strictly superior. No it isn’t. They just didn’t know what they were missing as they haven’t made the intellectual leap in MATLAB yet. Python is very convenient as a swiss-army knife but scientific computing is an afterthought in Python’s language design.

The only way to use MATLAB-like semi-colon to change rows only works for np.matrix() type, which they plan to deprecate. For now one can cast matrix into array like np.array(np.matrix(matrix_string)).

Even numpy’s ndarray (or matrix to be deprecated) are CONCEPTUALLY equivalent to a matrix of cells in MATLAB. There isn’t native numerical matrices like in MATLAB that doesn’t have the overhead of unpacking arbitrary data types. You don’t want to do numerical matrices in MATLAB with cell matrices as it’s insanely slow.

You get away without the unpacking penalty in Numpy if all the contents of the ndarray happen to have the same dtype (such as numerical), aka known to be uniform. In other words, MATLAB’s matrices are uniform if formed by [] and heterogeneous if formed by {}, while for Python [] is context-dependent, kept track of by dtype.
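Here is a small sketch of how dtype tells the two cases apart. The variable names are mine; the object dtype is requested explicitly because Numpy would otherwise try to coerce the mixed list into one common type.

```python
import numpy as np

uniform = np.array([[8, 9], [6, 4]])               # one numeric dtype for all
mixed = np.array([8, "nine", 6.0], dtype=object)   # boxed Python objects

uniform_is_numeric = np.issubdtype(uniform.dtype, np.number)
mixed_is_object = mixed.dtype == object
element_types = [type(v).__name__ for v in mixed]  # each cell keeps its own type
```

The object array is the one that behaves like a MATLAB cell array: every element has to be unpacked individually before any arithmetic can happen.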

Concept	MATLAB	Numpy
Construction	`[8,9;6,4]`	`np.array([[8,9],[6,4]])`
Size by dimension	`size()`	`A.shape`
Concatenate within existing dimensions	`[A;B]` or `vertcat()` `[A,B]` or `horzcat()` `cat(dim, A, B, ...)`	`np.vstack()` `np.hstack()` `np.concatenate(list, dim)`
Concatenate expanding to 3D (expand in last dimension)	`cat(3, A, B, ...)`	`np.dstack()` ‘d’ for depth (3rd dimension)
Concatenate expanding dimensions	`cat(newdim, A, B, ...)` then `permute()`	`np.stack([A, ..], expand_at_axis)` `np.array([A, ..])` expands at first dimension as outermost bracket refers to first dimension
Tiling	`repmat()`	`np.tile()`
Fill with same value	`repmat()`	`np.full()`
Fill with ones/zeros	`ones(), zeros()`	`np.ones(), np.zeros()`
Fill mimicking another array’s size	`repmat(x, size(B))` `ones(size(B))` `zeros(size(B))`	`np.full_like(B, x)` `np.ones_like(B)` `np.zeros_like(B)`
Preallocate	Any of the above (Must be initialized)	`np.empty()` `np.empty_like()` UNINITIALIZED
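A quick sketch of the Numpy column in action, with the MATLAB counterpart in the comments. A and B are arbitrary 2x2 examples.

```python
import numpy as np

A = np.array([[8, 9], [6, 4]])
B = np.array([[1, 2], [3, 4]])

rows = np.vstack([A, B])          # like [A; B]          -> shape (4, 2)
cols = np.hstack([A, B])          # like [A, B]          -> shape (2, 4)
depth = np.dstack([A, B])         # like cat(3, A, B)    -> shape (2, 2, 2)
front = np.stack([A, B], axis=0)  # new leading axis     -> shape (2, 2, 2)

tiled = np.tile(A, (2, 3))        # like repmat(A, 2, 3) -> shape (4, 6)
sevens = np.full_like(B, 7)       # like repmat(7, size(B))
```

dstack() puts A and B along the last axis (depth[:, :, 0] is A), while stack(..., axis=0) puts them along the first (front[0] is A).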

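repelem() takes its repetition counts as separate input arguments, one per dimension, like repmat() does, but it repeats each element in place (similar to np.repeat()) rather than tiling whole copies of the array; for a scalar input the two give the same result. Using a ones vector to broadcast a singleton instead of repmat() is horrendously inefficient and non-intuitive.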

Heterogeneous Data Structures

Heterogeneous data structures are typically column-major, as the concept derives from Structs of Arrays (SoA) and, coming from spreadsheets, people typically expect a column to have a single data type.

While Pandas offers a lot of useful features that I’ve easily implemented with wrappers in MATLAB, the indexing syntax of Pandas/Python is awkward and confusing. That’s because the matrix is a first-class citizen in MATLAB while it’s an afterthought in Python.

Python does not have the { } cell pack/unpack operator in MATLAB, so in Pandas, you select the Series object (think of it as a supercharged list with conveniences such as handling missing values and keeping track of row/column labels) then call its .values attribute.

However, Pandas is a lot more advanced than MATLAB in terms of using multiple columns as keys, and it has more tools to exploit multi-key row names (row names are not mandatory in MATLAB but mandatory in Pandas). In the old days I had to write my own MATLAB function with unique(.., 'rows') and exploit its index output to build unique keys under the hood.

Concept	MATLAB	Python (Pandas Dataframe)

Rows	Observations (`dataset()`) Row (`table()`)	Rows index
Columns	Variables	Columns
Select rows/columns	`T(rows, cols)`	`T.loc[r, col_name]` `T.iloc[r,c]` Caveats: – a single index (not wrapped in a list) has its content extracted – `iloc` on LHS cannot expand the table but `loc` can, though it can only inject 1 row – can get the position of a label by `T.index.get_loc()` or `T.columns.get_loc()` to use with `T.iloc[]`
Remove rows/columns	`T(rows, cols) = []`	`T.drop(index=rows, columns=cols)` Optionally: `inplace=True` `del T[rows, cols]` does NOT work
Extract one column	`T{:, c}`	`T[c].values`
Extract one entry	`T{r, c}`	`T.at[r,col_name]` `T.iat[r,c]` Faster than `loc/iloc`
Show first few rows	`T(1:5, :)`	`T.head()`
Drop duplicate rows	`unique(T, 'stable')`	`T.drop_duplicates()`


Ordinal	`categorical()` `ordinal()`	`Categorical()` `Index()`
Getting column names/labels	`T.Properties.VariableNames` (returns `cellstr()` only)	`T.columns` (returns `Index()` or `RangeIndex()`)
Getting row names/labels	`T.Properties.RowNames`	`T.index`
Transpose table	`rows2vars()`	`T.transpose()`

Move columns by name	`movevars()` since R2018a
Rename columns	`renamevars()` since R2020a	`T.rename(columns={source:target})`
Rename rows	Modify `T.Properties.RowNames`	`T.rename(index={source:target})`
Use column as row indices	`T.Properties.RowNames` = `T.cellstr_variablename` If multiple columns are needed, need to combine them into one column using some user rules	`T.set_index(column_to_use)` Dataframe allows multiple columns as row index keys
Reorder or partial selection	`T(rows, cols)`	`T.reindex(columns=..., index=...)` New labels will autofill by `NaN`
Select columns	`T(:, cols)`	`T[list_of_cols]`
Pick column by data type	`T(:, varfun(...))`	`T.select_dtypes(include=[list of type names])`
Pick column by string match	`T(:, varfun(...))`	`T.filter(like=str_to_match)`
Blindly concatenate columns of 2 tables	`[T1, T2]` If you defined optional rownames, they must match. You can delete them with `T.Properties.RowNames = {}`	Pandas assigns row indices (labels) by default. Mismatched row labels do not combine in the same row. Consider `reset_index()` or overwrite the row indices of one table with another, like `pd.concat([T1, T2.set_index(T1.index)], axis=1)`
Blindly concatenate rows of 2 tables	`[T1; T2]`	`pd.concat([T1, T2], ignore_index=True)`

Format export	`writetable()`	`.to_*()`
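A short sketch exercising a few rows of the Pandas column. The fruit table is made-up data, only there to have something to index into.

```python
import pandas as pd

T = pd.DataFrame(
    {"fruit": ["apple", "grape", "kiwi"], "qty": [3, 5, 2], "price": [0.5, 2.0, 0.8]}
)
T = T.set_index("fruit")               # use a column as row labels

by_label = T.loc["grape", "qty"]       # label-based selection
by_position = T.iloc[1, 0]             # position-based selection
one_entry = T.at["kiwi", "price"]      # fast scalar access
one_column = T["qty"].values           # like T{:, 'qty'} in MATLAB

col_pos = T.columns.get_loc("price")   # label -> position, for use with iloc
dropped = T.drop(index=["apple"], columns=["price"])  # returns a new table

more = pd.DataFrame({"qty": [9], "price": [1.1]}, index=["plum"])
stacked = pd.concat([T, more])         # like [T1; T2], keeping the row labels
```

drop() leaves T untouched unless inplace=True is passed, which is closer to MATLAB’s value semantics than most of Python.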

MATLAB tables do not support ranging through column names (such as 'apple':'grapes'), yet Pandas DataFrame supports it. I think it’s fine to use it in the interpreter to poke around, but in real code it’s just asking for confusing logic bugs when the columns are moved around and the programmer has a false sense of security about knowing exactly what’s where because they are using only names.

Dataframe is a little smarter than MATLAB’s table() in terms of managing column names and indices, as they are tracked with the Index() type, which is the same idea as MATLAB’s ordinal() ordered categorical type: unique names are mapped to unique indices, and it’s the indices that are used under the hood. This is how 'apple':'grapes' can work in Python but not MATLAB.

MATLAB T.Properties.VariableNames is a little clumsy. I usually implement a consistent interface called varnames() that’d output the same cellstr() headings whether it’s struct, dataset or table objects.

MATLAB’s table() does not make up row names by default. Pandas makes up sequential row names by default.

MATLAB’s table() requires variable names to be qualified strings (valid identifiers). Dataframe doesn’t care what labels you use as long as Index() takes them. It can get confusing because you can have the number 1 and the string ‘1’ as column headers at the same time, and they look the same when displayed in the console.
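The look-alike header trap is easy to reproduce. A tiny sketch with made-up numbers:

```python
import pandas as pd

df = pd.DataFrame({1: [10, 20], "1": [30, 40]})

label_types = [type(c).__name__ for c in df.columns]  # an int and a str
from_int = df[1].tolist()      # column labelled with the number 1
from_str = df["1"].tolist()    # column labelled with the string '1'
```

Printing df shows two columns both headed 1, so checking the types of df.columns is the only reliable way to tell them apart.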


Python Cheat Sheet

Posted on August 21, 2023 by admin

Built-in functions

breakpoint(): Python’s version of MATLAB’s keyboard() command
callable(): Like MATLAB’s isfunction() but it really checks if there’s a __call__ method
getattr()/hasattr(): MATLAB’s getfield()/isfield(). The 3rd parameter of getattr() is a shortcut to spit out a default if there’s no such field/attribute, which MATLAB doesn’t have
globals()/locals(): more convenient than MATLAB because the whole workspace (current variables) is accessed as a dictionary in Python by calling locals() and globals()
id(): memory address of the item where the variable (reference) is pointing to. Think of it as &x in C. MATLAB doesn’t do alias or pointers.
isinstance(): MATLAB’s isa()
next(): Python favors not actually computing the values until needed so instead it offers a generator (forward iterable) function that spits out one value at each time you kick it with next() and you can’t go back. MATLAB does not do iterators.
chr()/ord(): analogous to MATLAB’s char()/double() cast for characters
Python’s exponentiation is **, not ^ as in MATLAB and many other languages (C has no exponentiation operator, and ^ is xor in both C and Python)
print(..., flush=True) allows a courtesy flush
repr(): Python’s version of MATLAB’s disp(), also an overloadable standard interface (through __repr__)
slice(): MATLAB’s equivalent of colon() special interface
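A few of the built-ins above in one runnable sketch. The Config class is a throwaway stand-in for any object with attributes.

```python
class Config:
    depth = 3


cfg = Config()

has_depth = hasattr(cfg, "depth")            # like isfield()
depth = getattr(cfg, "depth")                # like getfield()
width = getattr(cfg, "width", 80)            # attribute missing -> default

is_func = callable(len)                      # has a __call__
is_not_func = callable(42)                   # plain int does not

code = ord("A")                              # character -> code point
letter = chr(code + 1)                       # code point -> character

every_other = slice(0, None, 2)              # reusable form of [0::2]
picked = [10, 20, 30, 40, 50][every_other]

squares = (n ** 2 for n in range(1, 4))      # lazy generator; ** is power
first_square = next(squares)                 # computed only when asked for
exhausted_default = next(iter([]), "done")   # default instead of StopIteration
```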

Context Manager

The @contextlib.contextmanager decorator basically splits a set-try-yield-finally boilerplate function into 3 parts: __enter__ (everything above yield), BODY (where the yield goes to) and __exit__ (everything below yield), since a with-as statement is a rigidly defined try-finally block, roughly like this:

with EXPR as f:
  BODY(using f)

# __enter__: everything above the yield
f = EVAL(EXPR)
try:
  # f isn't handed to BODY till the yield
  yield f  # Goes to BODY
finally:
  # __exit__: everything below the yield
  cleanup(f)
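A runnable version of the same split, as a sketch. The managed() function and the events list are made up purely to record the order in which things run.

```python
from contextlib import contextmanager

events = []  # records the order of execution for this demo


@contextmanager
def managed(name):
    events.append(f"enter {name}")      # the __enter__ part
    try:
        yield name.upper()              # value bound by "as"; BODY runs here
    finally:
        events.append(f"exit {name}")   # the __exit__ part, runs even on error


with managed("res") as f:
    events.append(f"body {f}")

try:
    with managed("bad"):
        raise ValueError("boom")        # the finally block still runs
except ValueError:
    events.append("caught")
```

The second with-block shows why the try-finally matters: the exit step is recorded before the exception reaches the caller.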

… more to come as I come across noteworthy cases.

What Python does NOT do, and what are the alternatives

Posted on August 21, 2023 by admin

RAII: Using stack unwind to manage program flow

Python uses a garbage collector, so an onCleanup()-style trick (putting the cleanup code in __del__) might often work, but it’s not guaranteed to. So any code based on that should not be in production.

Answer: Context Manager, a glorified try-catch (more specifically try-finally) block with a rigid structure. It’s a pain in the butt and not fun to deal with if you want to deviate from the native ContextManager that came with the resource opener
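If you miss MATLAB’s onCleanup(), a small class with __enter__ and __exit__ gets you most of the way. This is only a sketch: OnCleanup here is a hypothetical helper I made up for illustration, not something from the standard library.

```python
class OnCleanup:
    """Hypothetical scope guard: runs `action` when the with-block exits."""

    def __init__(self, action):
        self.action = action

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.action()
        return False  # do not swallow exceptions


log = []


def work():
    with OnCleanup(lambda: log.append("cleaned up")):
        log.append("working")
        return "result"      # cleanup still runs on early return


value = work()
```

Unlike relying on the garbage collector, the cleanup here is tied to leaving the with-block, so the timing is deterministic.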

‘switch’-case is back as ‘match’-case (the advanced uses are different)

Python (3.10 and later) finally supports it: ‘switch’ in C is called ‘match‘ in Python, and there is a lot of handy and intuitive syntax just like in MATLAB! Hooray!

If you try to do anything fancy with mutables in the cases, be careful about the side effects!

Pass by Variable (Copy-on-Write) like MATLAB

Variables are by and large references in Python. Everything, including integers, is some sort of class instance (which is in turn a dictionary with special treatment for certain key names). The garbage collector waits until the last guy using that part of memory stops referencing it before cleaning it up.

Python even has one ‘None‘ for the entire universe, with a gazillion things going on pointing to the same memory address where None is stored (that’s why None is idiomatically checked with the ‘is‘ keyword, which checks the address for speed, instead of ‘==‘, which actually verifies the contents). If you look up the reference count (see garbage collector) for commonly used numbers like 1s and 0s, there are thousands of ‘users’ of them!

With C++, it’s a mixture. Complex objects are usually passed by reference for performance reasons but simple structs and data types can be passed as variables (C/C++’s nomenclature calls the non-reference/pointers variables though technically references/pointers are just the same integers identified as addresses) that gets cloned and destroyed when they move across function (stack) boundaries.

In MATLAB, they want it industrial strength, so they’d rather not allow anything insidious/non-transparent to happen in your code by keeping it all pass-by-variable, that is, everything is supposed to be treated as a different copy as it crosses function boundaries. For performance reasons, they figured that if you pass a big matrix just for the function to read, MATLAB doesn’t really have to clone it, so under the hood you peek into the same matrix that really belongs to the caller through the input variable name. Once your function changes the contents (they are pretending that it’s a separate copy, so of course you can), MATLAB painfully makes a whole copy of it (copy-on-write), and you then have to lug the 2nd (modified) copy around when it travels past function boundaries.

Python takes this idea a lot further by having things that are exactly the same (such as None, small integers or identical string literals) point to the same object. When you ask to change the contents, it makes a new object and points the variable name you are using at it.

Answer: The way Python prevents variables passed as function arguments from getting modified is to separate types into mutable (lists, dicts, sets) and immutable (tuples, frozenset, numerics, strings; frozendict is only a package right now). Nothing is actually copied at the function boundary, but an immutable argument behaves as if the function had its own local copy: the function can only rebind its local name to a new object, never change the caller’s.
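The difference between mutating and rebinding is easiest to see side by side. A minimal sketch with made-up function names:

```python
def append_one(items):
    items.append(1)        # mutates the caller's list in place


def rebind(items):
    items = items + [1]    # builds a new list; only the local name changes


def bump(n):
    n += 1                 # ints are immutable: this rebinds the local name
    return n


shared = [0]
append_one(shared)         # the caller sees the change
rebind(shared)             # the caller sees nothing

count = 10
bumped = bump(count)       # count itself is untouched

a = None
b = None
same_none = a is b         # there is only one None object
```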

Classes Boundaries

MATLAB and C++ have stringent access control, but Python doesn’t. There’s not even const correctness. Just signal your intention with naming schemes like all caps and __ prefixes.

C++ does not separate helper functions and class (non-instance) methods. What C++ called static members are really just glorified free functions and global variables tucked under the namespace that happened to be in the class to be accessed by SRO (Scope Resolution Operator ::). Any function, regardless of their association, can access classwise/static members by SRO anywhere.

Python does separate these two concepts though. Class methods in Python (decorated by @classmethod) are the equivalent of C++’s static methods: they are not allowed to touch anything instance-specific, but they can access anything class-specific. Helper functions, which are called ‘Static Methods‘ (decorated by @staticmethod) in Python, are not even handed anything specific to the class.
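Here is a sketch of the three kinds of methods in one class. Counter and its members are invented for the demo.

```python
class Counter:
    created = 0                      # class-level state, shared by instances

    def __init__(self):
        Counter.created += 1
        self.value = 0               # instance-level state

    def increment(self):             # instance method: receives self
        self.value += 1
        return self.value

    @classmethod
    def how_many(cls):               # receives the class, not an instance
        return cls.created

    @staticmethod
    def clamp(x, lo, hi):            # receives neither: a namespaced helper
        return max(lo, min(x, hi))


a, b = Counter(), Counter()
a.increment()
total = Counter.how_many()           # class-wide count
clamped = Counter.clamp(15, 0, 10)   # pure helper, no class or instance needed
```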

Variable arguments

C++ doesn’t generally do variable arguments because it defeats signature-based overloading, which uses the function signature (the list of your argument types) to figure out which function to dispatch.

MATLAB uses cells to pack variable arguments. The common idioms are varargin{:} and [varargout{1:nargout}]. To accommodate variable arguments, MATLAB has to give up method overloading, but there is still a little bit of it left: it does dispatch based on the first argument’s type, and that’s very useful in avoiding a lot of stupid switching by detecting data types: just use a consistent function name interface and have each data type implement its own method with the same name.

In Python, there’s no such thing as multiple outputs (return variables): you return one tuple (or list), and it gets unpacked (just like MATLAB’s deal() function) when you spell out several names on the left hand side. If the left hand side is a single name, it gets the whole tuple, still packed. If you write out the elements (which makes the left hand side a list of targets), the returned elements are assigned to them according to your syntax.
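The three left-hand-side forms look like this in practice. min_max_mean() is a made-up function for the sketch.

```python
def min_max_mean(values):
    # "multiple outputs" are really one tuple
    return min(values), max(values), sum(values) / len(values)


packed = min_max_mean([2, 4, 9])          # single name: gets the whole tuple
lo, hi, mean = min_max_mean([2, 4, 9])    # names spelled out: unpacked
first, *others = min_max_mean([2, 4, 9])  # starred name collects the rest
```

Unlike nargout in MATLAB, the function has no idea which form the caller used; the unpacking happens entirely on the caller’s side.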

This is often a point of agony when deciding on an output format as I develop MATLAB code. Apparently TMW wondered the same thing, because their own factory code is all over the place on this too. Most of the time it’s not a good idea to have context-dependent output (depending on how many outputs the user asked for), even if you can technically do that by detecting nargout in MATLAB, like [varargout{1:nargout}] = f(...).

Answer: My recommendation is to make sure the simple, most common case got priority, and stuff the juicy side info into packaged data structures (such as array or cells/lists) and stick to a fixed output format whether you are in Python or MATLAB

Late Binding in Lambda / Anonymous Functions: Capture it!

This often throws people off in Python. MATLAB uses early binding, which means when you create an anonymous function (aka lambda), the free variables (the parameters that are not running variables, i.e. not the input arguments of the lambda/anonymous function) capture a snapshot of the local workspace at the moment the lambda/anonymous function was created!

Python, on the other hand, uses the same approach as C++: late binding. This means you have to explicitly capture (make a snapshot copy of) the free variables if you want them tied to the values at the time the anonymous function (lambda) was created, instead of waiting until the lambda is actually called to look up what values to use.

P = 612
# P was not captured, thus late binding
f = lambda x     : print(f'input/running: {x}, param/free: {P}')
# P was captured as p, thus early binding
g = lambda x, p=P: print(f'input/running: {x}, param/free: {p}')
P = 721
f(8964) # shows "input/running: 8964, param/free: 721"
g(8964) # shows "input/running: 8964, param/free: 612"

Useful Python (party?) tricks

Posted on August 21, 2023 by admin

This article is not for Pythonisms (like ContextManager), i.e. the way things should normally be done in Python, but for the more non-obvious ways to solve problems, or new features that are available specifically in Python.

Use / and * argument as separator for different forms of parameter entries!

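There is special syntax to separate positional-only, positional/keyword, and keyword-only parameters: parameters before / are positional-only, parameters after * are keyword-only, and anything in between can be passed either way.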
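A sketch of the separators at work. The resize() signature is hypothetical, chosen only to have one parameter of each kind.

```python
def resize(image, /, width, height=None, *, keep_aspect=True):
    """Hypothetical signature, only to show the separators."""
    return (image, width, height, keep_aspect)


ok1 = resize("img", 100, 50, keep_aspect=False)
ok2 = resize("img", width=100)       # width can go either way


def raises_type_error(call):
    """Helper for the demo: report whether the call is rejected."""
    try:
        call()
    except TypeError:
        return True
    return False


# image is positional-only, so naming it is an error
positional_only_enforced = raises_type_error(lambda: resize(image="img", width=100))
# keep_aspect is keyword-only, so passing it by position is an error
keyword_only_enforced = raises_type_error(lambda: resize("img", 100, 50, False))
```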

Make a variant of existing class/object

This ninja technique is useful when you want to keep the object mostly the way it is but add/override a few things you think it didn’t do right, without inheriting or using composition (hiding a copy of the object as a member) and writing a proxy mirror for every member of it.

In C++, this situation often comes up when you want to modify a concrete class that doesn’t have a virtual destructor, most notoriously the STL containers, which you are not supposed to inherit from (or else the client might delete through a pointer to the parent/base, and the child object’s destructor is not called as there is no vtable to keep track of which method to dispatch). In C++, this is one of the few use cases that call for private inheritance.

In Python, because everything is a recursive dict in which specially named (magic) functions are recognized, there is a __getattr__ method that gets called whenever a member that the object doesn’t define itself is accessed (__getattribute__ is the one called on every access). That includes calling a member: in Python you simply get the functor as an attribute, aka the value in the key-value pairs in the dictionary, and add brackets to call it. This means you can re-route what attributes (members) are returned simply by overloading __getattr__!

If you can overload __getattr__, it also means you can redefine the member interface of your entire class! So a strategy to make a class have the exact same interface as another class is to hide a copy/reference to the underlying object and re-route __getattr__ to the underlying object’s attributes with getattr()! Here’s the gist of it:

class M:
    def __init__(self, underlying_class):
        self.__obj = underlying_class
    def __getattr__(self, attr):
        return getattr(self.__obj, attr)

This can be improved a little bit. Just pick a member name that won’t clash with the underlying object’s attributes/members and simply return the ‘hidden’ object itself when that name is specifically requested, rather than looking up self.__obj.__obj.

class M:
    def __init__(self, underlying_class):
        self.__obj = underlying_class
    def __getattr__(self, attr):
        # Prevent self-calling
        if attr != '__obj':
            return getattr(self.__obj, attr)
        else:
            return self.__obj
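To show the payoff, here is a sketch of a variant that overrides one member and forwards the rest. LoggedList is a hypothetical class made up for this demo, and it stores the hidden object under a plain _obj name to keep the example short.

```python
class LoggedList:
    """Hypothetical variant of list that records every append."""

    def __init__(self, underlying):
        self._obj = underlying
        self.log = []

    def append(self, item):              # overridden member
        self.log.append(item)
        self._obj.append(item)

    def __getattr__(self, attr):         # everything else passes through
        return getattr(self._obj, attr)


items = LoggedList([3, 1])
items.append(2)              # goes through the override
items.sort()                 # not defined here, forwarded to the real list
position = items.index(2)    # also forwarded
```

One caveat: implicit uses of magic methods, such as len(items) or items[0], bypass __getattr__, so those still have to be written out on the wrapper if you need them.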

MATLAB’s classes are hard-wired to your class definition .m file, not something you can update on the fly as you please. Any attempts to do so (aka breaking the safeguards) are Undocumented MATLAB territory where you mess with the Java under the hood and change the metadata property to fool the objects to do what you want.


Rambling Nerd with a Plan

Hoi Wong's blog
