Dictionary of equivalent/analogous concepts in programming languages

Posted on February 19, 2025 by admin

Common	C	C++	MATLAB	Python
Variable arguments	`<stdarg.h>` `T f(...)` Packed in a `va_list`, read with `va_arg`	Very BAD! Cannot overload when signatures are uncertain.	`varargin` `varargout` Both packed as cells. MATLAB does not have named arguments	`*args` (simple, stored as a tuple) `**kwargs` (specify input by keyword, stored as a dictionary)
Referencing	N/A	`operator[]`	`(_)` is for references `subsref` `subsasgn` `[_]` is for concat `{_}` is for (un)pack	`__getitem__() __setitem__()`
Default values	N/A	Supported	Not supported. Manage with `inputParser()` or newer `arguments`	Non-intuitive static data behavior. Stick to `None` or immutables.
Name-Value Argument Matching			Old way: `.., 'PropName', Value` and parse `varargin` Since R2021a: `Name=Value` `options` in `arguments`	`Name=Value` `**kwargs`
Major Dimension	Row	Row	Column	Row (Native/Numpy) Column for Pandas
Constness	`const`	`const`	Only in classes	N/A (Consenting adults)
Variable Aliasing	Pointers	References	NO! Rely on Copy-on-write (No in-place functions*) Handle classes under limited circumstances	References
`=` assignment	Copy one element	Values: Copy References: Bind	New Copy Copy-on-write	NO VALUES Bind references only (could be to unnamed objects)
Chained access operators	N/A	Difficult to operator overload it right	Difficult to get it right. MATLAB had some chaining bugs with `dataset()` as well.	Chains correctly natively
Assignment expressions (assignment evaluates to assigned lvalue)	`=`	`=`	N/A	Named Expression `:=`
Version Management			`verLessThan()` `isMATLABReleaseOlderThan`	`venv` / `virtualenv` (Virtual Environment)
Exponentiation	`<math.h>` `pow()`	`<cmath>` `pow()`	`^`	`**`
Stream (Conveyor belt mechanism. Saves memory)	I/O (std, file, sockets)	`iterator` in STL containers	MATLAB doesn’t do references. Just increment indices.	iterators (uni-directional only) `iter(): __iter__()` `next(): __next__()`
Looping	for(init; cont_cond; next)	C-style for(auto running: iterable)	for k = array to iterate	list-comp for (index, thing) in enumerate(lists)

Since MATLAB doesn’t do references, iterators (by extension generators) and functions that do in-place operations do not make sense (unless you bend it very hard with anti-patterns such as handles and dbstack).
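The "non-intuitive static data behavior" noted for Python default values in the table above can be shown with a minimal sketch. The function names `append_bad` and `append_good` are made up for this illustration.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when `def` runs, and is shared by every call.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # `None` is an immutable sentinel; a fresh list is made on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_bad = list(append_bad(1))    # snapshot of the shared list after call 1
second_bad = list(append_bad(2))   # the item from call 1 is still in there
first_good = append_good(1)
second_good = append_good(2)
```

With the mutable default, the second call still sees the item from the first call; with the `None` sentinel, each call starts from an empty list.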

Data Types

Common	C	C++	MATLAB	Python
Sets	N/A	`std::set`	Only set operations, not set data type	`{ , , ...}`
Dictionaries		`std::unordered_map`	– Dynamic fieldnames (qualified varnames as keys) – `containers.Map()` or `dictionary()` since R2022b	Dictionaries `{key:value}` (Native)
Heterogeneous containers			cells `{}`	lists (mutable) tuples (immutable)
Structured Heterogeneous containers			`table()` `dataset()` [Old] Mix in classes	Pandas Dataframe
Array, Matrices & Tensors			Native `[ , ; , ]`	Numpy/PyTorch
Records	struct	class (members)	dynamic field (structs) properties (class) `getfield()/setfield()`	No structs (use dicts) attribute (class) `getattr()/setattr()`
Type deduction	N/A	`auto`	Native	Native
Type extraction	N/A	`decltype()` for compile time (static) `typeid()` for RTTI (runtime)	`class()`	`type()`
Categorical Arrays			`categorical()` Previously `ordinal()/nominal()`	`pd.cut(x, bins, labels)`

Native set operations in Python are not stable (they do not preserve the original order), and there’s no option to use a stable algorithm like MATLAB has. Consider installing the orderly-set package.
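If only a stable de-duplication is needed (roughly what `unique(..., 'stable')` does in MATLAB), the standard library may be enough. This is a sketch with made-up sample data, not a replacement for a full ordered-set type.

```python
fruits = ["pear", "apple", "pear", "fig", "apple"]

# dict keys keep insertion order (guaranteed since Python 3.7),
# so this de-duplicates while keeping the first-seen order.
stable_unique = list(dict.fromkeys(fruits))

# A plain set de-duplicates too, but its iteration order is arbitrary.
unordered_unique = set(fruits)
```

`stable_unique` keeps `pear`, `apple`, `fig` in that order, while nothing should be assumed about the order in which `unordered_unique` iterates.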

Array Operations

Common	MATLAB	Python
Repeat	`repmat()`	`[] * N` `np.tile()` (`np.repeat()` repeats element-wise, like `repelem()`)
Logical Indexing	Native	List comprehension Boolean Indexing (Numpy)
Equally spaced numbers	Internally `colon()`: `start:step:end` `linspace`/`logspace`	`range(begin, past_end, step)` produces a lazy `range` object `list(range())` or `tuple(range())` iterates to realize the vector
Equally spaced indexing	MATLAB has no generators, so produced vector only	`[start:past_end:step]` is internally `slice()` which produces a slice object, not range/lists/tuple. Faster but not iterable
Shallow copy	Deep copy-on-write	Slice: `x = y[:]` `copy.copy()`
Deep copy	Deep copy-on-write	`copy.deepcopy()`
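The Python side of the shallow/deep copy rows can be sketched with a nested list. The variable names are only for illustration.

```python
import copy

nested = [[1, 2], [3, 4]]

alias = nested                 # `=` only binds another name to the same object
shallow = nested[:]            # new outer list, inner lists are still shared
deep = copy.deepcopy(nested)   # everything is duplicated recursively

nested[0].append(99)           # mutate an inner list in place
nested.append([5, 6])          # mutate the outer list in place
```

After the two mutations, `alias` sees both changes, `shallow` sees only the change to the shared inner list, and `deep` sees neither.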

Editor Syntax

Common	C	C++	MATLAB	Python
Commenting	`/* ... */` `//` (only for newer C)	`//` (single line) `/* ... */` (block)	`%` (single line) (Block): `%{ ... %}`	`#` (single line) `"""` or `'''` is docstring which might be undesirably picked up
Reliable multi-line commenting (IDE)			Ctrl+(Shift)+`R`(Windows), `/` (Mac or Linux)	[Spyder]: Ctrl+`1`(toggle), `4`(comment), `5`(uncomment)
Code cell (IDE)			`%%`	[Spyder]: `# %%`
Line Continuation	`\`	`\`	`...`	`\`
Console Precision			`format`	`%precision` (IPython)
Clear variables			`clear` / `clearvars`	`%reset -sf` (IPython)

Macros only make sense in C/C++. This makes code less transparent and is frowned upon in higher level programming languages. Even its use in C++ should be limited. Use inline functions whenever possible.

Python is messy about the workspace, so if you just delete

Object Oriented Programming Constructs

Common	C++	MATLAB	Python
Getters Setters	No native syntax. Name mangle (prefix or suffix) yourself to manage	Define methods: `get.x` `set.x`	Getter: `@property def x(self): ...` Setter: `@x.setter def x(self, value): ...`
Deleters	Members can’t be changed on the fly	Members can’t be changed on the fly	Deleter (removing attributes dynamically by `del`)
Overloading (Dispatch function by signature)	Overloading	Overload only by first argument	`@overload` (Static type) `@singledispatch` `@multipledispatch`
Initializing class variables	Initializer Lists Constructor	Constructor	Constructor
Constructor	`ClassName()` Does not return (`*this` is implicit)	`obj=ClassName(...)` MUST output the constructed object	`__init__(self, ...)` Object to be constructed is 1st argument
Destructor	`~ClassName()`	`delete()`	`__del__()`
Special methods	Special member functions	(no name) method that control specific behaviors	Magic/Dunder methods
Operator overloading	`operator`	operator methods to define	Dunder methods
Resource Self-cleanup	RAII	`onCleanup()`: make a dummy object with cleanup operation as destructor to be removed when it goes out of scope	`with` Context Managers
Naming for the object itself	Class: (class’s own name by SRO `::`) Instance: `*this`	Class: (class’s own name) Instance: `obj` (or any output name defined in constructor)	Class: `cls` Instance: `self` (Recommended PEP8 names)

Python allows adding members (attributes) on the fly with setattr(), which includes methods. MATLAB’s dynamicprops allows adding properties (data members) on the fly with addprop.

The onCleanup() idiom (cleanup in a destructor) does not carry over reliably to Python: MATLAB’s object destruction time is deterministic (MATLAB specifically does not garbage collect user objects to avoid this mess; it only garbage collects PODs), while Python leaves it up to the garbage collector.
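A minimal sketch of the `with` context manager route in Python follows. The helper `on_cleanup` and the log list are hypothetical names chosen to echo MATLAB's `onCleanup()`; they are not a real library API.

```python
from contextlib import contextmanager

@contextmanager
def on_cleanup(log, label):
    # Hypothetical helper: the `finally` part runs when the `with` block exits,
    # even if the body raises, so the timing does not depend on garbage collection.
    try:
        yield
    finally:
        log.append(f"cleanup {label}")

def run_demo():
    log = []
    try:
        with on_cleanup(log, "temp file"):
            log.append("working")
            raise RuntimeError("something went wrong")
    except RuntimeError:
        log.append("error handled")
    return log

events = run_demo()
```

The cleanup entry is recorded before the exception reaches the `except` clause, which is the deterministic ordering that relying on `__del__()` would not promise.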

*this is implicitly passed in C++ and not spelled out in the method declaration. The self object must be the first argument in the instance method’s signature/prototype for both MATLAB and Python.

Functional Programming Constructs

Common	C++	MATLAB	Python


Function as variable	Functors (Function Objects) `operator()`	Function Handle	Callables (Function Objects) `__call__()`
Lambda Syntax	Lambda `[capture](inputs) -> optional trailing return type {expr}`	Anonymous Function `@(inputs) expr`	Lambda `lambda inputs: expr`
Closure (Early binding): an instance of function objects	Capture `[]` only as necessary. Early binding `[=]` is capture all.	Early binding ONLY for anonymous functions (lambda). Late binding for function handles to loose or nested functions.	Late binding* by default, even for Lambdas. Can capture `Po` through default values `lambda x,P=Po: x+P` (We’re relying on users to not enter the captured/optional input argument)

Concepts of Early/Late Binding also apply to non-lambda functions. It’s about when to access (usually read) the ‘global’ or broader scope (such as during nested functions) variables that gets recruited as a non-input variable that’s local to the function itself.

An instance of a function object is not a closure if there’s any parameter that’s late bound. All lambdas (anonymous functions) in MATLAB are early bound (at creation).

The more proper way (without creating an extra optional argument that’s not supposed to be used, aka defaults overridden) to convert late binding to early binding (by capturing variables) is called partial application, where you freeze the parameters (to be captured) by making them inputs to an outer layer function and return a function object (could be lambda) that uses these parameters.
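The difference between late binding, the default-value trick and partial application can be sketched in Python as below. The names `demo` and `make_adder` are invented for this example, and `functools.partial` is shown as one ready-made way to freeze a parameter.

```python
from functools import partial

def make_adder(p):
    # Partial application through an outer function: `p` is frozen in the closure.
    return lambda x: x + p

def demo():
    P = 10
    late = lambda x: x + P              # P is looked up when the lambda is called
    by_default = lambda x, P=P: x + P   # P frozen as a default (caller could override it)
    by_closure = make_adder(P)          # P frozen as the outer function's input
    by_partial = partial(lambda p, x: x + p, P)  # functools does the same freezing

    P = 1000  # rebind the captured variable after the functions were created

    return late(1), by_default(1), by_closure(1), by_partial(1)

results = demo()
```

Only `late` picks up the rebinding of `P`; the other three still use the value 10 that was current when they were created.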

The same trick (partial application) can bind (capture) variables in function handles to simple/nested functions in MATLAB, so that they behave the same way (early binding) as anonymous functions (lambda).

Currying is partial application one parameter at a time, which is a tedious way to stay faithful to pure functional programming.

List comprehension is a shorthand syntax for transform/map() and copy_if/remove_if/filter() in one shot, but not accumulate/reduce(). MATLAB and C/C++ do not have listcomp, but listcomp is not specific to Python. Even Powershell has it.

Listcomp syntax, if wrapped in round brackets like (x**x for x in range(5)), gives a generator. Wrapping in square bracket is the shortcut of casting the generator into a list, so [x**x for x in range(5)] is the same as list(x**x for x in range(5)).
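A short sketch of these equivalences, using only built-ins and `functools.reduce` (the variable names are illustrative):

```python
from functools import reduce

# filter + map in one shot
squares_of_even = [x * x for x in range(6) if x % 2 == 0]
same_thing = list(map(lambda x: x * x, filter(lambda x: x % 2 == 0, range(6))))

lazy = (x**x for x in range(5))   # round brackets: a generator, nothing computed yet
first = next(lazy)                # 0**0 evaluates to 1 in Python
rest = list(lazy)                 # drains what is left: 1**1, 2**2, 3**3, 4**4

# accumulate is not covered by listcomp, so it needs reduce() (or sum() here)
total = reduce(lambda acc, x: acc + x, squares_of_even, 0)
```

Note that the generator is consumed as it is read: once `first` has been taken, `rest` only holds the remaining four values.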

Coroutines / Asynchronous Programming

MATLAB natively does not support coroutines.

Common	C++20	Python
Generators	Input Iterators	Functions that `yield value_to_spit_out_on_next` (Implicitly return a generator/functor with `iter` and `next`)
Coroutines		Functions that `value_accepted_from_outside = yield` Send value to the continuation by `g.send(user_input)` `async`/`await` (native coroutines)
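The two Python rows above can be sketched as follows. `countdown` and `running_total` are made-up examples; the second one is a generator-based coroutine, not the newer `async`/`await` kind.

```python
def countdown(n):
    # Generator: each `yield` hands one value to the caller and pauses here.
    while n > 0:
        yield n
        n -= 1

def running_total():
    # Generator-based coroutine: `yield` also receives what the caller sends in.
    total = 0
    while True:
        received = yield total
        total += received

counted = list(countdown(3))

acc = running_total()
primed = next(acc)       # run up to the first `yield`, which hands back the initial total
after_5 = acc.send(5)    # resumes the function with received = 5
after_7 = acc.send(7)
```

The initial `next()` call is needed because a value can only be sent into a generator that is already paused at a `yield`.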

Matrix Arrays

The way Numpy requires users to specify matrices with a bracket for every row drives me nuts. Not only is there a lot of typing, the superfluous brackets reinforce C’s idea of row-major, which is horrendous to people with a proper math background who see matrices as column-major $\mathbf{A}_{r,c}$. Pytorch is the same.

Once you are trained in APL/MATLAB’s matrix world-view, you’ll discover going back to the world where matrices aren’t first class citizens is clumsy AF.

With Python, you lose the clutter free readability where your MATLAB code is one step away from the matrix equations in your scientific computing work, despite a lot of the features that addresses frequent use patterns are implemented earlier in Python than MATLAB.

Don’t believe those who haven’t lived and breathed MATLAB tell you Python is strictly superior. No it isn’t. They just didn’t know what they were missing as they haven’t made the intellectual leap in MATLAB yet. Python is very convenient as a swiss-army knife but scientific computing is an afterthought in Python’s language design.

The only way to use MATLAB-like semi-colon to change rows only works for np.matrix() type, which they plan to deprecate. For now one can cast matrix into array like np.array(np.matrix(matrix_string)).

Even numpy’s ndarray (or matrix, to be deprecated) is CONCEPTUALLY equivalent to a matrix of cells in MATLAB. There are no native numerical matrices like MATLAB’s that avoid the overhead of unpacking arbitrary data types. You don’t want to do numerical matrices in MATLAB with cell matrices as it’s insanely slow.

You get away without the unpacking penalty in Numpy if all the contents of the ndarray happens to have the same dtype (such as numerical), aka known to be uniform. In other words, MATLAB’s matrices are uniform if it’s formed by [] and heterogeneous if formed by {}, while for Python [] is context-dependent, kept track of by dtype.

Concept	MATLAB	Numpy
Construction	`[8,9;6,4]`	`np.array([[8,9],[6,4]])`
Size by dimension	`size()`	`A.shape`
Concatenate within existing dimensions	`[A;B]` or `vertcat()` `[A,B]` or `horzcat()` `cat(dim, A, B, ...)`	`np.vstack()` `np.hstack()` `np.concatenate(list, dim)`
Concatenate expanding to 3D (expand in last dimension)	`cat(3, A, B, ...)`	`np.dstack()` ‘d’ for depth (3rd dimension)
Concatenate expanding dimensions	`cat(newdim, A, B, ...)` then `permute()`	`np.stack([A, ..], expand_at_axis)` `np.array([A, ..])` expands at first dimension as outermost bracket refers to first dimension
Tiling	`repmat()`	`np.tile()`
Fill with same value	`repmat()`	`np.full()`
Fill with ones/zeros	`ones(), zeros()`	`np.ones(), np.zeros()`
Fill mimicking another array’s size	`repmat(x, size(B))` `ones(size(B))` `zeros(size(B))`	`np.full_like(B, x)` `np.ones_like(B)` `np.zeros_like(B)`
Preallocate	Any of the above (Must be initialized)	`np.empty()` `np.empty_like()` UNINITIALIZED

repelem() takes its repetition counts as variable input arguments, one per dimension, like repmat(A, r1, ..., rN) does, but it repeats each element in place instead of tiling the whole array. Using a ones vector to broadcast a singleton instead of repmat() is horrendously inefficient and non-intuitive.

Heterogeneous Data Structures

Heterogeneous Data Structures are typically column major as it is a concept that derives from Structs of Arrays (SoA) and people typically expect columns to have the same data type from spreadsheets.

While Pandas offers a lot of useful features that I’ve easily implemented with wrappers in MATLAB, the indexing syntax of Pandas/Python is awkward and confusing. It’s due to the nature that matrix is a first-class citizen in MATLAB while it’s an afterthought in Python.

Python does not have the { } cell pack/unpack operator in MATLAB, so in Pandas, you select the Series object (think of it as a supercharged list with conveniences such as handling missing values and keeping track of row/column labels) then call its .values attribute.

However, Pandas is a lot more advanced than MATLAB in terms of using multiple columns as keys and has more tools to exploit multi-key row names (row names are not mandatory in MATLAB but mandatory in Pandas). In the old days I had to write my own MATLAB function with unique(.., 'rows') and exploit its index output to build unique keys under the hood.

Concept	MATLAB	Python (Pandas Dataframe)

Rows	Observations (`dataset()`) Row (`table()`)	Rows index
Columns	Variables	Columns
Select rows/columns	`T(rows, cols)`	`T.loc[r, col_name]` `T.iloc[r,c]` Caveats: – a single index (not wrapped in a list) has its content extracted – `iloc` on LHS cannot expand the table but `loc` can, though it can only inject 1 row – can get the index number of a name by `T.columns.get_loc()` (or `T.index.get_loc()`) to use with `T.iloc[]`
Remove rows/columns	`T(rows, cols) = []`	`T.drop(index=rows, columns=cols)` Optionally: `inplace=True` `del T[rows, cols]` does NOT work
Extract one column	`T{:, c}`	`T[c].values`
Extract one entry	`T{r, c}`	`T.at[r,col_name]` `T.iat[r,c]` Faster than `loc/iloc`
Show first few rows	`T(1:5, :)`	`T.head()`
Drop duplicate rows	`unique(T, 'stable')`	`T.drop_duplicates()`


Ordinal	`categorical()` `ordinal()`	`Categorical()` `Index()`
Getting column names/labels	`T.Properties.VariableNames` (returns `cellstr()` only)	`T.columns` (returns `Index()` or `RangeIndex()`)
Getting row names/labels	`T.Properties.RowNames`	`T.index`
Transpose table	`rows2vars()`	`T.transpose()`

Move columns by name	`movevars()` since R2018a
Rename columns	`renamevars()` since R2020a	`T.rename(columns={source:target})`
Rename rows	Modify `T.Properties.RowNames`	`T.rename(index={source:target})`
Use column as row indices	`T.Properties.RowNames` = `T.cellstr_variablename` If multiple columns are needed, need to combine them into one column using some user rules	`T.set_index(column_to_use)` Dataframe allows multiple columns as row index keys
Reorder or partial selection	`T(rows, cols)`	`T.reindex(columns=..., index=...)` New labels will autofill by `NaN`
Select columns	`T(:, cols)`	`T[list_of_cols]`
Pick column by data type	`T(:, varfun(...))`	`T.select_dtypes(include=[list of type names])`
Pick column by string match	`T(:, varfun(...))`	`T.filter(like=str_to_match)`
Blindly concatenate columns of 2 tables	`[T1, T2]` If you defined optional rownames, they must match. You can delete them with `T.Properties.RowNames = {}`	Pandas assigns row indices (labels) by default. Mismatched row labels do not combine in the same row. Consider `reset_index()` or overwrite the row indices of one table with another, like `pd.concat([T1, T2.set_index(T1.index)], axis=1)`
Blindly concatenate rows of 2 tables	`[T1; T2]`	`pd.concat([T1, T2], ignore_index=True)`

Format export	`writetable()`	`.to_*()`

MATLAB tables do not support ranging through column names (such as 'apple':'grapes'), yet Pandas DataFrames support it. I think it’s fine to use it in the interpreter to poke around, but in code this is just asking for confusing logic bugs when the columns are moved around and the programmer has a false sense of security of knowing exactly what’s where because they are using only names.

Dataframe is a little smarter than MATLAB’s table() in terms of managing column names and indices as it’s tracked with Index() type which is the same idea as MATLAB’s ordinal() ordered categorical type, where unique names are mapped to unique indices and it’s the indices under the hood. This is how 'apple':'grapes' can work in Python but not MATLAB.

MATLAB T.Properties.VariableNames is a little clumsy. I usually implement a consistent interface called varnames() that’d output the same cellstr() headings whether it’s struct, dataset or table objects.

MATLAB’s table() by default does not make up row names. Pandas makes up row names sequentially by default.

MATLAB table() does require qualified strings as variable names. Dataframe doesn’t care what labels you use as long as Index() takes it. It can get confusing because you can have a number 1 and ‘1’ as column headers at the same time and they look the same when displayed in the console.


Programming Techniques: Bit hackery

Posted on August 29, 2023 by admin

Sean Eron Anderson (Stanford CS graphics lab)’s bit twiddling page shows a bunch of neat bit tricks, but it’s more like a recipe book than a unified summary of the common concepts behind them. Here’s my attempt. This page will get updated as I find the time and collect more useful insights.

Concept: Two ways to get two’s complement

This is the basics of most bit hacks below. Sometimes the definition itself is a bit trick on its own.

\begin{align} \overline{-x}+1=x \\ -x = \overline{x}+1 \end{align}

the above reads: to flip sign, flip bits then add one.

\begin{align} \overline{x} = -x-1 \\ \end{align}

the above reads: if you flip the bits, you are getting the negative of it subtracted by 1. e.g. ~4 = -5, ~5 = -6, …, ~(-5) = 4, ~(-4) = 3, …

\begin{align} x = \overline{-x-1} \\ \end{align}

the above reads: any number can be represented by its negative minus 1, then bit-flipped.

\begin{align} \overline{-x} = x-1 \\ -x = \overline{x-1} \end{align}

reads: to flip sign, subtract one first then flip bits

So it means to change signs, you can choose to subtract one first then flip bits or flip bits first then add one.

-x=\overline{x-1} = \overline{x}+1

Let’s try it with 2 instead of 1:

\overline{x-2} = \overline{(x-1)-1} = \overline{x-1}+1 = (\overline{x}+1)+1 = \overline{x}+2

You can generalize it to an arbitrary number: each additional 1 subtracted under the bar on the left-hand side gives +1 more on the right-hand side. Every extra -1 under the bar (bit flips) shows up as +1 outside the bar (bit flips).

\overline{x-3} = \overline{(x-2)-1} = \overline{x-2}+1 = (\overline{x}+2)+1 = \overline{x}+3
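These identities can be spot-checked in Python, whose `~` operator treats integers as if they were infinitely wide two's complement numbers. The function name `check_identities` and the tested ranges are arbitrary choices for this sketch.

```python
def check_identities(x, n):
    # Python's ~ acts on integers as if they were infinitely wide two's complement.
    assert -x == ~x + 1          # flip bits, then add one
    assert -x == ~(x - 1)        # subtract one, then flip bits
    assert ~x == -x - 1          # flipping bits gives the negative minus one
    assert ~(x - n) == ~x + n    # each -1 under the bar becomes +1 outside
    return True

all_hold = all(check_identities(x, n) for x in range(-8, 9) for n in range(0, 5))
examples = (~4, ~5, ~(-5), ~(-4))
```

This is a brute-force check over a small range, not a proof, but it reproduces the worked examples such as `~4 = -5` and `~(-5) = 4`.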

This matches the observation that complement schemes (one’s or two’s) both have increasing magnitude move in opposite directions for positive and negative numbers. Look at this table:

Unsigned	Binary	Two’s Complement
7	111	-1
6	110	-2
5	101	-3
4	100	-4
3	011	3
2	010	2
1	001	1
0	000	0

A very important observation that’ll be used over and over below is that in two’s complement, -1 is always a mask of all binary ‘1’s regardless of the word width.

This rule can also read as: magnitude offsets goes in opposite directions

\begin{align} -x+(n-1)=\overline{x-n} = \overline{x}+n \end{align}

Note that this is NOT distributing the bit-flip over the addition/subtraction, even though it resembles it. The important distinction is that the sign of n changes without n turning into (n-1). If the bit-flip did distribute, you would get (n-2) on the left-hand side instead of the (n-1) term, because the -1 would have been counted twice under distribution.

Bit flips simply don’t distribute over the 4 basic (algebraic field) operations. The two’s complement offset is done once and only once when you change the overall representation, no matter how many components you break it down into. It’s merely done to shift over the -0 in one’s complement so there’s room for an extra negative number $-2^n$ whose positive counterpart $+2^n$ is not representable without starting a new digit.

Note to self: the INT_MIN is just the sign bit of ‘1’ followed by all zeros after.

Concept: XOR can be used for bit flips or check for bit changes

Concept: Top bit holds the sign

Sounds simple, but if you keep in mind that (x<0) is really asking whether the top bit is 1, you can check if two numbers have opposite signs without bit shifting by simply XOR-ing them (anything below the top bit is ignored) and using (x^y)<0 to check whether the resulting top bit is 1, which signals that the sign bits are different.
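A minimal Python sketch of this check follows. Python integers have no fixed width, but they behave like sign-extended two's complement, so the same expression works; `opposite_signs` is a name made up here.

```python
def opposite_signs(x, y):
    # XOR leaves the top (sign) bit set only when the two sign bits differ,
    # and a set sign bit is exactly what `< 0` tests for.
    return (x ^ y) < 0
```

Zero counts as non-negative here, so `opposite_signs(0, -1)` is true while `opposite_signs(0, 1)` is false.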

Concept: Sign extensions (the top/sign bit gets drag-copied when right shifted)

When you right shift (in signed integers), the top (sign) bit gets drag-copied (sign extended) by the number of bits you right shifted. (Obviously for unsigned integers, right shifts are zero-filled)

Can exploit this to

drag the top (sign) bit all the way down to the bottom (so you either get all 1s or 0s) to provide a conditional mask based on the sign (see below)

1??????? \gg 7 \textnormal{ (i.e. type bit width - 1)} = 11111111_2 = -1_{10} \\ 0??????? \gg 7 \textnormal{ (i.e. type bit width - 1)} = 00000000_2 = +0_{10} \\

Signed extensions also means a negative number will stay negative and a positive number will stay positive if you right shift

Sign extension behavior is not guaranteed by 1989 ANSI C (right-shifting a negative signed value is implementation-defined), but it’s standard on pretty much anything more modern than that. Just make sure anything that uses this behavior is inlined (so the implementation can be easily swapped out), well documented/commented, platform checks/switches are in place, and there’s a way to quickly check against the slower but platform independent implementation.
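Python's `>>` on integers always sign-extends, so the drag-the-sign-bit-down trick can be sketched without the portability caveat. The 8-bit width and the helper names below are assumptions made for the illustration.

```python
def sign_mask(x, bits=8):
    # Arithmetic right shift by (width - 1) drags the sign bit all the way down:
    # the result is -1 (all ones) for negative x and 0 otherwise.
    return x >> (bits - 1)

def as_unsigned_bits(x, bits=8):
    # Show the low `bits` bits the way a register of that width would hold them.
    return format(x & ((1 << bits) - 1), f"0{bits}b")
```

For values that fit in 8 signed bits (-128 to 127), `sign_mask` returns -1 or 0, and `as_unsigned_bits` renders those as eight 1s or eight 0s.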

Concept: Getting a bit mask of all 1s (if true) and all 0s (if false)

The ability to convert a logic evaluation (condition) that gives

\begin{align*} 00000001_2 &= +1_{10} & \mathrm{(true)}\\ 00000000_2 &= +0_{10} & \mathrm{(false)} \end{align*}

into a conditional mask that gives

\begin{align*} 11111111_2 &= -1_{10} & \mathrm{(true)}\\ 00000000_2 &= +0_{10} & \mathrm{(false)} \end{align*}

is the basis of many branchless ‘drop/keep this if that’ operations.

This can also be achieved by

putting a minus sign in front, such as -(cond) that will convert a (+1, 0) into (-1, 0), or
more efficiently exploiting sign extensions by dragging the top bit to the bottom (by right shifting by the type’s bit length-1)
computing absolute using the two’s complement’s definition of flip all bits and add 1: drag out a mask that shows that sign, which happens to be a do nothing if all 0s and flip all bits if all 1s in an xor, while the mask of all 1s, which is -1, when subtracted, becomes +1 needed to finish the two’s complement (and that’s subtract by 0 for already positive value).
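The three items above can be sketched in Python as below. The function names are invented, and the 32-bit width in `branchless_abs` is an assumption: the input must fit in that many signed bits for the shift to produce a clean mask.

```python
def mask_from_condition(cond):
    # True/False behave as 1/0, so negation gives -1 (all ones) or 0.
    return -int(cond)

def select(cond, a, b):
    # Branchless "a if cond else b": keep a under the mask, b under its complement.
    m = mask_from_condition(cond)
    return (a & m) | (b & ~m)

def branchless_abs(x, bits=32):
    # mask is -1 for negative x and 0 otherwise (assumes x fits in `bits` signed bits).
    mask = x >> (bits - 1)
    # XOR with -1 flips every bit; subtracting -1 adds the 1 that completes the negation.
    # For non-negative x both steps are no-ops (XOR with 0, subtract 0).
    return (x ^ mask) - mask
```

In Python these are only illustrations of the bit patterns; the interpreter overhead hides any benefit of avoiding a branch.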

Concept: $2^n - 1$ sets all binary digits below it to a stream of 1s

When you count binary numbers up, you must exhaust all the lower digits by filling them with all 1s before you get to advance to (set) a new digit on the left of them. For example,

\begin{align*} 1000_2 &= +8_{10}\\ 0111_2 &= +7_{10} \end{align*}

This can be exploited to create bit-masks that preserves all digits on the left of the first ‘1’ seen from the right (LSB), ‘0’ at that lowest (LSB) set bit (aka ‘1’), and all ‘1’s below it.

\begin{align*} 0110,1000_2 &= +104_{10}\\ 0110,0111_2 &= +103_{10} \end{align*}

Binary digits are the (1 or 0) coefficients of a linear combination of powers of 2. Having a loner ‘1’ (aka everything else is 0) means the number is a power of 2.

Being the lone ‘1’ bit in the number means every bit above it must be zero. Any ‘1’s above the right-most ‘1’ means it’s not the loner, hence not a power of 2.

If you subtract 1 from the power-of-2 number, all bits below (not including) the lone ‘1’ bit become 1, that ‘1’ bit position becomes zero, and as mentioned before, all bits above it are 0s by definition since the ‘1’ we are working on is a loner.

Since the digits in $2^n$ and $2^n - 1$ are mutually exclusive (see example below)

\begin{align*} 0000,1000_2 &= +8_{10} = 2^3 \\ 0000,0111_2 &= +7_{10} = 2^3 -1 \end{align*}

we can be sure that if we AND them we must get 0 (because in every position one of them has to be zero) and if we XOR them we must get 1s in all of those positions. But which one to use?

\begin{align*} 0000,1000_2 &= +8_{10} &=& 2^3 \\ 0000,0111_2 &= +7_{10} &=& 2^3 -1 \\ 0000,1111_2 &= +15_{10} &=& 2^3 \textnormal{ or } (2^3 -1)\\ 0000,1111_2 &= +15_{10} &=& 2^3 \textnormal{ xor } (2^3 -1)\\ 0000,0000_2 &= +0_{10} &=& 2^3 \textnormal{ and } (2^3 -1)\\ \end{align*}

Let’s also check for the non-power of two case

\begin{align*} 0110,1000_2 &= +104_{10}\\ 0110,0111_2 &= +103_{10}\\ 0110,1111_2 &= +111_{10} &=& 104_{10} \textnormal{ or } 103_{10}\\ 0000,1111_2 &= +15_{10} &=& 104_{10} \textnormal{ xor } 103_{10}\\ 0110,0000_2 &= +96_{10} &=& 104_{10} \textnormal{ and } 103_{10}\\ \end{align*}

The xor approach does not work because the upper bits are invariant, so we cannot detect the presence of the upper set bits (upper ‘1’s). It unconditionally gives the same bit pattern (mask) marking the lowest first set bit and everything below it 1s and 0s for everything else above it. Which can be exploited to simplify counting the consecutive trailing zeros (from the right) by turning it into counting the contiguous 1s in this invariant pattern, or add 1 to it and binary search the position of the set bit and subtract 1 because the said bit was made into the invariant (xor) pattern as well so +1 move onto the next upper binary digit.
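The trailing-zero counting idea can be sketched in Python. Here `int.bit_length()` stands in for "counting the contiguous 1s" because the xor pattern is a solid run of 1s starting at bit 0; `trailing_zeros` is a name made up for this sketch.

```python
def trailing_zeros(x):
    # Assumes x > 0. x ^ (x - 1) is a run of 1s covering the lowest set bit
    # and every bit below it, so its length is (number of trailing zeros + 1).
    low_mask = x ^ (x - 1)
    return low_mask.bit_length() - 1
```

For 104 (`0110,1000`), the xor pattern is 15 (`0000,1111`), which is four 1s, hence three trailing zeros.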

The or approach detects the presence of the upper set bits but it’s a pain to mask out the invariant lower 1s, which curiously you can do by XOR-ing with the invariant pattern generated by $2^n \textnormal{ xor } 2^n-1$ or you can do AND-NOT-ing

\begin{align*} 0110,1000_2 &= +104_{10}\\ 0110,0111_2 &= +103_{10}\\ 0110,1111_2 &= +111_{10} &=& 104_{10} \textnormal{ or } 103_{10}\\ 0000,1111_2 &= +15_{10} &=& 104_{10} \textnormal{ xor } 103_{10}\\ 1111,0000_2 &= +240_{10} &=& \overline{104_{10} \textnormal{ xor } 103_{10}}\\ 0110,0000_2 &= +96_{10} &=& (104_{10} \textnormal{ or } 103_{10}) \textnormal{ and } \overline{(104_{10} \textnormal{ xor } 103_{10})}\\ 0110,0000_2 &= +96_{10} &=& (104_{10} \textnormal{ or } 103_{10}) \textnormal{ xor } (104_{10} \textnormal{ xor } 103_{10})\\ 0110,0000_2 &= +96_{10} &=& 104_{10} \textnormal{ and } 103_{10}\\ \end{align*}

Which means the and approach already happens to do the job: it keeps the top bits (whose non-zero value reveals their presence) yet unconditionally clears the lowest set bit and everything below it.

The gist of this x & (x-1) maneuver is that it clears the lowest set bit and everything below it

clearLowestSetBitAndEverythingBelow(x): x & (x-1)

This is used by Brian W. Kernighan to count number of set bits by knocking them one off at a time starting from below. Of course the worst case scenario is when the 1s are so dense that the algorithm must go through every bit without jumping past the zeros.
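A minimal Python sketch of that counting loop, assuming a non-negative input (the function name is illustrative):

```python
def count_set_bits(x):
    # Assumes x is a non-negative integer.
    count = 0
    while x:
        x &= x - 1   # knock off the lowest set bit (the bits below it are already 0)
        count += 1
    return count
```

The loop body runs once per set bit, so 104 (`0110,1000`) takes three iterations while 255 takes eight.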

So the solution is

isPowerOfTwo(x): clearLowestSetBitAndEverythingBelow(x)==0
isPowerOfTwo(x): (x & (x-1))==0

However a special case escaped us, which is x=0. 0x0000 & 0xFFFF is 0, but 0 isn’t a power of 2 unless you consider the minus infinity power which is the territory of floating point anyway. This can be easily patched by making the result unconditionally false if x=0 in the first place.

isPowerOfTwo(x): x && ((x & (x-1))==0)

Note the logical &&: x is first tested for its non-zeroness (by boiling any non-zero value down to +1), and it also enables efficient short-circuit evaluation. If x is false, the result is unconditionally false under &&, so the rest is irrelevant and is not evaluated.
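The same test can be sketched in Python, where `and` plays the role of C's short-circuiting `&&`. The function assumes non-negative integers.

```python
def is_power_of_two(x):
    # Mirrors the C form `x && ((x & (x-1)) == 0)`: `and` short-circuits,
    # so the bit test is skipped entirely when x is 0.
    return x != 0 and (x & (x - 1)) == 0

powers_below_20 = [x for x in range(20) if is_power_of_two(x)]
```

The explicit `x != 0` keeps the return value a plain boolean; writing `x and ...` would return the integer 0 itself for the zero case.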

Concept: Look up table

This is unconditionally the O(1) way because you have a mask of every bit in the type ready and you could index by it. However the penalty is that a load operation could be expensive if not everything can fit in the register file.
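As one possible illustration of the look-up table idea, here is a byte-wise set-bit count in Python. The byte-sized table and the 32-bit input range are assumptions chosen for the sketch, not something prescribed above.

```python
# Precompute the number of set bits for every possible byte value (256 entries).
BYTE_POPCOUNT = [bin(value).count("1") for value in range(256)]

def popcount32(x, table=BYTE_POPCOUNT):
    # Assumes 0 <= x < 2**32: split into four bytes and add four table lookups.
    return (table[x & 0xFF] + table[(x >> 8) & 0xFF]
            + table[(x >> 16) & 0xFF] + table[(x >> 24) & 0xFF])
```

The work per call is fixed at four lookups regardless of how many bits are set, which is the trade against the memory loads mentioned above.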

GUI Paradigms (1): Redux (Flutter/React) translated to MATLAB

Posted on November 22, 2021 by admin

For GUI development, we often start with controls (or widgets) that the user interacts with; each control runs the callback function we registered for an event when that event happens.

Most of the time we just want to read the state/properties of certain controls, process them, and update other controls to show the result. Model-View-Controller (MVC) puts strict boundaries between interaction, data processing and display.

The most common schematic for MVC is a circle showing the cycle of Controller->Model->View, but in practice, it’s the controller that’s the brains. The view can simultaneously accept user interactions, such as an editable text box or a list. The model usually doesn’t update the view directly on its own like in the idealized diagram.

MVC with User Action — From https://www.educative.io/blog/mvc-tutorial

With MVC, basically we are concentrating the control’s callbacks to the controller object instead of just letting each control’s callback interact with the data store (model) and view in an unstructured way.

When learning Flutter, I was exposed to the Redux pattern (which came from React). Because the tutorials were designed around the language features of Dart, the documentation kind of obscured the essence of the idea (why do we want to do this) as it dwelt on the framework (structure can be refactored into a package). The docs talked a lot about boundaries but weren’t clear about why they have to be meticulously observed, which I’ll get to later.

The core inspiration in Redux/BLoC is taking advantage of the concept of ‘listening to a data object for changes’ (instead of UI controls/widget events)!

Instead of having the UI control’s callback directly change other UI control’s state (e.g. for display), we design a state vector/dictionary/struct/class that holds contents (state variables) that we care about. It doesn’t have to map 1-1 to input events or 1-1 to output display controls.

When a user interaction (input) event triggers a callback, the control’s callback does whatever it needs to produce the value(s) for the relevant state variable(s) and changes the state vector. The changed state vector will trigger the listener that scans for what needs to be updated to reflect the new state and changes the states of the appropriate view UI controls.

This way the input UI controls’ callbacks do not have to micromanage what output UI controls to update, so it can focus on the business logic that generates the content pool that will be picked up by the view UI controls to display the results. In Redux, you are free to design your state variables to match more closely to the input data from UI controls or output/view controls’ state. I personally prefer a state vector design that is closer to the output view than input controls.

The intuition above is not the complete/exact Redux, especially with Dart/Flutter/React. We also have to keep the state in ONE place and make the order of state changes (thus behavior) predictable!

Actions and reducers are separate. Every input control fires an event (action signal) and we wait for the reducers (registered to the actions) to pick it up during dispatch() instead of jumping on it. This way there’s only ONE place that can change states. Leave all the side effects in the control callback where you generate the action. No side effects (like changing other controls) allowed in reducers!
Reducers do not update the state in place (it’s read only). Always generate a new state vector to replace the old one (for performance, we’ll replace the state vector only if we have verified that the contents actually changed). This will make timing predictable (you are stepping through state changes one by one)

In JavaScript, there isn’t really a listener actively listening for state variable changes. Dispatch (which is called every time the user interacts with a control) just runs through all the registered listeners at the very end, after it has run all the reducers. In MATLAB, you can optionally make the state vector SetObservable and attach the change listener callback instead of explicitly calling it within dispatch.
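The two rules and the 'dispatch runs the listeners at the end' behavior can be sketched in a few lines of Python. This is my own illustration under stated assumptions, not a port of any library: `Store`, `reducer`, `subscribe` and the action names `"increment"` and `"reset"` are hypothetical, and I use `types.MappingProxyType` as one way to hand out a read-only state.

```python
from types import MappingProxyType


def reducer(state, action):
    """Pure function: builds a NEW state, never mutates the old one."""
    if action == "increment":
        return {**state, "count": state["count"] + 1}
    if action == "reset":
        return {**state, "count": 0}
    return state  # unknown action: nothing changes


class Store:
    def __init__(self, reducer, initial_state):
        self._reducer = reducer
        self._state = MappingProxyType(dict(initial_state))  # read-only view
        self._listeners = []

    @property
    def state(self):
        return self._state

    def subscribe(self, listener):
        self._listeners.append(listener)

    def dispatch(self, action):
        # The ONE place where the state gets replaced.
        new_state = self._reducer(self._state, action)
        if dict(new_state) == dict(self._state):
            return  # no-op: skip the replacement and the listeners
        self._state = MappingProxyType(dict(new_state))
        # No active watcher: just run every registered listener at the end.
        for listener in self._listeners:
            listener(self._state)


rendered = []  # stand-in for a view that redraws on every change
store = Store(reducer, {"count": 0})
store.subscribe(lambda s: rendered.append(s["count"]))

store.dispatch("increment")
store.dispatch("increment")
store.dispatch("reset")
store.dispatch("reset")  # state is already 0, so listeners are not called
print(rendered)
```

Because the state is only ever replaced inside `dispatch()`, the sequence of values the listener sees is exactly the sequence of actions that changed something.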

https://gist.github.com/gaearon/ffd88b0e4f00b22c3159

Here is an example of a MATLAB class that captures the spirit of Redux. I added a 2 second delay to emulate long operations and used enableDisableFig() to avoid dealing with queuing user interactions while it’s going through a long operation.

classdef ReduxStoreDemo < handle

    % Should be made private later
    properties (SetAccess = private, SetObservable)
        state % {count}
    end
        
    methods (Static)
        % Made static so reducer cannot call dispatch and indirectly do
        % side effect or create loops
        function state = reducer(state, action)        
            % Can use str2func(action) here or use a function map
            switch action
                case 'increment'
                    fprintf('Wait 2 secs before incrementing\n');
                    pause(2)
                    state.count = state.count + 1;
                    fprintf('Incremented\n');
            end                
        end        
    end
    
    % We keep all the side-effect generating operations (such as
    % temporarily changing states in the GUI) in dispatch() so
    % there's only ONE PLACE where state can change
    methods
        function dispatch(obj, action, src, evt)
            % Disable all figures during an interaction
            figures = findobj(groot, 'type', 'figure');
            old_fig_states = arrayfun(@(f) enableDisableFig(f, 'off'), figures);      
            src.String = 'Wait ...';
            
            new_state = ReduxStoreDemo.reducer(obj.state, action);

            % Don't waste cycles updating no-ops
            if( ~isequal(new_state, obj.state) )
                % MATLAB already has listeners attached,
                % so no need to scan listeners like React Redux
                obj.state = new_state;
            end

            % Re-enable figure controls after it's done
            arrayfun(@(f, os) enableDisableFig(f, os), figures, old_fig_states);
            src.String = 'Increment';
        end
    end
    
    methods
        function obj = ReduxStoreDemo()
            figure();            
            obj.state.count = 0;                        
            
            h_1x  = uicontrol('style', 'text', 'String', '1x Box', ...
                              'Units', 'Normalized', ...
                              'Position', [0.1 0.3, 0.2, 0.1], ...
                              'HorizontalAlignment', 'left');                          
            addlistener(obj, 'state', ...
                        'PostSet', @(varargin) obj.update_count_1x( h_1x , varargin{:})); 
                        
            uicontrol('style', 'pushbutton', 'String', 'Increment', ...
                      'Units', 'Normalized', ...
                      'Position', [0.1 0.1, 0.15, 0.1], ...
                      'Callback', @(varargin) obj.dispatch('increment', varargin{:}));                             
            
            % Force trigger the listeners to reflect the initial state
            obj.state = obj.state;
        end
    end
    
    %% These are 'renders' registered when the uicontrols are created
    % Should stick to reading off the state. Do not call dispatch here
    % (just leave it for the next action to pick up the consequences)
    methods
        % The (src, event) is useless for listeners because it's not the 
        % uicontrol handle but the state property's metainfo (access modifiers, etc)
        function update_count_1x(obj, hObj, varargin)            
            hObj.String = num2str(obj.state.count);
        end        
    end
    
end

Data Relationships of Spreadsheets: Relational Database vs. Heterogenous Data Tables

Posted on October 23, 2021 by admin

This blog post is a work in progress. I will fill in the missing details (especially pandas) later. Some of the MATLAB syntax is inaccurate in the sense that it’s just a context-dependent description (for example, column names can be a cellstr, a char string, or linear/logical indices).

From a data relationship point of view, relational databases (RDBMS) and heterogeneous data tables (MATLAB’s dataset/table or Python pandas’ DataFrame) are the same thing. But a proper database has to worry about concurrency issues and provide more consistency tools (ACID model).

Heterogeneous data tables are almost always column-oriented databases (mainly for analyzing data), whereas MySQL and Postgres are row-store databases. You can think of a column-store database as a Struct of Arrays (SoA) and a row-store database as an Array of Structs (AoS). Remember locality = performance: in general, you want to put the stuff you frequently access together as close to each other as possible.
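The SoA/AoS distinction is easy to see with plain Python containers. This is only a sketch with made-up sample data (`rows`, `columns`) and a hypothetical helper `rows_to_columns`; Python lists do not give real memory locality, so it only shows the access patterns each layout favors.

```python
# Array of Structs (row store): one record per element.
rows = [
    {"name": "Ann", "age": 31, "city": "Oslo"},
    {"name": "Bob", "age": 45, "city": "Lima"},
    {"name": "Cyd", "age": 27, "city": "Oslo"},
]

# Struct of Arrays (column store): one array per field.
columns = {
    "name": ["Ann", "Bob", "Cyd"],
    "age": [31, 45, 27],
    "city": ["Oslo", "Lima", "Oslo"],
}


def rows_to_columns(records):
    """Transpose a row store into a column store (assumes uniform keys)."""
    return {key: [record[key] for record in records] for key in records[0]}


# Analytics over one field: the column store has it as one contiguous list,
# while the row store has to visit every record to pick the field out.
mean_age_soa = sum(columns["age"]) / len(columns["age"])
mean_age_aos = sum(record["age"] for record in rows) / len(rows)

# Fetching one whole record: trivial in the row store,
# but the column store has to gather it from every array.
second_record_aos = rows[1]
second_record_soa = {key: values[1] for key, values in columns.items()}
```

Which layout is 'closer together' depends on the question asked: per-column statistics favor SoA, per-record transactions favor AoS.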

Mechanics:

Concepts	SQL	MATLAB table	Pandas Dataframe

tables	FROM	(work with T)	(work with df)
columns variables fields	SELECT	`T.(field)` `T(:, cols/varnames)`
rows records	WHERE HAVING	`T( cond(T), : )` `T_grp( cond(T_grp), : )`
conditions	NOT IS IN BETWEEN	`~` `==, isequal*()` `ismember()` `a<=b & b<=c`
Inject table to another table	INSERT INTO t2 SELECT vars FROM t1 WHERE rows	`T2(end+(1:#rows), vars) = T1(rows, vars)` (Doable, throws warning)
Insert record/row	INSERT INTO t (c1, c2, ..) VALUES (v1, v2, ..)	`T=[T; {v1, v2, ...}]` (Cannot default for unspecified column*)
update records/elements	UPDATE table SET column = content WHERE row_cond	`T.(col)(row_cond) = content`
New table from selection	SELECT vars INTO t2 FROM t1 WHERE rows	`T2 = T1(rows, vars)`
clear table	TRUNCATE TABLE t	`T( :, : )=[]`
delete rows	DELETE FROM t WHERE cond (if WHERE is not specified, it kills all rows one by one with consistency checks. Avoid it and use TRUNCATE TABLE instead)	`T( cond, : ) = []`

* I developed sophisticated tools to allow partial row insertion, but it’s not something TMW supports right out of the box. This involves overloading the default value generator for each data type then extract the skeleton T( [], : ) to identify the data types.
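The SQL column of the mechanics table can be tried out with Python's built-in `sqlite3` module. This is a sketch with made-up tables `t1`/`t2`; note that SQLite is my choice for convenience and it differs from the table above in two places: it has no `SELECT ... INTO` (I use `CREATE TABLE ... AS SELECT` instead) and no `TRUNCATE TABLE`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (id INTEGER, name TEXT, score REAL)")

# Insert records/rows
con.executemany(
    "INSERT INTO t1 (id, name, score) VALUES (?, ?, ?)",
    [(1, "a", 0.5), (2, "b", 0.9), (3, "c", 0.2)],
)

# New table from selection (SQLite spelling of SELECT vars INTO t2 FROM t1 WHERE rows)
con.execute("CREATE TABLE t2 AS SELECT id, score FROM t1 WHERE score > 0.4")

# Inject rows from one table into another
con.execute("INSERT INTO t2 (id, score) SELECT id, score FROM t1 WHERE score <= 0.4")

# Update elements selected by a row condition
con.execute("UPDATE t2 SET score = 0.0 WHERE id = 3")

# Delete rows selected by a condition
con.execute("DELETE FROM t2 WHERE score > 0.8")

remaining = con.execute("SELECT id, score FROM t2 ORDER BY id").fetchall()
print(remaining)
```

Each statement lines up with one row of the table: the `WHERE` clause plays the role of the row index and the column list plays the role of `vars`.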

Core database concepts:

Concepts	SQL	MATLAB (table/dataset)	Pandas (Dataframe)
linear index	CREATE INDEX idx ON T (col)	`T.idx = (1:size(T,1))'`
group index	CREATE UNIQUE INDEX idx ON T (cols)	`[~, T.idx] = sortrows(T, cols)` (old implementation is `grp2idx()`)
set operations	UNION INTERSECT	union() intersect() setdiff(), setxor()
sort	ORDER BY	sortrows()
unique	SELECT DISTINCT	unique()
reduction aggregation	F()	@reductionFunctions
grouping	GROUP BY	Specifying ‘GroupingVariables’ in varfun(), rowfun(), etc.
partitioning	(set partition option in Table Definition)	`T1=T(:, {'key', varnames_1})` `T2=T(:, {'key', varnames_2})`
joins	[type] JOIN	*`join(T1, T2, ...)`	df.join(df2, …)
cartesian product	CROSS JOIN (misnomer, no keys)	`T_cross = [repelem(T1, size(T2,1), 1), repmat(T2, [size(T1,1), 1])]`
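As a rough illustration of the last two rows, here is a sketch in plain Python where a table is a list of dictionaries. The helpers `cross_join` and `inner_join` and the sample tables are hypothetical. `itertools.product` produces the same row order as the `repelem`/`repmat` expression above (each row of the first table repeated, the second table cycled), and a keyed inner join is written naively as a filtered cartesian product.

```python
from itertools import product


def cross_join(left, right):
    """Cartesian product: every row of `left` paired with every row of `right`."""
    return [{**l, **r} for l, r in product(left, right)]


def inner_join(left, right, key):
    """Naive inner join: keep only the pairs whose key values match."""
    return [{**l, **r} for l, r in product(left, right) if l[key] == r[key]]


t1 = [{"a": 1}, {"a": 2}]
t2 = [{"b": "x"}, {"b": "y"}, {"b": "z"}]
t_cross = cross_join(t1, t2)

people = [{"key": 1, "name": "Ann"}, {"key": 2, "name": "Bob"}]
cities = [{"key": 2, "city": "Lima"}, {"key": 3, "city": "Oslo"}]
t_joined = inner_join(people, cities, "key")
```

Real engines use hashing or sorting instead of scanning the full product; the sketch is only meant to show how the two operations relate.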

Functional programming concepts such as map (linear index), filter (logical index), and reduce (summary & group) are heavily used with databases.

Formal databases have a Table Definition (Column Properties) that must be specified ahead of time and can be updated in-place later on (think of it as static typing). Heterogeneous data tables can figure most of that out on the fly depending on context (think of it as dynamic typing). This impacts:
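Here is a small sketch of those three concepts applied to a table stored as a list of dictionaries, using only Python's standard library. The sample `records` and the helper `group_sum` are made up for illustration.

```python
from collections import defaultdict
from functools import reduce

records = [
    {"dept": "eng", "salary": 100},
    {"dept": "eng", "salary": 120},
    {"dept": "ops", "salary": 80},
    {"dept": "ops", "salary": 60},
]

# map: pick a column out of every row (like SELECT salary)
salaries = list(map(lambda r: r["salary"], records))

# filter: keep the rows that satisfy a condition (like WHERE salary >= 100)
well_paid = list(filter(lambda r: r["salary"] >= 100, records))

# reduce: collapse a column into one summary value (like SUM(salary))
total = reduce(lambda acc, r: acc + r["salary"], records, 0)


def group_sum(rows, key, value):
    """Grouped reduction (like SELECT key, SUM(value) ... GROUP BY key)."""
    groups = defaultdict(int)
    for row in rows:
        groups[row[key]] += row[value]
    return dict(groups)


by_dept = group_sum(records, "dept", "salary")
```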

data type (creation and conversion)
unspecified entries (NULL).
Often NaN in MATLAB native types, but I extended it by overloading the relevant data types with an isnull() function and consistently using the same interface
default values
keys (Indices)

SQL features not offered by heterogenous data tables yet:

column name aliases (AS)
wildcard over names (*)
pattern matching (LIKE)

SQL features that are unnatural with heterogeneous data tables’ syntax:

implicitly filter a table with conditions in another table sharing the same key.
It’s an implied join(T, T_cond)+filter operation in MATLAB. Often used with ANY, ALL, EXISTS
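To show what 'implied join + filter' means, here is a sketch using Python's built-in `sqlite3` with two made-up tables, `customers` and `orders`, sharing a key. The first query filters implicitly through `EXISTS`; the second spells out the join and the filter, which is the shape the operation takes with data tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cyd');
    INSERT INTO orders VALUES (1, 250.0), (1, 40.0), (2, 15.0);
    """
)

# Implicit: filter customers by a condition that lives in another table.
implicit = con.execute(
    """
    SELECT name FROM customers c
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE o.customer_id = c.id AND o.amount > 100
    )
    ORDER BY name
    """
).fetchall()

# Explicit: join on the shared key first, then filter, then deduplicate.
explicit = con.execute(
    """
    SELECT DISTINCT c.name
    FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE o.amount > 100
    ORDER BY c.name
    """
).fetchall()
```

The `DISTINCT` in the explicit version matters: a customer with several qualifying orders would otherwise show up once per order.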

Fundamentally, heterogeneous data tables expect to work with snapshots that don’t update often. Therefore they do not offer active checking (callbacks) as in SQL:

Invariant constraints (CHECK, UNIQUE, NOT NULL, Foreign key).
Auto Increment
Virtual (dependent) tables (CREATE VIEW)
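The 'active checking' can be seen by letting a database reject bad rows at insertion time. Below is a sketch using Python's built-in `sqlite3`; the table `readings`, the view `large_readings` and the helper `try_insert` are hypothetical names for illustration, and the constraint syntax shown is SQLite's.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    """
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto increment
        sensor TEXT NOT NULL UNIQUE,           -- invariant constraints
        value REAL CHECK (value >= 0)
    )
    """
)
# A view is a virtual table: it is re-evaluated whenever it is queried.
con.execute("CREATE VIEW large_readings AS SELECT sensor FROM readings WHERE value > 2")

con.execute("INSERT INTO readings (sensor, value) VALUES ('s1', 1.5)")


def try_insert(sensor, value):
    """Attempt an insert and report whether the database accepted it."""
    try:
        con.execute(
            "INSERT INTO readings (sensor, value) VALUES (?, ?)", (sensor, value)
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("s1", 2.0),   # violates UNIQUE
    try_insert("s2", -1.0),  # violates CHECK
    try_insert(None, 3.0),   # violates NOT NULL
    try_insert("s3", 4.0),   # fine
]
large = con.execute("SELECT sensor FROM large_readings").fetchall()
```

With a data table, the same three bad rows would simply be appended; any validation has to be run by hand on the snapshot afterwards.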

Know these database/spreadsheet concepts:

Tall vs wide tables

Language logistics (not related to database)

Concepts	SQL	MATLAB (table/dataset)	Pandas (Dataframe)
Partial display	MySQL: LIMIT Oracle: FETCH FIRST	`T( 1:10, : )`	`df.head()`
Comments	`--` or `/*` … `*/`	`%` or `%{` … `%}`	`#` or `"""` … `"""`
function	CREATE PROCEDURE fcn	`function varargout = fcn(varargin)`	`def fcn(*args, **kwargs):`
case	CASE WHEN THEN ELSE END	switch case end	(no case structure before Python 3.10, use a dictionary; `match`/`case` since 3.10)
Null if no results	IFNULL ( statement )	`function X=null_if_empty(T, cond)` `X=T( cond, : );` `if( isempty(X) ) X=NaN;`
Replace nulls	ISNULL(col, target_val)	`T.col(isnan(T.col)) = target_val` `T = standardizeMissing( T, ... )`
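For the last two rows, a plain-Python counterpart might look like the sketch below. The function names `null_if_empty` and `replace_nulls` mirror the MATLAB description above but are otherwise my own, and I am assuming `None` as the stand-in for null where the MATLAB version returns NaN.

```python
import math


def is_null(value):
    """Treat both None and floating-point NaN as 'null'."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_if_empty(rows, cond):
    """Return the rows matching `cond`, or None when nothing matches."""
    selected = [row for row in rows if cond(row)]
    return selected if selected else None


def replace_nulls(values, target):
    """Replace every null entry of a column with `target`."""
    return [target if is_null(v) else v for v in values]


table = [{"x": 1}, {"x": 7}]
no_match = null_if_empty(table, lambda row: row["x"] > 10)
some_match = null_if_empty(table, lambda row: row["x"] > 5)
cleaned = replace_nulls([1.0, float("nan"), None, 2.0], 0.0)
```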

Rambling Nerd with a Plan

Hoi Wong's blog

Category Archives: Concepts

Dictionary of equivalent/analogous concepts in programming languages

Data Types

Array Operations

Editor Syntax

Object Oriented Programming Constructs

Functional Programming Constructs

Coroutines / Asynchronous Programming

Matrix Arrays

Heterogeneous Data Structures

Programming Techniques: Bit hackery

Concept: Two ways to get two’s complement

Concept: XOR can be used for bit flips or check for bit changes

Concept: Top bit holds the sign

Concept: Sign extensions (the top/sign bit gets drag-copied when right shifted)

Concept: Getting a bit mask of a 1s (if true) and all 0s (if false)

Concept: $2^n - 1$ sets all binary digits below it to a stream of 1s

Concept: Look up table

GUI Paradigms (1): Redux (Flutter/React) translated to MATLAB

Data Relationships of Spreadsheets: Relational Database vs. Heterogenous Data Tables
