There are two ways to access columns in DataFrame. The preferred way is by square brackets (indexing into it like a dictionary), while it’s tempting to use the neater dot notation (treating columns like an attribute), my recommendation is don’t!
Python has dictionaries that handles arbitary labels well while it doesn’t have dynamic field names like MATLAB do. This puts DataFrame at a disadvantage developing dot notation syntax while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes the dot notation very half-baked in Python and it’s better to avoid it altogether
Reason 1: Cannot create new columns with dot notation
UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
Reason 2: Only column names that doesn’t happen to be valid Python attribute names AND DataFrame do not have any method with the same name can be accessed through dot notation.
Take an example of dataframe constructed from device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The tabe T looks like this:
I tried dir() on the table and noticed:
The column name "test me" did not appear anywhere, not even mangled. It has a space in between so it’s not a valid attribute or variable name, so this column is effectively hidden from the dot notation
flags is an internal attribute of DataFrame and it was not overriden by the data column flags when called by the dot notation. This means the flags column was also hidden to the dot notation as there were no mangled name for it either
Even more weird is that getattr() works for columns with non-qualified attribute name like test me (despite the dot notation cannot access it because of the lack of dynamic field names syntax yet test me doesn’t show up in dir()). getattr(T, 'flags') still gets the DataFrame’s internal attribute flags instead of the column called flags as expected.
Iterators are sequencable references. Since MATLAB doesn’t do references, so iterators (and therefore generators) do not make sense
Data Types
Common
C
C++
MATLAB
Python
Sets
N/A
std::set
Only set operations, not set data type
{ , , ...}
Dictionaries
std::unordered_map
– Dynamic fieldnames (qualified varnames as keys) – containers.Map() or dictionary() since R2022b
Dictionaries {key:value} (Native)
Heterogeneous containers
cells {}
lists (mutable) tuples (immutable)
Structured Heterogeneous containers
table() dataset() [Old]
Mix in classes
Pandas Dataframe
Array, Matrices & Tensors
Native [ , ; , ]
Numpy/PyTorch
Records
struct
class (members)
dynamic field (structs) properties (class)
getfield()/setfield()
No structs (use dicts)
attribute (class) getattr()/setattr()
Type deduction
N/A
auto
Native
Native
Type extraction
N/A
decltype() for compile time (static)
typeid() for RTTI (runtime)
class()
type()
Native sets operations in Python are not stable and there’s no option to use stable algorithm like MATLAB does. Consider installing orderly-set package.
Array Operations
Common
MATLAB
Python
Repeat
repmat()
[] * N np.repeat()
Logical Indexing
Native
List comprehension Boolean Indexing (Numpy)
Editor Syntax
Common
C
C++
MATLAB
Python
Commenting
/* ... */
// (only for newer C)
// (single line)
/* ... */ (block)
% (single line)
(Block): %{ ... %}
# (single line)
""" or ''' is docstring which might be undersirably picked up
Macros only make sense in C/C++. This makes code less transparent and is frowned upon in higher level programming languages. Even its use in C++ should be limited. Use inline functions whenever possible.
Python allows adding members (attributes) on the fly with setattr(), which includes methods. MATLAB’s dynamicprops allows adding properties (data members) on the fly with addprop
Functional Programming Constructs
Common
C++
MATLAB
Python
Function as variable
Functors (Function Objects) operator()
Function Handle
Callables (Function Objects) __call__()
Lambda Syntax
Lambda [capture](inputs) {expr} -> optional trailing return type
Anonymous Function @(inputs) expr
Lambda lambda inputs: expr
Closure (Early binding): an instance of function objects
Capture [] only as necessary.
Early binding [=] is capture all.
Early binding ONLY for anonymous functions (lambda).
Can capture Po through default values lambda x,P=Po: x+P (We’re relying users to not enter the captured/optional input argument)
Concepts of Early/Late Binding also apply to non-lambda functions. It’s about when to access (usually read) the ‘global’ or broader scope (such as during nested functions) variables that gets recruited as a non-input variable that’s local to the function itself.
An instance of a function object is not a closure if there’s any parameter that’s late bound. All lambdas (anonymous functions) in MATLAB are early bound (at creation).
The more proper way (without creating an extra optional argument that’s not supposed to be used, aka defaults overridden) to convert late binding to early binding (by capturing variables) is called partial application, where you freeze the parameters (to be captured) by making them inputs to an outer layer function and return a function object (could be lambda) that uses these parameters.
The same trick (partial application) applies to bind (capture) variables in simple/nested function handles in MATLAB which do behave the same way (early binding) like anonymous functions (lambda).
Currying is partial application one parameter at a time, which is tedious way to stay faithful to pure functional programming.
This is another example of open source projects not going through a comprehensive use case study before changing the default behavior, which end up pulling the rug on some users.
This time it’s Spyder’s good-intentions trying to proactively prevent user mistakes (such as not keeping track of the workspace) throwing the people who meticulously understand their workspace off.
I was working on a FT4222 device which should not be opened again if it’s already opened, aka the ft4222 class object exists. So naturally like in MATLAB, at the top of the script I check if the device object already exist and only create/open it when it’s not already there, like this:
if 'dev' in locals():
pass
else:
print('Branch')
dev = ft4222.openByDescription('FT4222 A')
To my surprise it doesn’t work. 'dev' in locals() always return False every time I press F5, despite when I check again after the script runs, the variable is indeed in there and 'dev' in locals() returns True. WTF?!
Turns out I was not alone! Somebody had the exact same idiom as I did. Spyder 4 changed the default behavior, and we are supposed to manually check this dialog box entry so the scripts do not run off a clean slate when we press F5!
Spyder 5Spyder 6
It’s an extremely terrible idea to have the IDE muck with the state by default. In MATLAB, if we want the script to start with clean state, we either put clear at the top of the script or clearvars -except to keep the variable.
It’s even harder to catch the new default insidious behavior of Spyder given it runs the script from a clean slate from F5/Run then dump the values to the workspace. It’s now a merge between pre-existing variables in the local() workspace and the results of the script from from a blank state!
The people who decided change to this default behaveior certainly didn’t think through this and rushed to do the obvious to please the careless programmers. If a programmer made a mistake by re-running the script without clearing the workspace and was impacted by the dirty variables, they can always reset everything and get out of this (and learn they should clean up the dirty state through the experience), however, somebody who know what they are doing will not be able to figure out what they did wrong until they search for a behavior that looked more like a bug from Spyder/Python! It’s just horrible design choice! MATLAB doesn’t casually to throw users off like this. Damn!
Also I looked into code cells #%% (MATLAB has the equivalent %%), but there’s another annoyance in Spyder: block commenting through """ or ``` pairs is interpreted as output string from runcelll()! In other words, runcelll() outputs docstrings! So every time you execute the cell, the code you comments will be concatenated into one long raw string with escape characters and pollute your console screen! Damn!
Spyder annoyances (3): The shortcut key Ctrl+D to reset console doesn’t work unless there’s nothing half typed in the console.
T && X is equivalent to "if( T) then run X"
T || X is equivalent to "if(!T) then run X"
I don’t exploit this too much in C/C++ because it’s hard to read (and therefore hard to keep track of it to make sure it’s bug free) and most often I’m interested in the output value so I have to watch out for the side effects. However this is common in Bash scripts
Domination Property
In languages that expressions evaluates to a value, sometimes if-statements can be replaced by short-circuit evaluation because short-circuit evaluation exploits the domination property of AND and OR logic operations:
When you FIRST run into the dominant value for the binary operation (0 for AND) and (1 for OR), evaluate no further (i.e. skip the rest) because rest won’t change the overall result away from the dominant value.
So in this use case (emulating if-then statements), what the latter expression X evaluates to or what the combined logic value is irrelevant. We are merely tricking the short-circuit mechanism to trip (short) to NOT evaluate based on what the earlier expression turned out. Action is the ‘norm’. Conditional inaction is the essence of this idiom.
The dual of domination property is idempotent, which is easier to reason because if pre-condition (say T) forces overall expression to boil down to the expression we want to conditionally execute (say X), we are stuck evaluating X if condition T is met.
\newcommand{\Hquad}{\hspace{0.5em}}
\begin{alignat*}{2}
1 \Hquad & \mathrm{AND} & \Hquad X = X \\
0 \Hquad & \mathrm{OR} & \Hquad X = X
\end{alignat*}
These 2 possibilities (domination and idempotency) partitions to space (choices) of possibilities (i.e. cover all possible combinations), in other words there are no other scenarios than described. So the precondition T decides whether you run X or not, which is the equivalent of an if-then statement.
Operator (function) view of logic domination [Functional programming perspective]
By grouping the first value and the binary logic operator with a pair of parenthesis, in dominance view
(0 AND) is also called the ‘clear’ operator
(1 OR) is also called the ‘set’ operator
but this view is not too interesting for our case because we are not interested in what the conditional expression and the overall expression evaluates to, which is signified by ‘clear’ and ‘set’.
On the other hand, (1 AND) and (0 OR) are pass-through (idempotent) operators which passes the evaluation to the latter expression X.
\newcommand{\Hquad}{\hspace{0.5em}}
\begin{alignat*}{2}
(1 \Hquad & \mathrm{AND}) \Hquad & \circ & \Hquad X = X \\
(0 \Hquad & \mathrm{OR}) \Hquad & \circ & \Hquad X = X
\end{alignat*}
Sean Eron Anderson (Stanford CS graphics lab)’s bit twiddling pages often shows a bunch of neat bit tricks, but it’s more like a recipe book than a unified way to summarize the common concepts behind them. Here’s my attempt. This page will get updated as I got the time and more useful insights collected.
Concept: Two ways to get two’s complement
This is the basics of most bit hacks below. Sometimes the definition itself is a bit trick on its own.
You can generalize it to an arbitrary number by subtracting -1 more under the bar on the left hand side and you will get +1 more on the right hand side. Every extra -1 under the bar (bit flips) shows up as +1 outside the bar (bit flips).
This matches the observation that complement schemes (one’s or two’s) both have increasing magnitude move in opposite directions for positive and negative numbers. Look at this table:
Unsigned
Binary
Two’s Complement
7
111
-1
6
110
-2
5
101
-3
4
100
-4
3
011
3
2
010
2
1
001
1
0
000
0
A very important observation that’d be used over and over blow is that in two’s complement, -1 is always a mask of all binary ‘1’s regardless of the word width.
This rule can also read as: magnitude offsets goes in opposite directions
Note that this is NOT distributing bit-flip to two addition/subtraction despite it resembles it with an important distinction that the sign of n changed without turning into (n-1). If it were to distribute, you’ll get (n-2) on the left-hand-side instead of the (n-1) term because the -1 would have been counted twice under distribution.
Bit flips simply doesn’t distribute over the 4 basic (algebraic field) operations. The two’s complement offset is done once and only once when you change the overall representation no matter how many components you break it down into. It’s merely done to shift over the -0 in one’s complement so there’s an extra space for an extra negative number which its positive counterpart is not representable without starting a new digit.
Note to self: the INT_MIN is just the sign bit of ‘1’ followed by all zeros after.
Concept: XOR can be used for bit flips or check for bit changes
Concept: Top bit holds the sign
Sounds simple, but if you keep in mind that (x<0) is really asking to see if the top bit is 1, you can check if two numbers has opposite signs without bit shifting it down by simply XOR-ing them (anything below the top bit are ignored) and use (x^y)<0 to check for the resulting top bit is 1, which signals that the sign bits are different.
Concept: Sign extensions (the top/sign bit gets drag-copied when right shifted)
When you right shift (in signed integers), the top (sign) bit gets drag-copied (sign extended) by the number of bits you right shifted. (Obviously for signed integers, right shifts are zero-filled)
Can exploit this to
drag the top (sign) bit all the way down to the bottom (so you either get all 1s or 0s) to provide a conditional mask based on the sign (see below)
1??????? \gg 7 \textnormal{ (i.e. type bit width - 1)} = 11111111_2 = -1_{10} \\
0??????? \gg 7 \textnormal{ (i.e. type bit width - 1)} = 00000000_2 = +0_{10} \\
Signed extensions also means a negative number will stay negative and a positive number will stay positive if you right shift
Sign extension behavior is not guaranteed by 1987 ANSI C, but it’s standard on pretty much anything more modern than that. Just make sure anything that uses this behavior are inlined (so the implementation can be easily swapped out), well documented/commented, and platform checks/switches are in place, and there’s a way to quickly check with the slower but platform independent implementation.
Concept: Getting a bit mask of a 1s (if true) and all 0s (if false)
The ability to convert a logic evaluation (condition) that gives
is the basis of many branchless ‘drop/keep this if that’ operations.
This can also be achieved by
putting a minus sign in front, such as -(cond) that will convert a (+1, 0) into (-1, 0), or
more efficiently exploiting sign extensions by dragging the top bit to the bottom (by right shifting by the type’s bit length-1)
computing absolute using the two’s complement’s definition of flip all bits and add 1: drag out a mask that shows that sign, which happens to be a do nothing if all 0s and flip all bits if all 1s in an xor, while the mask of all 1s, which is -1, when subtracted, becomes +1 needed to finish the two’s complement (and that’s subtract by 0 for already positive value).
Concept: sets all binary digits below it to a stream of 1s
When you count binary numbers up, you must exhaust all the lower digits by filling them with all 1s before you get to advance to (set) a new digit on the left of them. For example,
This can be exploited to create bit-masks that preserves all digits on the left of the first ‘1’ seen from the right (LSB), ‘0’ at that lowest (LSB) set bit (aka ‘1’), and all ‘1’s below it.
Binary digits are are the (1 or 0) coefficients of a linear combination of powers of 2. Having a loner ‘1’ (aka everything else is 0) means the number is a power of 2.
Being the lone ‘1’ bit in the number means every bit above it must be zero. Any ‘1’s above the right-most ‘1’ means it’s not the loner, hence not a power of 2.
If you subtract 1 from the power-of-2 number, only all bits below (not including) the line ‘1’ bit becomes 1, and that ‘1’ bit position become zero, and as mentioned before, all bits above it are 0s by definition since the ‘1’ we are working on is a loner.
Since the digits in and are mutually exclusive (see example below)
The xor approach does not work because the upper bits are invariant, so we cannot detect the presence of the upper set bits (upper ‘1’s). It unconditionally gives the same bit pattern (mask) marking the lowest first set bit and everything below it 1s and 0s for everything else above it. Which can be exploited to simplify counting the consecutive trailing zeros (from the right) by turning it into counting the contiguous 1s in this invariant pattern, or add 1 to it and binary search the position of the set bit and subtract 1 because the said bit was made into the invariant (xor) pattern as well so +1 move onto the next upper binary digit.
The or approach detects the presence of the upper set bits but it’s a pain to mask out the invariant lower 1s, which curiously you can do by XOR-ing with the invariant pattern generated by or you can do AND-NOT-ing
Which and happens to already does the job by keeping the top bits (which non-zero value detects their presence) yet unconditionally clear the lowest set bit and everything below it.
The gut of is x & (x-1) maneuver is that it clears the bit from the lowest set bit and everything down below
clearLowestSetBitAndEverythingBelow(x): x & (x-1)
This is used by Brian W. Kernighan to count number of set bits by knocking them one off at a time starting from below. Of course the worst case scenario is when the 1s are so dense that the algorithm must go through every bit without jumping past the zeros.
However a special case escaped us, which is x=0. 0x0000 & 0xFFFF is 0, but 0 isn’t a power of 2 unless you consider the minus infinity power which is the territory of floating point anyway. This can be easily patched by making the result unconditionally false if x=0 in the first place.
isPowerOfTwo(x): x && ((x & (x-1))==0)
Note the logical && which means x is first tested for its non-zeroness (by boiling any non-zero value down to +1) and it also enables the efficient short-circuit evaluation which if x is false, which means the result is unconditionally false under &&, the rest are irrelevant so it’s not evaluated.
Concept: Look up table
This is unconditionally the O(1) way because you have a mask of every bit in the type ready and you could index by it. However the penalty is that a load operation could be expensive if not everything can fit in the register file.