How missing keys are handled in Dictionary (Hashtables) in C++, Python and MATLAB

C++

In C++ (STL), the default behavior to touching (especially reading) a missing key (whether it’s map or unordered_map, aka hashtable) is that the new key will be automatically inserted with the default value for the type (say 0 for integers).

I have an example MapWithDefault template wrapper implementation that allows you to endow the map (during construction) with a default value to be returned when the key you’re trying to read does not exist

C++ has the at() method but it involves throwing an exception when the key is not found. However enabling exceptions is a material performance overhead burden in C++.

MATLAB

MATLAB’s hashtables are done with containers.Map() and with more modern MATLAB, dictionary objects (R2020b and on), unless you want to stick to valid MATLAB variable names as keys and exploit dynamic fieldnames.

Neither containers.Map() or dictionary have a default value mechanism when a key is not found. It will just throw an error if the key you are trying to read does not exist. Use iskey()/isKey() method to check if the key exist first and decide to read it or spit out a default value.

Python

Natively Python dictionaries will throw a Key error exception if the requested key in [] operator (aka __getitem__()) do not already exist.

Use .get(key, default) method if you want to assume a default value if the key is not found. The .get() method does not throw an exception: the default is None if not specified.

If you want C++’s default behavior of reading a new key means inserting the said key with a default, you have to explicitly import collections package and use defaultdict. I wouldn’t recommend this as the behavior is not intuitive and likely confusing in insidious ways.

There’s a simiar approach to my MapWithDefault in Python dictionaries: subclass from dict and define your own __missing__ dunder/magic method that returns a default when a key is missing, then use the parent (aka dict)’s constructor to do an explicit (type/class) conversion for existing dict object into your child class object that has __missing__ implemented.

Despite this approach is a little like my MapWithDefault, the __missing__ approach has more flexibility like allowing the default value to be dependent on the query key string, but it comes at the expense of making up one different class, not instance per different default values.

Monkey-patching instance methods is frowned upon in Python. So if you want the default value to tie to instances, the mechanism needs to be redesigned.

Loading

We use ContextManager (“with … as” statement) in Python because Python’s fundamental language design (garbage collecting objects) broke RAII

[TLDR] Python doesn’t have RAII. C++ and MATLAB allows RAII. You can have a proper RAII only if destructor timing is 100% controllable by the programmer.

Python uses Context Manager (with ... as idiom) to address the old issue of opening up a resource handler (say a file or network socket) and automatically close (free) it regardless of whether the program quit abruptly or it gracefully terminates after it’s done with the resource.

Unlike destructors in C++ and MATLAB, which registers what to do (such as closing the resource) when the program quits or right before the resource (object) is gone, Python’s Context Manager is basically rehasing the old try-block idea by creating a rigid framework around it.

It’s not that Python doesn’t know the RAII mechanism (which is much cleaner), but Python’s fundamental language design choices drove itself to a corner so it’s stuck micro-optimizing the try-except/catch-finally approach of managing opened resourecs:

  • Everything is seen as object in Python. Even integers have a ton of methods.
    MATLAB and C++ treats POD, Plain Old Data, such as integers separately from classes
  • Python’s garbage collector controls the timing of when the destructor of any object is called (del merely decrement the reference count).
    MATLAB’s garbage collector do not apply to objects so the destructor timing is guaranteed
    C++ has no garbage collection so the destructor timing is guaranteed and managed by the programmer.

Python cannot easily exclude garbage collecting classes (which breaks RAII) because fundamentally everything are classes (dictionaries potentially with callables) in Python.

This is one of the reasons why I have a lot of respects for MATLAB for giving a lot of consideration for corner cases (like what ’empty’ means) in their language design decisions. Python has many excellent ideas but not enough thoughts was given to how these ideas interact, producing what side effects.


Pythons documentation says out loud right what it does: with ... as ... is effectively a rigidly defined try-except-finally block:

Context Manager heavily depends on resource opener function (EXPR) to return a constructed class instance that implements __exit__ and __enter__, so if you have a C external library imported to Python, like python-ft4222, likely you have to write in your context manager in full when you write your wrapper.


Typically the destructor should check if the resource is already closed first, then close it if it wasn’t already closed. Take io.IOBase as an example:

However, this is only a convenience when you are at the interpreter and can live with the destructor called with a slight delay.

To make sure your code work reliably without timing bugs, you’ll need to explicitly close it somewhere other than at a destructor or rely on object lifecycle timing. The destructor can acts as a double guard to close it again if it hasn’t, but it should not be relied on.


The with ... as construct is extremely ugly, but it’s one of the downsides of Python that cannot be worked around easily. It also makes it difficult for users to retry acquiring a resource because one way or another retrying involves injecting the retry logic in __enter__. It’s not that much typographic savings using with ... as over try-except-finally block if you don’t plan to recycle th contextmanager and the cleanup code is a one-liner.

Loading

Tricks you eventually pick up with math

This is a note-to-self page which I’ll update as I naturally revisit these ideas opportunistically.

Special numbers

-1: alternating signs through odd/even powers (-1)^k
0: null (trivial additive solution), invariant (sums to zero)
1: identity (trivial multiplicative solution), invariant (multiplies to 1)
[0,1) shrinks with growing powers
2: 2*2=2+2
Odd: 2k+1, Even: 2k

Problem solving approaches

Properties of linearity, aka superposition

Find ways to see a raw definition of a concept hidden in the problem you’re solving.

Plugging in easy/obvious examples to verify a hypothesis (often used in differential equations) during exploration

Make up a convenient term or multiplier that you wish you could and hopefully the counteracting term can be pushed out or used somewhere, like -1+1 or multiply the numerator and denominator both by \sqrt{2}

Special functions

Things only a constant function can do

Small \mathrm{sinc} goes to 1 is the same as the small angle approximation for \sin(x)\approx x

Quadratics: Exploit x^2 - (\alpha+\beta)x + (\alpha\beta) (e.g. used in trace and det to infer eigenvalues)

Probe and extract with indicator function I_{x\in C}, elementary vector \mathbf{e}_i and elementary matrices \mathbf{E}, Dirac or Kronecker delta.

Calculus

Symmetric integrals cancels for odd function and doubles of one side for even functions

Series

Spotting hidden famous series (such as geometric sums)

Series expansion dropping terms

\cos(x) is even terms of e^x with alternating signs starting with 1,
\sin(x) is odd terms of e^x with alternating signs starting with x

Taylor series always have factorial at the bottom (denominator) of the coefficient matching the n-th derivative at the top (numerator) for the n-th power term.

Telescoping series (adjacent terms cancels)

Use derivative to bring down polynomial power by 1 and create a shifted series (which can be used to recurse or cancel)

Topology

In real line topology, outside the intuitive examples (singletons included), consider universal and empty set first, rationals and irrationals, then blame Cantor.

Discrete Math (or Primes)

Modulos: generate all possible remainders of a certain modulo by multiplying.

Loading

Pandas DataFrame in Python (1): Disadvantage of using attributes to access columns

There are two ways to access columns in DataFrame. The preferred way is by square brackets (indexing into it like a dictionary), while it’s tempting to use the neater dot notation (treating columns like an attribute), my recommendation is don’t!

Python has dictionaries that handles arbitary labels well while it doesn’t have dynamic field names like MATLAB do. This puts DataFrame at a disadvantage developing dot notation syntax while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes the dot notation very half-baked in Python and it’s better to avoid it altogether

Reason 1: Cannot create new columns with dot notation

UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access

Reason 2: Only column names that doesn’t happen to be valid Python attribute names AND DataFrame do not have any method with the same name can be accessed through dot notation.

Take an example of dataframe constructed from device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The tabe T looks like this:

I tried dir() on the table and noticed:

  • The column name "test me" did not appear anywhere, not even mangled. It has a space in between so it’s not a valid attribute or variable name, so this column is effectively hidden from the dot notation
  • flags is an internal attribute of DataFrame and it was not overriden by the data column flags when called by the dot notation. This means the flags column was also hidden to the dot notation as there were no mangled name for it either

Even more weird is that getattr() works for columns with non-qualified attribute name like test me (despite the dot notation cannot access it because of the lack of dynamic field names syntax yet test me doesn’t show up in dir()). getattr(T, 'flags') still gets the DataFrame’s internal attribute flags instead of the column called flags as expected.

Loading

Dictionary of equivalent/analogous concepts in programming languages

CommonCC++MATLABPython
Variable arguments<stdarg.h>
T f(...)
Packed in va_arg
Very BAD!

Cannot overload
when signatures are uncertain.
varargin
varargout

Both packed as cells.

MATLAB does not have named arguments
*args (simple, stored as tuples)

**kwargs (specify input by keyword, stored as a dictionary)
Referencing
N/A
operator[](_) is for references
subsindex
subsassgn


[_] is for concat
{_} is for (un)pack
__getitem__()
__setitem__()
Default
values
N/ASupportedNot supported.
Manage with inputParser()
Non-intuitive static data behavior. Stick to None or immutables.
Major
Dimension
RowRowColumnRow (Native/Numpy)
Column for Pandas
ConstnessconstconstOnly in classesN/A (Consenting adults)
Variable
Aliasing
PointersReferencesNO!
Copy-on-write

Handle classes under limited circumstances
References
= assignmentCopy one
element
Values: Copy
References: Bind
New Copy
Copy-on-write
NO VALUES
Bind references only
(could be to unnamed objects)
Chained
access operators
N/ADifficult to operator overload it rightDifficult to get it right. MATLAB had some chaining bugs with dataset() as well.Chains correctly natively
Assignment
expressions
(assignment evaluates to assigned lvalue)
==N/ANamed Expression :=
Version ManagementverLessThan()
isMATLABReleaseOlderThan
virtenv (Virtual Environment)
Stream
(Conveyor belt mechanism. Saves memory)
I/O (std, file, sockets)
iterator in
STL containers
MATLAB doesn’t do references. Just increment indices.iterators (uni-directional only)
iter(): __iter__()
next(): __next__()
GeneratorsInput iteratorsoutput only per iteration with yield
CoroutinesC++20 <coroutine>
Iterators are sequencable references. Since MATLAB doesn’t do references, so iterators (and therefore generators) do not make sense.

Data Types

CommonCC++MATLABPython
SetsN/Astd::setOnly set operations, not set data type{ , , ...}
Dictionariesstd::unordered_map– Dynamic fieldnames
(qualified varnames as keys)
containers.Map() or dictionary() since R2022b
Dictionaries
{key:value}
(Native)
Heterogeneous containerscells {}lists (mutable)
tuples (immutable)
Structured
Heterogeneous containers
table()
dataset() [Old]

Mix in classes
Pandas Dataframe
Array,
Matrices &
Tensors
Native [ , ; , ]Numpy/PyTorch
Recordsstructclass
(members)
dynamic field (structs)
properties (class)

getfield()/setfield()
No structs
(use dicts)

attribute (class)
getattr()/setattr()
Type deductionN/AautoNativeNative
Type extractionN/Adecltype() for compile time (static)

typeid() for RTTI (runtime)
class()type()
Native sets operations in Python are not stable and there’s no option to use stable algorithm like MATLAB does. Consider installing orderly-set package.

Array Operations

CommonMATLABPython
Repeatrepmat()[] * N
np.repeat()
Logical IndexingNativeList comprehension
Boolean Indexing (Numpy)

Editor Syntax

CommonCC++MATLABPython
Commenting/* ... */

// (only for newer C)
// (single line)

/* ... */ (block)
% (single line)

(Block):
%{
...
%}
# (single line)

""" or '''
is docstring which might be undersirably picked up
Reliable multi-line
commenting
(IDE)
Ctrl+(Shift)+R(Windows), / (Mac or Linux)[Spyder]:
Ctrl+1(toggle), 4(comment), 5(uncomment)
Code cell
(IDE)
%%[Spyder]:
# %%
Line
Continuation
\\...\
Console
Precision
format%precision (IPython)
Clear variablesclear / clearvars%reset -sf (IPython)
Macros only make sense in C/C++. This makes code less transparent and is frowned upon in higher level programming languages. Even its use in C++ should be limited. Use inline functions whenever possible.

Python is messy about the workspace, so if you just delete

Object Oriented Programming Constructs

CommonC++MATLABPython
Getters
Setters
No native syntax.

Name mangle (prefix or suffix) yourself to manage
Define methods:
get.x
set.x
Getter:
@property
def x(self): ...


Setter:
@x.setter
def x(self, value): ...
DeletersMembers can’t be
changed on the fly
Members can’t be
changed on the fly
Deleter (removing attributes
dynamically by del)
Overloading
(Dispatch function by signature)
OverloadingOverload only by
first argument
@overload (Static type)
@singledispath
@multipledispatch
Initializing class variablesInitializer Lists
Constructor
ConstructorConstructor
ConstructorClassName()ClassName()__init__()
Destructor~ClassName()delete()__del__()
Special
methods
Special member functions(no name)
method that control specific behaviors
Magic/Dunder methods
Operator overloadingoperatoroperator methods to defineDunder methods
Resource
Self-cleanup
RIAAonCleanup(): make a dummy object with cleanup operation as destructor to be removed when it goes out of scopewith Context Managers
Python allows adding members (attributes) on the fly with setattr(), which includes methods. MATLAB’s dynamicprops allows adding properties (data members) on the fly with addprop

onCleanup() does not work reliably on Python because MATLAB’s object destructor time is deterministic (native types are up to garbage collector) while Python leaves it up to garbage collector.

Functional Programming Constructs

CommonC++MATLABPython
Function as
variable
Functors
(Function Objects)
operator()
Function HandleCallables
(Function Objects)
__call__()
Lambda
Syntax
Lambda
[capture](inputs) {expr} -> optional trailing return type
Anonymous Function
@(inputs) expr
Lambda
lambda inputs: expr
Closure
(Early binding): an
instance of function objects
Capture [] only as necessary.

Early binding [=] is capture all.
Early binding ONLY for anonymous functions (lambda).

Late binding for function handles to loose or nested functions.
Late binding* by default, even for Lambdas.

Can capture Po through default values
lambda x,P=Po: x+P
(We’re relying users to not enter the captured/optional input argument)
Concepts of Early/Late Binding also apply to non-lambda functions. It’s about when to access (usually read) the ‘global’ or broader scope (such as during nested functions) variables that gets recruited as a non-input variable that’s local to the function itself.

An instance of a function object is not a closure if there’s any parameter that’s late bound. All lambdas (anonymous functions) in MATLAB are early bound (at creation).

The more proper way (without creating an extra optional argument that’s not supposed to be used, aka defaults overridden) to convert late binding to early binding (by capturing variables) is called partial application, where you freeze the parameters (to be captured) by making them inputs to an outer layer function and return a function object (could be lambda) that uses these parameters.

The same trick (partial application) applies to bind (capture) variables in simple/nested function handles in MATLAB which do behave the same way (early binding) like anonymous functions (lambda).

Currying is partial application one parameter at a time, which is tedious way to stay faithful to pure functional programming.

Coroutines / Asynchronous Programming

MATLAB natively does not support coroutines.

CommonC++20Python
GeneratorsInput IteratorsFunctions that yield instead of return
(Implicitly return a generator/functor with iter and next)
Coroutines

Loading