Pandas DataFrame in Python (1): Disadvantage of using attributes to access columns

There are two ways to access columns in a DataFrame. The preferred way is square brackets (indexing into it like a dictionary). It’s tempting to use the neater dot notation (treating columns like attributes), but my recommendation is: don’t!

Python has dictionaries that handle arbitrary labels well, but it doesn’t have dynamic field names like MATLAB does. This puts DataFrame at a disadvantage when it comes to dot notation syntax, while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes dot notation very half-baked in Python, and it’s better to avoid it altogether.

Reason 1: Cannot create new columns with dot notation

UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access

Reason 2: Only column names that happen to be valid Python attribute names AND do not clash with an existing DataFrame method or attribute can be accessed through dot notation.

Take as an example a DataFrame constructed from the device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The table T looks like this:

I tried dir() on the table and noticed:

  • The column name "test me" did not appear anywhere, not even mangled. It has a space in it, so it’s not a valid attribute or variable name, and the column is effectively hidden from dot notation.
  • flags is an internal attribute of DataFrame, and it was not overridden by the data column flags when accessed through dot notation. This means the flags column was also hidden from dot notation, as there was no mangled name for it either.

Even weirder, getattr() works for columns whose names are not valid attribute names, like test me, even though dot notation cannot reach them (Python has no dynamic field name syntax) and test me doesn’t show up in dir(). Meanwhile, getattr(T, 'flags') still returns the DataFrame’s internal attribute flags instead of the column called flags, just as the dot notation does.
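Here is a small sketch that reproduces the observations above. The table is a made-up stand-in for T (the real device info columns are not shown here), and it assumes a reasonably recent pandas where DataFrame.flags exists as an attribute.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the device info table T; the values are made up.
T = pd.DataFrame({
    "serial": ["A", "B"],
    "flags": [2, 2],
    "test me": [10, 20],
})

# Reason 1: assigning a list to a new attribute name only sets a plain Python
# attribute and emits a UserWarning; no column is created.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    T.new_col = [1, 2]
created_column = "new_col" in T.columns
warned = any(issubclass(w.category, UserWarning) for w in caught)

# Reason 2: a column named like an existing DataFrame attribute is shadowed
# by that attribute under dot notation, but not under brackets.
dot_flags_is_column = isinstance(T.flags, pd.Series)
bracket_flags_is_column = isinstance(T["flags"], pd.Series)

# A name with a space is not listed by dir(), yet getattr() still reaches it.
listed_in_dir = "test me" in dir(T)
via_getattr = getattr(T, "test me").tolist()

# Brackets work in every case, including creating a column with a space.
T["new col"] = [1, 2]
```

Square brackets behave the same for every column name, which is the whole point of sticking to them.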


Dictionary of equivalent/analogous concepts in programming languages

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Variable arguments | <stdarg.h>; T f(...); Packed in va_arg | Very BAD! Cannot overload when signatures are uncertain. | varargin, varargout; Both packed as cells. MATLAB does not have named arguments | *args (simple, stored as tuples); **kwargs (specify input by keyword, stored as a dictionary) |
| Referencing | N/A | operator[](_) is for references | subsindex, subsasgn; [_] is for concat; {_} is for (un)pack | __getitem__(), __setitem__() |
| Logical Indexing | N/A | N/A | Native (first-class) | List comprehension (Raw Python); Boolean Indexing native with numpy/pandas |
| Default values | N/A | Supported | Not supported. Manage with inputParser() | Non-intuitive static data behavior. Stick to None or immutables. |
| Major Dimension | Row | Row | Column | Row (Native/Numpy); Column for Pandas |
| Constness | const | const | Only in classes | N/A (Consenting adults) |
| Variable Aliasing | Pointers | References | NO! Copy-on-write | References |
| = assignment | Copy one element | Values: Copy; References: Bind | New Copy; Copy-on-write | NO VALUES; Bind references only (could be to unnamed objects) |
| Chained Operations | N/A; Assignment’s value is assigned value | Difficult to get it right | Difficult to get it right. MATLAB had some chaining bugs with dataset() as well. | Chains correctly natively |
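The "Non-intuitive static data behavior" entry for Python default values deserves a quick illustration, along with how *args and **kwargs are packed. This is just a minimal sketch; the function names are made up for the example.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `bucket` shares the same list.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # None is immutable; a fresh list is created on each call instead.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def describe(*args, **kwargs):
    # Positional extras arrive as a tuple, keyword extras as a dictionary.
    return type(args).__name__, type(kwargs).__name__


first_bad = list(append_bad(1))    # [1]
second_bad = list(append_bad(2))   # [1, 2]: state leaked from the first call
first_good = append_good(1)        # [1]
second_good = append_good(2)       # [2]
packed_types = describe(1, 2, key="value")   # ('tuple', 'dict')
```

The default behaves like static data attached to the function, which is why sticking to None or immutables is the safe habit.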

Data Types

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Sets | N/A | std::set | Only set operations, not set data type | { , , ...} |
| Dictionaries | | std::unordered_map | Dynamic fieldnames (qualified varnames as keys); containers.Map() or dictionary() since R2022b | Dictionaries {key:value} (Native) |
| Heterogeneous containers | | | cells | lists (mutable); tuples (immutable) |
| Structured Heterogeneous | | | table(); dataset() [Old] | Pandas Dataframe |
| Array, Matrices & Tensors | | | Native [ , ; , ] | Numpy/PyTorch |
| Records | struct | class (members) | dynamic field (structs); properties (class); getfield()/setfield() | No structs (use dicts); attribute (class) |
Native set operations in Python are not stable (they do not preserve the input order), and there’s no option to use a stable algorithm like MATLAB has. Consider installing the orderly-set package.
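If installing a package is not an option, order-preserving set operations can be approximated with the standard library alone. The helpers below are hypothetical names of my own (this is not the orderly-set API), and they rely on dictionaries keeping insertion order (guaranteed since Python 3.7).

```python
def stable_unique(items):
    # dict keys keep insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(items))


def stable_intersect(a, b):
    # Keep the order of first appearance in `a`; the set is only for lookups.
    lookup = set(b)
    return [x for x in stable_unique(a) if x in lookup]


def stable_union(a, b):
    # Everything from `a` first, then whatever is new in `b`.
    return stable_unique(list(a) + list(b))


a = ["pear", "apple", "fig", "apple"]
b = ["fig", "kiwi", "pear"]

unique_a = stable_unique(a)      # ['pear', 'apple', 'fig']
common = stable_intersect(a, b)  # ['pear', 'fig']
merged = stable_union(a, b)      # ['pear', 'apple', 'fig', 'kiwi']
```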

Editor Syntax

| Common | C | C++ | MATLAB | Python |
|---|---|---|---|---|
| Commenting | /* ... */; // (only for newer C) | // (single line); /* ... */ (block) | % (single line); (Block): %{ ... %} | # (single line); """ or ''' is docstring which might be undesirably picked up |
| Reliable multi-line commenting (IDE) | | | Ctrl+(Shift)+R (Windows), / (Mac or Linux) | [Spyder]: Ctrl+1 (toggle), 4 (comment), 5 (uncomment) |
| Code cell (IDE) | | | %% | [Spyder]: # %% |
| Line Continuation | \ | \ | ... | \ |

Object Oriented Programming Constructs

| Common | C++ | MATLAB | Python |
|---|---|---|---|
| Getters / Setters | No native syntax. Name mangle (prefix or suffix) yourself to manage | Define methods: get.x, set.x | Getter: @property def x(self): ...; Setter: @x.setter def x(self, value): ... |
| Deleters | Members can’t be changed on the fly | Members can’t be changed on the fly | Deleter (removing attributes dynamically by del) |
| Overloading (Dispatch function by signature) | Overloading | Overload only by first argument | @overload (Static type); @singledispatch; @multipledispatch |
| Initializing class variables | Initializer Lists; Constructor | Constructor | Constructor |
| Constructor | ClassName() | ClassName() | __init__() |
| Destructor | ~ClassName() | delete() | __del__() |
| Special methods | Special member functions | (no name) method that control specific behaviors | Magic/Dunder methods |
| Operator overloading | operator | operator methods to define | Dunder methods |
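To make the Python column of the table above a bit more concrete, here is a minimal sketch of a property with getter, setter and deleter, plus dispatch on the first argument through functools.singledispatch. The Sensor class and the show function are made-up names for illustration only.

```python
from functools import singledispatch


class Sensor:
    def __init__(self, x):
        self._x = x          # leading underscore: "private" by convention only

    @property
    def x(self):             # getter, used as sensor.x
        return self._x

    @x.setter
    def x(self, value):      # setter, used as sensor.x = value
        if value < 0:
            raise ValueError("x must be non-negative")
        self._x = value

    @x.deleter
    def x(self):             # deleter, used as del sensor.x
        del self._x


@singledispatch
def show(value):
    # Fallback when no more specific type is registered.
    return f"object: {value!r}"


@show.register
def _(value: int):
    return f"int: {value}"


@show.register
def _(value: str):
    return f"str: {value}"


sensor = Sensor(1)
sensor.x = 5                 # goes through the setter
shown = [show(3), show("a"), show(2.5)]
```

Note that singledispatch only looks at the type of the first argument, much like the "overload only by first argument" situation in MATLAB.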

Functional Programming Constructs

| Common | C++ | MATLAB | Python |
|---|---|---|---|
| Function as variable | Functors (Function Objects); operator() | Function Handle | Callables (Function Objects); __call__() |
| Lambda Syntax | Lambda (C++11) | Anonymous Function | Lambda |
| Closure (Early binding): an instance of lambda | Capture [] only as necessary. Early binding [=] is capture all. | Early binding ONLY. Takes snapshot of workspace values involved when instantiated (anonymous function object is created) | Late binding* by default. Can capture Po through default values: lambda x,P=Po: x+P (We’re relying on users to not enter the captured/optional input argument) |
The concepts of early/late binding also apply to non-lambda functions. It’s about when the function accesses (usually reads) variables from the ‘global’ or broader scope (such as the enclosing function of a nested function) that get recruited as non-input variables inside the function itself.

The more proper way to convert late binding to early binding (by capturing variables), without creating an extra optional argument that’s not supposed to be used (i.e. a default that callers could override), is called partial application: you freeze the parameters to be captured by making them inputs to an outer function, which returns a function object (could be a lambda) that uses these parameters.

Currying is partial application one parameter at a time, which is a tedious way to stay faithful to pure functional programming.
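A minimal sketch of the difference, reusing the Po name from the table. The helper names (make_adder, add, curried_add) are made up for this example; functools.partial is the standard-library tool for partial application.

```python
from functools import partial

Po = 10

late = lambda x: x + Po                 # Po is looked up when `late` is called
early_default = lambda x, P=Po: x + P   # Po is evaluated now, stored as a default


def make_adder(P):
    # Partial application via an outer function: P is frozen in the closure.
    return lambda x: x + P


def add(x, P):
    return x + P


early_closure = make_adder(Po)
early_partial = partial(add, P=Po)      # standard-library partial application


def curried_add(x):
    # Currying: one parameter per call, so add(x, P) becomes curried_add(x)(P).
    return lambda P: x + P


Po = 100   # rebinding the global only affects the late-binding lambda

results = {
    "late": late(1),                            # 101
    "early_default": early_default(1),          # 11
    "early_closure": early_closure(1),          # 11
    "early_partial": early_partial(1),          # 11
    "curried": curried_add(1)(2),               # 3
    "overridden_default": early_default(1, 5),  # 6: the default can be overridden
}
```

The last entry shows why the default-value trick is the less proper option: nothing stops a caller from passing the "captured" argument.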


HP 54600 Series (First Gen) Module Compatibility Reasoning

The modules are categorized into these characteristics:

  • Plain (oldest, compatible with all): 54650A (GPIB), 54651A (Serial), 54652A (Parallel Printer)
  • Test Automation (TAM) License/Memory: 54655A (GPIB), 54656A (Serial + 5 output lines)
  • FFT/Time & Math License/Memory: 54657A (GPIB), 54659B (Serial+Parallel)
  • Serial + Parallel: 54652B (no FFT), 54659B (with FFT)

The matching oscilloscopes/logic analyzers are sorted into 3 main sub-generations:

  • Too Old (Cannot understand Serial+Parallel): 5460XA, 54610A
  • Everything in between: 5460XB, 54610B, 54620X
  • Too New (Cannot understand TAM): 54615/6B (I suspect C too), 54645A/D

Logic analyzers (54620A/C) are considered “Everything in between”, and they gleefully disregard the Test Automation/FFT features as those are only relevant to analog signals.

Only FFT modules have an RTC to keep time. TAM modules are too primitive to have this.

The “Too Old” scopes have newer firmware available that handles FFT (if the firmware is too old, you need to upgrade it by a chip swap), but they still don’t understand multiplexing serial & parallel lines, so they are stuck with 54657A.

54657A covers the broadest range of oscilloscopes (everything).

If you want the FFT and a serial port together, there’s only one choice, the 54659B, and you have to avoid the “Too Old” oscilloscopes.

It’s hard to keep track of the compatibility matrix below, which is why this blog post explains the reasoning by categories above. It really boils down to which features are too new (multiplexing the serial+parallel port) for old firmware and which features (TAM) the newest firmware dropped support for.
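The category reasoning can be written down as a tiny lookup. This is only a sketch of the rules as stated in this post: the feature tags and generation names are my own labels, and it does not model the firmware upgrade that “Too Old” scopes may need for FFT.

```python
# Feature tags per module, transcribed from the categories in this post.
MODULE_FEATURES = {
    "54650A": set(),
    "54651A": set(),
    "54652A": set(),
    "54655A": {"TAM"},
    "54656A": {"TAM"},
    "54657A": {"FFT"},
    "54652B": {"SERIAL_PARALLEL"},
    "54659B": {"FFT", "SERIAL_PARALLEL"},
}

# Features each scope sub-generation cannot understand.
UNSUPPORTED = {
    "too_old": {"SERIAL_PARALLEL"},
    "in_between": set(),
    "too_new": {"TAM"},
}


def is_compatible(module, generation):
    # Compatible when the module needs nothing the scope cannot understand.
    return not (MODULE_FEATURES[module] & UNSUPPORTED[generation])


# Modules that work on every sub-generation.
works_everywhere = sorted(
    m for m in MODULE_FEATURES
    if all(is_compatible(m, g) for g in UNSUPPORTED)
)
```

Under these rules, the modules that work everywhere are the plain ones plus the 54657A, which matches the reasoning above.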


Spyder traps for MATLAB users (1): By default, Spyder’s F5/Run executes the script from clean workspace.

This is another example of an open source project not going through a comprehensive use case study before changing the default behavior, which ends up pulling the rug out from under some users.

This time it’s Spyder’s good intentions: trying to proactively prevent user mistakes (such as not keeping track of the workspace) throws off the people who meticulously understand their workspace.

I was working with an FT4222 device, which should not be opened again if it’s already open, i.e. when the ft4222 class object exists. So naturally, like in MATLAB, at the top of the script I check whether the device object already exists and only create/open it when it’s not already there, like this:

if 'dev' in locals():
    pass
else:
    print('Branch')
    dev = ft4222.openByDescription('FT4222 A')

To my surprise, it doesn’t work. 'dev' in locals() always returns False every time I press F5, even though when I check again after the script runs, the variable is indeed there and 'dev' in locals() returns True. WTF?!

Turns out I was not alone! Somebody had the exact same idiom as I did. Spyder 4 changed the default behavior, and we are supposed to manually check this dialog box entry so the scripts do not run off a clean slate when we press F5!

Spyder 5
Spyder 6

It’s an extremely terrible idea to have the IDE muck with the state by default. In MATLAB, if we want the script to start with a clean state, we either put clear at the top of the script or use clearvars -except to keep selected variables.

It’s even harder to catch Spyder’s insidious new default behavior given that F5/Run runs the script from a clean slate and then dumps the values into the workspace. The result is a merge between the pre-existing variables in the locals() workspace and the results of the script run from a blank state!
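To show why the idiom breaks, here is a toy model of the two run modes using exec(). This is not Spyder’s actual implementation, just a sketch of "run in a clean namespace, then merge into the workspace" versus "run in the existing workspace". The function names are made up, and a plain object() stands in for the device.

```python
SCRIPT = """
if 'dev' in locals():
    branch = 'reused'
else:
    branch = 'opened'
    dev = object()   # stand-in for ft4222.openByDescription('FT4222 A')
"""


def run_in_current_namespace(source, workspace):
    # The script sees whatever is already in the workspace.
    exec(source, workspace)


def run_in_clean_namespace(source, workspace):
    # The script runs against an empty namespace; its results are then
    # merged into the existing workspace.
    fresh = {}
    exec(source, fresh)
    fresh.pop("__builtins__", None)
    workspace.update(fresh)


workspace = {}
clean_runs = []
for _ in range(2):
    run_in_clean_namespace(SCRIPT, workspace)
    clean_runs.append(workspace["branch"])
dev_visible_afterwards = "dev" in workspace

workspace = {}
shared_runs = []
for _ in range(2):
    run_in_current_namespace(SCRIPT, workspace)
    shared_runs.append(workspace["branch"])
```

In the clean-namespace mode the script takes the "open" branch on every run even though dev is sitting in the workspace afterwards, which is exactly the confusing symptom.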

The people who decided to change this default behavior certainly didn’t think it through and rushed to do the obvious to please the careless programmers. If a programmer made a mistake by re-running the script without clearing the workspace and was impacted by the dirty variables, they can always reset everything and get out of it (and learn from the experience that they should clean up the dirty state). However, somebody who knows what they are doing will not be able to figure out what they did wrong until they search for a behavior that looks more like a bug in Spyder/Python! It’s just a horrible design choice! MATLAB doesn’t casually throw users off like this. Damn!


Also, I looked into code cells # %% (MATLAB has the equivalent %%), but there’s another annoyance in Spyder: block commenting through """ or ''' pairs is interpreted as an output string by runcell()! In other words, runcell() outputs docstrings! So every time you execute the cell, the code you commented out is concatenated into one long raw string with escape characters and pollutes your console screen! Damn!


Spyder annoyances (3): The shortcut key Ctrl+D to reset the console only works when nothing is half-typed in the console.


Norton Ghost Behavior for NTFS images modified by GhostExplorer

I discovered today the hard way (wasted time) that NTFS images modified by the new Ghost Explorer (AFTER and not including v8.3), i.e. with files injected or deleted, behave as if they were unmodified (changes not committed) when you try to restore the image with an old Ghost IF YOU DID NOT RECOMPILE!

First of all, injecting/deleting files with Ghost Explorer works like a journaling file system: the changes are tacked onto the end of the file (added files are appended to the end, and deletion doesn’t actually delete, but appends extra info saying the file is marked as deleted).

This means that until you RECOMPILE (File -> Compile, which shows up as a “Save As” dialog box), the original image stays there.

New Ghosts compatible with the file injection/deletion mechanism will respect the extra info tacked on at the end and correctly skip the old files that are still present when deploying the image. Old Ghosts that don’t recognize the extra stuff written by the new Ghost Explorer will just ignore it, so your image restores as if it were the old, unmodified image.
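As a mental model only (this is not the real Ghost image format, and every name here is made up), the behavior can be sketched as a base image plus an appended journal that only newer readers understand. It does not model the separate compatibility problem recompiled images have with older Ghosts.

```python
def inject(image, name, data):
    # Changes are appended to the journal; the base records are untouched.
    image["journal"].append(("add", name, data))


def delete(image, name):
    # Deletion only appends a marker; the file stays in the base.
    image["journal"].append(("delete", name, None))


def restore(image, understands_journal):
    files = dict(image["base"])
    if understands_journal:
        # A newer reader replays the appended changes in order.
        for action, name, data in image["journal"]:
            if action == "add":
                files[name] = data
            else:
                files.pop(name, None)
    return files


def compile_image(image):
    # Recompiling bakes the journal into a fresh base with nothing appended.
    return {"base": restore(image, understands_journal=True), "journal": []}


image = {"base": {"a.txt": "old", "b.txt": "keep"}, "journal": []}
delete(image, "a.txt")
inject(image, "c.txt", "new")

new_view = restore(image, understands_journal=True)
old_view = restore(image, understands_journal=False)
compiled_view = restore(compile_image(image), understands_journal=False)
```

Before compiling, the old reader still sees a.txt and never sees c.txt; after compiling, the changes live in the base itself.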

Turns out Ghost Explorer 8.3 or before cannot update files in NTFS partition images.

I’ve experimented with that and found that new Ghosts clone the updates (before compiling) correctly while old Ghosts clone as if it’s the unmodified image. Of course, both of them clone correctly after recompilation.

Recompiling is a lengthy process that actually goes in and deletes the orphaned files and injects the new files, but it needs to be done if you want to save space.

As for compatibility, recompiled Ghost images do not work on older Ghosts. So you’ll need to restore the image onto a physical disk (or a mounted VHD) using the new Ghost and create the image with the old Ghost.
