There are two ways to access columns in a DataFrame. The preferred way is square brackets (indexing into it like a dictionary). It’s tempting to use the neater dot notation (treating columns like attributes), but my recommendation is: don’t!
Python has dictionaries that handle arbitrary labels well, but it doesn’t have dynamic field names like MATLAB does. This puts DataFrame at a disadvantage in developing a dot notation syntax, while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes dot notation very half-baked in Python, and it’s better to avoid it altogether.
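To make the dictionary analogy concrete, here is a minimal sketch. The DataFrame and its column labels are made up for illustration; the point is only that bracket indexing accepts any label, including one held in a variable.

```python
import pandas as pd

# Hypothetical data; the column labels are arbitrary strings
df = pd.DataFrame({"serial number": ["A1", "B2"], "2024 count": [3, 5]})

# Bracket indexing accepts any label, including spaces and leading digits
serial = df["serial number"]

# The label can also come from a variable, much like a dynamic field name
label = "2024 count"
total = df[label].sum()
```

Neither of these columns could be reached with dot notation, since neither label is a valid Python identifier.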
Reason 1: Cannot create new columns with dot notation
UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
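A small sketch of what happens, assuming a reasonably recent pandas and a throwaway DataFrame of my own: assigning a list through dot notation should trigger the warning quoted above and only sets a plain Python attribute, while bracket assignment actually creates the column.

```python
import warnings
import pandas as pd

# Hypothetical one-column table for illustration
df = pd.DataFrame({"a": [1, 2, 3]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Sets an ordinary attribute on the object; no column is created
    df.b = [4, 5, 6]

created_by_dot = "b" in df.columns

# Bracket assignment is the way to add a column
df["b"] = [4, 5, 6]
created_by_brackets = "b" in df.columns
```

Here `created_by_dot` ends up `False` and `created_by_brackets` ends up `True`; any warnings raised are collected in `caught` for inspection.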
Reason 2: Only column names that happen to be valid Python attribute names AND do not collide with an existing DataFrame method or attribute can be accessed through dot notation.
Take the example of a DataFrame constructed from device info dictionaries created by the package `pyft4222`. I added a column called 'test me' to a table converted from the dictionary of device info. The table `T` looks like this:
[Image: the table T]
I tried `dir()` on the table and noticed:
- The column name "test me" did not appear anywhere, not even mangled. It contains a space, so it’s not a valid attribute or variable name, and this column is effectively hidden from dot notation.
- `flags` is an internal attribute of DataFrame, and it was not overridden by the data column `flags` when accessed through dot notation. This means the `flags` column was also hidden from dot notation, as there was no mangled name for it either.
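The same behaviour can be reproduced without the hardware. The sketch below uses a hand-made stand-in table (the `serial` column and all values are hypothetical, not real pyft4222 output) and assumes a pandas version where `DataFrame.flags` exists (1.2 or later, as far as I know).

```python
import pandas as pd

# Hypothetical stand-in for the device info table
T = pd.DataFrame({
    "serial": ["A1", "B2"],
    "flags": [0, 2],
    "test me": [True, False],
})

names = dir(T)
plain_name_listed = "serial" in names    # valid identifier, no collision
spaced_name_listed = "test me" in names  # not a valid identifier

# Dot access resolves to the DataFrame's own attribute, not the column
dot_is_column = isinstance(T.flags, pd.Series)

# Bracket access always returns the column
bracket_is_column = isinstance(T["flags"], pd.Series)
```

Only `serial` is reachable by dot notation here: "test me" never shows up in `dir(T)`, and `T.flags` returns the DataFrame's own attribute rather than the data, while `T["flags"]` and `T["test me"]` both work as expected.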