There are two ways to access columns in a DataFrame. The preferred way is square brackets (indexing into it like a dictionary). It’s tempting to use the neater dot notation (treating columns like attributes), but my recommendation is: don’t!
Python has dictionaries that handle arbitrary labels well, but it doesn’t have dynamic field names like MATLAB does. This puts DataFrame at a disadvantage in developing a dot notation syntax, while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes dot notation very half-baked in Python, and it’s better to avoid it altogether.
Reason 1: Cannot create new columns with dot notation
UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access
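As a minimal sketch of how this warning comes about (the frame and the column names a, b and c are made up for illustration), assigning a list to a new attribute name leaves the columns untouched, while bracket assignment creates the column:

```python
import warnings

import pandas as pd

# Illustrative frame; the names "a", "b" and "c" are assumptions.
df = pd.DataFrame({"a": [1, 2, 3]})

# Assigning a list-like value to a new attribute name makes pandas warn
# and store a plain Python attribute instead of creating a column.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.b = [4, 5, 6]

warned = any(issubclass(w.category, UserWarning) for w in caught)
print(warned)
print(list(df.columns))  # "b" is not among the columns

# Bracket assignment creates the column.
df["c"] = [4, 5, 6]
print(list(df.columns))
```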
Reason 2: Only column names that happen to be valid Python attribute names AND do not clash with an existing DataFrame attribute or method can be accessed through dot notation.
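The rule can be approximated with a small helper. This is only a rough sketch: dot_accessible is a hypothetical function, not part of pandas, and it assumes a pandas version where DataFrame has a flags attribute (1.2 or later).

```python
import keyword

import pandas as pd


def dot_accessible(df, name):
    """Rough check (hypothetical helper): can df.<name> reach the column?"""
    return (
        name in df.columns
        and isinstance(name, str)
        and name.isidentifier()            # no spaces, no leading digit, ...
        and not keyword.iskeyword(name)    # e.g. "class" cannot follow a dot
        and not hasattr(type(df), name)    # clashes with a DataFrame attribute
    )


# Made-up columns for illustration.
df = pd.DataFrame({"serial": ["A1"], "flags": [2], "test me": [True], "count": [1]})
for col in df.columns:
    print(col, dot_accessible(df, col))
```

Only serial passes here: test me is not an identifier, while flags and count collide with a DataFrame attribute and a DataFrame method respectively.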
Take the example of a DataFrame constructed from device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The table T looks like this:
[Image: the table T]
I tried dir() on the table and noticed:
- The column name "test me" did not appear anywhere, not even mangled. It has a space in it, so it’s not a valid attribute or variable name, and this column is effectively hidden from the dot notation.
- flags is an internal attribute of DataFrame, and it was not overridden by the data column flags when called by the dot notation. This means the flags column was also hidden from the dot notation, as there was no mangled name for it either.
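These observations can be reproduced with a small stand-in table. The sketch below does not use pyft4222; the serial column and all values are made up, and it assumes pandas 1.2 or later, where DataFrame has a flags attribute.

```python
import pandas as pd

# Hypothetical stand-in for the device-info table T.
T = pd.DataFrame({"serial": ["A1"], "flags": [2], "test me": [True]})

names = dir(T)
print("serial" in names)    # identifier-like column names are listed
print("test me" in names)   # a name containing a space is not

# Dot notation resolves to the DataFrame's own attribute, not the column.
print(type(T.flags))

# Square brackets reach the column regardless of its name.
print(T["flags"].tolist())
print(T["test me"].tolist())
```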
Even weirder, getattr() works for columns whose names are not valid attribute names, like test me (even though dot notation cannot reach it for lack of a dynamic field name syntax, and test me doesn’t show up in dir()). Meanwhile, getattr(T, 'flags') still gets the DataFrame’s internal attribute flags instead of the column called flags, as expected.
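A short sketch of the getattr() behaviour, again on a made-up stand-in for T and assuming pandas 1.2 or later:

```python
import pandas as pd

# Hypothetical stand-in for the device-info table T.
T = pd.DataFrame({"flags": [2], "test me": [True]})

# getattr() falls back to column lookup even for a name with a space.
col = getattr(T, "test me")
print(isinstance(col, pd.Series))

# For "flags", normal attribute lookup finds the DataFrame attribute first,
# so the column is never consulted.
attr = getattr(T, "flags")
print(isinstance(attr, pd.Series))

# Brackets are unambiguous in both cases.
print(T["flags"].tolist())
```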