Pandas DataFrame in Python (1): Disadvantage of using attributes to access columns

There are two ways to access columns in a DataFrame. The preferred way is square brackets (indexing into it like a dictionary). It’s tempting to use the neater dot notation (treating columns like attributes), but my recommendation is: don’t!

Python has dictionaries that handle arbitrary labels well, but it doesn’t have dynamic field names like MATLAB does. This puts DataFrame at a disadvantage when developing a dot notation syntax, while the dictionary syntax opens up a lot of possibilities that are worth giving up dot notation for. The nature of the language design makes dot notation very half-baked in Python, and it’s better to avoid it altogether.

Reason 1: Cannot create new columns with dot notation

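Assigning a list-like value to a new attribute name does not add a column. It only sets a plain Python attribute on the object, and pandas warns:

UserWarning: Pandas doesn't allow columns to be created via a new attribute name - see https://pandas.pydata.org/pandas-docs/stable/indexing.html#attribute-access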
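A minimal sketch of the behaviour behind this warning. The DataFrame and the column names a and b are made up for illustration; the warning is captured rather than printed so the result can be checked.

```python
import warnings

import pandas as pd

# Hypothetical toy table, just for illustration
df = pd.DataFrame({"a": [1, 2, 3]})

# Try to create a new column "b" with dot notation
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df.b = [4, 5, 6]  # sets an ordinary attribute on the object, not a column

created_by_dot = "b" in df.columns
warned = any(issubclass(w.category, UserWarning) for w in caught)

# Square brackets do create the column
df["b"] = [4, 5, 6]
created_by_brackets = "b" in df.columns

print(created_by_dot, warned, created_by_brackets)
```

With square brackets the same syntax works for both reading and creating columns, so there is no special case to remember.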

Reason 2: Only column names that happen to be valid Python attribute names AND do not clash with an existing DataFrame attribute or method can be accessed through dot notation.
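A small sketch of both restrictions, using a made-up table (the column names price, count and unit cost are hypothetical and chosen only to trigger each case). count collides with the DataFrame.count method, and unit cost is not a valid identifier.

```python
import pandas as pd

# Hypothetical table: one "safe" name, one that clashes with a method,
# and one that is not a valid Python identifier
df = pd.DataFrame(
    {"price": [1.5, 2.0], "count": [3, 4], "unit cost": [0.5, 0.5]}
)

price_by_dot = df.price          # works: valid identifier, no clash
count_by_dot = df.count          # the DataFrame.count method, not the column
count_by_brackets = df["count"]  # the actual column
unit_cost = df["unit cost"]      # df.unit cost would be a syntax error

print(type(price_by_dot).__name__)
print(callable(count_by_dot))
print(count_by_brackets.tolist(), unit_cost.tolist())
```

Square brackets behave the same way for all three columns, which is the main argument for using them everywhere.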

Take the example of a DataFrame constructed from the device info dictionaries created by the package pyft4222. I added a column called 'test me' to a table converted from the dictionary of device info. The table T looks like this:

I tried dir() on the table and noticed:

  • The column name "test me" did not appear anywhere, not even mangled. It contains a space, so it’s not a valid attribute or variable name, and the column is effectively hidden from dot notation
  • flags is an internal attribute of DataFrame, and it was not overridden by the data column flags when accessed through dot notation. This means the flags column was also hidden from dot notation, as there was no mangled name for it either

Even weirder, getattr() works for columns whose names are not valid attribute names, like test me, even though dot notation cannot reach them (Python has no dynamic field name syntax) and test me doesn’t show up in dir(). Meanwhile, getattr(T, 'flags') still returns the DataFrame’s internal attribute flags instead of the column called flags.
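The observations above can be reproduced with a stand-in table. This is only a sketch: the values are invented, serial is a hypothetical extra column, and it assumes pandas 1.2 or later, where DataFrame has a flags attribute. It does not use pyft4222.

```python
import pandas as pd

# Stand-in for the device info table; values are invented for illustration
T = pd.DataFrame(
    {"flags": [0, 1], "test me": ["x", "y"], "serial": ["A", "B"]}
)

names = dir(T)
serial_listed = "serial" in names    # identifier-like columns are listed
test_me_listed = "test me" in names  # names with a space are not

# getattr() still reaches the column that dot notation cannot spell
test_me_col = getattr(T, "test me")

# The flags attribute of DataFrame shadows the flags column
flags_is_column = isinstance(getattr(T, "flags"), pd.Series)

# Square brackets always return the column
flags_col = T["flags"]

print(serial_listed, test_me_listed, flags_is_column)
print(test_me_col.tolist(), flags_col.tolist())
```

Only T["flags"] and T["test me"] behave predictably here, regardless of how the column happens to be named.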
