Modifying mutable (like bytearray) arguments’ data in Python functions Use slice assignment x[:] = ... to replace the entire contents. Dynamically typed language lets you shadow input arguments with a local variable!

I’d like to write a function to selectively modify lines read from a file handle and write it back. By default, lines are read as byte()  objects that are immutable, so I converted it to bytearray() instead so it can be modified because only a few lines meeting certain criteria needs to be changed.

When I try to refactor similar operation into a function, I was hoping to pass the mutable bytearray() as an argument and directly modify the caller’s content like in C++, given Python variables works LIKE reference binding.

I know bytearray.replace() does not modify the data in place, but instead outputs the modified line to a new variable. Normally, I can simply do this:

line = line.replace(b'\tCLASS', b'')

and the code will work. However, it doesn’t do anything when I try to pass it as an argument to a Python function (unless I return line as output). Although I am well aware that Python variables assignments to existing variables means orphaning the old data and re-purposing the label, the variable assignment behavior in Python requires careful thought when used in non-idiomatic situation.

In other words, I want this function to have side effects on the variable ‘line‘, but I wasn’t doing it right. This is a tempting mistake for people with a C/C++ background: in C/C++, it is not possible to shadow an input parameter even if we were to explicitly declare it, so the innocent assignment I did above has to modify the object in the caller (passed as a reference to the function) in C/C++, as if I did this directly in the caller.

However, in Python, variables do not need to be declared (aka, dynamically typed). This opens up the possibility of unwittingly shadowing the input parameters, which is what happened here. Mutable arguments on the stack still can be modified through the function, but when you assign a variable using ‘=’ operator, a new local variable with the name on the LHS is created, which shadows the input parameter.

This means the connection to the caller objects is lost during shadowing.

The correct way to do this is use slice assignment (which the logic/concept is very different despite the syntax is similar) to replace all the contents of the input variable with the output of bytearray.replace():

def remove_from_header_token_CLASS(tokens, line):
     # line is expected to be byte array (mutable)    
    try:
        column_CLASS = tokens.index(b'CLASS')
    except:
        column_CLASS = None
    else:
        line[:] = line.replace(b'\tCLASS', b'')  
                
    return column_CLASS

Since Python has a clear distinct concept of parameter variable (from local variable), trying to apply nonlocal keyword over it (in hopes to broaden the scope) will not parse/compile.


This is actually the same behavior as in MATLAB (dynamic typing) for the same reason that variables does not have to be declared like in C/C++ (static typing). In MATLAB, if you choose to have a handle object (which works like references), you can shadow the input argument by creating a local variable of the same name:

classdef DemoHandleClass < handle
    properties
        x = 3;
    end
end

function demo_shadowing()
    C = DemoHandleClass();
    f(C);
    
    disp(C.x)
end

function f(C)
    C = DemoHandleClass();  // Shadowing
    C.x = 14;
end

The above MATLAB program will display 14 without shadowing and 3 with shadowing (C became a new local variable that has nothing to do with the input argument C). MATLAB users rarely run into this because the language design heavily discourage side-effects: we are supposed to return the changed local variable to the caller. The only way to do side-effects in MATLAB is through handles (which you need to establish a class, which is clumsy). Technically you can write the data to external resources (e.g. file) and read it back. But guess what? Resources are accessed through handles, so there’s no escape.


Of course, there’s a better way to do so (MATLAB’s preferred way): return the modified object back to the caller as if they are immutable:

def remove_from_header_token_CLASS(tokens, line):
     # line DOES NOT HAVE TO BE MUTABLE    
    try:
        column_CLASS = tokens.index(b'CLASS')
    except:
        column_CLASS = None
    else:
        line = line.replace(b'\tCLASS', b'')  
                
    return column_CLASS, line

This is what I ultimately used (so I ended up not converting the byte lines to bytearray), given that Python’s tuple syntax make it easy to return multiple outputs like MATLAB. The call ended up looking like this:

column_SPL_CLASS, line = remove_from_header_token_CLASS(tokens, line)                

Nonetheless, I think there’s an important lesson to be learned for doing side-effects in dynamically typed languages. Maybe I’ll need this one day if I get an excuse to do something more complicated that genuinely requires side-effects.


In summary, variable assignments in most dynamically typed languages will shadow the input argument with a newly generated local variable instead of modifying the data in the original input argument. This implies that there function side-effects cannot be carried out through variable assignment.

The most common implication is: do not (equality) assign to a input variable to modify its contents in a dynamically typed language.

Loading

Spyder on MX Linux

This is another example that non-commercial (open-source) Linux/Python does not have a feel of a finished product: things break out of the box when installed fresh, in the most simple, expected ways, without any tweaks.

Again, don’t get me wrong, open-source free software are good stuff (more modern concepts and people working on it for free), but it’s never going to beat professional companies (like Microsoft/MATLAB) in how well-funded they are so they can maintain their software and the user experience using their profits. So far, users are still expected to put up with a bunch of unjustifiably unnecessary work to get to where they want to go with community-maintained software like Linux/Python.

This time I’m installing Spyder on MX Linux. Look at how many damn hoops I have to jump to get Spyder 3 to function properly there:

  1. I installed python3-spyder from MX Package Installer
  2. Spyder complained about missing rope on start
  3. Installed python-rope on MX Package Installer. The complaint still won’t go away
  4. I tried follow the instruction sudo pip3 install rope_py3k  and realized pip3 was not installed already with the Python that came with Spyder! (Didn’t have the problem with the Windows counterpart).
  5. Installed python3-pip from MX Package Installer.
  6. Came back and run pip3 install rope_py3k. It choked at "Command 'python  setup.oy egg_info' failed with error code 1 in /tmp/pip-build-0nnknjhi/rope-py3k". Again, known problem.
  7. Followed the solution in the comments pip3 install --upgrade setuptools
  8. Then come back and run pip3 install rope_py3k again. It says "Failed building wheel for rope-py3k" in between, but nonetheless I’ll try to move on since it says "Successfully installed rope-py3k-0.9.4.post1"

Then Spyder launch uneventfully.

These are not design decisions (sacrificing one quality for another), but inter-operability wrinkles that nobody are paid enough to do the grunt-work babysitting it. So if your business profits heavily from it, consider sponsoring the developers!


It’s also slightly annoying that the version of Spyder maitained in MX Linux’s most recent repository is a little older than what’s actually available (3.1.3+dfsg1-3 instead of 3.3.4).

At first I followed instructions to have PIP to update it: pip3 install -U spyder, but it doesn’t work. I got a lot of “failed building wheel for (package)” error.

I also realized the Python that came with it is 3.5, not the 3.7(.3) I had in Windows. I checked the MX package manager and indeed it stopped at 3.5. After some searching, I learned the base system package was frozen in 2016! MX Test Repo (at your own risk) has Python 3.7 though.

Loading

Watch out for ‘const’ method in Python

One thing I feel a little bit not quite as intuitive when I switch to Python is I constantly have to look up whether the method directly updates the contents or it’ll return a different object (of the same type) that I’ll have to overwrite the input variable myself.

An example would be strings and bytes object. replace() sounded like an updating method, but it’s actually a ‘const’ method (a term borrowed from C++ to say that the method does not have side-effects) that does not change the state of the object.

I initially thought this has to do with whether the object is immutable or not, but I tried it on bytearray objects (which is mutable), replace() behaves consistently with the identically named methods in other immutable objects (bytes object, string object): you’ll need to assign the output to self (basically bind the name to the temporary and throw away the original).

bts = b'test'
bts.replace('es', 'oas')       # dumps the output to workspace (can be accessed by _) and do nothing else
bts = bts.replace('es', 'oas') # actually updates bts

 

Loading

Anonymous Functions (MATLAB) vs Lambdas (Python) Anonymous Functions in MATLAB is closure while Lambdas in Python are not

Lambdas in Python does not play by the same rules as anonymous functions in MATLAB

  • MATLAB takes a snapshot of (capture) the workspace variables involved in the anonymous function AT the time the anonymous function handle is created, thus the captured values will live on afterwards (by definition a proper closure).
  • Lambda in Python is NOT closure! [EDIT: I’ll need to investigate the definition of closure more closely before I use the term here] The free variables involved in lambda expressions are simply read on-the-fly (aka, the last state) when the functor is executed.

It’s kind of a mixed love-and-hate situation for both. Either design choice will be confusing for some use cases. I was at first thrown off by MATLAB’s anonymous function’s full variable capture behavior, then after I get used to it, Python’s Lambda’s non-closure tripped me. Even in the official FAQ, it address the surprise that people are not getting what they expected creating lambdas in a for-loop.

To enable capture in Python, you assign the value you wanted to capture to a lambda input argument (aka, using a bound variable as an intermediary and initialize it with the free variable that needs to be captured), then use the intermediary in the expression. For example:

lambda: ser.close()      # does not capture 'ser'
lambda s=ser: s.close()  # 'ser' is captured by s.

I usually keep the usage of nested functions to the minimum, even in MATLAB, because effectively it’s kind of a compromised ‘global’ between nested levels, or a little bit like protected classes in C++. It breaks encapsulation (intentionally) for functions in your inner circle (nest).

It’s often useful for coding up GUI in MATLAB quick because you need to share access to the UI controls within the same group. For GUI that gets more complicated, I actually avoided nested functions altogether and used *appdata() to share UI object handles.

Functors of nested functions are closures in both MATLAB and Python! Only Lambdas in Python behave slightly differently.

Loading