Powershell notes (for MATLAB/python users)

Data Type characteristics

PowershellMATLABPython
Nearly everything is a/anObjectMatrix (APL-philosophy)Object (which are dictionaries)
Assignment behavior*Reassigned referenceCopy-on-writeReassigned reference
Monads (wrapper for heterogenous data )Array/CollectionsCellsLists

* Shallow assignment (transferring reference only) means the LHS does not have its own copy, so modifying the new reference will modify the underlying data on the RHS.

Syntax / Usage

PowershellMATLABPython
Method chainingYesMight misbehaveYes
List ComprehensionNo. Map first then filterYes
Named input argumentsNative
f -a 1 -b 22
Name-Value pairs parsed insideNative
f(a=13, b=22)
Implicit NON-NULL return valueOptional
Binary map operationNative matrix ops
*fun() does n-ary
Use numpy
list( map(operator.add, L1, L2) )
Check Type$g -is [type]is*() or isa()isinstance(val, type)
Unpacking (flattening)
monads in monads
Default
Use unary , to avoid
No
Use [{:}] to perform
No
Use *, list comp, or
list(itertools.chain(*ls))
Conditional/statement block inside container creationYes?
View Object Info with Data| Format-List -Property *
or
Format-List -InputObject
properties()
methods()
get()
List members (method and properties)’s prototypes| Get-Member

Powershell specific

  • The UNCAPTURED output value in the last line of the block is the return value! Unary side effect statements such as $x++ do not have output value. Watch out for statements that looks like it’s going nowhere at the end of the code as these are not nop/bugs, but return value. This has the same stench as fall-throughs.
  • foreach() follows the last uncaptured output value return rule above doing a 1-to-1 map from the input collection to output collection (you can assign output to foreach() as it’s also seen as a function)
  • Powershell suck at binary operations between two arrays. Just an elementwise A+B you’d be thinking in terms of loops and worry about dimensions.
  • You can put if and loop blocks inside collections list construction, like this:
@( 3, if(cond1){...; $v1}  do{...; $v2}while(cond2) )

MATLAB specific

  • When used with classes and custom matrices/arrays, chaining fields/properties/methods by indices often do not work, when they do, they often give out only the first element instead of the entire array (IIRC, there are operator methods that needs to be coordinated in the classes involved to make sure they chain correctly). In short, just don’t chain unless in very simple, scalar cases. Always output it to a variable a access the leaf.

Range & Indexing

PowershellMATLABPython
Logical IndexingYesNo. Use list comprehension/Numpy
Negative (cyclic) IndexingYesYes
end‘ of array keywordYesNo. Skip stop in slice instead
Step (skip every n items)YesYes. Both range or slice
Detect descending rangeYes
Automatic extend arrayYes
Reading array out of boundsDo nothingErrorError

Negative (cyclic) indexing along with automatic descending range, along with the lack of ‘end’ keyword is a huge pain in the rear when you want to scan from left to right like A[5:end].

Instead, you’ll have to do $A[4..($A.length-1)] because the range 4..-1 inside A[4..-1] is unrolled as 4,3,2,1,0,-1 (thus scanning from right to left and wraps around) without first consulting with the array A like the end keyword in MATLAB does so it can substitute the ends of the range with the array information before it unrolls.

I am willing to bet that this behavior does not have a sound basis other than people thinking negative indices and descending ranges alone are two good ideas without realizing that nearly nobody freaking wants to scan from right to left and wrap around!

I had the same gripes about negative indices in Python not carefully coordinating with other combinations in common use cases which cases unintuitive behavior.

Range indexing syntax

# Powershell
1..10 # No step/skip for range creation
A[1..10]  # No special treatment in array such as figuring out the 'end'

% MATLAB
A[start:(step):stop]

# Python
A[range(start,stop,step)]
# Slicing (it's not range)
A[(start):(stop):(step)] # Can skip everything 
# In Python, A=X merely reassign the label A as the alias for X.
# Modifying the reassigned A through A=X will modify underlying contents of X
# To deep-copy contents without .Clone(), assign the full slice
A[:] = X

Hasthtable / Dictionaries

% MATLAB: Use dynamic fields in struct or containers.Map()
# Python: dictionaries such as {a:1, b='x'}
# Powershell: @{a=1, b='x'}

Structs

Powershell does not have direct struct or dynamic field name struct. Instead if your object is uniform (you expect the fields not to change much), use [PSCustomObject]@{}. You can also just use simple hashtable @{}, but for some reason it doesn’t work the way I expected when put into arrays when I try to reference it by array index.

Array rules surprises

  • Array comparisons are filtering operation (not boolean array output like MATLAB). (0..9) -ge 5 gives 5 to 9, not a list of False … False, True … True. To get a boolean array, use this shortcut:
(0..9) | % {$_ -ge 5}

Map-filter combo syntax is | ? instead of Map syntax | %

  • Monad (Cells in MATLAB) are unpacked and stacked by default (in MATLAB, I had to write a lot of routines to unpack and stack cells of cells). To keep cells packed (in MATLAB lingo, it’s like ‘UniformOutput’, false in cellfun), add a comma unary operator in front of the operation that are expected to be unpacked like this:
.$_.Split('_')

Set Operations

This is one of the WTF moments of Powershell as a programming language. Convenient set operations is essential for most of the routine boring stuff that involves relational data. A lot of Powershell’s intended audience works in database like environment (like IT managers dealing with Active Directory), they have Group-Object for typical data analysis tasks, yet they make life miserable just to do basic set operations like intersection and differencing!

Powershell has a Compare-Object, but this is as unnatural and annoying to use as users are effectively rebuilding all 4 basic set-ops (intersection, union, set-diff, xor) based on any two! Not to mention you have to sift through table to get to the piece you wanted!

Basically Compare-Object out of the box

  • is a set-diff showing both directions (A\B and also B\A) at the same time. If you throw away the direction info, it’s xor.
  • if you want intersection, you’ll need to add -IncludeEqual -ExcludeDifferent
  • (WTF!) If you just specify -ExcludeDifferent, by definition there’s no output because by default Compare-Object shows you ONLY the two set-diffs and you are telling it to not show any diffs!
  • Union is specifying -IncludeEqual only. But it’d rather stack both then do a | Sort-Object - Unique

Some people might suggest doing | ? {$_ -eq $B} for intersection (or is-member). This is generally a bad idea if you have a lot of data because it’s in the O(n*n) runtime algorithm (loop-within-loop) while any properly done intersection algorithm will just sort then scan the adjacent item to check for duplicates, which gives O(n log(n)) time (typical sorting algorithm takes up most of the time).

If you noticed, it’s set operations within the outputs of Compare-Objects with the Venn diagram of -IncludeEqual -ExcludeDifferent switches! It’s doable, but totally unnecessary mindfuck that should not be repeated frequently.

In MATLAB land, I made my own overloading operators that do set operation over cellstr(), categorical and tabular objects (I went into their code and added the features and talked to TMW so they added the features later), sometimes getting into their sort and indexing logic as necessary. This shows how badly do I need set operations to come naturally.

One might not deal with it too much in low level languages like C++ (STL set doesn’t get used as much compared to other containers), but for a language made to get a lot of common things done (i.e. the language designer kind of reads the users mind), I’m surprised that the Powershell team overlooked the set operations!

Sets are very powerful abstractions that should not be made less descriptive (hard to read) by dancing around it with equivalent operations with some programming gymnastics! If these basic stuff are not built in, we are going to see a lot of people taking ugly shortcuts to avoid coding up these bread and butter functions and put it in libraries (or downloading 3rd-party libraries)!

Powershell surprises

  • Typical symbolic comparison operators do not work because ‘>’ can be misinterpreted as redirection in command prompts. Use switches like -gt (greater than) instead.
  • Redirection’s default text output uses UTF16-LE encoding (2 bytes per character). Programs assuming ASCII (1 byte per character) might not behave as intended (e.g. if you use copy command merge an ASCII/UTF8 file with UTF16-LE, you might end up with spaces in the sections that are formatted with UTF16-LE)
  • Cannot extract string matches from regex without executing a -match which returns boolean unless we use the the $matches$ spilled into variable space. Consider [regex]::Match($Text, $Pattern).Groups[1].Value
  • Methods are called with parenthesis yet functions are not called with parenthesis, just like cmd-lets! Trying to call a function with multiple input arguments with parenthesis like f(3,5) will be interpreted as calling f with ONE ARGUMENT containing an ARRAY of 3 and 5!
  • Write-Host takes everything after it literally (white spaces included, almost like echo command), with the exception of plugging in $variables! If you want anything interpreted, such as concatenation, you need to put the bracket around the whole statement!

Libraries and Modules

  • Reload module using Import-Module $moduleName -Force

Loading

Dart Language: late binding in lambda (no capture syntax)

Dart’s documentation at the time of writing (2021-10-27) is not as detailed as I hoped for so I don’t know if the Lambda (anonymous function) is late-binding or early-binding.

For those who don’t know what late/early-binding is:

Bindingearlylate
Expressionclosed (closure)open (not a closure)
Self-containedyesno
Free symbols
(relevant variables
in outer workspace)
bound/captured at creation
(no free symbols)
left untouched until evaluated/executed
(has free symbols)
Snapshotduring creationduring execution

Late-binding is almost a always a gotcha as it’s not natural. Makes great Python interview problems since the combination of ‘everything is a reference‘ and ‘reference to unnamed objects‘ spells trouble with lazy evaluation.

I’d say unlike MATLAB, which has a more mature thinking that is not willing to trade a slow-but-right program for a fast-but-wrong program, Python tries to squeeze all the new cool conceptual gadgets without worrying about how to avoid certain toxic combinations.

I wrote a small program to find out Dart’s lambda is late-bound just like Python:

void main()
{
  int y=3;
    var f = (int x) => y+x;	
    // can also write it as 
    // f(int x) => y+x;
  y=1000;	

  print( f(10) );
}
The result showed 1010, which means Dart use open lambdas (late-binding)

LanguageBinding rules for lambda (anonymous functions)
DartLate binding
Need partial application (discussed below) to enforce early binding.
PythonLate binding.
Can capture (bind early) by endowing running (free) variables (symbols) with defaults based on workspace variables
C++11Early binding (no access to caller workspace variables not captured within the lambda)
MATLABEarly binding (universal snapshot, like [=] capture with C++11 lambdas)

I tried to see if there’s a ‘capture’ syntax in Dart like in C+11 but I couldn’t find any.

I also tried to see if I can endow a free symbol with a default (using an outer workspace variable) in Dart, but the answer is no because unlike Python, default value expressions are required to be constexpr (literals or variables that cannot change during runtime) in Dart.

So far the only way to do early binding is with Partial Application (currying is different: it’s doing partial application of multiple variables by breaking it into a cascade of function compositions when you partially substitute one free variable/symbol at a time):

  1. Add an extra outer function layer (I call it the capture/binder wrapper) putting all free variables you want to bind as input arguments (which shadows the variables names outside the lambda expression if you chose not to rename the parameter variables to avoid conceptual confusion),
  2. then immediately EVALUATE the wrapper with the with the outer workspace variables (free variables) to capture them into the functor (closed lambda, or simply closure) which binder wrapper spits out.

Note I used a different style of lambda syntax in the partial application code example below

void main()
{
  var y=3;
    f(int x) => y+x;
	
    // Partial Application
    g(int w) => (int x) => w+x;	
    // Meat: EVALUATING g() AT y saves the content of y in h()
    var h = g(y);
	
  y=1000;	

  // y is free in f 
  print( f(10) );	

  // y is not involved in g until now (y=1000)	
  print( g(y)(10) );		

  // y is bounded for h when it was still y=3
  print( h(10) );
}

Only h() captures y when it’s still 3. The snapshot won’t happen if you cascade it with other lambda (since the y remains free as they show up in the lambda expression).

You MUST evaluate it at y at the instance you want y captured. In other words, you can defer capturing y until later points in your code, though most likely you’d want to do it right after the lambda expression was declared.


As a side note, I could have used ‘y’ instead of ‘w’ as parameter in the partial application statement

g(int y) => (int x) => y + x

but the ‘y‘ inside g() has shadowed (no longer refers to) the ‘y‘ in the outer workspace!

What makes it confusing here is that quite a few authors think it’s helpful (mnemonics-wise) to use the free variable (outer scope) name you are going to inject into the dummy (argument) as the name of the dummy! This gives the false impression of the variable being free while it’s bounded (through feeding it as a parameter in the wrapper/outer function)!

Scoping rules is a nasty source of confusion in understanding lambda calculus so I decide to give it a different name ‘w‘. I’m generally dismissive of shadowing under any circumstances, to the extent that I find Python’s @ syntactic sugar for decorators shadowing the underlying function which could have been reused somewhere (because it’s implicitly doing f=g(f) instead doing h=g(f). I hate it when you see the function with name f then f was repurposed to not mean the definition of f you saw right below but you have to keep in mind that it’s actually a g(f)). See my rants here.

Notational ambiguity, even if resolvable, is NOT helpful at all here especially when there are so many level of abstractions squeezed into so few symbols! People jumping into learning the subject quickly for the first time should not have unnecessarily keep track of the obscure scoping rules in their head to resolve the ambiguities!

Loading

Modifying mutable (like bytearray) arguments’ data in Python functions Use slice assignment x[:] = ... to replace the entire contents. Dynamically typed language lets you shadow input arguments with a local variable!

I’d like to write a function to selectively modify lines read from a file handle and write it back. By default, lines are read as byte()  objects that are immutable, so I converted it to bytearray() instead so it can be modified because only a few lines meeting certain criteria needs to be changed.

When I try to refactor similar operation into a function, I was hoping to pass the mutable bytearray() as an argument and directly modify the caller’s content like in C++, given Python variables works LIKE reference binding.

I know bytearray.replace() does not modify the data in place, but instead outputs the modified line to a new variable. Normally, I can simply do this:

line = line.replace(b'\tCLASS', b'')

and the code will work. However, it doesn’t do anything when I try to pass it as an argument to a Python function (unless I return line as output). Although I am well aware that Python variables assignments to existing variables means orphaning the old data and re-purposing the label, the variable assignment behavior in Python requires careful thought when used in non-idiomatic situation.

In other words, I want this function to have side effects on the variable ‘line‘, but I wasn’t doing it right. This is a tempting mistake for people with a C/C++ background: in C/C++, it is not possible to shadow an input parameter even if we were to explicitly declare it, so the innocent assignment I did above has to modify the object in the caller (passed as a reference to the function) in C/C++, as if I did this directly in the caller.

However, in Python, variables do not need to be declared (aka, dynamically typed). This opens up the possibility of unwittingly shadowing the input parameters, which is what happened here. Mutable arguments on the stack still can be modified through the function, but when you assign a variable using ‘=’ operator, a new local variable with the name on the LHS is created, which shadows the input parameter.

This means the connection to the caller objects is lost during shadowing.

The correct way to do this is use slice assignment (which the logic/concept is very different despite the syntax is similar) to replace all the contents of the input variable with the output of bytearray.replace():

def remove_from_header_token_CLASS(tokens, line):
     # line is expected to be byte array (mutable)    
    try:
        column_CLASS = tokens.index(b'CLASS')
    except:
        column_CLASS = None
    else:
        line[:] = line.replace(b'\tCLASS', b'')  
                
    return column_CLASS

Since Python has a clear distinct concept of parameter variable (from local variable), trying to apply nonlocal keyword over it (in hopes to broaden the scope) will not parse/compile.


This is actually the same behavior as in MATLAB (dynamic typing) for the same reason that variables does not have to be declared like in C/C++ (static typing). In MATLAB, if you choose to have a handle object (which works like references), you can shadow the input argument by creating a local variable of the same name:

classdef DemoHandleClass < handle
    properties
        x = 3;
    end
end

function demo_shadowing()
    C = DemoHandleClass();
    f(C);
    
    disp(C.x)
end

function f(C)
    C = DemoHandleClass();  // Shadowing
    C.x = 14;
end

The above MATLAB program will display 14 without shadowing and 3 with shadowing (C became a new local variable that has nothing to do with the input argument C). MATLAB users rarely run into this because the language design heavily discourage side-effects: we are supposed to return the changed local variable to the caller. The only way to do side-effects in MATLAB is through handles (which you need to establish a class, which is clumsy). Technically you can write the data to external resources (e.g. file) and read it back. But guess what? Resources are accessed through handles, so there’s no escape.


Of course, there’s a better way to do so (MATLAB’s preferred way): return the modified object back to the caller as if they are immutable:

def remove_from_header_token_CLASS(tokens, line):
     # line DOES NOT HAVE TO BE MUTABLE    
    try:
        column_CLASS = tokens.index(b'CLASS')
    except:
        column_CLASS = None
    else:
        line = line.replace(b'\tCLASS', b'')  
                
    return column_CLASS, line

This is what I ultimately used (so I ended up not converting the byte lines to bytearray), given that Python’s tuple syntax make it easy to return multiple outputs like MATLAB. The call ended up looking like this:

column_SPL_CLASS, line = remove_from_header_token_CLASS(tokens, line)                

Nonetheless, I think there’s an important lesson to be learned for doing side-effects in dynamically typed languages. Maybe I’ll need this one day if I get an excuse to do something more complicated that genuinely requires side-effects.


In summary, variable assignments in most dynamically typed languages will shadow the input argument with a newly generated local variable instead of modifying the data in the original input argument. This implies that there function side-effects cannot be carried out through variable assignment.

The most common implication is: do not (equality) assign to a input variable to modify its contents in a dynamically typed language.

Loading

Spyder on MX Linux

This is another example that non-commercial (open-source) Linux/Python does not have a feel of a finished product: things break out of the box when installed fresh, in the most simple, expected ways, without any tweaks.

Again, don’t get me wrong, open-source free software are good stuff (more modern concepts and people working on it for free), but it’s never going to beat professional companies (like Microsoft/MATLAB) in how well-funded they are so they can maintain their software and the user experience using their profits. So far, users are still expected to put up with a bunch of unjustifiably unnecessary work to get to where they want to go with community-maintained software like Linux/Python.

This time I’m installing Spyder on MX Linux. Look at how many damn hoops I have to jump to get Spyder 3 to function properly there:

  1. I installed python3-spyder from MX Package Installer
  2. Spyder complained about missing rope on start
  3. Installed python-rope on MX Package Installer. The complaint still won’t go away
  4. I tried follow the instruction sudo pip3 install rope_py3k  and realized pip3 was not installed already with the Python that came with Spyder! (Didn’t have the problem with the Windows counterpart).
  5. Installed python3-pip from MX Package Installer.
  6. Came back and run pip3 install rope_py3k. It choked at "Command 'python  setup.oy egg_info' failed with error code 1 in /tmp/pip-build-0nnknjhi/rope-py3k". Again, known problem.
  7. Followed the solution in the comments pip3 install --upgrade setuptools
  8. Then come back and run pip3 install rope_py3k again. It says "Failed building wheel for rope-py3k" in between, but nonetheless I’ll try to move on since it says "Successfully installed rope-py3k-0.9.4.post1"

Then Spyder launch uneventfully.

These are not design decisions (sacrificing one quality for another), but inter-operability wrinkles that nobody are paid enough to do the grunt-work babysitting it. So if your business profits heavily from it, consider sponsoring the developers!


It’s also slightly annoying that the version of Spyder maitained in MX Linux’s most recent repository is a little older than what’s actually available (3.1.3+dfsg1-3 instead of 3.3.4).

At first I followed instructions to have PIP to update it: pip3 install -U spyder, but it doesn’t work. I got a lot of “failed building wheel for (package)” error.

I also realized the Python that came with it is 3.5, not the 3.7(.3) I had in Windows. I checked the MX package manager and indeed it stopped at 3.5. After some searching, I learned the base system package was frozen in 2016! MX Test Repo (at your own risk) has Python 3.7 though.

Loading

Watch out for ‘const’ method in Python

One thing I feel a little bit not quite as intuitive when I switch to Python is I constantly have to look up whether the method directly updates the contents or it’ll return a different object (of the same type) that I’ll have to overwrite the input variable myself.

An example would be strings and bytes object. replace() sounded like an updating method, but it’s actually a ‘const’ method (a term borrowed from C++ to say that the method does not have side-effects) that does not change the state of the object.

I initially thought this has to do with whether the object is immutable or not, but I tried it on bytearray objects (which is mutable), replace() behaves consistently with the identically named methods in other immutable objects (bytes object, string object): you’ll need to assign the output to self (basically bind the name to the temporary and throw away the original).

bts = b'test'
bts.replace('es', 'oas')       # dumps the output to workspace (can be accessed by _) and do nothing else
bts = bts.replace('es', 'oas') # actually updates bts

 

Loading