Powershell notes (for MATLAB/Python users)

Data Type characteristics

Characteristic | Powershell | MATLAB | Python
Nearly everything is a/an | Object | Matrix (APL-philosophy) | Object (which are dictionaries)
Assignment behavior* | Reassigned reference | Copy-on-write | Reassigned reference
Monads (wrappers for heterogeneous data) | Array/Collections | Cells | Lists

* Shallow assignment (transferring reference only) means the LHS does not have its own copy, so modifying the new reference will modify the underlying data on the RHS.
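
To make the footnote concrete, here is a minimal Python sketch (the variable names are made up for illustration). It shows that plain assignment only hands over a reference, and that you need an explicit copy to get independent data:

```python
import copy

original = [1, [2, 3]]

alias = original                 # reassigned reference: no new data is created
alias[0] = 99                    # ...so this also changes `original`

shallow = copy.copy(original)    # new outer list, but the inner list is still shared
deep = copy.deepcopy(original)   # fully independent copy

original[1].append(4)            # mutate the inner list

print(original)   # [99, [2, 3, 4]]
print(shallow)    # [99, [2, 3, 4]]  (inner list was shared)
print(deep)       # [99, [2, 3]]     (unaffected)
```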

Syntax / Usage

Feature | Powershell | MATLAB | Python
Method chaining | Yes | Might misbehave | Yes
List comprehension | No. Map first then filter | | Yes
Named input arguments | Native: f -a 1 -b 22 | Name-Value pairs parsed inside | Native: f(a=13, b=22)
Implicit NON-NULL return value | Optional | |
Binary map operation | | Native matrix ops; *fun() does n-ary | Use numpy, or list( map(operator.add, L1, L2) )
Check type | $g -is [type] | is*() or isa() | isinstance(val, type)
Unpacking (flattening) monads in monads | Default. Use unary , to avoid | No. Use [{:}] to perform | No. Use *, list comp, or list(itertools.chain(*ls))
Conditional/statement block inside container creation | Yes | ? |
View object info with data | Pipe to Format-List -Property *, or use Format-List -InputObject | properties(), methods(), get() |
List members (methods and properties) and their prototypes | Pipe to Get-Member | |

Powershell specific

  • Every UNCAPTURED output value in a block is emitted as part of its return value, so a bare expression on the last line acts as the return statement! Unary side-effect statements such as $x++ do not produce an output value. Watch out for statements at the end of the code that look like they are going nowhere: these are not no-ops or bugs, but the return value. This has the same stench as fall-throughs.
  • foreach follows the uncaptured-output rule above, so it acts as a 1-to-1 map from the input collection to the output collection (you can assign the result of a foreach to a variable, as if it were a function call)
  • Powershell sucks at binary operations between two arrays. Even for a simple elementwise A+B you end up thinking in terms of loops and worrying about dimensions.
  • You can put if and loop blocks inside collection construction, separated by semicolons, like this:
@( 3; if(cond1){...; $v1}; do{...; $v2}while(cond2) )

MATLAB specific

  • When used with classes and custom matrices/arrays, chaining fields/properties/methods by indices often does not work. When it does, it often gives out only the first element instead of the entire array (IIRC, there are operator methods that need to be coordinated in the classes involved to make sure they chain correctly). In short, just don’t chain except in very simple, scalar cases. Always assign the intermediate result to a variable and access the leaf from there.

Range & Indexing

Feature | Powershell | MATLAB | Python
Logical indexing | | Yes | No. Use list comprehension/Numpy
Negative (cyclic) indexing | Yes | | Yes
‘end’ of array keyword | | Yes | No. Skip stop in slice instead
Step (skip every n items) | | Yes | Yes. Both range or slice
Detect descending range | Yes | |
Automatic extend array | | Yes |
Reading array out of bounds | Do nothing | Error | Error

Negative (cyclic) indexing, combined with automatic descending ranges and the lack of an ‘end’ keyword, is a huge pain in the rear when you want to scan from left to right like A(5:end) in MATLAB.

Instead, you’ll have to do $A[4..($A.length-1)], because the range 4..-1 inside $A[4..-1] is unrolled as 4,3,2,1,0,-1 (thus scanning from right to left and wrapping around). The range is unrolled without first consulting the array A. In MATLAB, the end keyword substitutes the array’s actual bounds into the range before it unrolls.

I am willing to bet that this behavior does not have a sound basis other than people thinking negative indices and descending ranges alone are two good ideas without realizing that nearly nobody freaking wants to scan from right to left and wrap around!

I had the same gripes about negative indices in Python not being carefully coordinated with other features in common use cases, which causes unintuitive behavior.
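
The unrolling described above can be imitated in Python. This is only a sketch: ps_range is a hypothetical helper that mimics how start..stop expands, and is not part of either language. The last lines show one of the Python-side surprises with computed negative offsets.

```python
def ps_range(start, stop):
    """Mimic Powershell's start..stop: inclusive, and descending when start > stop."""
    step = 1 if stop >= start else -1
    return list(range(start, stop + step, step))

A = list("abcdefg")                       # 7 items, valid indices 0..6

# What $A[4..-1] effectively asks for: indices 4,3,2,1,0,-1
idx = ps_range(4, -1)
wrapped = [A[i] for i in idx]             # A[-1] means "last item" in Python too

# What was actually wanted: index 4 through the last item
intended = [A[i] for i in ps_range(4, len(A) - 1)]

# Python slices dodge this by letting you omit the stop...
tail = A[4:]

# ...but a computed negative offset can still bite:
n = 0
last_n = A[-n:]                           # -0 == 0, so this is the WHOLE list, not an empty one

print(idx)        # [4, 3, 2, 1, 0, -1]
print(wrapped)    # ['e', 'd', 'c', 'b', 'a', 'g']
print(intended)   # ['e', 'f', 'g']
```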

Range indexing syntax

# Powershell
1..10 # No step/skip for range creation
$A[1..10]  # No special treatment in array such as figuring out the 'end'

% MATLAB
A(start:(step):stop)  % step is optional

# Python
[A[i] for i in range(start, stop, step)]  # a plain list cannot be indexed by a range directly
# Slicing (it's not range)
A[(start):(stop):(step)]  # Every part in parentheses can be omitted
# In Python, A=X merely reassigns the label A as an alias for X.
# Modifying the contents through A after A=X will modify the underlying contents of X.
# To copy the contents (one level deep) instead, take a full slice:
A = X[:]
# A[:] = X overwrites the contents of an existing list A in place
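
As a quick check of the Python lines above, here is a small sketch with made-up list contents. It compares a stepped slice with looping over a range (indexing a plain list with a range object raises a TypeError), and shows the two directions of full-slice copying:

```python
X = [10, 20, 30, 40, 50, 60]

# A slice takes start:stop:step, and every part can be omitted
every_other = X[1:5:2]                        # items at indices 1 and 3
# Same selection by looping over a range
picked = [X[i] for i in range(1, 5, 2)]

A = X[:]          # new list holding the same items (a shallow copy)
A[0] = -1         # X is untouched

B = X             # alias: B and X are the same object

C = [0, 0]
C_id = id(C)
C[:] = X          # overwrite C's contents in place; C is still the same object

print(every_other, picked)   # [20, 40] [20, 40]
print(X[0], A[0])            # 10 -1
```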

Hashtable / Dictionaries

% MATLAB: Use dynamic fields in struct or containers.Map()
# Python: dictionaries such as {'a': 1, 'b': 'x'}
# Powershell: @{a=1; b='x'}

Structs

Powershell does not have a direct struct or a struct with dynamic field names. Instead, if your object is uniform (you expect the fields not to change much), use [PSCustomObject]@{}. You can also just use a simple hashtable @{}, but for some reason it didn’t work the way I expected when I put hashtables into an array and tried to reference them by array index.

Array rules surprises

  • Array comparisons are a filtering operation (not boolean array output like MATLAB). (0..9) -ge 5 gives 5 to 9, not a list of False … False, True … True. To get a boolean array, use this shortcut:
(0..9) | % {$_ -ge 5}

The filter syntax is | ? (Where-Object), as opposed to the map syntax | % (ForEach-Object)

  • Monads (cells in MATLAB) are unpacked and stacked by default (in MATLAB, I had to write a lot of routines to unpack and stack cells of cells). To keep the cells packed (in MATLAB lingo, it’s like ‘UniformOutput’, false in cellfun), add the unary comma operator in front of the operation whose output would otherwise be unpacked, like this:
,$_.Split('_')
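
For readers coming from Python, the two surprises above map roughly onto the following. It is a sketch with made-up sample data. In Python the filter and the boolean mask are two different comprehensions, and nesting is kept unless you flatten explicitly, which is the opposite default from Powershell.

```python
from itertools import chain

values = list(range(10))                     # like (0..9)
filtered = [v for v in values if v >= 5]     # what (0..9) -ge 5 gives: the matching items
mask = [v >= 5 for v in values]              # what (0..9) | % {$_ -ge 5} gives: one boolean per item

names = ["a_1", "b_2"]
packed = [s.split("_") for s in names]       # list of lists, like the unary-comma form
flat = list(chain.from_iterable(packed))     # unpacked and stacked, like Powershell's default

print(filtered)   # [5, 6, 7, 8, 9]
print(packed)     # [['a', '1'], ['b', '2']]
print(flat)       # ['a', '1', 'b', '2']
```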

Set Operations

This is one of the WTF moments of Powershell as a programming language. Convenient set operations are essential for most of the routine boring stuff that involves relational data. A lot of Powershell’s intended audience works in database-like environments (like IT managers dealing with Active Directory). They have Group-Object for typical data analysis tasks, yet life is made miserable just to do basic set operations like intersection and differencing!

Powershell has a Compare-Object, but it is unnatural and annoying to use, as users are effectively rebuilding all 4 basic set-ops (intersection, union, set-diff, xor) based on any two of them! Not to mention you have to sift through a table to get to the piece you wanted!

Basically Compare-Object out of the box

  • is a set-diff showing both directions (A\B and also B\A) at the same time. If you throw away the direction info, it’s xor.
  • if you want intersection, you’ll need to add -IncludeEqual -ExcludeDifferent
  • (WTF!) If you just specify -ExcludeDifferent, by definition there’s no output because by default Compare-Object shows you ONLY the two set-diffs and you are telling it to not show any diffs!
  • Union is specifying -IncludeEqual only. But I’d rather stack both and then do a | Sort-Object -Unique
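
The switch behavior listed above can be summarized with a toy model in Python. This is only a sketch under assumptions: compare_object is a hypothetical function that follows the description in this post for duplicate-free inputs. It is not how the cmdlet is implemented, and it does not reproduce the cmdlet's output order.

```python
def compare_object(reference, difference, include_equal=False, exclude_different=False):
    """Toy model of Compare-Object's switches on duplicate-free inputs.

    Returns (item, side) pairs, where side imitates the SideIndicator column:
    '<=' only in reference, '=>' only in difference, '==' in both.
    """
    ref, dif = set(reference), set(difference)
    rows = []
    if not exclude_different:
        rows += [(x, "<=") for x in sorted(ref - dif)]   # A\B
        rows += [(x, "=>") for x in sorted(dif - ref)]   # B\A
    if include_equal:
        rows += [(x, "==") for x in sorted(ref & dif)]   # common items
    return rows

A, B = [1, 2, 3, 4], [3, 4, 5]

both_diffs = compare_object(A, B)                        # drop the side info and it is xor
intersection = [x for x, _ in compare_object(A, B, include_equal=True, exclude_different=True)]
nothing = compare_object(A, B, exclude_different=True)   # diffs hidden, equals never requested
union = sorted(x for x, _ in compare_object(A, B, include_equal=True))

print(both_diffs)     # [(1, '<='), (2, '<='), (5, '=>')]
print(intersection)   # [3, 4]
print(nothing)        # []
print(union)          # [1, 2, 3, 4, 5]
```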

Some people might suggest doing $A | ? {$B -contains $_} for intersection (or is-member). This is generally a bad idea if you have a lot of data because it’s an O(n*n) algorithm (loop-within-loop), while a properly done intersection will just sort and then scan adjacent items to check for duplicates, which takes O(n log(n)) time (the sort takes up most of the time).
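
The two approaches can be compared side by side in a short Python sketch. The function names are made up for illustration, and the sorted version assumes the items are orderable. It removes duplicates within each input first, so an adjacent equal pair after sorting can only come from one item of each side.

```python
def intersect_nested(a, b):
    """Loop-within-loop: for every item of a, scan all of b. O(n*m)."""
    return [x for x in a if any(x == y for y in b)]

def intersect_sorted(a, b):
    """Stack both inputs, sort once, then compare each item with its neighbor."""
    stacked = sorted(list(set(a)) + list(set(b)))   # the sort dominates: O(n log n)
    return [x for x, y in zip(stacked, stacked[1:]) if x == y]

A = [3, 1, 2, 4]
B = [4, 5, 3]

print(intersect_nested(A, B))   # [3, 4]
print(intersect_sorted(A, B))   # [3, 4]
```

In everyday Python you would simply write set(A) & set(B); the point here is only to show the sort-then-scan idea.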

If you noticed, it’s set operations within the outputs of Compare-Object, with the -IncludeEqual and -ExcludeDifferent switches acting as the Venn diagram! It’s doable, but a totally unnecessary mindfuck that should not have to be repeated frequently.

In MATLAB land, I made my own overloaded operators that do set operations over cellstr(), categorical and tabular objects (I went into their code and added the features, and talked to TMW so they added the features later), sometimes getting into their sort and indexing logic as necessary. This shows how badly I need set operations to come naturally.

One might not deal with it too much in low level languages like C++ (STL set doesn’t get used as much compared to other containers), but for a language made to get a lot of common things done (i.e. the language designer kind of reads the user’s mind), I’m surprised that the Powershell team overlooked the set operations!

Sets are very powerful abstractions that should not be made less descriptive (hard to read) by dancing around them with equivalent operations and some programming gymnastics! If this basic stuff is not built in, we are going to see a lot of people taking ugly shortcuts to avoid coding up these bread-and-butter functions and putting them in libraries (or downloading 3rd-party libraries)!

Powershell surprises

  • Typical symbolic comparison operators do not work because ‘>’ would be interpreted as redirection, as in command prompts. Use operators like -gt (greater than) instead.
  • Redirection’s default text output uses UTF16-LE encoding (2 bytes per character). Programs assuming ASCII (1 byte per character) might not behave as intended (e.g. if you use the copy command to merge an ASCII/UTF8 file with a UTF16-LE one, you might end up with spaces in the sections that are encoded in UTF16-LE)
  • You cannot extract string matches directly with -match, because it returns a boolean and spills the captures into the automatic variable $Matches. Consider [regex]::Match($Text, $Pattern).Groups[1].Value instead
  • Methods are called with parentheses, yet functions are called without parentheses, just like cmdlets! Trying to call a function with multiple input arguments in parentheses, like f(3,5), will be interpreted as calling f with ONE ARGUMENT containing an ARRAY of 3 and 5!
  • Write-Host takes everything after it literally (white space included, almost like the echo command), with the exception of plugging in $variables! If you want anything evaluated, such as concatenation, you need to put parentheses around the whole expression!

Libraries and Modules

  • Reload module using Import-Module $moduleName -Force


Regex Notes

Concepts

Mechanics

  • . any character
  • \ escapes special characters
  • shorthand character classes (\d digits, \w word characters (i.e. letter/digit/underscore), \s whitespace).
  • [] character classes (define rules over what characters are accepted, unlike the . wildcard)
    [3-7] a hyphen inside the [] brackets specifies a range, here meaning the same as `[34567]`
    [^ ...] is the mirror image: it excludes the mentioned characters
  • | choices (think of it as OR)
  • Complement (i.e. everything but) versions are capitalized: \D is everything that is not a \d
  • whitespaces (\n newline, \t tab,

Modifiers

  • repetition quantifiers (? 0 or 1 times, + at least once, * any number of times, {n} exactly n times, {m,n} between m and n times)
  • (? ...) inline modifiers alter behaviors such as how newlines and case sensitivity are handled, whether (...) captures or just groups, and whether comments within patterns are allowed

Positioning rules

  • anchors (^ begins with, $ ends with)
  • \b word boundary

Output behavior

  • (...) capturing group, (?: ...) non-capturing group
  • \(index) backreference: the content of a previously matched group/chunk, referred to by its index (e.g. \1).
    This feature generates derived new content instead of just extracting
  • (?( = | <= | ! | <! ) ...assertion...) lookarounds check the contents of ...assertion... before/after the pattern without consuming them, so the matched assertion is tossed out of your capture results.
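
A few of the output behaviors above, sketched with Python's re module on made-up sample strings. Powershell uses the .NET regex engine, whose syntax for these particular constructs is the same as far as I know, but treat that as an assumption.

```python
import re

text = "it was was a long long day"

# (\w+) captures a word; \1 must then repeat exactly what group 1 matched
doubled = re.findall(r"\b(\w+) \1\b", text)

# (?: ...) groups without capturing, so only the (\d+) group is reported
numbers = re.findall(r"(?:id|no)=(\d+)", "id=12 no=7 xx=99")

# Backreferences in the replacement build new text out of the captured pieces
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "user@host")

print(doubled)   # ['was', 'long']
print(numbers)   # ['12', '7']
print(swapped)   # host at user
```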

(?s) Also match newline characters (‘single-line’ or DOTALL mode)

Starting the pattern with the (?s) flag (an inline modifier) expands the . (dot) single-character pattern to ALSO match newline characters (it does not by default), so a match can span multiple lines.

Useful for blindly extracting the contents of HTML blocks and post-processing them elsewhere
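
A minimal Python illustration of the (?s) flag, using a made-up two-line snippet:

```python
import re

html = "<p>first line\nsecond line</p>"

without_flag = re.search(r"<p>(.*)</p>", html)      # . stops at the newline, so no match
with_flag = re.search(r"(?s)<p>(.*)</p>", html)     # . now crosses the newline

print(without_flag)               # None
print(repr(with_flag.group(1)))   # 'first line\nsecond line'
```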

(?m) Pattern starts over as a new string for each line (‘multi-line’ mode)

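Starting the pattern with the (?m) flag tells the anchors ^ (begins with) and $ (ends with) to treat each line as its own string, so they match at the start and end of every line instead of only at the start and end of the whole text.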
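
A minimal Python illustration of the (?m) flag on a made-up three-line string (the lines are separated by a plain \n here):

```python
import re

log = "alpha 1\nbeta 2\ngamma 3"

whole_text = re.findall(r"^\w+", log)       # ^ only matches at the very start
per_line = re.findall(r"(?m)^\w+", log)     # ^ matches at the start of every line
line_ends = re.findall(r"(?m)\d$", log)     # $ matches at the end of every line

print(whole_text)   # ['alpha']
print(per_line)     # ['alpha', 'beta', 'gamma']
print(line_ends)    # ['1', '2', '3']
```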

Assertions: use lookarounds to skip (not capture) patterns
(?( = | <= | ! | <! ) assertion pattern)

  • < is lookbehind, no prefix-character is lookahead.
    -ahead/-behind refers to WHERE the assertion pattern sits relative to what you want TO CAPTURE:
    lookahead checks the text after it, lookbehind checks the text before it. Either way, the assertion (inside the (? ...) ) is matched and thrown away
  • = (positive) asserts the pattern inside the lookaround bracket MUST MATCH,
    ! (negative) asserts the pattern inside the lookaround bracket MUST NOT MATCH.

Assertions are very useful for getting to the meat you really want to capture rather than sifting through patterns introduced solely for making assertions that you intended to throw away
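
Here is a short Python sketch of lookarounds on a made-up string. In each case, only the digits end up in the result, and the characters that were merely asserted are left out:

```python
import re

text = "price: $42.50, cost: 17.00, fee: $8.25"

# (?<=\$) positive lookbehind: a '$' must sit right before the digits
dollars = re.findall(r"(?<=\$)\d+", text)

# (?=\.) positive lookahead: the digits must be followed by '.', which is not captured
whole_parts = re.findall(r"\b\d+(?=\.)", text)

# (?<![\$\d.]) negative lookbehind: the digits must NOT be preceded by '$'
# (nor by a digit or '.', so a match cannot start in the middle of a number)
no_dollar = re.findall(r"(?<![\$\d.])\d+(?=\.)", text)

print(dollars)       # ['42', '8']
print(whole_parts)   # ['42', '17', '8']
print(no_dollar)     # ['17']
```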

Extract HTML block

(?ms)(?<= starting tag pattern) body pattern (?= terminating tag pattern)
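
A sketch of this template in Python, with <pre> standing in for the tag patterns and a made-up page. One caveat: Python's re only accepts fixed-width lookbehind patterns, so the starting tag has to be a literal of known length here, whereas the .NET engine behind Powershell is more permissive.

```python
import re

page = "<html>\n<pre>\nline one\nline two\n</pre>\n<p>other</p>\n</html>"

# (?<=<pre>) and (?=</pre>) assert the tags without capturing them;
# .*? is the body pattern, kept lazy so it stops at the first closing tag
pattern = r"(?ms)(?<=<pre>).*?(?=</pre>)"

body = re.search(pattern, page).group(0)

print(repr(body))                  # '\nline one\nline two\n'
print(body.strip().splitlines())   # ['line one', 'line two']
```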


Using 3rd party packages for Powershell Install-Module

It makes sense that, by default, a 3rd-party Powershell package you download (like kbupdate) should not run right away until you’ve done your due diligence. You’ll get a warning like this during installation:

Untrusted repository
You are installing the modules from an untrusted repository. If you trust this repository, change its InstallationPolicy value by running the Set-PSRepository cmdlet. Are you sure you want to install the modules from 'PSGallery'?

But when I try to use it, I get an error message:

Get-KbUpdate : The 'Get-KbUpdate' command was found in the module 'kbupdate', but the module could not be loaded. For more information, run 'Import-Module kbupdate'.

Import-Module gives a cryptic message like this:

Import-Module : Errors occurred while loading the format data file:
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml, ,
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml: The file was
skipped because of the following validation exception: File
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml cannot be
loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at
https:/go.microsoft.com/fwlink/?LinkID=135170..

Turns out either the package needs to be marked as safe, or you just stop checking altogether:

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy Unrestricted


Triple-Booting Windows 7, XP and DOS

Sometimes I need to do a little bit of retro-computing (not with virtual machines) to support some ancient hardware.

As far as compatibility is concerned, I have yet to run across any weird piece of software that specifically requires Windows ME, 2000, Vista or Windows 8 and cannot be run on an OS one step up.

Windows 98 SE generally displaces anything from Windows 95 to Windows 98.

Windows 2000/XP usually runs anything meant for NT, starting from 4.0.

Windows NT 3.51 usually runs Win32s programs that work on Win 3.1, except it’s way more stable.

Installation Order

The OSes should be installed from old to new:

  • DOS/Win 3.1 + 98 (SE)
  • XP
  • Windows 7

Reorganize boot menu

Windows XP installs an NT52-style (NTLDR) boot menu that recognizes DOS as a partition to boot. The Windows 7 installer will install an NT60-style (BCD) boot menu that treats the NTLDR loader as an OS (it’s called Earlier version of Windows) instead of directly booting to Windows XP. This means that to get to Windows XP / DOS, you’ll have to select twice.

We can fix this with EasyBCD, which rebuilds the bootloader options for the installed OSes. Doing it with bcdedit is a major pain in the arse. There are some quirks to watch out for in the process no matter which path you choose:

  • You might need to boot into safe mode if the current BCD is locked.
  • Whatever OS you are currently in calls itself C:, and every other partition’s letter is shifted according to the partition order.
  • When setting the drive letter for a boot menu item, observe the drive letter scheme currently seen by the host OS, i.e. use C: when referring to the currently booted OS
  • Do not take EasyBCD up on its offer to detect the drive letter automatically. Its guesses are likely to be wrong and won’t boot, probably because of the shifting C: issue.

While you are in EasyBCD, it also offers the option of booting ISO (optical drive) and IMA (floppy) images, which I find convenient for making the PC a tech service station.

Note that the DOS menu entry provided by EasyBCD goes through an extra layer of indirection called GRUB4DOS, so it’s not as native as going through NT60 (BCD) > NT52 (NTLDR) > DOS, in the sense that it installs foreign stuff not made by Microsoft, such as Grub.

Tip about bcdedit

  • Some old versions of bcdedit’s /? menu did not tell you about the /store switch, which is necessary to manipulate foreign BCD files instead of the host BCD (that you used to boot to the current Windows you are working in).


Boot Windows 7 (and above) installer with HDD/SSD drives

For some very old systems that don’t support hardware USB CD-ROM (ISO) emulators (or only have USB 1.1 ports, which are painfully slow), there’s a way to put your installer on an HDD/SSD (IDE/SATA) and boot the installer image from it. Turns out it’s quite easy. All you need to do is copy the entire set of Windows installation files to an MBR drive with the partition set active, then write the boot sector to it!

  1. Make sure your HDD is in MBR, not GPT
  2. Make a partition that’s bootable (can be NTFS) by marking it as Active (an Active partition only makes sense with MBR. That’s why you should make your disk MBR)
  3. Copy all the files from the Windows CD image to the drive
  4. Run the following command to build the boot sector for the drive. One interesting twist is that you must run this command from the drive whose boot sector you want to rebuild (or it’ll refuse to run), yet you still have to specify which drive letter to rebuild the boot sector on! Let’s call the drive P:\
P:\> bootsect /nt60 P:\

The /nt60 is the modern boot manager for Windows 7 and above. /nt52 is the Windows XP and old NT style (NTLDR) boot manager. I miss the old days when I was using winnt /b!
