Regex Notes

Concepts

Mechanics

  • . any character (except newline, unless (?s) is on)
  • \ escapes special characters
  • shorthand classes (\d digits, \w word (i.e. letter/digit/underscore), \s whitespace)
  • [] character classes (define rules over which characters are accepted, unlike the . wildcard)
    [3-7] a hyphen inside [] specifies a range, so [3-7] means the same as `[34567]`
    [^ ...] is the mirror image: it excludes the listed characters
  • | choices (think of it as OR)
  • Complement (i.e. everything but) versions are capitalized, e.g. \D is everything that is not a \d (likewise \W and \S)
  • whitespace characters (\n newline, \t tab, \r carriage return)
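A minimal sketch of the mechanics above using Python's standard re module (other regex engines differ in small details; the sample strings are made up for illustration):

```python
import re

# \d matches a digit; its capitalized complement \D matches any non-digit
digits = re.findall(r"\d", "a1b2")                 # ['1', '2']
non_digits = re.findall(r"\D", "a1b2")             # ['a', 'b']

# [3-7] is a range class; [^3-7] excludes those characters
in_range = re.findall(r"[3-7]", "1234789")         # ['3', '4', '7']
out_range = re.findall(r"[^3-7]", "1234789")       # ['1', '2', '8', '9']

# . matches any character except newline; \. escapes it to a literal dot
any_char = re.findall(r"a.c", "abc a.c a\nc")      # ['abc', 'a.c']
literal_dot = re.findall(r"a\.c", "abc a.c")       # ['a.c']

# | picks between alternatives (OR)
choices = re.findall(r"cat|dog", "cat, dog, cow")  # ['cat', 'dog']

# \w is letter/digit/underscore; \s is whitespace (space, \t, \n, ...)
words = re.findall(r"\w+", "snake_case 42!")       # ['snake_case', '42']
spaces = re.findall(r"\s", "a b\tc\nd")            # [' ', '\t', '\n']

print(digits, non_digits, in_range, out_range, any_char, literal_dot, choices, words, spaces)
```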

Modifiers

  • repetition quantifiers (? 0 or 1 time, + at least once, * any number of times including zero, {n} exactly n times, {m,n} between m and n times)
  • (? ...) inline modifiers alter behavior such as how newlines are handled ((?s), (?m)), case sensitivity ((?i)), whether a group captures or just groups ((?:...)), and comments within patterns ((?#...), or (?x) verbose mode)
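The quantifiers and inline modifiers, sketched with Python's re module. Note that Python expects a global flag such as (?i) or (?x) at the very start of the pattern; the inputs are illustrative only.

```python
import re

# Quantifiers: ? = 0 or 1, + = 1 or more, * = 0 or more, {n}/{m,n} = exact/bounded counts
optional_u = re.findall(r"colou?r", "color colour colouur")   # ['color', 'colour']
one_or_more = re.findall(r"ab+", "a ab abbb")                 # ['ab', 'abbb']
zero_or_more = re.findall(r"ab*", "a ab abbb")                # ['a', 'ab', 'abbb']
exactly_five = re.findall(r"\b\d{5}\b", "12345 1234 123456")  # ['12345']
two_to_three = re.findall(r"\d{2,3}", "1 12 123")             # ['12', '123']

# (?i): case-insensitive matching
case_blind = re.findall(r"(?i)hello", "Hello HELLO hello")    # ['Hello', 'HELLO', 'hello']

# (?:...) groups without capturing, so findall returns the whole match;
# a capturing (...) makes findall return only the group's last repetition
grouped = re.findall(r"(?:ab)+", "ababab ab")                 # ['ababab', 'ab']
captured = re.findall(r"(ab)+", "ababab ab")                  # ['ab', 'ab']

# (?x): verbose mode ignores whitespace and allows # comments inside the pattern
phone = re.compile(r"""(?x)
    \d{3}   # exchange
    -       # literal hyphen
    \d{4}   # line number
""")
is_phone = phone.fullmatch("555-1234") is not None            # True
```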

Positioning rules

  • anchors (^ begins with, $ ends with)
  • \b word boundary
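A short sketch of the positioning rules with Python's re (sample strings are made up). Anchors and \b match positions, not characters, so they never add text to the result:

```python
import re

starts_hello = re.search(r"^Hello", "Hello world") is not None   # True
ends_world = re.search(r"world$", "Hello world") is not None     # True
starts_world = re.search(r"^world", "Hello world") is not None   # False

# \b is a zero-width boundary between a word char and a non-word char (or the string edge)
whole_cats = re.findall(r"\bcat\b", "cat concat cat's scatter")  # ['cat', 'cat']
any_cats = re.findall(r"cat", "cat concat cat's scatter")        # ['cat', 'cat', 'cat', 'cat']
```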

Output behavior

  • (...) capturing group, (?: ...) non-capturing group
  • \1, \2, ... (backreferences) refer to the content matched by earlier capturing groups, by index.
    In a replacement string, this lets you generate new content from the matched pieces instead of just extracting them
  • (?= ...), (?<= ...), (?! ...), (?<! ...) lookarounds check for the assertion pattern before/after the match without consuming it, so the asserted text stays out of your capture results.
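In Python's re, backreferences show up in two places: \1 inside a pattern must repeat what group 1 matched, and \1 in an re.sub replacement rebuilds new text from the captured pieces. A small sketch with made-up strings:

```python
import re

date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Capturing groups are numbered left to right by their opening parenthesis
m = date.search("Released 2021-03-15.")
parts = m.groups()                                    # ('2021', '03', '15')

# (?:...) groups the alternation without adding a numbered group
names = re.findall(r"(?:Mr|Ms)\. (\w+)", "Mr. Lau and Ms. Choi")   # ['Lau', 'Choi']

# \3, \2, \1 in the replacement derive new text from the captured pieces
reordered = date.sub(r"\3/\2/\1", "2021-03-15")      # '15/03/2021'

# \1 inside the pattern must repeat whatever group 1 matched (finds doubled words)
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")   # ['is', 'test']
```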

(?s) Also match newline characters (‘single-line’ or DOTALL mode)

Starting a pattern with the (?s) flag (an inline modifier) makes the . (dot) single-character pattern ALSO match newline characters, so a match can span multiple lines (by default . does not match \n).

Useful for blindly extracting the contents of HTML blocks and post-processing them elsewhere.
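A minimal Python sketch of the difference (the HTML snippet is made up). In Python, (?s) is the inline form of re.DOTALL:

```python
import re

html = "<div>\n  line one\n  line two\n</div>"

# Without (?s), . stops at \n, so a block spanning several lines is not found
plain = re.search(r"<div>(.*)</div>", html)            # None

# With (?s), . also matches \n, so the whole block body is captured
dotall = re.search(r"(?s)<div>(.*)</div>", html)
inner = dotall.group(1)                                # '\n  line one\n  line two\n'
```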

(?m) Pattern starts over as a new string for each line (‘multi-line’ mode)

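Starting with the (?m) flag tells the anchors ^ (begins with) and $ (ends with) to match at the start and end of every line, not just at the start and end of the whole string.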
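A small Python sketch of multi-line mode on a made-up three-line string. (?m) is the inline form of re.MULTILINE:

```python
import re

text = "alpha: 1\nbeta: 2\ngamma: 3"

first_only = re.findall(r"^\w+", text)        # ['alpha']  (^ = start of the whole string)
every_line = re.findall(r"(?m)^\w+", text)    # ['alpha', 'beta', 'gamma']
last_only = re.findall(r"\d$", text)          # ['3']      ($ = end of the whole string)
line_ends = re.findall(r"(?m)\d$", text)      # ['1', '2', '3']
```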

Assertions: use lookarounds to skip (not capture) patterns
(?= ...), (?<= ...), (?! ...), (?<! ...) with the assertion pattern in place of ...

  • < means lookbehind; no < means lookahead.
    -ahead/-behind refers to WHERE the text you want TO CAPTURE sits relative to the assertion pattern
    (with a lookahead the capture comes before the assertion, with a lookbehind it comes after),
    NOT to what you want to assert (match and throw away, inside the (? ...) )
  • = (positive) means the pattern inside the lookaround bracket MUST MATCH,
    ! (negative) means the pattern inside the lookaround bracket MUST NOT MATCH.

Assertions are very useful for getting straight to the meat you really want to capture, instead of sifting through text that was matched solely to make an assertion and that you intended to throw away.
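All four lookarounds in Python's re, on made-up strings. One Python-specific assumption to keep in mind: re only accepts fixed-width patterns inside a lookbehind.

```python
import re

prices = "price: $30, tax: $5, qty: 7"
amounts = "30 USD, 5 EUR, 12 USD"

# (?<=...) positive lookbehind: digits preceded by "$", without capturing the "$"
dollars = re.findall(r"(?<=\$)\d+", prices)             # ['30', '5']
# (?<!...) negative lookbehind: digits NOT preceded by "$" or another digit
plain_numbers = re.findall(r"(?<![$\d])\d+", prices)    # ['7']
# (?=...) positive lookahead: digits followed by " USD", without capturing " USD"
usd = re.findall(r"\d+(?= USD)", amounts)               # ['30', '12']
# (?!...) negative lookahead: whole numbers NOT followed by " USD"
not_usd = re.findall(r"\b\d+\b(?! USD)", amounts)       # ['5']
```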

Extract HTML block

(?ms)(?<= starting tag pattern) body pattern (?= terminating tag pattern)
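A minimal Python instance of the template above, assuming a made-up page with a <pre id="log"> block. Because Python's lookbehind must be fixed-width, the starting tag is written as a literal; .*? is non-greedy so the match stops at the first closing tag. (?m) is not strictly needed here since the pattern has no ^ or $, but it is kept to mirror the template.

```python
import re

page = """<html><body>
<pre id="log">
line 1
line 2
</pre>
</body></html>"""

# (?ms)(?<= starting tag) body (?= terminating tag): the tags are asserted, not captured
block = re.compile(r'(?ms)(?<=<pre id="log">).*?(?=</pre>)')
match = block.search(page)
body = match.group(0) if match else None   # '\nline 1\nline 2\n'
```

If the starting tag varies in length (attributes in a different order, say), a capturing group around the body is the usual workaround in Python's re.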


Using 3rd party packages for Powershell Install-Module

It makes sense that, by default, a 3rd party PowerShell package you download (like kbupdate) should not run right away until you’ve done your due diligence. You’ll get a warning like this during installation:

Untrusted repository
You are installing the modules from an untrusted repository. If you trust this repository, change its InstallationPolicy value by running the Set-PSRepository cmdlet. Are you sure you want to install the modules from 'PSGallery'?

But when I try to use it, I get an error message:

Get-KbUpdate : The 'Get-KbUpdate' command was found in the module 'kbupdate', but the module could not be loaded. For more information, run 'Import-Module kbupdate'.

Import-Module gives a cryptic message like this:

Import-Module : Errors occurred while loading the format data file:
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml, ,
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml: The file was
skipped because of the following validation exception: File
D:\Administrator\Documents\WindowsPowerShell\Modules\PSFramework\1.6.214\xml\PSFramework.Format.ps1xml cannot be
loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies at
https:/go.microsoft.com/fwlink/?LinkID=135170..

It turns out you either have to mark the package as safe or just stop checking altogether:

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy Unrestricted


Triple-Booting Windows 7, XP and DOS

Sometimes I need to do a little bit of retro-computing (not with virtual machines) to support some ancient hardware.

As far as compatibility is concerned, I have yet to run across any weird piece of software that specifically requires Windows ME, 2000, Vista or 8 and cannot run on the OS one step up.

Windows 98 SE generally covers anything from Windows 95 to Windows 98.

Windows 2000/XP usually run anything meant for NT 4.0 onward.

Windows NT 3.51 usually runs Win32s programs that work on Win 3.1, and it’s way more stable.

Installation Order

The OSes should be installed from old to new:

  • DOS/Win 3.1 + 98 (SE)
  • XP
  • Windows 7

Reorganize boot menu

Windows XP installs an NT52 style (NTLDR) boot menu that recognizes DOS as a bootable partition. The Windows 7 installer then installs an NT60 style (BCD) boot menu that treats the NTLDR loader as an OS (listed as “Earlier Version of Windows”) instead of booting Windows XP directly. This means that to get to Windows XP / DOS, you have to make two selections.

We can fix this with EasyBCD, which rebuilds the bootloader options for the installed OSes. Doing it with bcdedit is a major pain in the arse. There are some quirks to watch out for in the process no matter which path you choose:

  • You might need to boot into safe mode if the current BCD is locked.
  • Whichever OS you are currently booted into calls itself C:, and every other partition’s letter shifts according to the partition order.
  • When setting the drive letter for a boot menu item, use the drive letter scheme currently seen by the host OS, i.e. use C: when referring to the currently booted OS.
  • Do not take EasyBCD up on its offer to detect the drive letter automatically. Its guesses are likely to be wrong and won’t boot, probably because of the shifting C: issue.

While you are in EasyBCD, note that it also offers the option of booting ISO (optical disc) and IMA (floppy) images, which I find convenient for turning the PC into a tech service station.

Note that the DOS menu entry provided by EasyBCD goes through an extra layer of indirection called GRUB4DOS, so it’s not as native as the NT60 (BCD) > NT52 (NTLDR) > DOS chain, in the sense that it installs foreign components not made by Microsoft (GRUB4DOS).

Tip about bcdedit

  • Some old versions of bcdedit’s /? help do not mention the /store switch, which is necessary to manipulate a foreign BCD file instead of the host BCD (the one used to boot the Windows you are currently working in).


Boot Windows 7 (and above) installer from HDD/SSD drives

For very old systems that don’t support hardware USB CD-ROM (ISO) emulators (or only have USB 1.1 ports, which are painfully slow), there’s a way to put the installer on an HDD/SSD (IDE/SATA) and boot the installer image from it. Turns out it’s quite easy. All you need to do is copy the entire set of Windows installation files onto an MBR drive whose partition is set active, then write the boot sector to it!

  1. Make sure your HDD is in MBR, not GPT
  2. Make a bootable partition (it can be NTFS) by marking it as Active (an active partition only makes sense with MBR, which is why your disk should be MBR)
  3. Copy all the files from Windows CD image to the drive
  4. Run the following command to build the boot sector for the drive. One interesting twist is that you must run this command from the drive whose boot sector you want to rebuild (or it’ll refuse to run), yet you still have to specify that drive letter! Let’s call the drive P:
P:\> bootsect /nt60 P:

/nt60 writes the modern boot manager (BOOTMGR) used by Windows 7 and above; /nt52 writes the old NT style boot loader (NTLDR) used by Windows XP. I miss the old days when I was using winnt /b!


Windows 10 not sharing files (also not ping-able) out of the box

The firewall exception for “File and Printer Sharing” is not enabled by default. Tick its checkboxes in Windows Firewall’s list of allowed apps and features to enable CIFS/SMB sharing.

Enabling “File and Printer Sharing” also makes the Windows 10 machine ping-able, since this group also enables the “Echo Request – ICMPv4” rule, whose details can be seen in the Advanced Firewall rules.

Command line shortcuts

netsh advfirewall firewall set rule group="Network Discovery" new enable=Yes
netsh advfirewall firewall set rule group="File and Printer Sharing" new enable=Yes


Moving Windows User Folders quickly

Just for good housekeeping, I’d like to move my user folders to another drive so I can back them up quickly before re-installing Windows, or plan program storage better (anything that can be re-installed, I don’t care to back up).

There are a lot of warnings about redefining where %userprofile% points to (which is %systemdrive%\Users\%username%) or using symbolic links or virtual drives (such as subst) for file redirection. So I’m sticking to the officially supported ways that don’t involve scripting or messing with the registry, i.e. I only move the folders Microsoft expects users to move themselves.

I’ve identified these folders as safe to move:

The basic user shell folders

For a newly installed Windows 10, that’s basically every subfolder in %userprofile% itself!

Here’s the dumb way to do it, taught nearly everywhere since Windows 7: using the Location tab of each special shell folder’s properties.

Others teach the hard and dangerous way of modifying the registry, namely HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders.

It’d be a pain in the ass to do this for 13 folders in Windows 10 (there were far fewer in Windows 7, so it was reasonable back in the day). Through experimentation, I found that you can simply move the shell folders into your chosen destination folder: the shell folders (which are decorated with extra features) figure out that they are being moved and update their registrations (aka the registry) properly. That condenses all the steps in the Location tab into one drag and drop!

I watched the registry location to see the impact of the moves. A bunch of new entries are created, corresponding to the 13 subfolders being moved.

I believe those are the unique names (identifiers) of the named folders, so their superficial display names can change without confusing programs about what they are.

I noticed that only 6 core subfolders (the bread-and-butter ones that have been there since Windows 7) are updated with the new path.
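To watch what the drag-and-drop move changes, the User Shell Folders key can be read without modifying anything, which stays in line with the advice here not to edit it by hand. A minimal read-only sketch using Python's standard winreg module; it only works on Windows and returns an empty dict elsewhere so it can run anywhere. Values are usually stored unexpanded (with %USERPROFILE%), and some value names are GUIDs rather than display names.

```python
import sys

KEY_PATH = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\User Shell Folders"


def list_user_shell_folders() -> dict:
    """Return {value name: stored path} under HKCU\\...\\User Shell Folders (read-only).

    Returns an empty dict on non-Windows systems.
    """
    if sys.platform != "win32":
        return {}
    import winreg  # Windows-only standard library module

    folders = {}
    # OpenKey defaults to KEY_READ access, so nothing is modified
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH) as key:
        index = 0
        while True:
            try:
                name, value, _value_type = winreg.EnumValue(key, index)
            except OSError:
                break  # raised when there are no more values to enumerate
            folders[name] = value
            index += 1
    return folders


if __name__ == "__main__":
    # Run once before and once after moving the folders, then compare the output
    for name, path in sorted(list_user_shell_folders().items()):
        print(f"{name:45} {path}")
```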

The others that are not changed are heavily tied to programs you installed (AppData), Windows Explorer configuration (Start menu, right-click context menus) and IE stuff (cache and cookies). These are not the typical files users must back up but configuration files that store user preferences. This is why I’m not surprised that Microsoft tells people not to mess with them: old software cruft might not handle them in a unified way after 20+ years of evolution.

App/Tiles (Metro UI) data starting with Windows 8

EaseUS’ blog page might have confused the shell folders with the files for Metro UI (Apps) and presented this as another way of moving files, which it isn’t. This is an additional step specific to tiled app files.

Note that setting “New apps will save to:” also generates a \WindowsApps\MutableBackup folder. The resulting “Program Files” folder is owned by the ‘SYSTEM’ account and “WindowsApps” is owned by the ‘TrustedInstaller’ account, so if you change your mind you cannot clean them up without first taking ownership and giving yourself full permissions. These folders are created by the first item on the “Change where new content is saved” page.

It’s usually more convenient to move the shell folders to {target drive}:\%username%, the same folder the Apps’ special folders use, so Apps and programs can share files through a common folder. But technically these are two separate concepts and you are free to keep them apart.

The registry is not where you should muck with the path. Please let Windows’ proper user interface (the shell folder’s Location tab handling) do it, as the registry is just one of the many places where these settings are managed. Also remember that moving these special (aka shell) folders does not move your %userprofile%, which is your home folder and the default starting directory for things like PowerShell. (I had to change the working directory myself, and there isn’t a variable for {target drive}:\%username%, because the variables for those special folders do not manage their root.)


Concurrent sessions (RDP wrapper)

Basically, allowing multiple users to be logged into Windows at the same time boils down to modifying 12 dwords (32-bit words), i.e. 48 bytes, in termsrv.dll, and the content and location (byte offset) vary depending on its version.

The rest is just fighting with file lock behavior, Windows self-repair, updates, anti-virus and other things that try to ‘correct’ the modified file, so that the changes stick. This is hellishly hard and tedious.

RDP Wrapper was the tool to do this, but it hasn’t been updated since 2017, so it doesn’t track the newer versions pushed later.

There is a tool called autoupdate that rides on top of RDP Wrapper and automates the process for each version pushed out, using the (byte location, byte values) pairs defined in a config file called rdpwrap.ini.

Basically, extract all the zip files to the same folder, copy the latest INI file along with them, then run the helper routine

autoupdate -taskadd

to add the startup hook (running helper\autoupdate__enable_autorun_on_startup.bat does the same thing) then run the main routine

autoupdate

That’s it!


How to get CAP (Cantonese IME with swearwords) to coexist with mozc (Japanese IME) in Linux

I’ve been using CPIME for ages and I’m comfortable with Sidney Lau’s phonetic scheme. Jyutping is unnatural to Hongkongers because we do not consider ‘j’ a ‘y’ sound like Germans do.

However, since Windows 10, there aren’t many choices when it comes to Cantonese IMEs that default to Sidney Lau’s scheme and also handle common swearwords well (including the most common ones that are technically incorrect). The only reasonable choice is Andrew Choi’s CAP; I will write about getting it working on Windows 10 in another blog post.

There aren’t many choices for Linux either. There’s RIME, but it’s super hard to install, and Sidney Lau’s phonetic scheme is buried so deep that it can only be switched with shortcut keys during IME composition mode. The deal breaker is the lack of swearword support. Being able to type 林鄭我𨳒你老母 is essential for every self-respecting Cantonese speaker. 「屌」你老母 just won’t cut it. Lol.

CAP takes quite a bit of wrestling to install on Windows (covered in another post) and quite a bit of wrestling to get it to function on Linux. Once you get it working, it’s a very powerful Cantonese IME that allows superfast typing, unless you plan to play with words (玩食字). I can’t praise the IME design enough, and I was willing to deal with the quirks that curb its wider adoption.

Andrew Choi made a few releases on his blog page: 2012 (ibus), 2015 (ibus), 2018 (fcitx), 2019 (fcitx4) and 2021 (fcitx4). For the Linux version, this blog post is only concerned with the 2021 release (the latest at the time of writing).

For Linux CAP, installing the debian package is the easiest part:

sudo dpkg -i fcitx-cap_1.0.0_amd64.deb

The thorns are:

  1. get the CAP show up on the list of valid input methods
  2. fend off fcitx5, which is trying to kill CAP
  3. deal with IME settings state corruption (especially when working with other IME)
  4. live with being unable to select characters from subsequent pages of selection candidates

CAP is not immediately available as an IME out of the box (even after installing the .deb package).

You will need to add Chinese to “Install / Remove Language” under “Language Support” to get anything to show up there!

Fend off fcitx5

fcitx5 was recently released and Ubuntu is aggressively trying to push it onto every user. However, its very existence kills the currently available CAP which is written for fcitx4 as of the 2021 release. This means you will have to give up fcitx5 if you want to use CAP!

Fcitx5 is considered a replacement for fcitx4, so whenever Ubuntu sees that you have fcitx installed (which is likely fcitx4), it’ll tempt you into installing fcitx5. DO NOT ACCEPT THE INVITATION! fcitx5 does not coexist with fcitx4: your fcitx4 will be removed the moment you install fcitx5. To go back to fcitx4, you first have to remove fcitx5 completely, then re-install fcitx.

What makes things more complicated is that Ubuntu’s GNOME Language Support GUI keeps prompting you to install fcitx5 whenever you start it or do something with it, such as installing new languages (which is required as the first non-obvious step). It’ll typically try to trick you into installing fcitx5 with a generic-looking dialog box, but if you open up the details, it’s fcitx5, which will cockblock CAP.

However, when you try to perform the first step with fcitx (fcitx4) already installed, adding new languages (required to get CAP to work) comes bundled with an upgrade to fcitx5! It’s super frustrating. So you can choose between two paths:

1) Concede to fcitx5 and downgrade to fcitx4

  • install the languages first (with fcitx5 IMEs),
  • remove fcitx5
  • install fcitx4

2) Prevent fcitx5 in the first place

  • remove fcitx4,
  • install the languages (no fcitx IMEs)
  • install fcitx4

Remove fcitx4

sudo apt purge fcitx

Remove fcitx5

sudo apt purge fcitx5*

Install languages

If you already have fcitx installed (path #1), you’ll have to click yes and live with fcitx4 being upgraded to fcitx5, which you’ll have to destroy later before reinstalling fcitx4.

If you already removed fcitx (path #2), Language Support will only install IMEs for other frameworks, such as ibus, associated with the installed languages.

Install fcitx4 AND activate it

sudo apt install fcitx

Remember to select the installed Fcitx 4 as your IME system (not ibus, etc., or none).


Now CAP is on the list of available IMEs in fcitx-configuration


Learn the new shortcut keys that are different from Windows

One out-of-the-box default that’s hard to guess is moving from page to page. On Windows it used to be PageUp/PageDown, but CAP follows fcitx’s global configuration for moving between pages, which uses the unshifted +/- keys, i.e. effectively =/-, because + is the shifted form of = while the binding was intended to be unshifted. I know, this is confusing!

IME switching follows the OLD Windows shortcut keys (like in the Windows 98 and XP days), where

  • Ctrl+Space means turning IME on/off (global sense),
  • Ctrl+Shift changes IME languages (newer Windows use Alt+Shift by default)
  • Shift to temporarily disable/enable the IME (i.e. English mode) but stay within the language state

More customizations to get it closer to Windows IME behavior

CAP follows the global config settings in fcitx, unlike mozc (Japanese IME), which sometimes plays by its own rules and behaves like its Windows counterpart.

If you are used to CPIME’s vertical lists, you can change it in ‘Appearance’ tab.


CAP candidate selection quirks when used with mozc (bug?)

For some reason, when both CAP and mozc are freshly installed, the first time you use the candidate list in mozc by pressing space/tab, the candidate list will disappear!

I installed and uninstalled fcitx, mozc and CAP repeatedly and narrowed the bug down to this reproducible path. My suspicion is that there’s a parameter for the candidate selection shortcut (usually ‘1’~’9’+’0’) that’s not exposed in fcitx-configuration and was being changed by mozc. This guess got me closer: playing around with mozc’s config, I found a candidate selection shortcut option.

Note that mozc only supports a maximum of 9 shortcut items (instead of 10, meaning the ‘0’ key is not available as a shortcut key), whereas fcitx-configuration’s Global Config has a different idea (CAP can use the 10th key, aka ‘0’, as a candidate selection shortcut).

I noticed that after switching from ‘1’-‘9’ to ‘a’-‘l’ (or no shortcut) mode and activating it in mozc by using the space key to expand the selection (this is necessary or the change won’t happen), I get the ‘1’-‘9’+’0’ candidate selection shortcut back when I return to CAP. I also noticed that if I messed with the maximum number of suggestions in mozc a few times, I could get into an undefined state in CAP where it shows the candidate selection shortcut for the first few candidates but not the rest.

I also noticed CAP has one consistent bug: candidate selection (not just the keys) ONLY WORKS FOR THE FIRST PAGE! When I use the candidate selection shortcuts or click on a character with the mouse on subsequent pages, it only commits the current word choice, disregarding the selection!


Taking screenshots in Ubuntu with Flameshot

I’m spoiled by Greenshot in Windows and I find the gnome-screenshot that came with Cinnamon lacking.

With Greenshot, the Print Screen key by default selects an area (the most versatile and productive mode, so it deserves the least complicated keys) and prompts you on whether you want to copy to the clipboard or save a file.

Gnome-screenshot went with the most traditional behavior, where the Print Screen key captures the entire screen AND saves it to a file in the ~/Pictures folder with a timestamped file name, which I don’t want (I prefer copying to the clipboard).

Natively in GNOME, Ctrl is the modifier for copying to the clipboard and Shift is the modifier for selecting a section of the screen. So the operation I do most often becomes a bit of finger gymnastics, Ctrl+Shift+Print, and there is no immediate access to an image editor like Greenshot’s (don’t even bother with GIMP, it’s slow to load and convoluted).

I discovered a much neater app called Flameshot. Its quicker design lets you select a section of the screen, do the most common screenshot edits on the fly, and copy to the clipboard or save to a file. It’s even faster than Greenshot, which opens the captured image in a separate editor where you have to click file->copy-to-clipboard after edits!

It turns out that Flameshot does not use GNOME’s native print screen categories, so the shortcut has to be set up as a simple custom shortcut that runs the command:

/usr/bin/flameshot gui

and I chose to use Win/Super+Print for Flameshot

Again, the command is /usr/bin/flameshot gui (or where flameshot is located if it’s not under /usr/bin)


Ubuntu cannot ping Windows hostnames out of the box (resolving NETBIOS announcements)

Out of the box, Ubuntu cannot resolve hostnames announced by Windows.

The internet has many solutions that ditch NETBIOS (winbind, wins), but they involve replacing systemd-resolved with the old NetworkManager (systemd-resolved added an extra level of indirection to break VPN ties), as I illustrated in this now deprecated blog post.

Having the router assign host names you specify often involves a local domain name (you must not choose one that conflicts with the internet) such as local or lan, so computers are accessed as myPC.local or myPC.lan depending on the local domain name you picked. However, this doesn’t take advantage of the hostnames announced by Windows computers.

I decided to research the NETBIOS approach again today and found the missing link in the common solution of installing winbind and adding a wins entry to the host search order in /etc/nsswitch.conf (you can put it at the end, or earlier if you want). I put it at the end as I wanted it to be the last resort:

hosts:          files {a bunch of things depending on your system} wins 

Of course, having a wins entry in the hosts search order requires installing winbind and making sure the winbind service is running:

sudo apt install winbind

The missing piece is editing /etc/samba/smb.conf to inject a name resolve order list after installing samba and winbind:

name resolve order = wins lmhosts bcast

You will need to install the samba package first if you haven’t already (for sharing folders with Windows):

sudo apt install samba

The post said the name resolve order line was commented out, but in newer versions of samba the line is simply non-existent. You’ll have to add it somewhere in /etc/samba/smb.conf; I chose to put it right at the beginning of the [global] section.

Restart the services after editing to reflect the changes and you can start pinging!

sudo systemctl restart nmbd smbd winbind
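To check the result from a script rather than with ping: on glibc systems, CPython's socket.gethostbyname goes through NSS, so it should honor the hosts line in /etc/nsswitch.conf, including the wins entry. A minimal sketch; resolve_host is a hypothetical helper name, not part of any tool mentioned here.

```python
import socket
from typing import Optional


def resolve_host(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname via the system resolver, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None
```

For example, resolve_host("MYPC"), where MYPC is a hypothetical Windows hostname on your LAN, should return that machine's address once wins resolution works, and None before.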

So in the process above (installing samba and winbind and editing nsswitch.conf), you’ve also enabled linux to announce its hostname to Windows, which I’ve discussed in this blog post.

So to summarize the concepts,

  1. You need to install winbind to add wins to the host search list in nsswitch.conf, but that alone doesn’t do anything for you yet!
  2. Once you install samba, your Linux computer starts announcing its hostname to Windows computers.
  3. To use the hostnames announced by Windows, i.e. the other direction, you’ll need to add the name resolve order line to smb.conf (the samba config file) and restart samba and winbind.
