MATLAB repeating arrays (elementwise array replication, interleaved ‘repmat’)

Since MATLAB R2015a, there’s a built-in function repelem(V, N1, N2, ...) which repeats each element Nk times along dimension k. If V is a vector and N is a scalar, every element of V is repeated N times. If N is a vector, it has to be the same length as V, and each element of N says how many times the corresponding element of V is repeated.

Here are some historical ways of doing it (as mentioned in MATLAB array manipulation tips)

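The scalar case (repeat uniformly) can be emulated by a Kronecker product with a block of ones (multiplying an element by 1 leaves it unchanged):

kron(V, ones(N,1))   % V is a column vector; use ones(1,N) if V is a row vector
In the Kronecker product kron(A, B), every element of A is multiplied by the whole block B. Just replace all the elements b of B with 1 and we are left with the elements of A repeating the way we wanted.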

The kron method is conceptually smart, but it does unnecessary arithmetic (multiplying by 1). Nonetheless, it was reasonably fast and served well until TMW (The MathWorks) finally developed a built-in function that outperforms all the tricks people have accumulated over decades.

The vector case (each element is repeated a different number of times according to the vector N) is basically decoding Run-Length Encoding (RLE), aka counts to placements, for which you can download maturely written programs from MATLAB File Exchange (FEX). There are a bunch of cumsum/diff/accumarray/reshape tricks, but at the end of the day they are all RLE decoding in vectorized form.
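Language aside, "counts to placements" just means walking the values and emitting each one as many times as its count says. Here is a minimal, non-vectorized sketch in bash (the language of the scripts further down this page), not the MATLAB idiom itself; rle_decode is a hypothetical helper name and the inputs are made up.

```shell
#!/bin/bash
# Hypothetical helper: rle_decode "VALUES" "COUNTS"
# Both arguments are space-separated lists of the same length.
rle_decode() {
  local -a values=( $1 ) counts=( $2 ) out=()
  local i j
  for (( i = 0; i < ${#values[@]}; i++ )); do
    # Emit values[i] exactly counts[i] times (zero times drops the element)
    for (( j = 0; j < counts[i]; j++ )); do
      out+=( "${values[i]}" )
    done
  done
  echo "${out[*]}"
}

rle_decode "a b c" "2 0 3"   # prints: a a c c c
```

The vectorized MATLAB tricks compute the same placement without the inner loop, which is why they are harder to read.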


There’s a name for almost every recurring problem that we can think of in MATLAB. Before jumping in and implementing your for loop, ask around and try to find the right keywords/terms to describe your problem! >99.9% of the time your problem is not new!

The most oddball MATLAB algorithm scenario I’ve ever come across that required original thought is the ‘Jenga Matrix’ (I coined the name), which came up while I was working at Stanford University Medical School as a research assistant for MADIT-CRT.

MATLAB’s OOP was not mature at that time, so dataset() objects hadn’t surfaced yet. The reason for the ‘Jenga Matrix’ was to create ‘sparse cells’, which use a sparse matrix whose non-zero entries are indices into a cell vector, so I could make a table (that’s approximately the guts of a heterogeneous data structure).

As I removed elements from the ‘sparse cell matrix’, I didn’t want holes to accumulate in it, so I had to periodically compact the underlying cell vector and shift the indices to reflect the positions after compacting. Normally, if you have to mess with these kinds of ingenious indexing algorithms, you are working on some generic abstractions/tools rather than the business logic itself.

There’s no ultimate correct way to implement something in MATLAB, but there are tons of bad ways that are strictly worse under all circumstances! Being smart with little toy (Cody) problems like array manipulation does not really show practical proficiency in MATLAB. Anybody can spend a day or two solving a genuinely new algorithm puzzle, or just ask around in the forums, if they run into one once in a blue moon. Who cares if you can do it 5 times faster if it’s just <1% of the development time?

Most of your time should be spent on using MATLAB to succinctly and intuitively describe your business logic (which requires exploring and understanding your project requirements deeply), and on hiding the boring background work behind generic abstractions (e.g. RDBMS and RLE)! People should be able to read your function and variable names and form a clear picture of what your codebase is trying to achieve, instead of stumbling over smart-ass idioms that are not immediately obvious (which should be buried in the lowest level of generic tool functions if you had to develop them in-house).

Even a mathematician in linear algebra who has used MATLAB for 40 years isn’t necessarily good at MATLAB! The real MATLAB skill is keeping up with what MATLAB has to offer for the variety of scenarios relevant to the task at hand (or knowing enough abstract concepts like functional programming, OOP, databases, etc. to be able to find the right tools quickly). That is a hell of a lot of knowledge, considering that MATLAB covers almost every common scenario imaginable (the vast majority of MATLAB users aren’t aware of the full offerings and use MATLAB the wrong/hard way)!


Relational Database Concepts = Heterogeneous Data Tables = Spreadsheets

This blog post is a work in progress. I will fill in the missing details (especially pandas) later. Some of the MATLAB syntax is inaccurate in the sense that it’s just a description that is context dependent (for example, column names can be a cellstr, a char string or linear/logical indices).

Mechanics:

Concepts | SQL | MATLAB (table/dataset): T | Pandas (Dataframe): df
tables | FROM | (work with T) | (work with df)
columns, variables, fields | SELECT | T.(field) or T(:, columns/varnames) |
rows, records | WHERE | T( logical index using T, : ) |
rows, records | HAVING | T_grp( logical index using T_grp, : ) |
conditions | NOT | ~ |
conditions | IS | ==, isequaln(), isequal() |
conditions | IN | ismember() |
conditions | BETWEEN | a<=b & b<=c |
Inject table to another table | INSERT INTO t2 SELECT vars FROM t1 WHERE rows | T2(end+(1:#rows), vars) = T1(rows, vars) |
Insert record/row | INSERT INTO t (c1, c2, ..) VALUES (v1, v2, ..) | T=[T; table(v1, v2, ..., 'VariableNames', {c1,c2,...})] or T( end+1, : ) = table(...) |
update records/elements | UPDATE table SET column = content WHERE row_cond | T.(col)(row_cond) = content |
New table from selection | SELECT vars INTO t2 FROM t1 WHERE rows | T2 = T1(rows, vars) |
clear table | TRUNCATE TABLE t | T( :, : )=[] |
delete rows | DELETE FROM t WHERE cond | T( cond, : ) = [] |

Note on DELETE: if WHERE is not specified, it kills all rows one by one with consistency checks. Avoid it and use TRUNCATE TABLE instead.

Core database concepts:

Concepts | SQL | MATLAB (table/dataset) | Pandas (Dataframe)
linear index | CREATE INDEX idx ON T (col) | T.idx = (1:size(T,1))' |
group index | CREATE UNIQUE INDEX idx ON T (cols) | [~, ~, T.idx] = unique(T(:, cols)) (old implementation is grp2idx()) |
set operations | UNION, INTERSECT | union(), intersect(), setdiff(), setxor() |
sort | ORDER BY | sortrows() |
unique | SELECT DISTINCT | unique() |
reduction, aggregation | F() | @reductionFunctions |
grouping | GROUP BY | Specifying ‘GroupingVariables’ in varfun(), rowfun(), etc. |
partitioning | (set partition option in Table Definition) | T1=T(:, {'key', varnames_1}), T2=T(:, {'key', varnames_2}) |
joins | [type] JOIN | *join(T1, T2, ...) | df.join(df2, …)
cartesian product | CROSS JOIN (misnomer, no keys) | T_cross = [repelem(T1, size(T2,1), 1), repmat(T2, [size(T1,1), 1])] |

Functional programming concepts such as map (linear index), filter (logical index) and reduce (summary & group) are heavily used with databases.

Formal databases have a Table Definition (Column Properties) that must be specified ahead of time and can be updated in-place later on. Heterogeneous Data Tables can figure most of that out on the fly depending on context. This impacts:

  • data type (creation and conversion)
  • unspecified entries (NULL).
    Often NaN in MATLAB native types, but I extended it by overloading the relevant data types with an isnull() function so I can consistently use the same interface
  • default values
  • keys (Indices)

SQL features not offered by heterogeneous data tables yet:

  • column name aliases (AS)
  • wildcard over names (*)
  • pattern matching (LIKE)

SQL features that are unnatural with heterogeneous data tables’ syntax:

  • implicitly filter a table with conditions in another table sharing the same key.
    It’s an implied join(T, T_cond)+filter operation in MATLAB. Often used with ANY, ALL, EXISTS

Fundamentally, heterogeneous data tables expect to work with snapshots that don’t update often. Therefore they do not offer active checking (callbacks) as in SQL:

  • Invariant constraints (CHECK, UNIQUE, NOT NULL, Foreign key).
  • Auto Increment
  • Virtual (dependent) tables (CREATE VIEW)

Know these database/spreadsheet concepts:

  • Tall vs wide tables

Language logistics (not related to database)

Concepts | SQL | MATLAB (table/dataset) | Pandas (Dataframe)
Partial display | MySQL: LIMIT, Oracle: FETCH FIRST | T( 1:10, : ) | df.head()
Comments | -- or /* */ | % or %{ %} | # or """ """
function | CREATE PROCEDURE fcn | function varargout = fcn(varargin) | def fcn():
case | CASE WHEN THEN ELSE END | switch case end | (no case structure, use dictionary)
Null if no results | IFNULL ( statement ) | function X=null_if_empty(T, cond); X=T( cond, : ); if isempty(X), X=NaN; end |
Replace nulls | ISNULL(col, target_val) | T.col(isnan(T.col)) = target_val |


MATLAB assign index for unique rows

MATLAB’s dataset/table objects’ internals often involve identifying unique contents and assigning a unique (grouping) index to them, so the indices can be mapped or joined without actually going through the contents of each row.

In the old days when I was using dataset(), the first generation of table() objects before the rewrite, there was a tool called grp2idx() which assigns the same number to identical items regardless of data type. It was part of the Statistics Toolbox (you need to pay extra for it), and it does not work if you have multiple columns and want the same index assigned only when entire ROWS are identical.

Upon inspection, grp2idx() is overrated. There are two ways to get the same result without paying for the toolbox:

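  • double(categorical(X)): cast to a categorical type, then to double (technically you can use nominal/ordinal, but they are part of the Statistics Toolbox)
  • Use the 3rd output argument of unique() (add the 'rows' option for multi-column arrays). It works on table() objects and across multiple columns. Note that the 2nd output argument of sort() or sortrows() is only the sort permutation: it orders the rows, but identical rows still get different numbers.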


DBeaver connecting to MySQL in Namecheap Shared Hosting

Namecheap already provides instructions for connecting the MySQL Workbench client to its shared hosting, which involves SSH tunneling because they disallow direct MySQL connections out of security concerns.

So here’s basically the logistics:

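  1. SSH to your Namecheap hostname (you can use your domain name) at SSH port 21098
  2. The tunnel listens on port 5522 on your machine and forwards it to MySQL port 3306 on the server's own loopback address (127.0.0.1 as seen from the server, not from the client)
  3. Instead of connecting directly to {namecheap shared hosting server}:3306, the MySQL client goes through the tunnel, so the server sees a local connection to localhost:3306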

It’s a little confusing to do in DBeaver because the “Advanced settings” section, which you will need, is hidden by default. The names ‘local client’ (source) vs ‘remote’ (destination) in the dialog box are confusing too. They are actually equivalent to

ssh -L ["Local host":]"Local port":"Remote host":"Remote port"
ssh -L [bind_address:]port:host:hostport

bind_address can be left blank. If you are paranoid and don’t want other machines to use your MySQL client machine as a gateway (by connecting to the tunnel you are establishing on it), set (aka bind) it to localhost. Alternatively, bind it to the IP of one of the client’s network adapters if you want to let machines on that trusted network use this MySQL client computer as a gateway.

For some reason (I suspect it’s IPv6), “Remote host” needs to be set to the loopback adapter 127.0.0.1 (cannot use the special hostname ‘localhost‘).
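For comparison, here is a rough sketch of the same tunnel opened by hand with plain ssh. It only prints the command (a dry run) so nothing connects; tunnel_cmd, cpanel_user and example.com are placeholders I made up, and the port numbers are the ones from the steps above.

```shell
#!/bin/bash
# Dry run: print the tunnel command instead of opening a connection.
# cpanel_user and example.com are placeholders, not real values.
tunnel_cmd() {
  local user=$1 host=$2
  local ssh_port=21098         # Namecheap shared hosting SSH port
  local local_port=5522        # port the tunnel listens on, on the client machine
  local remote_host=127.0.0.1  # loopback as seen from the server (not 'localhost')
  local remote_port=3306       # MySQL port on the server
  # -N: do not run a remote command, just forward ports
  echo "ssh -N -p ${ssh_port} -L ${local_port}:${remote_host}:${remote_port} ${user}@${host}"
}

tunnel_cmd cpanel_user example.com
# prints: ssh -N -p 21098 -L 5522:127.0.0.1:3306 cpanel_user@example.com
```

With a tunnel opened by hand like this, a MySQL client on the same machine would reach the database through local port 5522.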

Remember that MySQL’s username and password are the special database-only login credentials you created in cPanel.


Text manipulation idioms in linux

awk: select columns
sed: stream editor (operations like select, substitute, add/delete lines, modify)
sed expressions can be separated by ";"
sed can substitute all occurrences with the 'g' modifier at the end: 's/(find)/(replace)/g'

# https://unix.stackexchange.com/questions/92187/setting-ifs-for-a-single-statement

# arg I/O
"$@": unpack all input args as SEPARATE arguments
"$*": join all inputs as ONE arg, separated by FIRST character of IFS (a space if unspecified)

# Remember the double quotes around "$*" or "${array[*]}" usages or else IFS won't function

array[@]: entire array
"${array[@]}": unpacks entire array into MULTIPLE arguments
"${array[*]}": join entire array into ONE argument separated by FIRST character of IFS (defaults to a space if unspecified)
( IFS=$'\n'; echo "${my_array[*]}" )
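A quick way to convince yourself of the difference is to count the arguments a function receives. The snippet below is a minimal sketch; count_args and demo are hypothetical helper names.

```shell
#!/bin/bash
count_args() { echo "$#"; }        # hypothetical helper: prints how many arguments it received

demo() {
  count_args "$@"                  # each input stays a separate argument
  count_args "$*"                  # all inputs joined into ONE argument
  local IFS=','                    # only affects this function
  echo "$*"                        # joined with the first character of IFS
}

demo "a b" c d
# prints:
# 3
# 1
# a b,c,d

my_array=(x "y z" w)
count_args "${my_array[@]}"        # prints 3
count_args "${my_array[*]}"        # prints 1
```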

${#str}: length of string
${#array[@]}: length of array
${array[@]:start:count}: select array[start] ... array[start+count-1]

${str:="my_string"}: assigns "my_string" to variable str if it is unset or empty (useful for its side effect)

${str##my_pattern}: deletes the longest front match of my_pattern (use one # for the shortest match)
${str%%my_pattern}: deletes the longest tail match of my_pattern (use one % for the shortest match)
${str%?}: deletes the last character (the my_pattern is a single character wildcard "?")
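Here is a small sketch that exercises the length, slicing, default-assignment and pattern-deletion expansions on made-up values (str, array and maybe are just example variable names):

```shell
#!/bin/bash
str="archive.tar.gz"
array=(zero one two three four)

echo "${#str}"            # 14: length of the string
echo "${#array[@]}"       # 5: number of elements
echo "${array[@]:1:3}"    # one two three (start at index 1, take 3 elements)

unset maybe
: "${maybe:=fallback}"    # assigns only because maybe is unset
echo "$maybe"             # fallback

echo "${str#*.}"          # tar.gz        (shortest match removed from the front)
echo "${str##*.}"         # gz            (longest match removed from the front)
echo "${str%.*}"          # archive.tar   (shortest match removed from the tail)
echo "${str%%.*}"         # archive       (longest match removed from the tail)
echo "${str%?}"           # archive.tar.g (last character removed)
```

The ":" in front of the default assignment is the no-op command, so the expansion runs purely for its side effect.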

$( whatever_command ): captures stdout created by running whatever_command
( $str ): tokenize to string array, governed by IFS (specify delimiter)
( $( whatever_command ) ): combines the two operations above: capture stdout from command and tokenize the results
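As a rough illustration of capture plus tokenizing, the sketch below uses printf as a stand-in for whatever_command and splits on ':' (the sample text is made up). The read -a form at the end is an alternative I would suggest because it avoids the filename globbing that an unquoted expansion is also subject to.

```shell
#!/bin/bash
# printf stands in for "whatever_command"; it emits one colon-separated line.
output=$(printf 'usr:local:bin')        # capture stdout

# Tokenize on ':' by setting IFS temporarily, then restore it.
old_ifs=$IFS
IFS=':'
tokens=( $output )                      # unquoted on purpose so IFS splitting applies
IFS=$old_ifs

echo "${#tokens[@]}"                    # 3
echo "${tokens[1]}"                     # local

# Alternative: IFS applies to this one read command only
IFS=':' read -r -a tokens2 <<< "$output"
echo "${tokens2[2]}"                    # bin
```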

# https://unix.stackexchange.com/questions/92187/setting-ifs-for-a-single-statement
function strjoin { local IFS="$1"; shift; echo "$*"; }


Improved code for Toner Reset SP C250SF/DN

This is based on the Raspberry Pi implementation of the Toner chip reset:

https://gist.github.com/joeljacobs/c57550cdb4e68e3b86d6b89fb58f305d

I am using a Raspberry Pi Zero W, so the chip is a BCM2835 instead, and I can use 100 kbps/400 kbps instead of 9600 baud as in the original code.

The electrical pins we need are clustered at the top left: Pins 1 (3.3V), 3 (I2C SDATA), 5 (I2C SCLK), 9 (Ground)

Raspberry Pi Zero GPIO Pinout, Specifications and Programming language

While looking for the pinouts (https://pinout.xyz/pinout/i2c), I discovered a useful tool called i2cdetect that finds the addresses of the chips on the bus, which means I can write a program that automatically figures out the right image to load to the chip without looking:

sudo apt-get install i2c-tools
sudo i2cdetect -y 1
Sorry I forgot where I got this image from.
Please remind me in the comments section if you find out who should I credit it to.
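To show how the address can be pulled out of the i2cdetect grid without a chip attached, here is a sketch that parses a canned sample. The sample text is typed by hand to resemble the tool's output (not captured from a real device), and detect_address is a hypothetical helper name; the address-to-color map is the one used in the script further down.

```shell
#!/bin/bash
# Hypothetical helper: read i2cdetect-style text on stdin, print the address found in row 0x50.
detect_address() {
  # Keep the 0x50 row, drop its header and all "--" placeholders, then drop spaces
  grep '^50:' | sed -e 's/^50: *//' -e 's/--//g' | tr -d ' '
}

# Canned sample (hand-typed, not captured from a real device)
sample='     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
50: -- -- -- 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --'

COLORS=( [50]="yellow" [51]="magenta" [52]="cyan" [53]="black" )
address=$(printf '%s\n' "$sample" | detect_address)
echo "address=0x${address} color=${COLORS[$address]:-unknown}"
# prints: address=0x53 color=black
```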

Since I don’t have cheap pogo pins lying around, I took a 2.54mm pitch jumper block I had (the standard size used in PCs, Arduino and Raspberry Pi; all its pins are the same length, so they make contact simultaneously) and hoped that four of its pins would roughly align with the contacts, and they did. See pictures here:

Can press the pins down by using jumpers

You might be worried about shorting into the next pin, or about hooking something up in reverse and damaging the chips, but luckily the chips survived. My guess is that putting Vcc next to Ground on one side, instead of laying the pads out symmetrically (where flipping the connector would reverse the polarity), is a deliberate design choice. When reversed, SCL is forced to Ground and SDA is pulled up to Vcc while the chip gets no power supply, so no damage is done. Brilliant! The worst case for my poorly aligned jumper block is that SDA and Vcc might touch each other, but it doesn’t matter because it’s a perfectly legal hookup (just not communicating)!

So no worries if you didn’t touch the pins right! The only case where it might go wrong is if you intentionally flip the block and slide it by two pins (reversing Vcc and Ground). In the other cases, the data lines just get tied to high or low levels while the power pads get no supply.

I’ve designed the program so that it detects the chip once you hook it up right and immediately programs it (this takes only a second), so you don’t have to hold the jumper for long and worry about unstable contacts.

#!/bin/bash

# This program detects and rewrites the toner chips to "full" for a Ricoh SP C250SF/DN printer using a Raspberry Pi (defaults to BCM2835 models such as the Raspberry Pi Zero W)

# The chip data is in files named "black", "cyan", "magenta" and "yellow".
# The pad closest to the edge is GND (-> Pin 9), followed by VCC (-> Pin 1), DATA (-> Pin 3), and Clock (-> Pin 5).

# Be sure i2c is enabled and installed (it's turned off by default) on Raspbian

# This line is disabled because it takes too long to unregister i2c_bcm2835 to start from a clean slate
# modprobe -r i2c_bcm2835 

# Sets the baud rate
modprobe i2c_bcm2835 baudrate=400000

# Create I2C address to color map
COLORS=( [50]="yellow" [51]="magenta" [52]="cyan" [53]="black" )
# Detect chip I2C address
I2C_address=$( sudo i2cdetect -y 1 | grep '^50:' | sed -e 's/^50: //;s/--//g' | tr -d ' ' )
# Keep only the 0x5* address row since only 0x50~0x53 are valid. Strip the "50:" header, discard all "--" entries and any leftover spaces, and you are left with the detected address
HEX_I2C_address="0x$I2C_address"

# LED flash function
function flash_once {
  period=${1:-0.5}
  target_device="/sys/class/leds/led0/brightness"

  echo 0 > ${target_device}
  sleep $period

  echo 1 > ${target_device}
  sleep $period
}

function flash {
  times=${1:-1}
  period=$2
  for((i=1; i<=times; i++)); do
    flash_once $period
  done
}

if [ -v COLORS[I2C_address] ]; then
  # Meat
  color=${COLORS[I2C_address]}
  echo "Detected toner chip for color: $color"

  echo "Short flashes before starting. Long flash after done"
  flash 5 0.1

  # "address" counter syncs up with the hex code index in the file
  printf "Writing"
  address=0
  for i in $(cat "${color}"); do
    i2cset -y 1 "${HEX_I2C_address}" $address $i
    address=$((address + 1))
    printf .
  done
  echo "Done!"
  flash 3 0.5
else
  echo "Invalid I2C address for SP C250DN/SF toner chips: ${I2C_address}"
fi

I chose to flash the board’s only LED quickly before starting and blink it slowly a few times after it’s done as visual cues. It’s entirely optional. Here’s the guts of the code without the fancy indicators:

#!/bin/bash

# Sets the baud rate
modprobe i2c_bcm2835 baudrate=400000

# Create I2C address to color map
COLORS=( [50]="yellow" [51]="magenta" [52]="cyan" [53]="black" )

# Detect chip I2C address
I2C_address=$( sudo i2cdetect -y 1 | grep '^50:' | sed -e 's/^50: //;s/--//g' | tr -d ' ' )
HEX_I2C_address="0x$I2C_address"

if [ -v COLORS[I2C_address] ]; then
  # Meat
  color=${COLORS[I2C_address]}

  # "address" counter sync up with the hex code index in file
  address=0;
  for i in $(cat ${color}); do
    i2cset -y 1 ${HEX_I2C_address} $address $i;
    address=$(($address +1));
  done
else
  echo "Invalid I2C address for SP C250DN/SF toner chips: ${I2C_address}"
fi
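If you want to see what the write loop would send before touching a real chip, a dry run along these lines prints the i2cset commands instead of executing them. This is only a sketch: dry_run_write is a hypothetical helper, and the byte values in the temporary file are made up rather than taken from a real toner image.

```shell
#!/bin/bash
# Dry run: print the i2cset commands instead of executing them.
# dry_run_write is a hypothetical helper; the byte values below are made up.
dry_run_write() {
  local hex_address=$1 image_file=$2
  local offset=0 byte
  # Whitespace-separated bytes are written to consecutive offsets, starting at 0
  for byte in $(cat "$image_file"); do
    echo "i2cset -y 1 ${hex_address} ${offset} ${byte}"
    offset=$((offset + 1))
  done
}

image=$(mktemp)
printf '0x20 0x00\n0x01 0x03\n' > "$image"
dry_run_write 0x53 "$image"
rm -f "$image"
# prints:
# i2cset -y 1 0x53 0 0x20
# i2cset -y 1 0x53 1 0x00
# i2cset -y 1 0x53 2 0x01
# i2cset -y 1 0x53 3 0x03
```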

Download the package. Run program_toner

In case people are wondering, the L01 chip’s datasheet is here:


Auto mount USB drives

Raspbian OS (Raspberry Pi) does not mount USB drives automatically out of the box.

I’m pretty annoyed by the lack of easy-to-use packages as of 2021, so I still have to do it myself with the instructions here: https://github.com/avanc/mopidy-vintage/wiki/Automount-USB-sticks

These are cookbook instructions, but I’ll add some insights to what each component means so it’s easier to remember the steps.

At top level, to auto-detect and mount USB drives, we need the following components

  • udev: analogous to what happens behind Device Manager on Windows. It keeps track of devices and updates them immediately as they are connected and disconnected. We need to register USB sticks by adding an event handler (rule), which triggers a systemd service (see below)
  • systemd: analogous to Windows services. We need to register the service by stating which commands it calls on start (mostly mounting) and on cleanup (mostly unmounting)
  • automount script: a user-defined script that abstracts most of the hard work: detecting the partitions on the USB stick, assigning the mount point names, and mounting them

# Register udev event handler (rules)
# /etc/udev/rules.d/usbstick.rules

ACTION=="add", KERNEL=="sd[a-z][0-9]", TAG+="systemd", ENV{SYSTEMD_WANTS}="usbstick-handler@%k"

# It triggers a systemd call to "usbstick-handler@" service registered under /etc/systemd/system/
# Register systemd service
# /etc/systemd/system/usbstick-handler@.service
# (Note: the instructions used /lib instead of /etc. It's better to add it under /etc as this is a manually registered user-defined service rather than one from a package)

[Unit]
Description=Mount USB sticks
BindsTo=dev-%i.device
After=dev-%i.device

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/automount %I
ExecStop=/usr/bin/pumount /dev/%I

# %I is the device name of the USB stick's partition under /dev, usually sda1 (the udev rule matches sd[a-z][0-9])
# The logic of determining the mount point name and mounting is abstracted into 'automount' (see below)

Create the file /usr/local/bin/automount and give it execution permission: chmod +x /usr/local/bin/automount

#!/bin/bash

# $1 (first argument) is usually "sda1" (the USB stick's partition name), passed in as %I from the systemd service commands
PART=$1

# For that partition, extract the partition label (if applicable) from the lsblk command. The first column (name) is dropped
FS_LABEL=`lsblk -o name,label | grep ${PART} | awk '{print $2}'`

# Decide the mount point name {partition label}_{partition name}
# e.g. MS-DOS_sda1
tokens=($FS_LABEL)
tokens+=($PART)
MOUNT_LABEL=$(IFS='_'; echo "${tokens[*]}")
# Using a string array makes it easier to drop the prefix if there's no {partition label}
# Bash uses IFS to specify the separator when listing all elements of the array

# Suggestion: drop --sync for faster USB access (if you can umount properly)
/usr/bin/pmount --umask 000 --noatime -w --sync /dev/${PART} /media/${MOUNT_LABEL}

This automount script is adapted from https://raspberrypi.stackexchange.com/questions/66169/auto-mount-usb-stick-on-plug-in-without-uuid with my improvements.
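The mount point naming can be tried on its own without plugging anything in. The sketch below mirrors the label logic of the script above; make_mount_label is a hypothetical helper name and the labels are made-up examples.

```shell
#!/bin/bash
# Hypothetical helper mirroring the label logic above: {partition label}_{partition name}
make_mount_label() {
  local fs_label=$1 part=$2
  local -a tokens=( $fs_label )   # an empty label contributes no tokens; spaces split into several
  tokens+=( "$part" )
  ( IFS='_'; echo "${tokens[*]}" )  # subshell keeps the IFS change local
}

make_mount_label "MS-DOS" sda1    # prints: MS-DOS_sda1
make_mount_label "" sda1          # prints: sda1
make_mount_label "MY DISK" sda1   # prints: MY_DISK_sda1
```

Because the tokens live in an array, an unlabeled partition simply ends up as its device name with no stray underscore in front.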


Get myself comfortable with Raspberry Pi

I2C is disabled by default. Use raspi-config to enable it. Editing config file /boot/config.txt directly might not work

Locale & Keyboard (105 keys) defaults to UK out of the box. Shift+3 “#” (hash) sign became “£” pound sign. Use raspi-config to change the keyboard.

It reads random garbage partitions for MFT assigned to FAT16 drives. Just use FAT32

USB drives do not automount by default. usbmount is messy as it creates dummy /media/usb[0-7] folders. Do this instead.
