## Data Type characteristics

* Shallow assignment (transferring reference only) means the LHS does not have its own copy, so modifying the new reference will modify the underlying data on the RHS.

## Syntax / Usage

Powershell specific

• The UNCAPTURED output value in the last line of the block is the return value! Unary side effect statements such as $x++ do not have output value. Watch out for statements that looks like it’s going nowhere at the end of the code as these are not nop/bugs, but return value. This has the same stench as fall-throughs. • foreach() follows the last uncaptured output value return rule above doing a 1-to-1 map from the input collection to output collection (you can assign output to foreach() as it’s also seen as a function) • Powershell suck at binary operations between two arrays. Just an elementwise A+B you’d be thinking in terms of loops and worry about dimensions. • You can put if and loop blocks inside collections list construction, like this: @( 3, if(cond1){...;$v1}  do{...; $v2}while(cond2) ) MATLAB specific • When used with classes and custom matrices/arrays, chaining fields/properties/methods by indices often do not work, when they do, they often give out only the first element instead of the entire array (IIRC, there are operator methods that needs to be coordinated in the classes involved to make sure they chain correctly). In short, just don’t chain unless in very simple, scalar cases. Always output it to a variable a access the leaf. ## Range & Indexing Negative (cyclic) indexing along with automatic descending range, along with the lack of ‘end’ keyword is a huge pain in the rear when you want to scan from left to right like A[5:end]. Instead, you’ll have to do $A[4..($A.length-1)] because the range 4..-1 inside A[4..-1] is unrolled as 4,3,2,1,0,-1 (thus scanning from right to left and wraps around) without first consulting with the array A like the end keyword in MATLAB does so it can substitute the ends of the range with the array information before it unrolls. I am willing to bet that this behavior does not have a sound basis other than people thinking negative indices and descending ranges alone are two good ideas without realizing that nearly nobody freaking wants to scan from right to left and wrap around! I had the same gripes about negative indices in Python not carefully coordinating with other combinations in common use cases which cases unintuitive behavior. ## Range indexing syntax # Powershell 1..10 # No step/skip for range creation A[1..10] # No special treatment in array such as figuring out the 'end' % MATLAB A[start:(step):stop] # Python A[range(start,stop,step)] # Slicing (it's not range) A[(start):(stop):(step)] # Can skip everything # In Python, A=X merely reassign the label A as the alias for X. # Modifying the reassigned A through A=X will modify underlying contents of X # To deep-copy contents without .Clone(), assign the full slice A[:] = X  ## Hasthtable / Dictionaries % MATLAB: Use dynamic fields in struct or containers.Map() # Python: dictionaries such as {a:1, b='x'} # Powershell: @{a=1, b='x'} ## Structs Powershell does not have direct struct or dynamic field name struct. Instead if your object is uniform (you expect the fields not to change much), use [PSCustomObject]@{}. You can also just use simple hashtable @{}, but for some reason it doesn’t work the way I expected when put into arrays when I try to reference it by array index. ## Array rules surprises • Array comparisons are filtering operation (not boolean array output like MATLAB). (0..9) -ge 5 gives 5 to 9, not a list of False … False, True … True. To get a boolean array, use this shortcut: (0..9) | % {$_ -ge 5}

Map-filter combo syntax is | ? instead of Map syntax | %

• Monad (Cells in MATLAB) are unpacked and stacked by default (in MATLAB, I had to write a lot of routines to unpack and stack cells of cells). To keep cells packed (in MATLAB lingo, it’s like ‘UniformOutput’, false in cellfun), add a comma unary operator in front of the operation that are expected to be unpacked like this:

## Libraries and Modules

• Reload module using Import-Module \$moduleName -Force

47 total views

## GUI Paradigms (1): Redux (Flutter/React) translated to MATLAB

For GUI development, we often start with controls (or widgets) that user interact with and it’ll emit/run the callback function we registered for the event when the event happens.

Most of the time we just want to read the state/properties of certain controls, process them, and update other controls to show the result. Model-View-Controller (MVC) puts strict boundaries between interaction, data processing and display.

The most common schematic for MVC is a circle showing the cycle of Controller->Model->View, but in practice, it’s the controller that’s the brains. The view can simultaneously accept user interactions, such as a editable text box or a list. The model usually don’t directly update the view directly on its own like the idealized diagram.

With MVC, basically we are concentrating the control’s callbacks to the controller object instead of just letting each control’s callback interact with the data store (model) and view in an unstructured way.

When learning Flutter, I was exposed to the Redux pattern (which came from React). Because the tutorials was designed around the language features of Dart, the documentation kind of obscured the essence of the idea (why do we want to do this) as it dwelt on the framework (structure can be refactored into a package). The docs talked a lot about boundaries but wasn’t clearly why they have to be meticulously observed, which I’ll get to later.

The core inspiration in Redux/BLoC is taking advantage of the concept of ‘listening to a data object for changes’ (instead of UI controls/widget events)!

Instead of having the UI control’s callback directly change other UI control’s state (e.g. for display), we design a state vector/dictionary/struct/class that holds contents (state variables) that we care. It doesn’t have to map 1-1 to input events or 1-1 to output display controls.

When an user interaction (input) event emitted a callback, the control’s callback do whatever it needs to produce the value(s) for the relevant state variable(s) and change the state vector. The changed state vector will trigger the listener that scans for what needs to be updated to reflect the new state and change the states of the appropriate view UI controls.

This way the input UI controls’ callbacks do not have to micromanage what output UI controls to update, so it can focus on the business logic that generates the content pool that will be picked up by the view UI controls to display the results. In Redux, you are free to design your state variables to match more closely to the input data from UI controls or output/view controls’ state. I personally prefer a state vector design that is closer to the output view than input controls.

The intuition above is not the complete/exact Redux, especially with Dart/Flutter/React. We also have to to keep the state in ONE place and make the order of state changes (thus behavior) predictable!

• Actions and reducers are separate. Every input control fires a event (action signal) and we’ll wait until the reducers (registered to the actions) to pick it up during dispatch() instead of jumping on it. This way there’s only ONE place that can change states. Leave all the side effects in the control callback where you generate the action. No side effects (like changing other controls) allowed in reducers!
• Reducers do not update the state in place (it’s read only). Always generate a new state vector to replace the old one (for performance, we’ll replace the state vector if we verified the contents actually changed). This will make timing predictable (you are stepping through state changes one by one)

In Javascript, there isn’t really a listener actively listening state variable changes. Dispatch (which will be called every time the user interacts using control) just runs through all the listeners registered at the very end after it has dispatched all the reducers. In MATLAB, you can optionally set the state vector to be Observable and attach the change listener callback instead of explicitly calling it within dispatch.

Here is an example of a MATLAB class that captures the spirit of Redux. I added a 2 second delay to emulate long operations and used enableDisableFig() to avoid dealing with queuing user interactions while it’s going through a long operation.

classdef ReduxStoreDemo < handle

% Should be made private later
properties (SetAccess = private, SetObservable)
state % {count}
end

methods (Static)
% Made static so reducer cannot call dispatch and indirectly do
% side effect or create loops
function state = reducer(state, action)
% Can use str2fun(action) here or use a function map
switch action
case 'increment'
fprintf('Wait 2 secs before incrementing\n');
pause(2)
state.count = state.count + 1;
fprintf('Incremented\n');
end
end
end

% We keep all the side-effect generating operations (such as
% temporarily changing states in the GUI) in dispatch() so
% there's only ONE PLACE where state can change
methods
function dispatch(obj, action, src, evt)
% Disable all figures during an interaction
figures = findobj(groot, 'type', 'figure');
old_fig_states = arrayfun(@(f) enableDisableFig(f, 'off'), figures);
src.String = 'Wait ...';

new_state = ReduxStoreDemo.reducer(obj.state, action);

% Don't waste cycles updating nops
if( ~isequal(new_state, obj.state) )
% MATLAB already have listeners attached.
% So no need to scan listeners like React Redux
obj.state = new_state;
end

% Re-enable figure obj.controls after it's done
arrayfun(@(f, os) enableDisableFig(f, os), figures, old_fig_states);
src.String = 'Increment';
end
end

methods
function obj = ReduxStoreDemo()
figure();
obj.state.count = 0;

h_1x  = uicontrol('style', 'text', 'String', '1x Box', ...
'Units', 'Normalized', ...
'Position', [0.1 0.3, 0.2, 0.1], ...
'HorizontalAlignment', 'left');
'PostSet', @(varargin) obj.update_count_1x( h_1x , varargin{:}));

uicontrol('style', 'pushbutton', 'String', 'Increment', ...
'Units', 'Normalized', ...
'Position', [0.1 0.1, 0.15, 0.1], ...
'Callback', @(varargin) obj.dispatch('increment', varargin{:}));

% Force trigger the listeners to reflect the initial state
obj.state = obj.state;
end
end

%% These are 'renders' registered when the uiobj.controls are created
% Should stick to reading off the state. Do not call dispatch here
% (just leave it for the next action to pick up the consequentials)
methods
% The (src, event) is useless for listeners because it's not the
% uicontrol handle but the state property's metainfo (access modifiers, etc)
function update_count_1x(obj, hObj, varargin)
hObj.String = num2str(obj.state.count);
end
end

end

164 total views

## An easy way to remember Euclid GCD algorithm: laying tiles

I was a little embarrassed that I’ve never came across Euclid’s GCD algorithm until years after I graduated with Math and Engineering degrees. The descriptions I’ve found online (especially Wikipedia) are convoluted and confusing, and the mechanical description of the algorithm does not shine light to any intuition. Then there comes proofs and abstract algebra properties that even after I followed the reasoning, I’d soon forget after I stop looking at it in a few weeks.

Just by staring at one simple description of the algorithm:

The classic algorithm for computing the GCD, known as Euclid’s algorithm, goes as follows: Let and be variables containing the two numbers, divide by . Save the divisor in , and save the remainder in . If is , then stop: contains the GCD. Otherwise repeat the process, starting with division of by .

K.N. King’s C++ Programming (Chapter 6 Exercise 2)

I reversed engineered one line of reasoning how one could come up with the Euclid GCD on his own. Turns out there is one twist/angle that most traditional explanations glossed over: quotients do NOT matter and WHY it does NOT matter!

It goes back to what GCD stands for: greatest common divisor. You are looking for the largest factor that simultaneously divides and evenly. If you rethink in terms of multiplication, you are finding the biggest tile size that can simultaneously cover and without gaps:

Find s.t. and where are integers.

Imagine you are a lazy and cheap contractor who have unlimited tile samples of all sizes to try before making a large order:

• [Goal] you have to cover with two floors with potentially different sizes and gaplessly
• [Constraint: divisible] your customer won’t let you cut (fractional) tiles because it’s ugly
• [Constraint: common denominator] you can only order ONE tile size to take advantage of bulk discounts
• [Objective: greatest] more tiles means more work to lay it, so you want to shop for the biggest tile size that does the job.

For simplicity of the argument, we also assume the divisor is the smaller number. If is the bigger number, the roles will reverse in the next round because the remainder from dividing by a larger number is the dividend itself, which becomes the divisor in the next round while the old divisor (the larger number) becomes the dividend.

For relatively prime pairs, the algorithm will eventually stop with a GCD of because divides all integers.

Here’s the analog of Euclid algorithm:

• start out with 1 big tile covering the entire smaller floor
• see if we can cover without gaps using big tiles of size
• if yes, we are done. , the macro-tile is your GCD (perfect-sized tile).
• if there are gaps (non-zero remainder), pick a tile-size that covers the ugly gap (the remainder becomes the divisor), then repeat the same process above over the last tile size (the divisor becomes the dividend) until there are no gaps (remainder become zero).

The algorithm is taking advantage of the fact the remainder is smaller than the last tile size (divisor) so we can rewrite the last tile size in multiples of the remainder (old gap) plus the new gap. By the time the new gap divides the last tile size, all numbers before that can be written as multiples of it (i.e. the last gap evenly divides all the numbers in the chain all the way up to and ).

We focus on the last tile and write it in terms of the old remainder :

Since the new remainder is , the old remainder divides the last tile . Since we are dividing in terms of the remainder , we have

\begin{aligned}
b &= 15 \\
&= (5+5+5) \\
&= 3r \\
\\
a &= 50 \\
& = (15 + 15 + 15) + 5\\
& = 3b + r\\
&= (5+5+5) + (5+5+5) + (5+5+5) + 5\\
&= 3(3r)+r \\
& = 10r
\end{aligned}

So both and can be written in terms of integer multiples of right before the algorithm stops.

GCD takes a few more steps:

\begin{aligned}
b &= 18\\
a &= 50\\
&= (18 + 18) + 14\\
&=2b + r\\
r &= 14 \\
\\
b_1 &= r = 14\\
a_1 &= b = 18\\
&= (14) + 4\\
&= b_1 + r_1\\
r_1 &= 4\\
\\
b_2 &=r_1 = 4\\
a_2 &= b_1 = 14\\
&= (4+4+4)+2\\
&= 3b_2 + r_2\\
r_2 &= 2\\
\\
b_3 &= r_2 = 2\\
a_3 &= b_2 = 4\\
&=(2+2) + 0\\
r_3 &= 0\\

\end{aligned}

The algorithms stops at which means we successfully divided the last tile with (logistically stored in ) with no gaps left. (or is therefore the greatest common factor (divisor) that are shared by all the numbers involved all the way up to and .

This is the beauty of Euclid’s GCD algorithm: once the gap (remainder) is sliced small enough to evenly divide the tile (divisor) right before it, every term before it be can written in terms of integer multiples of the last number that divides the last tile without gap (no more remainder).

In our example, can be written as a multiple of , aka because integer multiples of numbers that are already integer multiples of can be written in terms of a larger integer multiple of . This is why quotients do not matter in the Euclid GCD algorithm:

\begin{aligned}
b_2 &= 2r_2\\
4 &= (2+2) \\
\\
b_1 &= 3b_2 + r_2\\
&= 3(2r_2) + r_2\\
& = 7r_2\\
14 &= [(4+4+4)+2] \\
&=  [(2+2)+(2+2)+(2+2)+2] \\

\\
b &= b_1 + r_1\\
&=7r_2 + 2r_2\\
&=9r_2\\
18 &= 14 + 4 \\
&=  [(4+4+4)+2] + 4 \\
&=  [(2+2)+(2+2)+(2+2)+2]+[(2+2)] \\
\\
a &= 2b + r\\
&=2(9r_2) + 7r_2\\
&= 25r_2\\
50 & = 18 + 18 + 14 \\
&= [(2+2)+(2+2)+(2+2)+2]+[(2+2)] \cdots \\
&+ [(2+2)+(2+2)+(2+2)+2]+[(2+2)] \cdots \\
&+ [(2+2)+(2+2)+(2+2)+2]
\end {aligned}

The spirit of the algorithm is that every time we see a gap, we fill it with a small tile that’s exactly the gap size, and attempt to replace the last tile by covering it in terms of the new smaller tile (that was used to fill the gap). We are recursively slicing the last tile with the gaps produced until there are no gaps left. Once you find the GCD tile, you can replace all the bigger tiles before in terms of it with no gaps.

There’s an interesting perspective of using the Euclid GCD algorithm to solve Diophantine Equations (integer written as a linear combination of integers with integer coefficients): turns if in is the same as writing . We can flip the sign of by saying substituting .

65 total views

Complex expressions in lambda calculus notation here is hard to parse in the head correctly until the day I started using Dart Language and noticed the connection between => lambda notation and C-style function layout. Here’s what I learned:

• The expressions are right-associative because the right-most expression is the deepest nested function
• Function application (feeding values to the arguments) always starts from left-to-right because you can only gain entry to a nest of functions starting from the outermost level, peeling it layer by later like an onion. So if applying arg to statement is denoted by (statement arg), the order of input would be (((statement arg1) arg2) arg3)
• Names showing up on the parameter list of the nearest layer sticks to it (local to the function) and excludes everybody else (variable shadowing): this means the same name have absolutely no bearing as a free variable or being a free variable anywhere, so you can (and should) replace it with a unique name.

Here’s the calculus syntax in the book:

This example is the function application used above

Here’s where reusing the variable in multiple places to mean different things gets confusing as hell, and the author (as well as the nomenclature) do not emphasize the equivalence to variable shadowing rules:

This can be rewritten as having

Upon application,

which clearly reads for an input of , it’ll be incremented first then squared. Resolving further:

Fucking evils of variable shadowing! Not only it makes the code hard to read and reason, it also make the mathematics hard to follow!

74 total views

## Data Relationships of Spreadsheets: Relational Database vs. Heterogenous Data Tables

This blog post is development in process. Will fill in the details missing details (especially pandas) later. Some of the MATLAB syntax are inaccurate in the sense that it’s just a description that is context dependent (such as column names can be cellstr, char string or linear/logical indices).

From data relationship point of view, relation database (RDMBS), heterogenous data tables (MATLAB’s dataset/table or Python Panda’s Dataframe) are the same thing. But a proper database have to worry about concurrency issues and provide more consistency tools (ACID model).

Heterogenous data tables are almost always column-oriented database (mainly for analyzing data) where MySQL and Postgres are row-store database. You can think of column-store database as Struct of Arrays (SoA) and row-store database as Array of Struct (AoS). Remember locality = performance: in general, you want to put the stuff you frequently want to access together as close to each other as possible.

Mechanics:

Core database concepts:

Formal databases has a Table Definition (Column Properties) that must be specified ahead of time and can be updated in-place later on (think of it as static typing). Heterogenous Data Tables can figure most of that out on the fly depending on context (think of it as dynamic typing). This impacts:

• data type (creation and conversion)
• unspecified entries (NULL).
Often NaN in MATLAB native types but I extended it by overloading relevant data types with a isnull() function and consistently use the same interface
• default values
• keys (Indices)

SQL features not offered by heterogenous data tables yet:

• column name aliases (AS)
• wildcard over names (*)
• pattern matching (LIKE)

SQL features that are unnatural with heterogeneous data tables’ syntax:

• implicitly filter a table with conditions in another table sharing the same key.
It’s an implied join(T, T_cond)+filter operation in MATLAB. Often used with ANY, ALL, EXISTS

Fundamentally heterogenous data types expects working with snapshots that doesn’t update often. Therefore they do not offer active checking (callbacks) as in SQL:

• Invariant constraints (CHECK, UNIQUE, NOT NULL, Foreign key).
• Auto Increment
• Virtual (dependent) tables (CREATE VIEW)

• Tall vs wide tables

Language logistics (not related to database)

87 total views

## Explain XOR tricks succinctly with two properties Change detector, toggler

Most teaching materials on XOR starts from its straight definition then a bunch of recipes for exploiting XOR. It is not convenient to remember and not intuitive to come up with new tricks on our own. Instead, I’d like to describe XOR as two intuitive operations:

1. Change detector (The logic definition itself)
2. Toggling light switches (Flipping twice undoes it)

The second interpretation (toggling) is actually way more powerful and easy to remember than the first interpretation (definition).

With the toggling interpretation, you don’t have to think hard to come up with the four algebraic properties:

1. Associativity and commutativity: we only care about whether the individual light switches (bits) are flipped odd (flipped) or even (not-flipped) number of times at the end of the day. When it happens in what order does not matter.
2. Identity: flipping NONE of the switches (a mask of all 0) leaves it alone
3. Inverse: pick only the switches that are already turned on (using itself as mask) and flip them (xor) guarantees to turn everything off (0)

The XOR swap trick can be constructed by exploiting the fact that flipping the same switch twice (at any point) reverts it back to where it started:

 Start Mix with No info is lost, still have somewhere to undo if we wish. The incumbent is used to cancel the inside to recover Since is already saved (mixed with) , we don’t have to worry about losing it as long as we get to our main goal of putting in , which will be used as a recovery key later. Now use the stored in to unmix to recover

Note that this trick is obsolete for modern computer architecture because

• it enforces strict dependence that kills any predictive pipeline (speedup) in modern computer architecture, and
• will yield zero all the time if you try to swap with itself in-place (same memory location), which is wrong because you expect the swap to do nothing.

Adding a check for the degenerate case (self-swap) slows the program down even further, making it worse than using a temporary storage for swapping. Therefore there’s no good reason to use it unless you have an exotic use case.

This is mainly used as a teaching device (or homework problem) to teach that XOR-ing even instances of the same variable in the chain cancels it.

540 total views

## Computer Science Comics from Sandra and Woo

xkcd is an obvious CS geek comics. But who’d expect hardcore CS jokes in Sandra and Woo, a raccoon comic? Here you go:

1,538 total views