Styles for (Lambda) anonymous function (named or unnamed) in Dart Language

Posted on October 27, 2021 by admin

The official Dart docs are sometimes too terse to settle various language questions. I discovered an alternative syntax for named lambdas here and here. In Dart,

(args) => expr

is the shorthand for

(args) {
  return expr;
}

So the lambda style (as in C++11) names the function by assigning it to a variable of a messy functor type (inferred with var in Dart and auto in C++11):

var f = (args) => expr;

This can also be rewritten using a C-style, function-pointer-like declaration, except that

the body expr is included (which is not allowed in C),
return must be explicit in a curly-brace block,
the arrow notation => returns whatever expr evaluates to (which can be null for calls like print),
it’s simply the function name f in Dart instead of a pointer (*f) in C.

[optional return type] f(args) { 
  return expr;
}

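which can be simplified with the arrow operator

[optional return type] f(args) => expr;

Note that there are no curly braces after =>. In Dart, => { expr } returns a set literal containing expr rather than expr itself.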
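For readers coming from the C++11 side of the comparison above, here is a minimal sketch of the same three spellings in C++. The names square_lambda, square and square_ptr are made up for illustration and are not from the Dart docs.

```cpp
// Lambda style: the closure type has no usable name, so `auto` infers it,
// much like `var f = (args) => expr;` in Dart.
auto square_lambda = [](int x) { return x * x; };

// Ordinary named function: the counterpart of `int f(args) { return expr; }`.
int square(int x) { return x * x; }

// C-style function pointer: note the (*name) that Dart does not need.
// A pointer declaration cannot carry a body; it can only point at one.
int (*square_ptr)(int) = square;
```

All three are called the same way, e.g. square_lambda(3), square(3) and square_ptr(3).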

Data Relationships of Spreadsheets: Relational Database vs. Heterogeneous Data Tables

Posted on October 23, 2021 by admin

This blog post is a work in progress. I will fill in the missing details (especially pandas) later. Some of the MATLAB syntax is loose in the sense that it’s just a context-dependent description (for example, column names can be a cellstr, a char string, or linear/logical indices).

From a data relationship point of view, a relational database (RDBMS) and heterogeneous data tables (MATLAB’s dataset/table or Python pandas’ DataFrame) are the same thing. But a proper database has to worry about concurrency issues and provide more consistency tools (ACID model).

Heterogeneous data tables are almost always column-oriented databases (mainly for analyzing data), whereas MySQL and Postgres are row-store databases. You can think of a column-store database as a Struct of Arrays (SoA) and a row-store database as an Array of Structs (AoS). Remember locality = performance: in general, you want to put the stuff you frequently access together as close to each other as possible.
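A minimal C++ sketch of the SoA vs. AoS picture above. The Row, RowStore and ColumnStore types and the column names are hypothetical, chosen only to show where the bytes end up.

```cpp
#include <string>
#include <vector>

// Row store (Array of Structs): the fields of one record sit next to each other.
struct Row {
    int id;
    std::string name;
    double price;
};
using RowStore = std::vector<Row>;

// Column store (Struct of Arrays): the values of one column sit next to each other.
struct ColumnStore {
    std::vector<int> id;
    std::vector<std::string> name;
    std::vector<double> price;
};

// Summing one column in the row store strides over every whole record...
double total_price(const RowStore& rows) {
    double sum = 0.0;
    for (const Row& r : rows) sum += r.price;
    return sum;
}

// ...while the column store only walks one contiguous array.
double total_price(const ColumnStore& cols) {
    double sum = 0.0;
    for (double p : cols.price) sum += p;
    return sum;
}
```

Analysis workloads mostly look like total_price (one column, all rows), which is why the column layout tends to suit them; fetching or inserting one complete record favors the row layout instead.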

Mechanics:

Concepts	SQL	MATLAB table	Pandas DataFrame
tables	FROM	(work with T)	(work with df)
columns / variables / fields	SELECT	`T.(field)`, `T(:, cols/varnames)`
rows / records	WHERE, HAVING	`T( cond(T), : )`, `T_grp( cond(T_grp), : )`
conditions	NOT, IS, IN, BETWEEN	`~`; `==`, `isequal*()`; `ismember()`; `a<=b & b<=c`
Inject table to another table	INSERT INTO t2 SELECT vars FROM t1 WHERE rows	`T2(end+(1:#rows), vars) = T1(rows, vars)` (Doable, throws warning)
Insert record/row	INSERT INTO t (c1, c2, ..) VALUES (v1, v2, ..)	`T=[T; {v1, v2, ...}]` (Cannot default for unspecified column*)
update records/elements	UPDATE table SET column = content WHERE row_cond	`T.(col)(row_cond) = content`
New table from selection	SELECT vars INTO t2 FROM t1 WHERE rows	`T2 = T1(rows, vars)`
clear table	TRUNCATE TABLE t	`T( :, : )=[]`
delete rows	DELETE FROM t WHERE cond (if WHERE is not specified, it kills all rows one by one with consistency checks. Avoid it and use TRUNCATE TABLE instead)	`T( cond, : ) = []`

* I developed sophisticated tools to allow partial row insertion, but it’s not something TMW (The MathWorks) supports right out of the box. This involves overloading the default value generator for each data type and then extracting the skeleton T( [], : ) to identify the data types.

Core database concepts:

Concepts	SQL	MATLAB (table/dataset)	Pandas (Dataframe)
linear index	CREATE INDEX idx ON T (col)	`T.idx = (1:size(T,1))'`
group index	CREATE UNIQUE INDEX idx ON T (cols)	`[~, T.idx] = sortrows(T, cols)` (old implementation is `grp2idx()`)
set operations	UNION, INTERSECT	`union()`, `intersect()`, `setdiff()`, `setxor()`
sort	ORDER BY	`sortrows()`
unique	SELECT DISTINCT	`unique()`
reduction / aggregation	F()	@reductionFunctions
grouping	GROUP BY	Specifying ‘GroupingVariables’ in varfun(), rowfun(), etc.
partitioning	(set partition option in Table Definition)	`T1=T(:, {'key', varnames_1})` `T2=T(:, {'key', varnames_2})`
joins	[type] JOIN	*`join(T1, T2, ...)`	df.join(df2, …)
cartesian product	CROSS JOIN (misnomer, no keys)	`T_cross = [repelem(T1, size(T2,1), 1), repmat(T2, [size(T1,1), 1])]`

Functional programming concepts map (linear index), filter (logical index), and reduce (summary & group) are heavily used with databases.

Formal databases have a Table Definition (Column Properties) that must be specified ahead of time and can be updated in place later on (think of it as static typing). Heterogeneous data tables can figure most of that out on the fly depending on context (think of it as dynamic typing). This impacts:
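To make the map / filter / reduce vocabulary concrete, here is a small sketch using the standard algorithms on a single column. The function names (with_tax, at_least, total) and the price column are assumptions made up for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// map: apply a function to every element (like computing a new column).
std::vector<double> with_tax(const std::vector<double>& prices, double rate) {
    std::vector<double> out;
    out.reserve(prices.size());
    std::transform(prices.begin(), prices.end(), std::back_inserter(out),
                   [rate](double p) { return p * (1.0 + rate); });
    return out;
}

// filter: keep only the rows that satisfy a condition (like WHERE).
std::vector<double> at_least(const std::vector<double>& prices, double min_price) {
    std::vector<double> out;
    std::copy_if(prices.begin(), prices.end(), std::back_inserter(out),
                 [min_price](double p) { return p >= min_price; });
    return out;
}

// reduce: collapse a column into one value (like SUM()).
double total(const std::vector<double>& prices) {
    return std::accumulate(prices.begin(), prices.end(), 0.0);
}
```

Chaining them, total(at_least(prices, 2.0)) plays the role of SELECT SUM(price) ... WHERE price >= 2.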

data type (creation and conversion)
unspecified entries (NULL).
Often NaN in MATLAB native types, but I extended it by overloading the relevant data types with an isnull() function so that the same interface is used consistently
default values
keys (Indices)

SQL features not offered by heterogeneous data tables yet:

column name aliases (AS)
wildcard over names (*)
pattern matching (LIKE)

SQL features that are unnatural with heterogeneous data tables’ syntax:

implicitly filter a table with conditions in another table sharing the same key.
It’s an implied join(T, T_cond)+filter operation in MATLAB. Often used with ANY, ALL, EXISTS

Fundamentally, heterogeneous data tables expect to work with snapshots that do not update often. Therefore they do not offer active checking (callbacks) as in SQL:

Invariant constraints (CHECK, UNIQUE, NOT NULL, Foreign key).
Auto Increment
Virtual (dependent) tables (CREATE VIEW)

Know these database/spreadsheet concepts:

Tall vs wide tables

Language logistics (not related to database)

Concepts	SQL	MATLAB (table/dataset)	Pandas (Dataframe)
Partial display	MySQL: LIMIT; Oracle: FETCH FIRST	`T( 1:10, : )`	`df.head(10)`
Comments	`--` or `/*` … `*/`	`%` or `%{` … `%}`	`#` or `"""` … `"""`
function	CREATE PROCEDURE fcn	`function [varargout{:}]=fcn(varargin{:})`	`def fcn():`
case	CASE WHEN THEN ELSE END	switch case end	(no case structure, use dictionary)
Null if no results	IFNULL ( statement )	`function X=null_if_empty(T, cond)` `X=T( cond, : );` `if( isempty(X) ) X=NaN;`
Replace nulls	ISNULL(col, target_val)	`T.col(isnan(T.col)) = target_val` `T = standardizeMissing( T, ... )`

Implicit ways to store data in a program

Posted on April 24, 2019 by admin

The most obvious way to store data is in plain data structures such as arrays and queues, or even a hashtable, but don’t forget these implicit ones:

Call stack. As the name says, it’s a stack data structure. It saves the local variables before making another function call. This is often exploited in recursion to avoid passing around an explicit data structure.
Closures. A closure means that when you create an anonymous function, the variables involved (other than the arguments) are saved along with (captured by) the function object you created. This can be exploited to make forward iterators or generators.
Functions. You can make a function that does nothing other than return a certain piece of data. It’s an excellent way to avoid the overhead of managing (reading, promptly updating) a config file. It works best when your programming language requires so little typing to specify data (such as MATLAB) that your code is almost as short as a plain-text config file.
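The three implicit stores above can be sketched in a few lines of C++ (C++14 or later for the init-capture). Everything here, including Node, sum_tree, make_counter, default_config and the config keys, is a hypothetical illustration rather than code from a real project.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Call stack: each recursive call keeps its own `node` and loop position,
// so the depth-first walk needs no explicit stack container.
struct Node {
    int value;
    std::vector<Node> children;
};

int sum_tree(const Node& node) {
    int sum = node.value;
    for (const Node& child : node.children) sum += sum_tree(child);
    return sum;
}

// Closure: `next` and `step` are captured by value and live inside the
// returned function object, which turns it into a simple forward generator.
std::function<int()> make_counter(int start, int step) {
    return [next = start, step]() mutable {
        int current = next;
        next += step;
        return current;
    };
}

// Function as data: the "config file" is just a function returning values.
std::map<std::string, std::string> default_config() {
    return {{"host", "localhost"}, {"port", "8080"}};
}
```

Each call to the object returned by make_counter(10, 5) yields the next value (10, then 15, and so on) without any visible state variable at the call site.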

My favorite CS / Software Engineering comics

Posted on March 21, 2019 by admin

Other than xkcd, which also integrates my favorite topics such as math, I found MonkeyUser comics, which are more specific to software engineering:

https://www.monkeyuser.com/2017/http-status-codes-community/?sc=true&dir=random

Oversimplified: Getting rid of data in STL containers Summary of Item 9 in "Effective STL"

Posted on November 26, 2017 by admin

Unless you are deleting a known range of elements directly through iterators (no conditions to match), in which case the range erase() method can be used directly, targeting a specific key/value/predicate requires understanding the container’s underlying data structure.

I’d like to give a summary of Item #9 in “Effective STL” by defining the names and concepts so that the complicated rules can be sensibly deduced from a few basic facts.

The words ‘remove’ and ‘erase’ have very specific meanings in the STL that are not immediately intuitive.

	Lives in	Target to match	Purpose
`remove_?()`	`<algorithm>`	required	Rearrange wanted elements to front
`erase()`	container	not accepted	Blindly deleting range/position given

There is a remove() method for lists, which is an old STL naming inconsistency (they should have called it erase() like for associative containers). Treat it as a historical mistake.

The usage is easy to remember once you understand it with the right wording above:

algorithm + container	contiguous	lists	associative
`remove_?()`: move front	Step 1	~~Step 1~~ (Use `remove_?()` method)	unordered*: cannot rearrange (Use `erase(key)` directly)
`erase()`: trim tail	Step 2	~~Step 2~~ (Use `remove_?()` method)	~~Use after `find_?()`~~ (Use `erase(key)` directly)

Note that there are two steps for sequential (contiguous + lists) containers, hence the name erase-remove idiom. Written out in full:

auto tail = remove(c.begin(), c.end(), T); 
c.erase(tail, c.end());

but they can be combined in one line since the tail is only useful at one place. Hence

c.erase( remove(c.begin(), c.end(), T), c.end() );
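Here is the idiom as a compilable sketch on std::vector<int>. The helper names erase_value and erase_odd are made up for this example; the STL calls themselves are std::remove, std::remove_if and vector::erase.

```cpp
#include <algorithm>
#include <vector>

// Erase-remove on a contiguous container: delete every element equal to `value`.
void erase_value(std::vector<int>& c, int value) {
    // Step 1: std::remove shifts the kept elements to the front and returns
    // an iterator to the start of the leftover tail.
    auto tail = std::remove(c.begin(), c.end(), value);
    // Step 2: the container's own erase() trims that tail.
    c.erase(tail, c.end());
}

// Same idiom with a predicate: std::remove_if in place of std::remove.
void erase_odd(std::vector<int>& c) {
    c.erase(std::remove_if(c.begin(), c.end(),
                           [](int x) { return x % 2 != 0; }),
            c.end());
}
```

Skipping step 2 is the classic mistake: after std::remove alone, the vector keeps its old size and the tail holds unspecified leftover values.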

Lists provide an efficient shortcut method (see table below) since linked lists do not need to be rearranged (just short-circuit the pointers).

one-shot methods	contiguous	lists	associative
by content	N/A: use `erase-remove` idiom	`remove(T)`	`erase(key)`
by predicate	N/A: use `erase-remove_if` idiom	`remove_if(pred)`	N/A: use for-loops for now; no erase_if() yet as of C++17.
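A short sketch of the one-shot methods in the table, using std::list and std::set as stand-ins for the "lists" and "associative" columns. The wrapper names are hypothetical; the member functions list::remove, list::remove_if and set::erase(key) are standard.

```cpp
#include <cstddef>
#include <list>
#include <set>

// Lists, by content: the member remove() unlinks matching nodes and deletes them.
void remove_value(std::list<int>& l, int value) {
    l.remove(value);
}

// Lists, by predicate: the member remove_if() does the same for a condition.
void remove_negatives(std::list<int>& l) {
    l.remove_if([](int x) { return x < 0; });
}

// Associative, by content: erase(key) finds and deletes directly,
// and returns how many elements were erased.
std::size_t erase_key(std::set<int>& s, int key) {
    return s.erase(key);
}
```

No separate trimming step is needed in any of these, because each member function already knows how its container is laid out.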

Never try the range-based remove_?() algorithms on associative containers. Using anything named remove on an associative container is a data corruption trap.

The trap used to be possible since <algorithm> and the containers were separate, but newer C++ protects you from it by checking whether the element you are moving is of a MoveAssignable type. Since associative containers’ keys cannot be modified (without a rescan), the elements are not move-assignable.

As for erasing through for-loops (necessary if you want to sneak in an extra step while iterating), erase() in C++11 now returns an iterator following the last erased element uniformly across all containers. Since the erase invalidates the running iterator, this lets you keep iterating by writing i=c.erase(i);
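A minimal sketch of that for-loop pattern on a std::map, assuming the condition is "value is negative" and the extra step is counting what was erased. The function name erase_negative_values is hypothetical.

```cpp
#include <map>
#include <string>

// Erase map entries matching a condition while iterating.
// Returns how many entries were erased (the "extra step" done per element).
int erase_negative_values(std::map<std::string, int>& m) {
    int erased = 0;
    for (auto it = m.begin(); it != m.end(); /* advanced in the body */) {
        if (it->second < 0) {
            // erase() invalidates `it` but returns the iterator after it.
            it = m.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

The increment is deliberately left out of the for header: advancing after an erase would skip an element, or step past end().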

* For brevity, I twisted the term unordered here to mean that the native (implementation) data order is dependent on the data involved.

When I said ‘cannot rearrange’, I meant ‘cannot efficiently rearrange’, since there is no cheap O(1) next() or prev() traversal.

It’s a mess to simply copy one element over another (during rearrangement), leaving orphans there, and then re-balance a BST or re-hash a hash map. Nobody wants to go through such pains to remove an element when there are tons of more direct ways out there.


Rambling Nerd with a Plan

Hoi Wong's blog

Category Archives: Programming
