Using Dart’s C-style syntax to make a chain of lambda functions concrete

Start with an example lambda calculus expression $f \equiv \lambda x.(y+x)$. It is:

  • [programming view] a function that takes x as its input argument and uses y from the enclosing scope (y is called a free variable): f(var x) { return y+x; }
  • [mathematical view] the function f(x) = y+x, where y is seen as a fixed/snapshotted value relative to the expression.

Binding y in an outer lambda as well turns the free variable into the parameter of a wrapper function. In Dart:

g(int y) => (int x) => y + x

The arrow is right-associative (the body of a lambda extends as far to the right as possible), so the line above parses as:

g(int y) => ( (int x) => y + x )

Unrolling it into C-style block bodies gives better insight into the relationship between the layers:

g(int y) {
  return (int x) { return y + x; };
}

Note that (int x) { return y + x; } is a functor. To emphasize this, the same code can be rewritten by assigning the name f to it and returning f:

g(int y) {
  f(int x) { return y + x; }
  return f;
}

Use the C++11 lambda style (assign the anonymous function to a variable) so that f reads as a functor nested inside a function rather than as a nested function implementation body:

g(int y) {
  var f = (int x) { return y + x; };
  return f;
}

Note that, conceptually, what we are doing with the wrapper cascade is indeed nesting functions. However, the wrapper spits out a functor (which has already done most of the partial work) so that the user can evaluate it by supplying the last piece of needed input:

g(int y) {
  f(int x) {
    return y + x;
  }
  return f;
}

More commonly, as seen in the Dart docs, the code is formatted to show a (binding/capturing) wrapper function returning its own nested function:

g(int y) {
  return (int x) { 
           return y + x;
         };
}
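
For comparison, here is a minimal sketch of the same capture in Python rather than Dart, chosen only so it can be run without a Dart toolchain. The names g and f mirror the ones above; g_lambda and add_five are hypothetical names used for illustration.

```python
def g(y):
    # f closes over y: y is free inside f but bound by the enclosing g.
    def f(x):
        return y + x
    return f

# Curried one-liner, the counterpart of `g(int y) => (int x) => y + x`.
g_lambda = lambda y: (lambda x: y + x)

add_five = g(5)          # y is now fixed (snapshotted) at 5
print(add_five(3))       # 8
print(g_lambda(5)(3))    # 8
```

Calling g(5) does the "partial work" of fixing y; the returned function only needs the last input x.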


Styles for (Lambda) anonymous function (named or unnamed) in Dart Language

The official Dart docs are sometimes too simple to provide definitive answers to various language questions. I discovered an alternative syntax for named lambdas here and here. In Dart,

(args) => expr

is the shorthand for

(args) {
  return expr;
}

So the lambda style (as in C++11) names the function by assigning it to a variable of a messy functor type (inferred with var in Dart and auto in C++11):

var f = (args) => expr

This can also be rewritten as a C-style, function-pointer-like declaration, except that

  1. the body expr is included (which is not allowed in C),
  2. return must be explicit in a curly-brace block,
  3. the arrow notation => returns whatever expr evaluates to (which can be null for calls such as print()),
  4. it’s simply the function name f in Dart instead of a pointer (*f) in C.

[optional return type] f(args) { 
  return expr;
}

which can be simplified with the arrow operator

[optional return type] f(args) => expr;
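
As a cross-check, Python draws a similar line between an expression-bodied lambda bound to a name and a block-bodied named function. This is only an analogy sketched in another language, not Dart semantics; all names below are hypothetical.

```python
# Expression body: the value of the expression is returned implicitly,
# comparable to Dart's `(args) => expr`.
add_lambda = lambda a, b: a + b

# Block body: `return` must be explicit, comparable to `{ return expr; }`.
def add_named(a, b):
    return a + b

# A call that produces no value yields None, comparable to an arrow
# function whose expression is a print call.
shout = lambda msg: print(msg)

print(add_lambda(1, 2), add_named(1, 2))
print(shout("hi") is None)
```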


Data Relationships of Spreadsheets: Relational Database vs. Heterogeneous Data Tables

This blog post is a work in progress. I will fill in the missing details (especially pandas) later. Some of the MATLAB syntax is inaccurate in the sense that it is only a context-dependent description (for example, column names can be a cellstr, a char string or linear/logical indices).

From a data relationship point of view, relational databases (RDBMS) and heterogeneous data tables (MATLAB’s dataset/table or Python pandas’ DataFrame) are the same thing. But a proper database has to worry about concurrency issues and provide more consistency tools (ACID model).

Heterogeneous data tables are almost always column-oriented (mainly for analyzing data), whereas MySQL and Postgres are row-store databases. You can think of a column-store database as a Struct of Arrays (SoA) and a row-store database as an Array of Structs (AoS). Remember locality = performance: in general, you want to put the stuff you frequently access together as close to each other as possible.
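
A minimal sketch of the two layouts in plain Python, with made-up field names and values. It only illustrates how the data is organized; Python lists hold references, so it does not demonstrate the cache behaviour itself.

```python
# Array of Structs (row store): the fields of one record sit together.
aos = [
    {"name": "a", "price": 1.0, "qty": 3},
    {"name": "b", "price": 2.5, "qty": 1},
    {"name": "c", "price": 4.0, "qty": 2},
]

# Struct of Arrays (column store): the values of one column sit together.
soa = {
    "name": ["a", "b", "c"],
    "price": [1.0, 2.5, 4.0],
    "qty": [3, 1, 2],
}

# Column-wise analysis reads a single list in the SoA layout ...
mean_price_soa = sum(soa["price"]) / len(soa["price"])
# ... but has to visit every record in the AoS layout.
mean_price_aos = sum(rec["price"] for rec in aos) / len(aos)

# Fetching one whole record is the opposite: direct in AoS, a gather in SoA.
row1_aos = aos[1]
row1_soa = {col: values[1] for col, values in soa.items()}

print(mean_price_soa, mean_price_aos)
print(row1_aos == row1_soa)
```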

Mechanics:

| Concepts | SQL | MATLAB table | Pandas Dataframe |
|---|---|---|---|
| tables | FROM | (work with T) | (work with df) |
| columns, variables, fields | SELECT | T.(field) or T(:, cols/varnames) | |
| rows, records | WHERE | T( cond(T), : ) | |
| | HAVING | T_grp( cond(T_grp), : ) | |
| conditions | NOT | ~ | |
| | IS | ==, isequal*() | |
| | IN | ismember() | |
| | BETWEEN | a<=b & b<=c | |
| Inject table to another table | INSERT INTO t2 SELECT vars FROM t1 WHERE rows | T2(end+(1:#rows), vars) = T1(rows, vars) (Doable, throws warning) | |
| Insert record/row | INSERT INTO t (c1, c2, ..) VALUES (v1, v2, ..) | T=[T; {v1, v2, ...}] (Cannot default for unspecified column*) | |
| update records/elements | UPDATE table SET column = content WHERE row_cond | T.(col)(row_cond) = content | |
| New table from selection | SELECT vars INTO t2 FROM t1 WHERE rows | T2 = T1(rows, vars) | |
| clear table | TRUNCATE TABLE t | T( :, : )=[] | |
| delete rows | DELETE FROM t WHERE cond (if WHERE is not specified, it kills all rows one by one with consistency checks. Avoid it and use TRUNCATE TABLE instead) | T( cond, : ) = [] | |

* I developed sophisticated tools to allow partial row insertion, but it’s not something TMW supports right out of the box. This involves overloading the default value generator for each data type and then extracting the skeleton T( [], : ) to identify the data types.
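
The SQL side of these mechanics can be tried out with Python’s built-in sqlite3 module. This is a minimal sketch with hypothetical table and column names (t1, t2, id, name, score). Note that SQLite itself has no TRUNCATE TABLE or SELECT ... INTO, so those two rows are not covered here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE t1 (id INTEGER, name TEXT, score REAL)")
cur.execute("CREATE TABLE t2 (id INTEGER, name TEXT)")

# Insert record/row: INSERT INTO t (c1, c2, ..) VALUES (v1, v2, ..)
cur.executemany(
    "INSERT INTO t1 (id, name, score) VALUES (?, ?, ?)",
    [(1, "ann", 90.0), (2, "bob", 55.0), (3, "cat", 72.0)],
)

# Inject table into another table: INSERT INTO t2 SELECT vars FROM t1 WHERE rows
cur.execute("INSERT INTO t2 SELECT id, name FROM t1 WHERE score >= 60")

# Update records: UPDATE table SET column = content WHERE row_cond
cur.execute("UPDATE t1 SET score = 60.0 WHERE name = 'bob'")

# Delete rows: DELETE FROM t WHERE cond
cur.execute("DELETE FROM t1 WHERE id = 3")

t1_rows = cur.execute("SELECT id, name, score FROM t1 ORDER BY id").fetchall()
t2_rows = cur.execute("SELECT id, name FROM t2 ORDER BY id").fetchall()
print(t1_rows)
print(t2_rows)
con.close()
```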

Core database concepts:

| Concepts | SQL | MATLAB (table/dataset) | Pandas (Dataframe) |
|---|---|---|---|
| linear index | CREATE INDEX idx ON T (col) | T.idx = (1:size(T,1))' | |
| group index | CREATE UNIQUE INDEX idx ON T (cols) | [~, T.idx] = sortrows(T, cols) (old implementation is grp2idx()) | |
| set operations | UNION | union() | |
| | INTERSECT | intersect() | |
| | | setdiff(), setxor() | |
| sort | ORDER BY | sortrows() | |
| unique | SELECT DISTINCT | unique() | |
| reduction, aggregation | F() | @reductionFunctions | |
| grouping | GROUP BY | Specifying ‘GroupingVariables’ in varfun(), rowfun(), etc. | |
| partitioning | (set partition option in Table Definition) | T1=T(:, {'key', varnames_1}) and T2=T(:, {'key', varnames_2}) | |
| joins | [type] JOIN | *join(T1, T2, ...) | df.join(df2, …) |
| cartesian product | CROSS JOIN (misnomer, no keys) | T_cross = [repelem(T1, size(T2,1), 1), repmat(T2, [size(T1,1), 1])] | |

Functional programming concepts map (linear index), filter (logical index), reduce (summary & group) are heavily used with databases.
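
A minimal sketch of how map, filter and reduce line up with table operations, using plain Python lists of records. The field names dept and pay and the values are made up for illustration.

```python
from functools import reduce
from itertools import groupby

rows = [
    {"dept": "a", "pay": 10},
    {"dept": "b", "pay": 20},
    {"dept": "a", "pay": 30},
    {"dept": "b", "pay": 5},
]

# filter: keep the rows that satisfy a condition (WHERE / logical index)
kept = list(filter(lambda r: r["pay"] >= 10, rows))

# map: apply the same operation to every remaining row, here picking a field
pays = list(map(lambda r: r["pay"], kept))

# reduce: collapse a column into one summary value (like SUM())
total = reduce(lambda acc, p: acc + p, pays, 0)

# group + reduce: GROUP BY dept with SUM(pay); groupby needs sorted input
by_dept = sorted(kept, key=lambda r: r["dept"])
totals = {
    dept: sum(r["pay"] for r in grp)
    for dept, grp in groupby(by_dept, key=lambda r: r["dept"])
}
print(total, totals)
```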

Formal databases have a Table Definition (Column Properties) that must be specified ahead of time and can be updated in place later on (think of it as static typing). Heterogeneous data tables can figure most of that out on the fly depending on context (think of it as dynamic typing). This impacts:

  • data type (creation and conversion)
  • unspecified entries (NULL).
    Often NaN in MATLAB native types, but I extended it by overloading the relevant data types with an isnull() function so that the same interface is used consistently
  • default values
  • keys (Indices)

SQL features not offered by heterogeneous data tables yet:

  • column name aliases (AS)
  • wildcard over names (*)
  • pattern matching (LIKE)

SQL features that are unnatural with heterogeneous data tables’ syntax:

  • implicitly filter a table with conditions in another table sharing the same key.
    It’s an implied join(T, T_cond)+filter operation in MATLAB. Often used with ANY, ALL, EXISTS
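
A minimal sketch of the difference, again using the built-in sqlite3 module with hypothetical tables customers and orders. The first query filters one table by a condition that lives in another; the second spells out the join + filter that a data table workflow would need.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO orders VALUES (1, 20.0), (1, 150.0), (3, 80.0);
""")

# SQL: implicit filter through a correlated subquery on the shared key.
implicit = con.execute("""
    SELECT name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o
                  WHERE o.customer_id = c.id AND o.amount > 100)
    ORDER BY name
""").fetchall()

# The same result written as an explicit join followed by a filter.
explicit = con.execute("""
    SELECT DISTINCT c.name FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 100
    ORDER BY c.name
""").fetchall()

print(implicit, explicit)
con.close()
```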

Fundamentally, heterogeneous data tables expect to work with snapshots that don’t update often. Therefore they do not offer active checking (callbacks) as SQL does:

  • Invariant constraints (CHECK, UNIQUE, NOT NULL, Foreign key).
  • Auto Increment
  • Virtual (dependent) tables (CREATE VIEW)
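
To see what this active checking looks like on the SQL side, here is a minimal sqlite3 sketch with a hypothetical items table. The constraints reject bad writes at insert time, and the view always reflects the current contents of its base table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE items (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        name  TEXT NOT NULL UNIQUE,
        price REAL CHECK (price >= 0)
    );
    CREATE VIEW cheap_items AS SELECT name FROM items WHERE price < 10;
""")

con.execute("INSERT INTO items (name, price) VALUES ('pen', 2.0)")

rejected = []
for name, price in [("pen", 3.0), (None, 1.0), ("ink", -5.0)]:
    try:
        con.execute("INSERT INTO items (name, price) VALUES (?, ?)", (name, price))
    except sqlite3.IntegrityError:
        # UNIQUE, NOT NULL and CHECK violations are all refused at write time.
        rejected.append((name, price))

con.execute("INSERT INTO items (name, price) VALUES ('cap', 1.5)")

# ids were never supplied by the caller; the database assigned them.
ids = [row[0] for row in con.execute("SELECT id FROM items ORDER BY id")]
# The view was defined before any data existed, yet it sees the new rows.
view_rows = con.execute("SELECT name FROM cheap_items ORDER BY name").fetchall()
print(rejected)
print(ids, view_rows)
con.close()
```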

Know these database/spreadsheet concepts:

  • Tall vs wide tables
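
A minimal sketch of the two shapes, assuming the usual meaning of the terms: a wide table has one row per subject and one column per measurement, while a tall table has one row per (subject, variable) pair. The field names are made up.

```python
# Wide: one row per subject, one column per measurement.
wide = [
    {"id": 1, "height": 170, "weight": 65},
    {"id": 2, "height": 180, "weight": 80},
]

# Tall: one row per (subject, variable) pair; column names become data.
tall = [
    {"id": row["id"], "variable": key, "value": row[key]}
    for row in wide
    for key in ("height", "weight")
]

# Going back: collect the (variable, value) pairs of each id into one row.
rebuilt = {}
for rec in tall:
    rebuilt.setdefault(rec["id"], {"id": rec["id"]})[rec["variable"]] = rec["value"]

print(len(tall))
print(list(rebuilt.values()) == wide)
```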

Language logistics (not related to database)

| Concepts | SQL | MATLAB (table/dataset) | Pandas (Dataframe) |
|---|---|---|---|
| Partial display | MySQL: LIMIT, Oracle: FETCH FIRST | T( 1:10, : ) | df.head() |
| Comments | -- or /* */ | % or %{ %} | # or """ """ |
| function | CREATE PROCEDURE fcn | function [varargout{:}]=fcn(varargin{:}) | def fcn: |
| case | CASE WHEN THEN ELSE END | switch case end | (no case structure, use dictionary; Python 3.10+ adds match/case) |
| Null if no results | IFNULL ( statement ) | function X=null_if_empty(T, cond), X=T( cond, : ); if( isempty(X) ) X=NaN; end | |
| Replace nulls | ISNULL(col, target_val) | T.col(isnan(T.col)) = target_val or T = standardizeMissing( T, ... ) | |


Implicit ways to store data in a program

The most obvious way to store data is in plain data structures such as arrays and queues, or even a hashtable, but don’t forget these implicit ones:

  • Call stack. As the name says, it’s a stack data structure. It saves the local variables before making another function call. This is often exploited in recursion to avoid passing around an explicit data structure.
  • Closures. When you create an anonymous function, the variables it uses (other than its arguments) are saved along with (captured by) the function object you created. This can be exploited to make forward iterators or generators.
  • Functions. You can make a function that does nothing other than return a certain piece of data. It’s an excellent way to avoid the overhead of managing (reading, promptly updating) a config file. It works best when your programming language requires so little typing to specify data (such as MATLAB) that your code is almost as short as a plain text config file.
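
The three implicit stores can be sketched in a few lines of Python. All names and the config values below are hypothetical, chosen only for illustration.

```python
def make_counter(start=0):
    # Closure: `count` lives on inside the returned function,
    # with no class or global variable holding it.
    count = start

    def next_value():
        nonlocal count
        count += 1
        return count

    return next_value


def depth(tree):
    # Call stack: each pending call remembers its own `tree` while the
    # recursion descends, so no explicit stack of nodes is passed around.
    if not isinstance(tree, list):
        return 0
    return 1 + max((depth(child) for child in tree), default=0)


def config():
    # Function as data: stands in for a small config file.
    return {"host": "localhost", "port": 8080, "retries": 3}


counter = make_counter()
print(counter(), counter(), counter())   # 1 2 3
print(depth([1, [2, [3]], 4]))           # 3
print(config()["port"])                  # 8080
```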
