## MATLAB Techniques: Self-identifying (by type) methods

We all know that MATLAB fills numbers with 0 by default if we haven’t specified them (for example, when a matrix is expanded by assigning values beyond its original size). Cells are filled with {[]} by default, even if you meant to have a cellstr() of {''} across the board. Sometimes that’s not what we want. 0 can be a legitimate value, so we want NaN to denote undefined values. Likewise for cellstr(): we don’t want to choke the default string-processing programs because one stupid {[]} turns the entire cell array into a non-cellstr array.

For compatibility reasons (and because it’s hard to do anyway), it’s not a good idea to simply modify the factory behavior. I have something similar to table() objects that requires similar expansion on arbitrary data types, and MATLAB’s defaults prove to be clumsy there.

Instead of writing switch-case statements or a bunch of if statements that rely on type information like this:

```matlab
function x = makeUndefined(x)
switch class(x)
    case {'double', 'single'}
        x = NaN(size(x));
    case 'cell'
        if( iscellstr(x) )
            x = repmat({''}, size(x));
        end
    % ...
end
```

I found a slick way to do it, so I don’t have to keep track of these rules again when I need the same defaults elsewhere: take advantage of the fact that MATLAB selects which method to load based on the type of the first input argument(s)*:

Create a method with a common name (e.g. makeUndefined()) for each POD (plain old data type) and put it under that POD’s @folder (e.g. /@double/makeUndefined.m, /@cell/makeUndefined.m). The functions look something like this:

```matlab
function y = makeUndefined(x)
% This function must be put under /@double
y = NaN(size(x));
```

```matlab
function x = makeUndefined(x)
% This function must be put under /@cell
if( iscellstr(x) )
    x = repmat({''}, size(x));
end
```

Similarly, you can write your own isundefined() for @double, @cell, etc., following the same rules you set for makeUndefined(), just like the factory-shipped @categorical/isundefined().
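
To see the dispatch end to end, here is a throwaway sketch that writes two hypothetical isundefined() methods into @double and @cell folders under a scratch folder in tempdir, adds it to the path, and lets MATLAB pick the method by the type of the argument. The folder name and the cell rule ('' marks an undefined cellstr element, mirroring makeUndefined() above) are assumptions for illustration. The files are generated at run time only so the demo fits in one block; in a real library you would simply save the two method files.

```matlab
% Throwaway demo: generate hypothetical isundefined() methods for double and
% cell in a scratch folder, put it on the path, and let MATLAB dispatch by type.
root = fullfile(tempdir, 'isundefinedDemo');   % hypothetical scratch folder

% @double/isundefined.m: NaN is the "undefined" marker (same rule as makeUndefined)
dblFolder = fullfile(root, '@double');
if ~exist(dblFolder, 'dir'), mkdir(dblFolder); end
fid = fopen(fullfile(dblFolder, 'isundefined.m'), 'w');
fprintf(fid, '%s\n', ...
    'function tf = isundefined(x)', ...
    'tf = isnan(x);');
fclose(fid);

% @cell/isundefined.m: an empty char is the "undefined" marker in a cellstr
cellFolder = fullfile(root, '@cell');
if ~exist(cellFolder, 'dir'), mkdir(cellFolder); end
fid = fopen(fullfile(cellFolder, 'isundefined.m'), 'w');
fprintf(fid, '%s\n', ...
    'function tf = isundefined(x)', ...
    'tf = cellfun(@(c) ischar(c) && isempty(c), x);');
fclose(fid);

addpath(root);
rehash;                                  % make sure the new files are picked up

tfDouble = isundefined([1 NaN 3])        % dispatches to @double/isundefined
tfCell   = isundefined({'a', '', 'b'})   % dispatches to @cell/isundefined
```

Calling isundefined() on a categorical still reaches the factory @categorical method, because the dispatch is decided by the class of the argument.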

In fact, the switch-case approach is analogous to the common abuse of RTTI in C++: every time a new type is added, you have to open every function that depends on the type information and update it, instead of having the classes implement those methods (with a proper object hierarchy and overloading).

MATLAB does not have full polymorphism, but it can call the right method based on the first argument (or a later one if there is a proper dominance relationship; mind you, most PODs don’t have one). Even though we are only talking about PODs here, this approach is as close as it gets to proper OO design.

This technique is especially valuable when you and TMW (or other users) have different ideas of what an English word (e.g. empty, defined, numeric) means. For example, do you consider boolean (logical) values numeric? TMW says no, through isnumeric().

To give you an example, I made a tool to nicely plot arbitrary features in my @table over time (the equivalent of @timetable before TMW introduced it). It only makes sense if the associated dependent variable (y-axis) can be quantified, so what I meant is broader than isnumeric(): it’s isConvertibleToDouble(), since I cast my dependent variables with double() along the way.

Boolean (logical) and categorical variables have quantifiable levels, so double() can be applied to them; they should therefore return TRUE for isConvertibleToDouble() even though isnumeric() returns FALSE. For basic types such as double(), single(), char(), cellstr() and struct(), the two functions behave the same.
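
A quick check of the premise, assuming R2013b or later for categorical(). This does not implement isConvertibleToDouble() itself; it only shows that isnumeric() rejects both types while double() turns them into meaningful numbers. The variable names are illustrative.

```matlab
% Logical and categorical values are not "numeric" according to isnumeric(),
% yet double() converts both of them to meaningful numbers.
flags  = logical([1 0 1]);
levels = categorical({'low', 'high', 'low'}, {'low', 'high'});

isnumeric(flags)    % false
isnumeric(levels)   % false
double(flags)       % [1 0 1]
double(levels)      % [1 2 1]: category indices in the order {'low', 'high'}
```

With the per-type technique above, a hypothetical @logical/isConvertibleToDouble.m would simply return true, while @cell/isConvertibleToDouble.m would return false.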

In summary,

1. You say what you really mean (by introducing nomenclature), NOT what it typically does
– this is like creating another indirection like half(x) instead of directly writing x/2 or x>>1.
– spend 90% of your time coming up with a very intuitive yet precise name. ‘Misspoke’ == Bug!
2. The new data types self-manage through implementing methods used by your code.
– assume nothing about input type other than the interfaces they are accessed through
(the traditional approach knows exactly what inputs they’re going to see)
– if you did #1 correctly, there’s no reason to foresee/prepare-for new input types
(just implement the methods for the input data types that you want it to run for now)
– no sweep (switch-otherwise) case to mishandle** unexpected new input data types
(because it won’t run on an input data type until all called methods are implemented)
– introducing new input data types won’t break the core code for existing types.
(new input data types can only break themselves if they implemented the methods wrong)

* This is tricky business. MATLAB doesn’t have polymorphism in the usual sense, but it looks at the FIRST dominant input argument and loads the appropriate class’s method. Usually that’s the first argument, but for non-POD classes you can specify the dominance relationship (the InferiorClasses attribute). Little has been said about such relationships among PODs in the official documentation.

I called support and found that there’s no dominance relationship among PODs, so it’s pretty much the first argument. That means this trick does not work if you want to overload bsxfun() for, say, nominal() objects (which don’t have a bsxfun() implementation) while keeping the same input argument order, because the first argument is a function handle for both the factory and the user method. Bummer!

This is why, in the new ‘*_fun’ functions I write, I always put the object to operate on as the first argument whenever possible. It gets a little ugly when I want to support multiple inputs like cellfun() does, so I have to weigh whether the overloading capability is worth the confusion.

** Unless you want to torture yourself by listing all recognized types and making sure the ‘switch-otherwise’ case fails. There are good coding disciplines that are more tedious (but less error-prone), but they become obsolete once a structurally (and strictly) better approach is found.


## MATLAB Techniques: Resuming loops in a script

Suppose you have a time-consuming for-loop in a script and you want to interrupt it for some reason (like checking partial results, debugging, etc.), but you don’t want to start over again. What would you do if you want minimal typing each time you stop?

Here’s how I do it:

```matlab
if( exist('k', 'var') ) k0=k; else k0=1; end
for k=k0:1000
    % ... time-consuming work for iteration k ...
end
```

If you want to restart the loop from the beginning, simply enter k=1 (or clear k) at the command prompt and you’re good to go. Otherwise it will pick up where you left off.
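
One pitfall worth spelling out: if the script preallocates its result array unconditionally above the loop, resuming wipes out the iterations that already finished. A minimal sketch that guards the preallocation the same way as k; the names N and results and the sqrt() stand-in for the real work are hypothetical.

```matlab
% Resumable loop that keeps its partial results across interruptions.
% Run as a script so that k and results live in the base workspace.
N = 1000;
if ~exist('results', 'var')
    results = NaN(1, N);    % preallocate only on the very first run
end
if exist('k', 'var'), k0 = k; else, k0 = 1; end
for k = k0:N
    results(k) = sqrt(k);   % stand-in for the time-consuming work
end
% To start over from scratch: clear k results
```

Resuming at the interrupted k redoes that one iteration, which is usually what you want since it may not have finished.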


## MATLAB Fundamentals: Vectorization

A coworker whose background is in embedded systems (C, and no MATLAB at all), after hearing my rants that people code MATLAB like C, using far more for-loops than necessary, asked me: suppose he has two vectors,

```matlab
a = 0:32767;
b = 0:32767;
```

and he wants all combinations of the elements in $a$ and $b$, so that for each index pair $(i, j)$ he gets $C_{ij} = \dfrac{167\,(a_i + 42)}{b_j + 17}$.

There are $32768 \times 32768 = 2^{30}$ (about 1.07 billion) combinations out there. At first, I showed him the typical method shown in MATLAB’s introductory materials:

```matlab
% Should have used ndgrid() for a more natural (column first) layout
% meshgrid(b, a) gives A(i,j) = a(i) and B(i,j) = b(j)
[B, A] = meshgrid(b, a);

C = 167*(A+42)./(B+17)
```

Then he asked, ‘This way I have to store the matrices A and B. Wouldn’t that be memory intensive? Is there a better way to do it, like with functional programming?’ Now I had to show him a more advanced trick that requires some mental leaps (the ones necessary to get sophisticated at the MATLAB language):

```matlab
C = 167*bsxfun(@rdivide, a'+42, b+17)
```

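This one-liner does not store the expanded inputs A and B (each a 32768-by-32768 double matrix, 8 GiB apiece), so it’s far more memory efficient, although the output C itself still needs 8 GiB.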
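
A small sanity check that both forms compute the same matrix, using short stand-ins for 0:32767 so it runs instantly. The third line uses implicit expansion, which plain arithmetic operators support since R2016b and which also avoids materializing A and B. The comparison uses a tolerance because the two forms multiply by 167 in a different order, so the last bit can differ.

```matlab
% Small stand-ins for 0:32767 so the comparison runs instantly
a = 0:4;
b = 0:3;

[B, A] = meshgrid(b, a);                  % A(i,j) = a(i), B(i,j) = b(j)
C1 = 167*(A+42)./(B+17);                  % stores A and B explicitly
C2 = 167*bsxfun(@rdivide, a'+42, b+17);   % no expanded copies of a and b
C3 = 167*(a'+42)./(b+17);                 % implicit expansion, R2016b+

size(C2)                                  % [5 4]
maxDiff = max(abs(C1(:) - C2(:)))         % round-off level only
```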

bsxfun() takes a handle to a binary function (one that takes two inputs, e.g. @rdivide) plus two arrays, each of which can be a matrix, vector or scalar. It conceptually expands every singleton dimension so the function gets applied to all combinations, as if each input were repeated along that dimension to match the other input’s size. I bet under the hood it’s just a pair of for-loops with the loop indices managed so that it doesn’t waste memory storing the intermediates.

In the example above, I have a column a'+42 and a row b+17. The output is arranged as if a'+42 were copied to the right to match the length of b, and b+17 were copied down to match the length of a.
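
The same layout on numbers small enough to read, with @plus so the pattern is obvious. The repmat() line spells out the expansion that bsxfun() only performs conceptually; col and row are illustrative names.

```matlab
col = [1; 2; 3];                 % 3-by-1, plays the role of a'+42
row = [10 20];                   % 1-by-2, plays the role of b+17
S = bsxfun(@plus, col, row)      % 3-by-2: S(i,j) = col(i) + row(j)
% S =
%     11    21
%     12    22
%     13    23

% Same result with the expansion written out (the copies bsxfun avoids storing)
S2 = repmat(col, 1, numel(row)) + repmat(row, numel(col), 1);
isequal(S, S2)                   % true
```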

This involves two major concepts one needs to program the MATLAB way: vectorization and function handles/anonymous functions. It’s not something you’d tell a day-zero beginner (it would probably scare them off), but showing them a ninja trick after they understand the beginner’s method might motivate them to learn the true power of MATLAB.


## Structuring your MATLAB code base

When your MATLAB project gets large, or when you have a lot of projects, it’s time to consider restructuring your code and libraries so you can focus on what matters the most instead of plowing through the mess you just created.

For my projects, I usually create a file called ‘managedPathAndFile_{myProjectName}.m’ in the top-level folder. The comments in the demo code below highlight the techniques used:

```matlab
function [file, folder] = managedPathAndFile_myProject(isRegenerated)
% isRegenerated: set to false to skip addpath() (which is slow) and mkdir()

% Optional default input arguments by name instead of counting nargin
if( ~exist('isRegenerated', 'var') )
    isRegenerated = true;
end

% Note the use of nested structures (like replacing _ with .)
% You can use the hierarchy to group folders/files in a way you can
% operate on them in one shot

% Use the location of this file as anchor
% 'pwd' is unreliable because you can call this from other folders if
% it's in MATLAB's path
folder.root = fileparts( mfilename('fullpath') );

% Include your project specific subroutines in the MATLAB path
% Use fullfile() to generate platform independent folder structures
folder.core.root = fullfile(folder.root, 'core');
folder.core.helper = fullfile(folder.core.root, 'helper');
% Add all the paths under the folder in one shot
if( isRegenerated )
    % '-end' avoids name conflict by yielding to the incumbent paths
    addpath(genpath(folder.core.root), '-end');
end

% Automatically create data storage folder
folder.data.root = fullfile(folder.root, 'data');
folder.data.cache = fullfile(folder.data.root, 'cache');
if( isRegenerated )
    % Requesting outputs suppresses the warning if the folder already
    % exists. I made a mkdir_silent() in my libraries, but used the
    % native call here for demo. 'UniformOutput' is false because the
    % message output of mkdir() is a char array, not a scalar.
    [~, ~] = structfun(@mkdir, folder.data, 'UniformOutput', false);
end

% Sometimes you might have config or external files to load
file.data.statistics = fullfile(folder.data.root, 'statistics.mat');

end
```

Many people don’t know about the function genpath(), so they end up lumping all their dependencies into one folder, which makes my eyes bleed. Organize your files into a tree that makes sense now!
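
A minimal sketch of what genpath() hands to addpath(), using a hypothetical scratch tree under tempdir: it returns the folder plus all of its subfolders as one pathsep-separated string, so a single addpath() call picks up the whole tree. genpath() skips folders named private and folders starting with @ or +, which is exactly what you want for class and package folders.

```matlab
% Build a tiny throwaway tree: <tempdir>/genpathDemo/core/helper
root   = fullfile(tempdir, 'genpathDemo');   % hypothetical scratch folder
core   = fullfile(root, 'core');
helper = fullfile(core, 'helper');
if ~exist(helper, 'dir'), mkdir(helper); end

p = genpath(core);               % 'core' plus all subfolders, pathsep-separated
folders = strsplit(p, pathsep);
folders = folders(~cellfun(@isempty, folders));   % drop a trailing empty entry
disp(folders.');

addpath(p, '-end');              % whole subtree in one call, lowest precedence
```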

I’d recommend that any serious MATLAB developer build their own library folder with consistent naming and a sensible tree hierarchy. If, after looking into FEX and what’s natively available in MATLAB, you still need to roll your own code, you’re likely to rediscover something you’ve already built just by opening the folder where you’d most naturally put the new .m file/function you’re about to write (like minds, including your past and present self, tend to pick the same names).

Sometimes you have to whip up some ‘crappy’ code that doesn’t generalize (or can’t be reused) in other contexts. Consider putting it in a /private folder under the caller so it won’t be visible to everybody else. Of course, I encourage people to spend a little more time writing the code properly so it can go into their own MATLAB library.


## How I learned MATLAB inside out

Back when I was a struggling graduate student working 3 university jobs to stay afloat, one of the jobs was to build a multi-center data collection system that takes care of remote data upload, stores the data in a database, and visualizes the waveforms and records.

I surveyed other platforms and languages for a couple of months and finally settled on MATLAB, because at the time MATLAB had the most convenient data types (cells), data loading/saving (‘.mat’ files, so I didn’t have to manage the data types/format), external interfaces, and, most importantly, the MATLAB File Exchange (FEX), which covered pretty much every generic idea I could think of. With MATLAB, my work pretty much came down to coding the high-level ‘business’ logic.

And no, I wasn’t biased towards MATLAB at the time because of my signal processing background. I didn’t know much about MATLAB’s programming support back then beyond number crunching (just like the 90% of the public who misunderstand the power of the language), so I wouldn’t have chosen it for a software project of that complexity without much research.

Learning the guts of MATLAB, architecting and coding the entire system took me only 3 months (well, that included one month of non-stop, 16-hours-a-day, 7-days-a-week shut-in programming). Not shabby for a project that was supposed to take 4 years. In fact, the rest of the time was spent

1. reading the last owners’ *#@&ed up Perl code and fighting to get the fragile Linux setup to work on other machines, then reverse-engineering the entire project’s requirements from the source code, because there wasn’t any documentation and the previous owners had graduated.
2. reconstructing Guidant’s half-finished (done by a 3rd party, then abandoned) binary data reader by finishing the hardest part of the incomplete XSLT code.

The MATLAB side was relatively easy once I learned the main ideas through the documentation, newsgroups, Loren’s blog and official support.

To understand and appreciate the beauty of MATLAB and use it effectively, you have to get past the following hurdles:

1. Basic data structures: cell, struct and language features.
2. Vectorization: use for-loops only in limited cases
3. Anonymous function (Lambdas), cellfun(), arrayfun(), bsxfun(), structfun()
5. Tables (Heterogeneous data structures) and Categorical objects (Nominal, Ordinal)

or else you are coding it like a C programmer: a complete waste of MATLAB’s license fees. If you know these 5 aspects of MATLAB well and would say there are strictly superior options out there, please let me know in the comments section and I’ll look into it.


## MATLAB Quirks: cellfun() high-performance trap

cellfun() is a powerful function in MATLAB which mirrors the idea of applying a ‘map’ operation (as in functional programming) to the contents of cells (a data type that can hold any arbitrary data type, including other cells).

One common use is to identify which cells are empty so you can ignore them. Typically, you can do

```matlab
index.emptyCells = cellfun(@isempty, C);
```

According to the help text, there is a limited set of strings that you can use in place of the function handle (functor) for backward compatibility:

```
A = CELLFUN('fun', C), where 'fun' is one of the following strings,
returns a logical or double array A the elements of which are computed
from those of C as follows:

   'isreal'     -- true for cells containing a real array, false
                   otherwise
   'isempty'    -- true for cells containing an empty array, false
                   otherwise
   'islogical'  -- true for cells containing a logical array, false
                   otherwise
   'length'     -- the length of the contents of each cell
   'ndims'      -- the number of dimensions of the contents of each cell
   'prodofsize' -- the number of elements of the contents of each cell

A = CELLFUN('size', C, K) returns the size along the K-th dimension of
the contents of each cell of C.

A = CELLFUN('isclass', C, CLASSNAME) returns true for each cell of C
that contains an array of class CLASSNAME.  Unlike the ISA function,
'isclass' of a subclass of CLASSNAME returns false.
```

It turns out these have native implementations and run super fast. But I got burned once when I tried to use one on cells containing table() objects:

```matlab
index.emptyCells = cellfun('isempty', cellsOfTables);
```

I meant to find out whether I had zero-row or zero-column (empty) table objects. It returned all false even when my cells contained 0-by-0 tables. What a Terrible Failure?! It turns out these ‘backward compatibility’ native implementations (I guess cellfun() existed before function handles did) look at the raw data like PODs (Plain Old Datatypes), as a C program would.

A table() object has lots of stuff stored in it, like variable (column) names, so a program looking at the raw binary stream without accounting for the data type will never consider that object empty. It’s up to the class’s overloaded isempty() or numel() to tell what is empty and what’s not, and it takes a function handle for MATLAB to dispatch to that method.

Lesson learned: don’t use those special string-based functors in cellfun() unless you know for sure the contents are PODs. Otherwise they will get you the wrong answer at the speed of light.
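
A minimal reproduction of the trap, assuming R2013b or later for table(). The cellfun() documentation states that when you pass a function name instead of a handle, cellfun() does not call overloaded versions of that function; the handle version dispatches to table’s own isempty(). The comment on the string form deliberately doesn’t promise a value, since that result is exactly the behavior this post warns about.

```matlab
% A 0-by-0 table, a 2-by-1 table and a plain empty double in one cell array
C = {table(), table([1; 2]), []};

viaHandle = cellfun(@isempty, C)   % calls table's overloaded isempty(): [1 0 1]
viaString = cellfun('isempty', C)  % legacy fast path: no dispatch to table's isempty()
```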


## Make your MATLAB code work everywhere

MATLAB is used in so many different setups that it’s hard to expect every line of code you write to generalize to all other versions, platforms and deployments. The following code examples help you identify which environment you are in so you can branch accordingly:

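```matlab
% 32 bit vs 64 bit MATLAB
% This exploits the fact that mex is on the same platform as your MATLAB:
% every 64-bit MEX extension (mexw64, mexa64, mexmaci64, ...) ends in '64'
is64bit = ~isempty( regexp(mexext(), '64$', 'once') );

% Handling different MATLAB versions
% yourVersionString is a MATLAB version number such as '9.1' (R2016b),
% not a release name
if( verLessThan('matlab', yourVersionString) )
    error('This code requires %s or above to run!', yourVersionString);
end
```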
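
For the platform and deployment cases, a short sketch with built-in predicates. The order matters because isunix() is also true on macOS, so ismac() has to be checked first. The variable names are hypothetical; guarding path manipulation with ~isdeployed is a common idiom, since a compiled app bundles its files when it is built.

```matlab
% Operating system: test ismac before isunix, since isunix is true on macOS too
if ispc
    platform = 'windows';
elseif ismac
    platform = 'mac';
elseif isunix
    platform = 'linux';
else
    platform = 'unknown';
end

% Compiled (MATLAB Compiler) app vs. regular session:
% e.g. only call addpath() when canEditPath is true
canEditPath = ~isdeployed;
```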