MATLAB Quirks: cellfun() high-performance trap

cellfun() is a powerful MATLAB function that applies a ‘map’ operation (as in functional programming) to a cell array (cells: a container datatype that can hold any arbitrary data type, including other cells).

One common use is to identify which cells are empty so you can ignore them. Typically, you can do

index.emptyCells = cellfun(@isempty, C);
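To make the one-liner above concrete, here is a minimal sketch on made-up data. The cell array C and the variable names isEmptyCell and nonEmpty are just placeholders for illustration:

```matlab
% Hypothetical sample data: a mix of empty and non-empty cells.
C = {[], 'abc', {}, 42, ''};

% Function-handle form: isempty() is called on the contents of each cell.
isEmptyCell = cellfun(@isempty, C);   % 1-by-5 logical array

% Use the logical index to keep only the non-empty cells.
nonEmpty = C(~isEmptyCell);
```

With this data, the empty double, the empty cell and the empty char all count as empty, so only 'abc' and 42 should survive in nonEmpty.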

According to the help (H1) page, there is a limited set of strings that you can use in place of the function handle (functor) for backward compatibility:

 
A = CELLFUN('fun', C), where 'fun' is one of the following strings,
returns a logical or double array A the elements of which are computed 
from those of C as follows:
 
   'isreal'     -- true for cells containing a real array, false
                   otherwise 
   'isempty'    -- true for cells containing an empty array, false 
                   otherwise 
   'islogical'  -- true for cells containing a logical array, false 
                   otherwise 
   'length'     -- the length of the contents of each cell 
   'ndims'      -- the number of dimensions of the contents of each cell
   'prodofsize' -- the number of elements of the contents of each cell
 
A = CELLFUN('size', C, K) returns the size along the K-th dimension of
the contents of each cell of C.
 
A = CELLFUN('isclass', C, CLASSNAME) returns true for each cell of C
that contains an array of class CLASSNAME.  Unlike the ISA function, 
'isclass' of a subclass of CLASSNAME returns false.
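For comparison, here is a rough sketch of how the other legacy strings could be written with function handles. The sample data and variable names are made up, and the pairing is my reading of the help text above, so treat these as approximate equivalents rather than drop-in replacements:

```matlab
% Hypothetical sample data: built-in types only.
C = {1:3, 'hello', [], true(2, 2)};

lens  = cellfun(@length, C);             % roughly 'length'
nd    = cellfun(@ndims, C);              % roughly 'ndims'
n     = cellfun(@numel, C);              % roughly 'prodofsize'
cols  = cellfun(@(x) size(x, 2), C);     % roughly 'size' with K = 2

% 'isclass' wants an exact class match, so compare class names
% instead of calling isa(), which also accepts subclasses.
isDbl = cellfun(@(x) strcmp(class(x), 'double'), C);
```

For plain built-in arrays like these, both forms should agree. The handle forms call whatever length(), numel() or size() the class of each cell's contents defines, which matters once objects are involved.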

It turns out these string options have their own native implementations and run super-fast. But I got burned once when I tried to use one on cells containing table() objects:

index.emptyCells = cellfun('isempty', cellsOfTables);

I meant to find out whether I had zero-row or zero-column (empty) table objects with that. It gave me all false even when my cells had these 0-by-0 tables. What a Terrible Failure?! Turns out these ‘backward compatibility’ native implementations (I guess cellfun() already existed before function handles did) do not call overloaded methods; they look at the raw data like PODs (Plain Old Datatypes), as a C program would.

A table() object stores a lot besides its data, such as variable (column) names, so a routine that inspects the raw contents without knowing about the data type will never consider the object empty. It’s up to the class’s overloaded isempty() or numel() to decide what counts as empty, and only a call through the function handle dispatches to that overloaded method.
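A small sketch to reproduce the difference side by side. The two tables are made-up sample data, and viaHandle and viaString are just illustrative names. What the string form prints may depend on your MATLAB release, so run it and check rather than taking my word for it:

```matlab
% Hypothetical sample data: one 0-by-0 table and one 2-by-1 table.
cellsOfTables = {table(), table([1; 2])};

% Function-handle form: dispatches to the table class's own isempty().
viaHandle = cellfun(@isempty, cellsOfTables);

% Legacy string form: the native shortcut, which skips overloaded methods.
viaString = cellfun('isempty', cellsOfTables);

disp(viaHandle);   % the empty table should be flagged here
disp(viaString);   % in my case this came back all false
```

If the two results differ on your data, the function-handle result is the one that reflects what the class itself considers empty.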

Lesson learned: don’t use those special string-based functors in cellfun() unless you know for sure the cells hold PODs. Otherwise it will get you the wrong answer at the speed of light.
