Switch between 32-bit and 64-bit user written software like CVX

CVX is a very convenient convex optimization package that allows the user to specify the optimization objective and constraints directly instead of manually manipulating them (by various transformations) into forms that are accepted by commonly available software like quadprog().

What I want to show today is not CVX, but a technique to handle the many different versions of the same program targeted at each system architecture (32/64-bit, Windows/Mac/Linux). Here’s a snapshot of what’s available with cvx:

OS 32/64 mexext Download links
Linux 32-bit mexglx cvx-glx.zip
64-bit mexa64 cvx-a64.zip
Mac 32-bit mexmaci cvx-maci.zip
64-bit mexmaci64 cvx-maci64.zip
Windows 32-bit mexw32 cvx-w32.zip
64-bit mexw64 cvx-w64.zip

You can download all packages for different architectures, but make a folder for each of them by their mexext() name. For example, 32-bit Windows’ implementation can go under /mexw32/cvx. Then you can programmatically initialize the right package for say, your startup.m file:

run( fullfile(yourLibrary, mexext(), 'cvx', 'cvx_startup.m') );

I intentionally put the /[mexext()] above /cvx, not the other way round because if you have many different software packages and want to include them in the path, you can do it in one shot without filtering for the platform names:

addpath( genpath( fullfile(yourLibrary, mexext()) ) );

You can consider using computer(‘arch’) in place of mexext(), but the names are different and you have to name your folders accordingly. For CVX, it happens to go by mexext(), so I naturally used mexext() instead.

102 total views, no views today

MATLAB Gotchas: Do NOT use getlevels(), getlabels() or categories() for categorical/nominal/ordinal objects Use unique(), unique(cellstr()) instead

I suspect TMW (The MathWorks, maker of MATLAB) hasn’t really thought about dead levels when a categorical object (I mean nominal() and ordinal() as well since they are wrapper child class of categorical()) has elements removed so that some levels doesn’t map to any elements anymore.

For performance reasons, it makes sense to keep the dead levels in because the user can repetitively add and remove the same last level by deleting and adding the same element, causing unnecessary work each time. Naturally, there’s a getlevels()/getlabels()/categories() method in nominal(),ordinal()/categorical() class so you know what raw levels are available. Turns out it’s a horrible idea to expose the raw levels when dead levels are allowed!

Unless you are dealing with the internals of categorical objects, there’s very little reason why one would care or want to know about the dead levels (it’s just a cache for performance). It’s the active levels that are currently mapped to some elements that matters when user make such queries, which is handled correctly by unique().

If there are no dead levels, getlevels() is equivalent to unique(), while categorical() and getlabels() are equivalent to unique(cellstr()), but I’m very likely to run into dead levels because I delete rows of data when I filter by certain criterion.

My first take on it would be to hide getlevels()/getlabels()/categories() from users. But over the years, I’ve grown from a conservative software point of view to accepting more liberal approach, especially after exposure to functional programming ideas. That means I’d rather have a way to know what’s going on inside (keep those functions there), but I’d like to be warned that it’s an evil feature that shouldn’t be used lightly.

Yes, I’m dissing the use of getlevels()/getlabels()/categories() like the infamous eval(). Once in a long while, it’s the neat approach. But for 99% of the time, it’s a strictly worse solution that causes a lot of damages. It’s way more unlikely that getlevels()/getlabels()/categories() will yield what you really mean with dead levels than multiple inheritance in C++ being the right approach on the first try.

If I use unique() all the time, why would I even bother to talk about getlevels()/getlabels()/categories() since I never used them? It’s because TMW didn’t warn users about the dangers in their documentation. These methods looks legit and innocent, but it’s a usage trap like returning stack pointers in C/C++ (you can technically do it, but with almost 100% certainty, you are telling the computer to do something you don’t mean to, in short: wrong).

I have two encounters that other people using the raw categorical levels burns me:

  1. One of my coworkers spoke against upgrading our MATLAB licenses (later withdrew his opposition) because the new versions breaks his old code involving nominal()/ordinal() objects. I was perplexed because it didn’t break any of my code despite I used more nominal() and ordinal() objects than anybody in my vicinity.On close inspection, he was using getlevels() and getlabels() all over the place instead of unique(), which works seamlessly in the new MATLAB. Remember I mentioned that the internal design/implementation details of nominal()/ordinal() changed in MATLAB R2013a? The internal treatment of dead levels changed, which is not relevant to end-users, but getlevels()/getlabels() exposed the dead levels and users unwittingly write code that depends on it.
  2. The default factory-shipped grpstats() is still ‘broken’ as of R2015b! If you feed grpstats() with a nominal grouping variable, it will give you lines of NaN because it was programmed spits out one row for each level (dead or alive) in the grouping variable. Since the dead levels has nothing to group by the reduction function (@mean if not specified), it spits out NaNs. This is traced to how grp2idx() is used internally: If the grouping variable is a cellstr() or double(), the groups are generated by using unique(), so there are no dead levels whatsoever. But if the grouping variable is a categorical, the developers thought their job is done already and just took it directly from the categorical object’s properties by calling getlabels() and getlevels():
    gidx = double(s);
    ...
    gnames = getlabels(s)';
    glevels = getlevels(s)';

    Apparently they forgot that there’s a reason why the categorical/unique() has the same function name as double/unique() and cellstr/unique(): the point of overloading is to have the same function name for the same intention! The intention of unique() should be uniformly applied across all the data types applicable. Think twice before relying on language support for type info (type traits in C++) to switch code when you can use overloading(MATLAB)/polymorphism(C++). A good architecture should lead you to the correct code logic without the need of overriding good practices.

    Rants aside, grpstats() will work as intended if those lines in grp2idx() are changed to:

    gidx = double(s);
    ...
    glevels = unique(s(:));
    gnames = cellstr(glevels);
    

    A higher level fix would be applying grp2idx() to the grouping variable before it was fed into grpstats():

    grpstats(X, grp2idx(g), ...)

    The rationale is that the underlying contents doesn’t matter for grouping variables as long as each of them uniquely stand for the group they represent! In other words, categorical() objects are seen as nothing but a bunch of integers, which can be obtained by casting it to double():

    gidx = double(s);
    grpstats(X, gidx, ...)

    This is what grp2idx() calls under the hood anyway when it sees a categorical. The grp2idx() called from grpstats() will see a bunch of integers, which will correctly apply unique() to them, thus removing all dead levels.

    Of course, use grp2idx() instead of double() because it works across all data types that applies. Why future-constrain yourself when a more generic implementation is already available?

    The sin committed by grpstats() over nominal() is that the variables in glevels and gnames shouldn’t get involved in the first place because they don’t matter and shouldn’t even show up in the outputs. This is what’s fundamentally wrong about it:

    [group,...,ngroups] = mgrp2idx(group,rows);
    ...
    // This code assumes there are no gaps in group levels (gnum), which is not always true.
    for gnum = 1:ngroups
        groups{gnum} = find(group==gnum);
    end

    We can either blame the for-loop for not skipping dead levels, or blame mgrp2idx (a wrapper of grp2idx) for spitting out the dead levels. It doesn’t really matter which way it is. The most important thing is that dead levels were let loose, and nobody in the developer-user chain understand the implications enough to stop the problem from propagating to the final output.

To summarize, the raw levels in categorical objects is a dirty cache including junk you do not want 99.99% of the time. Use unique() to get the meaningful unique levels instead.

136 total views, no views today

MATLAB Compatibility: nominal() and ordinal() objects since R2013a are not compatible with R2012b and before

In the old days (before R2013a), nominal() and ordinal() were separate parallel classes with astoundingly similar structures. That means there’s a lot of copy-paste-mod going on. TMW improved on it by consolidating the ideas into a new categorical() class, which nominal() and ordinal() derives from it.

The documentation mentioned that nominal() and ordinal() might be deprecated in the future, but I contacted their support urging them not to. It’s not for compatibility reasons: nominal() and ordinal() captures the common use cases that these two ideas do not need to be unified, and the names themselves clearly encodes the intention.

If the user want to exploit the commonalities between the two, either it’s already taken care of by the parent’s public methods, or the object can be sliced to make it happen. I looked into the source code for nominal() and ordinal(): it’s pretty much a wrapper over categorical’s methods yet the interface (input arguments) are much simpler and intuitive because we don’t have consider all the more general cases.

Back to the titled topic. Because categorical()’s properties (members) are different from pre R2013a’s nominal() and ordinal() objects, the objects created in R2012b or before cannot be loaded correctly in newer versions. That means the backward compatibility is completely broken for nominal()/ordinal() as far as saved objects are concerned.

There’s no good incentive to solve this problem on the TMWs side because the old nominal()/ordinal() is short-lived and they always want everybody to upgrade. Since I use nominal() most of the time and the ones that really need to be saved are all nominal(), I recommend the converting (‘casting’) them to cellstr by

>> A = nominal({'a','a','b','c'});
>> A = cellstr(A)
A = 
    'a'    'a'    'b'    'c'

Remember, nominal() is pretty much compressing a ton of cellstr into a few unique items and mapping the indices. No information is lost going back and forth between cellstr() and nominal(). It’s just a little extra computations for the conversion.

As for ordinal(), I rarely need to save it because order/level assignment is almost the very last thing in the processing chain because it changes so frequently (e.g. how would you draw the lines for six levels of fatness?), I might as well just not save it and reprocess the last step (where the code with ordinal() sits) when I need it.

Nonetheless, if you still want to save ordinals() instead of re-crunching it, this time you’ll want to save it as numerical levels by casting the ordinal() into double():

>> A = ordinal([1 2 3; 3 2 1; 2 1 3],{'low' 'medium' 'high'}, [3 1 2])
A = 
     medium      high        low    
     low         high        medium 
     high        medium      low    
>> D = double(A)
D =
     2     3     1
     1     3     2
     3     2     1
>> U = unique(A)
U = 
     low 
     medium 
     high 
>> L = cellstr(U)
L = 
    'low'
    'medium'
    'high'
>> I = double(U)
I =
     1
     2
     3
>> A_reconstructed = ordinal(D, L, I)
A_reconstructed = 
     medium      high        low    
     low         high        medium 
     high        medium      low

You’ll save (D, L, I) from old MATLAB and load it and reconstruct it with the triplets from the new MATLAB (I’d suggest using structs to keep track of the triplets). I know it’s a hairy mess!

 

120 total views, no views today

MATLAB Gotchas: Adding whitespace in strcat()

strcat() is a very handy function in MATLAB that allows you to combine strings using a mixture of cellstr() and char strings and it will auto-expand the char strings to match the cellstr() if necessary.

However, by design intention, strcat() removes trailing white spaces by internally applying deblank() to all char string inputs. It does NOT deblank cellstr() inputs. So if you want to combine date and time with a space, you have to use {‘ ‘} instead of ‘ ‘:

date = '2000-01-01';
time = '00:00:01';
>> strcat(date, ' ', time)  % The ' ' is ignored
ans =
2000-01-0100:00:01
>> strcat(date, {' '}, time)  % The ' ' is ignored
ans = 
    '2000-01-01 00:00:01'
>>

I find this more confusing than helpful. Including myself and other users, we naturally resort to processing line by line using cellfun() or other tricks just to get around the deblank() problem without taking a second look at the documentation because

  • rolling our own implementation is marginally as annoying as the deblank()
  • we expect cellstr() to match the dimensions without auto-expanding. I naturally thought it would expand only if it’s a char string.

Well, somebody asked this question on the newsgroup before, so obviously it’s not an intuitive design. It make sense to do it the way MATLAB designed strcat() because we need some way to tell MATLAB whether I want my inputs deblanked or not.

I think it’s more intuitive to have MATLAB’s default strcat() not to deblank() char strings at all and have a strcat_deblanked() that deblanks the inputs before feeding into strcat().

Unfortunately this behavior is there for a long time, so it’s too late to change it without affecting compatibility. Might as well live with it, but this is one of the very few unnatural (or slightly illogical design choice) of MATLAB to keep in mind.

194 total views, 1 views today

MATLAB Features: Persistent Excel ActiveX (DCOM) for xlsread() and xlswrite() R2015b

xlsread() and everything that calls it, such as readtable(), is terribly slow, especially when you have a boatload of Excel files to process. The reason behind it is that xlsread() closes that DCOM handle (which closes the Excel COM session) after it finishes, and restart Excel (DCOM) again when you call xlsread() again to load another spreadsheet file.

That means there’s a lot of opening and closing of the Excel application if you process multiple spreadsheets. The better strategy, which is covered extensively in MATLAB’s File Exchange (FEX), is to have your program open one Excel handle, and process all the spreadsheets within the same DCOM handle before closing it.

This strategy is quite overwhelming for a beginner, and even if you use FEX entries, you still cannot get around the fact that you have to know there’s a handle that manages the Excel session and remember to close it after you are done with it. Nothing beats having xlsread() do it automatically for you.

Starting with R2015b, the Excel DCOM handle called by xlsread() is now persistent: that after you make the first call to xlsread(), Excel.exe will stay in the memory unless you explicitly clear persistent variables or exit MATLAB, so you can reuse them every time xlsread() or xlswrite() is called. Finally!

The code itself is pretty slick. You can find it in ‘matlab.io.internal.getExcelInstance()’. Well, I guess it’s not hard to come up with it, but I guess in TMW, they must have a heated debate about whether it’s a good idea to keep Excel around (taking up resources) when you are done with it. With the computation power required to run R2015b, an extra Excel.exe lying around should be insignificant. Good call!

 

535 total views, no views today

MATLAB Techniques: Who’s your daddy? Ask dbstack(). Unusual uses of dbstack()

Normally having your function know about its caller (other than through the arguments we pass onto the stack) is usually a very bad idea in program organization, because it introduces unnecessary coupling and hinders visibility.

Nonetheless, debugger is a built-in feature of MATLAB and it provides dbstack() so you have access to your call stack as part of your program. Normally, I couldn’t come up with legitimate uses of it outside debugging.

One day, I was trying to write a wrapper function that does the idiom (mentioned in my earlier blog post)

fileparts( mfilename('fullpath') );

because I want the code to be self-documenting. Let’s call the function mfilepath(). Turns out it’s a difficult problem because mfilename(‘fullpath’) reports the path of the current function executing it. In the case of a wrapper, it’s the path of the wrapper function, not its caller that you are hoping to extract its path from.

In other words, if you write a wrapper function, it’s the second layer of the stack that you are interested in. So it can be done with dbstack():

function p = mfullpath()
  ST = dbstack('completenames');
  try
    p = ST(2).file;
  catch
    p = '';
  end

Since exception handling is tightly knit into MATLAB (unlike C++, which you pay for the extra overhead), there aren’t much performance penalty to use a try…catch() block than if I checked if ST actually have a second layer (i.e. has a non-base caller). I can confidently do that because there is only one way for this ST(2).file access operation to fail: mfullpath() is called from the base workspace.

Speaking of catchy titles, I wonder why Loren Shure, a self-proclaimed lover of puns and the blogger of the Art of MATLAB, haven’t exploited the built-in functions ‘who’ and ‘whos’ in her April Fools jokes like

whos your daddy
who let the dogs out

Note that these are legitimate MATLAB syntax that won’t throw you an exception. Unless you have ‘your’, ‘daddy’, ‘let’, ‘the’, ‘dogs’, ‘out’ as variable names, the above will quietly show nothing. It’d be hilarious if they pass that as an easter egg in the official MATLAB. They already have ‘why’,

why not

Error using rng (line 125)
First input must be a nonnegative integer seed less than 2^32, 'shuffle', 'default', or generator settings captured previously using S = RNG.

Error in why (line 10)
 dflt = rng(n,'v5uniform');

 

 

 

171 total views, 1 views today

MATLAB Quirks: struct with no fields are not empty

As far as struct() is concerned, I’m more inclined to using Struct of Array (SoA) over Array of Structs (AoS), unless all the use cases screams for SoA. Performance and memory overhead are the obvious reasons, but the true motivation for me to use SoA is that I’m thinking in terms of table-oriented programming (which I’ll discuss in later posts. See table() objects.): each field of a struct is a column in a table (heterogeneous array).

Since a table() is considered empty (by isempty()) if it has EITHER 0 rows INCLUSIVE OR 0 columns (no fields) and the default constructor creates a 0 \times 0 table, I thought struct() would do the same. NOT TRUE!

First of all, the default constructor of struct() gives ONE struct with NO FIELDS (so it’s supposed to correspond to a 1 \times 0 table). What’s even harder to remember is that struct2table(struct()) gives a 0 \times 0 table.

The second thing I missed is that a struct() with NO fields is NOT empty. You can have 3 structs with NO fields! So isempty(struct()) is always false!

I usually run into this problem when I want to seed the execution with an empty struct() and have the loop expand the fields if the file has contents in it, and I’ll check if the seeded struct was untouched to see if I can read data from the file. Next time I will remember to call struct([]) instead of struct(). What a trap!

At the end of the day, while struct is powerful, but I rarely find AoS necessary to do what I wanted once table() is out. AoS has pretty much the same restrictions as in table() that you cannot put different types in the same field across the AoS, but table allows you to index with variables (struct’s field) or rows (struct array index) without changing the data structure (AoS <-> SoA). So unless it’s a performance critical piece of the code, I’ll stick with tables() for most of my struct() needs.

 

264 total views, no views today

MATLAB Techniques: onCleanup() ‘destructor’

If your program opens a file, creates a timer(), or whatever resources that needs to be closed when they are no longer needed, before R2008a, you have to put your close resource calls at two places: one at the end of successful execution, the other at the exception handling in try…catch block:

FID = fopen('a.txt')
try
   // ... do something here
   fclose(FID);
catch
   fclose(FID);
end

Not only it’s messy that you have to duplicate your code, it’s also error prone when you add code in between. If your true intention is to close the resource whenever you exit the current program scope, there’s a better way for you: onCleanup() object. The code above can be simplified as:

FID = fopen('a.txt')
obj = onCleanup(@() fclose(FID));
// ... do something with FID here

The way onCleanup() works is that it creates an object which you define what its destructor (delete()) does on creation (by the constructor of course) by specifying a function handle. This way when ‘obj’ is cleared (either as it goes out of scope or your explicitly cleared it with ‘clear’), the destructor in ‘obj’ will be activated and do the cleanup actions you specified.

Due to copyright reasons, I won’t copy the simple code here. Just open onCleanup.m in MATLAB editor and you’ll see it that the code (excluding comments) has less words than the description above. Pretty neat!

Normally we use onCleanup() inside a function. The best place to put is is right after you opened a resource because anything in between can go wrong (i.e. might throw exceptions): you want ‘obj’ to be swept (i.e. its destructors called) when that happens.

Technically, you can make an onCleanup() object in the base (root) workspace (aka command window). The destructor will be triggered either when you clear the ‘obj’ explicitly using ‘clear’ or when you exit MATLAB. You can see for yourself with this:

obj = onCleanup(@() pause);

It kind of let you do a one-off cleanup on exit instead of a recurring cleanup in finish.m.

So the next time you open a resource that needs to be closed whether the program exits unexpectedly or not, use onCleanup()! It’s one of the elegant, smart uses of OOP.

 

168 total views, no views today

MATLAB Techniques: Self-identifying (by type) methods

We all know MATLAB by default fill numbers with 0 if we haven’t specified them (such as expanding a matrix by injecting values beyond the original matrix size). Cells are default filled with {[]} even if you meant to have cellstr() {”} across the board. Sometimes it’s not what we want. 0 can be legitimate value, so we want NaN to denote undefined values. Same as cellstr(), we don’t want to choke the default string processing programs because one stupid {[]} turns the entire cell array into to a non-cellstr array.

For compatibility reasons (and it’s also hard to do so), it’s not a good idea to simply modify the factory behavior. I have something similar to table() objects that requires similar expansion on arbitrary data types, but MATLAB’s defaults proves to be clumsy.

Instead of writing swtich()-case() statements or a bunch of if() statements that relies on type information like this:

function x = makeUndefined(x)
  switch class(x)
    case {'double', 'single'}
      x = NaN(size(x));
    case 'cell'
      if( iscellstr(x) )
        x = repmat({''}, size(x));
      end
    % ...
  end

I found a slick way to do it so I don’t have to keep track of it again if I need the same defaults in other places: take advantage of the fact that MATLAB selectively will load the right method depending on the first input argument(s)*:

Create a commonly named method (e.g. makeUndefined()) for the PODs and put it under the POD’s @folder (e.g. /@double/makeUndefined.m, /@cell/makeUndefined.m). The functions look something like this:

function y = makeUndefined(x)
% This function must be put under /@double
  y = NaN(size(x));
function x = makeUndefined(x)
% This function must be put under /@cell
  if( iscellstr(x) )
    x = repmat({''}, size(x));
  end

Similarly, you can make your isundefined() implementation for @double, @cell, etc, just like the factory-shipped @categorical/isundefined() corresponding to the same rules you set for makeUndefined().

Actually, the switch-case approach is analogous to the common abuses of RTTI in C++: every time a new type is added, you have to open all the methods that depends on the type info and update them, instead of having the classes implement those methods (with proper object hierarchy and overloading).

MATLAB does not have proper polymorphism, but can call the right method based on the first argument (or the latter ones if they have a proper dominance relationship: mind you that most PODs don’t), but this approach is as close as it can get to proper OO design despite we are just talking about PODs here.


* This is tricky business. MATLAB doesn’t have polymorphism, but will look into the FIRST dominant input argument and load the appropriate classes. Usually it’s the first argument, but for non-POD classes, you can specify the dominance relationship (Inferior classes). Actually little has been said about such relationship in PODs in the official documentation.

I called support and found that there’s no dominance relationship in PODs, so it’s pretty much the first argument. That means this trick does not work if you want to overload bsxfun() for say, nominal() objects (which doesn’t have a bsxfun() implementation) keeping the same input argument order because the first argument is a function handle for both the factory and the user method. Bummer!

This is why the new ‘*_fun’ functions I write, I always put the object to operate on as the first argument whenever possible. Gets a little bit ugly when I want to support multiple inputs like cellfun(), so I have to weight whether it’s worth the confusion for the overloading capability.

 

144 total views, no views today

MATLAB Techniques: Resuming loops in a script

If you have a time-consuming for-loop in a script and you want to terminate it for some reason (like checking partial results, debugging, etc) but you don’t want to start over again. What would you do if you want minimal typing each time you stop?

Here’s how I do it:

if( exist('k', 'var') ) k0=k; else k0=1; end
for k=k0:1000
  % Your code here
end

If you want to restart the loop, simply enter k=1 in the command prompt and you’re good to go. Otherwise it will pick up where you left off.

159 total views, no views today