MATLAB Compatibility: nominal() and ordinal() objects since R2013a are not compatible with R2012b and before

In the old days (before R2013a), nominal() and ordinal() were separate parallel classes with astoundingly similar structures. That means there’s a lot of copy-paste-mod going on. TMW improved on it by consolidating the ideas into a new categorical() class, which nominal() and ordinal() derives from it.

The documentation mentioned that nominal() and ordinal() might be deprecated in the future, but I contacted their support urging them not to. It’s not for compatibility reasons: nominal() and ordinal() captures the common use cases that these two ideas do not need to be unified, and the names themselves clearly encodes the intention.

If the user want to exploit the commonalities between the two, either it’s already taken care of by the parent’s public methods, or the object can be sliced to make it happen. I looked into the source code for nominal() and ordinal(): it’s pretty much a wrapper over categorical’s methods yet the interface (input arguments) are much simpler and intuitive because we don’t have to consider all the more general cases.

Back to the titled topic. Because categorical()’s properties (members) are different from pre R2013a’s nominal() and ordinal() objects, the objects created in R2012b or before cannot be loaded correctly in newer versions. That means the backward compatibility is completely broken for nominal()/ordinal() as far as saved objects are concerned.

There’s no good incentive to solve this problem on the TMWs side because the old nominal()/ordinal() is short-lived and they always want everybody to upgrade. Since I use nominal() most of the time and the ones that really need to be saved are all nominal(), I recommend the converting (‘casting’) them to cellstr by

>> A = nominal({'a','a','b','c'});
>> A = cellstr(A)
A = 
    'a'    'a'    'b'    'c'

Remember, nominal() is pretty much compressing a ton of cellstr into a few unique items and mapping the indices. No information is lost going back and forth between cellstr() and nominal(). It’s just a little extra computations for the conversion.

As for ordinal(), I rarely need to save it because order/level assignment is almost the very last thing in the processing chain because it changes so frequently (e.g. how would you draw the lines for six levels of fatness?), I might as well just not save it and reprocess the last step (where the code with ordinal() sits) when I need it.

Nonetheless, if you still want to save ordinals() instead of re-crunching it, this time you’ll want to save it as numerical levels by casting the ordinal() into double():

>> A = ordinal([1 2 3; 3 2 1; 2 1 3],{'low' 'medium' 'high'}, [3 1 2])
A = 
     medium      high        low    
     low         high        medium 
     high        medium      low    
>> D = double(A)
D =
     2     3     1
     1     3     2
     3     2     1
>> U = unique(A)
U = 
     low 
     medium 
     high 
>> L = cellstr(U)
L = 
    'low'
    'medium'
    'high'
>> I = double(U)
I =
     1
     2
     3
>> A_reconstructed = ordinal(D, L, I)
A_reconstructed = 
     medium      high        low    
     low         high        medium 
     high        medium      low

You’ll save (D, L, I) from old MATLAB and load it and reconstruct it with the triplets from the new MATLAB (I’d suggest using structs to keep track of the triplets). I know it’s a hairy mess!

 

Loading

MATLAB Gotchas: Adding whitespace in strcat()

strcat() is a very handy function in MATLAB that allows you to combine strings using a mixture of cellstr() and char strings and it will auto-expand the char strings to match the cellstr() if necessary.

However, by design intention, strcat() removes trailing white spaces by internally applying deblank() to all char string inputs. It does NOT deblank cellstr() inputs. So if you want to combine date and time with a space, you have to use {‘ ‘} instead of ‘ ‘:

date = '2000-01-01';
time = '00:00:01';
>> strcat(date, ' ', time)  % The ' ' is ignored
ans =
2000-01-0100:00:01
>> strcat(date, {' '}, time)  % The ' ' is ignored
ans = 
    '2000-01-01 00:00:01'
>>

I find this more confusing than helpful. Including myself and other users, we naturally resort to processing line by line using cellfun() or other tricks just to get around the deblank() problem without taking a second look at the documentation because

  • rolling our own implementation is marginally as annoying as the deblank()
  • we expect cellstr() to match the dimensions without auto-expanding. I naturally thought it would expand only if it’s a char string.

Well, somebody asked this question on the newsgroup before, so obviously it’s not an intuitive design. It make sense to do it the way MATLAB designed strcat() because we need some way to tell MATLAB whether I want my inputs deblanked or not.

I think it’s more intuitive to have MATLAB’s default strcat() not to deblank() char strings at all and have a strcat_deblanked() that deblanks the inputs before feeding into strcat().

Unfortunately this behavior is there for a long time, so it’s too late to change it without affecting compatibility. Might as well live with it, but this is one of the very few unnatural (or slightly illogical design choice) of MATLAB to keep in mind.

Loading

MATLAB Features: Persistent Excel ActiveX (DCOM) for xlsread() and xlswrite() R2015b

xlsread() and everything that calls it, such as readtable(), is terribly slow, especially when you have a boatload of Excel files to process. The reason behind it is that xlsread() closes that DCOM handle (which closes the Excel COM session) after it finishes, and restart Excel (DCOM) again when you call xlsread() again to load another spreadsheet file.

That means there’s a lot of opening and closing of the Excel application if you process multiple spreadsheets. The better strategy, which is covered extensively in MATLAB’s File Exchange (FEX), is to have your program open one Excel handle, and process all the spreadsheets within the same DCOM handle before closing it.

This strategy is quite overwhelming for a beginner, and even if you use FEX entries, you still cannot get around the fact that you have to know there’s a handle that manages the Excel session and remember to close it after you are done with it. Nothing beats having xlsread() do it automatically for you.

Starting with R2015b, the Excel DCOM handle called by xlsread() is now persistent: that after you make the first call to xlsread(), Excel.exe will stay in the memory unless you explicitly clear persistent variables or exit MATLAB, so you can reuse them every time xlsread() or xlswrite() is called. Finally!

The code itself is pretty slick. You can find it in ‘matlab.io.internal.getExcelInstance()’. Well, I guess it’s not hard to come up with it, but I guess in TMW, they must have a heated debate about whether it’s a good idea to keep Excel around (taking up resources) when you are done with it. With the computation power required to run R2015b, an extra Excel.exe lying around should be insignificant. Good call!

 

Loading