MATLAB Features: Persistent Excel ActiveX (DCOM) for xlsread() and xlswrite() R2015b

xlsread() and everything that calls it, such as readtable(), is terribly slow, especially when you have a boatload of Excel files to process. The reason behind it is that xlsread() closes that DCOM handle (which closes the Excel COM session) after it finishes, and restart Excel (DCOM) again when you call xlsread() again to load another spreadsheet file.

That means there’s a lot of opening and closing of the Excel application if you process multiple spreadsheets. The better strategy, which is covered extensively in MATLAB’s File Exchange (FEX), is to have your program open one Excel handle, and process all the spreadsheets within the same DCOM handle before closing it.

This strategy is quite overwhelming for a beginner, and even if you use FEX entries, you still cannot get around the fact that you have to know there’s a handle that manages the Excel session and remember to close it after you are done with it. Nothing beats having xlsread() do it automatically for you.

Starting with R2015b, the Excel DCOM handle called by xlsread() is now persistent: that after you make the first call to xlsread(), Excel.exe will stay in the memory unless you explicitly clear persistent variables or exit MATLAB, so you can reuse them every time xlsread() or xlswrite() is called. Finally!

The code itself is pretty slick. You can find it in ‘matlab.io.internal.getExcelInstance()’. Well, I guess it’s not hard to come up with it, but I guess in TMW, they must have a heated debate about whether it’s a good idea to keep Excel around (taking up resources) when you are done with it. With the computation power required to run R2015b, an extra Excel.exe lying around should be insignificant. Good call!

 

1,628 total views, no views today

MATLAB Techniques: Who’s your daddy? Ask dbstack(). Unusual uses of dbstack()

Normally having your function know about its caller (other than through the arguments we pass onto the stack) is usually a very bad idea in program organization, because it introduces unnecessary coupling and hinders visibility.

Nonetheless, debugger is a built-in feature of MATLAB and it provides dbstack() so you have access to your call stack as part of your program. Normally, I couldn’t come up with legitimate uses of it outside debugging.

One day, I was trying to write a wrapper function that does the idiom (mentioned in my earlier blog post)

fileparts( mfilename('fullpath') );

because I want the code to be self-documenting. Let’s call the function mfilepath(). Turns out it’s a difficult problem because mfilename(‘fullpath’) reports the path of the current function executing it. In the case of a wrapper, it’s the path of the wrapper function, not its caller that you are hoping to extract its path from.

In other words, if you write a wrapper function, it’s the second layer of the stack that you are interested in. So it can be done with dbstack():

function p = mfullpath()
  ST = dbstack('completenames');
  try
    p = ST(2).file;
  catch
    p = '';
  end

Since exception handling is tightly knit into MATLAB (unlike C++, which you pay for the extra overhead), there aren’t much performance penalty to use a try…catch() block than if I checked if ST actually have a second layer (i.e. has a non-base caller). I can confidently do that because there is only one way for this ST(2).file access operation to fail: mfullpath() is called from the base workspace.

Speaking of catchy titles, I wonder why Loren Shure, a self-proclaimed lover of puns and the blogger of the Art of MATLAB, haven’t exploited the built-in functions ‘who’ and ‘whos’ in her April Fools jokes like

whos your daddy
who let the dogs out

Note that these are legitimate MATLAB syntax that won’t throw you an exception. Unless you have ‘your’, ‘daddy’, ‘let’, ‘the’, ‘dogs’, ‘out’ as variable names, the above will quietly show nothing. It’d be hilarious if they pass that as an easter egg in the official MATLAB. They already have ‘why’,

why not

Error using rng (line 125)
First input must be a nonnegative integer seed less than 2^32, 'shuffle', 'default', or generator settings captured previously using S = RNG.

Error in why (line 10)
 dflt = rng(n,'v5uniform');

 

 

 

817 total views, no views today

Ebay customer support’s on-hold music: Weird Al?!!

A couple of months ago, before this blog started, I called eBay’s customer support and I was put on hold for a whole hour. It would have made my day miserable, but the on-hold music they played made my day: it’s Weird Al Yankovic’s parody song eBay!

Couldn’t stop chuckling for the first half hour despite the gruesome wait … I just couldn’t get enough of it even if loops every minute or two! I kept thinking: this can’t be true!

Yes, the eBay’s official hotline plays Weird Al’s parody of eBay to their customers while they wait! They didn’t even bleep out the word “crap” as in “this crap, shows up in, bubble wrap, almost every day“. Well, I guess they have to, as “crap” is essential to the spirit of the entire parody!

I love these people. They really have a sense of humor, and they are willing to make fun of themselves!

681 total views, 1 views today

Title of this blog site

Initially I started with “TMI: Too Much Information” as the title of this blog, given that my plan was to put fragments of technical information or insights I came across that might be useful for solving problems. That means the blog posts contain more than what you want to know, unless you are looking to solve a specific problem with the help of the post or you are just outright nerdy.

But soon I realized I have some non-technical stuff like gags, music, and the technical stuff covers more than just electronic measurement instruments, so I need a title that’s less common and more catchy.

Today, I came across a reddit post, which user “llllIlllIllIlIRogue Sysadmin “says:

If you don’t they’ll just hear jargon and glaze over completely and not even try to follow you. If you draw a pretty layout of everything, though, they’ll make some token effort to follow along.

They’ll still get lost but now you’re not just a nerd rambling… you’re a rambling nerd with a plan.

Very catchy! Also, I liked the original comment because it covers:

  • Passion for geeky topics
  • It’s important to communicate well so that people will bother to follow what we have to say.

I did a google search with quotes “rambling nerd with a plan” and only one entry: the original post, showed up, so it’s not a commonly used phrase. I’ll take it 🙂

807 total views, 1 views today

MATLAB Quirks: struct with no fields are not empty

As far as struct() is concerned, I’m more inclined to using Struct of Array (SoA) over Array of Structs (AoS), unless all the use cases screams for SoA. Performance and memory overhead are the obvious reasons, but the true motivation for me to use SoA is that I’m thinking in terms of table-oriented programming (which I’ll discuss in later posts. See table() objects.): each field of a struct is a column in a table (heterogeneous array).

Since a table() is considered empty (by isempty()) if it has EITHER 0 rows INCLUSIVE OR 0 columns (no fields) and the default constructor creates a 0 \times 0 table, I thought struct() would do the same. NOT TRUE!

First of all, the default constructor of struct() gives ONE struct with NO FIELDS (so it’s supposed to correspond to a 1 \times 0 table). What’s even harder to remember is that struct2table(struct()) gives a 0 \times 0 table.

The second thing I missed is that a struct() with NO fields is NOT empty. You can have 3 structs with NO fields! So isempty(struct()) is always false!

I usually run into this problem when I want to seed the execution with an empty struct() and have the loop expand the fields if the file has contents in it, and I’ll check if the seeded struct was untouched to see if I can read data from the file. Next time I will remember to call struct([]) instead of struct(). What a trap!

At the end of the day, while struct is powerful, but I rarely find AoS necessary to do what I wanted once table() is out. AoS has pretty much the same restrictions as in table() that you cannot put different types in the same field across the AoS, but table allows you to index with variables (struct’s field) or rows (struct array index) without changing the data structure (AoS <-> SoA). So unless it’s a performance critical piece of the code, I’ll stick with tables() for most of my struct() needs.

 

3,277 total views, 1 views today

MATLAB Techniques: onCleanup() ‘destructor’

If your program opens a file, creates a timer(), or whatever resources that needs to be closed when they are no longer needed, before R2008a, you have to put your close resource calls at two places: one at the end of successful execution, the other at the exception handling in try…catch block:

FID = fopen('a.txt')
try
   // ... do something here
   fclose(FID);
catch
   fclose(FID);
end

Not only it’s messy that you have to duplicate your code, it’s also error prone when you add code in between. If your true intention is to close the resource whenever you exit the current program scope, there’s a better way for you: onCleanup() object. The code above can be simplified as:

FID = fopen('a.txt')
obj = onCleanup(@() fclose(FID));
// ... do something with FID here

The way onCleanup() works is that it creates an object which you define what its destructor (delete()) does on creation (by the constructor of course) by specifying a function handle. This way when ‘obj’ is cleared (either as it goes out of scope or your explicitly cleared it with ‘clear’), the destructor in ‘obj’ will be activated and do the cleanup actions you specified.

Due to copyright reasons, I won’t copy the simple code here. Just open onCleanup.m in MATLAB editor and you’ll see it that the code (excluding comments) has less words than the description above. Pretty neat!

Normally we use onCleanup() inside a function. The best place to put is is right after you opened a resource because anything in between can go wrong (i.e. might throw exceptions): you want ‘obj’ to be swept (i.e. its destructors called) when that happens.

Technically, you can make an onCleanup() object in the base (root) workspace (aka command window). The destructor will be triggered either when you clear the ‘obj’ explicitly using ‘clear’ or when you exit MATLAB. You can see for yourself with this:

obj = onCleanup(@() pause);

It kind of let you do a one-off cleanup on exit instead of a recurring cleanup in finish.m.

So the next time you open a resource that needs to be closed whether the program exits unexpectedly or not, use onCleanup()! It’s one of the elegant, smart uses of OOP.

 

1,181 total views, 1 views today

MATLAB Techniques: Self-identifying (by type) methods

We all know MATLAB by default fill numbers with 0 if we haven’t specified them (such as expanding a matrix by injecting values beyond the original matrix size). Cells are default filled with {[]} even if you meant to have cellstr()  {''} across the board. Sometimes it’s not what we wanted. 0 can be legitimate value, so we want NaN to denote undefined values. Same as cellstr(), we don’t want to choke the default string processing programs because one stupid {[]} turns the entire cell array into to a non-cellstr array.

For compatibility reasons (and it’s also hard to do so), it’s not a good idea to simply modify the factory behavior. I have something similar to table() objects that requires similar expansion on arbitrary data types, but MATLAB’s defaults proves to be clumsy.

Instead of writing switch-case statements or a bunch of if statements that relies on type information like this:

function x = makeUndefined(x)
  switch class(x)
    case {'double', 'single'}
      x = NaN(size(x));
    case 'cell'
      if( iscellstr(x) )
        x = repmat({''}, size(x));
      end
    % ...
  end

I found a slick way to do it so I don’t have to keep track of it again if I need the same defaults in other places: take advantage of the fact that MATLAB selectively will load the right method depending on the first input argument(s)*:

Create a commonly named method (e.g. makeUndefined()) for the PODs and put it under the POD’s @folder (e.g. /@double/makeUndefined.m, /@cell/makeUndefined.m). The functions look something like this:

function y = makeUndefined(x)
% This function must be put under /@double
  y = NaN(size(x));
function x = makeUndefined(x)
% This function must be put under /@cell
  if( iscellstr(x) )
    x = repmat({''}, size(x));
  end

Similarly, you can make your isundefined() implementation for @double, @cell, etc, just like the factory-shipped @categorical/isundefined() corresponding to the same rules you set for makeUndefined().

Actually, the switch-case approach is analogous to the common abuses of RTTI in C++: every time a new type is added, you have to open all the methods that depends on the type info and update them, instead of having the classes implement those methods (with proper object hierarchy and overloading).

[Scott Meyers, “Effective C++”] Anytime you find yourself writing code of the form “if the object is of type T1, then do something, but if it’s of type T2, then do something else,” slap yourself

MATLAB does not have proper polymorphism, but can call the right method based on the first argument (or the latter ones if they have a proper dominance relationship: mind you that most PODs don’t), but this approach is as close as it can get to proper OO design despite we are just talking about PODs here.


This technique is especially valuable when you and TMW (or other users) have different ideas of what an English word (e.g. empty, defined, numeric) means. Like do you consider boolean (logical) numeric? TMW says no through isnumeric().

To give you an example, I made a tool to nicely plot arbitrary features in my @table over time (the equivalent of @timetable before TMW introduced it). It only make sense if the associated dependent variable (y-axis) can be quantified, so what I meant is broader than isnumeric(): it’s isConvertibleToDouble() since I casted my dependent variables with double() in between.

Boolean (logical) and categorical variables have quantifiable levels, so double() can be applied to them, they should return TRUE for isConvertibleToDouble() despite isnumeric() returns FALSE. They have the same behavior for basic types like double(), single(), char(), cellstr(), struct(), etc.

In summary,

  1. You say what you really mean (by introducing nomenclature), NOT what it typically does
    – this is like creating another indirection like half(x) instead of directly writing x/2 or x>>1.
    – spend 90% of your time coming up with a very intuitive yet precise name. ‘Misspoke’ == Bug!
  2. The new data types self-manage through implementing methods used by your code.
    – assume nothing about input type other than the interfaces that are accessed through
    (the traditional approach knows exactly what inputs they’re going to see)
    – if you did #1 correctly, there’s no reason to foresee/prepare-for new input types
    (just implement the methods for the input data types that you want it to run for now)
    – no sweep (switch-otherwise) case to mishandle** unexpected new input data types
    (because it won’t run on an input data type until all called methods are implemented)
    – introducing new input data types won’t break the core code for existing types.
    (new input data types can only break themselves if they implemented the methods wrong)

 


* This is tricky business. MATLAB doesn’t have polymorphism, but will look into the FIRST dominant input argument and load the appropriate classes. Usually it’s the first argument, but for non-POD classes, you can specify the dominance relationship (Inferior classes). Actually little has been said about such relationship in PODs in the official documentation.

I called support and found that there’s no dominance relationship in PODs, so it’s pretty much the first argument. That means this trick does not work if you want to overload bsxfun() for say, nominal() objects (which doesn’t have a bsxfun() implementation) keeping the same input argument order because the first argument is a function handle for both the factory and the user method. Bummer!

This is why the new ‘*_fun‘ functions I write, I always put the object to operate on as the first argument whenever possible. Gets a little bit ugly when I want to support multiple inputs like cellfun(), so I have to weight whether it’s worth the confusion for the overloading capability.

** Unless you want to torture yourself by listing all recognized types and make sure the ‘switch-otherwise‘ case fails.

900 total views, 1 views today

Quote of the day: You can’t learn too much linear algebra

Benedict Gross, Professor of Mathematics at Harvard said in his E-222 lecture: ‘You can’t learn too much linear algebra’.

If you read my WordPress pages in Signals and Systems, I’m trying a new approach to explain the materials done under the classical approach in a much more compact way using basic ideas in linear algebra.

Linear algebra is fun and easy to learn (at intro levels), once you get used to the fact that it’s a carefully picked set of restrictions that most physical problems boils down to. In high school algebra, it’s disguised as ‘method of detached coefficients’ when you solve systems of simultaneous equations.

Once you model your problem into linear equations (the hardest part), you will see the equivalence (or mathematically equivalent analogs) of different problems ranging from economics to circuits and mechanics. The beauty of the ‘detached coefficients’ approach separates the context of the problem from its mathematical structure as the application specific variables are often grouped as a vector (or matrix) that you deal with as a unit. In fact, most problem boils down to this:

    \[ \bf{Ax = b} \]

It’s your job to construct \bf{A}, \bf{x} and \bf{b} and tell people what they mean.

I agree with Gilbert Strang, Professor of Mathematics at MIT that it’s time to make linear algebra the standard curriculum over calculus. Digital control used to be a very advanced graduate class, but after a few decades, it’s undergraduate material. Linear algebra has very few knowledge dependencies (hard pre-requisites), so it’s great material to teach at any level.

2,837 total views, 2 views today