C Traps and Pitfalls

Here’s a concise paper describing common C programming pitfalls by Andrew Koening (www.literateprogramming.com/ctraps.pdf) corresponding to be book with the same title. 

As a reminder to myself, I’ll spend this page summarizing common mistakes and my history with it.

Here are the mistakes that I don’t make because of certain programming habits:

  • Operator precedence: I use enough parenthesis to not rely on operator precedence
  • Pointer dereferencing: I always do *(p++) instead of *p++ unless it’s idiomatic.
  • for() or if() statements executing only first line: I always surround the block with {} even if it’s just one line. Too often we need to inject an extra line and without {} it becomes a trap.
  • Undefined side effect order: I never do something like y[i++]=x[i]
  • char* p, q: very tempting since C++ style emphasize on pointer as a type over whether the variable is a pointer. I almost never declare multiple variables in one line.
  • Macro repeating side effects: use inline functions instead whenever possible. Use templates in C++.
  • Unexpected macro associations: guard expressions with (). Use typedef.

Did these once before, adjusted my programming habits to avoid it:

  • counting down in for() loop with unsigned running variable: I stick with signed running variables in general. If I’m forced to use unsigned, I’ll remind myself that I can only stop AFTER hitting 1, but not 0 (i.e. i=0 never got executed). 

Haven’t got a chance to run into these, but I’ll program defensively:

  • Integer overflow: do a<b instead of (a-b)<0. Calculate mean by adding halfway length to the smaller number (i.e. (a+b)/2 == a + (b-a)/2 given a<b). Shows up in binary search.
  • Number of digits to shift is always unsigned (i.e. -1 is a big number!)

What I learned from the paper:

  • stdio buffer on stack (registed with setbuf()) freed before I/O flushed: use static buffer (or just make sure the buffer lives outside the function call).
  • char type might be signed (128 to 255 are -128 to -1) so it sign extends during upcast. Use unsigned char go guarantee zero extend for upcasting.
  • toupper()/tolower() might be implemented as a simple macro (no checks, incorrect /w side effects)
  • Can index into a string literal: “abcdefg”[3] gives ‘d’

Mistakes that I usually make when I switch back from full-time MATLAB programming:

  • Logical negation using ~ operator instead of ! operator.

Common mistakes I rarely make because of certain understanding:

  • Forgetting to break at every case in switch block. It’s hard to forget once you’re aware of the Duff’s device.
  • sizeof(a[])/sizeof(a[0]) after passing into a function does not give array length: hard to get it wrong once you understand that array (declared on stack) has meta-info that cannot be accessed beyond the stack level it’s initialized. 

43 total views, no views today

MATLAB Techniques: Self-identifying (by type) methods

We all know MATLAB by default fill numbers with 0 if we haven’t specified them (such as expanding a matrix by injecting values beyond the original matrix size). Cells are default filled with {[]} even if you meant to have cellstr() {”} across the board. Sometimes it’s not what we want. 0 can be legitimate value, so we want NaN to denote undefined values. Same as cellstr(), we don’t want to choke the default string processing programs because one stupid {[]} turns the entire cell array into to a non-cellstr array.

For compatibility reasons (and it’s also hard to do so), it’s not a good idea to simply modify the factory behavior. I have something similar to table() objects that requires similar expansion on arbitrary data types, but MATLAB’s defaults proves to be clumsy.

Instead of writing swtich()-case() statements or a bunch of if() statements that relies on type information like this:

function x = makeUndefined(x)
  switch class(x)
    case {'double', 'single'}
      x = NaN(size(x));
    case 'cell'
      if( iscellstr(x) )
        x = repmat({''}, size(x));
      end
    % ...
  end

I found a slick way to do it so I don’t have to keep track of it again if I need the same defaults in other places: take advantage of the fact that MATLAB selectively will load the right method depending on the first input argument(s)*:

Create a commonly named method (e.g. makeUndefined()) for the PODs and put it under the POD’s @folder (e.g. /@double/makeUndefined.m, /@cell/makeUndefined.m). The functions look something like this:

function y = makeUndefined(x)
% This function must be put under /@double
  y = NaN(size(x));
function x = makeUndefined(x)
% This function must be put under /@cell
  if( iscellstr(x) )
    x = repmat({''}, size(x));
  end

Similarly, you can make your isundefined() implementation for @double, @cell, etc, just like the factory-shipped @categorical/isundefined() corresponding to the same rules you set for makeUndefined().

Actually, the switch-case approach is analogous to the common abuses of RTTI in C++: every time a new type is added, you have to open all the methods that depends on the type info and update them, instead of having the classes implement those methods (with proper object hierarchy and overloading).

MATLAB does not have proper polymorphism, but can call the right method based on the first argument (or the latter ones if they have a proper dominance relationship: mind you that most PODs don’t), but this approach is as close as it can get to proper OO design despite we are just talking about PODs here.


* This is tricky business. MATLAB doesn’t have polymorphism, but will look into the FIRST dominant input argument and load the appropriate classes. Usually it’s the first argument, but for non-POD classes, you can specify the dominance relationship (Inferior classes). Actually little has been said about such relationship in PODs in the official documentation.

I called support and found that there’s no dominance relationship in PODs, so it’s pretty much the first argument. That means this trick does not work if you want to overload bsxfun() for say, nominal() objects (which doesn’t have a bsxfun() implementation) keeping the same input argument order because the first argument is a function handle for both the factory and the user method. Bummer!

This is why the new ‘*_fun’ functions I write, I always put the object to operate on as the first argument whenever possible. Gets a little bit ugly when I want to support multiple inputs like cellfun(), so I have to weight whether it’s worth the confusion for the overloading capability.

 

121 total views, no views today