C++ annoyances (and reliefs): operator[] in STL map-based containers

I recently watched Louis Brandy’s CppCon presentation “Curiously Recurring C++ Bugs at Facebook” on YouTube.

Bug #2 is a well-known trap in STL map-based containers: operator[] inserts the requested key (associated with a default-constructed value) if it is not found.
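To make the trap concrete, here is a minimal sketch using a plain std::map. The function name bracket_lookup_inserts is made up for this illustration; it only shows that a lookup through operator[] grows the map while find() does not.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical demo function: returns true if looking up a missing key
// with operator[] grew the map.
bool bracket_lookup_inserts()
{
    std::map<std::string, int> m;
    assert(m.find("rabbit") == m.end());  // find() leaves the map untouched
    assert(m.empty());

    int value = m["rabbit"];              // silently inserts {"rabbit", 0}
    return value == 0 && m.size() == 1;
}
```

This is also why operator[] cannot be called on a const std::map at all: there is no const overload in the standard containers.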

He mentioned a few workarounds and their disadvantages, like

  • use at() method: requires exception handling
  • const-protect the map: noobs try to defeat that by copying it into a non-const object or stripping the const
  • ban operator[] calls: makes the code ugly

but would like to see something neater. In bug #3, he added that a very common use case is to return a default when the key is not found. The normal approach requires returning a copy of the default (expensive if it’s large), which tempts noobs to return a reference to a local variable instead. That reference dangles as soon as the function returns: a guaranteed bug.
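As a rough sketch of the “return a default” use case, the helper below returns by value, which is the safe but potentially expensive approach. The name get_or is hypothetical and not part of the STL; the broken reference-returning variant is kept in a comment so nothing here actually dangles.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper (not part of the STL): returns a copy of the mapped
// value, or a copy of fallback when the key is missing. Returning by value
// is safe but costs a copy.
template <typename Map>
typename Map::mapped_type get_or(const Map& m,
                                 const typename Map::key_type& key,
                                 const typename Map::mapped_type& fallback)
{
    auto it = m.find(key);
    return (it == m.end()) ? fallback : it->second;
}

// The tempting but broken variant, shown only as a comment:
//
//   const std::string& get_or_empty(const std::map<int, std::string>& m, int k)
//   {
//       std::string fallback;                            // local object
//       auto it = m.find(k);
//       return (it == m.end()) ? fallback : it->second;  // dangles on a miss
//   }
```

Calling get_or never modifies the map, so it also works on a const map.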


Considering how much productivity a clumsy interface can drain, I think it’s worth spending a few hours of my time tackling it, since I might need to use STL map-based containers myself someday.

Here’s my thought process for the design choices:

  • Retain the complete STL interface to minimize user code/documentation changes
  • Endow an STL map-based container with a default_value (common use case), so that the new operator[] can return a reference without worrying about temporaries getting destroyed.
  • Give users an easy read-only access interface (make intentions clear with little typing)

The code (with detailed comments about design decisions and test cases) can be downloaded here: MapWithDefault. For the experienced, here’s the meat:

#include <unordered_map>
#include <map>

#include <utility>  // std::forward

// Legend (for extremely simple generic functions)
// ===============================================
// K: key
// V: value
// C: container
// B: base (class)
template <typename K, typename V, template <typename ... Args> class C = std::map, typename B = C<K,V> >
class MapWithDefault : private B 
{
public:
    // Make default_value mandatory. Everything else follows the requested STL container
    template<typename... Args>
    MapWithDefault(V default_value, Args&& ... args) : B(std::forward<Args>(args)...), default_value(default_value) {}

public:
    using B::operator=;
    using B::get_allocator;

    using B::at;

    using B::operator[];

    // Read-only map (const object) uses only read-only operator[]
    const V& operator[](const K& key) const
    {
        auto it = this->find(key);
        return (it==this->end()) ? default_value : it->second;
    }

    using B::begin;
    using B::cbegin;
    using B::end;
    using B::cend;
    using B::rbegin;
    using B::crbegin;
    using B::rend;
    using B::crend;

    using B::empty;
    using B::size;
    using B::max_size;

    using B::clear;
    using B::insert;
    // using B::insert_or_assign;   // C++17
    using B::emplace;
    using B::emplace_hint;
    using B::erase;
    using B::swap;

    using B::count;
    using B::find;
    using B::equal_range;
    using B::lower_bound;
    using B::upper_bound;

public:
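    const               V default_value;
    // Const alias of *this. Caveat: a copy-constructed object copies this
    // reference, so its read_only would still alias the original object.
    const MapWithDefault& read_only = *this;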
};

Note that this is private inheritance, so nobody can delete the object through a base-class pointer; that is why it can go without a virtual destructor (STL containers don’t have one). I have not exposed all of the inherited members back to public with the ‘using’ keyword yet, but you get the idea.


This is how I normally want the extended container to be used:

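// Assumes the MapWithDefault definition above is visible here
#include <iostream>
#include <string>

using std::cout;
using std::endl;
using std::string;

int main()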
{
    MapWithDefault<string, int> m(17);  // Endowed with default of 17
    cout << "pull rabbit from m.read_only:  " << m.read_only["rabbit"] << endl;   // Should read 17

    // Demonstrates commonly unwanted behavior of inserting requested key when not found
    cout << "pull rabbit from m:            " << m["rabbit"] << endl; // Should read 0 because the key was inserted (not default anymore)

    // Won't compile: demonstrate that it's read only
    // m.read_only["rabbit"] = 42;

    // Demonstrate writing
    m["rabbit"] = 42;

    // Confirms written value
    cout << "pull rabbit from m.read_only:  " << m.read_only["rabbit"] << endl;   // Should read 42
    cout << "pull rabbit from m:            " << m["rabbit"] << endl;             // Should read 42

    return 0;
}

Basically, for read-only operations, always operate directly on the chained ‘m.read_only’ object reference: it will make sure the const-qualified versions of the methods (including the read-only operator[]) are called.


Please let me know if it’s a bad idea or if there are some details I’ve missed!

 


Super-simplified: Programming high performance code by considering cache

  • Code/data locality (compactness, % of each cache line that gets used)

  • Predictable access patterns: prefetch-friendly (instructions and data). This explains branching costs, why linear traversal can be faster than trees at small scales (trees require pointer chasing), and why a simple sort like bubble sort can be the fastest when the chunks fit in the cache.

  • Avoid false sharing: threads/cores that unnecessarily share a cache line (because of how the data is packed) invalidate each other’s cached copy whenever any of them writes.
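The false-sharing point can be sketched with two layouts of a pair of per-thread counters. The 64-byte line size and the names kCacheLine, PackedCounters and PaddedCounters are assumptions made for this illustration; the real line size depends on the hardware.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines, common on current x86-64 parts but not
// guaranteed everywhere.
constexpr std::size_t kCacheLine = 64;

// Packed: both counters very likely share one cache line, so a thread
// writing `a` and another thread writing `b` keep invalidating each other.
struct PackedCounters {
    long a;
    long b;
};

// Padded: alignas starts each counter on its own cache-line boundary.
struct PaddedCounters {
    alignas(kCacheLine) long a;
    alignas(kCacheLine) long b;
};

static_assert(sizeof(PackedCounters) <= kCacheLine,
              "packed counters fit inside a single line");
static_assert(offsetof(PaddedCounters, b) - offsetof(PaddedCounters, a) >= kCacheLine,
              "padded counters sit at least one line apart");
```

The padded layout trades memory (two full lines instead of a few bytes) for independence between the writers, which works against the compactness point in the first bullet, so it is only worth it for data that different threads really do write concurrently.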


C Traps and Pitfalls

Here’s a concise paper describing common C programming pitfalls by Andrew Koenig (www.literateprogramming.com/ctraps.pdf), corresponding to the book with the same title.

As a reminder to myself, I’ll spend this page summarizing common mistakes and my history with them.

Here are the mistakes that I don’t make because of certain programming habits:

  • Operator precedence: I use enough parentheses to not rely on operator precedence
  • Pointer dereferencing: I always do *(p++) instead of *p++ unless it’s idiomatic.
  • for() or if() statements executing only the first line: I always surround the block with {} even if it’s just one line. Too often we need to inject an extra line and without {} it becomes a trap.
  • Undefined side effect order: I never do something like y[i++]=x[i]
  • char* p, q: this declares p as a pointer but q as a plain char. Very tempting, since C++ style emphasizes the pointer as part of the type rather than as a property of the variable. I almost never declare multiple variables in one line.
  • Macro repeating side effects: use inline functions instead whenever possible. Use templates in C++.
  • Unexpected macro associations: guard expressions with (). Use typedef.
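The two macro bullets above can be sketched as follows. The macro and function names (DOUBLE_BAD, DOUBLE_GUARDED, double_fn) are invented for this example.

```cpp
#include <cassert>

// Unparenthesized macro: the expansion is pasted in as text, so the
// surrounding operators can regroup it.
#define DOUBLE_BAD(x) x + x

// Guarded macro: parentheses around each argument and around the whole body.
#define DOUBLE_GUARDED(x) ((x) + (x))

// Inline function: evaluates its argument exactly once and obeys normal
// expression rules, so it also avoids repeated side effects.
inline int double_fn(int x) { return x + x; }
```

With these definitions, DOUBLE_BAD(3) * 2 expands to 3 + 3 * 2, which is 9 rather than 12. The guarded macro fixes the grouping but still evaluates its argument twice, so only the inline function is safe to call with something like i++.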

Did these once before, adjusted my programming habits to avoid them:

  • counting down in a for() loop with an unsigned running variable: an unsigned value is never negative, so a test like i>=0 is always true and the loop never ends. I stick with signed running variables in general. If I’m forced to use unsigned, I remind myself that a test like i>0 stops after the i=1 pass, so the body never runs for i=0.
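If an unsigned index is unavoidable, one common idiom is to fold the decrement into the loop test. The sketch below is one way to write it; countdown_indices is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visits indices n-1, n-2, ..., 0 with an unsigned loop variable.
// The test `i-- > 0` compares the old value first and decrements afterwards,
// so the body does run with i == 0. The final failed test leaves i wrapped
// around, but the loop has already exited and never uses that value.
std::vector<std::size_t> countdown_indices(std::size_t n)
{
    std::vector<std::size_t> visited;
    for (std::size_t i = n; i-- > 0; ) {
        visited.push_back(i);
    }
    return visited;
}
```

For n = 3 this visits 2, 1, 0, and for n = 0 the body never runs.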

Haven’t got a chance to run into these, but I’ll program defensively:

  • Integer overflow: do a<b instead of (a-b)<0. Calculate mean by adding halfway length to the smaller number (i.e. (a+b)/2 == a + (b-a)/2 given a<b). Shows up in binary search.
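  • Number of bits to shift must be non-negative and smaller than the bit width: a negative count is undefined behavior, and hardware often treats it as unsigned (i.e. -1 is a big number!)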
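The midpoint trick from the integer overflow bullet, written out as a small sketch. The name safe_midpoint is made up, and the function assumes non-negative array indices with lo <= hi, as in a binary search.

```cpp
#include <cassert>
#include <climits>

// Midpoint of lo and hi without forming lo + hi, which can overflow.
// Assumes 0 <= lo <= hi (array indices), so hi - lo cannot overflow either.
// Example: safe_midpoint(INT_MAX - 2, INT_MAX) is INT_MAX - 1, whereas
// (lo + hi) / 2 would overflow for the same inputs.
int safe_midpoint(int lo, int hi)
{
    assert(0 <= lo && lo <= hi);
    return lo + (hi - lo) / 2;
}
```

For odd-length ranges the result rounds down toward lo, which is what a typical binary search expects.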

What I learned from the paper:

  • stdio buffer on stack (registered with setbuf()) freed before I/O flushed: use static buffer (or just make sure the buffer lives outside the function call).
  • char type might be signed (128 to 255 become -128 to -1), so it sign-extends during upcast. Use unsigned char to guarantee zero extension when upcasting.
  • toupper()/tolower() might be implemented as a simple macro (no checks, incorrect with side effects)
  • Can index into a string literal: "abcdefg"[3] gives 'd'
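A small sketch of the sign-extension and string-literal points above. The function names are hypothetical, and the value 233 assumes 8-bit chars.

```cpp
#include <cassert>

// Widening a byte stored in a signed char: the sign bit is extended.
int widen_signed(signed char c) { return c; }

// Going through unsigned char first zero-extends instead
// (assumes 8-bit chars, so -23 becomes 233).
int widen_unsigned(signed char c) { return static_cast<unsigned char>(c); }

// A string literal is an array, so it can be indexed directly.
constexpr char fourth_letter = "abcdefg"[3];
```

The same byte therefore widens to -23 or to 233 depending only on the signedness of the char type it passes through, which matters when the result is used as a table index.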

Mistakes that I usually make when I switch back from full-time MATLAB programming:

  • Logical negation using ~ operator instead of ! operator.
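To see why the MATLAB habit bites, here is a minimal sketch comparing the two operators on an int flag. The function names are made up for the example, and the concrete values assume two’s-complement integers.

```cpp
#include <cassert>

// Logical negation: any nonzero value becomes 0, and 0 becomes 1.
int logical_not(int flag) { return !flag; }

// Bitwise complement: flips every bit, so ~1 is -2 on two's-complement
// machines, which still counts as "true" inside an if().
int bitwise_not(int flag) { return ~flag; }
```

So if (~flag) takes the branch for both flag = 0 and flag = 1, and the compiler accepts it without complaint.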

Common mistakes I rarely make because of certain understanding:

  • Forgetting to break at every case in a switch block. It’s hard to forget once you’re aware of Duff’s device.
  • sizeof(a)/sizeof(a[0]) after passing the array into a function does not give the array length: hard to get it wrong once you understand that an array parameter decays to a pointer, so the size information known in the scope where the array was declared does not travel with it.
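The array-length point can be sketched in C++ as follows. Both function names are hypothetical; the template version is one common way to keep the length, not something taken from the paper.

```cpp
#include <cassert>
#include <cstddef>

// Taking the array by reference keeps its length N in the type.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }

// Taking a pointer (which is what `int a[]` in a parameter list means)
// loses the length: sizeof now measures the pointer itself.
std::size_t size_seen_through_pointer(const int* p) { return sizeof(p); }
```

Inside the scope where the array is declared, sizeof(a)/sizeof(a[0]) still works; it is only after the decay to a pointer that the length is gone.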


Switch between 32-bit and 64-bit user written software like CVX

CVX is a very convenient convex optimization package that allows the user to specify the optimization objective and constraints directly instead of manually manipulating them (by various transformations) into forms that are accepted by commonly available software like quadprog().

What I want to show today is not CVX, but a technique to handle the many different versions of the same program targeted at each system architecture (32/64-bit, Windows/Mac/Linux). Here’s a snapshot of what’s available with cvx:

OS        32/64     mexext       Download links
Linux     32-bit    mexglx       cvx-glx.zip
Linux     64-bit    mexa64       cvx-a64.zip
Mac       32-bit    mexmaci      cvx-maci.zip
Mac       64-bit    mexmaci64    cvx-maci64.zip
Windows   32-bit    mexw32       cvx-w32.zip
Windows   64-bit    mexw64       cvx-w64.zip

You can download the packages for all architectures and make a folder for each of them named after its mexext() value. For example, the 32-bit Windows implementation can go under /mexw32/cvx. Then you can programmatically initialize the right package from, say, your startup.m file:

run( fullfile(yourLibrary, mexext(), 'cvx', 'cvx_startup.m') );

I intentionally put the /[mexext()] above /cvx, not the other way round because if you have many different software packages and want to include them in the path, you can do it in one shot without filtering for the platform names:

addpath( genpath( fullfile(yourLibrary, mexext()) ) );

You can consider using computer('arch') in place of mexext(), but the names are different and you have to name your folders accordingly. For CVX, the packages happen to be named by mexext(), so I naturally used mexext() instead.
