Do not help the compiler at the expense of readability Unless you read the assembly code emitted at the bottleneck and did benchmarks

Compilers has gotten smarter and smarter nowadays that they’d be able to analyze our code for common patterns (or logically deduce away steps that doesn’t have to be  performed at runtime).

Matt Godbolt gave a nice presentation at CppCon 2017 named “What Has My Compiler Done for Me Lately?”. Through observing the emitted assembly code at different optimization levels, he showed that the compiler doesn’t need to be micromanaged (through performance hacks in our code) anymore, as it will emit instructions as the performance-hacked code intended when it is better to do so.

It means the compiler writers already know our bag of performance hack tricks way better than we do. Their efforts spared us from premature optimization and leave us more time to find a better data structure or algorithm to solve the problem.

What I got from the lecture is NOT that we are free to write clumsy code and let the compiler sort it out (though it occasionally can, like factoring a loop doing simple arithmetic series into a one line closed form solution), but we don’t have to make difficult coding choices to accommodate performance anymore.

The most striking facts I learned from his lecture are

  • The compiler can emit a one-line CPU instruction that does not have a corresponding native operation C/C++ if your hardware architecture supports it. (e.g. clang can convert a whole loop that counts the number of set bits into just ‘popcnt eax, edi‘)
  • Through Link-Time Optimization (LTO), we don’t have to pay the performance penalty for language features that are ultimately necessary for the current compilation (e.g. virtuals are automatically dropped if the linker finds that nowhere in the output currently needs it)

With such LTO,  why not do away the virtual specifier and make everything unspecified virtual by default anyway (like Java)? For decades, we’ve been making up stories that some classes are not meant to be derived (like STL containers), but the underlying motive is that we don’t want to pay for vtable if we don’t have to.

Instead of confusing new programmers about when should they make a method virtual (plenty of rule-of-thumbs became dogma), focus on telling them whenever they (choose to upcast a reference/pointer to the parent anywhere in their code and) invoke the destructor through the parent reference/pointer, they will pay a ‘hefty’ price of vtable and vptr.

I don’t think anybody (old codebase) will get harmed by turning on virtuals by default and let the linker decide if those virtuals can be dropped. If it changes anything, it might turn buggy code with the wrong destructor called into correct code which runs slower and takes up more space. In terms of correctness, this change might break low-level hacks that expects the objects to be of certain size (e.g. alignment) without vptr.

Even better, add a class specifier that mandates that all uses of its child must not invoke vtable (have the compiler catch that) unless explicitly overridden (the users decide to pay for the vtable). This way the compiler can warn about performance and space issues for the migration.

The old C++’s ideal was “you only pay for the language features you used (written)”, but as compilers gets better, we might be able change it to “you pay extra only for the language features that are actually used (in the finally generated executable) with your permission”.


I’d also like to add Return Value Optimization (RVO) into my list of compiler advances that changes the way we code. C++11 added move semantics, but I think it’s something that the compiler in the future could be able to manage themselves. Even with an old C++ compiler like the one shipped with VisualDSP 5.0, the copy constructor was not called (yes, skipping it is legal even if the copy constructor has side effects) when I do this:

Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix c(a.dim);
  // ... for all element i, c.raw[i] = a.raw[i]+b.raw[i]
  return c;
}
Matrix c = a + b;

Actually, the compiler at that time was not that smart about RVO, the actual code I wrote originally had two return branches, which defeats RVO (it’s a defined behavior by the specs):

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
    return c;
  } 
  else 
  {
    return Matrix::dummy; // A static member, which is a Matrix object
  }
}

To take advantage of RVO, I had to reword my code

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
  } 
  else 
  {
    Matrix c = Matrix::dummy; // or just "Matrix c";
  }
  return c;
}

I think days are counting before C++ compilers can do “copy-on-write” like MATLAB does if independent compilation are no longer mandatory!

Given my extensive experience with MATLAB, I’d say it took me a while to get used designing my code with “copy-on write” behavior in mind. Always start with expressive, maintainable, readable and correct code keeping in mind the performance concerns only happens under certain conditions (i.e. passed object gets modified inside the function).

If people start embracing the mentality of letting the compiler do most of the mechanical optimization, we’ll move towards a world that debugging work are gradually displaced by performance-bottleneck hunting. In my view, anything that can be done systematically by programming (like a boilerplate code or idioms) can eventually be automated by better compiler/linker/IDE and language design. It’s the high-level business logic that needs a lot of software designers/engineers to translate fuzzy requirements into concrete steps.


Matt also developed a great website (http://godbolt.org/) that compiles your code repeatedly on the fly and shows you the corresponding assembly code. Here’s an example of how I use it to answer my question of “Should I bother to use std::div() if I want both the quotient and remainder without running the division twice?”:

The website also included a feature to share the pasted code through an URL.

As seen from the emitted assembly code, the answer is NO. The compiler can figure out that I’m repeating the division twice and do only one division and use the quotient (stored in eax) and remainder (stored in edx). Trying to enforce one division through std::div() requires an extra function call, which is strictly worse.

The bottom line: don’t help the compiler! Modern compiler does context free optimizations better than we do. Use the time and energy to rethink about the architecture and data structure instead!

Loading

C++ annoyances (and reliefs): operator[] in STL map-based containers

I recently watched Louis Brandy’s CppCon presentation “Curiously Recurring C++ Bugs at Facebook” on youtube.

For bug#2, which is a well-known trap for STL map-based containers, operator[] will insert the requested key (associated with a default-constructed value) if it is not found. 

He mentioned a few workarounds and their disadvantages, like

  • use at() method: requires exception handling
  • const protect: noobs try to defeat that, transferred to non-const (stripped)
  • ban operator[] calls: makes the code ugly

but would like to see something neater. In bug#3, he added that a very common usage is to return a default when the key is not found. The normal approach requires returning a copy of the default (expensive if it’s large), which tempts noobs to return a local reference (to destroyed temporary variables: guaranteed bug).


Considering how much productivity drain a clumsy interface can cause, I think it’s worth spending a few hours of my time approaching it, since I might need to use STL map-based containers myself someday.

Here’s my thought process for the design choices:

  • Retain the complete STL interface to minimize user code/documentation changes
  • Endow a STL map-based container with a default_value (common use case), so that the new operator[] can return a reference without worrying about temporaries getting destroyed.
  • Give users a easy read-only access interface (make intentions clear with little typing)

The code (with detailed comment about design decisions and test cases) can be downloaded here: MapWithDefault. For the experienced, here’s the meat:

#include <unordered_map>
#include <map>

#include <utility>  // std::forward

// Legend (for extremely simple generic functions)
// ===============================================
// K: key
// V: value
// C: container
// B: base (class)
template <typename K, typename V, template <typename ... Args> class C = std::map, typename B = C<K,V> >
class MapWithDefault : private B 
{
public:
    // Make default_value mandatory. Everything else follows the requested STL container
    template<typename... Args>
    MapWithDefault(V default_value, Args&& ... args) : B(std::forward<Args>(args)...), default_value(default_value) {};

public:
    using B::operator=;
    using B::get_allocator;

    using B::at;

    using B::operator[];

    // Read-only map (const object) uses only read-only operator[]
    const V& operator[](const K& key) const
    {
        auto it = this->find(key);
        return (it==this->end()) ? default_value : it->second;
    }

    using B::begin;
    using B::cbegin;
    using B::end;
    using B::cend;
    using B::rbegin;
    using B::crbegin;
    using B::rend;
    using B::crend;

    using B::empty;
    using B::size;
    using B::max_size;

    using B::clear;
    using B::insert;
    // using B::insert_or_assign;   // C++17
    using B::emplace;
    using B::emplace_hint;
    using B::erase;
    using B::swap;

    using B::count;
    using B::find;
    using B::equal_range;
    using B::lower_bound;
    using B::upper_bound;

public:
    const               V default_value;
    const MapWithDefault& read_only = static_cast<MapWithDefault&>(*this);
};

Note that this is private inheritance (can go without virtual destructors since STL doesn’t have it). I have not exposed all the private members and methods back to public with the ‘using’ keyword yet, but you get the idea.


This is how I normally want the extended container to be used:

int main()
{
    MapWithDefault<string, int> m(17);  // Endowed with default of 17
    cout << "pull rabbit from m.read_only:  " << m.read_only["rabbit"] << endl;   // Should read 17

    // Demonstrates commonly unwanted behavior of inserting requested key when not found
    cout << "pull rabbit from m:            " << m["rabbit"] << endl; // Should read 0 because the key was inserted (not default anymore)

    // Won't compile: demonstrate that it's read only
    // m.read_only["rabbit"] = 42;

    // Demonstrate writing
    m["rabbit"] = 42;

    // Confirms written value
    cout << "pull rabbit from m_read_only:  " << m.read_only["rabbit"] << endl;   // Should read 42
    cout << "pull rabbit from m:            " << m["rabbit"] << endl;             // Should read 42

    return 0;
}

Basically, for read-only operations, always operate directly on the chained ‘m.read_only‘ object reference: it will make sure the const protected version of the methods (including read-only operator[]) is called.


Please let me know if it’s a bad idea or there’s some details I’ve missed!

 

Loading

MATLAB Techniques: Self-identifying (by type) methods

We all know MATLAB by default fill numbers with 0 if we haven’t specified them (such as expanding a matrix by injecting values beyond the original matrix size). Cells are default filled with {[]} even if you meant to have cellstr()  {''} across the board. Sometimes it’s not what we wanted. 0 can be legitimate value, so we want NaN to denote undefined values. Same as cellstr(), we don’t want to choke the default string processing programs because one stupid {[]} turns the entire cell array into to a non-cellstr array.

For compatibility reasons (and it’s also hard to do so), it’s not a good idea to simply modify the factory behavior. I have something similar to table() objects that requires similar expansion on arbitrary data types, but MATLAB’s defaults proves to be clumsy.

Instead of writing switch-case statements or a bunch of if statements that relies on type information like this:

function x = makeUndefined(x)
  switch class(x)
    case {'double', 'single'}
      x = NaN(size(x));
    case 'cell'
      if( iscellstr(x) )
        x = repmat({''}, size(x));
      end
    % ...
  end

I found a slick way to do it so I don’t have to keep track of it again if I need the same defaults in other places: take advantage of the fact that MATLAB selectively will load the right method depending on the first input argument(s)*:

Create a commonly named method (e.g. makeUndefined()) for the PODs and put it under the POD’s @folder (e.g. /@double/makeUndefined.m, /@cell/makeUndefined.m). The functions look something like this:

function y = makeUndefined(x)
% This function must be put under /@double
  y = NaN(size(x));
function x = makeUndefined(x)
% This function must be put under /@cell
  if( iscellstr(x) )
    x = repmat({''}, size(x));
  end

Similarly, you can make your isundefined() implementation for @double, @cell, etc, just like the factory-shipped @categorical/isundefined() corresponding to the same rules you set for makeUndefined().

Actually, the switch-case approach is analogous to the common abuses of RTTI in C++: every time a new type is added, you have to open all the methods that depends on the type info and update them, instead of having the classes implement those methods (with proper object hierarchy and overloading).

[Scott Meyers, “Effective C++”] Anytime you find yourself writing code of the form “if the object is of type T1, then do something, but if it’s of type T2, then do something else,” slap yourself


This technique is especially valuable when you and TMW (or other users) have different ideas of what an English word (e.g. empty, defined, numeric) means. Like do you consider boolean (logical) numeric? TMW says no through isnumeric().

To give you an example, I made a tool to nicely plot arbitrary features in my @table over time (the equivalent of @timetable before TMW introduced it). It only make sense if the associated dependent variable (y-axis) can be quantified, so what I meant is broader than isnumeric(): it’s isConvertibleToDouble() since I casted my dependent variables with double() in between.

Boolean (logical) and categorical variables have quantifiable levels, so double() can be applied to them, they should return TRUE for isConvertibleToDouble() despite isnumeric() returns FALSE. They have the same behavior for basic types like double(), single(), char(), cellstr(), struct(), etc.

In summary,

  1. You say what you really mean (by introducing nomenclature), NOT what it typically does
    – this is like creating another indirection like half(x) instead of directly writing x/2 or x>>1.
    – spend 90% of your time coming up with a very intuitive yet precise name. ‘Misspoke’ == Bug!
  2. The new data types self-manage through implementing methods used by your code.
    – assume nothing about input type other than the interfaces that are accessed through
    (the traditional approach knows exactly what inputs they’re going to see)
    – if you did #1 correctly, there’s no reason to foresee/prepare-for new input types
    (just implement the methods for the input data types that you want it to run for now)
    – no sweep (switch-otherwise) case to mishandle** unexpected new input data types
    (because it won’t run on an input data type until all called methods are implemented)
    – introducing new input data types won’t break the core code for existing types.
    (new input data types can only break themselves if they implemented the methods wrong)

* This is tricky business. MATLAB doesn’t have function overloading (by signature), but will look into the type/class of FIRST dominant input argument and load the appropriate classes. Usually it’s the first argument, but for non-POD classes, you can specify the dominance relationship (Inferior classes). Actually little has been said about such relationship in PODs in the official documentation.

I called support and found that there’s no dominance relationship in PODs, so it’s pretty much the first argument. That means this trick does not work if you want to overload bsxfun() for say, nominal() objects (which doesn’t have a bsxfun() implementation) keeping the same input argument order because the first argument is a function handle for both the factory and the user method. Bummer!

This is why the new ‘*_fun‘ functions I write, I always put the object to operate on as the first argument whenever possible. Gets a little bit ugly when I want to support multiple inputs like cellfun(), so I have to weight whether it’s worth the confusion for the overloading capability.

** Unless you want to torture yourself by listing all recognized types and make sure the ‘switch-otherwise‘ case fails.

Loading