begin() and end() are defined for arrays in C++11 and above

I was a little confused the first time I saw the range-based for-loop in C++11 in “Tour of C++”: it works right out of the box for both arrays* and STL containers, yet the text says it requires begin() and end() to be defined for the object for the range-based for-loop to run.

I later learned that although typical STL usage examples write v.begin() and v.end(), the most bulletproof way is to write begin(v) and end(v) instead (Herb Sutter recommends it). That made me suspect that C++11 must have defined free (non-member) begin() and end() functions that take arbitrary arrays. I pulled up my code editor and ran this:

#include <iostream>
#include <iterator> // home of the non-member begin()/end() family
int main()
{
    int v[4]={1,2,3,4};
    std::cout << *(std::crend(v)-1) << std::endl;

    return 0;
}

It compiled and ran uneventfully, printing ‘1’ as expected (I’m using crend() to make sure they implemented the more obscure ones; strictly speaking, the non-member crend() only arrived in C++14, while begin() and end() are C++11). It makes more sense now why the range-based for-loop works for arbitrary arrays without an exception to the begin()/end() requirement.

To confirm that this is the case (since “Tour of C++” didn’t say anything about why arbitrary arrays work with the range-based for-loop), I looked up the STL source code from LLVM, namely <iterator>, and saw this:

template <class T, size_t N> constexpr T* begin(T (&array)[N]);

Bingo! There’s a mechanism to do so. Now I’ll need to learn more about template argument deduction rules to figure out how the compiler recognizes the N and the syntax that distinguishes arrays from pointers. I’m sure the compiler treats them differently, as this won’t compile:

#include <iostream>
#include <iterator>
int main()
{
    int v[4]={1,2,3,4};
    int* p = v;

    std::cout << *(std::crend(p)-1) << std::endl; // Cannot compile
    return 0;
}

The error message is “no matching function for call to ‘crend(int*&)'”, so clearly the free template function does not accept pointers in place of arrays!
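To see the deduction in action, here’s a minimal sketch (length() is my own toy function, not anything from the standard library) using the same reference-to-array parameter:

#include <cstddef>
#include <iostream>

// T (&)[N] binds only to real arrays, never to pointers,
// and lets the compiler deduce N from the array's type.
template <class T, std::size_t N>
constexpr std::size_t length(T (&)[N]) { return N; }

int main()
{
    int v[4] = {1, 2, 3, 4};
    std::cout << length(v) << std::endl; // prints 4
    // int* p = v; length(p); // would not compile, just like crend(p) above
    return 0;
}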

But before I close: Stephan T. Lavavej (Mr. STL) mentioned that the template quoted above is no longer required (by the standard) to implement the range-based for-loop in C++11.

So the conclusion becomes: begin() and end() overloads that take arrays do exist (which completes the logic behind the range-based for-loop), but the range-based for-loop can (and typically will) handle arrays without these templated functions being defined.
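In other words, for an array the expansion is handled by the core language itself. Roughly (a sketch of the idea, not the standard’s exact wording):

int v[4] = {1, 2, 3, 4};
for (int x : v) { /* use x */ }

// ...is treated for arrays approximately as:
for (int* p = v; p != v + 4; ++p)
{
    int x = *p; /* use x */
}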


* In order for the observations in this blog post to make sense, I have to be very picky about the difference between an array and a pointer: a pointer only contains a memory location, while an array already has memory allocated to hold the data.

The confusion comes from the fact that array names decay to pointers almost anywhere in C. But when an array name is referred to in a scope that happens to know the allocated size, namely

  • Global arrays: everybody knows the size
  • Local arrays: only the instantiating function knows their size
    • regardless of storage duration, i.e. local static arrays behave the same as stack arrays
    • VLAs in C99 are considered stack arrays

the array name itself has a superpower that pointers lack: reporting the allocated data size (NOT the pointer size) through sizeof(array).

Heap allocations always return a pointer, so the variable name is never an array name in the first place: there’s simply no native place to embed the allocated size info.

This is the entire logic needed to decide whether sizeof(x)/sizeof(x[0]) gives you the number of array elements, or merely how many times bigger your machine’s pointer type is than the element storage.
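A short demonstration of this logic (takes_pointer() is just an illustrative name):

#include <cstdio>

void takes_pointer(int* p)
{
    // Only the pointer survives the call, so this is
    // sizeof(int*)/sizeof(int), NOT the element count.
    printf("%zu\n", sizeof(p) / sizeof(p[0]));
}

int main()
{
    int a[4] = {1, 2, 3, 4};
    printf("%zu\n", sizeof(a) / sizeof(a[0])); // 4: this scope knows the size
    takes_pointer(a);                          // 'a' decays to int* at the call
    return 0;
}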

 


Do not help the compiler at the expense of readability, unless you have read the assembly emitted at the bottleneck and done benchmarks

Compilers have gotten so much smarter nowadays that they can analyze our code for common patterns (or logically deduce away steps that don’t have to be performed at runtime).

Matt Godbolt gave a nice presentation at CppCon 2017 titled “What Has My Compiler Done for Me Lately?”. By observing the emitted assembly code at different optimization levels, he showed that the compiler no longer needs to be micromanaged (through performance hacks in our code): it will emit the instructions the performance-hacked code intended whenever it is better to do so.

It means the compiler writers already know our bag of performance-hack tricks far better than we do. Their efforts spare us from premature optimization and leave us more time to find a better data structure or algorithm to solve the problem.

What I got from the lecture is NOT that we are free to write clumsy code and let the compiler sort it out (though it occasionally can, like factoring a loop computing a simple arithmetic series into a one-line closed-form solution), but that we no longer have to make difficult coding choices to accommodate performance.

The most striking facts I learned from his lecture are

  • The compiler can emit a single CPU instruction that has no corresponding native operation in C/C++ if your hardware architecture supports it (e.g. clang can convert a whole loop that counts the number of set bits into just ‘popcnt eax, edi‘; see the sketch after this list)
  • Through Link-Time Optimization (LTO), we don’t have to pay the performance penalty for language features that turn out to be unnecessary in the current compilation (e.g. virtuals are automatically dropped if the linker finds that nothing in the output currently needs them)
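Here is a sketch of the bit-counting case (my reconstruction of the idiom, not Matt’s exact slide): compiled with clang at -O2 with popcnt available, the whole loop can collapse into a single popcnt instruction.

unsigned count_set_bits(unsigned x)
{
    unsigned n = 0;
    while (x)
    {
        n += x & 1u; // add the lowest bit
        x >>= 1;     // then shift it out
    }
    return n;
}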

With such LTO, why not do away with the virtual specifier and make everything virtual by default (like Java)? For decades, we’ve been making up stories that some classes are not meant to be derived from (like STL containers), but the underlying motive is that we don’t want to pay for the vtable if we don’t have to.

Instead of confusing new programmers about when they should make a method virtual (plenty of rules of thumb became dogma), focus on telling them that whenever they (choose to upcast a reference/pointer to the parent anywhere in their code and) invoke the destructor through the parent reference/pointer, they will pay the ‘hefty’ price of a vtable and vptr.

I don’t think anybody (or any old codebase) would be harmed by turning on virtuals by default and letting the linker decide whether those virtuals can be dropped. If it changes anything, it might turn buggy code that called the wrong destructor into correct code that runs slower and takes up more space. In terms of correctness, this change might break low-level hacks that expect the objects to be of a certain size (e.g. alignment) without a vptr.

Even better, add a class specifier that mandates that no use of its children may invoke the vtable (and have the compiler enforce it) unless explicitly overridden (the user decides to pay for the vtable). This way the compiler can warn about the performance and space issues during the migration.

The old C++ ideal was “you only pay for the language features you used (wrote)”, but as compilers get better, we might be able to change it to “you pay extra only for the language features that are actually used (in the final generated executable), with your permission”.


I’d also like to add Return Value Optimization (RVO) to my list of compiler advances that change the way we code. C++11 added move semantics, but I think it’s something that compilers in the future could manage themselves. Even with an old C++ compiler like the one shipped with VisualDSP 5.0, the copy constructor was not called (yes, skipping it is legal even if the copy constructor has side effects) when I did this:

Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix c(a.dims);
  // ... for all elements i, c.raw[i] = a.raw[i]+b.raw[i]
  return c;
}
Matrix c = a + b;

Actually, the compiler at that time was not that smart about RVO: the code I originally wrote had two return branches, which defeats RVO (behavior the spec allows, since the elision is optional):

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
    return c;
  } 
  else 
  {
    return Matrix::dummy; // A static member, which is a Matrix object
  }
}

To take advantage of RVO, I had to reword my code:

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  Matrix c(m); // Construct the single named return object up front
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
  }
  else
  {
    c = Matrix::dummy; // Fall back to the static dummy Matrix
  }
  return c; // One named object, one return statement: RVO applies
}

I think the days are numbered before C++ compilers can do “copy-on-write” like MATLAB does, if independent compilation is no longer mandatory!

Given my extensive experience with MATLAB, I’d say it took me a while to get used to designing my code with “copy-on-write” behavior in mind. Always start with expressive, maintainable, readable and correct code, keeping in mind that the performance concern only arises under certain conditions (i.e. when the passed object gets modified inside the function).

If people start embracing the mentality of letting the compiler do most of the mechanical optimization, we’ll move towards a world where debugging work is gradually displaced by performance-bottleneck hunting. In my view, anything that can be done systematically by a program (like boilerplate code or idioms) can eventually be automated by better compiler/linker/IDE and language design. It’s the high-level business logic that needs software designers/engineers to translate fuzzy requirements into concrete steps.


Matt also developed a great website (http://godbolt.org/) that recompiles your code on the fly and shows you the corresponding assembly. Here’s an example of how I used it to answer my question “Should I bother to use std::div() if I want both the quotient and remainder without running the division twice?”:

The website also includes a feature to share the pasted code through a URL.

As seen from the emitted assembly, the compiler can figure out that I’m repeating the division, emit only one division, and use both the quotient (stored in eax) and the remainder (stored in edx). Trying to enforce one division through std::div() requires an extra function call, which is strictly worse.
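Here’s the gist of what I pasted (a minimal reconstruction; quot_rem() is my own name):

void quot_rem(int a, int b, int* q, int* r)
{
    *q = a / b; // at -O2 one x86 division instruction serves both:
    *r = a % b; // quotient lands in eax, remainder in edx
}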

The bottom line: don’t help the compiler! Modern compilers do context-free optimizations better than we do. Use the time and energy to rethink the architecture and data structures instead!


C Traps and Pitfalls

Here’s a concise paper by Andrew Koenig describing common C programming pitfalls (www.literateprogramming.com/ctraps.pdf), corresponding to the book of the same title.

As a reminder to myself, I’ll spend this page summarizing the common mistakes and my history with them.

Here are the mistakes that I don’t make because of certain programming habits:

  • Operator precedence: I use enough parentheses to avoid relying on operator precedence
  • Pointer dereferencing: I always write *(p++) instead of *p++ unless it’s idiomatic.
  • for() or if() statements executing only the first line: I always surround the block with {} even if it’s just one line. Too often we need to inject an extra line, and without {} it becomes a trap.
  • Undefined side-effect order: I never write something like y[i++]=x[i]
  • char* p, q: declares q as a char, not a pointer. Very tempting, since C++ style attaches the * to the type rather than the variable. I almost never declare multiple variables on one line.
  • Macros repeating side effects: use inline functions instead whenever possible. Use templates in C++. (See the sketch after this list.)
  • Unexpected macro associations: guard expressions with (). Use typedef.
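A small sketch of the two macro pitfalls above (the SQUARE names are mine):

#define SQUARE_BAD(x) x * x   // unguarded: SQUARE_BAD(a + 1) expands to a + 1 * a + 1
#define SQUARE(x) ((x) * (x)) // guarded, but still repeats side effects:
                              // SQUARE(i++) increments i twice
inline int square(int x) { return x * x; } // an inline function has neither problem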

Did these once before, then adjusted my programming habits to avoid them:

  • Counting down in a for() loop with an unsigned running variable: I stick with signed running variables in general. If I’m forced to use unsigned, I remind myself that the loop must stop AFTER the body runs for i==1, not i==0 (i.e. the i=0 iteration never executes unless I index with i-1), as in the sketch below.
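A minimal illustration of the trap and the pattern I use:

#include <cstdio>

int main()
{
    unsigned a[4] = {10, 20, 30, 40};
    // Trap: 'for (unsigned i = 3; i >= 0; --i)' never terminates, because
    // i >= 0 is always true for an unsigned i (0 - 1 wraps around).
    // Safe pattern: test i > 0 and index with i - 1.
    for (unsigned i = 4; i > 0; --i)
    {
        printf("%u\n", a[i - 1]); // visits indices 3, 2, 1, 0
    }
    return 0;
}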

Haven’t got a chance to run into these, but I’ll program defensively:

  • Integer overflow: write a<b instead of (a-b)<0. Calculate the mean by adding half the distance to the smaller number (i.e. (a+b)/2 == a + (b-a)/2 given a<b). Shows up in binary search; see the sketch after this list.
  • The number of digits to shift is always treated as unsigned (i.e. -1 is a big number!)
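The binary-search midpoint is the classic instance (a minimal sketch):

#include <climits>
#include <cstdio>

int main()
{
    int lo = INT_MAX - 1, hi = INT_MAX;
    // (lo + hi) / 2 would overflow here;
    // lo + (hi - lo) / 2 cannot, given lo <= hi.
    printf("%d\n", lo + (hi - lo) / 2); // prints 2147483646 (INT_MAX - 1)
    return 0;
}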

What I learned from the paper:

  • An stdio buffer on the stack (registered with setbuf()) can be freed before the I/O is flushed: use a static buffer (or just make sure the buffer outlives the function call).
  • The char type might be signed (128 to 255 read as -128 to -1), so it sign extends during integer promotion. Use unsigned char to guarantee zero extension; see the sketch after this list.
  • toupper()/tolower() might be implemented as a simple macro (no checks, incorrect with side effects)
  • You can index into a string literal: “abcdefg”[3] gives ‘d’
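A quick sketch of the signed-char and string-literal points above (what plain char does depends on the platform):

#include <cstdio>

int main()
{
    char c = '\xFF';          // on a signed-char platform this holds -1, not 255
    unsigned char u = '\xFF'; // always 255
    printf("%d %d\n", (int)c, (int)u); // sign extension vs zero extension
    printf("%c\n", "abcdefg"[3]);      // indexing a string literal: prints 'd'
    return 0;
}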

Mistakes that I usually make when I switch back from full-time MATLAB programming:

  • Logical negation using ~ operator instead of ! operator.

Common mistakes I rarely make because of certain understanding:

  • Forgetting to break at every case in a switch block. It’s hard to forget once you’re aware of Duff’s device.
  • sizeof(a)/sizeof(a[0]) after passing the array into a function does not give the array length: hard to get wrong once you understand that an array’s size is compile-time information tied to the scope where the array is declared, and it cannot be recovered once the array decays to a pointer across a function call.
