Understanding the difference between recognized arrays and pointers 'Recognized' means sizeof(array_name) gives the underlying allocated size

array≠ pointer:

pointer only contains a memory location,
while an array already have memory allocated to hold the data.


The confusion comes from the fact that array names are always seen as pointers anywhere in C, but when an array name is referred in places that the scope happens know the allocated size, namely

  • Global arrays: everybody knows the size
  • Local arrays: only the instantiating function knows its size.

, the array name itself has a superpower that pointers lack: report the underlying allocated data size (NOT pointer size) using sizeof(array).


Definition: An array is ‘recognized‘ if the array name is used in the scope that knows the underlying data size.

Corollary: Calling the array name with sizeof() gives the underlying allocated data size.

Examples of consequences that can be derived from the definition above:

  • Heap allocations always return a pointer type, NOT an array name!
    So heap arrays are never recognized.
  • VLA in C99 are considered local stack arrays, so it’s recognized
  • x[] is just a cosmetic shorthand for *x: it doesn’t prevent any recognized array from decaying into a pointer across boundary.
  • The storage duration (static or not) does not matter. e.g.
    • Heap pointers at global level are not recognized arrays
    • Static local array still loses the recognition across function boundaries
      (unless passed carefully by data type T (&array)[N]).

Most often recognized arrays cannot be aliased without decaying into a pointer. However, we can bind a recognized array to a reference to an array, which is a completely different type. Example:

int v[]{1,2,3,4};
int (&w)[4]=v;  // w is a reference to an array of size 4

int* p = v;     // Decays v to a pointer. Size information lost.
// int &w[4]=v; // Does not compile: this means an array of 4 references.

Note that the syntax requires a bracket for reference name. Omitting it will lead the compiler to misinterpret it as an array of references, which cannot* be compiled.

This means contrary to common beliefs, you can pass a recognized array across functions through reference, but this is rarely done because of the hassle of explicitly entering the number of elements (4 for the example above) as part of the data type. This can still be done through templates/constexpr, but for such inconvenience, we’re better off using std::vector (or std::array if you want near zero overhead).

However, so far I haven’t found a way to re-recognize an array from a pointer. That means there is no way to keep a local array’s recognition across function boundaries in C since it does not have references like C++.


To summarize with a usage example: this post has described the entire logic needed to decide whether sizeof(x)/sizeof(x[0]) gives you the number of array elements, or how many times your machine pointer type is bigger than the element storage.


* references must be bound on creation. Declaring an array of references means you want to bound references in batches. There are no mechanisms to do so as of C++14.

 

159 total views, no views today

Do not help the compiler at the expense of readability Unless you read the assembly code emitted at the bottleneck and did benchmarks

Compilers has gotten smarter and smarter nowadays that they’d be able to analyze our code for common patterns (or logically deduce away steps that doesn’t have to performed at runtime).

Matt Godbolt gave a nice presentation at CppCon 2017 named “What Has My Compiler Done for Me Lately?”. Through observing the emitted assembly code at different optimization levels, he showed that the compiler doesn’t need to be micromanaged (through performance hacks in our code) anymore, as it will emit instructions as the performance-hacked code intended when it is better to do so.

It means the compiler writers already know our bag of performance hack tricks way much better than we do. Their efforts spare us from premature optimization and leave us more time to find a better data structure or algorithm to solve the problem.

What I got from the lecture is NOT that we are free to write clumsy code and let the compiler sort it out (though it occasionally can, like factoring a loop doing simple arithmetic series into a one line closed form solution), but we don’t have to make difficult coding choices to accommodate performance anymore.

The most striking facts I learned from his lecture are

  • The compiler can emit a one-line CPU instruction that does not have a corresponding native operation C/C++ if your hardware architecture supports it. (e.g. clang can convert a whole loop that counts the number of set bits into just ‘popcnt eax, edi‘)
  • Through Link-Time Optimization (LTO), we don’t have to pay the performance penalty for language features that are ultimately necessary for the current compilation (e.g. virtuals are automatically dropped if the linker finds that nowhere in the output currently needs it)

With such LTO,  why not do away the virtual specifier and make everything unspecified virtual by default anyway (like Java)? For decades, we’ve been making up stories that some classes are not meant to be derived (like STL containers), but the underlying motive is that we don’t want to pay for vtable if we don’t have to.

Instead of confusing new programmers about when should they make a method virtual (plenty of rule-of-thumbs became dogma), focus on telling them whenever they (choose to upcast a reference/pointer to the parent anywhere in their code and) invoke the destructor through the parent reference/pointer, they will pay a ‘hefty’ price of vtable and vptr.

I don’t think anybody (old codebase) will get harmed by turning on virtuals by default and let the linker decide if those virtuals can be dropped. If it changes anything, it might turn buggy code with the wrong destructor called into correct code which runs slower and takes up more space. In terms of correctness, this change might break low-level hacks that expects the objects to be of certain size (e.g. alignment) without vptr.

Even better, add a class specifier that mandates that all uses of its child must not invoke vtable (have the compiler catch that) unless explicitly overridden (the users decide to pay for the vtable). This way the compiler can warn about performance and space issues for the migration.

The old C++’s ideal was “you only pay for the language features you used (written)”, but as compilers gets better, we might be able change it to “you pay extra only for the language features that are actually used (in the finally generated executable) with your permission”.


I’d also like to add Return Value Optimization (RVO) into my list of compiler advances that changes the way we code. C++11 added move semantics, but I think it’s something that the compiler in the future could be able to manage themselves. Even with an old C++ compiler like the one shipped with VisualDSP 5.0, the copy constructor was not called (yes, skipping it is legal even if the copy constructor has side effects) when I do this:

Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix c(a.dim);
  // ... for all element i, c.raw[i] = a.raw[i]+b.raw[i]
  return c;
}
Matrix c = a + b;

Actually, the compiler at that time was not that smart about RVO, the actual code I wrote originally had two return branches, which defeats RVO (it’s a defined behavior by the specs):

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
    return c;
  } 
  else 
  {
    return Matrix::dummy; // A static member, which is a Matrix object
  }
}

To take advantage of RVO, I had to reword my code

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
  } 
  else 
  {
    Matrix c = Matrix::dummy; // or just "Matrix c";
  }
  return c;
}

I think days are counting before C++ compilers can do “copy-on-write” like MATLAB does if independent compilation are no longer mandatory!

Given my extensive experience with MATLAB, I’d say it took me a while to get used designing my code with “copy-on write” behavior in mind. Always start with expressive, maintainable, readable and correct code keeping in mind the performance concerns only happens under certain conditions (i.e. passed object gets modified inside the function).

If people start embracing the mentality of letting the compiler do most of the mechanical optimization, we’ll move towards a world that debugging work are gradually displaced by performance-bottleneck hunting. In my view, anything that can be done systematically by programming (like a boilerplate code or idioms) can eventually be automated by better compiler/linker/IDE and language design. It’s the high-level business logic that needs a lot of software designers/engineers to translate fuzzy requirements into concrete steps.


Matt also developed a great website (http://godbolt.org/) that compiles your code repeatedly on the fly and shows you the corresponding assembly code. Here’s an example of how I use it to answer my question of “Should I bother to use std::div() if I want both the quotient and remainder without running the division twice?”:

The website also included a feature to share the pasted code through an URL.

As seen from the emitted assembly code, the answer is NO. The compiler can figure out that I’m repeating the division twice and do only one division and use the quotient (stored in eax) and remainder (stored in edx). Trying to enforce one division through std::div() requires an extra function call, which is strictly worse.

The bottom line: don’t help the compiler! Modern compiler does context free optimizations better than we do. Use the time and energy to rethink about the architecture and data structure instead!

149 total views, no views today

C Traps and Pitfalls

Here’s a concise paper describing common C programming pitfalls by Andrew Koening (www.literateprogramming.com/ctraps.pdf) corresponding to be book with the same title.

As a reminder to myself, I’ll spend this page summarizing common mistakes and my history with it.

Here are the mistakes that I don’t make because of certain programming habits:

  • Operator precedence: I use enough parenthesis to not rely on operator precedence
  • Pointer dereferencing: I always do *(p++) instead of *p++ unless it’s idiomatic.
  • for() or if() statements executing only first line: I always surround the block with {} even if it’s just one line. Too often we need to inject an extra line and without {} it becomes a trap.
  • Undefined side effect order: I never do something like y[i++]=x[i]
  • char* p, q: very tempting since C++ style emphasize on pointer as a type over whether the variable is a pointer. I almost never declare multiple variables in one line.
  • Macro repeating side effects: use inline functions instead whenever possible. Use templates in C++.
  • Unexpected macro associations: guard expressions with (). Use typedef.

Did these once before, adjusted my programming habits to avoid it:

  • counting down in for() loop with unsigned running variable: I stick with signed running variables in general. If I’m forced to use unsigned, I’ll remind myself that I can only stop AFTER hitting 1, but not 0 (i.e. i=0 never got executed).

Haven’t got a chance to run into these, but I’ll program defensively:

  • Integer overflow: do a<b instead of (a-b)<0. Calculate mean by adding halfway length to the smaller number (i.e. (a+b)/2 == a + (b-a)/2 given a<b). Shows up in binary search.
  • Number of bits to shift is always unsigned (i.e. -1 is a big number!)

What I learned from the paper:

  • stdio buffer on stack (registered with setbuf()) freed before I/O flushed: use static buffer (or just make sure the buffer lives outside the function call).
  • char type might be signed (128 to 255 are -128 to -1) so it sign extends during upcast. Use unsigned char go guarantee zero extend for upcasting.
  • toupper()/tolower() might be implemented as a simple macro (no checks, incorrect /w side effects)
  • Can index into a string literal: "abcdefg"[3] gives 'd'

Mistakes that I usually make when I switch back from full-time MATLAB programming:

  • Logical negation using ~ operator instead of ! operator.

Common mistakes I rarely make because of certain understanding:

  • Forgetting to break at every case in switch block. It’s hard to forget once you’re aware of the Duff’s device.
  • sizeof(a[])/sizeof(a[0]) after passing into a function does not give array length: hard to get it wrong once you understand that array (declared on stack) has meta-info that cannot be accessed beyond the stack level it’s initialized.

231 total views, no views today