Rationale Behind C++ Commandments (2) – Philosophy of C

Everything is seen as a bitstream

  • pointers are just integers to memory locations
    – [25] integer and pointers might be indistinguishable in signature resolution
  • code (CPU instructions) and addresses are treated the same way as a stream of data
    – concept of function pointers leads to lambda (functors)
    – classes came from structs containing data and function pointers (combined with namespaces)!
  • unchecked type declarations: the compiler trusts your interpretations of data
    – leads to run-time features such as overriding (virtual methods)
  • handles (pointers and references) has unrestricted power
    – [29] can const_cast it away if the handle is exposed (bad idea)

Performance-first design choice

  • do not pay performance penalty for features not used
    – static compilation and binding by default
    – unchecked type declarations (see above)
  • static compilation: the compiler tries to know everything at compile time
  • static binding by default (cheapest)
    – pay extra to use virtual methods (overriding)
    – [38] default parameter values are statically bound and not stored in vtable (i.e. overridden child method’s default values are ignored and parent’s default values are used ONLY WHEN called through up-casts)
  • inline is at the mercy of the optimizer (which can choose to emit an object if decided inlining is counter-productive). Mechanism that forces a function pointer to exist (pointing the function, virtual functions creates the pointer in vtable)

Toolchain

  1. preprocessor (parser & macros)
  2. compiler (create object files per translation unit, which is .c file in C)
    – access control (encapsulation) extends the old trick of emulating private in C++ through macros by marking functions as static (local within translation unit) in C.
  3. linker (combine object files and adjust the addresses)

Templates behaves like a combination of macros (copy-and-paste with parameters) except it’s spread across the toolchain like inline optimizations:

  • Code bloat (one copy per type combination)
  • Can only live in the header files (it’s a template, not realized code, so no object is emitted like a .cpp file)

Parsing (language design)

  • most vexing parse [Effective STL Item 6]: if something can be interpreted as a function declaration, it will be interpreted as a function declaration

Plain Old Data Types (C++ classes tried to emulate in their operator overloading behavior)

  • [15] allow (a=b)=c chaining by returning *this for operator=
  • [21] disallow rvalue assignment (a+b)=c by returning const object

Loading

Rationale Behind C++ Commandments (1) – Introduction

If you’ve programmed in (or studied) C++ long enough, like you have read Scott Meyers’s Effective C++, which is a book organized in, jokingly, commandments like ‘thou shall make destructors virtual’. There’s a lot of stuff to remember.

I’ve found an approach to make the ideas stick: by understanding the rationale behind these commandments through the lens of ‘What would you do if you were to make C++ (features) out of C?

C++ is not a language designed from scratch. A lot of quirks and oddities in C++ came straight from the philosophy and the language features naturally available in C. With the right jargons (concepts), you will find a lot of the seemingly counter-intuitive behavior ‘it ought to be like this because of (insert design choice here)‘.

This is what we are going to explore in the “Rational Behind C++ Commandments” (RBCC) blog post series which came from my notes when I was going through Scott Meyer’s book. Once you get the ideas, you should be able to come up with the rules in Effective C on your own (so you don’t have to blindly remember them).

Loading

Oversimplified: Getting rid of data in STL containers Summary of Item 9 in "Effective STL"

Unless deleting a known range of elements directly through iterators (no conditions to match), which rangeerase() method can be used directly, targeting specific key/value/predicate requires understanding of the container’s underlying data structure.

I’d like to give a summary of Item#9 in “Effective STL” by defining the names and concepts so the complicated rules can be sensibly deduced by a few basic facts.


The words ‘remove‘ and ‘erase‘ has very specific meaning for STL that are not immediately intuitive.

 Lives inTarget to matchPurpose
remove_?()<algorithm>requiredRearrange wanted elements to front
erase()containernot acceptedBlindly deleting range/position given

There is a remove() method for lists, which is an old STL naming inconsistency (they should have called it erase() like for associative containers). Treat it as a historical mistake.

The usage is easy to remember once you understand it with the right wording above:

algorithm + containercontiguouslistsassociative
remove_?(): move frontStep 1Step 1
(Use remove_?() method)
unordered*: cannot rearrange
(Use erase(key) directly)
erase(): trim tailStep 2Step 2
(Use remove_?() method)
Use after find_?()
(Use erase(key) directly)

Note that there are two steps for sequential (contiguous+lists) containers , hence the erase-remove idiom. It’s really two steps:

auto tail = remove(c.begin(), c.end(), T); 
c.erase(tail, c.end());

but they can be combined in one line since the tail is only useful at one place. Hence

c.erase( remove(c.begin(), c.end(), T), c.end() ); 

Lists provides a efficient shortcut method (see table below) since linked-lists does not need to be rearranged (just short the pointers).

one-shot methodscontiguouslistsassociative
by contentN/A: use erase-remove idiomremove(T)erase(key)
by predicateN/A: use erase-remove_if idiomremove_if(pred)N/A: Use for-loops for now
No erase_if() yet as of C++17.

Never try range-based remove_?() for associative containers. It is a data corruption trap if any attempt is made to use anything named remove on associative containers.

The trap used to be possible since <algorithms> and containers were separate, but newer C++ protects you from the trap by checking if the element you are moving is of a MoveAssignable type. Since associative containers’ keys cannot be modified (without a rescan), the elements are not move-assignable.


As for erasing through for-loops (necessary if you want to sneak in an extra step while iterating), C++11 now returns an iterator following the last erased element uniformly across all containers. This helps to preserve the running iterator that gets invalidated immediately after the erase through i=c.erase(i);


* For brevity, I twisted the term unordered here to mean that the native (implementation) data order is dependent on the data involved.

When I said ‘cannot rearrange’, I meant ‘cannot efficiently rearrange’, since there are no cheap O(1) next() or prev() traversal.

It’s a mess to simply copy one element over another (during rearrangement), leaving orphans there, and re-balance a BST or re-hash a hash-map. Nobody wants to go through such pains to remove element when there are tons of more direct ways out there.

Loading

Understanding the difference between recognized arrays and pointers 'Recognized' means sizeof(array_name) gives the underlying allocated size

array≠ pointer:

pointer only contains a memory location,
while an array already has memory allocated to hold the data.


The confusion comes from the fact that array names are always seen as pointers anywhere in C, but when an array name is referred in places that the scope happens know the allocated size, namely

  • Global arrays: everybody knows the size
  • Local arrays: only the instantiating function knows its size.

, the array name itself has a superpower that pointers lack: report the underlying allocated data size (NOT pointer size) using sizeof(array).


Definition: An array is ‘recognized‘ if the array name is used in the scope that knows the underlying data size.

Corollary: Calling the array name with sizeof() gives the underlying allocated data size.

Examples of consequences that can be derived from the definition above:

  • Heap allocations always return a pointer type, NOT an array name!
    So heap arrays are never recognized.
  • VLA in C99 are considered local stack arrays, so it’s recognized
  • x[] is just a cosmetic shorthand for *x: it doesn’t prevent any recognized array from decaying into a pointer across boundary.
  • The storage duration (static or not) does not matter. e.g.
    • Heap pointers at global level are not recognized arrays
    • Static local array still loses the recognition across function boundaries
      (unless passed carefully by data type T (&array)[N]).

Most often recognized arrays cannot be aliased without decaying into a pointer. However, we can bind a recognized array to a reference to an array, which is a completely different type. Example:

int v[]{1,2,3,4};
int (&w)[4]=v;  // w is a reference to an array of size 4

int* p = v;     // Decays v to a pointer. Size information lost.
// int &w[4]=v; // Does not compile: this means an array of 4 references.

Note that the syntax requires a bracket for reference name. Omitting it will lead the compiler to misinterpret it as an array of references, which cannot* be compiled.

This means contrary to common beliefs, you can pass a recognized array across functions through reference, but this is rarely done because of the hassle of explicitly entering the number of elements (4 for the example above) as part of the data type. This can still be done through templates/constexpr, but for such inconvenience, we’re better off using std::vector (or std::array if you want near zero overhead).

However, so far I haven’t found a way to re-recognize an array from a pointer. That means there is no way to keep a local array’s recognition across function boundaries in C since it does not have references like C++.


To summarize with a usage example: this post has described the entire logic needed to decide whether sizeof(x)/sizeof(x[0]) gives you the number of array elements, or how many times your machine pointer type is bigger than the element storage.


* references must be bound on creation. Declaring an array of references means you want to bound references in batches. There are no mechanisms to do so as of C++14.

 

Loading

begin() and end() is defined for arrays in C++11 and above

I was a little confused the first time I saw range-based for-loop in C++11 on “Tour of C++” that it works right out of the box for both recognized array and STL containers and yet the text says it requires begin() and end() to be defined for the object for range-based for-loop to run.

I later learned that despite typical STL usage example writes v.begin(), v.end(), the most bulletproof way is to write begin(v), end(v) instead (Herb Sutter recommends it). Then I started to suspect that C++11 must have defined free-form (non-member) begin(), end() functions that takes in arbitrary recognized arrays. I pulled up my code editor and ran this:

#include <iostream>
int main()
{
    int v[4]={1,2,3,4};
    std::cout << *(std::crend(v)-1) << std::endl;

    return 0;
}

It compiled and ran uneventfully, printing ‘1’ as expected (I’m using crend(), to see if they implemented the more obscure ones). It makes more sense now why range-based for-loop works for arbitrary recognized arrays without making an exception to the begin(), end() requirement.

To confirm that it is the case (since “Tour of C++” didn’t say anything about why arbitrary array works for range-based for loop), I looked up the STL source code from libc++ in LLVM, namely <iterator>, and saw this:

template <class T, size_t N> constexpr T* begin(T (&array)[N]);

Bingo! There’s a mechanism to do so. But before I close, Stephan T. Lavavej (Mr. STL) mentioned that the template quoted above is no longer required (by the standard) to implement range-based for-loop in C++11.

Now the conclusion becomes that begin(), end() that takes in recognized arrays exist (which completes the logic behind range-based for-loop), but the range-based for loop can (and typically will) handle recognized arrays without these templated functions defined.

 

Loading