Programming Concepts: Initializer Lists (C++)

Many people have taught the rules of C++’s initializer lists and when to use them, but few have offered a mental model that explains why things are the way they are, such as why an initializer list always runs in the order in which the members are declared, not the order in which the list itself is written.

There are too many rules to memorize, but one simple question lets you quickly derive both the behavior of initializer lists and their use cases: if you were the compiler, how would you break a class into loose functions and variables (C++ without classes), which is all that the low-level implementation (assembly) eventually sees: code and data?

In C++, you can think of a class as a struct plus other features recognized by the compiler. Fundamentally, this means that once a struct object exists, the memory for all its members is already allocated as part of the object (heap memory is only referenced through a pointer, so to the class it is just an address-sized data member).

C++ is not Python, where you can make up class members as you go at runtime (including inside the __init__ method). By the time you get into the body of a C++ constructor, ALL data members are already set up (just like variable declarations in loose/outside-class code), and the body is merely a function (called automatically on creation) that modifies the already established variables (data members).

This is roughly what goes on when you create/build an object of class A in C++:

{
   BLOCK declaring virtual base classes
   (virtual inheritance does not make any method
   virtual; compilers typically add a hidden
   pointer or offset to the object so the shared
   base can be located)

   virtual inheritance makes sure the grandparent
   is constructed first, by the most derived class,
   so it is SHARED by both parents in a diamond
   multiple inheritance setting and there's no
   ambiguity. This differs from the normal order
   of construction with non-virtual inheritance

   BLOCK declaring non-virtual base classes

   BLOCK declaring ALL data members of A

   BODY of A's constructor
}
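The order in the outline above can be observed directly. What follows is a minimal sketch, not anything from a library: the names build_log, Tracer, VirtualBase, PlainBase and construction_order are made up for illustration, and each constructor simply appends a tag to a shared string.

```cpp
#include <cassert>
#include <string>

// Shared log; every constructor appends its own tag (hypothetical helper).
std::string& build_log() {
    static std::string log;
    return log;
}

// Hypothetical member type that records when it is constructed.
struct Tracer {
    explicit Tracer(const char* tag) { build_log() += tag; build_log() += ' '; }
};

struct VirtualBase { VirtualBase() { build_log() += "VirtualBase "; } };
struct PlainBase   { PlainBase()   { build_log() += "PlainBase "; } };

// The base list deliberately names PlainBase first and the virtual base second.
struct A : PlainBase, virtual VirtualBase {
    Tracer m1{"m1"};
    Tracer m2{"m2"};
    A() { build_log() += "body"; }
};

// Builds one A and returns the order in which its pieces were constructed.
std::string construction_order() {
    build_log().clear();
    A a;
    (void)a;  // only the side effects of construction matter here
    return build_log();
}

// Self-check: virtual base, then non-virtual base, then members, then body.
void check_construction_order() {
    assert(construction_order() == "VirtualBase PlainBase m1 m2 body");
}
```

Even though PlainBase is written first in the base list, the virtual base is logged first, followed by the members in declaration order and finally the constructor body.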

So if I had to make my own initializer list processor in a pinch, I’d just write a macro preprocessor that parses the list block {: x(X), y(Y) ...}, which goes right after the constructor declaration but right before the constructor body, finds the declaration line matching each variable on the list, and adds {X} right after it. To illustrate:

int x;
// after parsing x(X) item on initializer list becomes
int x {X};
// where X is a blind textual substitution of the expression

In C++ (C has const as well), there are a few constructs that don’t allow you to modify a variable once it’s established. For those you need to intercept the process of “establishing the memory for data members of the struct”, aka the variable declaration block, and the initializer list syntax is what lets you modify it.

Say you have a class A declared like this:

class A
{
  public: // struct is class with public by default
    Type1 m1;
    Type2 m2;
};
A a;

Without an initializer list, this is conceptually equivalent to declaring scoped (namespaced) global variables whose names are mangled (prefixed) with an instance prefix (say ‘a_’):

Type1 A::a_m1;
Type2 A::a_m2;
// Constructor of A

I intentionally use placeholders Type1 and Type2 here because you’ll realize there are a few types in C++ that cannot be declared without providing an initial value:

  • constants must never be changed, so they must be initialized on creation
  • references are bound on creation and cannot be reassigned (changed)

// Invalid: a const's value must be known at ALL times
const bool A::a_m2; 
// Invalid: refs are bound and sealed on creation
      int& A::a_m1; 

An initializer list gives you direct access to the variable declaration process, before you get to the constructor body where the declarations are already set and done, effectively doing this:

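const bool A::a_m2 = x; 
      int& A::a_m1 = y; 
// Constructor body of A

// The same idea as a real class. y is taken by
// reference: binding the member to a by-value
// parameter would leave it dangling
struct A
{
  A(int x, int& y) : x(x), y(y) {}
  int x;
  int& y;
};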
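To make the const and reference cases concrete, here is a small sketch. Settings, verbose, counter and bump_twice are illustrative names, not anything from the text above. Both members can only receive their values in the initializer list; assigning to verbose, or trying to bind counter, inside the constructor body would not compile.

```cpp
#include <cassert>

struct Settings {
    // Both members are fixed at creation, so both must appear in the list.
    // shared_counter is taken by reference so the member refers to the
    // caller's variable rather than to a temporary copy.
    Settings(bool verbose_flag, int& shared_counter)
        : verbose(verbose_flag), counter(shared_counter) {}

    const bool verbose;   // cannot be assigned after construction
    int&       counter;   // bound once, to the caller's variable

    void bump() { ++counter; }
};

// Returns the caller-side counter after two bumps through the reference member.
int bump_twice() {
    int hits = 0;
    Settings s(true, hits);
    assert(s.verbose);
    s.bump();
    s.bump();
    return hits;
}
```

Calling bump_twice() returns 2, because the reference member was bound to hits at the "declaration" step and every later increment goes through to it.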

If you visualize an initializer list as simply unrolling into scoped declarations in the open, it’s immediately obvious why the compiler is strict about the construction order: a member can only depend on members declared before it.

      int  A::a_x = X; 
      int& A::a_y = A::a_x; 
// Constructor body of A

struct A
{
  A(int X) : x(X), y(x) {}
  int x;
  int& y;
};
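The claim that declaration order wins over the order written in the list can be checked with a sketch like the one below. The helper note and the names Pair, init_log and member_init_order are hypothetical. The list is deliberately written backwards, which GCC and Clang flag with -Wreorder, so the warning is silenced for the demonstration.

```cpp
#include <cassert>
#include <string>

// The mismatch below is intentional; silence the reorder warning for the demo.
#pragma GCC diagnostic ignored "-Wreorder"

std::string& init_log() {
    static std::string log;
    return log;
}

// Hypothetical helper: records its tag, then passes the value through.
int note(const char* tag, int value) {
    init_log() += tag;
    return value;
}

struct Pair {
    // The list names `second` first, but `first` is declared first.
    Pair() : second(note("second ", 2)), first(note("first ", 1)) {}
    int first;
    int second;
};

// Returns the order in which the member initializers actually ran.
std::string member_init_order() {
    init_log().clear();
    Pair p;
    assert(p.first == 1 && p.second == 2);
    return init_log();
}
```

member_init_order() yields "first second ", matching the declaration order of the members rather than the order in the list. This is why an initializer that reads another member is only safe when that member is declared earlier.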

The same holds for inheritance, with B as the base (parent) class:

struct B { int z; };
struct A : public B
{
  A(int X) : B{7}, x(z+X) {}
  int x;
};
// The base subobject has no name of its own,
// that's why I call it base_b instead of b
int B::base_b_z = 7;

// Create an alias due to inheritance
int& A::a_z = B::base_b_z;

// 7+X where X is the input to the constructor of A
int A::a_x = A::a_z + X;

It’s obvious that B (the base/parent) must be established before the child exists, and the same conclusion falls out when we translate initializer lists to open declarations.
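The inheritance example can be run as is. The sketch below reuses the article’s A and B and only adds a hypothetical wrapper function, derived_value, to expose the result.

```cpp
#include <cassert>

struct B { int z; };

struct A : public B
{
  // B is initialized first, so z already holds 7 when x is computed.
  A(int X) : B{7}, x(z + X) {}
  int x;
};

// Builds an A from X and returns the member computed from the base's z.
int derived_value(int X) {
    A a(X);
    assert(a.z == 7);
    return a.x;
}
```

derived_value(3) returns 10: the base part is fully established before any member of A is initialized, so reading z in the initializer of x is well defined.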

It’s also clear from the open-declaration view that if you can put the values you want directly in the declaration (initializer list), you save the extra step of modifying already established values. For members of class type this avoids a default construction followed by an assignment, and there is less to debug; for plain built-in types such as int the difference is usually negligible.
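The saved step can be counted rather than taken on faith. The following is a rough sketch with made-up names (Tally, Counted, ViaList, ViaBody): a member type tallies how it was constructed and assigned, and two otherwise identical owners set it either in the initializer list or in the constructor body.

```cpp
#include <cassert>

// Counts of the operations performed on Counted objects (illustrative only).
struct Tally {
    int default_ctor = 0;
    int value_ctor = 0;
    int assigned = 0;
};

Tally& tally() {
    static Tally t;
    return t;
}

struct Counted {
    Counted() : value(0) { ++tally().default_ctor; }
    explicit Counted(int v) : value(v) { ++tally().value_ctor; }
    Counted& operator=(int v) { value = v; ++tally().assigned; return *this; }
    int value;
};

// Sets the member in the initializer list: one construction, nothing else.
struct ViaList { explicit ViaList(int v) : c(v) {} Counted c; };

// Sets the member in the body: c is default-constructed first, then assigned.
struct ViaBody { explicit ViaBody(int v) { c = v; } Counted c; };

Tally measure_list() {
    tally() = Tally{};
    ViaList a(5);
    assert(a.c.value == 5);
    return tally();
}

Tally measure_body() {
    tally() = Tally{};
    ViaBody a(5);
    assert(a.c.value == 5);
    return tally();
}
```

measure_list() reports a single value construction, while measure_body() reports one default construction plus one assignment for the same end state. For a member like std::string or a container, that extra default construction and assignment is the work the initializer list avoids.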

The other case is when a member object has no default constructor (not even the implicit one, which is what happens when you declare a constructor that takes arguments but don’t declare one with zero arguments). I’ll skip A’s constructor for now:

struct B
{
    B(int P, int Q) {y=P+Q;}
    int y;
};

struct A
{
    // Default constructor method of A goes here
    int x;
    B b;
};

Since B has no zero-argument (default) constructor B() {...}, the declaration that would be made by default fails:

int A::a_x;

// Invalid, no zero-arg/default constructor B()
B A::a_b;  

// Constructor body of A

The only way to establish b is to explicitly call the 2-argument constructor B(int, int):

int A::a_x;

// Internally sets A::a_b.y = 7
B A::a_b(3, 4);  

// Constructor body of A

And b must be built in the order in which A’s members were declared: x first, then b (an instance of class B), and only then can the constructor body of A start.

struct B
{
    B(int P, int Q) {y=P+Q;}
    int y;
};

struct A
{
    A() : b(3, 4) {}  // b is a member, so the list names the member, not the type
    int x;
    B b;
};

The situation is exactly the same when class B is the parent/base: it is as if the declaration of B were moved to the first line (since a parent needs to be established before the child exists), and the initializer list names the type, B(3, 4), instead of a member. Unlike consts/references, whose initial value must be given at the declaration because they are unmodifiable afterwards, a type lacking a default/zero-arg constructor is a problem because the default attempt to construct it at declaration (unless otherwise specified, aka by an initializer list) fails.
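Both forms can be put side by side. This is a sketch reusing the article’s B; the owner types HasB and IsB and the two accessor functions are hypothetical names added for illustration. The static_assert confirms that B really has no default constructor, which is why both owners must mention it in their initializer lists.

```cpp
#include <cassert>
#include <type_traits>

struct B
{
    B(int P, int Q) { y = P + Q; }  // declaring this suppresses the implicit B()
    int y;
};

static_assert(!std::is_default_constructible<B>::value,
              "B can only be built from two ints");

// B as a member: the list uses the member's name.
struct HasB
{
    HasB() : x(0), b(3, 4) {}
    int x;
    B b;
};

// B as a base: the list uses the type's name, and B comes first.
struct IsB : public B
{
    IsB() : B(3, 4), x(0) {}
    int x;
};

int member_sum() { HasB a; assert(a.x == 0); return a.b.y; }
int base_sum()   { IsB a;  assert(a.x == 0); return a.y; }
```

Both member_sum() and base_sum() return 7. Removing b(3, 4) or B(3, 4) from either list makes the class fail to compile, since the compiler would then attempt the default construction that B does not offer.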

Setting these values must be done at declaration (through initializer lists)

Cannot modify after creation (aka in constructor body)

  • Consts
    const int x = 5;
  • References (bound on creation)
    int& x = y;

Default initialization is not possible (class/object has no default ctor)

struct B { 
  B(int x, int y) 
    { z = x + y; }
  int z; 
};

  • Composition (Member) objects
    class A { B b; };
  • Inheritance (Parent) class
    class A : public B { … };
