Rationale Behind C++ Commandments (3) – Classes came from emulating POD data types through struct and namespaces

In structured programming (like C and C++), the building abstractions is program (functions) and data (variables).

Under the hood, especially in von-Neumann architecture’s perspective, functions and variables are both just data (a stream of numbers) that can be moved and manipulated the same way just like data. It’s all up to how the program designer and the hardware choose to give meaning to the bit stream.


Namespaces

In C, we can only scope our variables 3 ways: global, static (stays within same file/translation unit) and local. Sharing variables across functions in different translation units can only be done through

  • globals (pollutes namespace and it’s difficult to keep track of who is doing what to the variables and the state at any time)
  • passing (the more solid way that gives tighter control and clearer data flow, but managing how to pass many variables in many places is messy, even with struct syntax)

Bundling program with data gives a new way to tightly control the scope of variables: you can specify a group functions allowed to share the same set of variables in the bundle WITHOUT PASSING arguments.

The toolchain modified to recognize the user-defined scope boundaries which bundles program and data into packages, thus reducing root namespace pollution. The is implemented as namespace keyword in C++

Organizing with namespaces is basically justifying the mentality of using globals (in place of passing variables around intended functions) except it’s in a more controlled manner to keep the damages at bay. The same nasty things with gloabls can still appear if we didn’t design the namespace boundaries tightly so certain functions have access to variables that’s not intended for it.

Therefore, namespaces works nearly identical to a super-simple purely static class (see below) except you lose inheritance and access modifiers in classes in exchange for allowing anonymous namespaces.

Basically namespaces + structs + inheritance + encapsulation (access modifiers) = classes


Classes

Classes extends the idea of namespaces by allowing objects (each assigned their own storage space for the variables following the same variable layout) to be instantiated, so they behave like POD (Plain Old Data) in C. We should observe that when overloading operators

  • [15] allow (a=b)=c chaining by returning *this for operator=
  • [21] disallow rvalue assignment (a+b)=c by returning const object

In the most primitive form (no dynamic binding and types, aka virtuals and RTTI), function (method) info is not stored within instantiated objects as the compiler will sort out what classes/namespace they belong to. So it screams struct in C!

C struct is what makes (instantiates) objects from classes!

Note that C structs do not allow ‘static fields’ because static members is solely a construct of namespaces idea in C++! C++ has chosen to expand structs to be synonymous to classes that defaults to private access (if not specified) so code written as C structs behaves as expected in C++.

Loading

‘Static classes’ are unlike instantiable (object-bearing) classes in many ways

Technically there’s no static class in C, but a class with all members and functions declared static.

Static classes are like namespaces in many ways. Because no object is constructed (it’s just holding a bunch of variables and functions in the free space), a lot of features and syntax with regular classes do not make sense with static classes.

Because no objects are instantiated

  • No constructors or destructors (no objects to make/destroy)
  • No operator overloading (you need an instantiation to pass arguments to operator methods)
  • No overriding because there are no objects for you to upcast
    (nor there’s an object to store the vtable from the virtual keyword)!

Static members and methods are treated as free objects scoped by namespaces

  • Like C, static members variables live in .bss (not explicitly initialized ones will be zero-initialized) or .data (initialized) sections, not on stack/heap!
    Exception: static const int is internally seen as enum, which the compiler uses it to plug values in the code instead of allocating space for it.
  • Therefore the syntax is pretty much like free static/global variables
  • No constructor to build member variables within the class definition, so they must be defined OUTSIDE the class definition at the top level (just like static/globals), with a SRO (scope resolution operator).
  • Static methods acts like (and function overloads the same way as) free functions.
    That’s why we often use static methods for helpers.

Namespaces has no access modifiers (public/protected/private/friend), but in return only namespaces can be unnamed/anonymous (which behaves as private)!

Namespaces cannot be inherited, but static classes can!

  • Inherited members ARE REFERENCES to the parent!
    There’s no extra copies of underlying data if that member is successfully inherited (not shadowed)!
  • Members (function or variables) can only be shadowed in the child (never overridden since it’s not an object), which creates a NEW stack variable and hid the reference to the parent member

Static class’s inheritance behavior is the same across static classes object-bearing classes! It’s actually more explicit with static members as you’ll need two declarations outside the classes if you shadow.

I am pointing this out to show that inheriting static classes IS NOT cloning namespaces! Static classes behaves as if it’s just ONE CHILD object created on the .bss/.data section (the section for static variables).

This means unlike object-bearing classes, the static class Parent cannot exist on its own if its children are defined!

C++ rules are almost always sensible and coherent; but when combined, sometimes the implications could be surprising on the first sight! When we try to extrapolate expected behaviors in C++, very often we have to think not in terms of the convenient syntax, but the implications of its ground rules (a lot of them stems from C)!

Loading

Rationale Behind C++ Commandments (2) – Philosophy of C

Everything is seen as a bitstream

  • pointers are just integers to memory locations
    – [25] integer and pointers might be indistinguishable in signature resolution
  • code (CPU instructions) and addresses are treated the same way as a stream of data
    – concept of function pointers leads to lambda (functors)
    – classes came from structs containing data and function pointers (combined with namespaces)!
  • unchecked type declarations: the compiler trusts your interpretations of data
    – leads to run-time features such as overriding (virtual methods)
  • handles (pointers and references) has unrestricted power
    – [29] can const_cast it away if the handle is exposed (bad idea)

Performance-first design choice

  • do not pay performance penalty for features not used
    – static compilation and binding by default
    – unchecked type declarations (see above)
  • static compilation: the compiler tries to know everything at compile time
  • static binding by default (cheapest)
    – pay extra to use virtual methods (overriding)
    – [38] default parameter values are statically bound and not stored in vtable (i.e. overridden child method’s default values are ignored and parent’s default values are used ONLY WHEN called through up-casts)
  • inline is at the mercy of the optimizer (which can choose to emit an object if decided inlining is counter-productive). Mechanism that forces a function pointer to exist (pointing the function, virtual functions creates the pointer in vtable)

Toolchain

  1. preprocessor (parser & macros)
  2. compiler (create object files per translation unit, which is .c file in C)
    – access control (encapsulation) extends the old trick of emulating private in C++ through macros by marking functions as static (local within translation unit) in C.
  3. linker (combine object files and adjust the addresses)

Templates behaves like a combination of macros (copy-and-paste with parameters) except it’s spread across the toolchain like inline optimizations:

  • Code bloat (one copy per type combination)
  • Can only live in the header files (it’s a template, not realized code, so no object is emitted like a .cpp file)

Parsing (language design)

  • most vexing parse [Effective STL Item 6]: if something can be interpreted as a function declaration, it will be interpreted as a function declaration

Plain Old Data Types (C++ classes tried to emulate in their operator overloading behavior)

  • [15] allow (a=b)=c chaining by returning *this for operator=
  • [21] disallow rvalue assignment (a+b)=c by returning const object

Loading

Rationale Behind C++ Commandments (1) – Introduction

If you’ve programmed in (or studied) C++ long enough, like you have read Scott Meyers’s Effective C++, which is a book organized in, jokingly, commandments like ‘thou shall make destructors virtual’. There’s a lot of stuff to remember.

I’ve found an approach to make the ideas stick: by understanding the rationale behind these commandments through the lens of ‘What would you do if you were to make C++ (features) out of C?

C++ is not a language designed from scratch. A lot of quirks and oddities in C++ came straight from the philosophy and the language features naturally available in C. With the right jargons (concepts), you will find a lot of the seemingly counter-intuitive behavior ‘it ought to be like this because of (insert design choice here)‘.

This is what we are going to explore in the “Rational Behind C++ Commandments” (RBCC) blog post series which came from my notes when I was going through Scott Meyer’s book. Once you get the ideas, you should be able to come up with the rules in Effective C on your own (so you don’t have to blindly remember them).

Loading

Dart Language: late binding in lambda (no capture syntax)

Dart’s documentation at the time of writing (2021-10-27) is not as detailed as I hoped for so I don’t know if the Lambda (anonymous function) is late-binding or early-binding.

For those who don’t know what late/early-binding is:

Bindingearlylate
Expressionclosed (closure)open (not a closure)
Self-containedyesno
Free symbols
(relevant variables
in outer workspace)
bound/captured at creation
(no free symbols)
left untouched until evaluated/executed
(has free symbols)
Snapshotduring creationduring execution

Late-binding is almost a always a gotcha as it’s not natural. Makes great Python interview problems since the combination of ‘everything is a reference‘ and ‘reference to unnamed objects‘ spells trouble with lazy evaluation.

I’d say unlike MATLAB, which has a more mature thinking that is not willing to trade a slow-but-right program for a fast-but-wrong program, Python tries to squeeze all the new cool conceptual gadgets without worrying about how to avoid certain toxic combinations.

I wrote a small program to find out Dart’s lambda is late-bound just like Python:

void main()
{
  int y=3;
    var f = (int x) => y+x;	
    // can also write it as 
    // f(int x) => y+x;
  y=1000;	

  print( f(10) );
}
The result showed 1010, which means Dart use open lambdas (late-binding)

LanguageBinding rules for lambda (anonymous functions)
DartLate binding
Need partial application (discussed below) to enforce early binding.
PythonLate binding.
Can capture (bind early) by endowing running (free) variables (symbols) with defaults based on workspace variables
C++11Early binding (no access to caller workspace variables not captured within the lambda)
MATLABEarly binding (universal snapshot, like [=] capture with C++11 lambdas)

I tried to see if there’s a ‘capture’ syntax in Dart like in C+11 but I couldn’t find any.

I also tried to see if I can endow a free symbol with a default (using an outer workspace variable) in Dart, but the answer is no because unlike Python, default value expressions are required to be constexpr (literals or variables that cannot change during runtime) in Dart.

So far the only way to do early binding is with Partial Application (currying is different: it’s doing partial application of multiple variables by breaking it into a cascade of function compositions when you partially substitute one free variable/symbol at a time):

  1. Add an extra outer function layer (I call it the capture/binder wrapper) putting all free variables you want to bind as input arguments (which shadows the variables names outside the lambda expression if you chose not to rename the parameter variables to avoid conceptual confusion),
  2. then immediately EVALUATE the wrapper with the with the outer workspace variables (free variables) to capture them into the functor (closed lambda, or simply closure) which binder wrapper spits out.

Note I used a different style of lambda syntax in the partial application code example below

void main()
{
  var y=3;
    f(int x) => y+x;
	
    // Partial Application
    g(int w) => (int x) => w+x;	
    // Meat: EVALUATING g() AT y saves the content of y in h()
    var h = g(y);
	
  y=1000;	

  // y is free in f 
  print( f(10) );	

  // y is not involved in g until now (y=1000)	
  print( g(y)(10) );		

  // y is bounded for h when it was still y=3
  print( h(10) );
}

Only h() captures y when it’s still 3. The snapshot won’t happen if you cascade it with other lambda (since the y remains free as they show up in the lambda expression).

You MUST evaluate it at y at the instance you want y captured. In other words, you can defer capturing y until later points in your code, though most likely you’d want to do it right after the lambda expression was declared.


As a side note, I could have used ‘y’ instead of ‘w’ as parameter in the partial application statement

g(int y) => (int x) => y + x

but the ‘y‘ inside g() has shadowed (no longer refers to) the ‘y‘ in the outer workspace!

What makes it confusing here is that quite a few authors think it’s helpful (mnemonics-wise) to use the free variable (outer scope) name you are going to inject into the dummy (argument) as the name of the dummy! This gives the false impression of the variable being free while it’s bounded (through feeding it as a parameter in the wrapper/outer function)!

Scoping rules is a nasty source of confusion in understanding lambda calculus so I decide to give it a different name ‘w‘. I’m generally dismissive of shadowing under any circumstances, to the extent that I find Python’s @ syntactic sugar for decorators shadowing the underlying function which could have been reused somewhere (because it’s implicitly doing f=g(f) instead doing h=g(f). I hate it when you see the function with name f then f was repurposed to not mean the definition of f you saw right below but you have to keep in mind that it’s actually a g(f)). See my rants here.

Notational ambiguity, even if resolvable, is NOT helpful at all here especially when there are so many level of abstractions squeezed into so few symbols! People jumping into learning the subject quickly for the first time should not have unnecessarily keep track of the obscure scoping rules in their head to resolve the ambiguities!

Loading