Experimental Worldview Framework Desires x Problems x Mechanisms x Device

Posted on November 28, 2017 by admin

I am experimenting with a framework to summarize how I observe things that are going on around me, analyzing situations and coming up with solution approaches. Currently this is what I have:

{Desires} × {Problems} × {Mechanisms} × {Devices}

Everything I see can be analyzed as a result of the cross-product (a fancy word for combinations of contents) between these 4 broad categories. To make it easier to remember, they can be factored into 2 major categories:

{Questions} × {Answers}

Where obviously

[Questions] Desires (objectives) lead to problems (practicalities) to solve
[Answers] Mechanisms (abstract concepts) hosted by a device (implementation) to address questions

Why the cross product? By tabulating everything I learned or know in 4 columns (categories), I can always select a few of them (subset) and notice a lot of recurring themes (questions) and common solution approaches (answers). This corresponds to an old saying “there’s nothing new under the sun”.

Then what about innovations? Are we constantly creating something new? Yes, we still are, but if you look closely, there are very few ideas that are fundamentally new that cannot be synthesized by combining the old ones (sometimes recursively).

Let me use the framework itself as an example on how to apply this framework (yes, it’s recursive):

Desires: predict and understand many phenomenon
Problems: mental capacity is limited
Mechanisms: this framework (breaking observations into 4 categories)
Devices: tabulation (cross-products, order reduction)

Feedback as an example:

Desires: have good outcomes (or meet set objectives)
Problems: not there yet
Mechanism: take advantage of past data for future outputs (through correction)
Devices: feedback path (e.g. regulator or control systems.)

Feedforward as an example that shares a lot of properties as feedback:

Desires: have good outcomes (or meet set objectives)
Problems: not there yet
Mechanism: take advantage of past data for future outputs (through prediction)
Devices: predictor (e.g. algorithm or formula)

Abstraction as an example:

Desires: understand complexities (e.g. large code base)
Problems: limited mental capacity (programmers are humans after all)
Mechanism: abstraction (generic view grouping similar ideas together)
Devices: black-boxes (e.g. functions, classes)

Trade as an example:

Desires: improves utility (utility = happiness in economics lingo)
Problems: one cannot do everything on its own (limited capacity)
Mechanism: exchange competitive advantages
Devices: market (goods and services)

Business as an example:

Desires: improves utility (through trade)
Problems: need to offer something for trade
Mechanism: create value
Devices: operations

Money (and Markets) as two examples:

Desires: facilitate trade
Problems: difficult valuation and transfer through barter, decentralized
Mechanism: a common medium
Devices: money (currencies), markets (platform for trade)

Law as an example:

Desires: make the pie bigger by working cooperatively
Problems: every individual tries to maximize their own interest but mentally too limited to consider working together to grow the pie (pareto efficient solutions)
Mechanism: set rules and boundaries (I personally think it’s a sloppy patch fix that is way overused and way abused) and get everybody to buy it
Devices: law and enforcement

Religion as an example

Desires: coexist peacefully
Problems: irreconcilable differences
Mechanism: blind unverified trust (faith)
Devices: Deities and religion

Just with the examples above, many desires can be consolidated along the lines of making ourselves better off, and many problems can be consolidated along the lines of we’re too stupid. Of course it’s not everything, but it shows the power of tabulating into 4 categories and consolidating the parts into few unique themes.

I chose to abstract the framework into 4 broad categories instead of 2 because two are too simplified to be useful: since the framework is a way to organize (compactify) observations into manageable number of unique items, there will be too many distinct entries if I have only two categories. Nonetheless, I would refrain from having more than 7 categories because most humans cannot reason effectively with that many levels of nested for-loops (that’s what cross products boils down to).

I also separated desires from problems because I realized that way too often people (manager, clients, customers, government, etc.) ask the wrong question (because they narrowed it to the wrong problem) that leads to a lot of wasted work and frequent direction changes. People are too often asked to solve hard problems that turns out to be the wrong ones for fulfilling the motivating desires. Very few learn to trace it back to the source (desires) and find the correct problem to address, which often have easy solutions that’s more valuable to the requester than what was originally asked. This often leads to unhappy outcomes for everybody that’s avoidable. An emphasis on desires is one of my frequently used approaches to prevent these kind of mishaps.

Oversimplified: Getting rid of data in STL containers Summary of Item 9 in "Effective STL"

Posted on November 26, 2017 by admin

Unless deleting a known range of elements directly through iterators (no conditions to match), which range–erase() method can be used directly, targeting specific key/value/predicate requires understanding of the container’s underlying data structure.

I’d like to give a summary of Item#9 in “Effective STL” by defining the names and concepts so the complicated rules can be sensibly deduced by a few basic facts.

The words ‘remove‘ and ‘erase‘ has very specific meaning for STL that are not immediately intuitive.

	Lives in	Target to match	Purpose
`remove_?()`	`<algorithm>`	required	Rearrange wanted elements to front
`erase()`	container	not accepted	Blindly deleting range/position given

There is a remove() method for lists, which is an old STL naming inconsistency (they should have called it erase() like for associative containers). Treat it as a historical mistake.

The usage is easy to remember once you understand it with the right wording above:

algorithm + container	contiguous	lists	associative
`remove_?()`: move front	Step 1	~~Step 1~~ (Use `remove_?()` method)	unordered*: cannot rearrange (Use `erase(key)` directly)
`erase()`: trim tail	Step 2	~~Step 2~~ (Use `remove_?()` method)	~~Use after `find_?()`~~ (Use `erase(key)` directly)

Note that there are two steps for sequential (contiguous+lists) containers , hence the erase-remove idiom. It’s really two steps:

auto tail = remove(c.begin(), c.end(), T); 
c.erase(tail, c.end());

but they can be combined in one line since the tail is only useful at one place. Hence

c.erase( remove(c.begin(), c.end(), T), c.end() );

Lists provides a efficient shortcut method (see table below) since linked-lists does not need to be rearranged (just short the pointers).

one-shot methods	contiguous	lists	associative
by content	N/A: use `erase-remove` idiom	`remove(T)`	`erase(key)`
by predicate	N/A: use `erase-remove_if` idiom	`remove_if(pred)`	N/A: Use for-loops for now No erase_if() yet as of C++17.

Never try range-based remove_?() for associative containers. It is a data corruption trap if any attempt is made to use anything named remove on associative containers.

The trap used to be possible since <algorithms> and containers were separate, but newer C++ protects you from the trap by checking if the element you are moving is of a MoveAssignable type. Since associative containers’ keys cannot be modified (without a rescan), the elements are not move-assignable.

As for erasing through for-loops (necessary if you want to sneak in an extra step while iterating), C++11 now returns an iterator following the last erased element uniformly across all containers. This helps to preserve the running iterator that gets invalidated immediately after the erase through i=c.erase(i);

* For brevity, I twisted the term unordered here to mean that the native (implementation) data order is dependent on the data involved.

When I said ‘cannot rearrange’, I meant ‘cannot efficiently rearrange’, since there are no cheap O(1) next() or prev() traversal.

It’s a mess to simply copy one element over another (during rearrangement), leaving orphans there, and re-balance a BST or re-hash a hash-map. Nobody wants to go through such pains to remove element when there are tons of more direct ways out there.

Understanding the difference between recognized arrays and pointers 'Recognized' means sizeof(array_name) gives the underlying allocated size

Posted on November 25, 2017 by admin

array≠ pointer:

a pointer only contains a memory location,
while an array already has memory allocated to hold the data.

The confusion comes from the fact that array names are always seen as pointers anywhere in C, but when an array name is referred in places that the scope happens know the allocated size, namely

Global arrays: everybody knows the size
Local arrays: only the instantiating function knows its size.

, the array name itself has a superpower that pointers lack: report the underlying allocated data size (NOT pointer size) using sizeof(array).

Definition: An array is ‘recognized‘ if the array name is used in the scope that knows the underlying data size.

Corollary: Calling the array name with sizeof() gives the underlying allocated data size.

Examples of consequences that can be derived from the definition above:

Heap allocations always return a pointer type, NOT an array name!
So heap arrays are never recognized.
VLA in C99 are considered local stack arrays, so it’s recognized
x[] is just a cosmetic shorthand for *x: it doesn’t prevent any recognized array from decaying into a pointer across boundary.
The storage duration (static or not) does not matter. e.g.
- Heap pointers at global level are not recognized arrays
- Static local array still loses the recognition across function boundaries
  (unless passed carefully by data type T (&array)[N]).

Most often recognized arrays cannot be aliased without decaying into a pointer. However, we can bind a recognized array to a reference to an array, which is a completely different type. Example:

int v[]{1,2,3,4};
int (&w)[4]=v;  // w is a reference to an array of size 4

int* p = v;     // Decays v to a pointer. Size information lost.
// int &w[4]=v; // Does not compile: this means an array of 4 references.

Note that the syntax requires a bracket for reference name. Omitting it will lead the compiler to misinterpret it as an array of references, which cannot* be compiled.

This means contrary to common beliefs, you can pass a recognized array across functions through reference, but this is rarely done because of the hassle of explicitly entering the number of elements (4 for the example above) as part of the data type. This can still be done through templates/constexpr, but for such inconvenience, we’re better off using std::vector (or std::array if you want near zero overhead).

However, so far I haven’t found a way to re-recognize an array from a pointer. That means there is no way to keep a local array’s recognition across function boundaries in C since it does not have references like C++.

To summarize with a usage example: this post has described the entire logic needed to decide whether sizeof(x)/sizeof(x[0]) gives you the number of array elements, or how many times your machine pointer type is bigger than the element storage.

* references must be bound on creation. Declaring an array of references means you want to bound references in batches. There are no mechanisms to do so as of C++14.

begin() and end() is defined for arrays in C++11 and above

Posted on November 24, 2017 by admin

I was a little confused the first time I saw range-based for-loop in C++11 on “Tour of C++” that it works right out of the box for both recognized array and STL containers and yet the text says it requires begin() and end() to be defined for the object for range-based for-loop to run.

I later learned that despite typical STL usage example writes v.begin(), v.end(), the most bulletproof way is to write begin(v), end(v) instead (Herb Sutter recommends it). Then I started to suspect that C++11 must have defined free-form (non-member) begin(), end() functions that takes in arbitrary recognized arrays. I pulled up my code editor and ran this:

#include <iostream>
int main()
{
    int v[4]={1,2,3,4};
    std::cout << *(std::crend(v)-1) << std::endl;

    return 0;
}

It compiled and ran uneventfully, printing ‘1’ as expected (I’m using crend(), to see if they implemented the more obscure ones). It makes more sense now why range-based for-loop works for arbitrary recognized arrays without making an exception to the begin(), end() requirement.

To confirm that it is the case (since “Tour of C++” didn’t say anything about why arbitrary array works for range-based for loop), I looked up the STL source code from libc++ in LLVM, namely <iterator>, and saw this:

template <class T, size_t N> constexpr T* begin(T (&array)[N]);

Bingo! There’s a mechanism to do so. But before I close, Stephan T. Lavavej (Mr. STL) mentioned that the template quoted above is no longer required (by the standard) to implement range-based for-loop in C++11.

Now the conclusion becomes that begin(), end() that takes in recognized arrays exist (which completes the logic behind range-based for-loop), but the range-based for loop can (and typically will) handle recognized arrays without these templated functions defined.

Do not help the compiler at the expense of readability Unless you read the assembly code emitted at the bottleneck and did benchmarks

Posted on November 24, 2017 by admin

Compilers has gotten smarter and smarter nowadays that they’d be able to analyze our code for common patterns (or logically deduce away steps that doesn’t have to be performed at runtime).

Matt Godbolt gave a nice presentation at CppCon 2017 named “What Has My Compiler Done for Me Lately?”. Through observing the emitted assembly code at different optimization levels, he showed that the compiler doesn’t need to be micromanaged (through performance hacks in our code) anymore, as it will emit instructions as the performance-hacked code intended when it is better to do so.

It means the compiler writers already know our bag of performance hack tricks way better than we do. Their efforts spared us from premature optimization and leave us more time to find a better data structure or algorithm to solve the problem.

What I got from the lecture is NOT that we are free to write clumsy code and let the compiler sort it out (though it occasionally can, like factoring a loop doing simple arithmetic series into a one line closed form solution), but we don’t have to make difficult coding choices to accommodate performance anymore.

The most striking facts I learned from his lecture are

The compiler can emit a one-line CPU instruction that does not have a corresponding native operation C/C++ if your hardware architecture supports it. (e.g. clang can convert a whole loop that counts the number of set bits into just ‘popcnt eax, edi‘)
Through Link-Time Optimization (LTO), we don’t have to pay the performance penalty for language features that are ultimately necessary for the current compilation (e.g. virtuals are automatically dropped if the linker finds that nowhere in the output currently needs it)

With such LTO, why not do away the virtual specifier and make everything unspecified virtual by default anyway (like Java)? For decades, we’ve been making up stories that some classes are not meant to be derived (like STL containers), but the underlying motive is that we don’t want to pay for vtable if we don’t have to.

Instead of confusing new programmers about when should they make a method virtual (plenty of rule-of-thumbs became dogma), focus on telling them whenever they (choose to upcast a reference/pointer to the parent anywhere in their code and) invoke the destructor through the parent reference/pointer, they will pay a ‘hefty’ price of vtable and vptr.

I don’t think anybody (old codebase) will get harmed by turning on virtuals by default and let the linker decide if those virtuals can be dropped. If it changes anything, it might turn buggy code with the wrong destructor called into correct code which runs slower and takes up more space. In terms of correctness, this change might break low-level hacks that expects the objects to be of certain size (e.g. alignment) without vptr.

Even better, add a class specifier that mandates that all uses of its child must not invoke vtable (have the compiler catch that) unless explicitly overridden (the users decide to pay for the vtable). This way the compiler can warn about performance and space issues for the migration.

The old C++’s ideal was “you only pay for the language features you used (written)”, but as compilers gets better, we might be able change it to “you pay extra only for the language features that are actually used (in the finally generated executable) with your permission”.

I’d also like to add Return Value Optimization (RVO) into my list of compiler advances that changes the way we code. C++11 added move semantics, but I think it’s something that the compiler in the future could be able to manage themselves. Even with an old C++ compiler like the one shipped with VisualDSP 5.0, the copy constructor was not called (yes, skipping it is legal even if the copy constructor has side effects) when I do this:

Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix c(a.dim);
  // ... for all element i, c.raw[i] = a.raw[i]+b.raw[i]
  return c;
}
Matrix c = a + b;

Actually, the compiler at that time was not that smart about RVO, the actual code I wrote originally had two return branches, which defeats RVO (it’s a defined behavior by the specs):

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
    return c;
  } 
  else 
  {
    return Matrix::dummy; // A static member, which is a Matrix object
  }
}

To take advantage of RVO, I had to reword my code

Matrix operator+(Matrix a, Matrix b)
{
  Dims m = a.dims;
  if( m == b.dims ) // Both inputs must have same dimensions
  {
    Matrix c(m); // Construct matrix c with same dimension as a
    // ... for all i, c.raw[i] = a.raw[i] + b.raw[i]
  } 
  else 
  {
    Matrix c = Matrix::dummy; // or just "Matrix c";
  }
  return c;
}

I think days are counting before C++ compilers can do “copy-on-write” like MATLAB does if independent compilation are no longer mandatory!

Given my extensive experience with MATLAB, I’d say it took me a while to get used designing my code with “copy-on write” behavior in mind. Always start with expressive, maintainable, readable and correct code keeping in mind the performance concerns only happens under certain conditions (i.e. passed object gets modified inside the function).

If people start embracing the mentality of letting the compiler do most of the mechanical optimization, we’ll move towards a world that debugging work are gradually displaced by performance-bottleneck hunting. In my view, anything that can be done systematically by programming (like a boilerplate code or idioms) can eventually be automated by better compiler/linker/IDE and language design. It’s the high-level business logic that needs a lot of software designers/engineers to translate fuzzy requirements into concrete steps.

Matt also developed a great website (http://godbolt.org/) that compiles your code repeatedly on the fly and shows you the corresponding assembly code. Here’s an example of how I use it to answer my question of “Should I bother to use std::div() if I want both the quotient and remainder without running the division twice?”:

The website also included a feature to share the pasted code through an URL.

As seen from the emitted assembly code, the answer is NO. The compiler can figure out that I’m repeating the division twice and do only one division and use the quotient (stored in eax) and remainder (stored in edx). Trying to enforce one division through std::div() requires an extra function call, which is strictly worse.

The bottom line: don’t help the compiler! Modern compiler does context free optimizations better than we do. Use the time and energy to rethink about the architecture and data structure instead!

M	T	W	T	F	S	S
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

Rambling Nerd with a Plan

Hoi Wong's blog

Monthly Archives: November 2017

Experimental Worldview Framework Desires x Problems x Mechanisms x Device

Oversimplified: Getting rid of data in STL containers Summary of Item 9 in "Effective STL"

Understanding the difference between recognized arrays and pointers 'Recognized' means sizeof(array_name) gives the underlying allocated size

begin() and end() is defined for arrays in C++11 and above

Do not help the compiler at the expense of readability Unless you read the assembly code emitted at the bottleneck and did benchmarks