Librarize! Free variables/functions school of thought (as compared to OOP)

When programming C++, I prefer to stick to free functions and refactor everything generic into libraries. However, that doesn’t sound like the norm out there. After seeing this video, I’m glad that I’m not the only one.

My rationale is that classes are merely a mental model built on the von Neumann architecture, which says that data (variables) and program (functions) aren’t that different after all. Combined with structs, and with a little help from the compiler treating function pointers differently (as methods), we can bundle action and data into one unit called a Class (of objects).

Classes are a useful idea, but I would not over-objectify, like writing a class that holds 5 constants or framing a collection of loosely coupled or unrelated generic helper functions as a class (they should be organized into packages or namespaces). Over-objectifying reeks of cargo cult programming.

My primary approach to program design is to make it self-documenting. I prefer to present the code (not just the syntax, but the code and data structure) in the way that’s easiest to understand, without material sacrifices to performance or maintainability. I use classes when the problem statement happens to align naturally with what classes have to offer, without mental gymnastics.

My decision process goes roughly like this:

  • If a problem naturally screams data type (like matrices), which is heavy on operator overloading, I’d use classes in a heartbeat, as data types are mathematical objects.
  • Then I’ll look into whether the problem is heavy on states. In other words, if it’s necessary for one method to drop something in the mailbox for another method to pick it up without meeting each other (through parameter passing calls), I’ll consider classes.
  • If the problem statement screams natural interactions between objects, like chess pieces on a chessboard, I’d consider classes even if I don’t need OOP-specific features.
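To illustrate the first bullet, here is a small sketch of a data type that earns its own type because of operator overloading. `Vec2` is a hypothetical stand-in for a real matrix type, and the operators are assumed to need only the public fields, so they are written as free functions:

```cpp
// Hypothetical value type standing in for a matrix-like data type.
struct Vec2 {
    double x;
    double y;
};

// Operators written as free functions; they only touch public fields.
inline Vec2 operator+(Vec2 a, Vec2 b) { return Vec2{a.x + b.x, a.y + b.y}; }
inline Vec2 operator*(double s, Vec2 v) { return Vec2{s * v.x, s * v.y}; }
inline bool operator==(Vec2 a, Vec2 b) { return a.x == b.x && a.y == b.y; }
```

With these in place, an expression such as `a + 2.0 * b` reads like the math it represents, which is the whole reason for reaching for a type here.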

The last things I want to use OOP as a tool for are:

  1. Hiding sloppy generic methods that are correct only given your class’s implicit assumptions that are not spelled out, like sorting 7-digit phone numbers that are unique.
  2. Abusing data members to pretend you are not casually using globals and statics all over the place (poor encapsulation), just as you would have done in a C program.

1) Free functions for generic operations

The first one is an example that calls for free functions. Instead of writing a special sort method that silently assumes the numbers are unique and 7 decimal digits long, write a free function bitmap_sort() and put it in a sorting library, if there isn’t an off-the-shelf package that does it already.
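A minimal sketch of what such a free function could look like follows. The signature, the `universe` parameter, and the choice to throw on a violated assumption are my own assumptions for illustration, not a reference implementation:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sorts distinct non-negative integers drawn from [0, universe).
// The assumptions are spelled out and enforced: every key is below
// `universe` and no key repeats. Throws std::invalid_argument otherwise.
std::vector<std::uint32_t> bitmap_sort(const std::vector<std::uint32_t>& keys,
                                       std::uint32_t universe) {
    std::vector<bool> seen(universe, false);  // one bit per possible key
    for (std::uint32_t k : keys) {
        if (k >= universe) {
            throw std::invalid_argument("bitmap_sort: key out of range");
        }
        if (seen[k]) {
            throw std::invalid_argument("bitmap_sort: duplicate key");
        }
        seen[k] = true;
    }
    // Walking the bitmap in index order yields the keys in sorted order.
    std::vector<std::uint32_t> sorted;
    sorted.reserve(keys.size());
    for (std::uint32_t k = 0; k < universe; ++k) {
        if (seen[k]) sorted.push_back(k);
    }
    return sorted;
}
```

For 7-digit phone numbers the caller would pass `universe = 10000000`. The class that owns the phone numbers no longer needs to know how the sort works, only that its data satisfies the stated assumptions.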

In the process of refactoring parts of your program out into generic free functions (I just made up the word ‘librarize’ to mean this), you go through the immensely useful mental process of

  • Explicitly understanding the exact assumptions that specifically apply to your problem, whether you want to take advantage of them or work around their restrictions. You can’t be sure that your code is correct, no matter how many tests you’ve written, if you aren’t even clear about which inputs your code is correct for and which unexpected inputs will break the assumptions.
  • Discovering the nomenclature for the concept you are using.
  • Knowing the nomenclature, you have a very good chance of finding work already done on it so you don’t have to reinvent the wheel … poorly
  • If the concept hasn’t been implemented yet, you can contribute to code reuse that others and your future self can take advantage of.
  • Decoupling generic operation from business logic (the class itself) allows you to focus on your problem statement and easily swap out the generic implementation, whether it’s for debugging or performance improvement, or hand the work over to others without spending more time explaining what you wanted than writing the code yourself.

This is much better than jumping into writing a half-assed implementation of an idea whose quirks (assumptions) you haven’t fully understood. You learn a new concept well, rather than repeating similar gruntwork over and over that doesn’t benefit anybody else. Otherwise, you will likely have to debug the tangled mess when you run into a corner case, because you didn’t understand the assumptions well enough to decouple a generic operation from the class.

Polymorphism in OOP is a lot broader than just function overloading. Virtual functions (run-time polymorphism) make sense only with inheritance, so they have to be an OOP thing. Templates (which also apply to free functions) and function overloading (which also applies to free functions) are compile-time polymorphism.

Polymorphism isn’t exclusive to OOP the way Bjarne defined it. C++ can overload free functions. You don’t need to put things into classes just because you want context-dependent (signature-dependent) dispatch, i.e. the compiler figuring out which version of the function with the same name to call.
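As a small sketch of compile-time polymorphism on free functions (the names `describe` and `twice` are made up for illustration), no class is involved in either the overload set or the template:

```cpp
#include <string>

// Overloads: the compiler picks the version from the argument type.
inline std::string describe(int) { return "int"; }
inline std::string describe(double) { return "double"; }
inline std::string describe(const std::string&) { return "string"; }

// A function template is also compile-time polymorphism on a free
// function: one definition, instantiated per argument type.
template <typename T>
T twice(const T& x) {
    return x + x;
}
```

Calling `describe(1)` selects the `int` overload and `describe(1.0)` the `double` one, while `twice` works for any type that supports `+`, such as `int` or `std::string`.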

2) Classes are not excuses to hide unnecessary uses of global/statics

Data members in classes are a namespace-scoped version of global/static variables that can optionally be localized/bound to instances. The private/public access specifiers in C++ play the role that the static modifier played in C, where it switched a variable between global scope and file scope.

If you don’t think it’s a good habit to sprinkle global scope all over the place in C, try not to go wild using more data members than necessary either.

Data members give an illusion that you are encapsulating better and end up incentivising less defensive programming practices. Instead of not polluting in the first place, you design your data flow with the mentality of global variables and merely contain the pollution within namespace/class scopes.

For example, if you want to pass a message (say an error code) directly from one method to another and NOBODY else (no other method) is involved, you simply pass the message as an input argument.

Globals or data members are more like dropping a letter in a mailbox instead of handing it to your intended recipient, hoping that only the right person will reach it and that the right recipient will actually get it. Tons of things can go wrong with this uncontrolled approach: somebody else could intercept it, or the intended recipient may never know the message is waiting for them.

With data members, even if you marked them as private, you are polluting the namespace of your class’ scope (therefore not encapsulating properly) if there’s any method that can easily access data members that it doesn’t need.
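A sketch of handing the letter over directly, using a made-up digit-string validator: the error code produced by `validate` is needed only by `to_message`, so it travels as a return value and an argument rather than through a data member. All names here (`ParseError`, `validate`, `to_message`) are hypothetical.

```cpp
#include <string>

// Hypothetical error code that only one other function cares about.
enum class ParseError { kNone, kEmpty, kNotANumber };

// Produces the error code and returns it to the caller; nothing is
// stashed in shared state for someone else to pick up later.
inline ParseError validate(const std::string& text) {
    if (text.empty()) return ParseError::kEmpty;
    for (char c : text) {
        if (c < '0' || c > '9') return ParseError::kNotANumber;
    }
    return ParseError::kNone;
}

// Consumes the error code it is handed; it cannot see anything else.
inline std::string to_message(ParseError err) {
    switch (err) {
        case ParseError::kNone: return "ok";
        case ParseError::kEmpty: return "input is empty";
        case ParseError::kNotANumber: return "input is not a number";
    }
    return "unknown error";
}
```

The call `to_message(validate(input))` makes the data flow visible at the call site, and no third function can read or overwrite the code in between.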

How I figured this out on my own based on my experience in MATLAB

Speaking of insidious ways to litter your program design with the globalist mentality (pun intended), data members are not the only offenders. Nested functions (not available in C++ but available in modern MATLAB and Python) are another hack that makes you FEEL less guilty about structuring your program in terms of global variables. Everything visible one level above the nested function is relatively global to the nested function. You are literally polluting the variable space of the nested function with local variables of the function one level above, which is a lot more disorganized than data members, where you at least acknowledge what you’ve signed up for.

Librarize is the approach I came up with for MATLAB development: keep a folder tree of user MATLAB classes and free functions organized under sensible names. Every time I am tempted to reinvent the wheel, I try to think of the best name for it. If a folder with the same name exists, chances are I already did something similar before and just needed a little reminder. This way I always have high-quality in-house generic functions (whose use cases I can expand with backward compatibility as needed).

This approach works because I’m confident in my ability to naturally come up with sensible names consistently. When I did research in undergrad, the new terminology I came up with happened to coincide with that of wavelets before I studied wavelets; in hindsight, what I was doing was pretty much the same idea as wavelets, except it didn’t have the luxury of an orthogonal basis.

If a concept has multiple names, I often drop breadcrumbs with dummy text files suggesting the synonym or write a wrapper function with a synonymous name to call the implemented function.

C++ can simply overload free functions by signature, but not too many people know that MATLAB can overload free functions too, dispatching polymorphically ONLY BY THE FIRST ARGUMENT (it can’t use signatures because MATLAB supports variable arguments, which defeats the concept of signatures). It’s a very advanced technique I came up with, which allows the same code to work for many different data types, doing generics without the templates available in C++.

I also understand that commercial development is often rushed, so not everybody can afford the mental energy to do things properly (like considering free functions first). All I’m saying is that there’s a better way than casually relying on data members more than needed, and using data members should carry the same stench as using global variables: it might be the right thing to do in some cases, but most often it is not.
