Explain XOR tricks succinctly with two properties Change detector, toggler

Most teaching materials on XOR starts from its straight definition then a bunch of recipes for exploiting XOR. It is not convenient to remember and not intuitive to come up with new tricks on our own. Instead, I’d like to describe XOR as two intuitive operations:

  1. Change detector (The logic definition itself)
  2. Toggling light switches (Flipping twice undoes it)

The second interpretation (toggling) is actually way more powerful and easy to remember than the first interpretation (definition).


With the toggling interpretation, you don’t have to think hard to come up with the four algebraic properties:

  1. Associativity and commutativity: we only care about whether the individual light switches (bits) are flipped odd (flipped) or even (not-flipped) number of times at the end of the day. When it happens in what order does not matter.
  2. Identity: flipping NONE of the switches (a mask of all 0) leaves it alone
  3. Inverse: pick only the switches that are already turned on (using itself as mask) and flip them (xor) guarantees to turn everything off (0)

The XOR swap trick can be constructed by exploiting the fact that flipping the same switch twice (at any point) reverts it back to where it started:

x y
Start x_0 y_0
Mix y_0 with x_0 x_1 = x_0 \oplus y_0 No info is lost, still have y_0 somewhere to undo if we wish.
The incumbent y_0 is used to cancel the y_0 inside x_1 to recover x_0 \begin{array}{rcl} y_2 & = & y_0 \oplus x_1 \\ & = & y_0 \oplus (x_0 \oplus y_0)\\ & = & x_0 \end{array} Since y_0 is already saved (mixed with) x_1, we don’t have to worry about losing it as long as we get to our main goal of putting x_0 in y, which will be used as a recovery key later.
Now use the x_0 stored in y_2 to unmix x_1 to recover y_0 \begin{array}{rcl} x_3 & = & x_1 \oplus y_2 \\ & = & (x_0 \oplus y_0) \oplus x_0\\ & = & y_0 \end{array}

Note that this trick is obsolete for modern computer architecture because

  • it enforces strict dependence that kills any predictive pipeline (speedup) in modern computer architecture, and
  • will yield zero all the time if you try to swap with itself in-place (same memory location), which is wrong because you expect the swap to do nothing.

Adding a check for the degenerate case (self-swap) slows the program down even further, making it worse than using a temporary storage for swapping. Therefore there’s no good reason to use it unless you have an exotic use case.

This is mainly used as a teaching device (or homework problem) to teach that XOR-ing even instances of the same variable in the chain cancels it.


 

Loading

Python 3 Scientific Installation To-do List

Although I am a big fan of MATLAB, it’s time for me to really try out Python so I can fairly compare the pros and cons of both languages.

The first tiny hurdle for Python is its scattered installation process for Windows. I thought Python(x,y) will give me everything in one place, but turns out the Spyder is stuck in Python 2.7. To install Python 3.7, I’ll need to do it from scratch. Here are the steps:

  1. Download official Python 3. You will need that for the “pip” package manager located in {python37}/scripts
  2. Update PIP first to avoid complaints. You can run it anywhere in command prompt
    python -m pip install --upgrade pip
    You don’t call PIP to update PIP because you an executable cannot write itself in WindowsNote that for 32-bit Python, you might run into Python37\python.exe: No module named pip, so you might want to use ensurepip to bootstrap: python -m ensurepip
  3. Now I’ll need Spyder3, a MATLAB-like IDE. Qt5 is one of the pre-req:
    pip install PyQt5
  4. And finally Spyder3
    pip install Spyder
    pip does not install icons in your start menu. So I’ll need to manually create a shortcut
    {Python37}/Scripts/spyder3.exe
    .py files are not associated with Spyder3 (normally it’ll just directly run the python script with python3). I usually manually change the association in Windows to Sypder3.
  5. PyVISA is the analog of “Instrument Control Toolbox” in MATLAB.
    pip install pyvisa
    MATLAB’s Instrument Control Toobox also cover serial ports, which is done in Python by PySerial
    pip install PySerial
  6. Numpy is included with scipy:
    pip install scipy
  7. Turns out that only NumPy and IPython is installed with SciPy, not the entire ecosystem.
    pip install pandas
    pip install matplotlib
    If you know the power of dataset/table objects in MATLAB like I do, you’ll jump for dataframes in panadas.
  8. SymPy, the analog of MATLAB’s symoblic math toolbox, needs to be installed separately
    pip install sympy
  9. IPython gives the ‘notebook’ feel in Mathematica, MathCAD and Maple, where the returned results are directly pasted in the same area where your command/syntax is. I rarely cared for it because I usually want the max visual real estate for my plots.

Update: I tried Anaconda (2019.03, Python 3.7.3 x64) which supposedly have everything in one place, but the Spyder it included crashes right out of the box. Jyupter is confusing as it relies on the web-browser to render the results. Feels patchy and doesn’t look like it adds more than the steps above. Uninstalled it without hesitation.

Update: To update the packages, tack -U switch at the end of each of the above pip install commands. Remember to follow the order of dependencies (e.g. update PyQt5 before Spyder)

Loading

MATLAB Techniques: variadic (variable input and output) arguments

I’ve seen a lot of ugly implementations from people trying to deal with variable number of input and output arguments. The most horrendous one I’ve seen so far came from MIT’s Physionet’s WFDB MATLAB Toolbox. Here’s a snippet showing how wfdbdesc.m handles variable input and output arguments:

function varargout=wfdbdesc(varargin)
% [siginfo,Fs,sigClass]=wfdbdesc(recordName)
...
%Set default pararamter values
inputs={'recordName'};
outputs={'siginfo','Fs','sigClass'};

for n=1:nargin
    if(~isempty(varargin{n}))
        eval([inputs{n} '=varargin{n};'])
    end
end
...
if(nargout>2)
    %Get signal class
    sigClass=getSignalClass(siginfo,config);
end
for n=1:nargout
    eval(['varargout{n}=' outputs{n} ';'])
end

 

The code itself reeks a very ‘smart’ beginner who didn’t RTFM. The code is so smart (shows some serious thoughts):

  • Knows to use nargout to control varargout to avoid the side effects when no output is requested
  • [Cargo cult practice]: (unnecessarily) track your variable names
  • [Cargo cult practice]: using varargin so it can be symmetric to varargout (also handled similarly). varargout might have a benefit mentioned above, but there is absolutely no benefit to use varargin over direct variable names when you are not forwarding or use inputParser().
  • [Cargo cult practice]: tries to be efficient to skip processing empty inputs. Judicially non-symmetric this time (not done to output variables)!

but yet so dumb (hell of unwise, absolutely no good reason for almost every ‘thoughtful’ act put in) at the same time. Definitely MIT: Make it Tough!

This code pattern is so wrong in many levels:

  • Unnecessarily obscuring the names by using varargin/varargout
  • Managing a list of variable names manually.
  • Loop through each item of varargin and varargout cells unnecessarily
  • Use eval() just to do simple cell assignments! Makes me cringe!

Actually, eval() is not even needed to achieve all the remaining evils above. Could have used S_in = cell2struct(varargin) and varargout=struct2cell(S_out) instead if one really wants to control the list of variable names manually!


The hurtful sins above came from not knowing a few common cell packing/unpacking idioms when dealing with varargin and varargout, which are cells by definition. Here are the few common use cases:

  1. Passing variable arguments to another function (called perfect forwarding in C++): remember C{:} unpacks to comma separated lists!
    function caller(varargin)
       callee(varargin{:});
  2. Limiting the number of outputs to what is actually requested: remember [C{:}] on left hand side (assignment) means the outputs are distributed as components of C that would have been unpacked as comma separated lists, i.e. [C{:}] = f(); means [C{1}, C{2}, C{3}, ...] = f();
    function varargout = f()
    // This will output no arguments when not requested,
    // avoiding echoing in command prompt when the call is not terminated by a semicolon
        [varargout{1:nargout}] = eig(rand(3));
  3. You can directly modify varargin and varargout by cells without de-referencing them with braces!
    function varargout = f(varargin)
    // This one is effectively deal()
        varargout = varargin(1:nargout);
    end
    
    function varargout = f(C)
    // This one unpacks each cell's content to each output arguments
        varargout = C(1:nargout);
    end

One good example combining all of the above is to achieve the no-output argument example in #2 yet neatly return the variables in the workspace directly by name.

function [a, b] = f()
// Original way to code: will return a = 4 when "f()" is called without a semicolon
    a = 4;
    b = 2;
end

function varargout = f()
// New way: will not return anything even when "f()" is called without a semicolon
    a = 4;
    b = 2;
    varargout = manage_return_arguments(nargout, a, b);
end

function C = manage_return_arguments(nargs, varargin)
    C = varargin(1:nargs);
end

I could have skipped nargs in manage_return_arguments() and use evalin(), but this will make the code nastily non-transparent. As a bonus, nargs can be fed with min(nargout, 3) instead of nargout for extra flexibility.


With the technique above, wfdbdesc.m can be simply rewritten as:

function varargout = wfdbdesc(recordName)
% varargout: siginfo, Fs, sigClass
...
varargout = manage_return_arguments(nargout, siginfo, Fs, sigClass);

Unless you are forwarding variable arguments (with technique#1 mentioned above), input arguments can be (and should be) named explicitly. Using varargin would not help you avoid padding the unused input arguments anyway, so there is absolutely no good reason to manage input variables with a flexible list. MATLAB already knows to skip unused arguments at the end as long as the code doesn’t need it. Use exist('someVariable', 'var') instead.

 

 

Loading

MATLAB Practices: Code and variable transparency. eval() is one letter away from evil() Transparency means that all references to variables must be visible in the text of the code

The general wisdom about eval() is: do not use it! At least not until you are really out of reasonable options after consulting more than 3 experts on the newsgroups, forums and support@mathworks.com (if your SMS is current)!

Abusing eval() turns it into evil()!

The elves running inside MATLAB needs to be able to track your variables to reason through your code because:

  • it helps your code run much faster (eval() cannot be pre-compiled)
  • able to use parallel computing toolbox (it has to know absolutely for sure about any shared writes)
  • mlint can warn you about potentially pitfalls through code smell.
  • it keeps you sane while debugging!

This is called ‘transparency’: MATLAB has to see what you are doing every step of the way. According to MATLAB’s parallel computing toolbox’s documentation,

Transparency means that all reference to variables must be visible in the text of the code

which I used as a subtitle of this post.


The 3 major built-in functions that breaks transparency are:

  1. eval(), evalc(): arbitrary code execution resulting in read and write access to its caller workspace.
  2. assignin(): it poofs variables in its caller’s workspace!
  3. evalin(): it breaks open the stack and read the variables in its caller’s workspace!

They should have been replaced by skillful use of dynamic field names, advanced uses of left assignment techniques, and freely passing variables as input arguments (remember MATLAB uses copy-on-write: nothing is copied if you just read it).


 

There are other frequently used native MATLAB functions (under certain usages) that breaks transparency:

  1. load(): poof variables from the file to the workspace. The best practice is to load the file as a struct, like S=load('file.mat'); , which is fully transparent. Organizing variables into structs actually reduces mental clutter (namespace pollution)!
  2. save(), who(), whos(): basically anything that takes variable names as input and act on the requested variable violates transparency because it’s has the effect of evalin(). I guess the save() function chose to use variable names instead of the actual variables as input because copy-on-write wasn’t available in early days of MATLAB. A example workaround would be:
    function save_transparent(filename, varargin)
        VN = arrayfun(@inputname, (2:nargin)', 'UniformOutput', false);     
        if( any( cellfun(@isempty, VN) ) )
            error('All variables to save must be named. Cannot be temporaries');
        end
        
        S = cell2struct(varargin(:), VN, 1);
        save(filename, '-struct', 'S');
    end
    
    function save_struct_transparent(filename, S)
        save(filename, '-struct', 'S');
    end

The good practices to avoid non-transparent load/save also reduces namespace pollution. For example, inputname() peeks into the caller to see what the variable names are, which should not be used lightly. The example above is one of the few uses that I consider justified. I’ve seen novice abusing inputname() because they were not comfortable with cells and structs yet, making it a total mindfuck to reason through.

Loading

Oversimplified: Getting rid of data in STL containers Summary of Item 9 in "Effective STL"

Unless deleting a known range of elements directly through iterators (no conditions to match), which rangeerase() method can be used directly, targeting specific key/value/predicate requires understanding of the container’s underlying data structure.

I’d like to give a summary of Item#9 in “Effective STL” by defining the names and concepts so the complicated rules can be sensibly deduced by a few basic facts.


The words ‘remove‘ and ‘erase‘ has very specific meaning for STL that are not immediately intuitive.

 Lives inTarget to matchPurpose
remove_?()<algorithm>requiredRearrange wanted elements to front
erase()containernot acceptedBlindly deleting range/position given

There is a remove() method for lists, which is an old STL naming inconsistency (they should have called it erase() like for associative containers). Treat it as a historical mistake.

The usage is easy to remember once you understand it with the right wording above:

algorithm + containercontiguouslistsassociative
remove_?(): move frontStep 1Step 1
(Use remove_?() method)
unordered*: cannot rearrange
(Use erase(key) directly)
erase(): trim tailStep 2Step 2
(Use remove_?() method)
Use after find_?()
(Use erase(key) directly)

Note that there are two steps for sequential (contiguous+lists) containers , hence the erase-remove idiom. It’s really two steps:

auto tail = remove(c.begin(), c.end(), T); 
c.erase(tail, c.end());

but they can be combined in one line since the tail is only useful at one place. Hence

c.erase( remove(c.begin(), c.end(), T), c.end() ); 

Lists provides a efficient shortcut method (see table below) since linked-lists does not need to be rearranged (just short the pointers).

one-shot methodscontiguouslistsassociative
by contentN/A: use erase-remove idiomremove(T)erase(key)
by predicateN/A: use erase-remove_if idiomremove_if(pred)N/A: Use for-loops for now
No erase_if() yet as of C++17.

Never try range-based remove_?() for associative containers. It is a data corruption trap if any attempt is made to use anything named remove on associative containers.

The trap used to be possible since <algorithms> and containers were separate, but newer C++ protects you from the trap by checking if the element you are moving is of a MoveAssignable type. Since associative containers’ keys cannot be modified (without a rescan), the elements are not move-assignable.


As for erasing through for-loops (necessary if you want to sneak in an extra step while iterating), C++11 now returns an iterator following the last erased element uniformly across all containers. This helps to preserve the running iterator that gets invalidated immediately after the erase through i=c.erase(i);


* For brevity, I twisted the term unordered here to mean that the native (implementation) data order is dependent on the data involved.

When I said ‘cannot rearrange’, I meant ‘cannot efficiently rearrange’, since there are no cheap O(1) next() or prev() traversal.

It’s a mess to simply copy one element over another (during rearrangement), leaving orphans there, and re-balance a BST or re-hash a hash-map. Nobody wants to go through such pains to remove element when there are tons of more direct ways out there.

Loading