notes

Compact bitwise token encoding

Encoding

Tokens are encoded with a variable number of bits:

This means that the bits do not end at a byte (or, in general, a BitStore) boundary. If the final byte were to be filled with 0s, then most programs would decode with extra S tokens. Likewise, if filled with 1s, there would be extra L tokens. To resolve this ambiguity, if the final bit is a 0, then a marker 1 bit is appended before the trailing zeros, which is ignored when unpacking.

History

I have a tradition of including bit packing in each of my major Whitespace implementations: Respace, Nebula, yspace, and Nebula 2. Nebula 2 has configurable bit order unlike the others, which were Msb0.

As far as I can tell, Respace was the first implementation of this algorithm and had been discovered independently. A similar converter between 2- and 3-letter alphabets was built by r.e.s. for a Whitespace coding challenge in 2012, but it operates on characters, instead of bits.