5.4.4 Byte-Aligned Codes
Even though a few tricks can help us decode bit-aligned codes quickly, codes of
variable bit length are cumbersome on processors that process bytes. The proces
sor is built to handle bytes efficiently, not bits, so it stands to reason that byte
aligned codes would be faster in practice.
There are many examples of byte-aligned compression schemes, but we con
sider only one popular method here. This is the code commonly known as v-byte,
which is an abbreviation for “variable byte length.” The v-byte method is very
similar to UTF-8 encoding, which is a popular way to represent text (see section
3.5.1).
Like the other codes we have studied so far, the v-byte method uses short codes
for small numbers and longer codes for longer numbers. However, each code is a
series of bytes, not bits. So, the shortest v-byte code for a single integer is one byte.
In some circumstances, this could be very space-inefficient; encoding the number
1 takes eight times as much space in v-byte as in Elias-γ . Typically, the difference
in space usage is not quite so dramatic.
The v-byte code is really quite simple. The low seven bits of each byte con
tain numeric data in binary. The high bit is a terminator bit. The last byte of each
code has its high bit set to 1; otherwise, it is set to 0. Any number that can be
represented in seven binary digits requires one byte to encode. More information
about space usage is shown in Table 5.4.
Some example encodings are shown in Table 5.5. Numbers less than 128 are
stored in a single byte in traditional binary form, except that the high bit is set.
For larger numbers, the least significant seven bits are stored in the first byte. The
next seven bits are stored in the next byte until all of the non-zero bits have been
stored.
Storing compressed data with a byte-aligned code has many advantages over a
bit-aligned code. Byte-aligned codes compress and decompress faster, since pro-