5.4.3 Bit-Aligned Codes
The code we invented in section 5.4.1 is a bit-aligned code, meaning that the
breaks between the coded regions (the spaces) can happen after any bit position. In this section we will describe some popular bit-aligned codes. In the next
section, we will discuss methods where code words are restricted to end on byte
boundaries. In all of the techniques we’ll discuss, we are looking at ways to store
small numbers in inverted lists (such as word counts, word positions, and delta-encoded document numbers) in as little space as possible.
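The following is a minimal Python sketch of what "bit-aligned" means in practice. It assumes a small hypothetical prefix-free code table (CODE) chosen only for illustration; it is not necessarily the code from section 5.4.1. The helper names pack_bits, encode, and decode are also hypothetical. Code words are concatenated with no padding between them, so a word can end after any bit position. Only the last byte of the whole stream is padded.

```python
from typing import Dict, List

# Hypothetical prefix-free code table, for illustration only.
# It is NOT necessarily the code invented in section 5.4.1.
CODE: Dict[int, str] = {0: "0", 1: "10", 2: "110", 3: "111"}


def encode(numbers: List[int]) -> str:
    """Concatenate code words back to back, with no alignment between them."""
    return "".join(CODE[n] for n in numbers)


def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, most significant bit first.

    Only the final byte is padded (with zero bits) to reach a byte boundary.
    """
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


def decode(bits: str, count: int) -> List[int]:
    """Read `count` code words; a code word may end after any bit position.

    An explicit count is needed because the zero bits used to pad the last
    byte would otherwise be decoded as extra 0 code words.
    """
    reverse = {word: n for n, word in CODE.items()}
    out: List[int] = []
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is a full word
            out.append(reverse[current])
            current = ""
            if len(out) == count:
                break
    return out


if __name__ == "__main__":
    stream = encode([1, 0, 3, 2])
    print(stream, len(stream))    # 100111110 9
    print(pack_bits(stream))      # b'\x9f\x00'
    print(decode(stream, 4))      # [1, 0, 3, 2]
```

Here the four numbers 1, 0, 3, 2 occupy 9 bits, which pack into 2 bytes. A code whose words had to end on byte boundaries would spend at least one byte per number, or 4 bytes for the same list.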
One of the simplest codes is the unary code. You are probably familiar with binary, which encodes numbers with two symbols, typically 0 and 1. A unary number system is a base-1 encoding, which means it uses a single symbol to encode
numbers. Here are some examples: