Current relational DBMSs were designed to pad
attributes to byte or word boundaries and to store values in
their native data format. Shifting data values onto byte or
word boundaries in main memory for processing was thought
to be too expensive. However,
CPUs are getting faster at a much greater rate than disk
bandwidth is increasing. Hence, it makes sense to trade
CPU cycles, which are abundant, for disk bandwidth,
which is not. This tradeoff appears especially profitable in
a read-mostly environment.
There are two ways a column store can use CPU cycles
to save disk bandwidth. First, it can code data elements
into a more compact form. For example, if one is storing
an attribute that is a customer’s state of residence, then US
states can be coded into six bits, whereas the two-character
abbreviation requires 16 bits and a variable-length
character string for the name of the state requires
many more. Second, it can densepack values in
storage. For example, in a column store it is
straightforward to pack N values, each K bits long, into N
* K bits. The coding and compressibility advantages of a