Compressing data using column-oriented compression algorithms
and keeping data in this compressed format as it is operated upon
has been shown to improve query performance by up to an order
of magnitude [4]. Intuitively, data stored in columns is more
compressible than data stored in rows. Compression algorithms
perform better on data with low information entropy (high data
value locality). Take, for example, a database table containing information
about customers (name, phone number, e-mail address,
snail-mail address, etc.). Storing data in columns allows all of the
names to be stored together, all of the phone numbers together,
etc. Phone numbers are certainly more similar to each other than they are
to the surrounding text fields, such as e-mail addresses or names. Further,
if the data is sorted by one of the columns, that column becomes
highly compressible (for example, runs of the same value can be
run-length encoded).
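
To make the effect of sorting concrete, the following is a minimal sketch of run-length encoding applied to a single column. It is an illustration only, not the implementation used by any particular column store: the `states` column and the helper names `rle_encode` and `rle_decode` are hypothetical and chosen for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# Hypothetical "state" column from a customer table (illustrative data only).
states = ["NY", "CA", "NY", "TX", "CA", "NY", "CA", "TX"]

# In the unsorted column no value repeats consecutively, so every run has length 1.
unsorted_runs = rle_encode(states)

# After sorting, equal values are adjacent and collapse into a few long runs.
sorted_runs = rle_encode(sorted(states))

print(len(unsorted_runs), "runs unsorted;", len(sorted_runs), "runs sorted")
print(sorted_runs)
```

In this sketch the unsorted column yields one pair per value, so encoding saves nothing, whereas the sorted column needs only one pair per distinct value. The same reasoning suggests why a sort column with few distinct values compresses so well.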