C-Store operators can operate on
both compressed and uncompressed input. As will be
shown in Section 9, the ability to process compressed data
is the key to the performance benefits of C-Store. An
operator’s execution cost (in terms of both I/O and
memory buffer requirements) depends on the
compression type of the input. For example, a Select
over Type 2 data (foreign order/few values, stored as
delta-encoded bitmaps, with one bitmap per value) can be
performed by reading only those bitmaps from disk whose
values match the predicate (despite the column itself not
being sorted). However, operators that take Type 2 data as
input require much more memory buffer space (one page
of memory for each possible value in the column) than
operators over any of the other three compression types. Thus, the cost
model must be sensitive to the representations of input and
output columns.
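
The Select example above can be illustrated with a minimal sketch. It is not C-Store's implementation: the bitmaps are plain Python integers rather than delta-encoded bitmaps on disk, and the names build_type2, select_type2, positions, and buffer_pages are hypothetical. The one-page figure used for the non-Type 2 encodings is a placeholder assumption; the text only states that Type 2 needs one page per possible value and that this is much more than the other types need.

```python
from typing import Callable, Dict, List, Tuple


def build_type2(column: List[str]) -> Dict[str, int]:
    """Encode a column as one bitmap per distinct value.

    Bit i of a bitmap is set when row i holds that value. Real Type 2
    storage would delta-encode each bitmap; that step is omitted here.
    """
    bitmaps: Dict[str, int] = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def select_type2(
    bitmaps: Dict[str, int], predicate: Callable[[str], bool]
) -> Tuple[int, int]:
    """Return (result bitmap, number of bitmaps read).

    Only bitmaps whose value satisfies the predicate are touched, which
    mirrors reading only the matching bitmaps from disk.
    """
    result = 0
    bitmaps_read = 0
    for value, bitmap in bitmaps.items():
        if predicate(value):
            result |= bitmap
            bitmaps_read += 1
    return result, bitmaps_read


def positions(bitmap: int) -> List[int]:
    """List the row positions whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def buffer_pages(compression_type: int, distinct_values: int) -> int:
    """Toy buffer estimate for a cost model.

    Type 2 needs one page per possible value (from the text). The value 1
    for the other types is an illustrative assumption, not a measured cost.
    """
    return distinct_values if compression_type == 2 else 1


column = ["red", "blue", "red", "green", "blue", "red"]
bitmaps = build_type2(column)
matches, bitmaps_read = select_type2(bitmaps, lambda v: v == "red")
```

In this sketch the selection on "red" reads one of the three bitmaps even though the column is unsorted, while the buffer estimate for Type 2 grows with the number of distinct values. Both effects are what a cost model sensitive to input representation would need to capture.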
The major optimizer decision is which set of
projections to use for a given query. Constructing a plan
for each possibility and then selecting the best one would
be time-consuming. Our focus will be on pruning
this search space. In addition, the optimizer must decide
where in the plan to mask a projection according to a
bitstring. For example, in some cases it is desirable to
push the Mask early in the plan (e.g., to avoid producing a
bitstring while performing selection over Type 2
compressed data) while in other cases it is best to delay
masking until a point where it is possible to feed a
bitstring to the next operator in the plan (e.g., COUNT) that
can produce results solely by processing the bitstring.
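
The trade-off in Mask placement can be sketched as follows, under simplifying assumptions: a bitstring is a Python integer in which bit i marks row i as qualifying, and a projection is reduced to a single in-memory list. The function names are hypothetical and do not correspond to C-Store operator signatures.

```python
from typing import List


def count_from_bitstring(bitstring: int) -> int:
    """COUNT computed solely from the bitstring: the number of set bits."""
    return bin(bitstring).count("1")


def mask(column: List[int], bitstring: int) -> List[int]:
    """Mask: keep only the values whose row bit is set in the bitstring."""
    return [v for row, v in enumerate(column) if (bitstring >> row) & 1]


def count_mask_early(column: List[int], bitstring: int) -> int:
    """Plan A: apply Mask first, then count the materialized values."""
    return len(mask(column, bitstring))


def count_mask_late(bitstring: int) -> int:
    """Plan B: delay masking and feed the bitstring straight to COUNT.

    No column values are read at all in this plan.
    """
    return count_from_bitstring(bitstring)


column = [10, 20, 30, 40, 50, 60]
bitstring = 0b100101  # rows 0, 2, and 5 qualify
early = count_mask_early(column, bitstring)
late = count_mask_late(bitstring)
```

Both plans return the same count, but the late plan never touches the column values, which is the case where delaying the Mask pays off. The opposite case, pushing the Mask early to avoid producing a bitstring over Type 2 data, is not modeled here.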