However, there is no requirement that one store multiple
copies in the exact same way. C-Store allows redundant
objects to be stored in different sort orders providing
higher retrieval performance in addition to high
availability. In general, storing overlapping projections
further improves performance, as long as redundancy is
crafted so that all data can be accessed even if one of the
G sites fails. We call a system that tolerates K failures K-safe.
C-Store will be configurable to support a range of
values of K.
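
As an illustration of the K-safety condition, the following is a minimal sketch and not C-Store's actual mechanism. It assumes a hypothetical `placement` map from site name to the set of columns that site can serve, and checks by brute force whether every column remains reachable after any K sites fail. The names `is_k_safe`, `placement`, and the column names are assumptions made for this example.

```python
from itertools import combinations

def is_k_safe(placement, columns, k):
    """Return True if every column survives the loss of any k sites.

    placement: dict mapping a site name to the set of columns it stores
               (a hypothetical, simplified model of overlapping projections).
    columns:   set of all columns that must stay accessible.
    """
    sites = list(placement)
    if k > len(sites):
        return False  # cannot tolerate more failures than there are sites
    for failed in combinations(sites, k):
        surviving = set()
        for site in sites:
            if site not in failed:
                surviving |= placement[site]
        if not columns <= surviving:
            return False  # some column is unreachable in this failure case
    return True

# Four sites, each column stored at exactly two of them.
placement = {
    "site1": {"name", "age"},
    "site2": {"age", "salary"},
    "site3": {"salary", "dept"},
    "site4": {"dept", "name"},
}
columns = {"name", "age", "salary", "dept"}

print(is_k_safe(placement, columns, 1))
print(is_k_safe(placement, columns, 2))
```

In this layout any single site can fail without losing a column, but losing both site1 and site4 makes `name` unreachable, so the layout is 1-safe and not 2-safe.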
It is clearly essential to perform transactional updates,
even in a read-mostly environment. Warehouses have a
need to perform on-line updates to correct errors. As well,
there is an increasing push toward real-time warehouses,
where the delay to data visibility shrinks toward zero. The
ultimate desire is on-line update to data warehouses.
Obviously, in read-mostly worlds like CRM, one needs to
perform general on-line updates.
There is a tension between providing updates and
optimizing data structures for reading. For example, in
KDB and Addamark, columns of data are maintained in
entry sequence order. This allows efficient insertion of
new data items, either in batch or transactionally, at the
end of the column. However, the cost is a less-than-optimal
retrieval structure, because most query workloads
will run faster with the data in some other order.
Conversely, storing columns in an order other than entry
sequence makes insertions difficult and expensive.
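
The trade-off can be sketched with two in-memory Python lists standing in for a single column. This is a simplified model, not how KDB, Addamark, or C-Store store data, and the function names are assumptions made for this example. Appending in entry sequence is cheap but forces a full scan for a range predicate, while keeping the column sorted allows binary search at the price of shifting elements on every insert.

```python
import bisect

def insert_entry_order(column, value):
    """Append at the end of the column: amortized O(1)."""
    column.append(value)

def insert_sorted(column, value):
    """Keep the column sorted: O(n) per insert because elements shift."""
    bisect.insort(column, value)

def count_in_range_scan(column, lo, hi):
    """Range count on an unsorted column: must examine every value."""
    return sum(1 for v in column if lo <= v <= hi)

def count_in_range_sorted(column, lo, hi):
    """Range count on a sorted column: two binary searches."""
    return bisect.bisect_right(column, hi) - bisect.bisect_left(column, lo)

entry_col, sorted_col = [], []
for v in [5, 1, 9, 3, 7]:
    insert_entry_order(entry_col, v)
    insert_sorted(sorted_col, v)

print(entry_col)   # arrival order is preserved
print(sorted_col)  # values are kept in sorted order
print(count_in_range_scan(entry_col, 3, 7),
      count_in_range_sorted(sorted_col, 3, 7))
```

Both range counts return the same answer; the difference lies only in how much of the column each one has to touch and in what each insert costs.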
C-Store approaches this dilemma from a fresh
perspective. Specifically, we combine, in a single piece of
system software, both a read-optimized column store and
an update/insert-oriented writeable store, connected by a
tuple mover, as shown in Figure 1. At the top level, there
is a small Writeable Store (WS) component, which is
architected to support high performance inserts and
updates. There is also a much larger component called the
Read-optimized Store (RS), which is capable of
supporting very large amounts of information. RS, as the
name implies, is optimized for read and supports only a
very restricted form of insert, namely the batch movement
of records from WS to RS, a task that is performed by the
tuple mover of Figure 1.
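
The division of labor among WS, RS, and the tuple mover can be sketched as a toy, single-column, in-memory model. This is an illustration under stated assumptions, not C-Store's implementation: the class `HybridStore` and its method names are hypothetical, WS is modeled as an unsorted list, RS as a sorted list, and the tuple mover as a batch merge. Transactions, deletes, and on-disk layout are ignored.

```python
import bisect
from heapq import merge

class HybridStore:
    """Toy model: a small writeable store in front of a sorted read store."""

    def __init__(self):
        self.ws = []  # Writeable Store: values in arrival order
        self.rs = []  # Read-optimized Store: values kept sorted

    def insert(self, value):
        """All inserts go to WS, so they never disturb the sorted RS."""
        self.ws.append(value)

    def move_tuples(self):
        """Tuple mover: merge the whole WS batch into RS, then empty WS."""
        batch = sorted(self.ws)
        self.rs = list(merge(self.rs, batch))
        self.ws.clear()

    def count_in_range(self, lo, hi):
        """A query must consult both stores to see every inserted value."""
        in_rs = bisect.bisect_right(self.rs, hi) - bisect.bisect_left(self.rs, lo)
        in_ws = sum(1 for v in self.ws if lo <= v <= hi)
        return in_rs + in_ws

store = HybridStore()
for v in [5, 1, 9]:
    store.insert(v)
store.move_tuples()   # batch movement from WS to RS
store.insert(3)       # a later insert lands in WS only
print(store.rs, store.ws, store.count_in_range(1, 5))
```

Because WS stays small, scanning it at query time is cheap, while the bulk of the data sits in RS, where the sort order can be exploited.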