The load utilities for data warehouses must handle much
larger data volumes than those for operational databases. There
is only a small time window (usually at night) when the
warehouse can be taken offline for a refresh. Sequential loads
can take a very long time; for example, loading a terabyte of
data can take weeks or even months. Hence, pipelined and
partitioned parallelism are typically exploited [6]. Doing a full load has the
advantage that it can be treated as a long batch transaction
that builds up a new database. While it is in progress, the
current database can still support queries; when the load
transaction commits, the current database is replaced with the
new one. Using periodic checkpoints ensures that if a failure
occurs during the load, the process can restart from the last
checkpoint.
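
The checkpoint-and-restart behavior described above can be illustrated with a minimal sketch. It is not taken from any particular load utility: the function `load_with_checkpoints`, its parameters (`checkpoint_path`, `batch_size`, `fail_at`), and the JSON checkpoint format are all hypothetical, and an in-memory list stands in for the new database being built.

```python
import json
import os
import tempfile


def load_with_checkpoints(rows, target, checkpoint_path, batch_size=3, fail_at=None):
    """Append rows to target in batches, recording progress after each batch.

    All names here are illustrative assumptions; `target` is a plain list
    standing in for the new database, and `fail_at` simulates a crash.
    """
    # Resume from the last recorded position, if a checkpoint exists.
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_row"]

    # Discard rows written after the last checkpoint (a partial batch).
    del target[start:]

    for i in range(start, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        for offset, row in enumerate(batch):
            if fail_at is not None and i + offset == fail_at:
                raise RuntimeError("simulated failure during load")
            target.append(row)

        # Checkpoint: write to a temporary file, then rename over the old one,
        # so a crash never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_row": i + len(batch)}, f)
        os.replace(tmp_path, checkpoint_path)

    return target


if __name__ == "__main__":
    rows = list(range(10))
    target = []
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "load.ckpt")
        try:
            load_with_checkpoints(rows, target, ckpt, fail_at=7)
        except RuntimeError:
            pass  # the load failed partway through the third batch
        # Restart: only the work after the last checkpoint is redone.
        load_with_checkpoints(rows, target, ckpt)
        print(target == rows)
```

In this sketch the first run checkpoints two complete batches before failing, and the restart discards the partial third batch and resumes from row 6 rather than from the beginning. A real utility would also have to make the data writes themselves durable before recording the checkpoint.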