An important part of understanding the performance of advanced storage
arrays is to look at how they manage a data pipeline. The term refers to
preloading into memory items that might be needed next so that access times
are minimized. CPU chip sets that are advertised as including L2 cache include
extra memory to pipeline data and instructions, which is why, for some
CPU-intensive jobs, a Pentium III with a large L2 cache could outperform a
Pentium 4, all other things being equal.
Pipelining algorithms are extensively implemented in many components
of modern storage hardware, especially the HBA but also in the drive controller.
These algorithms may be dumb or smart. A so-called dumb algorithm
has the controller simply read blocks physically located near the requested
blocks, on the assumption that those blocks will be the ones requested
next as part of the same operation. This tends to be a good assumption, unless
a disk is badly fragmented. A smart pipelining algorithm may be able to
access the filesystem information and preread blocks that make up the next
part of the file, whether they are nearby or not. Note that for some storage
systems, “nearby” may not mean physically near the other blocks on the disk
but rather logically near them. Blocks in the same cylinder, for example, are
not physically nearby but are logically nearby, since all of them can be read
without moving the disk heads.
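The dumb read-ahead strategy can be sketched in a few lines of Python. The names here (`DumbReadAhead`, `read_block`, `READAHEAD_COUNT`) are hypothetical, and a real controller does this in firmware against raw block addresses, but the bet it makes is the same: fetch the neighbors now and hope they are asked for next.

```python
READAHEAD_COUNT = 8  # assumed number of extra blocks to preload


class DumbReadAhead:
    """Cache that prefetches physically adjacent blocks on every miss."""

    def __init__(self, device):
        self.device = device  # any object with read_block(n) -> bytes
        self.cache = {}       # block number -> data

    def read(self, block_num):
        if block_num in self.cache:
            return self.cache[block_num]  # cache hit: no device access
        # Miss: fetch the requested block plus its physical neighbors,
        # betting that the next request will be for an adjacent block.
        for n in range(block_num, block_num + 1 + READAHEAD_COUNT):
            self.cache[n] = self.device.read_block(n)
        return self.cache[block_num]
```

A smart algorithm would differ only in how it chooses the blocks to prefetch: instead of `block_num + 1, block_num + 2, ...`, it would consult filesystem metadata to find the blocks that logically follow in the file.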
Although the combination of OS-level caching and pipelining is excellent
for reading data, writing data is a more complex process. Operating systems
are generally designed to ensure that data writes are atomic, or at least as
atomic as possible, given the actual hardware constraints. Atomic, in this case,
means “in one piece.” The atom got its name before people understood
that there was such a thing as subatomic physics, with protons, electrons,
neutrons, and so on. People thought of an atom as the smallest bit of matter,
one that could not be subdivided further.
This analogy may seem odd, but in fact it’s quite relevant. Just as atoms
are made up of protons, neutrons, and electrons, a single write operation can
involve a lot of steps. It’s important that the operating system not record the
write operation as complete until all the steps have completed. This means
waiting until the physical hardware sends an acknowledgment, or ACK, that
the write occurred.
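At the application level, this “wait for the ACK” discipline corresponds to calling fsync(), which blocks until the storage stack reports that the data has reached stable storage. A minimal sketch, assuming the hardware below does not itself send an early acknowledgment:

```python
import os


def write_durably(path, data):
    """Do not report success until the data has reached stable storage.

    os.fsync() blocks until the kernel has received the ACK from the
    storage stack for all of this file's dirty blocks.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # wait for the ACK before returning
    finally:
        os.close(fd)
```

If the process returns from `write_durably()` and then crashes, the data is on disk; without the fsync() call, it might still be sitting only in the OS cache.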
One optimization is to ACK the write immediately, even though the data
hasn’t been safely stored on disk. That’s risky, but there are some ways to make
it safer. One is to do this only for data blocks, not for directory information
and other metadata blocks whose loss would corrupt the file system. (We don’t
recommend this, but it is an option on some systems.) Another way is to keep the data
to be written in RAM that, with the help of a battery, survives reboots. Then
the ACK can be done as soon as the write is safely stored in that special
RAM. In that case, it is important that the pending blocks be written before
the RAM is removed. Tom moved such a device to a different computer, not
realizing that it was full of pending writes. Once the new computer booted up,
the pending writes were written to the disk of the unsuspecting new system,
corrupting it badly. Another type of failure might involve the hardware
itself. A failed battery that goes undetected can be a disaster after the next
power failure.
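The battery-backed write-back scheme can be sketched as follows. The names (`WriteBackCache`, `nvram`, the disk interface) are hypothetical stand-ins for what an array controller implements in firmware, but they show both the benefit and the hazard: the ACK is sent while the data is still only in the special RAM, so flush() must run before that RAM is removed or its battery dies.

```python
class WriteBackCache:
    """ACKs writes once they are in (simulated) battery-backed RAM."""

    def __init__(self, disk):
        self.disk = disk   # any object with write_block(n, data)
        self.nvram = {}    # pending writes: block number -> data

    def write(self, block_num, data):
        self.nvram[block_num] = data
        return "ACK"       # acknowledged before the data reaches the disk

    def flush(self):
        # Drain pending blocks to disk. This is the step that must
        # happen before the RAM is moved to another machine.
        for n, data in sorted(self.nvram.items()):
            self.disk.write_block(n, data)
        self.nvram.clear()
```

Moving the RAM to a different computer, as in the anecdote above, amounts to calling flush() against the wrong disk: the pending block numbers land on a device whose contents they were never meant for.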