Another issue in the choice of RAID implementations is at the level of hardware.
RAID can be implemented with no change at the hardware level, using only software
modification. Such RAID implementations are called software RAID. However,
there are significant benefits to be had by building special-purpose hardware
to support RAID, which we outline below; systems with special hardware support
are called hardware RAID systems.
Hardware RAID implementations can use nonvolatile RAM to record writes
before they are performed. In case of power failure, when the system comes back
up, it retrieves information about any incomplete writes from nonvolatile RAM
and then completes the writes. Without such hardware support, extra work needs
to be done to detect blocks that may have been partially written before power
failure (see Practice Exercise 10.3).
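
The logging idea above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular controller works: the class `MirroredPair`, the dictionary `nvram` standing in for nonvolatile RAM, and the `crash_after_copies` parameter used to simulate a power failure are all assumptions made for the example.

```python
class MirroredPair:
    """Toy two-disk mirror with a simulated nonvolatile write log (hypothetical)."""

    def __init__(self, num_blocks):
        # Each disk is modeled as a list of byte strings, one per block.
        self.disks = [[b""] * num_blocks for _ in range(2)]
        # Stand-in for nonvolatile RAM: block number -> data of a pending write.
        self.nvram = {}

    def write(self, block, data, crash_after_copies=None):
        # Record the write in NVRAM before touching any disk.
        self.nvram[block] = data
        for copies_written, disk in enumerate(self.disks):
            if crash_after_copies == copies_written:
                # Simulated power failure partway through the write.
                raise RuntimeError("simulated power failure")
            disk[block] = data
        # Every copy is on disk, so the log entry is no longer needed.
        del self.nvram[block]

    def recover(self):
        # On restart, complete every write still recorded in NVRAM.
        for block, data in list(self.nvram.items()):
            for disk in self.disks:
                disk[block] = data
            del self.nvram[block]
```

If a write is interrupted after only one copy has been updated, the two disks disagree until `recover()` is called; afterward both hold the new data and the log is empty. Without the log, the system would have to compare the copies of every block to find such inconsistencies.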
Even if all writes are completed properly, there is a small chance of a sector
in a disk becoming unreadable at some point, even though it was successfully
written earlier. Reasons for loss of data on individual sectors could range from
manufacturing defects to data corruption on a track when an adjacent track
is written repeatedly. Such loss of data that were successfully written earlier is
sometimes referred to as a latent failure, or as bit rot. When such a failure happens,
if it is detected early the data can be recovered from the remaining disks in the
RAID organization. However, if such a failure remains undetected, a single disk
failure could lead to data loss if a sector in one of the other disks has a latent
failure.
To minimize the chance of such data loss, good RAID controllers perform
scrubbing; that is, during periods when disks are idle, every sector of every disk
is read, and if any sector is found to be unreadable, the data are recovered from
the remaining disks in the RAID organization, and the sector is written back. (If
the physical sector is damaged, the disk controller would remap the logical sector
address to a different physical sector on disk.)
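
A scrubbing pass can be sketched as follows. The example assumes a single-parity organization in which the bytewise XOR of the sectors at the same position on all disks is zero, so that any one missing sector equals the XOR of the others; the function names `xor_blocks` and `scrub`, and the use of `None` to mark an unreadable sector, are assumptions made for illustration and do not describe a real controller interface.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(disks):
    """Read every sector of every disk and repair unreadable ones.

    `disks` is a list of disks, each a list of byte strings (one per sector);
    None marks a sector that can no longer be read. Returns the list of
    (disk index, sector index) positions that were rewritten.
    """
    repaired = []
    for sector in range(len(disks[0])):
        bad = [d for d, disk in enumerate(disks) if disk[sector] is None]
        if len(bad) == 1:
            d = bad[0]
            # Recover the lost sector from the remaining disks.
            others = [disk[sector] for i, disk in enumerate(disks) if i != d]
            # Write the sector back (a real disk may remap it internally).
            disks[d][sector] = xor_blocks(others)
            repaired.append((d, sector))
        elif len(bad) > 1:
            # Single parity cannot recover two lost sectors at one position.
            raise OSError(f"sector {sector} is unrecoverable")
    return repaired
```

The second branch shows why scrubbing is worth doing early: once two sectors at the same position are unreadable, whether from two latent failures or from one latent failure combined with a failed disk, single parity can no longer reconstruct the data.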