Full exploitation of a cluster hardware configuration requires some enhancements
to a single-system operating system.
FAILURE MANAGEMENT How failures are managed by a cluster depends on the
clustering method used (Table 17.2). In general, two approaches can be taken to
dealing with failures: highly available clusters and fault-tolerant clusters. A highly
available cluster offers a high probability that all resources will be in service. If a
failure occurs, such as a system going down or the loss of a disk volume, then the
queries in progress are lost. Any lost query, if retried, will be serviced by a different
computer in the cluster. However, the cluster operating system makes no guarantee
about the state of partially executed transactions; this must be handled at the
application level.
A fault-tolerant cluster ensures that all resources are always available. This is
achieved by the use of redundant shared disks and mechanisms for backing out uncommitted
transactions and committing completed transactions.
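The recovery pass described above can be sketched in miniature. In this hypothetical example (all names are illustrative), a recovering node scans a transaction log from the redundant shared disk, re-applies transactions that reached their commit record, and backs out those that did not:

```python
# Minimal sketch (names hypothetical): after a failure, the cluster replays the
# shared-disk transaction log, keeping committed transactions and backing out
# uncommitted ones.

def recover(log):
    """log: list of (txn_id, op) records, where op is 'begin', 'write', or 'commit'."""
    committed = {txn_id for txn_id, op in log if op == "commit"}
    redo, undo = [], []
    for txn_id, op in log:
        if op == "write":
            # Writes of committed transactions are re-applied (redo);
            # writes of uncommitted transactions are rolled back (undo),
            # in reverse order.
            (redo if txn_id in committed else undo).append(txn_id)
    return {"redo": redo, "undo": list(reversed(undo))}

result = recover([
    ("T1", "begin"), ("T1", "write"), ("T1", "commit"),
    ("T2", "begin"), ("T2", "write"),          # T2 never committed
])
# result -> {"redo": ["T1"], "undo": ["T2"]}
```

Real cluster file systems and database managers implement this with write-ahead logging; the sketch only shows the commit/back-out distinction the text refers to.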
The function of switching applications and data resources over from a failed
system to an alternative system in the cluster is referred to as failover. A related
function is the restoration of applications and data resources to the original system
once it has been fixed; this is referred to as failback. Failback can be automated, but
this is desirable only if the problem is truly fixed and unlikely to recur. If not, automatic
failback can cause subsequently failed resources to bounce back and forth between
computers, resulting in performance and recovery problems.
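One common guard against this bouncing is to permit automatic failback only after the repaired node has remained healthy for a full stability window. The sketch below illustrates the idea; the class name, window length, and health-reporting interface are all assumptions, not part of any particular cluster product:

```python
import time

# Hypothetical failback guard: automatic failback is allowed only after the
# original node has stayed healthy for an uninterrupted stability window,
# so a recurring fault cannot bounce resources back and forth.

STABILITY_WINDOW = 300  # seconds the repaired node must stay healthy (assumed value)

class FailbackGuard:
    def __init__(self, window=STABILITY_WINDOW, clock=time.monotonic):
        self.window = window
        self.clock = clock          # injectable clock, eases testing
        self.healthy_since = None

    def report_health(self, healthy):
        if not healthy:
            self.healthy_since = None            # any relapse restarts the clock
        elif self.healthy_since is None:
            self.healthy_since = self.clock()    # start of current healthy run

    def may_fail_back(self):
        return (self.healthy_since is not None and
                self.clock() - self.healthy_since >= self.window)
```

Production cluster managers express the same policy with tunables such as hold-down timers or resource "stickiness"; the guard above is only the minimal version of that idea.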
LOAD BALANCING A cluster requires an effective capability for balancing the load
among available computers. This includes the requirement that the cluster be incrementally
scalable. When a new computer is added to the cluster, the load-balancing
facility should automatically include this computer in scheduling applications. Middleware
mechanisms need to recognize that services can appear on different members
of the cluster and may migrate from one member to another.
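A minimal sketch of this incremental scalability, under assumed names, is a scheduler that dispatches each job to the least-loaded member and picks up a newly added node automatically, with no reconfiguration step:

```python
# Illustrative load-balancing sketch (not any specific cluster middleware):
# a new node enters the scheduling pool the moment it is added, and each
# job is dispatched to the least-loaded member.

class ClusterScheduler:
    def __init__(self):
        self.load = {}                 # node name -> number of active jobs

    def add_node(self, name):
        self.load[name] = 0            # new node is schedulable immediately

    def remove_node(self, name):
        self.load.pop(name, None)      # e.g., on failure or planned shutdown

    def dispatch(self):
        node = min(self.load, key=self.load.get)   # least-loaded node
        self.load[node] += 1
        return node
```

Real middleware must also track service migration between members, as noted above; the sketch covers only the scheduling side.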
PARALLELIZING COMPUTATION In some cases, effective use of a cluster requires
executing software from a single application in parallel. [KAPP00] lists three general
approaches to the problem:
• Parallelizing compiler: A parallelizing compiler determines, at compile time,
which parts of an application can be executed in parallel. These are then split
off to be assigned to different computers in the cluster. Performance depends
on the nature of the problem and how well the compiler is designed. In general,
such compilers are difficult to develop.
• Parallelized application: In this approach, the programmer writes the application
from the outset to run on a cluster, and uses message passing to
move data, as required, between cluster nodes. This places a high burden
on the programmer but may be the best approach for exploiting clusters
for some applications.
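The parallelized-application style can be illustrated with a toy program written from the outset to split its data across workers and exchange partial results by explicit message passing. Here separate processes stand in for cluster nodes, and a queue stands in for the interconnect; on a real cluster this role is usually played by a message-passing library such as MPI:

```python
from multiprocessing import Process, Queue

# Toy parallelized application: the data is partitioned across workers
# (stand-ins for cluster nodes), each worker computes a partial sum, and
# the partial results are sent back as messages on a queue.

def worker(chunk, results):
    results.put(sum(chunk))            # send the partial result as a message

def parallel_sum(data, nworkers=4):
    results = Queue()
    size = (len(data) + nworkers - 1) // nworkers
    procs = [Process(target=worker,
                     args=(data[i * size:(i + 1) * size], results))
             for i in range(nworkers)]
    for p in procs:
        p.start()
    total = sum(results.get() for _ in procs)   # gather the partial sums
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(100))))       # sum of 0..99 = 4950
```

The burden on the programmer is visible even at this scale: the program must decide how to partition the data, how to gather results, and how to balance work among nodes.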