In most sites, some systems or sets of systems must be available for other
systems to shut down or to boot cleanly. A machine that tries to boot when
machines and services that it relies on are not available will fail to boot
properly. Typically, the machine will boot but will fail to run some of the
programs that it usually runs on start-up. These programs might be services
that others rely on or programs that run locally on someone’s desktop. In
either case, the machine will not work properly, and it may not be apparent
why. When a machine shuts down, it may need to contact file servers,
license servers, or database servers that it is using in order to properly terminate
the link. If the machine cannot contact those servers, it may hang
for a long time or indefinitely, trying to contact those servers before completing
the shutdown process. It is important to understand and track machine
dependencies during boot-up and shutdown. You do not want to have
to figure it out for the first time when a machine room unexpectedly loses
power.
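One way to keep a boot or shutdown script from hanging on an unreachable
server is to probe each dependency with a short timeout before relying on
it. The following is a minimal sketch under stated assumptions, not a
definitive implementation: the function names, the TCP-connect probe, and
the three-second timeout are all hypothetical choices made for illustration.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    Assumption: a successful TCP connect is a good enough sign that the
    service is up. A real check might need a protocol-level handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def missing_dependencies(dependencies, probe=tcp_reachable):
    """Return the (host, port) pairs in dependencies that the probe cannot reach.

    The probe is injectable so the logic can be exercised without a network.
    """
    return [(host, port) for host, port in dependencies if not probe(host, port)]


# Hypothetical usage (host names and ports are placeholders):
#   deps = [("fileserver.example.com", 2049), ("license.example.com", 27000)]
#   down = missing_dependencies(deps)
#   if down:
#       print("Not reachable, skipping clean disconnect:", down)
```

Because each probe gives up after a fixed time, a script built this way can
report exactly which dependency is missing, rather than leaving it unclear
why the machine is not working.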
The most critical systems, such as console servers, authentication servers,
name-service machines, license servers, application servers, and data servers,
typically need to be booted before compute servers and desktops. There also
will be dependencies between the critical servers. It is vital to maintain a
boot-sequence list for all data center machines, with one or more machines
at each stage, as appropriate. Typically, the first couple of stages will have
few machines, perhaps only one each, but later stages will have
many machines. All data center machines should be booted before any
non-data-center machines, because no machine in a data center should rely on any
machine outside that data center (see Section 5.1.7).
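If the dependencies are recorded as data, the staged boot list can be
derived rather than maintained by hand. The sketch below groups machines
into stages by repeatedly taking every machine whose dependencies have
already been booted. The function name, the dictionary layout, and the
machine names are assumptions made for illustration; they do not come from
any particular site.

```python
def boot_stages(depends_on):
    """Group machines into boot stages.

    depends_on maps each machine name to the machines it needs to be up
    before it can boot cleanly. Returns a list of stages; every machine in
    a stage depends only on machines in earlier stages.
    """
    # Copy the input and add machines that appear only as dependencies.
    remaining = {machine: set(deps) for machine, deps in depends_on.items()}
    for deps in depends_on.values():
        for dep in deps:
            remaining.setdefault(dep, set())

    stages = []
    while remaining:
        # Machines with no unmet dependencies can boot in this stage.
        ready = sorted(m for m, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError(
                "dependency cycle among: " + ", ".join(sorted(remaining))
            )
        stages.append(ready)
        for machine in ready:
            del remaining[machine]
        for deps in remaining.values():
            deps.difference_update(ready)
    return stages


# Hypothetical example site.
example = {
    "console-server": [],
    "dns": ["console-server"],
    "auth": ["dns"],
    "file-server": ["dns", "auth"],
    "compute-01": ["auth", "file-server"],
    "compute-02": ["auth", "file-server"],
}

for number, stage in enumerate(boot_stages(example), start=1):
    print(number, stage)
# 1 ['console-server']
# 2 ['dns']
# 3 ['auth']
# 4 ['file-server']
# 5 ['compute-01', 'compute-02']
```

The output shows the shape described above: early stages hold a single
critical server, and the last stage holds many machines. A cycle between
critical servers raises an error, which is a useful thing to discover
before an outage rather than during one.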
One site created the shutdown/boot list as shown in Table 20.2. The
shutdown sequence is typically very close to, if not exactly the same as, the
reverse of the boot sequence. There may be one or two minor differences.
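Because the shutdown sequence is usually the boot sequence reversed, with
perhaps one or two exceptions, it can be sketched as a reversal plus a small
table of overrides. The function name and the `moves` parameter below are
hypothetical, as are the machine names; this is one possible way to record
the minor differences, not a prescribed method.

```python
def shutdown_stages(boot, moves=None):
    """Derive shutdown stages from boot stages.

    boot is a list of stages, each a list of machine names, in boot order.
    moves optionally maps a machine name to the index of the shutdown stage
    it should be placed in instead (an index into the reversed list, counted
    before any emptied stages are dropped).
    """
    # Start from the exact reverse of the boot order.
    stages = [list(stage) for stage in reversed(boot)]

    # Apply the site-specific exceptions, if any.
    for machine, target in (moves or {}).items():
        for stage in stages:
            if machine in stage:
                stage.remove(machine)
        stages[target].append(machine)

    # Drop stages left empty by a move.
    return [stage for stage in stages if stage]


# Hypothetical boot list.
boot = [["console-server"], ["dns"], ["auth"], ["compute-01", "compute-02"]]

print(shutdown_stages(boot))
# [['compute-01', 'compute-02'], ['auth'], ['dns'], ['console-server']]

# One exception: shut down "auth" in the same stage as "dns".
print(shutdown_stages(boot, moves={"auth": 2}))
# [['compute-01', 'compute-02'], ['dns', 'auth'], ['console-server']]
```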
The shutdown sequence is vital to starting work at the
beginning of the maintenance window. The machines operated on at the start
of the maintenance window typically have the most dependencies on them, so