If you found out you had to power off an entire data center, do a lot of
maintenance, then bring it all back up, would you know how to manage the
event? Some companies are lucky enough to be able to do this every quarter
or once a year. SAs delay tasks that require interruption of service, such as
hardware upgrades, parts replacement, or network changes, until this window.
Sometimes a weekly timeslot is allocated for major and risky changes to
consolidate downtime to a specific time when customers will be least affected.
Other times a full shutdown is forced on us by physical maintenance such
as construction, power or cooling upgrades, or office moves, or by an
emergency such as a failing cooling system. This
chapter describes a technique for managing such major planned outages.
Along the way, we offer tips that are useful in less dramatic settings. Projects like this require
more planning, more orderly execution, and considerably more testing than routine changes.
We call this the flight director technique, named after the role of the flight
director in NASA space launches.1