It is highly likely that at some point, you will be asked to keep service available
for a particular group during a maintenance window. It may be something
unforeseen, such as a newly discovered bug that engineering needs to work
on all weekend, or it may be a new mode of operation for a division, such
as customer support switching to 24/7 service and needing continuous access
to its systems to meet its contracts. Internet services, remote access, global
networks, and new-business pressure reduce the likelihood that a full and
complete outage will be permitted.
Planning for this requirement could involve rearchitecting some services
or introducing added layers of redundancy to the system. It may involve
making groups more autonomous or distinct from one another. These
changes to your network can be significant tasks in themselves,
likely requiring their own maintenance window; it is best to be prepared
for these requests before they arrive, or you may be left without time to
prepare.
To approach this task, find out what the customers will need to be
able to do during the maintenance window. Ask a lot of questions, and
use your knowledge of the systems to translate these needs into a set of
service-availability requirements. For example, customers will almost certainly
need name service and authentication service. They may need to be
able to print to specific printers and to exchange email within the company
or with customers. They may require access to services across wide-area
connections or across the Internet. They may need to use particular
databases; find out what those machines depend on. Look at ways to make
the database machines redundant so that they can also be properly maintained
without loss of service. Make sure that the services they depend on
are redundant. Identify what pieces of the network must be available for
the services to work. Look at ways to reduce the number of networks that
must be available by reducing the number of networks that the group uses
and locating redundant name servers, authentication servers, and print
servers on the group’s networks. Find out whether small outages are acceptable,
such as a couple of 10-minute outages for reloading network
equipment. If not, the company needs to invest in redundant network
equipment.
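
One way to turn the customers' stated needs into a list of service-availability requirements is to record what each service directly depends on and then walk those dependencies. The following is a minimal sketch in Python under stated assumptions: the service names, the dependency map, and the replica counts are hypothetical examples, not an inventory of any real site.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it directly needs.
DEPENDS_ON = {
    "support-db": ["dns", "auth", "san-a", "net-support"],
    "email": ["dns", "auth", "net-core"],
    "printing": ["dns", "print-server", "net-support"],
    "auth": ["dns", "net-core"],
    "dns": ["net-core"],
    "print-server": ["net-support"],
}


def required_components(needed, depends_on):
    """Return everything reachable from the services the group needs."""
    required = set()
    queue = deque(needed)
    while queue:
        item = queue.popleft()
        if item in required:
            continue  # already visited; also guards against dependency cycles
        required.add(item)
        queue.extend(depends_on.get(item, []))
    return required


def single_points_of_failure(required, replicas):
    """List required components that have fewer than two instances.

    Components missing from `replicas` are assumed to have one instance.
    """
    return sorted(c for c in required if replicas.get(c, 1) < 2)


if __name__ == "__main__":
    needed = ["support-db"]  # what the group says it must be able to use
    required = required_components(needed, DEPENDS_ON)
    # Hypothetical replica counts gathered from the site inventory.
    replicas = {"dns": 2, "auth": 2, "net-core": 2}
    print("Must stay available:", sorted(required))
    print("Not yet redundant:", single_points_of_failure(required, replicas))
```

The second function gives a starting list of the components that would need redundancy before the group could be kept in service through a maintenance window. The map itself still has to come from asking questions and from your own knowledge of the systems.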
Devise a detailed availability plan that describes exactly what services
and components must be available to that group. Try to simplify it by consolidating
the network topology and introducing redundant systems for those
networks. Incorporate availability planning into the master plan by ensuring
that redundant servers are not down simultaneously.
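
The check that redundant servers are never down at the same time can be automated against the master plan. The sketch below assumes the plan can be expressed as a mapping from host name to a planned outage interval; the host names, group names, and times are hypothetical. It is deliberately conservative: it flags any two members of the same redundancy group whose outages overlap, even if a third member would remain up.

```python
from datetime import datetime
from itertools import combinations


def overlapping_outages(redundancy_groups, schedule):
    """Find members of the same redundancy group with overlapping outages.

    redundancy_groups: group name -> list of hosts providing that service
    schedule: host -> (start, end) of its planned outage
    Returns a list of (group, host_a, host_b) tuples.
    """
    conflicts = []
    for group, hosts in redundancy_groups.items():
        planned = [h for h in hosts if h in schedule]
        for a, b in combinations(planned, 2):
            a_start, a_end = schedule[a]
            b_start, b_end = schedule[b]
            # Two half-open intervals overlap when each starts before the other ends.
            if a_start < b_end and b_start < a_end:
                conflicts.append((group, a, b))
    return conflicts


if __name__ == "__main__":
    # Hypothetical redundancy groups and maintenance schedule.
    groups = {"dns": ["ns1", "ns2"], "auth": ["auth1", "auth2"]}
    plan = {
        "ns1": (datetime(2024, 6, 1, 22, 0), datetime(2024, 6, 1, 23, 0)),
        "ns2": (datetime(2024, 6, 1, 22, 30), datetime(2024, 6, 1, 23, 30)),
        "auth1": (datetime(2024, 6, 1, 22, 0), datetime(2024, 6, 1, 23, 0)),
        "auth2": (datetime(2024, 6, 1, 23, 0), datetime(2024, 6, 2, 0, 0)),
    }
    for group, a, b in overlapping_outages(groups, plan):
        print(f"{group}: {a} and {b} are scheduled to be down at the same time")
```

Back-to-back outages, such as the two hypothetical authentication servers here, are not flagged because the first server is expected to be back before the second goes down. In practice you may want to leave a margin between them in case the first task overruns.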