If it isn’t monitored, it isn’t a service. Although we cover monitoring extensively
in Chapter 22, some requirements specific to monitoring storage services are
worth noting here.
A large part of being able to respond to your customers’ needs is building
an accurate model of the state of your storage systems. For each storage server,
you need to know how much space is used, how much is available, and how
much more the customer anticipates using in the next planning time frame. Set
up historical monitoring so that you can see the level of change in usage over
time, and get in the habit of tracking it regularly. Monitor storage-access traffic,
such as local read/write operations or network file access packets, to build
up a model that lets you evaluate performance. You can use this information
proactively to prevent problems and to plan for future upgrades and changes.
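Historical usage data can be turned into a rough forecast of when a volume will fill. The following Python sketch shows one simple way to do that, assuming you already collect dated samples of space used per volume. It fits a least-squares line through (day, gigabytes used) pairs and projects that line forward to the volume's capacity. The function name `days_until_full` and the sample format are hypothetical, and a straight line is a reasonable model only when growth is roughly steady.

```python
from datetime import date


def days_until_full(samples, capacity_gb):
    """Estimate days until a volume fills, from (date, used_gb) samples.

    Fits a least-squares line through the samples (hypothetical format)
    and projects it forward from the last sample to capacity_gb.
    Returns None if usage is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(d - t0).days for d, _ in samples]  # days since first sample
    ys = [used for _, used in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    # Slope of the fitted line, in GB per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    fitted_now = intercept + slope * xs[-1]
    # Clamp at zero: the trend may already exceed capacity.
    return max(0.0, (capacity_gb - fitted_now) / slope)


if __name__ == "__main__":
    history = [
        (date(2024, 1, 1), 100.0),
        (date(2024, 1, 11), 200.0),
        (date(2024, 1, 21), 300.0),
    ]
    print(days_until_full(history, 1000.0))
```

Running this against a 1,000 GB volume growing by 10 GB a day reports 70 days of headroom, which is the kind of number that feeds directly into planning the next upgrade.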
Viewing monitoring data on a per-volume basis is typical, and it is the view
that many monitoring tools support most easily. Viewing the same data by
customer group lets SAs give each group more individualized attention
and lets customers monitor their own usage.
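Most tools report per volume, so a per-group view usually means rolling those figures up yourself. A minimal Python sketch, assuming you maintain a mapping from each volume to its owning customer group (the `volume_owner` dictionary and the function name are hypothetical):

```python
from collections import defaultdict


def usage_by_group(volume_usage, volume_owner):
    """Roll per-volume (used_gb, size_gb) figures up to per-group totals.

    volume_owner is a hypothetical volume -> group mapping; volumes with
    no recorded owner are reported under "unassigned".
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # group -> [used, size]
    for volume, (used, size) in volume_usage.items():
        group = volume_owner.get(volume, "unassigned")
        totals[group][0] += used
        totals[group][1] += size
    return {
        group: {
            "used_gb": used,
            "size_gb": size,
            "percent": 100.0 * used / size if size else 0.0,
        }
        for group, (used, size) in totals.items()
    }


if __name__ == "__main__":
    report = usage_by_group(
        {"vol1": (400.0, 1000.0), "vol2": (100.0, 1000.0), "vol3": (50.0, 100.0)},
        {"vol1": "eng", "vol2": "eng", "vol3": "sales"},
    )
    for group, stats in sorted(report.items()):
        print(group, stats)
```

Keeping the ownership mapping separate from the collected data means a volume can change hands without rewriting its history.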
In addition to notifications about outages or system/service errors, you
should be alerted when a storage volume reaches a certain percentage of
utilization, or when data transfers or network response times show spikes
or troughs. Monitoring CPU usage on a dedicated file server can be extremely
useful, because ever-climbing CPU usage is one sign of file service problems
or out-of-control clients. With per-group statistics, notifications can be sent
directly to the affected customers, who can then do a better job of self-managing
their usage. Some people prefer being nagged to having space quotas
strictly enforced.
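One common way to catch spikes and troughs is to compare each new sample with the recent baseline. The Python sketch below is an illustration of that idea rather than a feature of any particular monitoring tool: it flags a sample that lies more than `k` standard deviations above or below the mean of the preceding `window` samples. The window size, the value of `k`, and the function name are assumptions to be tuned per site.

```python
import statistics


def find_anomalies(rates, window=12, k=3.0):
    """Flag samples that deviate more than k standard deviations from
    the mean of the preceding `window` samples.

    Returns a list of (index, value, "spike" | "trough") tuples.
    """
    anomalies = []
    for i in range(window, len(rates)):
        history = rates[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        deviation = rates[i] - mean
        # With a perfectly flat baseline (stdev == 0), any change is flagged.
        if deviation > k * stdev:
            anomalies.append((i, rates[i], "spike"))
        elif deviation < -k * stdev:
            anomalies.append((i, rates[i], "trough"))
    return anomalies


if __name__ == "__main__":
    baseline = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100]
    print(find_anomalies(baseline + [500, 100]))
```

A fixed multiple of the standard deviation is crude; sites with strong daily or weekly cycles may need to compare against the same hour of previous days instead.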
By implementing notification scripts with different recipients, you can
emulate having hard and soft quotas. When the volume reaches, for instance,
70 percent full, the script could notify the group or department email alias
containing the customers of that volume. If the volume continues to fill up and
reaches 80 percent full, perhaps the next notification goes to the group’s manager,
to enforce the cleanup request. It might also be copied to the helpdesk
or ticket alias so that the site’s administrators know that there might be a
request for more storage in the near future.
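The escalating notifications described above amount to a small table of thresholds and recipients. The following Python sketch assumes the 70 and 80 percent thresholds from the example; the email aliases are hypothetical placeholders for your site's own group, manager, and helpdesk addresses. It reads current utilization with the standard library's `shutil.disk_usage`.

```python
import shutil

# Hypothetical thresholds and aliases; substitute your site's own.
TIERS = [
    (70, ["vol-users@example.com"]),  # soft quota: notify the group alias
    (80, ["group-manager@example.com", "helpdesk@example.com"]),  # escalate
]


def percent_full(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def recipients_for(percent, tiers):
    """Return the recipients of the highest tier whose threshold has been
    reached, or an empty list if the volume is below every threshold."""
    selected = []
    for threshold, recipients in sorted(tiers, key=lambda tier: tier[0]):
        if percent >= threshold:
            selected = recipients
    return selected


if __name__ == "__main__":
    pct = percent_full("/")
    print(f"{pct:.1f}% full; notify: {recipients_for(pct, TIERS)}")
```

Run from cron or your monitoring system, this emulates soft and hard quotas without blocking writes; sending the message itself is left to whatever mail tooling the site already uses.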
To summarize, we recommend that you monitor the following storage-related
items: