Expiring data means deleting it. One might decide that data older than
2 years does not need to be retained at all. Alternatively, one might archive
such data to removable media—DVD or tape—in case it is ever needed.
Limiting disk space consumption by condensing the data or expiring it
affects the level of detail or historical perspective you can provide. Bear this
trade-off in mind as you look for a system for your historical data collection.
How you intend to use the data that you gather from the historical monitoring
will help to determine what level of detail you need to keep and for
how long. For example, if you are using the data for usage-based billing and
you bill monthly, you will want to keep complete details for a few years, in
case there is a customer complaint. You may then archive the data and expire
the online detailed data but save the graphs to provide online access for your
customers to reference. Alternatively, if you are simply using the graphs internally
for observing trends and predicting capacity needs, you might want
a system that keeps complete data for the past 48 hours, reasonably detailed
information for the past 2 weeks, somewhat less detailed information for the
past 2 months, and very condensed data for the previous 2 years, with everything
older than 2 years being discarded. When deciding how much to condense
the data, consider what you will use it for and how much disk space you can
spare. Ideally, the amount of condensing that the system does
and the expiration time of the data should be configurable.
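As a rough illustration of the capacity-planning example above, the retention tiers can be written down as plain configuration data. The sketch below is in Python. The age limits come from the example in the text, but the bucket widths, the 5-minute polling interval they imply, and the names RETENTION_TIERS, resolution_for, and condense are all assumptions made for illustration; they do not describe any particular monitoring product.

```python
from datetime import timedelta
from statistics import mean

# (maximum age, bucket width) pairs. The ages follow the example in the
# text; the widths are illustrative assumptions, not recommendations.
RETENTION_TIERS = [
    (timedelta(hours=48), timedelta(minutes=5)),   # complete data (assumes 5-minute polls)
    (timedelta(weeks=2), timedelta(minutes=30)),   # reasonably detailed
    (timedelta(days=60), timedelta(hours=2)),      # somewhat less detailed
    (timedelta(days=730), timedelta(days=1)),      # very condensed
]


def resolution_for(age):
    """Return the bucket width for data of this age, or None if it has expired."""
    for max_age, width in RETENTION_TIERS:
        if age <= max_age:
            return width
    return None  # older than the last tier: discard


def condense(samples, now):
    """Average (timestamp, value) samples into buckets whose width depends on age.

    Timestamps and `now` are integer seconds. Returns a sorted list of
    (bucket_start, bucket_width_seconds, mean_value); expired samples are dropped.
    """
    buckets = {}
    for ts, value in samples:
        width = resolution_for(timedelta(seconds=now - ts))
        if width is None:
            continue  # expired: older than every tier
        seconds = int(width.total_seconds())
        key = (ts - ts % seconds, seconds)
        buckets.setdefault(key, []).append(value)
    return sorted((start, seconds, mean(values))
                  for (start, seconds), values in buckets.items())
```

Keeping the tiers in a table like this is one way to make both the amount of condensing and the expiration time configurable: changing the policy means editing data rather than code.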
You also need to consider how the monitoring system gathers its data.
Typically, a system that performs historical data collection will want to poll
the systems that it monitors at regular intervals. Ideally, the polling interval
should be configurable. The polling mechanism should be able to use a standard
form of communication, such as SNMPv2, as well as the usual IP mechanisms,
such as Internet Control Message Protocol (ICMP) echoes (pings) and
opening TCP connections on any port, sending specific data down that
connection, and checking the response by using pattern matching. It
is also useful to have a monitoring system that records latency information,
that is, how long a transaction took. Latency correlates well with the end users’
experience: a service that responds very slowly is practically the same
as one that doesn’t respond at all. The monitoring system should support
as many other polling mechanisms as possible, preferably incorporating
a mechanism to feed in data from any source and parse the results from that
query. The ability to add your own tests is important, especially in highly
customized environments. On the other hand, a multitude of predefined tests
is also valuable, so that you do not need to write everything from scratch.
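The TCP polling mechanism described above (open a connection, send some data, pattern-match the response, and record the latency) can be sketched in a few lines of Python using only the standard library. The function name tcp_check, the slow_after threshold, and the 4096-byte single read are assumptions chosen for illustration; a real monitoring system would need more careful handling of partial reads and protocol details.

```python
import re
import socket
import time


def tcp_check(host, port, send_bytes, expect_pattern, timeout=5.0, slow_after=2.0):
    """Probe a TCP service and report (ok, latency_seconds).

    expect_pattern is a bytes regular expression applied to the reply.
    ok is False if the connection fails or times out, if the reply does
    not match, or if the exchange takes longer than slow_after seconds.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if send_bytes:
                sock.sendall(send_bytes)
            reply = sock.recv(4096)  # one read; assumes a short reply
    except OSError:  # refused, unreachable, or timed out
        return False, time.monotonic() - start
    latency = time.monotonic() - start
    matched = re.search(expect_pattern, reply) is not None
    # Treat a very slow answer the same as no answer at all.
    return matched and latency <= slow_after, latency
```

A call such as tcp_check("mail.example.com", 25, b"", rb"^220 ") would, under these assumptions, test whether a hypothetical mail server presents an SMTP greeting. Returning the latency alongside the pass/fail result lets the caller store it as historical data as well as alert on it.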