A real-time monitoring system tells you that a host is down, a service is not
responding, or some other problem has arisen. A real-time monitoring system
should be able to monitor everything you can think of that can indicate a
problem. The system should be able both to poll systems and applications
for status and to receive alerts directly from those systems if they detect a
problem at any time. As with historical monitoring, the system should be
able to use standard mechanisms, such as SNMPv2 polling, SNMPv2 traps,
ICMP pings, and TCP connection checks, as well as to provide a mechanism for incorporating
other forms of monitoring.
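The simplest of these standard polls is a TCP check. The minimal Python sketch below treats a service as available if a TCP handshake to its port completes within a timeout. The function name `tcp_check` and the two-second default timeout are assumptions made for illustration. ICMP pings usually need raw-socket privileges, so they are not shown here.

```python
import socket

def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; the timeout default is an assumption.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, timeouts,
        # and name-resolution failures (all are OSError subclasses).
        return False
```

A poller would call something like `tcp_check("mail.example.com", 25)` on a schedule and raise an alert when the result changes from `True` to `False`.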
The system should also be capable of sending alerts to multiple recipients,
using a variety of mechanisms, such as email, paging, telephone, and opening
trouble tickets. Alerts should go to multiple recipients because an alert sent to
one person could fail if that person’s pager or phone has died or if the person
is busy or distracted with something else.
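The fan-out idea can be sketched in a few lines of Python. In this sketch each recipient has a list of notification channels, and a failure on one channel or one recipient never prevents delivery to the others. The names `send_alert` and `Notifier` are hypothetical, and real email, paging, or ticketing integrations are assumed to be wrapped as callables that raise an exception when delivery fails.

```python
from typing import Callable, Dict, List

# A notifier is any callable that delivers a message and raises on failure.
# Concrete notifiers (email, pager, phone, trouble ticket) are assumed here.
Notifier = Callable[[str], None]

def send_alert(message: str, recipients: Dict[str, List[Notifier]]) -> Dict[str, bool]:
    """Try every notification channel for every recipient.

    Returns, per recipient, whether at least one channel succeeded.
    """
    delivered: Dict[str, bool] = {}
    for person, channels in recipients.items():
        ok = False
        for notify in channels:
            try:
                notify(message)
                ok = True
            except Exception:
                # Dead pager, mail server down, and so on: keep going
                # so the remaining channels still get a chance.
                continue
        delivered[person] = ok
    return delivered
```

Trying every channel, rather than stopping at the first success, trades some duplicate notifications for a better chance that somebody actually sees the alert.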
The storage requirements of a real-time monitoring system are minimal.
Usually, it stores the previous result of each query and the length of time
since the last status change. Sometimes, the system stores running averages
or high and low watermarks, but it rarely stores more than that. Unlike
historical monitoring, which is used for proactive system administration, real-time
monitoring is used to improve reactive system administration.
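To show how little state this requires, here is a minimal Python sketch of a per-check record. It holds the previous result, the time of the last status change, a running average, and high and low watermarks. The class name `CheckState`, the use of an exponentially weighted running average, and the 0.2 smoothing factor are illustrative assumptions, not something any particular monitoring product prescribes.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CheckState:
    """Per-check state a real-time monitor might keep (hypothetical sketch)."""
    last_status: Optional[str] = None   # previous result, e.g. "OK" or "DOWN"
    last_change: float = field(default_factory=time.time)
    avg: Optional[float] = None         # exponentially weighted running average
    low: Optional[float] = None         # low watermark
    high: Optional[float] = None        # high watermark
    alpha: float = 0.2                  # smoothing factor (assumed value)

    def update(self, status: str, value: float, now: Optional[float] = None) -> bool:
        """Record one poll result; return True if the status changed."""
        now = time.time() if now is None else now
        changed = status != self.last_status
        if changed:
            self.last_status = status
            self.last_change = now
        if self.avg is None:
            self.avg = value
        else:
            self.avg = self.alpha * value + (1 - self.alpha) * self.avg
        self.low = value if self.low is None else min(self.low, value)
        self.high = value if self.high is None else max(self.high, value)
        return changed

    def seconds_in_state(self, now: Optional[float] = None) -> float:
        """Length of time since the last status change."""
        now = time.time() if now is None else now
        return now - self.last_change
```

Nothing here grows with time: each poll overwrites a handful of fields, which is why a real-time monitor's storage needs stay small.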
When evaluating a monitoring system, look at the things that it can monitor
natively to see how well it matches your needs. Consider monitoring
monitoring both availability and capacity. Availability monitoring means
detecting failures of hosts, applications, network devices, other devices, network
interfaces, or connections of any kind. Capacity monitoring means
detecting when some component of your infrastructure becomes, or is about