Beyond Redundancy: Applying High Availability Throughout Your Organization
Monitoring, recovery, and precise logs can often do more to reduce your number of outages and limit the scope of system failures than the typical panacea: more hardware, more software, and more hot spares.
Tuesday, September 4th, 2007
Technical types are inclined to improve availability by applying technology: more hardware, more software, and more hot spares. But as Linux escapes the lab and moves into the machine room and, increasingly, the corner office, monitoring, recovery, and precise logs can often do more to reduce the number of outages, shorten the duration of each outage, and limit the scope of failures. Moreover, well-planned responses to IT “events” can significantly improve availability.
In many ways, an IT staff is like a fire department: all is quiet until a call comes in, and then everyone leaps into action. And while a server crash or network hiccup isn’t as life-threatening or dangerous as a house fire, an outage can still have dire consequences.
Let’s define an event as anything out of the ordinary that occurs in IT and that potentially or actually causes an outage. By this definition, any outage clearly involves one or more events. An event also occurs when a filesystem fills up or CPU utilization exceeds some threshold. The loss of a redundant component is an event too, even if it doesn’t cause a service outage, because it has a potential impact: it may reduce your infrastructure to a single point of failure or impair its capacity.
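As a rough illustration of this definition, the following Python sketch treats a threshold crossing as an event, whether or not it causes an outage. The `Event` record, the function names, and the 90 percent limit are assumptions made for this example, not something the article prescribes.

```python
import shutil
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical record for one out-of-the-ordinary occurrence."""
    kind: str            # e.g. "filesystem_full" or "cpu_high"
    detail: str          # human-readable description for the log
    causes_outage: bool  # False for potential-impact events


def check_threshold(kind, value, limit):
    """Return an Event if value exceeds limit, otherwise None."""
    if value > limit:
        detail = f"{value:.1f} exceeds limit {limit:.1f}"
        # A threshold crossing is an event even without an outage.
        return Event(kind, detail, causes_outage=False)
    return None


def disk_usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def collect_events(path="/", disk_limit=90.0):
    """Gather current events; the 90% limit is an assumed example value."""
    events = []
    event = check_threshold("filesystem_full", disk_usage_percent(path), disk_limit)
    if event is not None:
        events.append(event)
    return events


if __name__ == "__main__":
    for event in collect_events():
        print(event)
```

The same `check_threshold` helper could be fed a CPU utilization figure or a count of healthy redundant components; only the source of the measurement changes.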
If you can catalog the types of events that may occur in your organization and how each one may impact your IT service, you can define the right approach to monitoring. You can also ensure that you plan appropriately for how…