Not quite a ring of power, but Hobbit can take some of the pain out of machine, network and services management.
Hobbit is a tool for monitoring servers, applications, and networks, inspired by and compatible with Big Brother, but with the advantage of being open source and under active development. You install Hobbit on a central server, and a client for Hobbit on all the machines you want to monitor.
The central server collects information about the status of your computers and the applications on them, and displays it via an automatically updating Web page. You can also set up email (or text message) alerts for particular situations.
It can monitor various services (SSH, FTP, HTTP…), connectivity, disk space, CPU usage, Web sites, and a variety of other aspects of your machines and network. It’s very configurable, extendable, easy to use, and can handle large numbers of machines.
In this column, I’ll provide a quick overview of how to set it up, configure it, and use alerts.
First, identify the machine you wish to use as a central server (here I use hobbitserver.example.com). You’ll need to install a basic Apache setup (or have one already available). Hobbit is available as a Debian package for Debian or Ubuntu, or as an RPM for Fedora Core 5, so it should be very straightforward to install. (You can of course install from source if you prefer.)
All packages are available from the project’s SourceForge page, or possibly via your distribution’s repositories.
For the most basic setup (monitoring only itself), you only need to edit a single file, /etc/hobbit/bb-hosts, and update the Web server.
First, check that bb-hosts includes this line:
127.0.0.1 localhost # bbd http://localhost/ conn
Secondly, you’ll find instructions on how to set up Apache in /etc/hobbit/hobbit-apache.conf. Add the configuration as directed in this file, and reload your Web server.
Now restart the server (/etc/init.d/hobbit restart). Wait a few seconds (perhaps as much as a minute), then go to the Web page at http://localhost/hobbit/ or http://hobbitserver.example.com/hobbit/ and you should see the single line entry for localhost. It should show the Hobbit server running OK, and will also have information about disk usage, CPU usage, and connection, among other things.
There are several possible statuses. Green means that the status is OK; yellow is a warning; red means critical. Clear means that there is no data; purple means that there is no report. Blue means that the monitoring aspect is disabled.
Next, try a test client. Install the Hobbit client software (see the SourceForge page for packages) on the test client. Then add your test client machine to the configuration by editing /etc/hobbit/bb-hosts on the Hobbit server again, adding a line similar to the localhost one but with the appropriate IP address and machine name.
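As a rough sketch, bb-hosts might then contain the two entries below. The address 192.0.2.10 and the name client1 are placeholders chosen for illustration, not values from a real network.

```text
127.0.0.1 localhost # bbd http://localhost/ conn
192.0.2.10 client1 # conn
```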
You shouldn’t need to restart Hobbit on the server, although obviously you will need to make sure that the client software is running on the client. Again, wait for a minute or so, then refresh the Web page, and the client will appear.
More Advanced Configuration
Using Hobbit, you can choose what services you check. For example, if your client webserver.example.com is running a Web site at http://www.example.com, you can add that to the services to check by editing the relevant line in the bb-hosts configuration file as follows:
192.0.2.25 webserver # conn http://www.example.com
You can list as many sites as you like, separated by a space. You can also monitor SSH and NFS:
192.0.2.30 nfsserver # conn ssh rpc=mount,nlockmgr,nfs
And various other services, including FTP, Telnet, SMTP, POP3, IMAP, spamd, and SSL-enabled services. You’ll find a full list on the Hobbit Web site. You can monitor any of these services by simply adding them to the relevant client line in bb-hosts on the server running Hobbit.
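For instance, a mail server entry combining several of these checks could look like the sketch below. The host name mailserver and the address 192.0.2.40 are hypothetical, and the test names are assumed to match those defined in your Hobbit installation.

```text
192.0.2.40 mailserver # conn ssh smtp pop3 imap
```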
Specifying Danger Levels
You can specify the boundaries for alerts — what is defined as “warn” (yellow) or “panic” (red) — for individual machines, or change the defaults. You do this by editing /etc/hobbit/hobbit-clients.cfg. Here’s a sample:
HOST=client1
    DISK /data IGNORE
DEFAULT
    UP 1h
    LOAD 5.0 10.0
    DISK / 90 95
    DISK %^/local.* 99 99.5
    MEMPHYS 100 101
    MEMSWAP 50 80
    MEMACT 90 97
    LOG %.* %WARNING %COLOR=yellow
Note that the default section must be the last one in the config file. The first section tells Hobbit to ignore a particular disk, /data on client1. (Perhaps this data disk is known to be full and we don’t care about it.)
The next section sets various defaults:
- UP: sets “warn” status if the system has been up for less than 1 hour. If you add a second number, it will set “warn” status if the system has been up for longer than that.
- LOAD: warn if the load average is above 5.0, panic if it is above 10.0.
- DISK: specify the disk by name or by a regular expression, noting that any such regexp must begin with %. Again, warn and panic levels are set (as percentage disk use).
- The various memory options are taken from proc.
- LOG: The indicator will go yellow if a line in any log file contains WARNING.
You can alter these as you prefer. You’ll see a handful of other options, documentation of which is at the top of the hobbit-clients.cfg file.
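To illustrate how a per-host section and the defaults fit together, here is a minimal sketch. The host name webserver, the /var thresholds, and the second UP value are all hypothetical, and the 4w (four weeks) time suffix is an assumption to check against the comments at the top of your own hobbit-clients.cfg.

```text
HOST=webserver
    UP 1h 4w
    DISK /var 80 90

DEFAULT
    LOAD 5.0 10.0
    DISK / 90 95
```

The host-specific section comes first and DEFAULT comes last, in keeping with the ordering rule noted above.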
To monitor logfiles, you need both the section in hobbit-clients.cfg, as above, and a section in /etc/hobbit/client-local.cfg (on the Hobbit server). The clients will pick up the configuration corresponding to their hostname or operating system from client-local.cfg when they connect to the server.
The default for Linux clients is just to check /var/log/messages, and to ignore lines beginning with MARK. You may also want to add /var/log/syslog, and to ignore cronjob lines in that (cronjobs should email you if they go wrong, and there are often a great many cron lines in syslog!). The following section of client-local.cfg should do this:
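A minimal sketch of such a section, assuming the log: and ignore directives described in the client-local.cfg documentation. The 10240-byte limit and the CRON pattern are illustrative assumptions; adjust both to match your own logs.

```text
[linux]
log:/var/log/messages:10240
ignore MARK
log:/var/log/syslog:10240
ignore CRON
```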
The number at the end of the line is a maximum number of bytes that the client will send — it also only sends the last 30 minutes’ worth of data, so this is a suitable upper bound.
Email and Other Alerts
One of the most useful things about Hobbit is that you can configure it to send you email alerts in certain circumstances. To do this, you edit the /etc/hobbit/hobbit-alerts.cfg configuration file.
There are numerous options for configuring your alerts. For example, the following setup would email you if any machine (again, the regular expression must start with %) had been unavailable for 30 minutes (DURATION), would then repeat every 24 hours (1440 minutes, REPEAT) thereafter, and would email you on recovery (RECOVERED):
MAIL firstname.lastname@example.org DURATION>30 REPEAT=1440 RECOVERED
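A MAIL line only takes effect beneath a rule line that selects the hosts it applies to. A minimal sketch of the complete rule follows, assuming HOST=%.* as a pattern that matches every host; the pattern is an assumption for illustration, so substitute whichever expression suits your network.

```text
HOST=%.*
    MAIL firstname.lastname@example.org DURATION>30 REPEAT=1440 RECOVERED
```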
You can also set more than one alert. For example, you might want to email a different address (your own home address, or the on-call address), if the service went down out of hours:
MAIL email@example.com DURATION>30 REPEAT=1440 RECOVERED
MAIL firstname.lastname@example.org DURATION>30 REPEAT=1440 RECOVERED TIME=*:1800:0800
Or for a particular host, there may be different people to mail in different situations, in which case the SERVICE can be specified in the MAIL line:
MAIL email@example.com SERVICE=disk,ssh DURATION>30 REPEAT=1h RECOVERED
MAIL firstname.lastname@example.org SERVICE=http DURATION>30 REPEAT=1h RECOVERED
Here the repeat would be every hour. You can also use the SCRIPT keyword instead of MAIL, to run a particular script if there is a problem. For example, if monitoring an email service, sending an email in case of problems might not make a lot of sense! Hobbit provides various environment variables which will be passed to the script; see the documentation for further information.
This file also supports variables, so you can define a default alert setting once and then refer to it by name thereafter. To do this using the first example above:
$MAILADMIN=MAIL email@example.com REPEAT=1440 RECOVERED
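Once defined, the variable can stand in for the full recipient line inside a rule. A sketch of how that might look, assuming the variable is expanded in place and again using HOST=%.* as a hypothetical match-everything pattern:

```text
$MAILADMIN=MAIL email@example.com REPEAT=1440 RECOVERED

HOST=%.*
    $MAILADMIN
```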
Finally, you can exclude particular machines if you want. For example if you have a Windows machine that isn’t running ssh, you can avoid being warned that this is the case:
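A sketch of such a rule, assuming a hypothetical Windows host named winbox and the same match-everything HOST pattern as before. The criteria on the IGNORE line follow the HOST= and SERVICE= forms used elsewhere in this file.

```text
HOST=%.*
    IGNORE HOST=winbox SERVICE=ssh
    MAIL email@example.com DURATION>30 REPEAT=1440 RECOVERED
```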
The IGNORE keyword stops processing that section altogether at the point where it appears, so it must occur before any alerts to work properly.
Hobbit makes monitoring your services and network much more straightforward, and is a more manageable solution than the clutch of homebrew scripts that many sysadmins employ! In particular, the configurability of the alert system is excellent and makes it much easier to react in a timely fashion when problems occur. Given how easy it is to get running, it’s well worth taking the time to experiment with it.
has been playing with Linux systems for around 6 years now, after discovering that it was an excellent way to avoid Finals revision. She is currently sysadmin for the Astrophysics group at Imperial College, in London (UK), and is responsible for wrangling a Linux+Solaris network and its users.