I manage a whole number of device and servers, which are monitored by various utilities, including Nagios. I also have clients who do the same, as well as using other tools that produce notifications – build systems etc.

Nagios is the thing that tells me when my web server is unavailable, or the database has fallen over, or, more often, when my internet connection dies. I have similar setups in various client networks that I maintain.

It logs to the system log, sends me emails, and in some urgent cases, sends a ping to my phone. All very handy, but isn’t very handy for other casual users who may just want to see if things are running properly. For those users, who are somewhat non-technical, it’s a bit much to ask them to read logs, and emails often get lost.

For one of my clients we had a need to be able to collect these status updates from different sources together, make it more persistent, and make it visible in a much more accessible way than log messages (which has a very poor signal to noise ratio) or email alerts (which only go to specific people).

“Known” issues

A solution I came up with was to create a Known site for the network which can be used to log these notifications in a user friendly, chronological and searchable form.

I created an account for my Nagios process, and then, using my Known command line tools, I extended the Nagios script to use my Known site as a notification mechanism.

In commands.cfg:

Then in conf.d/contacts.cfg I extended my “Root” contact:

Finally, the script itself, which serves as a wrapper around the api tools and sets the appropriate path etc:

Consolidating rich logs

Of course, this is only just the beginning of what’s possible.

For my client, I’ve already modified their build system to post on successful builds, or build errors, with a link to the appropriate logs. This particular client was already using Known for internal communication, so this improvement was logical.

The rich content types that Known supports also raises the possibility of richer logging from a number of devices, here’s a few thoughts of some things I’ve got on my list to play with:

  • Post an image to the channel when motion is detected by a webcam pointed at the bird feeders (again, trivial to hook up – the software triggers a script when motion is detected, and all I have to do is take the resultant image and CURL it to the API)
  • Post an audio message when a voicemail is left (although that’d require me to actually set up asterisk, which has been on my list for a while now)
  • Attach debugging info & a core dump to automated test results

I might get to those at some point, but I guess my point is that APIs are cool.

8 thoughts on “Tracking system events with Nagios and @withknown

Leave a Reply