Most web servers retain a vast amount of data about their visitors in the form of log files. Other processes running on the server, like the system log, MTA log, etc., also store a raft of information.

These logs are typically retained (although often rotated at regular intervals to save space) until the admin needs to reclaim some disk space or the server is reinstalled, so from a practical standpoint that’s “forever”. This is very much part of the tech industry’s dataholic “collect everything” culture, which I’m personally trying to wean myself off.

Thing is, at first glance, retention seems like such a good idea (and limited retention can be, more on that later). You need logs to find out how your server is performing, and to work out what happened when something goes wrong. However, they’re mostly just noise, and they go stale very quickly… when was the last time you needed to look at a four-month-old Apache log file?

The reality is that the vast majority of the time you’re only really interested in the last couple of lines. Why keep the rest?

What question are you trying to answer?

Log files have their uses; they are invaluable for diagnosing specific and immediate problems along the lines of “My web site keeps giving me a white page!”, or “Why on earth won’t my firewall start?”, or “What was the last thing Apache did before it crashed?”.

However, to answer the perhaps more useful questions like “Am I seeing increased traffic?” or “Are my hard drives healthy?”, or even esoteric questions like “Did spring cleaning my server save me money?”, your raw logs really aren’t going to be much use to you.

To answer the questions you’re really interested in, you’re going to have to cook this data into something tasty.

What I do…

This is the approach I’m currently using myself, and which I’ve been recommending to my clients. Obviously you’ll need to adjust it to your specific requirements; for example, one client I had in the past had a legal requirement to retain all logs offline (of course nobody ever looked at them, but rules is rules).

  1. Retain raw logs for a day: keep your raw logs for a short period of time; this will let you get at the raw text of any error messages should anything on your server die (a sample rotation config follows this list).
  2. Run an infrastructure monitoring tool: instead of keeping raw logs, what you should be keeping is the higher-level statistical information produced by analysing your logs (and other sources) with a tool like munin. These results have all the noise (and any sensitive information) removed, and are far better at helping you diagnose problems.
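To give you an idea of step 1, a rotation rule along these lines will do the trick. This is just a minimal sketch, assuming Apache logs live under /var/log/apache2/ and that logrotate runs daily (as it does on most distributions); adjust the path and options to taste:

# Keep only a single day of raw Apache logs; anything older is discarded.
/var/log/apache2/*.log {
    daily
    rotate 1
    missingok
    notifempty
    compress
}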

Using this approach I have, among many other things, been able to:

  • Spot a failing hard drive on a customer’s server before it became a problem (because over time the frequency of errors on that specific drive was increasing).
  • Optimise caches within a feedback loop (I could match configuration changes against a corresponding increase or decrease in cached pages served).
  • Isolate the cause of an intermittent failure on a client site (by seeing what the server was doing at the time of the outage, I could tell that the MySQL query cache was becoming full, causing queries to run slowly and Apache to block).
  • Link an increasing number of errors back to a configuration change made months earlier (I had logged the time and date of the change, and could look back at my graphs to see that the problems first appeared after that point. I reverted the change and everything was A-OK).
  • …etc…

In each case the information was in the raw logs, but good luck trying to find it.

There are many tools out there that can help you, but the basic principle is the same – process your logs into a more usable statistical form from which it’s easy to gain insights, and ditch the unnecessary raw logs, which are mostly noise.
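To make that concrete, here’s roughly what the simplest possible “cooking” looks like: a few lines of Python that boil an Apache access log down to requests per hour. This is just a sketch, assuming the common log format with the timestamp in square brackets:

import re
from collections import Counter

# Grab the "[10/Oct/2011:13:55:36 +0000]" style timestamp from each line,
# truncated to the hour, and tally the requests.
TIMESTAMP = re.compile(r"\[(\d+/\w+/\d+:\d+)")

counts = Counter()
with open("access.log") as log:
    for line in log:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1

# Hours sort lexically here, which is fine for eyeballing a single day.
for hour, hits in sorted(counts.items()):
    print("%s: %d requests" % (hour, hits))

A tool like munin does essentially this continuously, and draws you the graphs for free.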

So, last week I wrote about reading data off a Current Cost EnviR. As you might have predicted from my recent posts, sure as day follows night, I wrote a munin plugin for it.

This plugin, written in Python, allows you to graph both energy usage and temperature over time. This is handy, since it gives you a far more detailed historic record of your household’s power usage than the limited, low-resolution bar charts on the device itself.
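If you’ve never looked inside a munin plugin, the protocol is pleasantly simple: munin runs the plugin with the argument config to learn how to draw the graph, and with no arguments to fetch the current values. The outline below is an illustrative stand-in rather than the actual plugin; it assumes the EnviR shows up as /dev/ttyUSB0 at 57600 baud and emits XML messages containing <watts> and <tmpr> elements (check your own setup):

#!/usr/bin/env python
# Hypothetical sketch of a Current Cost EnviR munin plugin (not the real one).
import re
import sys

import serial  # pyserial

DEVICE = "/dev/ttyUSB0"  # assumed device node; adjust for your system

def config():
    # Munin calls the plugin with "config" to learn the graph's labels.
    print("graph_title Current Cost EnviR")
    print("graph_category sensors")
    print("power.label Power (W)")
    print("temp.label Temperature (C)")

def fetch():
    # The meter periodically emits one XML message per line; read one
    # and pick out the wattage and temperature readings.
    port = serial.Serial(DEVICE, 57600, timeout=10)
    line = port.readline().decode("ascii", "ignore")
    port.close()
    watts = re.search(r"<watts>(\d+)</watts>", line)
    tmpr = re.search(r"<tmpr>([\d.]+)</tmpr>", line)
    if watts:
        print("power.value %d" % int(watts.group(1)))
    if tmpr:
        print("temp.value %s" % tmpr.group(1))

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        config()
    else:
        fetch()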

Usage details are in the project Readme; the code is in the usual place.

» Visit the project on GitHub…

Ok, so some of the regular readers of this blog will sense a bit of a theme in my recent posts, and get the feeling that I’m essentially trying to graph the world.

Guilty.

Anyway, I get my home internet through Be. The ADSL line around here was a little flaky some time ago, and after going through the third splitter in as many weeks I got a BT engineer out to sort out the local switching equipment. Things are working fine and dandy now; however, I thought it would be cool to keep an eye on the ADSL modem’s key stats, just in case I have any more problems.

Getting started

In the third-party contributor repository there is a plugin called BeBoxSync, written by Alex Dekker. The documentation for this plugin seems to have existed only on his website, which is no longer available and has expired from Google’s cache.

The plugin consists of two Perl scripts which use an expect script to pull information from the ADSL modem via telnet. The original scripts may work out of the box for you; however, for my BeBox (a Thomson TG585v7 modem running software version 8.2.7.7) I needed to make some changes to get them working. Basically, I needed the expect script to probe for extended information (which is no longer provided by the plain adsl info command), and to look in a slightly different part of the output for some of the required data.
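For the curious, all the expect script is really doing is automating a telnet session: log in, run a command, capture the output. Here’s a rough Python equivalent, purely for illustration (using the venerable telnetlib module; the prompts, account name, and exact command syntax are assumptions and will vary between firmwares):

# Illustrative Python stand-in for the expect script (the real plugin
# uses expect + Perl).
import telnetlib

HOST = "192.168.1.254"   # the modem's address (same as its web interface)
USER = b"Administrator"  # assumed default admin account
PASSWORD = b"secret"     # your admin password here

tn = telnetlib.Telnet(HOST, 23, timeout=10)
tn.read_until(b"Username : ")
tn.write(USER + b"\r\n")
tn.read_until(b"Password : ")
tn.write(PASSWORD + b"\r\n")
tn.read_until(b"=>")                       # the Thomson CLI prompt
tn.write(b"adsl info expand=enabled\r\n")  # extended stats probe (syntax varies)
print(tn.read_until(b"=>").decode("ascii", "ignore"))
tn.write(b"exit\r\n")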

My modifications are available on GitHub, and the originals are here. I’d suggest you try my version first, as the originals haven’t been maintained for quite some time.

Installation

First, you must install expect. Expect is a little tool that lets you script interactive sessions such as telnet. It is often installed by default, but it’s considered rather oldskool these days, so it may not be (it wasn’t on my Debian 6 server)…

apt-get install expect

Next, after you have downloaded the scripts to somewhere sensible, you will need to make the following modifications:

  • Edit beboxstats.expect and enter the IP address of your modem and your administrator password in the appropriate places.
  • Edit beboxstats and enter the absolute path to the beboxstats.expect script.
  • Edit beboxsync and do the same.
  • Ensure all three scripts are executable by the munin user.

You can test that things are working by executing each script from the terminal. You should see a whole bunch of data about your modem when executing the expect script, and the values for each key field in munin plugin format when executing the munin scripts. Check these against the values on your modem’s stats page (default: http://192.168.1.254/cgi/b/dsl/dt/?be=0&l0=1&l1=0) to verify that the correct values are being reported.
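In case you’ve not seen it before, “munin plugin format” just means one fieldname.value pair per line. The field names below are made up for illustration, but the munin scripts’ output should look something like:

attenuation_down.value 35.5
noisemargin_down.value 9.0
power_up.value 12.0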

Finally, link to your scripts from within your munin plugins directory in the usual way. If things are working, you should see some new graphs appear in your “network” section.
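If “the usual way” is unfamiliar: on a typical setup that means symlinking and restarting munin-node. The source path below is an arbitrary assumption; adjust to wherever you put the scripts.

ln -s /usr/local/share/munin-plugins/beboxstats /etc/munin/plugins/beboxstats
ln -s /usr/local/share/munin-plugins/beboxsync /etc/munin/plugins/beboxsync
/etc/init.d/munin-node restart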

[Graphs: beboxsync (day) and beboxstats (day)]

As you can see from these graphs, I captured a sharp drop in line quality with a corresponding drop in bandwidth. Still investigating the cause…