Further to my last post, I’d like to introduce a working implementation of the stats-gathering mechanism, built on Etsy’s StatsD and Node.js.

StatsD is a Node.js stats server created by the people at Etsy to provide a simple way of logging useful statistics from software. These statistics are invaluable for monitoring your application’s performance, measuring the impact of software changes, and diagnosing faults.
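StatsD listens for plain-text metrics over UDP (port 8125 by default). Each metric is one line in the form name:value|type, where c marks a counter and ms a timer in milliseconds. Known itself is PHP, but as a language-neutral illustration, here is a minimal Python sketch of a client that formats and sends those lines. `format_metric`, `send_metric` and the `known.*` metric names are invented for the example and are not part of the plugin.

```python
import socket

# StatsD's default UDP port; change it if your server listens elsewhere.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name: str, value: int, metric_type: str) -> str:
    """Build one StatsD line: <name>:<value>|<type>, e.g. 'known.page.views:1|c'."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, addr: tuple = STATSD_ADDR) -> None:
    """Send a single metric line as one UDP datagram."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(line.encode("utf-8"), addr)
    except OSError:
        # Stats are best-effort: a missing or unreachable server must never
        # break the page that is being measured.
        pass


# Usage: a counter and a timer (hypothetical metric names).
send_metric(format_metric("known.page.views", 1, "c"))
send_metric(format_metric("known.page.build_ms", 42, "ms"))
```

Because UDP is fire-and-forget, the application never waits on StatsD. A stopped server costs nothing more than lost data points.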

This plugin gives you an overview of what is happening in your Known install by logging important system-level activity: events, errors, exceptions and so on. This gives you a very clear idea of how your Known network is performing, and lets you quickly see the effect that changes have on your users.

Installation

  • Install Node.js, either from GitHub or via your OS’s package manager
  • Install StatsD
  • Optionally (but highly recommended), install a Graphite server to visualise the graphs
  • Place this plugin in IdnoPlugins/StatsD
  • Add the following to your config.ini

Optionally, you can specify one or more of the following extra options (although the defaults are usually fine):

statsd_samplerate is useful on very busy systems (see StatsD’s notes on the subject). In a nutshell, setting it to something like 0.1 (capturing one in every 10 count or timer events) helps if you find StatsD is being overloaded.
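Sampling happens on the client. With a rate of 0.1, roughly one call in ten is sent, and the line is tagged |@0.1 so that the StatsD server scales counts back up by 1/0.1 = 10. Here is a minimal Python sketch of that logic, assuming a hypothetical `sampled_line` helper; the plugin’s own code may differ.

```python
import random
from typing import Callable, Optional


def sampled_line(name: str, value: int, metric_type: str, rate: float,
                 rand: Callable[[], float] = random.random) -> Optional[str]:
    """Return a StatsD line for roughly `rate` of calls, or None if this call is skipped."""
    if not 0 < rate <= 1:
        raise ValueError("sample rate must be in (0, 1]")
    if rate == 1:
        # No sampling: every event is sent and no @rate suffix is needed.
        return f"{name}:{value}|{metric_type}"
    if rand() >= rate:
        # Dropped on the client, so nothing reaches the network or StatsD.
        return None
    # The @rate suffix tells StatsD to scale counts up by 1 / rate.
    return f"{name}:{value}|{metric_type}|@{rate}"


# Roughly 1,000 of these 10,000 calls produce a line.
rng = random.Random(0)
lines = [sampled_line("known.events", 1, "c", 0.1, rng.random) for _ in range(10_000)]
sent = [line for line in lines if line is not None]
```

Passing `rand` explicitly, as with `random.Random(0).random` above, makes the sampling reproducible in tests.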

If everything is working, you should now be happily graphing some useful stats.

» Visit the project on GitHub...

I have recently been doing a lot of development work for a very large Known installation. It is highly customised, has many active users doing creative things with the platform, and uses many of Known’s more advanced features, often in quite unexpected ways.

As with everything built by humans, sometimes things go wrong, and this is especially true of something as complicated as software. Simply waiting for a user to report a fault, and for that report to bubble up through IT and management, is poor customer service. The gap between a fault being found and the report being received is often measured in weeks. The report itself is often misleading or missing crucial information, so yet more time is spent clarifying the fault.

So I’ve recently been exploring ways to handle issues more proactively, and to collect objective data rather than rely on subjective fault reports.

Crash reports

One thing I’ve been exploring is a simple mechanism whereby Known will send an email to one or more addresses when a fatal error or exception occurs. This email contains the details of the error, as well as who was logged in at the time.
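The general shape of such a report is easy to sketch. The Python example below builds, but does not send, an email containing the exception’s traceback and the logged-in user. `build_crash_report`, its parameters and the sender address are hypothetical. Known’s real implementation is PHP and will differ in detail.

```python
import traceback
from email.message import EmailMessage
from typing import Iterable, Optional


def build_crash_report(exc: BaseException, user: Optional[str],
                       recipients: Iterable[str],
                       sender: str = "known@example.com") -> EmailMessage:
    """Build (but do not send) a crash-report email for an uncaught exception."""
    text = str(exc)
    # Header values must not contain newlines, so keep only the first line.
    summary = text.splitlines()[0] if text else ""
    msg = EmailMessage()
    msg["Subject"] = f"Crash report: {type(exc).__name__}: {summary}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # The full traceback goes in the body, together with who was logged in.
    details = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    who = user if user else "(no user logged in)"
    msg.set_content(f"Logged-in user: {who}\n\n{details}")
    return msg


# Usage: capture an exception and build the report for two addresses.
try:
    1 / 0
except ZeroDivisionError as err:
    report = build_crash_report(err, "alice", ["ops@example.com", "dev@example.com"])
```

Sending it is then a matter of handing the message to an SMTP client, for instance `smtplib.SMTP(host).send_message(report)` in Python.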

You can try this for yourself if you’re tracking Known’s master branch by adding the following to your config.ini:

This is great for when something blows up, but problems are often much more subtle than that. For example, what if a change you’ve made increases page load times for certain users? How would you track back to find out when it started, and which change might have caused it?

Running stats and health metrics

In the latest master build, I’ve added a mechanism for collecting useful metrics from a running system: page build time, event tracking, error counts and so on. Properly analysed, these give a much more useful overview of the system’s general health.

I’m a big fan of graphs over logs for this sort of thing.

Currently the stats handler is a dummy that throws this information away. However, it would be a simple matter to extend it to use something like RRDTool or StatsD, with Graphite on top to generate the graphs.
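Keeping the handler behind a small interface is what makes that swap simple: calling code records metrics the same way whether the data is discarded or forwarded. Here is a Python sketch under that assumption. `NullStats` plays the dummy and `StatsDStats` is a UDP-backed replacement; both classes and their `increment`/`timing` methods are hypothetical, not Known’s API.

```python
import socket
from typing import Callable, Optional


class NullStats:
    """The 'dummy' handler: accepts every call and throws the data away."""

    def increment(self, name: str, count: int = 1) -> None:
        pass

    def timing(self, name: str, ms: float) -> None:
        pass


class StatsDStats:
    """Same two methods, but forwards each metric to StatsD as a UDP line."""

    def __init__(self, host: str = "127.0.0.1", port: int = 8125,
                 prefix: str = "known.",
                 send: Optional[Callable[[str], None]] = None) -> None:
        self._addr = (host, port)
        self._prefix = prefix
        # `send` is injectable so the handler can be tested without a server.
        self._send = send if send is not None else self._udp_send

    def _udp_send(self, line: str) -> None:
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(line.encode("utf-8"), self._addr)
        except OSError:
            pass  # metrics are best-effort; never fail the request over them

    def increment(self, name: str, count: int = 1) -> None:
        self._send(f"{self._prefix}{name}:{count}|c")

    def timing(self, name: str, ms: float) -> None:
        # Rounded to whole milliseconds for the |ms timer type.
        self._send(f"{self._prefix}{name}:{round(ms)}|ms")


# Usage: capture the lines instead of sending them.
captured = []
stats = StatsDStats(send=captured.append)
stats.increment("events.saved")
stats.timing("page.build", 41.6)
```

Choosing the backend then becomes a single configuration decision, such as `stats = StatsDStats() if statsd_enabled else NullStats()` with a hypothetical `statsd_enabled` flag. The code that calls `increment` or `timing` does not change.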

Recording your stats

If you’re a plugin writer, you can push your own statistics using the same tool, e.g.:
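As a rough illustration of the kind of statistic a plugin might push, here is a Python sketch of a timing helper. It measures a block of code and passes the elapsed milliseconds to a recorder callback. `timed` and `record` are hypothetical stand-ins for whatever timer method your stats handler exposes, not Known’s API.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed(name: str, record: Callable[[str, float], None]) -> Iterator[None]:
    """Time the enclosed block and report the elapsed milliseconds under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so slow failures are recorded too.
        record(name, (time.perf_counter() - start) * 1000.0)


# Usage: collect samples in a list; a real plugin would call its stats handler.
samples = []
with timed("myplugin.feed.fetch", lambda n, ms: samples.append((n, ms))):
    time.sleep(0.01)  # stand-in for the work being measured
```

Reporting in a `finally` clause means that slow requests ending in an error still show up in the timings, which is often exactly where the interesting data is.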

Give it a play!

Sometimes it is desirable to execute actions in the background and at periodic intervals. Building on last week’s post, I want to spotlight a new feature that uses the asynchronous event queue to let you do exactly this: the periodic execution (cron) service.

Once you’ve completed the configuration step that enables the Asynchronous Event Queue, you can run the Known console periodic execution service:

Once running, this service periodically triggers events that your code can listen for. The available events are cron/minute, cron/hourly and cron/daily.
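Conceptually, the service is a loop that wakes up once a minute and triggers whichever events are due, while plugins register listeners for the events they care about. The Python sketch below models that idea with a tiny publish/subscribe dispatcher. `EventBus`, `cron_events`, `tick` and the wall-clock scheduling rule (hourly on the hour, daily at midnight) are assumptions for illustration, not how Known’s console service is actually implemented.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, DefaultDict, List


class EventBus:
    """Tiny publish/subscribe dispatcher (hypothetical, not Known's event API)."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[str, List[Callable[[], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[], None]) -> None:
        self._listeners[event].append(handler)

    def trigger(self, event: str) -> None:
        # Listeners run synchronously here; the real service goes through
        # the asynchronous event queue instead.
        for handler in self._listeners[event]:
            handler()


def cron_events(now: datetime) -> List[str]:
    """Events due at this minute: every minute, on the hour, and at midnight."""
    due = ["cron/minute"]
    if now.minute == 0:
        due.append("cron/hourly")
        if now.hour == 0:
            due.append("cron/daily")
    return due


def tick(bus: EventBus, now: datetime) -> None:
    """One pass of the service loop; a real service would call this once a minute."""
    for event in cron_events(now):
        bus.trigger(event)


# Usage: a plugin registers a nightly job, then the loop ticks at midnight.
ran = []
bus = EventBus()
bus.on("cron/daily", lambda: ran.append("nightly cleanup"))
tick(bus, datetime(2024, 1, 1, 0, 0))
```

Separating "which events are due" from "who is listening" keeps plugins unaware of the scheduler: a plugin only subscribes to cron/daily, and the loop decides when that fires.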