Two-factor authentication (also known as 2FA) is a mechanism that adds extra security to website accounts by requiring a special one-time code in addition to a username and password.

This code is typically generated by a hardware dongle or your phone, meaning that you must not only know the password, but also physically have the code generator.

I thought it would be cool if Known had this capability, and so I wrote a plugin to implement it!

How it works

Once the plugin is installed and activated by the admin user, each user will be able to enable two-factor authentication through a menu on their settings page.

Enabling two-factor authentication generates a secret code, which a program such as Google Authenticator uses to generate time-limited access tokens. To make setup easier, the plugin also generates a QR code which the authenticator app can scan.

From then on, when you log in, you will get an extra screen which will prompt you for a code.

Enter the code produced by your authenticator and you will be given access!
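
For the curious, here is a rough sketch of how time-limited codes of this kind are usually derived. It follows the TOTP scheme (RFC 6238) that Google Authenticator implements, and it is not the plugin's actual code: the function names, the 30 second step, the one-step tolerance and the "Known" issuer label are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional
from urllib.parse import quote


def totp(secret_b32: str, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Return the RFC 6238 code for a Base32 secret at a given time."""
    key = base64.b32decode(secret_b32.upper())
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of whole time steps since the epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def verify(secret_b32: str, code: str, at_time: Optional[float] = None,
           window: int = 1, step: int = 30) -> bool:
    """Accept the current code or one from an adjacent step (clock drift)."""
    now = time.time() if at_time is None else at_time
    for drift in range(-window, window + 1):
        moment = now + drift * step
        if moment < 0:
            continue  # no time step exists before the epoch
        expected = totp(secret_b32, moment, step)
        if hmac.compare_digest(expected.encode(), code.encode()):
            return True
    return False


def provisioning_uri(secret_b32: str, account: str,
                     issuer: str = "Known") -> str:
    """Build the otpauth:// URI that a setup QR code typically encodes."""
    label = quote(f"{issuer}:{account}")
    return f"otpauth://totp/{label}?secret={secret_b32}&issuer={quote(issuer)}"
```

Calling `verify(secret, code)` at login with the user's stored secret and the code they typed is the whole check; the shared secret itself never travels over the wire after setup.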

» Visit the project on GitHub...

password_strength

This is just a quick post to nudge you towards a little plugin I wrote for Known which enforces a minimum password strength for user passwords.

The plugin works by calculating the entropy of the password based on NIST recommendations, and rejecting passwords where the entropy is too low.

By default, the minimum entropy is 44 bits, but this can be changed through a configuration setting.
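
To give a feel for the kind of calculation involved, here is a sketch of the entropy heuristic from the older NIST SP 800-63 guidance (Appendix A). The plugin's own calculation may well differ: the function names are made up, and the optional dictionary-check bonus from the NIST table is left out.

```python
def estimate_entropy_bits(password: str) -> float:
    """Estimate password entropy using the NIST SP 800-63 Appendix A rules."""
    bits = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            bits += 4      # first character
        elif position <= 8:
            bits += 2      # characters 2 to 8
        elif position <= 20:
            bits += 1.5    # characters 9 to 20
        else:
            bits += 1      # character 21 onwards
    has_upper = any(c.isupper() for c in password)
    has_non_alpha = any(not c.isalpha() for c in password)
    if has_upper and has_non_alpha:
        bits += 6          # bonus for mixing upper case and non-alphabetic
    return bits


def is_strong_enough(password: str, minimum_bits: float = 44) -> bool:
    """Reject passwords whose estimated entropy is below the threshold."""
    return estimate_entropy_bits(password) >= minimum_bits
```

Under this heuristic length counts for far more than clever substitutions: `Password1` scores 25.5 bits, while a 30 character lower-case passphrase scores 46.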

For this plugin to work, until my pull request is merged into the core code, you’ll need to apply patches available from my password validation branch.

Anyway, give it a kick about!

» Visit the project on GitHub...

Image “Password Strength” by XKCD

closed_worksforme

The majority of web servers retain a vast amount of data about their visitors in the form of log files. Other processes running on the server, like the system log, MTA log, etc, also store a raft of information.

These logs are typically retained (although often rotated at regular intervals to save space) until the admin wants to reclaim some disk space or the server is reinstalled, which from a practical standpoint means “forever”. This is very much part of the tech industry’s dataholic “collect everything” culture, which I’m personally trying to wean myself off.

Thing is, at first glance, retention seems like such a good idea (and limited retention can be, more on that later). You need logs to find out how your server is performing, and what if something goes wrong? However, they’re mostly just noise, and they go stale very quickly… when was the last time you needed to look at a four-month-old Apache log file?

The reality is that the vast majority of the time you’re only really interested in the last couple of lines. Why keep the rest?

What question are you trying to answer?

Log files have their use; they are invaluable for diagnosing specific and immediate problems along the lines of “My web site keeps giving me a white page!”, or “Why on earth won’t my firewall start?”, or “What was the last thing Apache did before it crashed?”.

However, to answer the perhaps more useful questions like “Am I seeing increased traffic?” or “Are my hard drives healthy?”, or even esoteric questions like “Did spring cleaning my server save me money?”, your raw logs really aren’t going to be much use to you.

To answer the questions you’re really interested in, you’re going to have to cook this data into something tasty.

What I do…

This is the approach I’m currently using for myself, and which I’ve been recommending to my clients. Obviously you need to adjust this based on specific requirements; for example, one client I had in the past had a legal requirement to retain all logs offline (of course nobody ever looked at them, but rules is rules).

  1. Retain raw logs for a day: keep your raw logs for a short period of time. This will let you get at the raw text of any error messages should anything on your server die.
  2. Run an infrastructure monitoring tool: instead of keeping raw logs, what you should be keeping is the higher level statistical information that a tool like Munin produces by analysing your logs (and other sources). These results have all the noise (and any sensitive information) removed, and are far better at helping you diagnose problems.
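
The one-day retention in step 1 could be enforced with something like the sketch below. In practice a tool such as logrotate normally does this job; the `*.log*` file pattern, the function name and the dry-run default are assumptions for illustration only.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_logs(log_dir: str, max_age_seconds: float = 86400,
                   now: Optional[float] = None,
                   dry_run: bool = True) -> List[Path]:
    """Find (and optionally delete) log files last modified before the cutoff."""
    cutoff = (time.time() if now is None else now) - max_age_seconds
    expired = []
    for path in sorted(Path(log_dir).glob("*.log*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            expired.append(path)
            if not dry_run:
                path.unlink()  # only delete when explicitly asked to
    return expired
```

Running it with `dry_run=True` first shows what would go before anything is actually removed.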

Using this approach I was in the past able to, among many other things:

  • Spot a failing hard drive on a customer’s server before it became a problem (because over time the frequency of errors on that specific drive was increasing).
  • Optimise caches within a feedback loop (I could track configuration changes with a corresponding increase or decrease in cached pages served).
  • Isolate the cause of an intermittent failure on a client site (by seeing what the server was doing at the time of the outage, I could see that the MySQL query cache was becoming full, causing queries to run slowly and Apache to block).
  • Link an increasing number of errors back to a configuration change made months ago (I had logged the time and date of the config change, and could look back at my graphs to see that I first started seeing problems after this time. Reverted the change and everything was a-ok).
  • …etc…

In each case the information was in the raw logs, but good luck trying to find it.

There are many tools out there that can help you, but the basic principle is the same: process your logs into a more usable statistical form from which it’s easy to gain insights, and ditch the unnecessary raw logs, which are mostly noise.
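
As a toy illustration of that principle, the sketch below boils web server access log lines down to hourly request and status-class counts. It assumes the common Apache access log layout, and it is not how Munin works internally; the regular expression and function name are my own for this example.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches the timestamp, request and status code of a common-format log line.
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" (?P<status>\d{3}) '
)


def summarise(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count requests and status classes (2xx, 4xx, ...) per hour."""
    summary: Dict[str, Counter] = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip anything that is not an access log entry
        bucket = f"{match['day']} {match['hour']}:00"
        counts = summary.setdefault(bucket, Counter())
        counts["requests"] += 1
        counts[match["status"][0] + "xx"] += 1
    return summary
```

The summary contains no IP addresses or URLs, so it can be kept for as long as you like while the raw lines it came from are thrown away.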