Pondering: Using Git to add history to backups

February 17, 2012 Marcus Povey

tl;dr: Use Git to selectively add revision history to existing rsync backups.

As anyone who has used computers for any length of time knows, hardware failure and data loss are inevitable. Having gone through several hard disk crashes, UPS and power failures and random acts of sleepless root console typing, I have become especially paranoid about backups.

This is especially true since so much of my living is derived from owning and maintaining a fully functional computer setup.

My current setup has important files rsynced between my home server and an offsite server nightly.

Large files which change infrequently (photos and videos) are encrypted and stored remotely on my S3 account, and really vital (but not security sensitive) stuff that I may need on the go is stored in a Dropbox folder that I can get at from my laptop, iPad and phone.

What this doesn’t do

While the rsync approach has many benefits (primarily simplicity of implementation), its limitation is that you get a warts-and-all copy of whatever you’re backing up. There is no history inherent in the backup unless you set this up yourself.

Mirrored backups are fine if you’re sure the current version is always the one you’re going to want. But consider these fairly common situations where it fails (they have all happened to me at one point or another):

You delete a new file by accident before the nightly backup has run.
You delete a file, but discover some months later that you really didn’t want to.
You change a configuration setting only to discover some time down the line that this was dumb.
…etc…

What I need here is a version control system…

Enter Git

First off, I should underline that Git is NOT a backup tool! It doesn’t store file ownership or full permissions (only the executable bit), empty directories, or any number of other things that are fairly important to backups, which makes it unsuitable for system-wide backups across multiple users. Version control systems are also fairly inefficient when it comes to handling binary files.

However, used wisely I think it could be perfect to add a version control layer over my existing backup (for critical locations like webserver config and business documents folders).

For one thing, it can exist entirely in place – i.e. it doesn’t need a remote server to work, although conceivably one could be added to add even more resilience to the system (I already have a private git server set up for work projects, so this would be relatively painless).

Files in these repositories could then be edited, modified and saved in the normal way. I would have to commit the changes, but the nightly backup script could easily be modified to commit any uncommitted changes (with an appropriate “nightly backup yyyymmdd” message).
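As a rough idea of what that extra step in the nightly script might look like, here is a minimal sketch in Python. It assumes git is on the PATH and that the folder is already a repository with a committer identity configured; the function names (backup_message, git, commit_pending_changes) are made up for illustration and are not part of my actual backup script.

```python
import datetime
import subprocess
from pathlib import Path
from typing import Optional


def backup_message(day: datetime.date) -> str:
    """Build the "nightly backup yyyymmdd" commit message."""
    return f"nightly backup {day:%Y%m%d}"


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def commit_pending_changes(repo: Path, day: Optional[datetime.date] = None) -> bool:
    """Commit anything uncommitted in `repo`; return True if a commit was made."""
    day = day or datetime.date.today()
    # Stage new, modified and deleted files alike.
    git(repo, "add", "--all")
    # --porcelain prints one line per changed path and nothing when clean.
    if not git(repo, "status", "--porcelain").strip():
        return False
    git(repo, "commit", "--quiet", "-m", backup_message(day))
    return True
```

The nightly job would call something like commit_pending_changes(Path("/path/to/documents")) (a placeholder path) before the rsync run, so the .git directory with the fresh commit is mirrored offsite along with everything else. Returning False on a clean tree avoids filling the history with empty commits.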

This would, I think, be the simplest way to add a revision history and rollback capability to the existing rsync backups.
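To make the rollback side a little more concrete, here is a hedged sketch of how an older version of a file could be pulled back out of such a repository. It only wraps two standard git commands (git log and git show); the helper names revisions_touching and file_at_revision are hypothetical, and git is again assumed to be on the PATH.

```python
import subprocess
from pathlib import Path
from typing import List


def revisions_touching(repo: Path, relative_path: str) -> List[str]:
    """Return the hashes of commits that changed the file, newest first."""
    result = subprocess.run(
        ["git", "-C", str(repo), "log", "--format=%H", "--", relative_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


def file_at_revision(repo: Path, revision: str, relative_path: str) -> bytes:
    """Return the file's content as it was recorded in `revision`."""
    # `git show <rev>:<path>` prints the stored blob; the path is taken
    # relative to the top of the repository.
    result = subprocess.run(
        ["git", "-C", str(repo), "show", f"{revision}:{relative_path}"],
        check=True,
        capture_output=True,
    )
    return result.stdout
```

Picking an entry from revisions_touching and passing it to file_at_revision gives back the bytes of that older version, which can then be written wherever it is needed. Because the content is returned rather than checked out in place, the current working copy is left untouched.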

With all this in mind, can anyone think of a good reason why I don’t go into my documents folder and type git init?
