Backups


Philosophy

There's an old saying in the computer industry that there are two types of people in the world -- those who have lost data and those who will. Drives will fail, so it's important to have good backups.

Over the years, I've gone through several backup strategies:

  1. Taunting God by not making backups at all.
  2. Backing up to a floppy-based tape drive using the ftape driver for Linux. These tapes have limited capacity, so I could only back up critical data. Also, tape drives that attach to the floppy controller are so archaic that people who use them get laughed at by just about everyone.
  3. Backing up everything to a large disk using flexbackup. At least, I thought I was backing up everything. When I once had to do a complete restore, I discovered that flexbackup leaves out the files you tell it to ignore (like /tmp) by not backing up directories at all. As a result, a complete restore misses empty directories and restores nonempty directories with the ownership and permissions of the restoring user. On top of that, I had to patch a few things to make it work better.
  4. Writing my own software to backup to disk.

Here are the goals of my backup software:

Using backup.pl

My backup software has a very small footprint. It's one Perl script, plus a configuration file for the system running the backups. On my systems, I use a local "exclude file" on each system being backed up to list the directories that GNU tar should exclude (reading excluded pathnames from a file is a built-in feature of GNU tar, so no special programming is required for this). Other than that, you need GNU tar and bzip2, plus ssh if you want to back up remote systems.
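As a rough illustration of the tar-plus-ssh approach, here is a minimal Python sketch (not code from backup.pl) that builds the command line for backing up one filesystem. The function name build_tar_command is hypothetical, and the --one-file-system flag is an assumption suggested by the configuration listing / and /boot as separate filesystems; --create, --bzip2, --exclude-from and --file=- are standard GNU tar options. For a remote host, tar runs on that machine through ssh, so the exclude file path is resolved on the remote system, which fits the one-exclude-file-per-system setup described above.

```python
import shlex
from typing import List, Optional


def build_tar_command(
    path: str,
    exclude_file: str,
    host: Optional[str] = None,
    identity_file: Optional[str] = None,
) -> List[str]:
    """Return an argv list that writes a bzip2-compressed tar archive of
    `path` to stdout, skipping every pathname listed in `exclude_file`.

    Hypothetical helper for illustration. If `host` is given, tar is run on
    that machine via ssh and the archive streams back over the connection.
    """
    tar = [
        "tar",
        "--create",
        "--bzip2",
        "--one-file-system",               # assumption: stay on this one filesystem
        f"--exclude-from={exclude_file}",  # GNU tar reads excluded names from this file
        "--file=-",                        # write the archive to standard output
        path,
    ]
    if host is None:
        return tar
    ssh = ["ssh"]
    if identity_file:
        ssh += ["-i", identity_file]
    # ssh hands the remote command to a shell, so quote each tar argument.
    return ssh + [host, shlex.join(tar)]
```

The caller would then redirect the command's standard output into a file under the backup directory, for example with subprocess.run(cmd, stdout=f, check=True).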

Here's the command line syntax:

    USAGE: ./backup.pl [ --config <cfgfile> ] [ --all ] <filesystem> [ <filesystem> .. ]

Here's my configuration file, so you can see how to configure the backups:

    SyslogFacility = user
    SshIdentityFile = /root/.ssh/identity.flexbackup
    BackupDirectory = /backup
    StateDirectory = /var/state/backup
    ExcludeFile = /etc/backup/exclude
    Level = 0
    Filesystems = / /boot rhinosaur:/ outshined:/ outshined:/boot mailman:/ mailman:/var
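The configuration format above is simple enough to parse in a few lines. This Python sketch uses hypothetical helpers (parse_config, parse_filesystems), not backup.pl's own parser. It assumes that every non-blank line has the form Key = value, and that a Filesystems entry written as host:/path names a filesystem on a remote host while a bare path is local.

```python
from typing import Dict, List, Optional, Tuple


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'Key = value' lines into a dict (hypothetical helper)."""
    config: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {line!r}")
        config[key.strip()] = value.strip()
    return config


def parse_filesystems(spec: str) -> List[Tuple[Optional[str], str]]:
    """Split a Filesystems value into (host, path) pairs.

    'outshined:/boot' becomes ('outshined', '/boot'); a bare '/boot' is
    treated as local and becomes (None, '/boot').
    """
    result: List[Tuple[Optional[str], str]] = []
    for item in spec.split():
        host, sep, path = item.partition(":")
        result.append((host, path) if sep else (None, item))
    return result
```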

And here's an explanation of each of those options:

Offsite Backups

There's a lot of important data on the systems I back up. In the event of a catastrophe (house burns down, hard drives seized by federal agents, etc.), I don't want to lose my backups along with my data, so I periodically burn images of the most recent backups onto DVD media.

I thought this would be simple, but there were several obstacles to getting the backups onto DVDs with as little fuss as possible. First, a DVD holds only about 4.3 GB of data and a full set of my backups is around 9 GB, so some of the files have to be split across multiple DVDs. I also learned that mkisofs won't handle files larger than 2 GB, so any file larger than 2 GB has to be split into smaller pieces. And since it would be easy for someone to walk off with my DVDs and read all my data from the backups, the backup files need to be encrypted with GPG before they are burned to DVD and taken offsite.
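The size arithmetic works out like this: about 9 GB of backups at about 4.3 GB per disc needs three DVDs. Below is a minimal Python sketch of one way to plan the images. It is not necessarily how offsite.pl does it. It walks the files in order, cutting each into pieces of at most 2 GB and also cutting a piece wherever the current image fills up, so a file can continue on the next disc. The function name plan_images, the (name, offset, length) piece representation and the size constants are assumptions for illustration. The sizes passed in should be those of the files that actually go on disc.

```python
from typing import List, Tuple

GB = 1024 ** 3
MAX_PIECE = 2 * GB          # assumed per-file limit for mkisofs
MAX_IMAGE = int(4.3 * GB)   # assumed usable capacity of one DVD

Piece = Tuple[str, int, int]  # (file name, byte offset, length)


def plan_images(files: List[Tuple[str, int]],
                max_image: int = MAX_IMAGE,
                max_piece: int = MAX_PIECE) -> List[List[Piece]]:
    """Assign byte ranges of each (name, size) file to disc images, in order.

    Every piece is at most max_piece bytes. A file that doesn't fit in the
    space left on the current image is cut at the image boundary and
    continued on the next image. Zero-length files produce no pieces.
    """
    images: List[List[Piece]] = [[]]
    free = max_image
    for name, size in files:
        offset = 0
        while offset < size:
            if free == 0:  # current image is full: start a new one
                images.append([])
                free = max_image
            length = min(size - offset, max_piece, free)
            images[-1].append((name, offset, length))
            offset += length
            free -= length
    return images if images[0] else []
```

For example, a single 9 GB archive, plan_images([("root.tar.bz2", 9 * GB)]), is planned as three images: two full ones and a small third one.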

To do all this, I wrote offsite.pl. It does all the splitting and encrypting automatically, and even figures out which backups to take offsite (it takes the most recent level 0 and level 1 of every filesystem). It can also be used for CD-ROM images by adjusting the maximum size of one image, but my backups would take up too many CDs.
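Choosing which backups to take offsite amounts to keeping the newest level 0 and the newest level 1 archive for each filesystem. The Python sketch below assumes a hypothetical Backup record (filesystem, level, timestamp, path). How offsite.pl actually finds and dates its archives isn't described here, so treat this only as an illustration of the selection rule.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Backup(NamedTuple):
    """Hypothetical record describing one archive on the backup disk."""
    filesystem: str   # e.g. "rhinosaur:/"
    level: int        # backup level; 0 is a full backup
    timestamp: int    # when the backup was made, e.g. Unix time
    path: str         # archive file under the backup directory


def select_offsite(backups: Iterable[Backup]) -> List[Backup]:
    """Return the newest level 0 and newest level 1 backup of each filesystem."""
    newest: Dict[Tuple[str, int], Backup] = {}
    for b in backups:
        if b.level not in (0, 1):
            continue  # only levels 0 and 1 go offsite
        key = (b.filesystem, b.level)
        if key not in newest or b.timestamp > newest[key].timestamp:
            newest[key] = b
    # Sort for a stable, readable order: by filesystem, then level.
    return sorted(newest.values(), key=lambda b: (b.filesystem, b.level))
```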

Getting the Software

You can grab a copy of backup.pl and offsite.pl from my CVS repository. Both scripts have embedded POD documentation, and HTML versions of the documentation for backup.pl and offsite.pl are also available.