Automatic Backups with
Describes a simple method to build and operate a server that maintains copies of specified client file trees, using the efficient rsync tool to capture changes, secure shell (ssh) so that network connections need not be trusted, and keychain to allow ssh keys to be used in unattended operation.
- Why another
- Configuring the Server
- Configuring the Clients
- Copying Manually
- A Word on Strategy
- Automating It
- What About Windows Clients?
- What About Macintosh Clients?
rsync is a tool to replicate files between two locations, typically on separate hosts connected by a network. It uses a clever algorithm to detect differences in files so that only the differences need be transferred, making regular backups efficient and fast.
The manpage and the wealth of documentation that comes up in a Google™ search can daunt the reader who simply wants backups, because most of it discusses other uses of rsync (for example, running a file server, essentially a more efficient ftp archive, and mirroring websites). Also, much of the guidance on using ssh in scripts proposes using a key with a null passphrase, a Bad Practice.
I waded through a lot of verbiage before I understood how to do what I want. In fact, it’s simple and straightforward. I wrote this to save others the time and bother.
- Establish a server to maintain copies of file trees owned by various users on any number of computers.
- Use rsync over ssh to replicate the file trees.
- Automate the process so the backups take place without user intervention, using keychain to manage the ssh keys.
- 1 unused computer
- I had a 333MHz Pentium II with 128 MB RAM. I understand rsync does take more than trivial amounts of processing power and RAM, but I don’t have numbers (if you do, please share).
- 1 modest-sized disk for the operating system
- I had a 1.2GB drive lying around, but half that would do easily. In fact, there’s no particular reason you couldn’t put the OS on the data disk, assuming you can boot from it.
- 1 big honkin’ disk
- I got a 250GB monster, several times what I think I need, but we all know how disk space needs grow [Update: just a few years later, space is running low; on the other hand, terabyte disks are now commonplace].
- Good ol’ Debian GNU/Linux
- Naturally, you’re welcome to use your favorite distribution of your favorite OS, as long as you can get keychain for it. If you’re using GNU/Linux, I recommend Debian (stable) for any production server. This has the advantage of being almost trivially easy to update without breaking your applications, since Debian stable installs only fixes and security patches and never upgrades to a functionally different version of a package.
Configuring the Server
- Install the operating system and essential packages. For those using Debian GNU/Linux (stable), I can offer some tips:
- Since I build such servers often, I’ve assembled a “standard” set of configuration files and scripts, my Debian HostConfig Kit (see Resources, below) to make it easy to set up a new box. Unless you’re planning to run other services on the box, you can omit most of the application packages and the desktop environment.
- Manage all configuration files, scripts, and documentation using a configuration management tool like CVS. Typically, the CVS root won’t be on the same server; if it is, be sure to back it up.
- If needed, build a custom kernel; in my case, when I first built it I needed to enable the ATA133 controller so I could see all of the big honkin’ disk, but more recent stock kernels work just fine.
- Install an ext3 filesystem on the big honkin’ disk.
- Create a mount point /bkp and mount the big honkin’ disk. Don’t forget to add an entry to the filesystem table (fstab):
/dev/hde1 /bkp ext3 defaults 0 2
- Make a directory under /bkp for each client machine; in my case I have /bkp/pikachu/, and so forth.
- Be sure to install ssh on the server.
- Create users and groups as needed, preferably the way you have them on the client boxes.
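The disk and directory steps above might be sketched as a pair of shell helpers. This is a minimal sketch rather than the author's own script: the function names `fstab_entry` and `make_client_dirs` are hypothetical, and the device and client names are simply the ones used as examples in this article.

```shell
#!/bin/sh
# Hypothetical helpers for preparing the backup area (names are illustrative).

# Print an fstab line for an ext3 backup disk.
# Usage: fstab_entry DEVICE MOUNTPOINT
fstab_entry() {
    printf '%s %s ext3 defaults 0 2\n' "$1" "$2"
}

# Create one directory per client under the backup root.
# Usage: make_client_dirs ROOT CLIENT...
make_client_dirs() {
    root=$1
    shift
    for client in "$@"; do
        mkdir -p "$root/$client" || return 1
    done
}
```

For example, `fstab_entry /dev/hde1 /bkp` prints the entry for the disk described here, and `make_client_dirs /bkp mononoke pikachu` creates the per-client directories when run by a user who can write to /bkp.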
Configuring the Clients
Install rsync, ssh, and keychain on each client. Then, for each user that will be running rsync:
- Log in as that user.
- Generate ssh keys, specifying passphrase(s) when prompted:
ssh-keygen -t rsa
ssh-keygen -t dsa
- Define an alias to start ssh-agent and load the ssh keys.
For Bourne-type shells (zsh…):
alias gokeychain="keychain --nogui $HOME/.ssh/id_rsa $HOME/.ssh/id_dsa"
For C-type shells:
alias gokeychain keychain --nogui ~/.ssh/id_rsa ~/.ssh/id_dsa
- Execute the alias. keychain will start the agent and prompt for the passphrase(s). After that, the keys will be in memory until you explicitly remove them or reboot the machine.
- Install the public keys in the corresponding authorized_keys file on the server. For example, if I intend to copy files using user account ted on server nox, the keys go in ted’s $HOME/.ssh/authorized_keys file there.
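Installing a public key on the server can be sketched as a small helper that appends the key only if it is not already there. The function name `install_pubkey` is hypothetical, and the permission settings assume sshd is running with its default StrictModes behaviour; treat this as an illustration, not the only way to do it.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file
# unless that exact line is already present. Run it on the server.
# Usage: install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
install_pubkey() {
    pubkey=$1
    authfile=$2
    authdir=$(dirname "$authfile")

    # With StrictModes (the default), sshd rejects keys kept in
    # group- or world-writable places, so keep both private to the owner.
    mkdir -p "$authdir" || return 1
    chmod 700 "$authdir" || return 1
    touch "$authfile" || return 1
    chmod 600 "$authfile" || return 1

    # grep: -q quiet, -x whole line, -F fixed string, -f patterns from file.
    if ! grep -qxFf "$pubkey" "$authfile"; then
        cat "$pubkey" >> "$authfile"
    fi
}
```

As user ted on nox, with the key already copied to /tmp, this would be called as `install_pubkey /tmp/id_rsa.pub "$HOME/.ssh/authorized_keys"`. Running it twice leaves a single copy of the key.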
To reload the keys, typically after rebooting the machine, for each user that will be running rsync:
- Log in as that user.
- Execute the alias (what I’ve called gokeychain).
Copying Manually
- Look over the rsync manpage, but ignore all the stuff about running rsync in daemon mode; that’s for a public service, essentially a more efficient ftp server, and doesn’t encrypt the traffic. In particular, examine the command-line options to rsync and identify the ones you need for your situation.
- For each client, back up the file trees you need. For example, to back up my files on mononoke, I might run (as regular user ted):
rsync -av --delete --delete-excluded
- -a means “archive”, equivalent to -rlptgoD. It’s a quick way to say you want recursion and want to preserve the file attributes as they are on the client.
- -v is “verbose”; you can drop that once you’re confident it’s all working. While you’re testing, you might add --progress to give you more info.
- --delete means delete stuff from the target if it’s gone from the source.
- --delete-excluded goes even further and also deletes anything on the target that matches your exclude patterns, so the target holds only what you’re asking it to back up.
- --exclude is just what it sounds like; pick the patterns that work for you, using the “EXCLUDE PATTERNS” section of the manpage for guidance on specifying patterns. See the note on strategy, below.
- The source argument specifies the directory tree(s) under /home that I wish to replicate. See the note on strategy, below.
- The target argument says log into the server (nox) as user ted and put all this stuff under that client’s directory in /bkp.
- The first time you run the command, it will take considerable time to copy everything to the server. When you repeat the command after that it’s much quicker—it takes a while to deliberate about what needs to be transferred, then transfers only the files, or parts of files, that have changed.
- Notes on users and permissions:
- The user on the client machine obviously needs read access to all the files and directories to be copied.
- The user on the server needs to be able to write the files. You may need to add the server user to various groups in order to achieve this. It’s a good idea to have a consistent set of user and group IDs on the server and all clients.
- You could also run the backups as the root user on the client, the server, or both, eliminating all permission issues, but raising other issues when you automate the process (I’m uncomfortable having root’s ssh keys in memory).
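Before the first real run, it can help to preview what rsync would do. The sketch below only prints a command line that adds rsync's `--dry-run` option, which reports what would be copied or deleted without changing anything. The function name `preview_cmd` is hypothetical, and the layout (source relative to /home, target under /bkp on the server) is assumed from the examples in this article.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a preview command for one user's tree.
# --dry-run makes rsync list its intended actions without performing them.
# Usage: preview_cmd USER SERVER CLIENT
preview_cmd() {
    user=$1
    server=$2
    client=$3
    printf 'rsync -e ssh -av --dry-run --delete --delete-excluded %s %s@%s:/bkp/%s\n' \
        "$user" "$user" "$server" "$client"
}
```

Calling `preview_cmd ted nox mononoke` prints a line that can be pasted into a shell after `cd /home`. Once the listing looks right, drop `--dry-run` to do the actual copy.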
A Word on Strategy
It’s tempting to exclude just obvious files (like “cache”) and then explicitly include the directories I want to back up. For a single, manual backup, this is ok, but for automated backups, this is a poor strategy; if a user adds a new top-level directory on one of the clients, it won’t get backed up unless I explicitly add it to the script. This violates my “no user intervention” objective.
A better approach is to specify all directories with a * (or by naming the parent directory) and then add an --exclude clause for each tree that I don’t want. This way, any new directory gets backed up automatically.
Of course, there are exceptions. For example, suppose we’re certain that all valuable stuff gets placed only in certain subdirectories and never in the parent and that, furthermore, the parent accumulates lots of files and directories whose names might not be predictable. In such a case, it makes sense to start in the parent directory and specify the directories we want, knowing that any newly-added stuff we care about will always be in one of them. That’s easier than running a separate rsync for each subdirectory, or trying to keep up with excluding files in the parent that come and go.
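One way to keep an exclude-based strategy manageable is to hold the patterns in a file and pass it to rsync with `--exclude-from=FILE`, which reads one pattern per line. The helper name `write_excludes`, the file name, and the patterns shown are all illustrative assumptions, not choices made by this article.

```shell
#!/bin/sh
# Hypothetical helper: write one exclude pattern per line to a file that
# rsync can read with --exclude-from=FILE.
# Usage: write_excludes FILE PATTERN...
write_excludes() {
    file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Example use (commented out so that nothing runs when this file is sourced):
#   write_excludes "$HOME/.backup-excludes" '*[Cc]ache*' 'tmp/'
#   cd /home && rsync -e ssh -a --delete --delete-excluded \
#       --exclude-from="$HOME/.backup-excludes" ted ted@nox:/bkp/mononoke
```

Because the parent directory is named as the source and only the unwanted trees are listed, a newly created directory is picked up without touching the pattern file.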
Automating It
To save yourself drudgery and error, put it all into a script. Let the script build the command based on the user and the client hostname. Put the script somewhere where each user can execute it, like /usr/local/bin/syncfiles.sh on each client. It should look something like this (by default, any user on any host will back up that user’s home directory, but you can add case clauses for particular users and hosts):
# Replicate file trees to server using rsync
# Usage: sh syncfiles.sh
# or call from cron (make sure ssh key is loaded beforehand)
# Local user name must match remote (on server) user name
keychain $HOME/.ssh/id_rsa $HOME/.ssh/id_dsa
#rsync -e ssh -av --delete --delete-excluded \
rsync -e ssh -a --delete --delete-excluded \
    $Excludes $User $User@nox:/bkp/$Host
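A fuller version of such a script, with the case clauses for particular users and hosts, might look like the sketch below. Everything not shown in the fragment above is an assumption: the function names `choose_trees` and `syncfiles`, the `BASE` and `RSYNC` variables, the default exclude pattern, the sample `ted@mononoke` clause, and the location of the file in which keychain saves the agent's settings (`$HOME/.keychain/<hostname>-sh`).

```shell
#!/bin/sh
# Sketch of a syncfiles.sh-style script. All defaults here are assumptions.
SERVER=nox                  # backup server name used in this article
BASE=${BASE:-/home}         # parent of the trees to copy
RSYNC=${RSYNC:-rsync}       # overridable, e.g. for previewing the command

# Set Sources (relative to $BASE) and Excludes for USER on HOST.
choose_trees() {
    user=$1
    host=$2
    Sources=$user                     # default: that user's home directory
    Excludes='--exclude=.cache/'      # illustrative default pattern
    case "$user@$host" in
        ted@mononoke)                 # hypothetical per-user, per-host clause
            Excludes="$Excludes --exclude=tmp/"
            ;;
    esac
}

# Back up USER's trees from HOST to the server.
syncfiles() {
    choose_trees "$1" "$2"
    # Assumption: keychain saved the agent's settings in this file.
    agent_env="$HOME/.keychain/$(uname -n)-sh"
    if [ -f "$agent_env" ]; then
        . "$agent_env"
    fi
    cd "$BASE" || return 1
    # Excludes and Sources are unquoted so they split into separate words;
    # patterns containing spaces or wildcards would need different handling.
    $RSYNC -e ssh -a --delete --delete-excluded \
        $Excludes $Sources "$user@$SERVER:/bkp/$host"
}
```

To use it as a script, end the file with a call such as `syncfiles "$(id -un)" "$(uname -n)"`. The host name passed in must match the directory name created for that client on the server.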
Once you’ve decided what to back up, decide when and how often.
For a laptop used mainly by a single user, connecting to the LAN intermittently, it may be sufficient to execute /usr/local/bin/syncfiles.sh manually from time to time.
For hosts that reside on the LAN, or that have multiple users, it makes sense to schedule the rsync operations with cron, with a separate crontab entry for each user. For example, ted’s crontab entry might look like this:
# Back up files to Nox nightly at 03:36 AM:
36 3 * * * sh /usr/local/bin/syncfiles.sh
Since catastrophes rarely happen, a painless automated backup system that quietly and reliably does its job can lull us into forgetting the whole point of doing backups in the first place: restoring our data. Make a point of running some test restores when you first start making backups, so you can note any surprises. Ideally, you should run a test restore on a regular basis.
Note: Don’t use scp to restore; you’ll have problems with links. As far as I can tell, scp doesn’t understand links and simply treats them as regular files or directories. This will make duplicates and, in the worst case, can make endless loops (for example, if a symbolic link points to a parent directory).
To restore using rsync, just reverse the procedure and omit the --exclude and --delete options (unless we don’t want to restore everything we backed up). For example, if we backed up the contents of the /home/ted directory with these commands:
rsync -e ssh -av --delete --delete-excluded
then to restore them we use these commands:
rsync -e ssh -av ted@nox:/bkp/mononoke/ted .
Here, ted@nox:/bkp/mononoke/ted is the source and . (the current directory) is the target. Note that on our client either the directory /home/ted must already exist or we must have permissions to create it.
We can also restore individual files, which probably happens more often than a catastrophic loss of entire directories or disks:
rsync -e ssh -av ted@nox:/bkp/mononoke/ted/recipes/fondue.html .
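A test restore is easier to judge if the restored copy is compared with the live tree. The sketch below wraps `diff -r` for that purpose. The function name `same_tree` and the scratch directory in the commented example are hypothetical, and the comparison covers file contents only, not ownership or permissions.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two directory trees have the same
# files with the same contents, e.g. a live tree and a scratch restore.
# Usage: same_tree DIR_A DIR_B
same_tree() {
    diff -r "$1" "$2" > /dev/null
}

# Example use (commented out so that nothing runs when this file is sourced):
#   mkdir /tmp/restore-test && cd /tmp/restore-test
#   rsync -e ssh -a ted@nox:/bkp/mononoke/ted/recipes .
#   same_tree /home/ted/recipes /tmp/restore-test/recipes && echo "restore OK"
```

Files that changed since the last backup will show up as differences, so a mismatch is a prompt to look closer rather than proof that the backup is bad.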
What About Windows Clients?
Yes, it’s possible to run rsync from Windows clients, should you have users thus afflicted.
- If they happen to be running Cygwin (see Resources, below), just follow the same instructions as for GNU/Unix, above. Cygwin includes all the packages you need.
- As an interim solution, run a Samba server, encourage the Windows users to keep their important data on their shares (“network drives”), and back up the Samba server with rsync.
- If you don’t want to install a full-blown Cygwin just for backups, note that the Cygwin installer lets you set up a minimal system and then just add the tools and libraries you need. Geoff Breach, in Sys Admin magazine (see Resources, below), describes such an approach (skip down to “Installing Win32 Client Software”). If I were to go to all that trouble, I would also add keychain rather than use a key with a null passphrase.
Since I’m lazy, I was delighted to find that ITeF!x (see Resources, below) combined rsync and elements of Cygwin to build cwRsync, distributed as a single “Installer” file for Windows. That’s the approach I describe here. Its only drawback is the use of a null passphrase (the author says he plans to add support for keychain), but the easy setup makes it the best method I’ve found so far for Windows. I’ve tested it on Windows 98 and Windows XP.
- Download cwRsync from the ITeF!x site.
- Unzip and run the installer; you can omit the rsync server unless you want it for some other reason. Assume cwrsync is installed in C:\Program Files\cwrsync in the following examples. In DOS batch scripts we’ll write this as C:\progra~1\cwrsync since scripts have trouble with embedded spaces.
- Be sure the Windows box can find the backup server; if you don’t have a local DNS server, then put the backup server’s name and IP address in the hosts file (\windows\hosts in Win9x, \windows\system32\drivers\etc\hosts in WinXP), or just use the backup server’s IP address in place of its name.
- For each user on the Windows machine:
- Establish a “home directory”; ssh will use this location to maintain keys and the known_hosts list. Assume our user is “ebenezer” with home directory C:\home\ebenezer in the following examples.
- As the user, in a “DOS command window”, generate ssh keys with, alas, null passphrases:
c:
c:\progra~1\cwrsync\ssh-keygen -t rsa -N "" -f .ssh\id_rsa
c:\progra~1\cwrsync\ssh-keygen -t dsa -N "" -f .ssh\id_dsa
- By any means available, copy the public keys (here, id_rsa.pub and id_dsa.pub) to the backup server and append them to the user’s $HOME/.ssh/authorized_keys file. A simple way is to use rsync interactively; since the keys aren’t yet installed, it will prompt for a password. Assuming you’ve copied them to the /tmp directory, on the backup server:
cat /tmp/id_rsa.pub /tmp/id_dsa.pub >> /home/ebenezer/.ssh/authorized_keys
- Copy the script template C:\Program Files\cwrsync\cwrsync.cmd to the user’s home directory, changing the extension to .bat for Win9x (WinXP will run either form). Use Windows Explorer or just type:
copy C:\progra~1\cwrsync\cwrsync.cmd C:\home\ebenezer\cwrsync.bat
- Edit the script. It sets some environment variables, then executes the actual rsync commands, and finally resets the environment variables (not sure why, but I’m no DOS expert). Note that the primitive DOS command-line feature in Windows has a limited line length and doesn’t support continuation lines, so a typical rsync command line would be too long. As a workaround, use environment variables; that also makes it all more readable. You should end up with something like this:
REM CWRSYNC.CMD - Batch file to start your rsync command (s).
REM By Tevfik K. (http://itefix.no/itefix-en)
REM ** CUSTOMIZE ** Enter your rsync command(s) here
SET RSYNCCMD=rsync -e %CWRSYNCHOME%\ssh -av --delete --delete-excluded
SET EXCLUDES=--exclude "[Tt]emp" --exclude "RECYCLE[DR]"
SET EXCLUDES=%EXCLUDES% --exclude '*[Cc]ache' --exclude '[Cc]ache*'
SET EXCLUDES=%EXCLUDES% --exclude 'Temporary Internet Files'
SET DIRS=dev docs home mp3 "My Documents" ssh
echo Backing up from C: drive: %DIRS%
%RSYNCCMD% %EXCLUDES% %DIRS% %REMOTE%/c
echo Backing up from D: drive: [all]
%RSYNCCMD% %EXCLUDES% * %REMOTE%/d
The rsync options and arguments are the same as before. Environment variables are invoked with %varname%. Note that we’re handling each lettered disk drive separately. Note also that, for the C: drive, we’re explicitly specifying subdirectories to back up, since we put all the stuff we care about inside them and the root directory tends to accumulate garbage we don’t need. This is the exception mentioned in the note on strategy, above. But beware: if you add a subdirectory to C: that needs backing up, you’ll have to edit the script.
- In the user’s home directory, where the batch script is, make a shortcut.
- In Win9x, the script may get “Out of environment space” errors. To fix, in Windows Explorer, right click on the shortcut file, select Properties, select the Memory tab, and change the Initial environment setting to something other than Auto (1024 worked for me).
- Test the script by double-clicking the shortcut. As before, the first time you run the command, it will take considerable time to copy everything to the server. When you repeat the command after that it’s much quicker.
- If you want a shortcut on the desktop or elsewhere, copy the shortcut you just made.
- Windows has a “Task Scheduler” (Start/Programs/Accessories/System Tools/Scheduled Tasks) with which you can run the backup script automatically. Be sure you point it to the shortcut (the .pif file) rather than the batch script so that you get the environment space setting.
What About Macintosh Clients?
I understand MacOSX is a FreeBSD derivative, so presumably you can follow the same instructions as for GNU/Unix, above, probably with some adjustments. Previous versions of MacOS may or may not support rsync. This exhausts my knowledge of the Macintosh world. If someone kindly points out any documentation that would help Mac users use rsync, I’ll be happy to link to it.
Can’t leave well-enough alone? Possibilities abound:
- Add another big honkin’ disk as a RAID mirror of the first.
- Add another big honkin’ disk and use rsync to mirror the first.
- Use a tape drive or DVD burner to make archival backups, now that everything you want to preserve is in one place.
- Use rsync to replicate the big honkin’ disk on a counterpart that’s geographically removed.