
rsync

Rsync is great for maintaining an exact copy (mirror) of one or more directories on another machine. After the initial copy, it's quick and easy to maintain because only new files and block-level changes are transferred. In addition to mirroring, rsync can be a backup tool: old versions of files or directories can be kept and recovered. While rsync is a command line tool, there are a number of backup tools that use rsync or the rsync libraries. Using rsync as a daemon has some interesting possibilities but seems best suited to public (ftp-like) servers.

rsync over ssh

The following covers basic mirroring of a local directory (Mail) to a remote directory. If the remote directory doesn't exist, it will be created. If a file already exists, only the differences (if any) will be transferred. Rsync uses ssh as its remote shell by default, so I've deleted the unnecessary "-e ssh" switch from the following command lines.

   $ rsync -av Mail 192.168.0.7:

Note that I am in the root of my home directory and by default I send to the root of my home directory on the remote, i.e. /home/dave/Mail is rsync'd to /home/dave/Mail on the remote. I could also have been more specific:

   $ rsync -av /home/dave/Mail 192.168.0.7:/home/dave

and

   $ rsync -av /home/dave/Mail/ 192.168.0.7:/home/dave/Mail

do exactly the same thing: the first copies the directory Mail into the directory dave, the second copies the contents of Mail (that's what the trailing slash in Mail/ means) into the directory Mail. The argument order is always source, then destination. To mirror the remote locally:

   $ rsync -av 192.168.0.7:/home/dave/Mail/ /home/dave/Mail

or

   $ rsync -av 192.168.0.7:Mail/ Mail

If a full path isn't specified, the local path is relative to the directory you are running the command from (typically your user home directory). The remote path is relative to the home directory of the user you logged in as. With full paths you can rsync anything from/to anywhere if you have appropriate permissions.
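The trailing-slash rule is easy to check without a second machine. The following is a minimal sketch that uses scratch directories (created with mktemp, my assumption) in place of the local and remote hosts; the file name inbox is only an illustration.

```shell
#!/bin/sh
# Scratch directories stand in for the local and remote machines (assumption).
tmp=$(mktemp -d)
mkdir -p "$tmp/src/Mail" "$tmp/a" "$tmp/b"
echo "hello" > "$tmp/src/Mail/inbox"

# No trailing slash: the directory Mail itself is copied into the destination.
rsync -a "$tmp/src/Mail" "$tmp/a"

# Trailing slash: only the contents are copied, so the destination names the target directory.
rsync -a "$tmp/src/Mail/" "$tmp/b/Mail"

# List the files relative to each destination to compare the layouts.
layout_a=$(cd "$tmp/a" && find . -type f | LC_ALL=C sort)
layout_b=$(cd "$tmp/b" && find . -type f | LC_ALL=C sort)
echo "a: $layout_a"
echo "b: $layout_b"

rm -rf "$tmp"
```

Both destinations should end up with the same Mail/inbox layout, which is the point of the two equivalent command lines above.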

The username for the remote login defaults to your local username. To specify a username use [user]@[IP address or domain name], e.g.:

   $ rsync -av rambo@invalid.com:Mail/ Mail
   $ rsync -av Mail rambo@invalid.com:

You can rsync multiple files/directories, e.g.:

   $ rsync -av root@invalid.com:'/var/named /etc/named.conf' backup/invalid.com/configs

-av: Archive almost everything (hardlinks, ACLs and extended attributes are the exceptions) and be verbose. Add z (-avz) for compressed transfer and more v's (-avvv) for increased verbosity.
-e ssh: Specifies ssh as the remote shell to use.
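Multiple sources work the same way locally: every argument but the last is a source. The sketch below uses scratch directories and made-up file contents (my assumptions) to show that. As far as I know, rsync 3.2.4 and later no longer let the remote shell split a quoted, space-separated list by default (the --old-args option is meant to restore that), so the comment shows the form from the rsync man page where each extra source on the same host starts with a bare colon.

```shell
#!/bin/sh
# Scratch tree mimicking /var/named and /etc/named.conf (contents are made up).
tmp=$(mktemp -d)
mkdir -p "$tmp/var/named" "$tmp/etc" "$tmp/backup"
echo "zone" > "$tmp/var/named/example.zone"
echo "options" > "$tmp/etc/named.conf"

# Several sources, one destination (always the last argument).
rsync -a "$tmp/var/named" "$tmp/etc/named.conf" "$tmp/backup"

copied=$(cd "$tmp/backup" && find . -type f | LC_ALL=C sort)
echo "$copied"

# Remote equivalent, not run here (the host is the placeholder used above):
#   rsync -av root@invalid.com:/var/named :/etc/named.conf backup/invalid.com/configs

rm -rf "$tmp"
```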

If you've deleted files on the source/sender, you may want to delete them on the destination.
--delete                delete files that don't exist on sender
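Since --delete removes things, a dry run first is a cheap safety check. This is a small sketch on scratch directories (an assumption, as are the file names); -n is rsync's --dry-run switch.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dest"
echo "keep" > "$tmp/src/keep.txt"
echo "keep" > "$tmp/dest/keep.txt"
echo "stale" > "$tmp/dest/stale.txt"   # exists only on the destination

# -n (--dry-run) reports what would happen without touching anything.
rsync -avn --delete "$tmp/src/" "$tmp/dest/"
if [ -e "$tmp/dest/stale.txt" ]; then after_dry_run="still there"; else after_dry_run="gone"; fi

# The real run removes the file that no longer exists on the sender.
rsync -a --delete "$tmp/src/" "$tmp/dest/"
if [ -e "$tmp/dest/stale.txt" ]; then after_delete="still there"; else after_delete="gone"; fi

echo "after dry run: $after_dry_run, after real run: $after_delete"
rm -rf "$tmp"
```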

If you have newer versions on the destination/receiver, you may not want them replaced. In this case you'd have to rsync in both directions to get matching directories.
-u, --update                skip files that are newer on the receiver
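Here is a minimal sketch of the two-direction sync with -u, again on scratch directories (an assumption). The timestamps are forced with touch so that the receiver's copy is clearly the newer one.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dest"
echo "old from sender" > "$tmp/src/notes.txt"
echo "newer on receiver" > "$tmp/dest/notes.txt"

# Force the timestamps so the receiver's copy is clearly newer.
touch -t 202001010000 "$tmp/src/notes.txt"
touch -t 202101010000 "$tmp/dest/notes.txt"

# -u skips notes.txt because the destination copy has a later mtime.
rsync -au "$tmp/src/" "$tmp/dest/"
forward=$(cat "$tmp/dest/notes.txt")

# Running the other direction brings the newer version back to the sender.
rsync -au "$tmp/dest/" "$tmp/src/"
reverse=$(cat "$tmp/src/notes.txt")

echo "dest: $forward | src: $reverse"
rm -rf "$tmp"
```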

If you want to be selective about what you mirror there are numerous options. I need to exclude large avi files and --max-size=1m works for me. The --exclude 'something' option supports a number of pattern styles and there's a nice article at Slicehost on that.
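The two filters can be combined. The following sketch (scratch directories and file names are my assumptions) excludes by name pattern and by size in one run.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/dest"
echo "small" > "$tmp/src/notes.txt"
echo "fake video" > "$tmp/src/holiday.avi"
# A 2 MiB file of zeros, standing in for something too big to mirror.
dd if=/dev/zero of="$tmp/src/big.bin" bs=1024 count=2048 2>/dev/null

# Skip anything ending in .avi and anything larger than 1 MiB.
rsync -a --exclude '*.avi' --max-size=1m "$tmp/src/" "$tmp/dest/"

copied=$(ls "$tmp/dest")
echo "$copied"
rm -rf "$tmp"
```

Only notes.txt should arrive: the .avi is excluded by name and big.bin by size.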

Running rsync via a shell script can be problematic if source directories/files contain spaces. While there are a number of approaches, wrapping a double-quoted path in single quotes ('"foo"') saves having to create another variable with the spaces escaped (e.g. in a perl script). The destination only needs the standard double quotes (using the single-quote wrapper on the destination will fail).
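For the local side of a script, double-quoted variables are enough, as this sketch on scratch directories shows (directory and file names are made up). The commented remote line uses rsync's -s (--protect-args) switch as an alternative to the nested quoting; I haven't verified it against every rsync version, so treat it as an assumption.

```shell
#!/bin/sh
tmp=$(mktemp -d)
src="$tmp/My Mail"          # path containing a space
dest="$tmp/backup copy"
mkdir -p "$src"
echo "msg" > "$src/inbox 1"

# Double quotes around the variables keep each path as a single argument.
rsync -a "$src/" "$dest/"

copied=$(ls "$dest")
echo "$copied"

# For a remote source the remote shell may also split on spaces, hence the
# extra quoting described above. Not run here (the host is a placeholder):
#   rsync -av -s "rambo@invalid.com:My Mail/" "$dest/"
rm -rf "$tmp"
```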

Links:
Easy Automated Snapshot-Style Backups with Linux and Rsync (old)
The backup tools duplicity (encrypted), rsnapshot (hardlinked snapshots) and rdiff (mirror/incremental) use rsync or its libraries. The cool thing about TimeVault (snapshot backup for Ubuntu) is that "Restore functionality is integrated into Nautilus".

Setting up RSync over SSH with No Password (simple, some risk)
Using Rsync and SSH : Keys, Validating, and Automation (firewall denyall/allowx is better IMO)

...scp is similar to rsync (syntax and ssh transfer protocol) and is handy for single file transfers:

      $ scp /source/path/file 192.168.1.1:/destination/path

Backup

Basic use of rsync will mirror files/directories on one machine to another. The first time it's run on a given directory, all files will be transferred. On subsequent runs only the new files and block-level changes are transferred. While you have/maintain a copy, there is no way to recover files older than the copy. There are ways to use rsync for incremental backups. I'll cover two, -b --backup-dir=DIR and --link-dest=DIR.

The backup option will copy deleted and old versions of files to the specified backup directory. As the destination mirror is updated, old changed/deleted files are copied to the --backup-dir=DIR. This is easiest to do using a shell script that increments the backup directory name by date(.time). This is handy when you want the ability to recover older or deleted files.
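A minimal sketch of that kind of script follows. It is not the script linked under Files; the scratch directories, file names and the date.time naming scheme are all my assumptions. Note that rsync resolves a relative --backup-dir against the destination, so the sketch uses an absolute path.

```shell
#!/bin/sh
tmp=$(mktemp -d)
src="$tmp/src"
mirror="$tmp/mirror"
# One backup directory per run, named by date and time (naming scheme is an assumption).
stamp=$(date +%Y-%m-%d.%H%M%S)
backup="$tmp/backups/$stamp"
mkdir -p "$src" "$tmp/backups"

echo "version 1" > "$src/report.txt"
echo "temporary" > "$src/scratch.txt"
rsync -a "$src/" "$mirror/"            # initial mirror

# Change one file and delete another on the source.
echo "version 2, now longer" > "$src/report.txt"
rm "$src/scratch.txt"

# Replaced and deleted files land in $backup instead of being lost.
rsync -a --delete -b --backup-dir="$backup" "$src/" "$mirror/"

old_report=$(cat "$backup/report.txt")
old_scratch=$(cat "$backup/scratch.txt")
new_report=$(cat "$mirror/report.txt")
echo "mirror: $new_report | backup: $old_report, $old_scratch"
rm -rf "$tmp"
```

The mirror holds the current state, while the dated directory holds only what that run replaced or deleted.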

The --link-dest=DIR backup option is a bit more involved (I don't use it. The following works but could be flawed). Use this if you want to be able to recover whole directories (older than the mirror directory). For the --link-dest=DIR option, it's the rsync destination directory that is the backup directory and that needs to change/increment.

If a source file is unchanged and is in the --link-dest=DIR, a hardlink to the file in the --link-dest=DIR is created in the destination directory. Over time you will have the same (unchanged) file in a number of different backup directories. Because they have the same inode, i.e. they are hardlinks to the same space on the hard drive, the total space they take up is no more than the space that the original writing of the file took.

Another feature of hardlinks is that deleting the first/original, or any other, pointer to that space on the hard drive has no effect on the other(s). The problem is that the --link-dest=DIR will become less and less useful over time because more and more files change and there are fewer and fewer destination files that can be hardlinked to the --link-dest=DIR.
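The hardlink behaviour described above can be seen with plain ln, no rsync involved. This is a small sketch in a scratch directory (an assumption); ls -i prints the inode number in front of the file name.

```shell
#!/bin/sh
tmp=$(mktemp -d)
echo "payload" > "$tmp/original"
ln "$tmp/original" "$tmp/link"          # second name for the same data

# Both names report the same inode number.
inode_a=$(ls -i "$tmp/original" | awk '{print $1}')
inode_b=$(ls -i "$tmp/link" | awk '{print $1}')
if [ "$inode_a" = "$inode_b" ]; then same_inode=yes; else same_inode=no; fi

# Removing the first name leaves the data reachable through the other.
rm "$tmp/original"
survivor=$(cat "$tmp/link")
echo "same inode: $same_inode, link still reads: $survivor"
rm -rf "$tmp"
```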

Rsyncing to a hardlinked copy (cp -al) of the last backup is one option (not sure how you would do it in a push - source > dest). My choice, possibly flawed, is to use two rsync commands per update. First I mirror the source to a new incremental destination using --link-dest=DIR for the hardlinks and then I update the --link-dest=DIR with a simple source > destination mirror. This keeps the --link-dest=DIR fresh and maximizes the potential hardlinks available for the next run.

The two steps also allow my --link-dest=DIR to be an up-to-date mirror that's in the same location as on the source (theoretically a ready-to-roll spare server). When I use --link-dest=DIR in a source > destination command, destination is a date incremented directory in a root level 'backup' directory. The 'backup' directory needs to be on the same partition for hardlinking to work.
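The two-command update can be sketched as follows. This is not the hardlink script listed under Files; the scratch directories, file names and date format are my assumptions. rsync treats a relative --link-dest as relative to the destination directory, so the sketch passes an absolute path.

```shell
#!/bin/sh
tmp=$(mktemp -d)
src="$tmp/src"
mirror="$tmp/mirror"                   # doubles as the --link-dest directory
snap="$tmp/backup/$(date +%Y-%m-%d.%H%M%S)"
mkdir -p "$src" "$tmp/backup"
echo "unchanged" > "$src/a.txt"
echo "will change" > "$src/b.txt"
rsync -a "$src/" "$mirror/"            # mirror left by an earlier run

echo "changed since the last run" > "$src/b.txt"

# Step 1: new snapshot; unchanged files become hardlinks into the mirror.
rsync -a --link-dest="$mirror" "$src/" "$snap/"

# Step 2: refresh the mirror so the next run has the most link candidates.
rsync -a --delete "$src/" "$mirror/"

# Compare inode numbers to see which snapshot files are shared with the mirror.
inode_of() { ls -i "$1" | awk '{print $1}'; }
if [ "$(inode_of "$snap/a.txt")" = "$(inode_of "$mirror/a.txt")" ]; then a_linked=yes; else a_linked=no; fi
if [ "$(inode_of "$snap/b.txt")" = "$(inode_of "$mirror/b.txt")" ]; then b_linked=yes; else b_linked=no; fi
echo "a.txt linked: $a_linked, b.txt linked: $b_linked"
rm -rf "$tmp"
```

One side effect worth knowing: a changed file such as b.txt is stored twice, once in the snapshot and once in the refreshed mirror, because step 2 copies it rather than linking it.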

Files:
simple backup script
hardlink backup script (possibly flawed)

Key-pairs

To rsync/ssh (or sftp, scp, etc.) without having to enter a password you can use key pairs. When using cron and shell scripts to back up and/or mirror directories, using key pairs is better than storing passwords as plain text in the script. It does have some caveats. It's best to keep your private key on, and do your backups from, the most secure machine. If you're backing up a public web-server, you would use rsync from the private/backup server and only your public key would be stored on the public server. Do not keep both keys on either machine.

To generate a key-pair, type (as the user, and on the machine, that's going to run rsync):

   ssh-keygen -t rsa

and just hit enter at every prompt. You'll now have two new files in the ~/.ssh directory (in your home directory; you may need to turn on 'show hidden files'): id_rsa and id_rsa.pub. Open id_rsa.pub and copy the contents (a single line) to the ~/.ssh/authorized_keys file on the other machine (apparently ssh-copy-id does that for you; I haven't tried it). You may need to create the authorized_keys file. If so, give it user rw permissions only. Now delete id_rsa.pub and try connecting from the private to the public machine with ssh (note: you need to use ssh before rsync because you need to manually accept the first connection you make).

...If it doesn't work, or strangely stops working at some point, check the file permissions. I use 600 for id_rsa and 644 for authorized_keys.
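The key steps can be rehearsed safely in a scratch directory before touching a real ~/.ssh. In this sketch the client and server directories are stand-ins I made up for the two machines; -N "" sets an empty passphrase and -f names the key file, so ssh-keygen asks no questions.

```shell
#!/bin/sh
# Scratch stand-ins for ~/.ssh on the two machines (assumption).
tmp=$(mktemp -d)
mkdir -p "$tmp/client/.ssh" "$tmp/server/.ssh"
chmod 700 "$tmp/client/.ssh" "$tmp/server/.ssh"

# Generate the pair without prompts: empty passphrase, explicit file name.
ssh-keygen -q -t rsa -N "" -f "$tmp/client/.ssh/id_rsa"

# The manual copy step: append the public key on the other machine.
cat "$tmp/client/.ssh/id_rsa.pub" >> "$tmp/server/.ssh/authorized_keys"
chmod 600 "$tmp/server/.ssh/authorized_keys"   # user read/write only

# The first ten characters of ls -l are the file type and permission bits.
key_perms=$(ls -l "$tmp/client/.ssh/id_rsa" | cut -c1-10)
auth_perms=$(ls -l "$tmp/server/.ssh/authorized_keys" | cut -c1-10)
echo "id_rsa: $key_perms, authorized_keys: $auth_perms"
rm -rf "$tmp"
```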

...Using a passphrase and keychain would seem a bit more secure (haven't tried it).

...To generate a key-pair for apache, e.g. for cgi scripts running as user apache
      
      # mkdir ~apache/.ssh
      # chmod 700 ~apache/.ssh
      # chown apache ~apache/.ssh
      # su apache -s/bin/sh -c 'ssh-keygen -P "" -N ""'
      source

      # su root
      # su apache
      $ ssh-keygen -t rsa
      source

The first works, the second may work (I got an error and a strange shell prompt, freaked out and typed exit - which is now stuck someplace so I cannot try ssh-keygen -t rsa).

Using rsync on Windows

Sep '07: I'm looking into adding rsync functionality to my XP machine. DeltaCopy looks the closest to a KISS solution... but encrypted transfer (ssh) requires additional files and manual configuration.

cwRsync (packaging of Rsync and Cygwin)
DeltaCopy (a "Windows Friendly" wrapper around the Rsync program)
Cygwin and rsync (manual config)

See also: Unison (rsync alternative)
