Use rsync to back up a directory tree of files

16 Jul

http://www.scrounge.org/linux/rsync.html

rsync is a very good program for backing up/mirroring a directory tree of files from one machine to another machine, and for keeping the two machines “in sync.” More on rsync features.

rsync fits in great with the Scrounge.org philosophy of having lots of cheap machines. Now you have a way of keeping your backup machines synchronized with your “main” machine.

Installation

Download and install rsync. The easy way is to use one of the binary packages, if one works on your system. If you can use a Red Hat 6.0 RPM, then download the most current version from here. It is currently at version 2.4.5-1, so you would download rsync-2.4.5-1.i386.rpm. Then (as root) type:

rpm -Uvh rsync-2.4.5-1.i386.rpm

and it is installed. Substitute the most current version number in the file name, as appropriate.

If you can’t use the RPM file, or any of the other binary distributions, then you must download the source “tarball” file, untar it, and follow the instructions in the README file to compile and install it.

When you have it installed, type rsync --help to see if it is alive. It should display several screens of options. You must install rsync on all machines that you will be connecting to.
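
The check above can also be scripted, which is handy when you have several machines to verify. This is only a minimal sketch: have_cmd is a hypothetical helper name, and the snippet assumes a POSIX shell. Run it on each machine in turn.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# (have_cmd is a hypothetical helper name, not part of rsync.)
have_cmd() {
    command -v "$1" > /dev/null 2>&1
}

if have_cmd rsync; then
    # Print just the first line of the version banner.
    rsync --version | head -n 1
else
    echo "rsync not found on PATH; install it first." >&2
fi
```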

Configuring and testing the SSH connection

Warning! There are security implications with configuring SSH and rsync to allow “auto-login” with no password prompts. Make sure that you know what you are doing when configuring SSH, especially if you allow remote users to log into any of your machines.

Please read these additional thoughts on security when using rsync.

SSH is the preferred method of connecting with rsync (IMO). If you haven’t already installed SSH on your machines, then see How to install and configure SSH. Before you can connect with rsync, using SSH as the transport layer, you must be able to slogin to the other host. So first try to log into the other machine by typing slogin hostname (where hostname is the name of the computer you are connecting to). Press Ctrl-D to log out.

If you want rsync to connect with auto-login (with no password prompt!), so that you can use rsync in an unattended script, you must get RSA keys working by following the procedures explained at the Getting started with SSH page.

* Generate a key pair (public and private) with ssh-keygen as the user that you will be connecting with rsync. Choose a good pass phrase.
* Insert the public key you just created into ~/.ssh/authorized_keys.
* Copy the ~/.ssh/authorized_keys file to the other machine(s).
* Set permissions (chmod 644) for ~/.ssh/authorized_keys, if needed.
o Optional. You may want to use ssh-agent $SHELL to make the keys and pass phrases available to other commands (like rsync).
o Use ssh-add to load keys in memory.
* Use slogin hostname to log into the remote host machine, as a means of testing whether you can establish an SSH connection. Press Ctrl-D to log out.
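
The key-handling steps above could be sketched roughly as follows. Treat it as an illustration under assumptions, not the one true procedure: add_key is a hypothetical helper, the file name ~/.ssh/id_rsa.pub is what current OpenSSH versions use by default (other SSH versions name the public key file differently), and hostname is a placeholder.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already there.
# $1 = public key file, $2 = authorized_keys file.
# (add_key is a hypothetical helper name.)
add_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: match whole lines, -F: treat the key as a fixed string.
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 644 "$auth_file"
}

# Typical use, run by hand (nothing below is executed by this snippet):
#   ssh-keygen -t rsa        # prompts for a file name and a pass phrase
#   add_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
#   scp ~/.ssh/authorized_keys hostname:.ssh/authorized_keys
# Note that the scp step overwrites any authorized_keys file already
# present on the remote machine.
```

Calling add_key twice with the same key leaves only one copy in the file, so it is safe to re-run.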

What you are aiming for is getting SSH configured so that you can use slogin to connect to the remote machine with no password prompt, so that scripts that you write using rsync won’t require you to be at a console to type in a password.
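
One way to check for a prompt-free login from a script is to ask ssh to fail rather than prompt. The sketch below assumes an OpenSSH client, whose BatchMode and ConnectTimeout options are used here; can_login_unattended and the SSH variable are hypothetical names introduced for illustration, and hostname is a placeholder.

```shell
#!/bin/sh
# The client to use. Defaults to ssh; override it to point at another
# client or a stub for testing. (SSH is a hypothetical variable name.)
SSH=${SSH:-ssh}

# Succeed only if we can log in to host $1 without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or
# pass phrase, so a key protected by a pass phrase must already be
# loaded with ssh-add for this to succeed.
can_login_unattended() {
    $SSH -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Usage:
#   can_login_unattended hostname && echo "auto-login works"
```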

Warning. Running rsync as root is dangerous. Consider creating a special “rsync” user and running rsync as this user. The main complication when doing this is that the rsync user must have sufficient permissions to read and write all the files that rsync will be accessing on both machines. This might require a long detour into assigning group permissions and such, but is much more secure than running rsync as root. Bone up on your system administration skills.
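
If you want a backup script to remind you about this, a small guard near the top is one option. This is just a sketch; is_root_uid is a hypothetical helper name, and whether to warn or abort is up to you.

```shell
#!/bin/sh
# Return success if the given numeric user ID is root's (0).
# (is_root_uid is a hypothetical helper name.)
is_root_uid() {
    [ "$1" -eq 0 ]
}

# id -u prints the numeric ID of the user running the script.
if is_root_uid "$(id -u)"; then
    echo "Warning: running as root; consider a dedicated rsync user." >&2
fi
```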

Here is some more information on managing SSH RSA keys.

If for some reason you don’t want to (or can’t) use SSH, then you must use the native RSH transport layer. In this case, you must be able to connect with rlogin (instead of SSH’s slogin). See man rsh, man rlogin and maybe man rcp. Remove all instances of --rsh=ssh from the OPTS definitions in the script example, below.

rsync reference

* Official rsync page
o rsync documentation.
o Source code (tarball)
o rsync binaries in RedHat 6.0 rpms
o How to get rsync going on a Windows machine
* Rsync mirroring howto and FAQ.

* How to install and configure SSH

* The Linux System Administrators’ Guide
* UNIX and Linux Access Permissions Bits
o Changing access permissions
* Linux Security Guide
* Linux Security HOW-TO

A simple rsync script

Copy and paste the script into a text file. Look through it and change the variable definitions, as needed. Save it under a name like rsync_demo.sh. Change the permission bits so that it is executable (chmod 700 rsync_demo.sh). Create the excludes file. (See the script for an explanation.) Run the script by typing ./rsync_demo.sh

#!/bin/sh

# Simple rsync “driver” script. (Uses SSH as the transport layer.)
# http://www.scrounge.org/linux/rsync.html

# Demonstrates how to use rsync to back up a directory tree from a local
# machine to a remote machine. Then re-run the script, as needed, to keep
# the two machines “in sync.” It only copies new or changed files and ignores
# identical files.

# Destination host machine name
DEST="smpent"

# User that rsync will connect as
# Are you sure that you want to run as root, though?
USER="root"

# Directory to copy from on the source machine.
BACKDIR="/root/bin/"

# Directory to copy to on the destination machine.
DESTDIR="/root/bin/"

# excludes file – Contains wildcard patterns of files to exclude.
# i.e., *~, *.bak, etc. One “pattern” per line.
# You must create this file.
EXCLUDES=/root/bin/excludes

# Options.
# -n Don’t do any copying, but display what rsync *would* copy. For testing.
# -a Archive. Mainly propagate file permissions, ownership, timestamp, etc.
# -u Update. Don’t copy file if file on destination is newer.
# -v Verbose -vv More verbose. -vvv Even more verbose.
# See man rsync for other options.

# For testing. Only displays what rsync *would* do and does no actual copying.
OPTS="-n -vv -u -a --rsh=ssh --exclude-from=$EXCLUDES --stats --progress"
# Does copy, but still gives a verbose display of what it is doing
#OPTS="-v -u -a --rsh=ssh --exclude-from=$EXCLUDES --stats"
# Copies and does no display at all.
#OPTS="--archive --update --rsh=ssh --exclude-from=$EXCLUDES --quiet"

# May be needed if run by cron?
export PATH=$PATH:/bin:/usr/bin:/usr/local/bin

# Only run rsync if $DEST responds.
if ping -s 1 -c 1 "$DEST" > /dev/null 2>&1; then
    # $OPTS is deliberately unquoted so that it splits into separate options.
    rsync $OPTS "$BACKDIR" "$USER@$DEST:$DESTDIR"
else
    echo "Cannot connect to $DEST."
fi
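
The script expects an excludes file with one pattern per line. A starter file could be created along these lines; the two patterns are the ones mentioned in the script's comments, and write_excludes is a hypothetical helper name.

```shell
#!/bin/sh
# Write a starter excludes file to the path given as $1.
# One wildcard pattern per line, as --exclude-from expects.
# (write_excludes is a hypothetical helper name.)
write_excludes() {
    cat > "$1" <<'EOF'
*~
*.bak
EOF
}

# Usage (the path is the one named in the script's EXCLUDES line):
#   write_excludes /root/bin/excludes
```

Add further patterns to the file by hand as you find files that you don't want copied.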

Note. rsync doesn’t (by default) actually copy whole files between machines. Rather, when a file already exists on the destination, it uses the rsync algorithm to find the differences between the two files and sends only the information needed to make the destination file identical to the source file. This is much more complicated than just copying the file, but has the potential for drastically reducing the amount of data that has to be copied.
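
To get a feel for why this saves so much, here is a toy illustration that compares two local files block by block and counts the blocks that differ. It is deliberately simplified and is not the rsync algorithm itself: real rsync uses a rolling checksum so that it can also match blocks that have shifted position, while this sketch only compares blocks at fixed offsets. changed_blocks is a hypothetical helper name.

```shell
#!/bin/sh
# Count how many fixed-size blocks of file $2 differ from file $1.
# $3 is the block size in bytes (default 1024).
# (changed_blocks is a hypothetical helper name.)
changed_blocks() {
    old=$1
    new=$2
    bs=${3:-1024}
    size=$(( $(wc -c < "$new") ))
    nblocks=$(( (size + bs - 1) / bs ))
    i=0
    changed=0
    while [ "$i" -lt "$nblocks" ]; do
        # Checksum block number $i of each file and compare.
        a=$(dd if="$old" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        b=$(dd if="$new" bs="$bs" skip="$i" count=1 2>/dev/null | cksum)
        [ "$a" = "$b" ] || changed=$((changed + 1))
        i=$((i + 1))
    done
    echo "$changed of $nblocks blocks would need to be sent"
}

# Usage:
#   changed_blocks old_copy new_copy 1024
```

For a large file in which only a few bytes changed, only the blocks containing those bytes count as changed, which is the intuition behind the savings described above.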

Thanks to Brian, Eric, and Johannes Ullrich for their help in preparing this page.

Comments and corrections to me.
