Linux Backup Primer

One of the most frequently asked questions in Linux system administration is, "How do I back up my system?" While backup on a Microsoft platform is pretty straightforward (click on the backup button or select backup from the Start menu), Linux backups can be quite intimidating if you're not familiar with the UNIX paradigm of files and devices. This column will explore the devices and methods involved in protecting the data that exists on your Linux systems.

What is a Backup?

In the simplest terms, a backup is the process of making copies of your data onto alternate media
(usually removable) in order to allow the recovery of that data in case the original is lost. A
backup can simply be a copy (‘cp’) of a file or files to another location, or it can be a stream of
data that is written out by a special program (‘tar’) to a special device or location.

Many admins automatically equate backup to tape drives, but this isn’t necessarily true. Under
Linux, or any other UNIX variant, backups can be made to files on the existing filesystems,
alternate filesystems, tape drives, remote systems, and even tape drives on remote systems. Also,
from a user-level perspective, there are no “tape drives” or “Zip drives”, just files (as we will
explain shortly).
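
As a rough illustration of the two kinds of backup described above, the sketch below makes a plain cp copy and then writes a tar stream to an ordinary file. The scratch directory and file names are invented for the example, and nothing here touches a tape drive.

```shell
# Scratch area standing in for "your data" and "alternate media"
# (hypothetical names, created under a temporary directory).
work=$(mktemp -d)
mkdir "$work/data" "$work/media"
echo "important numbers" > "$work/data/report.txt"

# Backup style 1: a straight copy of a file to another location.
cp "$work/data/report.txt" "$work/media/report.txt.bak"

# Backup style 2: a tar stream written to a file instead of a device.
tar -cf "$work/media/data.tar" -C "$work" data

# Both results are ordinary files on the filesystem.
ls "$work/media"

# rm -rf "$work"    # clean up when finished
```

The same tar command would write to a tape drive if the archive name were a device file such as /dev/st0.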

What Device Should I Use?

There are a vast number of devices out there that are marketed as being “perfect for system
backup”. These include tape drives, removable disk drives, and even mystical “Internet backup”
systems. For backup operations, tape drives offer the surest storage method. Why tape? Well, while
a Jaz or Zip drive from Iomega may seem interesting as a backup device, such drives tend to be hijacked for
filesystem duties when hard drives get cramped. As for the mystical Internet backups, if you can’t
get onto the Internet because of a crash, how can you restore the data that will get you back on?
Also, do you really trust your data to someone else’s remote systems?

So now that we’ve settled on the tape drive, how do we access it? As I mentioned earlier,
everything on a Linux system can be considered a file when examined in user space (versus kernel
space). Therefore, we ‘open’ the appropriate tape drive “file” for writing and write the data out to
the file. While this sounds like an oversimplification, it is what actually occurs. The /dev
directory on the root filesystem contains all of the “files” that are associated with physical
devices (like a tape drive) under Linux. When you manipulate those files, you are actually
manipulating the underlying device. Figure 1 illustrates how device files are associated with
physical devices under Linux.

Figure 1: How Device Files Correspond to Physical Devices

Device                   Rewinding       No-Rewind
1st SCSI tape drive      /dev/st0        /dev/nst0
2nd SCSI tape drive      /dev/st1        /dev/nst1
nth SCSI tape drive      /dev/st[n-1]    /dev/nst[n-1]

1st ATAPI tape drive     /dev/ht0        /dev/nht0
2nd ATAPI tape drive     /dev/ht1        /dev/nht1
nth ATAPI tape drive     /dev/ht[n-1]    /dev/nht[n-1]

1st floppy tape drive    /dev/ft0        /dev/nft0

As you can see from the table in Figure 1, the device names are based upon the logical number of
the device within the interface hierarchy for that device type, not its physical ID (SCSI) or IDE
channel (ATAPI). Therefore, even if the SCSI tape drive was assigned SCSI ID 4, if it is the first
tape device on the SCSI chain, it would be /dev/st0, not /dev/st4. This convention
makes it easier to keep track of your tape drives and even applies across multiple adapters. Also,
note that under the 2.0.x kernels, only a single ATAPI drive is supported. Under the 2.2 kernels,
you may use multiple ATAPI drives (ht0, ht1, ...).

Now that we know how these “files” are named, what’s the difference between the rewind and
no-rewind versions of the names? Simply put, the rewind device allows you to perform an operation
with the tape drive, and then automatically rewind the tape media to the beginning when you finish
the operation. Alternatively, the no-rewind device leaves the tape positioned wherever it stops when
the operation completes. For most simple backup operations, the rewinding device is preferred, since
it will automatically prepare the tape for removal once the backup operation is completed. However,
for more complex backup operations (such as appending backups, logical seeking, and those other
operations performed by high end backup utilities), the no-rewind device is required.
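
The naming convention in Figure 1 can be sketched as a small shell function. The function name tape_node and its argument words (scsi, atapi, floppy, norewind) are made up for this illustration; only the resulting /dev names come from the table.

```shell
# tape_node TYPE INDEX [norewind]
# TYPE is scsi, atapi, or floppy; INDEX is the logical drive number (0 = first).
# Hypothetical helper: it only builds the name, it does not touch the device.
tape_node() {
    case $1 in
        scsi)   base=st ;;
        atapi)  base=ht ;;
        floppy) base=ft ;;
        *) echo "unknown tape type: $1" >&2; return 1 ;;
    esac
    if [ "${3:-}" = "norewind" ]; then
        echo "/dev/n$base$2"    # no-rewind names carry an "n" prefix
    else
        echo "/dev/$base$2"
    fi
}

tape_node scsi 0              # first SCSI drive, rewinding
tape_node atapi 1 norewind    # second ATAPI drive, no-rewind
```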

Device Access Under Linux

Under Linux, devices are accessed via their associated device node file. These nodes are located in
the /dev directory. But don’t be fooled — a device node is more than just a plain file. When a
device node is opened for access, there are special attributes that tell the Linux kernel which
physical hardware device we actually mean. Each node file has an associated major and minor number
as well as a device type attribute. For example, the device node for a SCSI tape looks like this:

crw-rw-rw-   1 root     operator   9,   0 Dec  1 04:10 /dev/st0

The ‘c’ in crw-rw-rw- identifies this device as a “character” device (which means it
processes I/O one character at a time), with a major number of 9 and a minor number of 0. The major
number is like an address that tells the kernel which device driver to use, and the minor number is
used to define specific functional capabilities for that particular instance of the device.
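
To look at these attributes on a live system, something like the following sketch may help. It assumes GNU stat (the -c format option) and uses /dev/null as the example node because not every machine has a tape drive; node_info is a name invented for this example.

```shell
# node_info NODE: report the device type plus major and minor numbers.
# Assumes GNU stat: %F is the file type, %t and %T are the major and
# minor numbers in hexadecimal.
node_info() {
    kind=$(stat -c '%F' "$1") || return 1
    major_hex=$(stat -c '%t' "$1")
    minor_hex=$(stat -c '%T' "$1")
    # Convert the hexadecimal values to decimal for display.
    echo "$1: $kind, major $((0x$major_hex)), minor $((0x$minor_hex))"
}

node_info /dev/null
```

On a Linux system, /dev/null is a character device with major 1 and minor 3; pointing the function at /dev/st0 should report the major and minor shown in the listing above.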

What if the node file doesn’t exist? It’s easy to create a missing device node file. In fact,
there’s a Linux command that does just that: mknod. The command is executed as:

mknod /dev/nodename [c|b] major minor

However, if the proper device node doesn’t seem to exist on your system, the first thing you need
to check is whether proper device support is available in the kernel. This can be determined by
examining the output of cat /proc/devices. On my system, this looks like:

Character Devices

  1 mem      10 misc
  2 pty      14 sound
  3 ttyp     21 sg
  4 ttyS     29 fb
  5 cua      37 ht
  7 vcs     128 ptm
  9 st      136 pts

Block Devices

  1 ramdisk
  2 fd
  3 ide0
  9 md
 22 ide1

You can find both the ht and st devices in the character list. This shows
that st is major 9 and ht is major 37. Therefore, if the /dev/ht0
node file were absent, we could use mknod like this:

mknod /dev/ht0 c 37 0

To make the no-rewind device, we would add 128 to the minor number of the rewind device:

mknod /dev/nht0 c 37 128
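
The lookup and the two mknod commands can be tied together in a short sketch. The helper names tape_major and mknod_plan are invented here, and the commands are only printed, not run, since creating device nodes requires root. The sample file is a cut-down copy of the listing above so the example works on a machine without a tape driver.

```shell
# tape_major NAME [FILE]: print the character-device major number for NAME.
# FILE defaults to /proc/devices; another file can be given for testing.
tape_major() {
    awk -v name="$1" '
        /^Character/ { in_char = 1; next }   # start of the character list
        /^Block/     { in_char = 0 }         # block list: stop looking
        in_char && $2 == name { print $1; exit }
    ' "${2:-/proc/devices}"
}

# mknod_plan NAME UNIT [FILE]: print (not run) the mknod commands for one drive.
mknod_plan() {
    major=$(tape_major "$1" "${3:-/proc/devices}")
    if [ -z "$major" ]; then
        echo "no $1 driver listed; check kernel support first" >&2
        return 1
    fi
    echo "mknod /dev/$1$2 c $major $2"               # rewinding node
    echo "mknod /dev/n$1$2 c $major $(($2 + 128))"   # no-rewind node: minor + 128
}

# Sample input in the same layout as /proc/devices.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Character devices:
  9 st
 37 ht

Block devices:
  9 md
EOF

mknod_plan ht 0 "$sample"
```

Leaving off the third argument makes the function read the real /proc/devices instead of the sample file.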

For more information on these values, refer to the information in the kernel source documentation.

One problem that many admins run into involves the use of the mt command with rewinding
devices. The mt command simply allows you to manipulate a “magnetic tape” (mt) device. For
example, issuing mt -f /dev/st0 eod to prepare the tape for appending a new backup set will
result in the tape drive at /dev/st0 being forwarded to the current end of data (eod). The
drive will then be automatically closed, at which time the device driver will rewind the tape to the
beginning (because it is a “rewind” device). Now, if you write to the tape, you will overwrite the
data currently on the tape, instead of appending the data to the end of the tape. By simply using
the no-rewind device, mt will leave the tape at the eod position and your next backup set will be
added to the end of the last backup.
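
The append sequence might be sketched as follows. The wrapper name append_backup and the RUN variable are inventions for this example; by default the commands are only echoed, so the sketch can be tried on a machine with no tape drive attached.

```shell
# Dry run by default: each command is printed instead of executed.
# Set RUN to an empty string (RUN=) to run the commands for real.
RUN=${RUN-echo}

# append_backup NO_REWIND_DEVICE PATH...
append_backup() {
    ndev=$1
    shift
    $RUN mt -f "$ndev" eod        # wind forward to the end of recorded data
    $RUN tar -cvf "$ndev" "$@"    # write the new backup set after it
}

append_backup /dev/nst0 /home
```

Passing /dev/st0 instead would show the problem described above: the drive rewinds when mt closes the device, so the tar that follows starts writing at the beginning of the tape.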

The recommended method to alleviate this confusion is to create a symbolic link between the
no-rewind device you wish to use, and an alias known as /dev/tape. By default, the
mt command will use the drive associated with /dev/tape if no -f argument
is supplied. If you have more than one tape drive, instead of using the symbolic link method, you
can also use the environment variable TAPE with the same result by setting
TAPE=/dev/nst[x]. Either way, you could then issue the mt command above as mt eod,
resulting in proper positioning of the media without having to worry about which
/dev entry to use.
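
The selection order described above (an explicit -f argument, then the TAPE variable, then /dev/tape) can be mimicked in a few lines. This is only a sketch of the order as the text describes it, not a copy of mt's own code, and tape_target is a hypothetical name.

```shell
# One-time setup, run as root (shown as comments so nothing is changed here):
#   ln -s /dev/nst0 /dev/tape
# or, per session:
#   TAPE=/dev/nst0; export TAPE

# tape_target [DEVICE]: report which device a tape command would act on.
tape_target() {
    if [ -n "${1:-}" ]; then
        echo "$1"             # explicit device, as with mt -f
    elif [ -n "${TAPE:-}" ]; then
        echo "$TAPE"          # environment variable
    else
        echo /dev/tape        # default alias
    fi
}

tape_target /dev/nst1
```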

Okay, so we know what the tape drive is called and we know how to decide the rewind versus
no-rewind issue. But how do we move the files from our system to the tape drive? This is where the
various backup utilities come into play. All Linux distributions ship with the godfather of backup
utilities — tar (of course dbppt and lbppt, a/k/a dump and restor, were
the true Adam and Eve of UNIX backup). The tar command has been around since the
Version 7 days of UNIX (1979, the first death of Open Source UNIX). The name tar is an
acronym for “Tape ARchiver.” It was written to simplify moving data from a system to a tape and back,
since dump and restor were originally written to process paper tape units.

The general syntax of the tar command is:

tar -mode -option [files]

where mode is -c for create (backup), -x for extract (restore), or -t
for table of contents (list). Options include elements like -v for verbose output and
-f file for the destination (create mode) or source (extract or table mode). For more
option information, see the tar manpage on your system.
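
The three modes can be tried safely against an ordinary file before pointing tar at a tape. In this sketch the archive path stands in for a device such as /dev/st0, and the directory and file names are made up.

```shell
work=$(mktemp -d)
mkdir "$work/src" "$work/restore"
echo "hello tape" > "$work/src/notes.txt"

archive="$work/backup.tar"    # stands in for a tape device file

tar -cf "$archive" -C "$work" src        # -c: create (backup)
tar -tf "$archive"                       # -t: table of contents (list)
tar -xf "$archive" -C "$work/restore"    # -x: extract (restore)

# Confirm that the restored copy matches the original.
cmp "$work/src/notes.txt" "$work/restore/src/notes.txt" && echo "restore matches"

# rm -rf "$work"    # clean up when finished
```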

The simplest backup performed with the tar command could look like this:

tar -cvf /dev/st0 /

Figure 2: Output of a Simple tar


This tells tar to (c)reate a backup with (v)erbose output to the (f)ile /dev/st0
of the entire (/) system. The output of such a command could look like Figure 2. In this
case, the tar operation will open the /dev/st0 file (device), write the stream of
data in tar format to that open file, and close the file when all the data is written.
Because we selected the rewind device (/dev/st0), the file (media) will be rewound by the
device driver when tar closes the file.

The next step is to verify that the data is on the tape properly. Unfortunately, the
verification method provided by the version of GNU tar under Linux only supports a
comparison mode: the tape is reread and its contents are compared byte for byte against the
original files. However, this is still better than no verification at all. Remember, discovering
that your tape contains bad data when you need to restore is not a Good Thing.
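
A small experiment shows what the comparison mode reports. This sketch assumes GNU tar (the -d option is not available in every tar implementation) and uses a file archive in a scratch directory in place of a tape; all names are invented for the example.

```shell
work=$(mktemp -d)
mkdir "$work/data"
echo "payroll" > "$work/data/ledger.txt"

tar -cf "$work/backup.tar" -C "$work" data

# Compare the archive with the files on disk: nothing has changed yet.
if tar -df "$work/backup.tar" -C "$work" >/dev/null 2>&1; then
    before=ok
else
    before=differs
fi

# Change a file after the backup, then compare again.
echo "changed after the backup" > "$work/data/ledger.txt"
if tar -df "$work/backup.tar" -C "$work" >/dev/null 2>&1; then
    after=ok
else
    after=differs
fi

echo "before change: $before, after change: $after"
rm -rf "$work"
```

GNU tar exits with a non-zero status when any archived file differs from the copy on disk, which is what the if tests rely on. Drop the redirections to see the per-file messages.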

With the backup completed and verified, it’s time to relax, right? Wrong! Your systems
will continue to change and the only way for you to keep up with the changes is to continue to
perform periodic backups of the changed data. There are many ways to do this, the easiest being
“incremental” and “differential” backups.

Figure 3: Guru Incremental Scheme 1 and Guru Incremental Scheme 2

Both of these methods back up files based on a date, either that of the last backup (incremental)
or the last full backup (differential). Graphically, a week’s worth of these backups would work as
seen in Figure 3.

Notice that differential backups will duplicate data that would be ignored by the incremental
method. But, a differential restore means simply restoring the last full backup followed by
the latest differential backup. If your backup window and tape capacity allow, differential backups
will make emergency restores easier while still reducing backup media and time overhead.
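
The difference at restore time can be made concrete with a small sketch. It assumes a hypothetical schedule of one full backup at the weekend followed by one backup on each weekday; restore_set is a name invented for this example.

```shell
# restore_set SCHEME DAY: list the backup sets needed to restore up to DAY.
# SCHEME is incremental or differential; DAY is Mon, Tue, Wed, Thu, or Fri.
# Assumed schedule: a full backup at the weekend, one backup per weekday.
restore_set() {
    scheme=$1
    target=$2
    needed="full"
    for day in Mon Tue Wed Thu Fri; do
        # Incremental restores need every set since the full backup;
        # differential restores need only the set from the target day.
        if [ "$scheme" = "incremental" ] || [ "$day" = "$target" ]; then
            needed="$needed $day"
        fi
        if [ "$day" = "$target" ]; then
            break
        fi
    done
    echo "$needed"
}

restore_set incremental Wed     # full Mon Tue Wed
restore_set differential Wed    # full Wed
```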

To perform a backup using these methods, use tar’s -N or --newer option
followed by the date that files should be compared against. The date should be in normal calendar
format (e.g., 03/12/1999 = March 12th, 1999).

tar -cvf /dev/st0 --newer 03/12/1999 /

This command would back up all files on the system that were modified on or after March 12th, 1999.
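
Date selection can be tried out on scratch files first. One caveat for experiments like this: GNU tar's --newer also looks at a file's status change time, which touch resets, so the sketch below swaps in the related GNU tar option --newer-mtime, which considers only the modification time. It also assumes GNU touch for the -d option; the file names and dates are invented.

```shell
work=$(mktemp -d)
mkdir "$work/data"
echo "old" > "$work/data/old.txt"
echo "new" > "$work/data/new.txt"

# Backdate one file's modification time (GNU touch -d syntax).
touch -d '2019-06-01 12:00' "$work/data/old.txt"

# Only files modified after the cutoff date go into the archive.
# Messages about skipped files, if any, are discarded.
tar -cf "$work/changed.tar" --newer-mtime='2020-01-01' -C "$work" data 2>/dev/null

# List what was actually backed up.
tar -tf "$work/changed.tar"

# rm -rf "$work"    # clean up when finished
```

The listing should contain data/new.txt but not data/old.txt.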

#!/bin/sh
# TAR backup script to cover daily and weekly backups on one tape.
# Generic UNIX version - edit variables as required
# For this to work, it MUST be started on a Monday!
# Copyright (c) 1999, Tim Jones
# Permission granted for use/modification
# Tim Jones/Linux Magazine provide this shell script with no warranty
# (implied or otherwise)

DOW=`date +%w`                  # day of week: 0 = Sunday ... 6 = Saturday
DATE=`date +%D`                 # today's date as mm/dd/yy
DAY=`date +%A`                  # full weekday name
DEVICE="MY REWIND TAPE"         # Rewinding tape drive
NDEVICE="MY NO_REWIND TAPE"     # non-rewinding tape drive
REWIND="mt -f $DEVICE rewind"
EOD="mt -f $NDEVICE eod"
FSF="mt -f $NDEVICE fsf"
MAILLIST="root"                 # list of users to receive backup notice

if [ "$DOW" = "6" ]
then
    # This is Saturday, so append and write the whole system!
    $EOD                        # assumed step: go to end of data before appending
    echo $DATE > /tmp/.LASTFULL
    tar -cvvf $NDEVICE / >/tmp/backup.txt
    $REWIND                     # assumed step: rewind so fsf counts from the start
    $FSF 5                      # skip the five weekday backup sets
    tar -dvf $NDEVICE >> /tmp/backup.txt
    cp /tmp/.LASTFULL /etc/.LASTFULL
else
    # this is not Saturday
    case $DOW in
    0)
        # it's Sunday - nothing to do
        exit 0
        ;;
    1)
        # Monday, Let the tape rewind for the difference verification
        echo $DATE > /tmp/.LASTINC
        tar -cvvf $DEVICE --newer `cat /etc/.LASTFULL` / >/tmp/backup.txt
        tar -dvf $DEVICE >> /tmp/backup.txt   # assumed step: verify Monday's set
        cp /tmp/.LASTINC /etc/.LASTINC
        ;;
    *)
        # Other Days, must rewind manually and Inspect, -i
        $EOD                    # assumed step: go to end of data before appending
        echo $DATE >/tmp/.LASTINC
        tar -cvvf $NDEVICE --newer `cat /etc/.LASTINC` / >/tmp/backup.txt
        $REWIND                 # assumed step: rewind so fsf counts from the start
        $FSF `expr $DOW - 1`    # skip the sets written earlier in the week
        echo "************ Verifying ${DAY}'s backup" >> /tmp/backup.txt
        tar -dvf $NDEVICE >> /tmp/backup.txt
        cp /tmp/.LASTINC /etc/.LASTINC
        ;;
    esac
fi
mail $MAILLIST < /tmp/backup.txt
rm -f /tmp/backup.txt


I’ve provided a simple script that can be used as a model that takes into account both full and
differential backups. Before you attempt to use it, please examine the variables at the head of the
script and make changes to suit your specific environment. It is guaranteed to not work as is.

That’s it for this installment. Please remember that an unverified backup is worthless unless
you like gambling with your organization’s data!

Tim Jones, Vice President of Enhanced Software Technologies (The BRU Guys), has dealt with
system backup issues since 1984. Tim can be reached at tjones@estinc.com.
