Backup Part One: Preparation

Don’t put off backups. Here’s how to plan a Linux backup strategy.
Have you backed up your computer recently? If you’re like most people, the answer is “no.” Unfortunately, that’s a bad answer. Think about everything that’s stored on your computer — business correspondence, unfinished programs, tax documents, family photographs, and so on. Now imagine that data vanishes without a trace. How long will it take you to recover or re-create your data files? Will it even be possible?
Some causes of data loss can be addressed through means such as implementing a RAID array (described in the February through April “Guru Guidance” columns). Such approaches aren’t a substitute for backing up your data, though; even a RAID array isn’t protection against fire, theft, or human error, for instance.
Unlike most “Guru Guidance” columns, this one is short on concrete descriptions of tools you can use to accomplish a task. Instead, this month’s article emphasizes creating a backup plan — deciding what hardware, software, and general procedures to use to keep your data safe. Next month will cover one of the trickier aspects of backup: What to do when the worst happens and you need to recover an entire system from a backup.

Deciding What to Back Up

You should begin your backup planning by deciding what needs to be backed up. The easiest and, in some sense, safest answer is “everything.” Backing up the entire directory tree ensures you can recover a system to a blank hard disk with minimal fuss. In practice, of course, you can safely omit dynamic filesystems such as /proc. You can probably also omit network mounts and removable media, although you may need to back them up separately.
Unfortunately, backing up an entire computer is time-consuming and requires high-capacity backup media. Furthermore, in some sense, you’ve already got backups of Linux programs and most system files in the form of your Linux distribution media. For this reason, many people confine their backups to user data (/home and perhaps certain directories in /var, such as mail spool directories) and site-specific system configuration files (most of which are stored in /etc). Restoration of system files, when necessary, is then done from the original Linux distribution media.
On the other hand, a full backup makes a complete restoration far easier and faster in the event of a disaster, and it preserves any software upgrades and site-specific configurations. You can also be sure you won’t have forgotten files that might be located in some unusual location.
One compromise is to perform a full backup infrequently and perform a less complete backup on a more regular basis. The upcoming section, “Planning a Backup Schedule,” covers this possibility in more detail.
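As a rough illustration of the partial backup described above, the following sketch archives only selected directories with GNU tar. The function name backup_selected and the destination path in the example are assumptions made for illustration; substitute the directories that matter on your own system.

```shell
# Sketch only: backup_selected is a hypothetical helper, not a standard tool.

# backup_selected ARCHIVE PATH...
# Write a gzip-compressed tar archive containing only the listed paths.
backup_selected() {
    archive=$1
    shift
    tar --create --gzip --file="$archive" "$@"
}

# Example (hypothetical destination; adjust the paths to your system):
#   backup_selected /mnt/backup/userdata.tar.gz /home /etc /var/spool/mail
```

GNU tar strips the leading / from member names and prints a notice to that effect, so the files are restored relative to whatever directory you extract them in.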

Backup Hardware Options

Once you’ve decided how much of your system you want to back up, you should begin investigating backup hardware options. Possibilities include:
*Tape. Magnetic tape has been the traditional backup medium of choice because of its low cost per gigabyte and high capacity. Today’s individual tapes may not be able to back up complete systems, though; many formats top out at 160 GB or less per tape. Tape changer hardware can be useful for large systems and network backups. Unlike most storage media, tapes are sequential access devices, meaning that their data must be accessed in linear sequence. This isn’t a problem when backing up, or when restoring all data, but it can slow down partial data restores. Many tape drives include built-in data compression features, and manufacturers often advertise compressed data capacity, so watch out for that when evaluating hardware.
*Optical media. CD-R, CD-RW, and recordable DVD formats have become popular backup media. These formats are best for backing up small systems (thin clients, for instance) and personal data. At 700 MB, CD-R and CD-RW are too small to hold anything but the smallest complete systems. Recordable DVD formats can hold more, but not enough to back up large workstations or servers. These formats do have the advantage of easy read access on a wide variety of systems, which can simplify data restoration in a network environment; rather than deal with a network restore, you can physically move the backup media to the downed system to recover data.
*Removable hard disks. Removable drive bays and external hard disks make backups to hard disk possible. Buy a removable bay and a few hard disks and you can back up your data to a drive that’s as large as the source drive itself. The media (that is, the hard disks) are expensive on a per gigabyte basis, but the removable drive bay is much less expensive than most tape units, making this approach cost-competitive with tapes in many environments.
*Low-capacity removable disks. Zip disks, LS-120 disks, magneto-optical disks, and even floppy disks can be used for backup purposes. In today’s world, though, these media are so slow and low in capacity that they’re useful only for backing up specific projects or system configuration files. Such disks can be a useful part of an emergency restore plan, though. Install ZipSlack (http://www.slackware.com/zipslack/) or a similar slim Linux system on a removable disk and you can use it for emergency data recovery if your main system becomes unbootable. CD-based distributions can also be used in this role.
Broadly speaking, removable or external hard disks and tape units make the best general-purpose backup tools. Optical media can be useful in certain roles, such as for backing up a standard workstation configuration that can be easily replicated at a site with dozens of workstations. (Such a site would likely use a network backup tool and tape or removable disks on a backup server to handle user data.) You should probably research the prices of both disk and tape backup options of an appropriate size to handle your backup needs. Remember to include the price of both the main backup unit and its media. Ideally, you should include enough media to handle at least two or three backup sets — that way, if your most recent backup set goes bad, you can fall back on the next-most-recent set. You may also want to keep at least one backup set off-site, to protect from disasters such as fires or physical theft of your equipment.

Backup Software Options

Linux backup software ranges from simple file copies using the cp command to elaborate tools for performing network backups. Some of the more popular backup tools include:
*tar. This venerable tool works well with tape drives, but it’s also useful for backing up to optical media (with the help of additional programs, such as cdrecord) or removable disks.
*cpio. This program is similar to tar in overall capabilities and features, but it differs in operational details.
*dd. You can back up an entire partition, even if Linux can’t read the underlying filesystem, using this program. Restoring individual files can be difficult, though. It’s best used on multi-boot systems to back up a small but obscure OS from Linux. It’s also handy to make backups of floppy disks to other media.
*Partimage. This program is similar to dd in that it backs up an entire partition, but it’s more flexible and useful as a backup tool. You can read about Partimage in the April 2005 “Guru Guidance” column, available online at http://www.linux-mag.com/2005-04/guru_01.html.
*AMANDA. The Advanced Maryland Automatic Network Disk Archiver (AMANDA, http://www.amanda.org) automates backups on a network. One system functions as a network backup server that automatically backs up all the other systems on a regular basis. Its setup can be a bit tedious, but it’s a very powerful and flexible tool.
*Arkeia. This commercial program, headquartered at http://www.arkeia.com/, is a backup tool with many advanced features, including network and cross-platform backups.
*BRU. The Backup and Recovery Utility (BRU, http://www.tolisgroup.com/) is another commercial backup program. It features an easy-to-use GUI front-end.
One caution: The dump and restore programs, long staples on Unix systems, are unreliable with recent versions of Linux and should be avoided.
If none of these programs seems to be your cup of tea, try doing a Web search on “Linux backup.” Plenty of smaller and more obscure programs are available, so you may find one that’s ideally suited to your needs.
GUI front-ends to some of these tools, such as KDar (http://kdar.sourceforge.net), also exist. These programs can help those who are more comfortable with a GUI than with command-line tools perform backup tasks.
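To make the dd entry in the list above more concrete, here is a minimal sketch that copies a partition to an image file and then verifies the copy. The function name image_partition and the device and destination in the example are assumptions; the partition should be unmounted (or mounted read-only) so that it doesn’t change during the copy.

```shell
# Sketch only: image_partition is a hypothetical helper name.

# image_partition DEVICE IMAGE
# Copy DEVICE block for block into IMAGE, then compare the two byte by byte.
image_partition() {
    dd if="$1" of="$2" bs=1M && cmp "$1" "$2"
}

# Example (hypothetical device and destination; requires root):
#   image_partition /dev/sda3 /mnt/backup/sda3.img
```

Restoring is the same dd command with the if= and of= arguments swapped, which overwrites the entire target partition.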

Network Backup Options

Some of the backup tools just mentioned are designed with explicit network support. AMANDA and Arkeia, in particular, are useful for backing up mid-sized and large networks. (Be prepared to spend a lot of time and money on the backup server itself, though; you’re likely to need an expensive tape changer to handle more than a few systems.)
Smaller networks, such as home or small office networks, may not need such an elaborate setup. You can back up network data using tar and NFS mounts, for instance. On the system without a tape drive, configure NFS to export the directories you want to back up. On the system with a tape drive, mount those NFS exports and back them up just as you would any other directory. To restore data, reverse the process. You can do the same with Samba or other network tools to back up non-Linux systems.
One problem with such simple network backup procedures is that you may run into permissions problems. NFS normally squashes root access, meaning that root on the NFS client system has no more privileges than another user (typically nobody) on the NFS server system. This fact makes backing up system files difficult or impossible without disabling the squashing feature, but doing this is inadvisable for security reasons.
One way around this problem is to reverse the backup operation so that the system you want to back up initiates the process. You can send a tarball of the data you want to back up to the system with the backup hardware (via NFS, an SSH file transfer, or some other method) and then save it on the backup system. These operations can be coordinated by running programs remotely, by scheduling cron jobs, or by directly accessing the backup hardware from the remote system. If you find you need to set up complex scripts to handle your needs, though, you may want to investigate AMANDA, Arkeia, or some other network-enabled backup tool.
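One way to sketch this client-initiated approach is to stream a tarball straight into a command that delivers it to the backup system. The function name push_backup, the user and host names, and the remote path below are all assumptions made for illustration.

```shell
# Sketch only: push_backup is a hypothetical helper name.

# push_backup SRC_DIR COMMAND [ARG...]
# Stream a gzip-compressed tarball of SRC_DIR to COMMAND's standard input.
# The return status is that of COMMAND, the last stage of the pipeline.
push_backup() {
    src=$1
    shift
    tar --create --gzip --file=- \
        --directory="$(dirname "$src")" "$(basename "$src")" | "$@"
}

# Example (hypothetical user, host, and remote path):
#   push_backup /home ssh backup@backuphost 'cat > /backups/home.tar.gz'
```

Because tar runs as root on the system being backed up, it can read system files without the NFS server having to disable root squashing.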

Planning a Backup Schedule

One of the most critical issues you must address when planning how to back up your computer is how often to perform a backup. This question can actually have a fairly complex answer because you can perform different types of backups:
*Full backups. These backups store most or all of the data on your computer, or at least all the data you want to have available for subsequent restoration. Full backups are time-consuming but make for easy restoration in the event of a total system failure.
*Incremental backups. To save time and backup media space, you can choose to back up only those files that have changed since your last full backup. The downside is greater complexity when restoring data; you may need a full backup and one incremental backup. You can also end up restoring some unwanted files: if a file was deleted between the last full and incremental backups, it might be restored.
*Differential backups. These backups are like incremental backups, but they omit files that have been backed up by any means (full, incremental, or differential). The result is still greater savings in time and backup media space compared to incremental backups, at the cost of still greater restore complexity. You could need to restore data from several backups to recover all files.
Some people and software don’t distinguish between incremental and differential backups, and many tools use the two terms in the opposite sense from the one used here. Before implementing a plan, be sure you understand how your software handles these backup types.
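GNU tar can produce backups of this kind with its --listed-incremental option, which records what has already been saved in a snapshot file. The sketch below is one possible arrangement, not the only one: the function names are assumptions, and they are named for what they do (back up changes since the last full backup, or since the last backup of any kind) rather than for either backup term.

```shell
# Sketch only: function names and snapshot handling are illustrative.

# snapshot_tar SRC_DIR ARCHIVE SNAPSHOT
# Archive SRC_DIR, skipping files that SNAPSHOT records as unchanged,
# and update SNAPSHOT to describe the directory's current state.
snapshot_tar() {
    tar --create --file="$2" --listed-incremental="$3" \
        --directory="$(dirname "$1")" "$(basename "$1")"
}

# full_backup SRC_DIR ARCHIVE SNAPSHOT
# Removing the snapshot file first forces tar to save everything.
full_backup() {
    rm -f "$3"
    snapshot_tar "$1" "$2" "$3"
}

# backup_since_full SRC_DIR ARCHIVE FULL_SNAPSHOT
# Save files changed since the full backup. tar works on a temporary
# copy, so the full backup's snapshot is left untouched.
backup_since_full() {
    work=$(mktemp) || return 1
    cp "$3" "$work" && snapshot_tar "$1" "$2" "$work"
    rc=$?
    rm -f "$work"
    return $rc
}

# backup_since_last SRC_DIR ARCHIVE RUNNING_SNAPSHOT
# Save files changed since the previous backup of any kind. tar updates
# RUNNING_SNAPSHOT in place on every run.
backup_since_last() {
    snapshot_tar "$1" "$2" "$3"
}
```

Before the first backup_since_last run after a full backup, copy the full backup’s snapshot file to the running snapshot so that the chain starts from the full backup. When restoring archives made this way, GNU tar’s --listed-incremental=/dev/null option on extraction also removes files that had been deleted by the time of the later backup.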
Many backup plans include two or all three types of backup. For instance, you might perform a monthly full backup, a weekly incremental backup, and a daily differential backup. This minimizes the time required for the backups on most days, but you might need up to eight backups to restore data: your latest full monthly backup, the most recent incremental weekly backup, and the differential backups for up to six days. If you’re lucky, though, your failure will occur immediately after a monthly full backup, so you might need just one backup to restore the system.
You can usually fit several incremental or differential backups on a single physical backup medium, such as on a single tape. A schedule such as the one just described might require five media: One for the full backup and one for each incremental backup and that week’s differential backups. To keep three complete backup sets, you’d therefore need fifteen backup media.
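The media arithmetic above is easy to redo for other schedules. The following sketch assumes, as the example schedule implies, four weekly backups per monthly cycle, with each week’s daily backups sharing a medium with that week’s weekly backup; the function name media_needed is hypothetical.

```shell
# Sketch only: media_needed is a hypothetical helper name.

# media_needed SETS WEEKLY_BACKUPS
# Each set needs one medium for the full backup plus one per weekly backup.
media_needed() {
    sets=$1
    weekly=$2
    echo $(( sets * (1 + weekly) ))
}

media_needed 3 4    # prints 15
```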
Just how often you should perform a backup is a very system-specific issue. A workstation that stores most user data on a file server might need very infrequent backups, since the workstation’s data will change little on a day-to-day basis. The file server, though, might need daily backups, particularly if that server holds the data for many users.
Ask yourself how much time and effort it would take to recover lost data for any given backup schedule. If a schedule you’re considering would require hundreds or thousands of hours of work to re-create lost data, consider increasing the backup frequency. If the number of hours required is low, consider decreasing the backup frequency.

Performing a Backup

Actually performing a backup is a very software- and hardware-dependent procedure. Using a tool such as tar, the process might entail a single command, such as:
tar cvpf /dev/st0 --one-file-system /home /usr /var /
The --one-file-system option tells tar not to back up subdirectories that reside on separate filesystems; hence, this command separately lists every filesystem that must be backed up: /home, /usr, /var, and /. (Older versions of GNU tar accepted l as a short form of this option, but current versions use l for --check-links instead, so the long form is safer.) The details of what filesystems you would list, of course, depend on your system’s configuration. You might also want to add or remove options, change the output device (/dev/st0, a SCSI tape unit, in this example), and so on.
Ultimately, you’ll have to research the options for your chosen backup software to learn how to perform a backup of your system. You may also want to automate the task. You can do this by performing a backup via cron or by using software-specific scheduling tools. If you do so, be sure you have a system in place to change backup media on a regular basis. You might want to include an email or other automated reminder to yourself in your scripts.
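As one possible shape for such an automated job, the sketch below wraps tar in a function that prints a one-line result and a media reminder. It relies on cron’s usual behavior of mailing a job’s output to the crontab’s owner, which works only if local mail delivery is set up. The function name, script path, and schedule are assumptions.

```shell
# Sketch only: nightly_backup and the paths below are hypothetical.

# nightly_backup ARCHIVE PATH...
# Run the backup and report the outcome. tar also returns a nonzero
# status if a file changes while it is being read, which this sketch
# reports as a failure.
nightly_backup() {
    archive=$1
    shift
    if tar --create --gzip --file="$archive" "$@"; then
        echo "Backup to $archive succeeded. Remember to rotate the backup media."
    else
        echo "Backup to $archive FAILED." >&2
        return 1
    fi
}

# Hypothetical crontab entry (2:30 a.m. daily), assuming this file is
# saved as /usr/local/sbin/nightly-backup.sh and ends with a call such as
#   nightly_backup /mnt/backup/home.tar.gz /home
#
#   30 2 * * * /usr/local/sbin/nightly-backup.sh
```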

Next Month

Next month’s “Guru Guidance” will look at some of the issues in restoring data from a backup, with an emphasis upon complete system restores — that is, recovering from hard disk failures or other disasters that require you to restore a system “from scratch.”

Roderick W. Smith is the author or co-author of over a dozen books, including Advanced Linux Networking and Linux Power Tools. He can be reached at rodsmith@rodsbooks.com.
