Suppose you arrive at work one morning and find a hundred new computers on the loading dock. Your boss tells you to fully set them up — including installing Linux — and have them on desks by the end of the week. Ugh! How would you tackle this unexpected job and still get your normal work done?
Once you’ve physically assembled the computers, you have many options for handling the software part of the job. You could install each system by hand, an approach that gets the job done but would likely force you to neglect all of your other duties. You could make a hundred copies of the installer CDs and install them all at once. You could perform network installs, but then you’d be limited by the available network bandwidth. And unfortunately, in all of these scenarios, you’d still have to kick off each installation by answering all of those prompts. Moreover, you’d probably spend additional time and effort installing other software onto each system.
System cloning is an approach that speeds and simplifies installs across many similar systems. If you run a server farm or support a large set of desktops, system cloning is a great way to build and deploy systems fast. In fact, no matter how many machines you administer, system cloning can save time, reduce downtime, and simplify maintenance.
A hundred systems? No problem.
Prerequisites and Preliminary Decisions
There are two main approaches to cloning: you can automate the installation process, or you can load a prepared disk image onto the hard disk of each new machine. Each approach has many variations and choices. In this article we’ll look at some of the most useful techniques.
System cloning works best when the target systems are very similar to one another: similar hardware devices, similar operating system and supplemental software, and similar system configuration. Of course, there are always some system-specific items to handle after cloning, but cloning is easiest when those are kept to a minimum. Thus, strategies such as configuring workstations as DNS and DHCP clients, and using a name service for user account information, are good ideas.
As you prepare for cloning, you must also make some basic decisions:
Where is the master operating system installation or disk image going to be located? On the network, or on removable media like a CD or DVD?
How will each individual system be booted? From a floppy disk, a CD or DVD, or via a network boot image?
How will you access the master data if it’s on the network?
Ideally, these considerations are taken into account before hardware is ordered. For example, if you plan on booting from floppy disk for the installation, don’t order systems without a floppy drive. Similarly, if you plan on network booting, make sure that the network adapter in each system supports that operation (for more information on network booting, see the October 2002 feature “Network Booting,” available online at http://www.linux-mag.com/2002-10/netboot_01.html).
Automated, Unattended Installations
There are a variety of software packages that automate the initial portion of the installation process, where you typically must respond to many screenfuls of prompts. Using one of these packages, the installation can be unattended: once it’s initiated, no further administrator intervention is required. You start it, walk away, and when you return, there is a newly installed system, already rebooted and ready for use. The best known of these packages is KickStart, provided by Red Hat. Other, similar packages are listed in the sidebar, “Other Automated Installation Tools.”
To use KickStart, you create a configuration file that contains information you’d otherwise have to provide at the install prompts, and then pass the keyword ks to the kernel at the installation boot. KickStart can use install media or a network-accessible installation directory as the source of the Linux operating system software.
Network-accessible media is often the most convenient choice when you have many installations to perform. It’s also easy to set up. Create a directory that’s readily accessible from anywhere on your network, and copy the RedHat subdirectory from each of the installation CDs to that location. For example, assuming the first CD is in the drive, this command copies the necessary files to the central location:
# cp -ar /mnt/cdrom/RedHat /location
Simply insert each installation CD into the drive in sequence, and repeat the cp command shown above for each one. Note that the installation CD is mounted in the conventional location for Red Hat systems.
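The repeated copy step can be wrapped in a small helper. The following is only a sketch: the function name copy_install_tree is hypothetical, and /mnt/cdrom and /location are the same placeholder paths used in the command above.

```shell
# Copy the RedHat tree from one mounted CD into the shared install directory.
# Usage: copy_install_tree <cd-mount-point> <install-dir>
copy_install_tree() {
    cd_mount=$1
    install_dir=$2
    # Refuse to continue if the disc does not carry a RedHat directory.
    if [ ! -d "$cd_mount/RedHat" ]; then
        echo "no RedHat directory under $cd_mount" >&2
        return 1
    fi
    # cp -a preserves permissions and timestamps; copying into an
    # existing RedHat directory merges the new files into it.
    mkdir -p "$install_dir" &&
        cp -a "$cd_mount/RedHat" "$install_dir"
}

# Example: run once per CD, swapping discs in between.
# copy_install_tree /mnt/cdrom /location
```

Because each call merges into the same destination, the result after the last CD is a single RedHat tree holding the files from every disc.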
The install directory must then be exported via NFS. Be sure that the directory is exported in read-only mode, and include the no_root_squash option. Also, ensure that the systems to be installed are granted access to the share. Finally, the directory tree should be protected against unauthorized modification for security reasons.
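As an illustration of the export settings just described, this sketch prints a line suitable for /etc/exports. The helper name exports_entry and the subnet shown are assumptions for the example, not values from this article.

```shell
# Print an /etc/exports entry that shares a directory read-only
# with the no_root_squash option.
# Usage: exports_entry <directory> <client-spec>
exports_entry() {
    printf '%s %s(ro,no_root_squash)\n' "$1" "$2"
}

# Example with a hypothetical subnet. Append the output to /etc/exports
# as root, then run "exportfs -ra" so the NFS server rereads the file.
exports_entry /location 192.168.1.0/255.255.255.0
```

Restricting the client specification to the subnet being installed is one way to grant access to the new systems without opening the share to everyone.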
The next step is to create a KickStart configuration file named ks.cfg. ks.cfg is a text file containing keywords that specify various system and installation options. The options are very easy to understand. Listing One is a simple example that’s been annotated for clarity.
Listing One: A sample Kickstart configuration file
There are three sections in this sample file: the first section contains installation prompt responses (specified by keywords and options); the second section is package selections (introduced by %packages); and the third section is a post-installation script (introduced by %post). Although we’ve wrapped some lines to fit, each entry within the file must be on a single line.
The entries in the first part of the file are generally self-explanatory. First, the installation and system languages are specified, followed by entries for the keyboard, mouse, time zone, and root password (specified here in encoded form). Next, the installation type (new install or upgrade) and the location of the installation source are given, followed by the boot loader selection and reboot after installation option. The third group of commands in the first section partitions the hard disk, creating /boot as 8 MB, / as about 7 GB, swap as an 800 MB partition, and a fourth partition which will be mounted at /data. The latter partition will use all remaining space left on the hard disk. Under Red Hat 8.0, you can also configure RAID devices and the Logical Volume Manager from within a KickStart file.
The file then goes on to specify the user authentication method (here, a shadow password file using MD5 encoding, although you can also specify a name service or Kerberos), and how to configure X on this system.
The second section of the file contains a list of packages to be installed. The first three items in the list are package groups as defined in the standard Red Hat installation (see below). The final entry is an additional package. Any prerequisite packages that these selections depend on will also be installed (indicated by the --resolvedeps option to %packages). Note that the mandatory groups are always installed (and thus need not be listed).
The final section of the file contains a post-installation script that runs after the install completes successfully and just prior to system reboot (if the latter is selected). In this case, the script is very simple, but obviously a much more elaborate script could be provided. There is a wide range of activities that might be performed by the post-installation script, including adding additional user accounts, enabling and disabling services and daemons, copying additional files from remote systems, generating an SSH key, appending to or editing system configuration files, and so on.
If desired, a pre-installation script can also be used. This is specified by a section headed by %pre, placed before the %post section.
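To make the three-section layout concrete, here is a rough sketch that writes a small KickStart file following the description above. It is not the article’s own listing. The function name write_ks_cfg, the server name installhost, the group names, the package name, and the password placeholder are all assumptions, and the keywords should be checked against the KickStart documentation for your release.

```shell
# Write a sketch of a KickStart file to the path given as $1.
# Every value below is a placeholder to be replaced before use.
write_ks_cfg() {
    cat > "$1" <<'EOF'
# Section 1: answers to the installer's prompts
lang en_US
langsupport --default en_US en_US
keyboard us
mouse genericps/2
timezone America/New_York
rootpw --iscrypted REPLACE_WITH_ENCODED_PASSWORD
install
nfs --server installhost --dir /location
bootloader --location=mbr
reboot
clearpart --all
part /boot --size 8
part / --size 7000
part swap --size 800
part /data --size 1 --grow
auth --useshadow --enablemd5
xconfig --resolution 1024x768 --depth 16 --startxonboot

# Section 2: package groups (lines starting with @) and one extra package
%packages --resolvedeps
@ X Window System
@ GNOME Desktop Environment
@ Text-based Internet
some-extra-package

%post
# Section 3: runs after the install finishes, just before the reboot
echo "Installed via KickStart on $(date)" >> /etc/motd
EOF
}

# Example: write_ks_cfg ./ks.cfg
```

The quoted heredoc delimiter keeps the $(date) expression literal, so it is evaluated on the target machine when the post-installation script runs rather than when the file is generated.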
Figure One: ksconfig, the KickStart GUI configuration tool
The ks.cfg file is easy to create with any text editor. If you prefer a graphical interface, try the ksconfig utility. Figure One displays some of the ksconfig panels (most of which correspond to the example configuration file in Listing One). The utility creates a file named anaconda-ks.cfg.
The final preparatory step is to make the KickStart configuration file accessible to the installation process. One of the simplest ways is to use a floppy disk and copy the file there. Alternatively, the file can be placed in a network location.
If you choose to use a boot diskette to start the installation, you may need to fiddle with it a bit to make room for the ks.cfg file. You can delete the many .msg files on the diskette — they contain help text which won’t be needed. You may also want to replace the syslinux.cfg file on the diskette with a much simpler one. For example, this file will initiate a KickStart installation immediately after booting without you entering any command at all:
See the manual page for more information about syslinux.
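A simplified syslinux.cfg of the kind described above might look like the following sketch. The helper name write_syslinux_cfg is hypothetical, and the kernel and initrd file names (vmlinuz and initrd.img) are assumed to match those on your boot diskette; ks=floppy tells the installer to read ks.cfg from the floppy.

```shell
# Write a minimal syslinux.cfg that starts a KickStart install
# immediately, with no boot prompt. $1 is the output path.
write_syslinux_cfg() {
    cat > "$1" <<'EOF'
default ks
prompt 0
label ks
  kernel vmlinuz
  append ks=floppy initrd=initrd.img
EOF
}

# Example, with the boot diskette mounted at a hypothetical /mnt/floppy:
# write_syslinux_cfg /mnt/floppy/syslinux.cfg
```

Setting prompt to 0 skips the boot prompt, so the single label named by default is booted without any input.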
KickStart has many advantages. It’s a free solution with a fair amount of flexibility. The same template KickStart configuration can handle disks of different sizes and differing network hardware on the target systems. Other hardware differences (e.g., video card and monitors) can be handled by using several template KickStart configuration files or a ks.cfg file generator that creates an appropriate file on demand (many such front-ends are available).
Post-installation system configuration is possible via the post-install script, although writing a thorough script involves some work (and may require per-system customization). Adding additional software to a KickStart installation is more work, but it can be done (see the sidebar “Customizing the KickStart Installation Source”).
Customizing the KickStart Installation Source – Part 1
There are several reasons why you might want to modify the installation source as delivered on the official installation media: you may want to add packages, remove unwanted packages from package groups, replace existing packages with updated versions, and so on.
The installation source contents are described in files located in the base subdirectory of the RedHat directory on the first installation CD. The files are comps.xml, a file that defines package groups and package dependencies, and hdlist and hdlist2, which hold binary data about the package RPM files themselves. The contents of these files must match the installation files or there’ll be problems.
The comps.xml file is an XML version of the traditional comps file present in earlier versions of Red Hat Linux. comps.xml defines packages and their dependencies, package groups, and the package selection menus available during installation. The conversion to XML format occurred with Red Hat 8.0.
This package is named usbutils (USB utilities). The package name is simply the RPM file name minus any version-specific information. The package entry defines the dependencies for this package (in this case, the glibc and hwdata packages).
The dependency list can be empty, but the delimiters must still be present. If you want to add a package to the installation source, you need to add an entry for it to the end of the comps.xml file. If you’re updating an existing package to a new version, the package entry probably won’t need to be modified (but you should verify this). There is no need to remove package entries from the file.
The group entries at the beginning of the file define named groups of packages that can be selected and/or installed as a unit. Here is an example showing the structure of a group entry (modified a bit from its original form):
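When adding several packages, the entries can be generated rather than typed. The sketch below prints a package entry for the usbutils example described above; the function name comps_package_entry is hypothetical, and the element names (package, name, dependencylist, dependency) are assumptions about the comps.xml layout that should be compared against an existing entry in your own file.

```shell
# Print a comps.xml package entry.
# Usage: comps_package_entry <package-name> [dependency ...]
comps_package_entry() {
    name=$1
    shift
    echo "  <package>"
    echo "    <name>$name</name>"
    # The dependency list delimiters are emitted even when the list is empty.
    echo "    <dependencylist>"
    for dep in "$@"; do
        echo "      <dependency>$dep</dependency>"
    done
    echo "    </dependencylist>"
    echo "  </package>"
}

# Example: the usbutils entry with its two dependencies.
comps_package_entry usbutils glibc hwdata
```

Calling the function with only a package name produces an entry with an empty dependency list, which keeps the required delimiters in place.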
This group is referred to as text-internet within comps.xml.
The name and description fields are used in installation menus. Many groups also include additional name and description attributes for other languages by including the xml:lang option in the opening tag, as in this example:
uservisible indicates whether this group is visible to users during installation (it is in this case), and the default entry indicates whether the group is installed by default (again, here it is).
The grouplist consists of a list of prerequisite groups. In this case, there is just one: the base group. Occasionally, the tag metapkg may also be used in this section. It’s used to specify a meta group that consists of a list of groups.
The list of packages making up the group follows the group dependencies. Each package is included as a packagereq tag. The type attribute in the packagereq tag specifies whether the individual package is required, and whether it is installed by default or not. Mandatory packages are always installed. The system administrator may decide whether to install ones that are marked optional (not installed by default) or default (installed by default).
Adding a new group to the file is very easy: copy an existing group specification and edit it, which ensures that the structure is correct. Modifying a group to add additional packages is also straightforward: add the appropriate packagereq tags to the desired group, and also add any group dependencies.
Removing a package consists of removing the corresponding packagereq tag, or, better, changing its type to optional. There is no need to remove a group, as you can simply change the group’s default entry to false (you may need to add it). If you are making major changes to a group, the best practice is to make a copy of the entire stanza, change the id of the original, and modify the copy.
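The pieces of a group stanza discussed above fit together roughly as in this sketch, which prints a made-up group. The group id local-tools, its name and description, and the package choices are hypothetical, and the exact element names (for example groupreq and packagelist) are assumptions to be verified against a stanza in your own comps.xml.

```shell
# Print a sketch of a comps.xml group stanza for a hypothetical group.
print_group_stanza() {
    cat <<'EOF'
  <group>
    <id>local-tools</id>
    <name>Local Tools</name>
    <description>Site-specific utilities (hypothetical example).</description>
    <uservisible>true</uservisible>
    <default>true</default>
    <grouplist>
      <groupreq>base</groupreq>
    </grouplist>
    <packagelist>
      <packagereq type="mandatory">usbutils</packagereq>
      <packagereq type="default">lynx</packagereq>
      <packagereq type="optional">mutt</packagereq>
    </packagelist>
  </group>
EOF
}

print_group_stanza
```

The three packagereq lines show one package of each type: always installed, installed unless deselected, and installed only if selected.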
The last part of the groups listing is headed by the following comment:
<!-- META GROUPS -->
The group entries that follow are defined solely by a list of dependent groups (i.e., a grouplist). These are so-called meta groups: super groups consisting of sets of normal groups.
The group hierarchy section defines how items appear in the installation package selection menu, including their division into categories and ordering within the list. Here is a small part of this section:
The group hierarchy consists of a series of category definitions. Each of these defines a name attribute and a list of subcategories. Each subcategory tag lists a group name which is part of that category. The following categories are defined as of this writing: Desktops, Applications, Servers, Development, and System.
You can easily add an item to an existing category. For example, the documentation entry in the System category above does not appear in the delivered comps.xml file. You can define a documentation group consisting of the contents of the Documentation CD included in the installation package, and add it to the regular menu hierarchy.
If you’ve modified the comps.xml file or have replaced any RPM files, you must run the genhdlist command to update the hdlist and hdlist2 files in the base subdirectory. This utility is part of the anaconda-runtime package. The command takes the installation source directory as its argument (i.e., the directory where the RedHat trees were copied).
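The regeneration step can be guarded with a quick sanity check, as in this sketch. The function name rebuild_hdlist and the GENHDLIST override are hypothetical, and the default path /usr/lib/anaconda-runtime/genhdlist is an assumption about where the anaconda-runtime package installs the tool; adjust it to match your system.

```shell
# Regenerate hdlist and hdlist2 for an installation source directory.
# Usage: rebuild_hdlist <install-dir>
rebuild_hdlist() {
    # Assumed default location of genhdlist; override with GENHDLIST.
    tool=${GENHDLIST:-/usr/lib/anaconda-runtime/genhdlist}
    # The tree must contain RedHat/base, where the hdlist files live.
    if [ ! -d "$1/RedHat/base" ]; then
        echo "$1 does not look like an installation source" >&2
        return 1
    fi
    "$tool" "$1"
}

# Example: rebuild_hdlist /location
```

Checking for RedHat/base first avoids running the tool against the wrong directory, such as the RedHat subdirectory itself.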
KickStart’s disadvantages? If you want to install multiple systems at the same time, you’ll need copies of the boot floppy or other booting media for each machine, unless you rely on network boots. In addition, if you’re using a network installation source — as is typical — then the number of systems that can be installed simultaneously is limited by the available network bandwidth. Using a separate LAN for the installation can be very helpful.
“Ghosting” is the colloquial term for copying a disk image to a new hard disk as a one-step operating system install. The term ghosting comes from the name of a commercial package, Ghost, from Norton (http://www.norton.com). PowerQuest offers a similar product named Drive Image (see http://www.powerquest.com). Both Ghost and Drive Image support Linux to some degree, but Ghost provides better Linux documentation (although the documentation’s quality is still only fair).
Ghosting allows you to copy an image of a hard disk or an individual disk partition directly to another hard disk or partition, or to an image file. Image files can be stored on CD, DVD, or hard disk, and can later be written to one or more target disks. For Linux, the most common operations are storing a disk image to an image file, and restoring an image file to a new disk. Storing the entire disk image ensures that the boot program is also copied to the new systems.
These are the steps involved in ghosting a Linux system:
Set up the master system. Install Linux and additional packages, and perform as many other system configuration tasks as you want to do.
Create a Ghost boot disk suitable for the master system.
Boot the master system with the boot disk. Create a disk image in a network-accessible file system. Alternatively, you can write the disk image to a CD or DVD.
Create a Ghost boot disk for the target system(s).
Boot the target system into Ghost, and load the drive image onto the system’s hard disk.
Figure Two: Ghost boot disk options
Ghost is primarily a Windows product that includes support for Linux ext2 and ext3 file systems. You have to use a Windows system to create Ghost boot disks, but this is the only operation where Windows is required. The boot disks themselves boot into PC-DOS, the old IBM version of DOS.
Ghost allows you to create several varieties of boot disks (see Figure Two). For Linux, the type of boot disk you need depends on where the disk image file is stored. Table Two summarizes the options.
Table Two: The types of Ghost boot disks
Image File Location | Target System’s Boot Disk Type
CD or DVD | CD/DVD Startup Disk with Ghost
Windows computer’s file system | Peer-to-Peer Network Boot Disk or Drive Mapping Boot Disk
Linux computer’s file system | Drive Mapping Boot Disk
The start up disk with CD/DVD support allows the target system to load the image file from Ghost-created media. The network boot disk allows the target system to load the image file from the Windows server where it is stored. The drive mapping boot disk allows the target system to load an image file from an SMB mapped network drive. The system storing the image can be either a Windows system or a Linux system functioning as a primary domain controller via Samba. In this latter case, the image file can be stored on any type of Linux file system. Note that in each case, the target system acts as the Ghost master, and the system containing the image file is the Ghost slave.
Ghosting has several advantages. It’s cheap. The only extra cost is the modest cost of the Ghost software. It’s very flexible since the source of the drive image can be stored on a CD or DVD, as a network-accessible file, or on an existing hard disk. Ghosting also allows you to perform as much system customization as you need to once, and directly on the master system, rather than having to run a post-installation script on every computer. That can save a lot of time.
Ghost can also write an image file to a larger disk provided that the Linux file systems are all ext2 or ext3. Ghost will expand the partitions in the image file proportionally on the target disk. Ghost can be used with other file systems (e.g., ReiserFS), but it can’t expand the underlying partitions — you’ll have to work with identically-sized disks.
Ghosting has some drawbacks, too. For Linux systems, it’s easy, but not always fast. For example, on Windows the Ghost program can save only the actually-used portion of Windows file systems, but it always does a sector-by-sector copy of Linux file systems. Ghost can also be limited by network bandwidth, and it incurs the overhead inherent in remote file access. You must also perform system-specific customization on each system after the copying is completed. Finally, although you can handle differences in network adapters via the boot disks (which are adapter specific), other hardware differences, such as video adapters, may need to be handled manually.
A completely free kind of ghosting is possible between two disks in the same system or between two systems connected by a network. The target disk needs some prior preparation: partitions must be created, and file systems must be built. In addition, a boot loader should also be installed in the MBR, either before ghosting or afterwards (by booting off a floppy). For a network ghost operation, the target system must run an NFS server.
Once you’ve prepared the target system and disk, mount each of the file systems on the master system, and then execute commands like these:
# (cd /mnt; dump 0f - /dev/hdxn | restore xf -) &
where xn designates the appropriate partition on the source system (e.g., /dev/hda1).
If you need to install a lot of systems using this method, use a designated ghosting system where you can install each target hard disk in turn (as network writes are slow).
The final approach to mass installations that we’ll consider is hardware disk duplication. Disk duplication consists of copying a hard drive directly within a device designed for the purpose. It’s like ghosting without the middleman. As with ghosting, you create a source disk containing the desired system configuration, including as much general customization as you’d like. Then, place that disk and a blank disk into the duplicator, and you’ll have a copy of the disk very quickly, usually in well under 5 minutes (depending on the size of the disk).
Figure Three: Logicube’s OmniClone 2U disk duplicator
Figure Three shows one such device, the OmniClone 2U from Logicube (http://www.logicube.com). This device can make a single copy of a master disk, and includes a USB interface for controlling it from a workstation. Larger devices can make more simultaneous copies. Other companies that produce these devices are Corporate System Center (http://www.corpsys.com) and the An Chen Computer Company (http://www.copystar.com.tw).
Disk duplicators offer the fastest solution to cloning systems. Because you must open the case of each system, they are most useful when you are building the computers yourself (or at least in house). They are expensive, however. Two-disk duplicator units start at about $2,000. Larger systems cost more: a device for five disks costs around $3,600.
When used with Windows disks, some disk duplicators can resize file systems, but this is not currently supported for Linux file systems. Thus, they must be used with identical hard disks.
This method also provides no way to adapt to other differing hardware between the master and target systems other than correcting the configuration manually once the disk is installed and the system is booted.
Once you’ve succeeded in installing those one hundred systems, your next challenge is keeping them all up to date and in the state you want them. None of the tools examined in this article address these points.
There are several tools for keeping system software up-to-date, including the Red Hat Network, the Ximian package (http://www.ximian.com), and Cobalt BlueLinQ from Sun. These packages work by having a system connect to an Internet site where it’s examined; any required software updates are automatically downloaded and applied.
Keeping existing systems properly configured is a job for a different sort of tool. The best known of these is the Cfengine package written by Mark Burgess. Cfengine can also be used to automate package addition, updates, and many other similar configuration tasks.
Cfengine will be discussed in an upcoming Guru Guidance column in this magazine.
Æleen Frisch is the author of Essential System Administration. She can be reached at email@example.com.