Kickstarting Cluster Nodes, Part 1

Designing, building, installing, and configuring a Beowulf-style cluster presents a number of challenges, even for very capable system designers and system administrators. Once decisions about topology, layout, hardware, and interconnect technologies have been made (based primarily on the needs of the software and models you intend to run), a considerable amount of work must still be done to forge a working cluster.

Installing Linux on all of the nodes is often the most time-consuming task. Moreover, installs don’t happen just once. Despite the tendency of most computational scientists to forego system upgrades unless absolutely necessary, clusters and networks do change (due to hardware upgrades, outright hardware failures, and the desire to leverage the latest software), requiring OS upgrades or de novo installs.

For most clusters, the operating system on each of the compute nodes is essentially the same. For system administrators lucky enough to have completely identical node hardware, any one of a number of system cloning packages can ease the burden of installation. For example, once a system image is installed and working as desired on a single node, you can replicate its disk image with an inexpensive software package like Norton Ghost (for more information on cloning systems, see last month’s “System Cloning” feature, available online at http://www.linux-mag.com/2002-12/cloning_01.html).

However, time tends to erode the advantages of having identical hardware: as time goes on, newer hardware is added to grow the capabilities of the cluster. In other cases — as with university departments, small research groups, and “computational opportunists” with limited budgets — clusters are often built out of a mix of hardware (including cast-off or otherwise unused PCs), or are constructed piecemeal by adding new nodes over time. What’s needed is an automated method that installs equivalent operating system images on similar, but not identical, hardware.

Enter Kickstart, the automated installation system for Red Hat Linux. Through the use of a single configuration file, Kickstart allows you to automatically answer all the usual questions asked by the anaconda installer used by Red Hat. Often employed to install Linux on any system directly over a network, Kickstart can be used to install Linux on heterogeneous hardware with a variety of disks, network cards, memory, and swap space.

This month, let’s install all of the services required to install Linux over a network using Kickstart. Next month, we’ll cover the details of configuring and using Kickstart. Although this discussion focuses on using Kickstart with Red Hat Linux 7.3, similar automated, networked installers are available in most other Linux distributions.

Prepare for Kickstart

Before you can use Kickstart, you have to create a Kickstart server running a number of network services. In particular, the Kickstart server provides:


  • Network addresses for the nodes using the Dynamic Host Configuration Protocol (DHCP).

  • Preboot eXecution Environment (PXE) services for nodes with PXE network booting capabilities (optional).

  • Trivial File Transfer Protocol (TFTP) and/or Multicast Trivial File Transfer Protocol (MTFTP) services to support PXE network booting (optional).

  • Network File System (NFS) services.

  • Domain Name System (DNS/BIND) and Network Time Protocol (NTP) services.

  • A complete copy of the Linux distribution, along with the all-important Kickstart configuration files, all exported via NFS.

In a Beowulf cluster, a single beefy node usually serves as the master. Configured with more operating system packages as well as more disk and memory, the master node typically provides all of the network services and shared files needed by the compute nodes. In most cases, the master also serves as the Kickstart server. Therefore, all the services listed above must be installed and configured on that machine.

If you already have a master server, make sure to install all of the latest system updates and security patches before you proceed further. Also make sure that the dhcp, pxe, tftp-server, nfs-utils, anaconda-runtime, bind, and ntp RPMs are installed. If you don’t have a master server yet, identify a machine with lots of disk space and RAM, and install Red Hat Linux, including all of the server features mentioned above.
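
To verify that these packages are present (a quick sanity check; the exact package set may differ slightly on other releases), query the RPM database:


# rpm -q dhcp pxe tftp-server nfs-utils anaconda-runtime bind ntp

Any package reported as "not installed" can be added from the distribution CD-ROMs with rpm -ivh.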

Configuring DHCP

DHCP provides network addresses and additional configuration information to hosts on a network. DHCP is most often used to lease IP addresses dynamically, but it can also provide fixed addresses to well-known hosts. In our case, DHCP will provide fixed IP addresses to every node in the cluster so that each node can be configured exactly the same way, without knowing ahead of time which IP address to use. In addition, the DHCP daemon on the master node will tell nodes that can boot over the network where to find their Network Boot Program (NBP). Once a node has booted, DHCP will also tell it where to find the Kickstart configuration files needed to automatically install the Linux operating system.

Listing One shows a typical dhcpd.conf file for a 64-node cluster using the 192.168.0.0 address space. In this example, the master node has an (internal network) IP address of 192.168.0.1, provides DNS and NTP service to the nodes, and serves as a router (employing IP masquerading) for the compute nodes. The configuration information for the compute nodes is contained within a group{} declaration because they all share the dhcp-class-identifier, vendor-encapsulated-options, next-server, and filename attributes. The dhcpd.conf file contains an entry for each node so the server can map each node’s Ethernet hardware (or MAC) address to its corresponding IP address. (A variety of scripts and tools can harvest MAC addresses from nodes, but ultimately the MAC addresses need to be contained in this configuration file.)




Listing One: A sample DHCP configuration file (typically /etc/dhcpd.conf)


subnet 192.168.0.0 netmask 255.255.0.0 {
    option routers 192.168.0.1;
    option subnet-mask 255.255.0.0;
    option broadcast-address 192.168.255.255;

    option domain-name "penetrove.com";
    option domain-name-servers 192.168.0.1;

    option time-offset -18000;  # Eastern Standard Time
    option ntp-servers 192.168.0.1;

    group {
        option dhcp-class-identifier "PXEClient";
        option vendor-encapsulated-options ff;
        next-server 192.168.0.1;
        filename "/u1/ks.cfg";

        host node02 {
            hardware ethernet 00:E0:81:21:73:B8;
            fixed-address 192.168.0.2;
            option host-name "node02";
        }
        host node03 {
            hardware ethernet 00:E0:81:21:C1:22;
            fixed-address 192.168.0.3;
            option host-name "node03";
        }
        .
        .
        .
        host node64 {
            hardware ethernet 00:E0:81:21:86:D3;
            fixed-address 192.168.0.64;
            option host-name "node64";
        }
    }
}

The DHCP dhcp-class-identifier option tells nodes booting via PXE that they should contact a PXE server (usually running on port 4011) to obtain their NBP. next-server points to the PXE server, which also provides the Kickstart configuration file. The filename entry is the name of the Kickstart configuration file.

Under Red Hat 7.3, the DHCP server, once installed, should be enabled and turned on using chkconfig. Since the master node usually has multiple network interfaces, DHCP should be limited to requests originating on the internal cluster interface by setting DHCPDARGS=eth0 (if the first Ethernet interface, eth0, is connected to the internal network) in the /etc/sysconfig/dhcpd file.
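
Assuming eth0 is the internal cluster interface (as in this example), the steps look roughly like this:


# cat /etc/sysconfig/dhcpd
DHCPDARGS=eth0
# chkconfig dhcpd on
# service dhcpd start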

Configuring PXE, TFTP, MTFTP

Despite the warnings still contained in much of the documentation, the PXE server can run on the same system as the DHCP server. Since DHCP has gotten better at providing information for PXE booting, it’s no longer necessary for the PXE server to provide proxyDHCP services. Therefore, only the normal PXE service on port 4011 is needed.

The /etc/pxe.conf file configures the PXE server. In Red Hat 7.3, add the following entry to that file to prevent the PXE server from using the DHCP port:


[UseDHCPPort]
0
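
The PXE daemon itself is controlled by its own init script; assuming the pxe RPM installs a service named pxe (as it does in Red Hat 7.3), enable and start it just as you did dhcpd:


# chkconfig pxe on
# service pxe start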

Traditionally, system images for network booting were sent to client nodes via TFTP, a file transfer protocol that runs over UDP. However, the PXE server package now provides a multicast version of TFTP called MTFTP that does the same job, only faster. By default, the PXE server tells network booting clients to use MTFTP to obtain system images, but to fall back to TFTP if MTFTP is unavailable.

Unfortunately, the PXE RPM provided in Red Hat 7.3 does not come with a configuration file for xinetd — you have to build your own in /etc/xinetd.d/mtftp. A working version of that file (adapted from the corresponding file for TFTP) is shown in Listing Two.




Listing Two: The xinetd configuration for MTFTP (typically /etc/xinetd.d/mtftp)


service mtftp
{
        socket_type  = dgram
        protocol     = udp
        wait         = yes
        user         = root
        server       = /usr/sbin/in.mtftpd
        server_args  = /tftpboot
        disable      = no
        per_source   = 11
        cps          = 100 2
}
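
The corresponding file for TFTP (typically /etc/xinetd.d/tftp, installed by the tftp-server package) looks much the same; the sketch below assumes the stock file, in which the -s flag restricts the server to files under /tftpboot:


service tftp
{
        socket_type  = dgram
        protocol     = udp
        wait         = yes
        user         = root
        server       = /usr/sbin/in.tftpd
        server_args  = -s /tftpboot
        disable      = no
        per_source   = 11
        cps          = 100 2
}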

For both TFTP and MTFTP, disable should be set to no so xinetd can start those services as needed. To be secure, both services should be configured to only serve files contained in /tftpboot. Moreover, access to TFTP and MTFTP should be limited to the cluster nodes or the cluster network through these entries in /etc/hosts.deny:


in.tftpd: ALL
in.mtftpd: ALL

Those two entries block all access to TFTP and MTFTP. To grant access to nodes within the address space of the cluster, add this to /etc/hosts.allow:


ALL: 192.168.

Establishing firewall restrictions on the TFTP and MTFTP ports (using ipchains or iptables; a minimal sketch appears after the /etc/services entries below) further enhances security (for a primer on firewalls, see this month’s Tech Support column on pg. XX). Finally, the following two entries should be added to /etc/services so that the PXE server and mtftp daemon get started on the appropriate ports:


pxe 4011/udp
mtftp 1759/udp
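
As a minimal sketch of those firewall restrictions (assuming the cluster occupies 192.168.0.0/16 and that iptables, rather than ipchains, is in use), the following rules accept TFTP, MTFTP, and PXE traffic only from cluster addresses and drop it from everywhere else:


# iptables -A INPUT -p udp -s 192.168.0.0/16 --dport 69 -j ACCEPT
# iptables -A INPUT -p udp -s 192.168.0.0/16 --dport 1759 -j ACCEPT
# iptables -A INPUT -p udp -s 192.168.0.0/16 --dport 4011 -j ACCEPT
# iptables -A INPUT -p udp --dport 69 -j DROP
# iptables -A INPUT -p udp --dport 1759 -j DROP
# iptables -A INPUT -p udp --dport 4011 -j DROP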

For more information about network booting (especially for diskless nodes) and PXE, see “Network Booting,” in the October 2002 issue (available online at http://www.linux-mag.com/2002-10/netboot_01.html).

Configuring the Network Time Protocol Server

Keeping node clocks synchronized is important for a number of reasons. Typically the master node is set up as a Network Time Protocol (NTP) server, and all of the compute nodes query it to keep their own clocks in sync. The master node, having a second network interface, usually obtains its time synchronization from an external source or from a central time server within the organization. The latest versions of NTP also provide some improved security mechanisms to limit access to the NTP daemon.

Listing Three shows an NTP configuration file (minus the usual comments) for the example cluster configuration. The first line tells the ntpd daemon to ignore all queries by default. The second line makes the loopback interface (127.0.0.1) a trusted host. The third line allows all hosts on the 192.168.0.0 network (our cluster nodes) to access the master’s ntpd daemon to obtain time synchronization, but prohibits the nodes from modifying the master’s time. The fourth line establishes the central time server for the organization as a trusted source, while the fifth line tells ntpd to use this central time server for synchronization. The remaining lines configure the local clock (at stratum 10) as a fallback time source, set the location of the drift file, and enable key-based authentication.




Listing Three: A sample Network Time Protocol configuration (typically /etc/ntp.conf)


restrict default ignore
restrict 127.0.0.1
restrict 192.168.0.0 mask 255.255.0.0 notrust nomodify notrap
restrict time.penetrove.com mask 255.255.255.255 nomodify notrap noquery
server time.penetrove.com
server 127.127.1.0
fudge 127.127.1.0 stratum 10
driftfile /etc/ntp/drift
broadcastdelay 0.008
authenticate yes
keys /etc/ntp/keys

To make sure that the master’s clock gets set properly at boot time, the /etc/ntp/step-tickers file should contain the name of an external or central time server:


time.penetrove.com
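
With /etc/ntp.conf and the step-tickers file in place, enable and start the time service on the master (the service is named ntpd under Red Hat 7.3):


# chkconfig ntpd on
# service ntpd start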

A Word About DNS

If necessary, the master node can provide caching DNS service to the cluster by running named. This is particularly useful for speeding up name lookups, especially if the compute nodes regularly access the external network or the Internet via IP masquerading through the master.

However, because of the large number of security vulnerabilities that frequently show up in named, access to the local named daemon should be limited to the compute nodes using ipchains or iptables. (More information about configuring a secure DNS server is available at a number of websites, and will not be covered here.)

Configuring NFS

The master node usually contains much of the disk space within the cluster, and provides intranetwork access to this disk space via NFS. Users’ home directories are usually mounted onto all compute nodes, as are one or more sizable disk volumes containing programs and/or datasets. NFS services are configured in /etc/exports. The following two entries in /etc/exports make both /home and /u1 available to all compute nodes:


/home 192.168.0.0/255.255.0.0(rw,no_root_squash)
/u1 192.168.0.0/255.255.0.0(rw,no_root_squash)
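
After editing /etc/exports, make sure the NFS services are enabled and re-export the file systems; roughly:


# chkconfig portmap on
# chkconfig nfs on
# service portmap start
# service nfs start
# exportfs -ra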

We’ll store the full Linux distribution to be used with Kickstart on the /u1 partition. The Kickstart configuration file will also be contained on /u1, and compute nodes will use NFS to access all of the Kickstart files during installation.

Building a Working Linux Distribution

Now that all of the infrastructure services are configured and running, place a copy of the Linux distribution on the /u1 disk partition. The easiest way to get the distribution onto the disk is to either copy the Red Hat 7.3 CD-ROMs directly or download the files over the Internet. For this example, the files were copied from the CD-ROMs to /u1/redhat. Thus, the installation RPMs are contained in /u1/redhat/RedHat/RPMS/.
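
Copying from the CD-ROMs can be done one disc at a time; a rough sketch, assuming /mnt/cdrom is defined in /etc/fstab (repeat the copy for each installation CD):


# mkdir -p /u1/redhat
# mount /mnt/cdrom
# cp -a /mnt/cdrom/* /u1/redhat/
# umount /mnt/cdrom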

Inevitably, updated versions of some packages begin appearing within days of the release of a Linux operating system. New kernels are constantly appearing, and new security fixes are added to critical applications. Unfortunately, that implies more work: original RPMs get installed during Kickstart, and must then be replaced immediately with updated versions.

Fortunately, the folks at Red Hat make it fairly easy to update an on-line distribution with new RPMs. Simply remove the outdated packages from RedHat/RPMS/, and replace them with the new versions from the appropriate updates/ directory on a Red Hat mirror FTP site. Next, rebuild the hdlist files contained in RedHat/base/ using genhdlist from the anaconda-runtime package as follows:


# cd /u1/redhat
# /usr/lib/anaconda-runtime/genhdlist `pwd`

While this manual package replacement step takes a few minutes each time a new Errata package comes out, it ensures that the next OS you install is up-to-date. Of course, nodes already up and running will have to be updated manually or by way of a script. (An upcoming Guru Guidance column will demonstrate some clever tools that make ongoing maintenance of software on any group of machines easier.)

Stage Images for Network Booting

Red Hat provides a kernel and an initial ramdisk image for booting with PXE. These files, located in images/pxeboot/, must be copied to the appropriate directory in /tftpboot/. The PXE RPM in Red Hat 7.3 creates a linux.0 file in /tftpboot/X86PC/UNDI/linux-install. This file is the NBP that subsequently loads the kernel and ramdisk image, which must be named linux.1 and linux.2, respectively. These two files should be staged as follows:


# cp /u1/redhat/images/pxeboot/vmlinuz \
/tftpboot/X86PC/UNDI/linux-install/linux.1
# cp /u1/redhat/images/pxeboot/initrd.img \
/tftpboot/X86PC/UNDI/linux-install/linux.2

Up Next: The Kickstart Configuration

Now that all the needed services are configured and the fully-updated Linux distribution is on-line, we’re ready to build a Kickstart configuration file. And we’ll do exactly that next month.



Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@climate.ornl.gov.
