
Kickstarting Compute Nodes, Part 2

Installing Linux on more than a few nodes in a cluster is time-consuming, boring, and potentially fraught with error. Since it's typically desirable to have a nearly identical operating system on each node, the same full installation and configuration must be performed over and over. While disk cloning is a good solution for clusters with identical hardware, cloning hard drives can be problematic on nodes with a variety of different disks, network interfaces, and processors.


To save time and effort and to adapt to variations in hardware, automated installs are often the best solution. Kickstart, the automated installer for Red Hat Linux, is one popular tool that performs “hands-off” installs (other Linux distributions offer similar packages).

Kickstarting Kickstart

Last month’s Extreme Linux column (available online at http://www.linux-mag.com/2003-01/extreme_01.html) introduced Kickstart and described the many infrastructure services required to successfully use Kickstart to install Linux over a network. As we saw last month, an appropriate server — usually the master or front-end node of a cluster — must provide (a minimal DHCP configuration excerpt follows the list):


  • Network addresses for the nodes using DHCP (the Dynamic Host Configuration Protocol)

  • Preboot eXecution Environment (PXE) services for nodes with PXE network booting capabilities (optional)

  • Trivial File Transfer Protocol (TFTP) and/or Multicast Trivial File Transfer Protocol (MTFTP) to support PXE network booting (optional)

  • Network File System (NFS) services

  • A Domain Name Server (DNS/BIND) and a Network Time Protocol (NTP) server, and

  • A complete copy of the Red Hat Linux distribution, along with the all-important Kickstart configuration file, all available via NFS.
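
As a quick refresher, the DHCP pieces that matter most to Kickstart are the boot server and boot file options. The excerpt below is only a sketch of what the relevant portion of /etc/dhcpd.conf might look like for the sample cluster; the authoritative version is the one built in last month’s column, and the option names assume the ISC dhcpd shipped with Red Hat Linux.

# Illustrative excerpt: addresses match the sample cluster
subnet 192.168.0.0 netmask 255.255.255.0 {
    option routers             192.168.0.1;
    option domain-name-servers 192.168.0.1;
    next-server                192.168.0.1;    # boot server for the compute nodes
    filename                   "/u1/ks.cfg";   # where the installer finds the Kickstart file
}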

Last month’s column also described how to configure these services for a typical cluster environment, and showed how to build a fully up-to-date Red Hat distribution that could be installed on all of the nodes.

This month, let’s continue with Kickstart and build a Kickstart configuration file pared down appropriately for thin compute nodes.

Building the Kickstart Configuration File

The Kickstart Configurator, ksconfig, is a graphical tool that constructs a Kickstart configuration file (you can find more information on ksconfig in the “System Cloning” feature in the December issue of Linux Magazine, available online at http://www.linux-mag.com/2002-12/cloning_01.html). ksconfig presents a series of panels that you fill out with appropriate configuration information.

While ksconfig helps build a usable configuration file, you may want or need more sophisticated customization for your cluster. Another option is to bootstrap your Kickstart customization with a configuration file based on a system that’s already up and running. As it just so happens, every Red Hat install results in a usable Kickstart configuration file that captures all of the parameters and options used to install that running operating system. After Red Hat Linux is installed on a system, you can find this file at /root/anaconda-ks.cfg. Kickstart configuration files may be edited with any text editor.
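
For example, one quick way to bootstrap (the destination path here is simply the location used later in this article) is to copy the generated file to the master node and edit the copy there:

# On the freshly installed reference machine: copy its Kickstart file to the master node
scp /root/anaconda-ks.cfg 192.168.0.1:/u1/ks.cfg
# Then, on the master node, tailor the copy for the compute nodes
vi /u1/ks.cfg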

Listings One through Three show portions of a Kickstart configuration file that’s usable with the sample cluster described last month. Here the Kickstart file has been divided into multiple listings just for clarity; the actual configuration file is just one file (you can download the entire sample Kickstart file from http://www.linuxmagazine.com/downloads/2003-02/extreme/anaconda-ks.cfg.txt).

Listing One shows a number of Kickstart directives, or parameters that control the installer itself. For compute nodes, we do not want any kind of X installation, hence the skipx directive. The network directive tells the installation process to use the DHCP server (established in last month’s column) to obtain a network address. The firewall directive tells the installation to firewall all ports except the one used by ssh, but to trust any traffic on eth0, the internal/private cluster network. The nfs directive tells the installation to find its distribution tree from the NFS server on 192.168.0.1 (the master node), under /u1/redhat.




Listing One: Kickstart directives


# Penguin Cluster Kickstart file
install
lang en_US
langsupport --default en_US.iso885915 en_US.iso885915
keyboard us
mouse generic3ps/2 --device psaux --emulthree
skipx
network --device eth0 --bootproto dhcp
rootpw --iscrypted $1$AIAneuom$CH/i03qzAJ8C2swmyYpLs.
firewall --high --port ssh:tcp --trust eth0
authconfig --enableshadow --enablemd5
timezone America/New_York
bootloader
nfs --server 192.168.0.1 --dir /u1/redhat
reboot
clearpart --linux
part /boot --fstype ext3 --size=36 --asprimary
part / --fstype ext3 --size=5200 --asprimary
part swap --size=512 --asprimary
part /scratch --fstype ext3 --size=1 --grow --asprimary

The clearpart directive removes any existing Linux partitions. The part directives create a /boot partition of 36 MB (--size=36), a / (root) partition of 5.2 GB (--size=5200), a swap partition of 512 MB, and a /scratch partition that takes up the rest of the disk (--size=1 --grow). All are primary partitions (denoted by --asprimary), and the non-swap partitions are ext3 filesystems.
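
If some nodes carry a second drive, the same directives can pin partitions to a particular disk. The variant below is only a sketch; it assumes IDE drives that appear as hda and hdb, and uses the Kickstart --ondisk option to force the scratch space onto the second drive.

clearpart --linux
part /boot    --fstype ext3 --size=36   --asprimary --ondisk=hda
part /        --fstype ext3 --size=5200 --asprimary --ondisk=hda
part swap     --size=512 --asprimary --ondisk=hda
part /scratch --fstype ext3 --size=1 --grow --asprimary --ondisk=hdb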

(More detail about all of the Kickstart directives is provided in December’s “System Cloning” article.)

Listing Two shows the %packages section, which describes the package groups (starting with an @ symbol) and individual packages that should be loaded onto the target system. The --resolvedeps flag causes the installer to resolve all dependencies among selected packages, loading additional packages as needed.




Listing Two: The list of package groups and individual packages to install


%packages --resolvedeps
@ Printing Support # Groups of packages defined by
@ Network Support # Red Hat
@ NFS File Server
@ Network Managed Workstation
@ Utilities
@ Legacy Application Support
@ Software Development
glib2-devel # Additional individual packages
Glide3-devel # needed or desired on compute nodes
kernel-smp
compat-egcs-c++
rsync
compat-egcs-g77

Existing package groups may be redefined, and new package groups may be added by editing the RedHat/base/comps database (again, see December’s “System Cloning” feature).

However, many admins prefer to keep their distribution tree pristine, and instead use their own post-install scripts to customize their installs.
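
One way to do that is to keep the customization in a standalone script on the master node and have the Kickstart %post section (described next) do little more than mount the master’s /u1 export and run it. The sketch below assumes the script name and path, and it relies on NFS mounts working from within %post, just as they do later in Listing Three.

%post
# Minimal alternative %post: pull the site customization script from the
# master node over NFS and run it (the script path is an assumption)
mkdir -p /u1
mount -t nfs 192.168.0.1:/u1 /u1
sh /u1/scripts/assimilate.sh
umount /u1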

Customizing Your Install

A script placed in the %post section of the Kickstart configuration file runs after the Red Hat installer finishes. For example, you can write a shell script to enable and reconfigure services, add users, and install additional software packages.

Listing Three shows a %post shell script that customizes the install on each cluster node.




Listing Three: A Kickstart post-install script


%post
#!/bin/bash
#
PATH=/sbin:/usr/sbin:/bin:/usr/bin
export PATH
#
# Penguin Cluster Node Assimilation Script
echo "Commencing Penguin Cluster Node Assimilation"
#
# Setup hosts table
echo "* Building /etc/hosts"
cat > /etc/hosts << EOF
127.0.0.1 `hostname` localhost.localdomain localhost
#
192.168.0.1 node01 master
192.168.0.2 node02
192.168.0.3 node03
.
.
.
192.168.0.64 node64
EOF
#
# Configure ipchains for needed service, block everything else
echo "* Reconfiguring the firewall rules"
cat > /etc/sysconfig/ipchains << EOF
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 -d 0/0 22 -p tcp -y -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
-A input -s 0/0 -d 0/0 -i eth0 -j ACCEPT
-A input -s 0/0 -d 0/0 -p tcp -y -j REJECT
-A input -s 0/0 -d 0/0 -p udp -j REJECT
EOF
/etc/rc.d/init.d/ipchains restart
#
# Configure ntp and set the correct time before going any further
echo "* Configuring and starting time service"
cat > /etc/ntp.conf << EOF
restrict default noquery notrust nomodify
restrict 127.0.0.1
restrict 192.168.0.0 mask 255.255.0.0
server 192.168.0.1
driftfile /etc/ntp.drift
logfile /var/log/ntp.log
EOF
cat > /etc/ntp/step-tickers << EOF
192.168.0.1
EOF
chkconfig ntpd on
/etc/rc.d/init.d/ntpd start
#
# Setup /etc/hosts.equiv
echo "* Building /etc/hosts.equiv"
cat > /etc/hosts.equiv << EOF
node01
node02
node03
.
.
.
node64
EOF
#
# Setup NFS mounts
echo "* Establishing NFS mounts"
mkdir -p /home
mkdir -p /u1
if test -f /etc/fstab.assim-save; then \
echo "** WARNING: Modifying previously-saved fstab file instead of the current one"
else \
cp -p /etc/fstab /etc/fstab.assim-save
fi
cat > /etc/fstab << EOF
`cat /etc/fstab.assim-save`
node01:/home /home nfs soft,bg,intr 0 0
node01:/u1 /u1 nfs soft,bg,intr 0 0
EOF
mount -at nfs
#
# Configure a serial console
echo "* Configuring for serial console"
if grep '^co:' /etc/inittab > /dev/null; then \
echo "** Serial console already configured"
else \
cat >> /etc/inittab << EOF
# Serial console since this machine has no head
co:2345:respawn:/sbin/agetty ttyS0 9600 vt100
EOF
fi
if grep '^ttyS0' /etc/securetty > /dev/null; then \
echo "** ttyS0 is already contained in /etc/securetty";
else \
cat >> /etc/securetty << EOF
ttyS0
EOF
fi
#
# Update /etc/hosts.allow and /etc/hosts.deny
echo "* Setting up /etc/hosts.allow and /etc/hosts.deny"
cat > /etc/hosts.allow << EOF
#
# hosts.allow This file describes the names of the hosts which are
# allowed to use the local INET services, as decided
# by the '/usr/sbin/tcpd' server.
#

ALL: 192.168.0.
EOF
cat >> /etc/hosts.deny << EOF
#
# hosts.deny This file describes the names of the hosts which
# are *not* allowed to use the local INET services,
# as decided by the '/usr/sbin/tcpd' server.
#
# The portmap line is redundant, but it is left to remind you that
# the new secure portmap uses hosts.deny and hosts.allow. In particular
# you should know that NFS uses portmap!

ALL: ALL
EOF
#
# To allow root login via rlogin/rsh, add 'rlogin' and 'rsh' entries
# to /etc/securetty
echo "* Allowing root logins via rlogin and rsh"
if grep '^rlogin' /etc/securetty > /dev/null; then \
echo "** rlogin is already contained in /etc/securetty"
else \
cat >> /etc/securetty << EOF
rlogin
EOF
fi
if grep '^rsh' /etc/securetty > /dev/null; then \
echo "** rsh is already contained in /etc/securetty"
else \
cat >> /etc/securetty << EOF
rsh
EOF
fi
cat > /root/.rhosts << EOF
master root
EOF
#
# Enable rsync, rlogin, and rsh
echo "* Enabling rsync, rlogin, and rsh"
chkconfig rsync on
chkconfig rlogin on
chkconfig rsh on
#
# Update /etc/aliases so someone gets root's mail
echo "* Updating /etc/aliases"
if grep '^root:' /etc/aliases > /dev/null; then \
echo "** root is already contained in /etc/aliases"
else \
cat >> /etc/aliases << EOF
root: localuser@node01
EOF
newaliases
fi
#
# useradd may be used to establish user accounts on the compute node.
# Alternatively, the master node may provide user/group databases using rsync
# after the installation process is complete.
#
# Install my favorite analysis package
rpm -i /u1/packages/analpack-1.0-5.i386.rpm
# Install my favorite commercial compiler
tar xzpf /u1/packages/xyz-f90.tar.gz
#
echo "Penguin Cluster Node Assimilation Complete"
exit

The sample script shown here, the “Penguin Node Assimilation” script, is written in bash. It builds an appropriate hosts table in /etc/hosts, builds a new set of firewall rules in /etc/sysconfig/ipchains and restarts ipchains, builds configuration files for NTP, enables and starts ntpd, and builds /etc/hosts.equiv.

Next, the script makes mount points for the NFS volumes, modifies /etc/fstab with entries for those NFS partitions (keeping a copy of the original in the process), and mounts the filesystems.

Since most computational clusters do not have keyboards and monitors (or even KVM switches) for every node, a serial console is often used, where the serial port of each node is connected to something like a Cyclades terminal server. To support this, the script configures a serial console (by adding a line to /etc/inittab if it’s not already there), and allows root logins on the serial console by adding ttyS0 to /etc/securetty if necessary.
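
If you also want the kernel’s own boot messages on the serial line, one option (an addition not shown in the original configuration, so treat it as an assumption) is to pass console arguments through the bootloader directive in Listing One:

# Optional variant of the bootloader line from Listing One
bootloader --append="console=ttyS0,9600"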

The script then creates new configuration files for tcpd. The /etc/hosts.allow file created by the script allows access to all services from hosts in the 192.168.0. address space. /etc/hosts.deny here denies access to all services for all hosts. Next, the script enables root logins via rlogin and rsh, establishes a .rhosts file for the root user (which allows the root user on the master node to rlogin/rsh into the compute node), and enables rsync, rlogin, and rsh. Some admins prefer not to use the r-utilities because of the obvious lack of security. However, on a private cluster network with appropriate TCP wrappers and firewall rules, the security risk is negligible.

The script then sets up an alias for e-mail destined for the root user, and rebuilds the mail alias database. Finally, additional software packages, contained on the /u1 volume mounted from the master node, are loaded onto the system. The packages can be in the form of RPMs, gzip'd tar files, or any other type of extractable archive. The script then exits, causing the system to reboot (as specified by the reboot directive in Listing One).

Creating a good %post script takes some effort, but it’s likely to save you considerable time. The sample “Assimilation” script shown here is written so that it may be re-run on a working node without detrimental effect. This allows for continuous modification and testing of the script (in isolation from the rest of the configuration file) as incremental improvements are needed. The script can be kept up-to-date with the configuration of the cluster without having to rebuild cluster nodes, and it can be tested on operating nodes before beginning a new install on a new node.
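
For instance, one simple way to exercise the script in isolation (the file names and the use of rsh here are illustrative, not part of the article’s procedure) is to strip off everything up to and including the %post line and run the result on a node that’s already installed:

# On the master node: extract the post-install script from the Kickstart file
sed '1,/^%post/d' /u1/ks.cfg > /u1/scripts/assimilate.sh
# Run it on an operating node; /u1 is already NFS-mounted there
rsh node05 sh /u1/scripts/assimilate.sh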

Now that the Kickstart configuration file is complete, place it where the DHCP server tells the compute nodes to find it. In this case, it should be copied to /u1/ks.cfg.

Are We There Yet?

Whew! That’s a lot of up front work, but if you have lots of nodes, it’s worth it. Now that all the services are configured and the Kickstart configuration file is in place, it’s time to start the network installation process on each of the nodes. Because network bandwidth is limited, you probably don’t want to have more than about a dozen nodes doing a network installation simultaneously. Doing new installations on banks of 10-12 nodes works best.

These days, most PCs ship with Preboot eXecution Environment (PXE) support in their Ethernet interfaces. Often this support is disabled by default, and must be enabled through a series of BIOS or other changes. (For more hints about enabling PXE support, see Tim Kientzle’s “Network Booting” feature in the October 2002 issue of Linux Magazine, available online at http://www.linuxmagazine.com/2002-10/netboot_01.html).

Once enabled, the PXE boot process starts immediately after system tests are complete. The node sends out a DHCP request, and the server responds with an IP address and instructions dictating that the client is a PXEClient. The node then contacts the PXE service and executes instructions from that server. PXE is configured by default to prompt the user to press F8 for a boot menu; if no key is pressed within 10 seconds, it attempts a local boot (from floppy, CD-ROM, hard disk, or whatever). Press F8, and choose “Remote Install Linux.” The Network Boot Program (NBP) loads and executes, causing the installation kernel and ramdisk images to be loaded. A sample PXE boot sequence is shown in Figure One.




Figure One: A sample PXE boot sequence


Intel UNDI, PXE-2.0 (build 082)
Copyright (C) 1997-2000 Intel Corporation

CLIENT MAC ADDR: 00 E0 81 21 C1 22 GUID: 00000000-0000-0000-000000000000
CLIENT IP: 192.168.0.2 MASK: 255.255.255.0 DHCP IP: 192.168.0.1

Local Boot
-> Remote Install Linux

BOOT SERVER IP: 192.168.0.1

Intel Linux NBP, Beta-3 (build 003)

Downloading Linux kernel image…
Downloading initrd image…
Uncompressing Linux… Ok, booting the kernel.

After the kernel is booted, the installation process hits the DHCP server again to get its IP address (again), and to discover the location of the Kickstart configuration file. The Kickstart configuration file then describes where and how to obtain the packages to load onto the system. If all goes well, the node automatically installs Linux according to the Kickstart configuration file, executes the %post script, and reboots. Since the default action for PXE is to do a local boot, it’s not necessary to disable PXE after the installation is complete.

If PXE isn’t supported on the node hardware or if a keyboard isn’t available, a boot floppy can be built to accomplish the same thing. Simply copy the bootnet.img file to a floppy disk, mount the floppy disk, remove the unnecessary message files, replace the syslinux.cfg file with a customized version for automatic Kickstart, and unmount the floppy. Figure Two shows the exact steps involved and includes a syslinux.cfg file that starts a Kickstart installation.




Figure Two: Steps to create a Kickstart boot floppy


$ dd if=/u1/redhat/images/bootnet.img of=/dev/fd0
$ mount /mnt/floppy
$ cd /mnt/floppy
$ rm -f *.msg
$ cat > syslinux.cfg << EOF
default ks
prompt 0
label ks
kernel vmlinuz
append ks initrd=initrd.img lang= devfs=nomount ramdisk_size=8192 serial=0,9600n8 console=ttyS0
EOF
$ cd /
$ umount /mnt/floppy

This modified syslinux.cfg file tells the boot process that the default section to execute is labeled ks and that no prompt should be displayed. The section labeled ks identifies the name of the kernel file on the floppy disk, and specifies additional parameters that should be passed to the kernel at boot time. The ks parameter tells the kernel that it should do a Kickstart install by contacting the DHCP server to find its configuration file. The initrd parameter tells the kernel where to find its initial ramdisk image on the floppy. The serial and console parameters specify that the kernel should use a serial console on ttyS0. The PXE- and floppy disk-based installations should produce identical results.
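
If you'd rather not rely on the DHCP server to hand out the location of the Kickstart file, the ks parameter can name it explicitly using anaconda's ks=nfs:<server>:/<path> form. A variant of the syslinux.cfg above, using the sample cluster's server and path, might read:

default ks
prompt 0
label ks
kernel vmlinuz
append ks=nfs:192.168.0.1:/u1/ks.cfg initrd=initrd.img lang= devfs=nomount ramdisk_size=8192 serial=0,9600n8 console=ttyS0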




The TOP500 List

The 20th TOP500 List was announced during the recent Supercomputing Conference 2002 (SC2002) in Baltimore, Maryland. The TOP500, updated twice annually, catalogs and ranks the world’s most powerful supercomputers based on their performance on the standard Linpack benchmark. Traditionally comprising only the biggest and most expensive custom-built commercial supercomputers, the list now includes a significant number of PC-based, Beowulf-style clusters. And for the first time ever, two such clusters rank in the top ten. At number five is a cluster of 2,304 Pentium 4 Xeon processors running at 2.4 GHz at Lawrence Livermore National Laboratory. And at number eight is a cluster of 1,536 dual Pentium 4 Xeons running at 2.2 GHz at NOAA’s Forecast Systems Laboratory (FSL).

The Livermore cluster, called MCR, was put together by Linux NetworX, and can achieve 5.694 trillion calculations per second (teraFLOPs). The nodes are interconnected by Quadrics Ltd. high bandwidth, ultra low-latency interfaces and switches.

The FSL machine was built by High Performance Technologies, Inc. (HPTi) using Myricom’s Myrinet cluster-interconnect technology. Used for weather research, it ran the Linpack benchmark at 3.337 teraFLOPs.

A total of 55 Intel-based and eight AMD-based PC clusters now occupy slots in the TOP500. In the latest list, the total number of clusters is 93 systems. Fourteen of these clusters are labeled as “self-made,” since they were designed and assembled by the end users themselves. The future looks very bright indeed for commodity cluster computing.

Rounding out the top five are two Hewlett Packard AlphaServer SC systems with 4,096 processors each, called “ASCI Q” (both at Los Alamos National Laboratory), coming in second and third with 7.727 teraFLOPs apiece, and an 8,192-processor IBM SP Power3 called “ASCI White” at Livermore, coming in fourth with 7.226 teraFLOPs. And at the top of the heap is the 5,120-processor NEC SX6 vector machine, commonly called the “Earth Simulator.”

The Earth Simulator was built in Japan primarily for climate research. It uses a hybrid vector/massively parallel processor (MPP) architecture to achieve very high performance on certain kinds of repetitive calculations. On the Linpack benchmark it achieved 35.860 teraFLOPs.

The Earth Simulator has raised the bar for the supercomputing industry, and U.S. companies are likely to respond by developing their own very high performance offerings.

Assimilate and Compute

Once all of the nodes are booted with either PXE or by way of the floppy disk, they’ll all be ready to use as compute nodes. When new or replacement equipment arrives, it can be assimilated quickly and easily into the cluster collective simply by adding its MAC address to the DHCP server configuration and booting the node with PXE or the floppy disk.
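
For a typical new node, that amounts to one more host stanza in /etc/dhcpd.conf on the master, followed by a restart of the DHCP service. The sketch below follows the sample cluster's addressing; the MAC address is, of course, just a placeholder.

host node65 {
    hardware ethernet 00:E0:81:00:00:01;   # placeholder: the new node's MAC address
    fixed-address     192.168.0.65;
}
# then: /etc/rc.d/init.d/dhcpd restart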

While it’s necessary to configure a number of services to get Kickstart working, it’s well worth the effort. All nodes can be loaded or reloaded over the network at any time, and all the services and software on these compute nodes will be configured and ready to run in minutes — saving the system administrator (perhaps you?) countless hours of repetitive tasks.

If you have a new cluster arriving soon or need to upgrade your OS but have been afraid to do so, take a look at Kickstart. Then spend the rest of the week soaking up some rays while your boss thinks you’re slaving away in the chilly computer room clicking “OK” over and over and over…




Resources


The TOP500 List: http://www.top500.org/

Supercomputing Conference 2002: http://www.sc-2002.org/

Linux NetworX: http://www.linuxnetworx.com/

Quadrics Ltd.: http://www.quadrics.com/

High Performance Technologies, Inc.: http://www.hpti.com/

Myricom, Inc.: http://www.myrinet.com/



Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@climate.ornl.gov.
