The Clustermatic Linux distribution, produced by the Cluster Research Lab at Los Alamos National Laboratory, is a collection of software packages that provides an infrastructure for small- to large-scale cluster computing. Consisting of LinuxBIOS, BProc, and a job scheduler, Clustermatic 5 (released in November 2004) runs on 32- and 64-bit x86 platforms, as well as the 64-bit PowerPC processor. A number of large Linux clusters are running Clustermatic, including the 2,816 AMD Opteron processor cluster called “Lightning” at Los Alamos.
This month, let’s look at Clustermatic and build a small cluster using the software.
LinuxBIOS (http://www.linuxbios.org) replaces the system BIOS with a little bit of hardware initialization code and a compressed Linux kernel that can be booted directly from a cold start. LinuxBIOS allows the operating system to control a cluster node from power on, and bypasses the proprietary, antiquated, slow, and often buggy BIOS in common use today. As a result, a cluster node can be booted and operational in as little as three seconds!
LinuxBIOS gunzips a small Linux kernel straight out of non-volatile RAM (NVRAM), then loads a full kernel over Ethernet, Myrinet, Quadrics, SCI, or from some other device. Even this “full” kernel can be rather lightweight: nodes can be as simple as just a CPU and memory, with no hard disks, no floppies, and no filesystems, making for fast and efficient compute nodes with little autonomy.
A variety of motherboards are known to work with LinuxBIOS. (A list of these is available on the LinuxBIOS web site.) Additionally, a number of cluster integrators support LinuxBIOS, including Linux Networx, which built the 11.2 teraflop per second “MCR Cluster” at Lawrence Livermore National Laboratory using LinuxBIOS and Red Hat Linux.
Flashing a Disk on Chip requires some skill with hardware and the right motherboard. Alternatively, a flash burner can write the LinuxBIOS image into firmware. Again, instructions for a few hardware combinations are available on the web site. Some of these procedures involve potential risks to hardware and even to personal safety, so take care when working with energized equipment. (And if you’re not comfortable handling sensitive electronic components, don’t attempt to install LinuxBIOS onto PROMs or NVRAM yourself. Instead, hire a qualified electronics technician or find a systems vendor to burn and install the desired ROM image.)
Fortunately, LinuxBIOS isn’t required to get a Clustermatic cluster going. Nodes can be booted over the network or from CD or floppy media after the traditional BIOS has done its thing.
BProc, the Beowulf Distributed Process Space (http://bproc.sourceforge.net), was introduced recently in the February 2005 “Extreme Linux” column (available online in May 2005 at http://www.linux-mag.com/2005-05/extreme_01.htm). It provides a single process space (akin to a single system image, or SSI) across an entire cluster, meaning that all application processes show up in the process table of the master node and can be controlled directly from the master.
BProc consists of a set of kernel patches, kernel modules, master and slave daemons, and utility programs used to start, migrate, and manage application processes across an entire cluster. In addition, a library of BProc system calls is available for controlling process migration and performing a variety of functions on cluster nodes.
February’s column included installation instructions for BProc and introduced the utilities used to start programs on nodes (bpsh), copy files between nodes (bpcp), and check the status of nodes (bpstat). When using the Clustermatic distribution, building BProc separately isn’t necessary, as it’s provided as a set of RPMs along with a modified kernel RPM and patched kernel sources. Also included are beoboot, software for booting and configuring cluster nodes; beonss, a node nameservice; bjs, the BProc job scheduler; and mpich, a free MPI implementation modified to work with BProc.
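To recap how those utilities are used from the master node, a typical session might look something like the following sketch (node numbers, file names, and output are illustrative; consult the bpsh, bpcp, and bpstat man pages for the full option sets):

```shell
# Check the state of the cluster; bpstat lists each node and whether it's up
bpstat

# Run a command on node 0; standard output comes back to the master's terminal
bpsh 0 uname -r

# Run a command on all available nodes at once
bpsh -a date

# Copy a data file from the master to /tmp on node 2
bpcp input.dat 2:/tmp/input.dat
```

Because BProc provides a single process space, a program started with bpsh appears in the master node’s process table and can be signaled or monitored from there like any local process.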
The easiest way to get Clustermatic up and running is to download the ISO CD-ROM image from the Clustermatic website (http://www.clustermatic.org/) and burn it onto media. The Clustermatic 5 disk, the latest release as of this writing, contains all of the kernel and software package RPMs and SRPMs for the x86, x86_64, ppc, and ppc64 platforms. These packages should be installed on a functioning Linux distribution on the machine that’s intended to be the master node.
As an example, let’s install Clustermatic on an x86 system running Fedora Core 2.
First, you need a modified kernel that supports BProc. The i686 kernel package provided on the Clustermatic CD includes BProc and has SMP and 64 GB memory support built in, but it may lack some of the features of kernels from various distributions. For standard x86 systems, especially those running Fedora Core, from which the Clustermatic kernel was derived, the stock Clustermatic kernel should be fine. However, the patched kernel source is provided (in the noarch/ directory) in case a custom kernel is necessary. The PowerPC kernel is built for the Power4/970/G5.
The new kernel can be installed without removing the existing kernel on the system by typing:
[root@master1 i686]# rpm -ivh kernel-2.6.9-cm46.i686.rpm
Unlike normal Fedora kernel RPMs, this RPM does not create initrd images or reconfigure the boot loader. These steps, which are different for different Linux distributions, must be performed manually.
On Fedora, you can build the initrd image as follows:
[root@master1 root]# /sbin/mkinitrd /boot/initrd-2.6.9-cm46 2.6.9-cm46
Next, edit /etc/grub.conf to point to the new kernel and initrd image by appending the following lines:
title Clustermatic (2.6.9-cm46)
kernel /vmlinuz-2.6.9-cm46 ro root=LABEL=/ rhgb quiet
initrd /initrd-2.6.9-cm46
If you want to boot this kernel by default, change the default line in /etc/grub.conf to default=1 (assuming the Clustermatic entry is the second entry in that file). Next, reboot the system to load the new kernel, log in, and type uname -r to verify that the 2.6.9-cm46 kernel is running. (Similar installation instructions for SuSE and Yellowdog Linux are provided in the README on the CD.)
Next, the BProc and associated RPMs should be loaded (from the i586/ directory on x86 systems) as follows:
[root@master1 i586]# rpm -ivh b*.rpm m*.rpm
This installs the beoboot, beonss, bjs, bproc, bproc-devel, bproc-libs, and mpich-p4 packages.
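You can confirm that the packages landed cleanly by querying the RPM database for the names listed above (rpm -q prints the name-version-release of each installed package and complains about any that are missing):

```shell
# Verify the Clustermatic packages are installed
rpm -q beoboot beonss bjs bproc bproc-devel bproc-libs mpich-p4
```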
Now that all of the packages are installed, the system must be configured. Edit /etc/clustermatic/config to establish your configuration. The interface directive must refer to the correct network interface(s) on which the BProc master daemon should listen. The default port can be changed using the bprocport directive. A master directive line should be included for each master node in the cluster, and a range of IP addresses for cluster nodes can be specified using the iprange directive. The slave node boot file should be specified with the bootfile directive. A list of libraries to export to slave nodes can be specified with the librariesfrombinary and libraries directives. Finally, a series of node directives specifying node numbers and MAC addresses should be included so that beoboot will respond to the nodes’ RARP requests. The node list can also be added later using the nodeadd utility.
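As a rough sketch of how these directives fit together, a minimal /etc/clustermatic/config might resemble the fragment below. The interface name, port, addresses, and MAC addresses are illustrative assumptions, and the exact argument forms of each directive may differ; the comments in the shipped configuration file and Listing One are authoritative.

```
# /etc/clustermatic/config (illustrative values only)
interface eth1                         # NIC the BProc master daemon listens on
bprocport 2223                         # override the default BProc port (optional)
iprange 0 10.0.4.10 10.0.4.41          # IP addresses for nodes, starting at node 0
bootfile /var/clustermatic/boot.img    # phase 2 boot image for slave nodes
libraries /lib/libnss_bproc.so.2       # libraries to export to slave nodes
node 0 00:50:45:01:02:03               # node number and MAC, one line per node
node 1 00:50:45:01:02:04
```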
Listing One contains an example configuration file (sans comments).