Configuring a Beowulf Cluster

Without serious Linux savvy, installation and administration of a Beowulf cluster can be cumbersome and time consuming, particularly when the cluster consists of more than a handful of nodes. The lack of a single system image (one operating system controlling all nodes simultaneously) makes day-to-day administration challenging. The result is that, without software tools, you must maintain each node individually.

Both free and commercial clustering toolkits are now becoming available to alleviate some of these problems. Configuring a Beowulf cluster without a toolkit requires a bit more ingenuity, but this approach gives the user or administrator more options in the design of the system.

Fortunately, all nodes in a cluster except the first one usually are configured identically. While they may consist of different hardware components, they typically run the same kernel and application software and have the same file system layout. This means that you can clone all the compute nodes in a cluster from a single pre-configured system.








Figure One: A front-end node is attached to a public network, and the other nodes are connected to each other and to it.

Public vs. Private

In a typical Beowulf configuration, one node — often having more memory and large disk volumes — serves as a front end or master to the rest of the cluster. This front-end node is connected to the public network and is usually accessible on the Internet, while the remaining slave nodes may be interconnected via a private network that includes a connection to the front-end node. Figure One shows this kind of configuration.

This setup provides better security for the cluster because only the front-end machine needs to be maintained in the most secure fashion. The slave nodes are isolated from the Internet, and the inter-node network traffic is transmitted only on the private local area network (LAN).

As a consequence, the slave nodes are easier to manage because some security rules can be circumvented. For instance, the root user can be allowed to use rsh to execute privileged commands on all nodes.
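
One way to enable this (a sketch, assuming the rsh server is enabled on the slave nodes) is to list the front-end node in each slave's /root/.rhosts file, since the rsh daemon does not consult /etc/hosts.equiv for the root account:

node01 root

With this entry in place, root on the front-end node can run a command such as rsh node02 /sbin/shutdown -h now without being prompted for a password.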








Figure Two: A server farm allows each node to connect to a public network while also being connected to each other.

Alternatively, in an environment where many serial computational tasks are run, a server farm configuration may be a bit more practical. In this configuration, shown in Figure Two, each node may again be connected to a private network, but each is also connected to a public or routed network so that users may log into and use individual nodes without affecting the others. One super node may still be used for serving up large disk volumes to the entire cluster. In this configuration, security becomes a concern for all nodes, and system administration can be more difficult.








Figure Three: This front-end configuration includes a high-speed proprietary network for passing messages.

Increasingly, computational scientists need higher-bandwidth networks for inter-node communication. The more popular clusters that are used today often have a specialized, proprietary network (as shown in Figure Three), which offers very high speed transmission and very low latency. In fact, there are a number of specialized interconnects available on the market today that are specifically designed to improve message passing in cluster computing environments.

The network configuration and topology in a Beowulf cluster are dictated by the needs of the applications that will be run on the system. Individual PCs or nodes can be thought of as computational elements that may be arranged and interconnected in various ways using various networking elements to meet the needs of the application. This is the strength and beauty of cluster computing.

Choosing IP Addresses

While the IP address used for the Internet connection is usually specified by an Internet Service Provider or the site network manager, any block of addresses may be used for the private LAN. However, using addresses that are accessible on the Internet can be troublesome.

The designers of the Internet anticipated this potential need and wisely set aside blocks of addresses that are never routed over the Internet. One Class A network (10.0.0.0), sixteen Class B networks (172.16.0.0 through 172.31.0.0), and 256 Class C networks (192.168.0.0 through 192.168.255.0) are reserved for use only on private, non-routed LANs.

In the discussion that follows, it will be assumed that the 192.168.0.0 network addresses are used for the nodes on the private LAN and that the configuration is that shown in Figure One. The front-end node will use an assigned address for its public Internet connection and 192.168.0.1 for its second Ethernet interface, which is connected to the rest of the cluster. The second node will be 192.168.0.2, the third 192.168.0.3, and so on.
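
As a concrete sketch, assuming a Red Hat-style distribution that keeps interface settings in /etc/sysconfig/network-scripts, the front-end node's private interface might be configured as follows (file names and variables differ on other distributions):

# /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=static
IPADDR=192.168.0.1
NETMASK=255.255.0.0
ONBOOT=yes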

Configuring the Front-End Node

The front-end node usually shares its vast disk resources with the entire cluster using the Network File System (NFS). While each node may have its own scratch disk space, often called /scratch and used by running applications, the front-end node normally houses user files and large data files. As a result, only the disks that are on the front-end node need to be backed up. These disk resources must be listed in the /etc/exports file on the front-end node so that they are available to the rest of the cluster. For example, the lines in Listing One should be contained in /etc/exports.




Listing One: Disk Resources Listed in /etc/exports


/home 192.168.0.0/255.255.0.0(rw,no_root_squash)
/huge_data_disk 192.168.0.0/255.255.0.0(rw,no_root_squash)

These entries tell the NFS server (and mountd) that the /home and /huge_data_disk partitions should be made available for read/write access to all nodes in the 192.168.0.0 address space. The no_root_squash option tells the NFS server to honor root privileges for requests coming from these nodes.
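
After editing /etc/exports, the exports can be activated without a reboot by re-exporting them as root on the front-end node; the exportfs utility ships with most Linux NFS server packages:

exportfs -ra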

For security reasons, it is wise to disable access to most services on the front-end node from all network addresses except those within the cluster. This is easily accomplished by editing a couple of files, since most Linux systems use TCP wrappers or tcpd, which checks the /etc/hosts.deny and /etc/hosts.allow files before initiating most services. First, you should make sure that all services are disabled for all hosts by creating the following entry in /etc/hosts.deny:


ALL: ALL

Second, access to all of the services should then be granted to the nodes in the cluster and to localhost, the loop-back interface.

Finally, you should grant Secure Shell access to all systems on the Internet so that users may log into the system to use it. The following lines in /etc/hosts.allow accomplish this:


ALL: 192.168. localhost
sshd: ALL

Assuming that the firewall or router for the site network blocks external traffic originating from the 192.168.0.0 addresses, and assuming that the other employees who have access to the network are considered trustworthy, these measures should adequately protect the front-end node from any malicious intrusion.

Instead of running Domain Name Service on the front-end node and trying to secure it from possible attack, many cluster administrators prefer to maintain hostnames and addresses in /etc/hosts. The following entries would reside in /etc/hosts for the example cluster:


127.0.0.1    localhost
# Beowulf cluster nodes
192.168.0.1  node01 master
192.168.0.2  node02
192.168.0.3  node03
192.168.0.4  node04

Because it will be necessary to use rsh to initiate parallel jobs and to perform several other system operations, all of the nodes in the cluster should be listed in the /etc/hosts.equiv file as shown in the following:


node01
node02
node03
node04

Having a Good Time

Keeping system times on the nodes synchronized is important in a clustered environment. This is accomplished by configuring and using the Network Time Protocol (NTP) daemon. The front-end node should be configured to use the timeserver that services the site network (time.mycompany.com in this example) by adding a server line to /etc/ntp.conf as follows:


server time.mycompany.com

This server should also be listed in /etc/ntp/step-tickers as follows:


time.mycompany.com

All of the servers that are listed in /etc/ntp/step-tickers are checked at boot time so that the system clock can be set immediately before starting the NTP daemon.
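
If these files are changed on a system that is already running, the NTP daemon should be restarted so that it picks up the new configuration. On a Red Hat-style system (an assumption; init script names vary by distribution) this is:

service ntpd restart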

Configuring Slave/Compute Nodes

Slave nodes are configured in much the same way as the front-end node. The changes to /etc/hosts and /etc/hosts.equiv as described in the previous section should also be made on each slave node.

However, since these nodes have no direct Internet connection, they must rely on the front-end node for quite a variety of services. For instance, all of the slave nodes will synchronize their system time with that of the front-end node.

Therefore, the server entry in /etc/ntp.conf for all other nodes should be as follows:


server node01

In addition, the entry in /etc/ntp/step-tickers is:


node01
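
The slave nodes must also mount the file systems exported by the front-end node. A minimal sketch of the corresponding entries in each slave node's /etc/fstab, assuming the exports shown in Listing One, is:

node01:/home            /home            nfs  defaults  0 0
node01:/huge_data_disk  /huge_data_disk  nfs  defaults  0 0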

Once you have configured a single slave node, it may be cloned onto all other slaves so that you don’t have to repeat the entire process for each node. There are a number of cloning software packages available for free download over the Internet.

Keeping Things in Sync

After the cluster is up and running, you will immediately need to change things. As users and groups are added or as new software is installed, you will need to update certain files on every node in the cluster.

This task is often handled by the Network Information Service (NIS); however, on very large clusters the overhead associated with NIS may slow job initiation or cause other problems. Moreover, NIS cannot be used to synchronize all of the configuration information in a cluster.

One method of synchronizing nodes is to have a script redistribute certain files on a regular basis. For instance, the small shell script in Listing Two, called ssync, can be run to update all the files listed in /etc/sfiles on all the nodes listed in /etc/shosts.




Listing Two: Slave Synchronizer (/sbin/ssync)


#!/bin/csh
#
# Install in /sbin/ssync
# This script automatically synchronizes slave nodes in a Beowulf
# cluster
#
foreach node (`cat /etc/shosts`)
  echo "Syncing ${node}"
  foreach file (`cat /etc/sfiles`)
    echo "  Copying ${file}"
    rsync ${file} ${node}:${file}
  end
end

The /etc/sfiles file is composed of the following:


/etc/passwd
/etc/shadow
/etc/group
/etc/hosts
/etc/hosts.equiv

The /etc/shosts file contains the list of slave nodes as follows:


node02
node03
node04

Create a cron entry (using crontab -e) for the root user on the front-end node that will run the ssync script regularly. In this example, it is run at five minutes past the hour:

# At five minutes past the hour, push necessary files out to
# cluster nodes
5 * * * * /sbin/ssync

You may add files to the list in /etc/sfiles so that other system and software configuration files are kept up to date across the entire cluster. Once the script is running, configuration changes need only be performed on the front-end node. Then, at some later time, the changes will propagate to the other nodes.

Very often system administrators, and even users, will need to run a command on every node in the cluster to check for processes, kill jobs, check system load, shut down the system, and so on. A small script for “broadcast rsh,” like the one called brsh in Listing Three, can be used for these tasks.




Listing Three: Broadcast rsh (/usr/bin/brsh)


#!/bin/csh
#
# Install as /usr/bin/brsh or /usr/local/bin/brsh
#
foreach host (`cat /etc/bhosts`)
  echo "***** ${host} *****"
  rsh ${host} $*
end

You may place this script in /usr/bin or in /usr/local/bin on all nodes so that anyone may use it. A file containing a list of all the active nodes, called /etc/bhosts, must be maintained on all nodes. You could add this filename to /etc/sfiles so that it will be propagated by the ssync script.
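
For the example cluster, /etc/bhosts simply lists every active node, including the front end:

node01
node02
node03
node04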

The brsh script may be used to execute any command on all nodes. For example, to check the uptime and system load on all nodes, simply run something like Listing Four.




Listing Four: Using the brsh Script


$ brsh uptime

***** node01 *****
12:06am up 41 days, 12:11, 2 users, load average: 0.00, 0.00, 0.00
***** node02 *****
12:06am up 11 days, 14:22, 0 users, load average: 1.00, 0.97, 0.91
***** node03 *****
12:06am up 5 days, 7:27, 0 users, load average: 0.99, 0.97, 0.91
***** node04 *****
12:06am up 5 days, 7:21, 0 users, load average: 0.00, 0.00, 0.00

Ready to Roll

Once the necessary compilers and message-passing libraries have been installed, the Beowulf cluster should be ready to start building and running parallel jobs.
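
For example, with an MPI implementation such as MPICH installed (a hypothetical illustration; the compiler wrapper and launch options depend on the library you choose), building and launching a four-process job might look like this:

$ mpicc -o myprog myprog.c
$ mpirun -np 4 -machinefile /etc/bhosts myprog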

Installation of Linux in a clustered environment requires only a few additional configuration steps, and you will be able to accomplish most administrative tasks with simple shell scripts like the ones that we’ve discussed in this column.



Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@esd.ornl.gov.
