Concepts in Beowulfery

Computational scientists were early adopters of Linux because of its reliability and efficiency. These folks needed a powerful yet stable computing platform to run their complex scientific simulations. Linux provided a solid development platform and had the reliability and stability they required. Additionally, the open source aspect of Linux was appealing because the open source development process paralleled the scientific method; the code is widely published and reviewed by others prior to acceptance.

Linux caused something of a revolution in computational science. When NASA needed an inexpensive way to solve large computational problems in earth and space sciences in 1994, a group of researchers led by Thomas Sterling decided to hook up a bunch of PCs running Linux to see if they could be made to function like a supercomputer. They justified this experiment through the economics of the problem; commercial supercomputers are enormously expensive to purchase and operate.

Sterling’s experiment, called the Beowulf Project (http://www.beowulf.org), was a success; since then, clusters of PCs, now called Beowulf clusters or just Beowulfs, have sprung up in research laboratories, companies, and academic institutions around the world. Beowulfs are affordable for small research groups or even individuals and can be dedicated to solving single problems. Some researchers even managed to build a Beowulf without any budget, simply by using surplus PCs and some institutional infrastructure (http://stonesoup.esd.ornl.gov). As many supercomputer manufacturers went out of business in the 1990s, the survivors learned from the Beowulf movement that building clusters is an extremely cost-effective solution.

Nowadays, the term Beowulf usually refers to a cluster of commodity off-the-shelf (COTS) computers. Also referred to as COWs (Clusters Of Workstations), NOWs (Networks Of Workstations), or LOBOS (Lots Of Boxes On Shelves), these clusters are sometimes even constructed from mail-order parts. Beowulf clusters may also be purchased as complete systems from a wide array of commercial vendors who deliver, install, and set up the cluster at the customer's site. The combination of mass-produced components, which has driven down the cost of computers, and the power of Linux has made high-performance parallel computing affordable.

Hardware

Individual nodes in a Beowulf cluster usually contain one or more of the fastest available CPUs for speedy processing, lots of memory for handling large data sets, and fast buses and large memory caches for quick memory throughput. However, you must always analyze how you are going to use your cluster. In cases where the computations are relatively simple but are performed on very large data sets, it may be wise to sacrifice some CPU speed in favor of spending more money on a faster bus or larger cache.

Cluster nodes are typically interconnected via one or more local area networks (LANs) running Fast or Gigabit Ethernet. Fast Ethernet has a maximum throughput of 100 megabits per second (Mb/s), while Gigabit Ethernet has a maximum throughput of 1,000 Mb/s, or 1 gigabit per second (Gb/s). In most cases, only one node (often called the master node) is able to connect to systems outside the cluster. This separates the network between the nodes from the network outside the cluster, keeps the inter-node network free of other traffic, and means security may be relaxed somewhat on all but the master node.

Many clusters today also have another inter-node connection method that offers higher transmission speeds and greatly reduced network latencies (the time spent waiting for the network channel to open and begin transmission). Two different proprietary products commonly used in Beowulfs, Myrinet and Scali (SCI), offer transmission speeds in excess of 240 megabytes per second (equivalent to 1.92 Gb/s) with latencies below 9 microseconds. These specialized high-speed, low-latency message passing networks tend to be expensive, often adding 50 percent to the price of each node, but they can be invaluable for improving the performance of communications-intensive parallel applications.

Computational Performance

The performance of these machines has increased over the years so much that the largest Beowulf-style clusters now rival the performance of the largest commercial supercomputers. In fact, two similar Beowulfs placed 30th and 31st on the June 2001 TOP500, a list of the 500 fastest supercomputers on the planet (http://www.top500.org). These clusters were built by IBM and contain 1,024 Intel Pentium III processors running at 1 GHz. They have achieved a peak performance of 594 gigaflops (one gigaflop is one billion floating point operations per second). The National Center for Supercomputing Applications (NCSA) owns one; Shell Oil owns the other.

Another Beowulf cluster is 42nd on the list. Located at and “home-built” by Sandia National Laboratories, the CPlant (Computational Plant) contains 1,000 processors and has peaked at 512.4 gigaflops. Clusters based on both the Intel Pentium and Compaq Alpha processors have become so common that a separate list called the Clusters @TOP500 is being created (http://clusters.top500.org).

System Administration

The biggest hurdles in running a Beowulf cluster are software installation and the day-to-day administration of the machine. In this loosely coupled computing environment, each PC or workstation has its own copy of the operating system and its own disk, memory, and process space resources. This means that instead of managing one Linux box, the system administrator may have 100 or even 1,000 boxes to manage. Adding a user, configuring a software package, or cross-mounting a new NFS partition can be excruciatingly tedious.

In the early days of Beowulfs, sysadmins wrote their own scripts for handling such menial tasks. Cron jobs would often be used to keep nodes in sync with the first/master node.

Now there are a number of commercial and free toolkits to help install and manage clusters. These tools range from collections of useful shell scripts and installation templates all the way to a complete repackaging of the Linux kernel with clustering enhancements built in. Links to many of these toolkits and other Beowulf resources are available at the Beowulf Underground (http://beowulf-underground.org). An administrator should test a number of these tools to find the collection that works best for the site.

Parallel Programming

To take maximum advantage of a Beowulf cluster, programmers must often adopt a different style of programming called parallel programming. Some problems cannot easily be solved on conventional workstations or standalone PCs due to the large quantity of data involved, the amount of time required to perform all the computations, or both.

Parallel computing involves breaking up the problem so that many processors can simultaneously perform some part of it. This technique, called domain decomposition, frequently results in a linear increase in performance (up to a point), particularly when each part is computationally independent of the others. However, parallelizing an algorithm is not a trivial undertaking. The programmer must know more about the hardware than usual and must be able to consider the algorithm in a new context.
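
To make the idea concrete, here is a minimal sketch in C of a simple block decomposition; the problem size, the processor count, and the per-element work are all chosen arbitrarily for illustration. Each processor is assigned a contiguous slice of a global array, and any remainder elements are spread over the first few processors so the pieces differ in size by at most one element.

#include <stdio.h>

#define N      1000000   /* total number of elements in the global domain   */
#define NPROCS 8         /* number of processors sharing the work (assumed) */

/* Compute the half-open index range [lo, hi) owned by processor `rank`. */
static void block_range(int rank, int nprocs, int n, int *lo, int *hi)
{
    int base = n / nprocs;
    int rem  = n % nprocs;
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

int main(void)
{
    for (int rank = 0; rank < NPROCS; rank++) {
        int lo, hi;
        block_range(rank, NPROCS, N, &lo, &hi);
        /* In a real code, processor `rank` would loop over its own
         * [lo, hi) slice and compute a partial result (a sum, a stencil
         * update, etc.) that is later combined with the others. */
        printf("processor %d handles elements %d..%d\n", rank, lo, hi - 1);
    }
    return 0;
}

The same arithmetic applies whether the domain is an array, a grid, or a set of input files; the hard part is deciding where the pieces must exchange data and keeping that communication to a minimum.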

For some tasks, though, this effort isn't needed. Image rendering algorithms perform repetitive computations on lots of data. Since every frame may be treated as independent of the others, the task can be parallelized by assigning each processor its own frames to render using the original serial algorithm. This is often referred to as "process parallel."
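
A minimal sketch of this process-parallel style follows; the worker number and total number of workers are assumed to arrive on the command line, and render_frame() is a hypothetical stand-in for the original serial rendering routine. Each copy of the program simply takes every Nth frame, and no worker ever needs to talk to another.

#include <stdio.h>
#include <stdlib.h>

#define TOTAL_FRAMES 300   /* number of frames in the animation (placeholder) */

/* Hypothetical stand-in for the existing serial rendering routine. */
static void render_frame(int frame)
{
    printf("rendering frame %d\n", frame);
}

int main(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <worker-id> <num-workers>\n", argv[0]);
        return 1;
    }
    int worker   = atoi(argv[1]);   /* 0 .. nworkers-1 */
    int nworkers = atoi(argv[2]);

    /* Frames are independent, so each worker renders every
     * nworkers-th frame with the unmodified serial algorithm. */
    for (int frame = worker; frame < TOTAL_FRAMES; frame += nworkers)
        render_frame(frame);

    return 0;
}

Launching one copy per node (for example, worker IDs 0 through 15 on a 16-node cluster) keeps every node busy without changing the rendering algorithm itself.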

With other types of problems, individual processors must be able to communicate with each other to exchange data and remain synchronized while they are solving the problem. The two APIs (Application Programming Interfaces) most often used are MPI (the Message Passing Interface) and PVM (Parallel Virtual Machine). They both provide a library of routines for distributing (scattering) or collecting (gathering) data across processors and for point-to-point communication.
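
As a small illustration of that pattern, the sketch below uses MPI to scatter equal pieces of an array from the master process to every process, lets each process square its own piece, and gathers the results back. The data and the per-element operation are placeholders; the scatter/compute/gather shape is the point.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define CHUNK 4   /* elements handed to each process (placeholder size) */

int main(int argc, char *argv[])
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int n = CHUNK * nprocs;
    double *data = NULL;
    double local[CHUNK];

    if (rank == 0) {                     /* the master fills the global array */
        data = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++)
            data[i] = (double)i;
    }

    /* Distribute one CHUNK-sized piece to every process (including rank 0). */
    MPI_Scatter(data, CHUNK, MPI_DOUBLE, local, CHUNK, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < CHUNK; i++)      /* each process works on its piece */
        local[i] *= local[i];

    /* Collect the pieces back onto the master in rank order. */
    MPI_Gather(local, CHUNK, MPI_DOUBLE, data, CHUNK, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < n; i++)
            printf("%g ", data[i]);
        printf("\n");
        free(data);
    }

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and started with mpirun across the cluster's nodes, each process runs on its own node and exchanges data over the inter-node network described earlier.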

Since these APIs are available on all commercial supercomputers as well as Beowulfs, any code that is developed on a Beowulf may be moved to a commercial supercomputer without any changes to the source, and vice versa. This is a big win for everybody.

Looking Ahead

In the coming months, this column will focus on other aspects of high-performance computing with Linux. From hardware design and system administration to message passing and applications programming, a wide range of topics related to Extreme Linux will be examined. Meanwhile, poke around the sites listed in the Resources sidebar and have fun!

Resources

The Beowulf Project: http://www.beowulf.org
Surplus-PC Beowulf at Oak Ridge National Laboratory: http://stonesoup.esd.ornl.gov
TOP500 Supercomputer Sites: http://www.top500.org
Clusters @ TOP500: http://clusters.top500.org
The Beowulf Underground: http://beowulf-underground.org

Forrest Hoffman is a computer modeling and simulation researcher at Oak Ridge National Laboratory. He can be reached at forrest@esd.ornl.gov.
