Beowulf Infrastructure

While building a Beowulf cluster is cheap, estimating the true costs of acquiring an entire cluster can sometimes be a headache. Duke University's Dr. Robert G. Brown describes what you need to know before writing a proposal -- or a check -- for your first Beowulf.


Beowulf-style supercomputers built out of over-the-counter (OTC) hardware are an economical and practical alternative to expensive, proprietary hardware. Beowulf-like clusters maximize floating point cycles per dollar spent, and small “hobby-scale” beowulfs (less than perhaps eight nodes, where, for the purposes of this article, a node refers to a single case that might house one or more CPUs) can be built almost anywhere and can be managed by anyone with a decent knowledge of Linux or Unix. [As an example, the author has a complete hobby-scale cluster in his home.]

But as the number of nodes increases, one must pay careful attention to physical infrastructure. Cluster nodes consume electricity and generate heat, and therefore require adequate cooling. A cluster weighs a certain amount, and has a footprint and a volume. Clusters require network wiring, and one must be able to physically access the front and/or back of each node to perform regular maintenance.

In a similar vein, if you’re installing only a handful of nodes, where each node takes an hour or even several hours to install, the overall time investment is still slight — perhaps only a day or two. At that scale, it’s fairly easy and relatively inexpensive to put a monitor and keyboard on each node or use a cheap keyboard, video monitor, and mouse (KVM) switch to get to each node to do an install or upgrade. Again, spending minutes per node per day may add up to only a day or two of time over a year. That leaves most of the year for cluster production.

But when you plan to install a hundred nodes, a seat-of-the-pants approach simply doesn’t work. A hundred nodes might require ten kilowatts (or more) of electrical power and several “tons” of air conditioning (a term that doesn’t refer to the weight of the air conditioner, but rather its capacity). The cluster could weigh several tons, might need hundreds of square feet of floor space, and could easily cost $10,000 (or more) a year in recurring costs for power and cooling — even after buying and constructing the power lines and air conditioning units.

Cluster management faces a similar crisis in cost. Spending two hours installing each node adds up to 200 hours (or five full work weeks) for initial assembly. Once installed, spending just two minutes per day per node adds up to over three hours a day! Managed poorly, maintaining even a fairly small “professional” cluster can become more than a full-time job.
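
The back-of-the-envelope arithmetic is worth making explicit. Here's a minimal shell sketch (using the purely illustrative figures above, not measurements):

#!/bin/sh
# Back-of-envelope management-time scaling for a hypothetical 100-node cluster.
# Both per-node figures are the illustrative ones from the text, not measurements.
NODES=100
INSTALL_HOURS_PER_NODE=2
DAILY_MINUTES_PER_NODE=2

echo "Initial assembly: $(( NODES * INSTALL_HOURS_PER_NODE )) hours"
echo "Daily care: $(( NODES * DAILY_MINUTES_PER_NODE )) minutes per day"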

As you can see, infrastructure costs can be significant for larger clusters, and poor methodology — methodology that doesn’t scale well with the number of installed nodes — can lead to disaster.

This article describes some things you should know before you run out and buy a few hundred nodes for your metaphorical garage, and suggests at least a few ways to honestly estimate the fixed and recurring requirements and costs for running a relatively large cluster.

Cluster Space

As noted above, cluster nodes have a variety of physical dimensions. They have a footprint (the area of their base), a height (and hence a volume), and a weight. Their shelving or rackmounts may increase their footprint. Access to the front and back of the nodes must typically be preserved to allow nodes to be moved in and out of the cluster, to allow cool air in and warm air out, and to provide access to network and power cabling. Comfortable access space typically requires 2-3X the footprint of the node itself, even in the most efficient and compact cluster room layout.

Nodes are often stacked up vertically: tower units may be placed in heavy duty steel shelving, with rackmount units in two-post or four-post racks. In very rough terms, four shelves of four tower units per shelf (16 nodes) might occupy a strip two feet wide by three feet long by close to eight feet high. Adding front access, a fairly minimal space for such a cluster would be twenty square feet in a room with at least eight foot ceilings. Assuming a pessimistic weight per node (including the weight fraction of the shelving that supports it) of 30 pounds (14 kg.), the loaded shelf could weigh 500 pounds.

Rackmount clusters are often installed in 19″-wide racks that range from 40U to 45U, where one “U” is 1.75 inches (making racks around six feet or two meters tall). Depending on configuration, rackmount nodes can still weigh 22 pounds (10 kg.) per U, and are often roughly 30″ deep. Including access, a fully loaded rack might require a minimum of 13 square feet (with at least eight foot ceilings) and can weigh 1000 pounds or even more if an uninterruptible power supply (or UPS, with its heavy batteries) is included.
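
For planning purposes, the rack arithmetic above is easy to reproduce. The following shell sketch uses a hypothetical 42U rack and the rough per-U weight quoted above:

#!/bin/sh
# Rough height and loaded weight of a single rack, using the figures above
# (1U = 1.75 inches; roughly 22 pounds per U, pessimistically).
UNITS=42     # pick any size in the 40U-45U range
awk -v u="$UNITS" 'BEGIN { printf "Height: %.1f feet\n", u * 1.75 / 12 }'
echo "Loaded weight: roughly $(( UNITS * 22 )) pounds, before adding a UPS"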

Finally there is a “blade” cluster configuration, which we won’t discuss here, except to say that blade clusters permit still higher densities of nodes, albeit at a considerably higher cost per computer cycle. Expense aside, a blade cluster may be right for you if you need many nodes but have a minimum of physical space.

Clearly, space becomes a major factor in large cluster design. One even needs to carefully consider things like floor strength, as one stacks up half a ton per square meter. Few things can ruin your day like a 45U, two-post rack filled with expensive equipment falling over or falling through a floor.

Humidity is another bad thing — electrical circuits don’t like getting wet. The next major things to consider are power and air conditioning.

Electricity: The Stuff of Life

To begin, the home of your new cluster must provide enough power to feed all of the cluster’s machinery, including nodes, networking gear, storage peripherals, and KVMs. You should provide enough circuits and enough receptacles so that it’s easy to plug each component directly into a socket. (Extension cords are a very bad idea.) Typically, electricity is provided in the form of “power poles” next to rack locations, or is provided as receptacles built into walls, ceiling, or floor (floor receptacles are commonly found in raised floor facilities).

Nodes can draw a wide range of electrical power. A “reasonable” estimate is 100-200 W per CPU, but this is very crude. Blade computers or older (slower clock) systems, stripped, might require less. Power requirements increase with load, CPU clock speed, memory, disk, network, and other peripherals (and time, as systems evolve), so carefully consider your proposed node configuration under load. The best possible way to determine a node’s power requirements is to measure it under a variety of load conditions, over time. A “kill-a-watt” plug-through meter is an inexpensive and readily available way to take such a measurement (see “Resources” at the end of this article). Remember that nodes you buy five years from now may require even more power.

Here are a couple of suggestions, based on personal and painful experience, regarding electrical wiring requirements for a compute cluster location.

  • Over-wire. In principle, a 20 amp, 120 VAC circuit can deliver about 1700 W rms (average) power without blowing. Thus, one might naïvely expect to be able to run as many as sixteen 100 W nodes on a single circuit; however, in practice, you might find circuit breakers tripping at ten nodes, as systems draw in excess of their average rate while booting, for example, or a low power factor distorts peak currents (see the “Electrical FAQ” link in “Resources”). Reserving 50% of the capacity of each circuit in your estimates isn’t excessive (see the worked example after this list). The cost of excess capacity, amortized over ten years, is trivial compared to the cost of inadequate capacity (not to mention the subsequent headaches and loss of productivity).

  • Learn about line distortion. Learn about the kind of line distortion that occurs when large numbers of switching power supplies (the kind found in most computers) are on a single line, especially on a long run from the receptacle to the power bus and neutral line ground. Each phase of a multi-phase circuit should have its own neutral line. Sharing of neutrals is a shockingly bad idea in a machine room. There should be a short run from a local power panel to the receptacles, and all wiring should be done by experienced, licensed professionals so that it meets or exceeds the requirements of the National Electrical Code.

  • Read the Harmonics Q&A FAQ. Anyone considering electrical infrastructure (either renovation or new construction) for a cluster should start by reading the “Harmonics Q&A FAQ” at http://www.mirusinternational.com. This FAQ provides a marvelous education about how many switching power supplies on a single line can distort line voltage, generate spurious and injurious system noise, reduce a system’s natural capacity to withstand surges, and more. Do not assume that your building’s existing wiring (even where adequate in terms of nominal capacity) is adequate to run a cluster, unless you wish to be tormented by power-related hardware problems. Also, consider getting a harmonic mitigating transformer for the space.

  • Consider uninterruptible power. Although the marginal benefit of keeping nodes up through short power outages may or may not be significant in your power grid, a good UPS conditions power far better than most surge protectors. A single UPS for your whole facility is likely to be cheaper and more manageable than individual UPS hardware for each node — less than $100 per node in additional expense.
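
Here is the worked example promised in the first bullet above: a minimal shell sketch of circuit budgeting under a 50% reserve. It uses nominal volt-amps and a placeholder per-node wattage, so treat the output as a rough planning figure, not a specification:

#!/bin/sh
# How many nodes can share one 20 A, 120 VAC circuit if you reserve 50%
# of its nominal capacity, as suggested above?  The per-node wattage is
# a placeholder; substitute a figure measured under load.
VOLTS=120
AMPS=20
RESERVE_PERCENT=50
WATTS_PER_NODE=150

BUDGET=$(( VOLTS * AMPS * (100 - RESERVE_PERCENT) / 100 ))
echo "Usable budget: ${BUDGET} W per circuit"
echo "Nodes per circuit: $(( BUDGET / WATTS_PER_NODE ))"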

Of course, where there’s power, there’s heat. Let’s turn to cooling.

It’s Cool to be Cool

All of the power that goes into a room through all of those electrical cords has to be removed from the room too, generally with one or more air conditioning (AC) units. Heat that remains behind raises the room’s temperature, and computers hate tropical climates.

A single loaded shelf (16 cases) can draw anywhere from 1.6 kW to close to 5 kW (in a loaded dual-CPU configuration). A single, loaded 45U rack might draw from 4 kW to well over 10 kW in its meter-square floor space. Summed up, this (plus a margin for switches and other cluster equipment, plus heat produced by human bodies, electrical lights, and the AC units themselves) is the heat that must be removed from the room.

AC is typically purchased or installed in units of “tons,” where a ton of capacity can remove the heat required to melt a ton of ice at 0° Celsius into water at 0° C, every 24 hours. This works out to be about 3500 watts, or three tons of AC per 10 kW of load in the space. Again, it’s better to have surplus capacity than inadequate capacity, because it’s best to keep the room at temperatures below 20° C (around 68° F). Every 10° F above 70° F reduces the expected life of a system by roughly a year and consequently increases the amount of time spent dealing with hardware failures. Any AC/power system installed should also have a “thermal kill switch” (or another automated, thermally-enabled shutdown mechanism) that shuts down all room power if the AC fails but power doesn’t and ambient temperatures exceed (say) 32° C or 90° F.
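
The watts-to-tons conversion is simple enough to script. A minimal sketch, using the rough 3500 W per ton figure from above and a hypothetical 10 kW rack:

#!/bin/sh
# Convert an electrical load into the AC tonnage discussed above.
# One ton of cooling removes roughly 3500 W of heat (about 12,000 BTU/hr).
LOAD_WATTS=10000    # e.g., one heavily loaded 45U rack
awk -v w="$LOAD_WATTS" 'BEGIN { printf "Cooling required: about %.1f tons\n", w / 3500 }'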

Professional care must be taken to distribute cooled air so that it reaches the intake vents of the nodes, and to collect the heated exhaust air and return it to the chiller. The system should be capable of being balanced against the load distribution in the room, increasing airflow where it’s most needed. In operation, the room should have no particularly hot or cold spots, although it will always be warmer “behind” a rack (where the hot air is typically exhausted) than in front. Many possibilities exist for air distribution: up through a raised floor, down from the ceiling (but be careful about dripping condensation), from a single heat exchanger run from a remote chilled water supply, or from multiple units installed locally.

Before leaving our discussion of physical space, remember to light the room brightly. It’s hard to see tiny jumpers in the dim or dark. If at all possible, reserve room in the cluster space for a small work bench with tools, some rolling chairs, a cart with a monitor and keyboard, and perhaps a closet for spare parts. Some simple cluster creature comforts like these minimize the time and effort required to install and physically maintain the nodes.

Choosing the Right Linux

Even if you’ve chosen “Linux” for your cluster, there are still many choices to be made.

There are many general purpose Linux distributions, each with its own advantages and disadvantages. There are also specialized Linux “cluster” distributions, including one from Scyld (a company founded by many of the original NASA Goddard Beowulf group) that is designed for building true beowulf compute clusters. Finally, there are many vendors who are happy to provide you with a ready-to-operate or “turnkey” cluster. Amazingly, a turnkey Linux cluster can retail for as little as the OTC hardware cost, plus a 10-20% “integration charge,” which is quite reasonable.

Turnkey vendors are also happy to help you with your infrastructure design on a consultative basis if you find the prospects and process of building a cluster daunting or have particular problems that this article doesn’t address.


Network Access

The final aspect of physical infrastructure to consider is network access — not the network backbone of the cluster itself, which is likely to be local to the cluster and simply a matter of routing wires to switches within the room — but access to the cluster from other networks.

Some clusters are intended to be operated “locally,” or from a head node or other access point physically co-located with the cluster, with no access to a WAN or outside LAN. This configuration is fine, provided that you’re willing to cope with the environment: a loaded cluster room sounds like a 747 taking off and is typically cold enough to require a jacket or sweater, a hat, and possibly ear muffs or headphones (listening to music helps to damp out the noise and keeps your ears warm).

Most clusters, however, integrate with a building LAN so that users can access the cluster from their offices. Many clusters even permit access from a campus WAN, or across the Internet (secured with ssh). In either case, if you want to grant “remote” access, make sure that the physical cluster space contains fiber or copper connections to the appropriate backbone.

Physical Infrastructure Costs

Space, power, air conditioning, and network access all cost money to provide. There’s an initial expense — the capital investment required to build or renovate a space to make it suitable for your cluster — and recurring expenses for using the space.

The capital cost is highly variable (obviously), but can easily be in the tens to hundreds of thousands of dollars, depending on the capacity desired, availability and cost of power, AC, network connections, and more. This cost must be viewed as being amortized over the lifetime of the space. For example, a $30,000 renovation for a space to hold 100 nodes over ten years adds a cost of $30 per node per year. Adding “rent” and “interest” might push this to $50 per node per year. Viewed that way, the investment is meager, except that the $30,000 must be provided “up front” if no suitable space already exists.
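
The amortization arithmetic, spelled out as a trivial shell sketch (all figures are the hypothetical ones from the example above):

#!/bin/sh
# Amortizing a hypothetical $30,000 renovation over 100 nodes and ten years,
# as in the example above.
COST=30000
NODES=100
YEARS=10
echo "Renovation: \$$(( COST / (NODES * YEARS) )) per node per year"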

Recurring costs can be estimated as follows:

  • Assuming a power grid where electricity retails for $0.08 per kilowatt-hour (kWh), a simple calculation shows that 1 W of power, used 24 hours a day for a year, costs about $0.70.

  • The cost of the AC needed to remove that watt can be estimated at around $0.30 for the year.

So, a reasonable estimate of the recurring, combined cost for power and AC is approximately $1 per watt per year. (This might be high or low by a factor of two or more, depending on actual power costs and things like ambient outside temperatures in your area.)
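
Here's the same estimate as a short sketch, using the assumed $0.08/kWh rate and the rough $0.30 AC figure from above:

#!/bin/sh
# Recurring cost of one watt running around the clock for a year, at the
# $0.08/kWh rate assumed above, plus roughly $0.30 for the AC to remove it.
awk 'BEGIN {
    kwh = 1 * 24 * 365 / 1000          # one watt, all year, in kWh
    power = kwh * 0.08                 # electricity alone
    printf "Power alone: about $%.2f per watt-year\n", power
    printf "Power + AC:  about $%.2f per watt-year\n", power + 0.30
}'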

If a single-CPU cluster node draws 100 watts when working, it costs between $100 and $150 (including the $50 estimate for amortized renovation costs) just to run it for a year. A 100-node cluster can be estimated to have at least $10,000 per year in recurring infrastructure costs just to keep it turned on, and could cost well over twice that for dual CPU nodes in a space that required extensive renovation!

As you can see, a compute cluster large enough to be considered a “supercomputer” may cost far less per CPU to purchase, but it requires much the same physical infrastructure as a comparable “big iron” supercomputer: a suitable space, plenty of power, and sufficient cooling capacity to remove all of that power as it’s converted into heat by the cluster’s operation.

You have to factor all of these costs (both fixed and recurring) into your “total cost of ownership” (TCO) budget for the cluster. That raises the cost of the cluster over a naïve estimate that includes only the cost of the hardware, but the TCO is still quite low.

However, we still need to consider the second aspect of cluster computing: management and operational infrastructure. How difficult (and hence, “expensive”) is it to manage and monitor cluster nodes? The next few sections examine these important aspects of cluster infrastructure.

Cluster Installation and Maintenance

Little specialized skill is required to physically install a compute cluster once the physical infrastructure (space with racks or shelves, power, and cooling) is ready to receive the nodes. Almost anybody can remove tower units or rackmount systems from their boxes and shelve them or rack them, as the case may be. Cabling them up neatly (both network cabling and power cabling) is easily done with a pack of cable ties or specialized rack cable supports. Configuring network switches is often just a matter of patching in the cables.

Easy does not mean thoughtless. Work neatly and select a layout that avoids future maintenance problems. For example, you don’t want to have to shut down and disassemble a rack to get at a single node, or to disentangle a knot of spaghetti wiring to pull out a single bad network cable. Think carefully about ventilation and wire runs. Try to keep network cables far away from sources of interference (such as fluorescent lights). Watch bend angles and avoid kinks in cabling and (especially) fiber. Use patch panels and cable bundles with suitable connectors to interconnect racks and switches. Small investments here in better-than-minimal hardware can pay off in reduced downtime and maintenance later.

The one “tricky” part of building a big cluster is installing a suitable image of Linux on all of the cluster nodes. However, these days, Linux is quite possibly the most scalable operating system in the world in terms of installation. It’s relatively easy to uncrate a node, connect it to the network, power it on, and have Linux installed automatically. [The instructions for doing so are beyond the scope of this article, but you can find a detailed discussion of automating installs in the "System Cloning" feature in the December 2002 issue of Linux Magazine, available online at http://www.linux-mag.com/2002-12/cloning_01.html.]
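
To give a flavor of what “automatic” means here, the fragment below is a hypothetical sketch (not the procedure from the referenced article) of a PXE boot entry that hands a node off to a kickstart file over NFS. The server name and paths are invented placeholders:

# Hypothetical sketch only: one way to wire up hands-off installs with PXE
# and kickstart.  The server name and paths below are made up; see the
# article referenced above for a real walk-through of automated installation.
cat > /tftpboot/pxelinux.cfg/default <<'EOF'
default ks
label ks
  kernel vmlinuz
  append initrd=initrd.img ks=nfs:install-server:/export/kickstart/node-ks.cfg
EOF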

In fact, almost all of the software maintenance of the node can be fully automated. You can even automate package management and updates with yum (if you’re a Linux LAN administrator and have never tried yum, you’ll probably drool when reading the sidebar “yum is Yummy”). For example, running yum update as root updates all packages on your cluster to the current revisions stored in the yum repository. For cluster nodes, services such as DHCP, PXE, NFS, Red Hat’s kickstart, and yum translate to virtually no hands-on software management.

yum is Yummy

yum stands for “Yellowdog Updater, Modified,” and is a tool that can be used either in scripts or manually to “front end” nearly every aspect of software package management on an RPM-based Linux system (and not just Red Hat). yum can perform or automate the following tasks:

  • install and remove packages by name or wildcard/glob;

  • retrieve information about packages by name or wildcard/glob;

  • update installed packages to the latest images (determined by revision/release numbers) available within any number of repositories;

  • list installed and uninstalled (but available) packages, on a given host, by name, wildcard/glob, or everything; and

  • determine which packages provide given features or files.

yum has a trivial command interface. It’s easier to illustrate this with a few examples than to describe it in detail. For example:

rgb@ganesh|T:277>yum provides \*vmstat
Gathering package information from servers
Getting headers from: Phy 7.3 RPMS
Getting headers from: Dulug 7.3
Finding updated packages
Downloading needed headers
Looking in available packages for a providing package
No packages found
Looking in installed packages for a providing package
Installed package: procps provides /usr/bin/vmstat
1 results returned

Note that root privileges weren’t required to run this query. Note also that yum searched multiple repositories (listed in /etc/yum.conf) as well as installed files until it found a package that contained the requested binary. By the way, the package did not have to be installed to get this information.

Now let’s see if procps is installed, and what other ps-related packages might be available in the archive(s) or installed on host ganesh:

rgb@ganesh|T:278>yum list \*ps
Gathering package information from servers
Getting headers from: Phy 7.3 RPMS
Getting headers from: Dulug 7.3
Finding updated packages
Downloading needed headers
Looking in Available Packages:
Name Arch Version
gimp-print-cups i386 4.2.0-9
qtcups i386 2.0-7
w3c-libwww-apps i386 5.3.2-5
Looking in Installed Packages:
Name Arch Version
a2ps i386 4.13b-19
bg5ps i386 1.3.0-7
cups i386 1.1.14-15.2
h2ps i386 2.06-2
ksymoops i386 2.4.4-1
procps i386 2.0.7-12
tetex-dvips i386 1.0.7-47.1

You can see that procps is installed, including the arch and revision/release. yum list \* would list all the packages installed and available.

Let’s see an install work — that requires root privileges:

ganesh: yum install xdaliclock
Gathering package information from servers
Getting headers from: Phy 7.3 RPMS
Getting headers from: Dulug 7.3
Finding updated packages
Downloading needed headers
Resolving dependencies
Dependencies resolved
I will do the following:
[install: xdaliclock.i386]
Is this ok [y/N]: y
Getting xdaliclock-2.18-8.i386.rpm
Calculating available disk space – this could take a bit
xdaliclock 100 % done
Installed: xdaliclock.i386
Transaction(s) Complete

yum went to the repositories, determined all the dependencies of the package, prompted for a chance to quit (a prompt that can be suppressed, along with the verbose process trace, for automated scripting), and then retrieved the RPM and installed it.

If ultimately you decide that xdaliclock is silly, then…

ganesh: yum remove xdaliclock
Gathering package information from servers
Getting headers from: Phy 7.3 RPMS
Getting headers from: Dulug 7.3
Finding updated packages
Downloading needed headers
Resolving dependencies
Dependencies resolved
I will do the following:
[erase: xdaliclock.i386]
Is this ok [y/N]: y
Calculating available disk space – this could take a bit
Erased: xdaliclock.i386
Transaction(s) Complete

…and it’s gone (again, the remove command requires root privileges). Finally…

ganesh: yum update
Gathering package information from servers
Getting headers from: Phy 7.3 RPMS
Getting headers from: Dulug 7.3
Finding updated packages
Downloading needed headers
No actions to take

…shows that ganesh is absolutely current relative to its source archives — all patches and the latest version updates are installed for every package installed on ganesh.

This isn’t surprising: ganesh is kept current by means of a nightly cron script, as are some 700+ other Linux systems on the Duke campus. Seth Vidal (who wrote and maintains yum) has only to drop an “urgent” security or functional update (for example) into the master campus repository before he leaves work in the evening, and by morning every Linux system on campus (that was installed from the primary repository) will have been updated, including all cluster nodes.
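
A nightly job of that kind can be as simple as the following sketch (the actual Duke script isn't reproduced here; this is just the general idea, assuming yum is already configured to point at your repository):

#!/bin/sh
# Minimal sketch of a nightly update job of the kind described above.
# Dropped into /etc/cron.daily/ on each node, it quietly pulls any pending
# updates from the configured repositories.
yum -y update > /dev/null 2>&1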

Monitoring a Cluster

The other element of cluster management is node monitoring. Linux is phenomenally stable, but even it can crash, especially when being hammered by a parallel application with lots of memory in use, the CPU being driven at peak, and the network being used to transmit lots of complex messages. In addition, while the probability that a single workstation experiences a hardware failure on any given day is low, the probability that one node out of a hundred fails in any given day isn’t so low at all! Maintaining high productivity in a cluster depends on identifying node failures (whatever the source) quickly and initiating appropriate action.

An absolutely essential part of monitoring (and securing) a cluster is ssh. Do not use rsh or telnet for node access. Also, shut down all unused ports and daemons, and use something like syslog-ng to ship all the syslog traces from the cluster nodes to a single log host (perhaps the head node or a LAN log server) where they can easily be monitored all at once. Centralized logging can be used to track a wide class of failures, faults, and security violations on a per-node or cluster-wide basis.
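
As an illustration, a node-side syslog-ng configuration for central logging might look something like the fragment below. This is a hedged sketch, not a complete configuration; “loghost” is a placeholder for whatever machine collects your logs:

# Hypothetical node-side syslog-ng fragment: forward all local syslog traffic
# to a central log host ("loghost" is a placeholder name for your log server).
cat >> /etc/syslog-ng/syslog-ng.conf <<'EOF'
source s_local { unix-stream("/dev/log"); internal(); };
destination d_loghost { udp("loghost" port(514)); };
log { source(s_local); destination(d_loghost); };
EOF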

For example, when a disk or memory chip starts to fail, it often creates failures that are picked up and recovered by the kernel and logged by the system logger. The node’s administrator can then at least try to schedule the installation of a suitable replacement at a convenient time.

The other part of monitoring is the active monitoring of a node’s operational status. For example, one might be interested in tracking a given node’s load averages, memory utilization, network utilization, and process pool. Monitoring is important to the cluster’s administrator and the cluster’s users, and no particular privileges or restrictions need apply to monitoring.

On a single workstation, a user can monitor the machine with a variety of tools that typically work with information provided by the kernel in /proc/. ps, vmstat, uptime, and free all provide various views of this data, or one can just do a cat /proc/meminfo and read a snapshot from the raw interface.

On a network of workstations or cluster nodes, this is much more difficult to manage. True, one can run ssh node0 cat /proc/meminfo one node at a time to check node memory utilization, but this is highly inefficient, even if wrapped in a script that loops over nodes. Also, one typically needs a running display of node state, something that gives you an immediate view of the state of all the nodes.
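
For the record, the brute-force loop just described looks something like this (node names are hypothetical); it works for a handful of nodes but gives no continuously updating display:

#!/bin/sh
# The brute-force polling approach described above: query each node's /proc
# over ssh.  Node names are hypothetical; this works, but it doesn't scale
# and provides no running display.
for node in node00 node01 node02 node03; do
    echo "=== $node ==="
    ssh "$node" 'cat /proc/loadavg; head -3 /proc/meminfo'
done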

There are two distinct ways of providing this information to a cluster user or manager. One is to make the cluster a true Beowulf cluster with a single process ID (PID) space and single “head,” upon which specialized, integrated tools transparently provide all this information. This is the approach supported by Scyld and Clustermatic, based on the bproc tool.

The other technique is the node-is-a-specialized-workstation approach described above (where each node might be part of a “grid” computer, a mixed-purpose LAN cluster, or some other arrangement). In this case, one typically needs to install specialized monitoring components — for example, a daemon on the host to be monitored plus tools or daemons to collect the data and make it available in various forms for display or action.

One monitoring tool [of many written by the author] is xmlsysd, an unprivileged daemon that runs on a cluster node and dynamically parses out information culled from /proc and various system calls. Once it’s gathered the information, xmlsysd packs it into an XML-like message and returns it to a connecting agent called wulfstat. wulfstat uses a tty interface to present a simple, scrollable view of an entire cluster’s (or LAN’s) “state” at a glance. Different views exist for load average, memory utilization, network utilization, and a combined display of system properties. It even provides a view of “running processes” on all the nodes.

There are (lots of) other clustering tools and approaches as well. The open source cluster computing community is nothing if not imaginative, talented, and productive. The best known (and probably best) of these is ganglia, and the beowulf underground web site (http://www.beowulf-underground.org, described as “freshmeat for beowulfers”) is a decent place to shop around.

Whether or not monitoring tools are necessary in your environment depends a great deal on the design and purpose of your cluster. For many clusters, the tools do not form a distinct management cost.

All in all, installing a cluster and setting up fully automated management (especially in a pre-existing LAN environment) and monitoring tools is amazingly simple and very inexpensive. Even in the worst case, where a cluster is being built from scratch in an environment with no pre-existing LAN to integrate with, the costs are likely to be modest and controllable as long as a suitable model for the cluster is chosen. Let’s estimate the scaling of these costs.

Management Infrastructure Costs

In any environment that already has a web server, a file service LAN (with user accounts and NFS-exported home directories and workspaces), and boot services (DHCP, PXE, and kickstart), much if not all of the cluster’s software infrastructure can be set up in a matter of a day’s work by a skilled system administrator. The startup cost to build a cluster from scratch in an environment that lacks these resources (especially the skilled administrator) is naturally greater. And in either case, additional time is generally required for tuning the cluster, building local packages, and performing other core administrative tasks.

Once the core services are in place, installing a cluster node takes the time required to physically set it up (install it in a rack or on a shelf and cable it up), plus about five minutes. A single cluster administrator can probably install anywhere from 16 to 64 nodes a day, depending on how much work they have to do to set up each node and how they organize the task.

Once installed, each node “takes care of itself” at the cost of keeping the software repository it draws from current, keeping a general eye on the cluster, dealing with hardware failures and other crashes, and performing routine LAN management tasks on the servers and on behalf of the cluster users. You’ll probably find that the number of nodes that can be supported by a single administrator (post-installation) is determined primarily by the rate of hardware failure, as each hardware failure takes several hours of administrator time to resolve. Roughly speaking, a good administrator should be able to install and care for hundreds of nodes, provided that the nodes are reasonably reliable.

It isn’t as simple to estimate the cost of system management as it was to estimate physical infrastructure costs, because so much of the cost is “opportunity cost,” whose actual value is determined by the scarcity of competent Linux systems managers in your environment, how encumbered they already are, and the value and displaceability of their existing tasks. For example, if you have no Linux management infrastructure and have to hire a Linux manager just for your cluster, then the cluster will cost one full administrator’s salary.

In all other cases, administrative costs can be “guesstimated” to be: on the order of a few days a year for the cluster as a whole; on the order of an hour per node per year doing node-specific software and administrative tasks; and an indeterminate amount of time doing hardware maintenance. (Sorry, but this last quantity depends on hardware reliability and service contracts and luck. Some clusters run and run without anyone ever having to touch the nodes, whereas a poor choice of motherboard or power supply yields problems that suck up time like a vacuum cleaner.)

Installing and running a cluster of 128 nodes might take on the order of one month of skilled systems administrator time per year, much of which can be scheduled and put off to keep the opportunity cost low. However, 512 nodes might drive the same, unaided administrator outside their sane operational boundaries on a day when ten nodes decide to fail all at the same time or when critical tasks start to pile up. Entire institutions have been known to froth at the mouth when a cluster design turns out to be a “maintenance problem” with more than daily failures!

The cost estimate for the administrator does not include the cost of providing support (especially programming support) to the users of the cluster or the cost of providing general LAN-level support (such as running backups, dealing with printing and X configuration issues on user workstations, and helping users figure out how to use their CD player to listen to music while they work). This estimate is for cluster-specific installation and maintenance costs only. A more reasonable estimate that includes some of these nonlinearities might have a single administrator managing anywhere from 2 to 400 nodes, depending on skill, luck, and local environment.

Still, however much your mileage may vary, the number of nodes that any one person can manage — up to 400, say — is huge, by any standard. The vaunted “Total Cost of Getting Work Done” (the true measure of TCO that includes the benefit in a cost-benefit analysis) for a well-designed Linux cluster in a well-planned and competently-managed infrastructure environment is far lower for most parallel applications than any competing “big iron” solution. It’s also a lot easier (and cheaper) to get started if you do not happen to have all that great a local environment.

One can easily build a small “learning” cluster for a few thousand dollars or out of resources and systems already at hand, either at work or at home. High school students can build one for a project, as can graduate students setting up a small cluster overseas.

The Bottom Line: Cheap

The obvious, striking conclusion is that setting up a Linux OTC cluster for doing high performance computing tasks at any scale, from small to quite large indeed, is incredibly inexpensive by any reasonable measure. The physical infrastructure requirements are straightforward and amount to providing the cluster with room and board, with highly scalable fixed and recurring costs. Even the turnkey and commercial Linux solutions, where other people make money selling you the cluster, have very, very reasonable per-node cost scaling.

This, of course, accounts for why we are in the middle of a veritable explosion of Linux-based clusters in universities, government labs, and increasingly, in corporations. Duke University has well over 32 independent clusters in various research groups and continues to get even more. There’s simply no cheaper way to obtain HPC resources.



Dr. Robert Brown earned his Ph.D. in physics from Duke University in 1982. Since then, he’s taught physics, done considerable research, and developed professional knowledge of computer science, Linux, and compute clusters. Dr. Brown writes extensively, including a book on Beowulf cluster engineering, as well as poetry. You can find his work and much more on his home page at http://www.phy.duke.edu/~rgb.
