Linux on high-performance computing clusters seems an obvious choice now, but it wasn't a foregone conclusion when Thomas Sterling and Donald Becker used Linux to build the world's first Beowulf cluster in 1994. Linux has come a long way since then. Learn why Linux has put "super" back into supercomputers.
Over the past few years, high-performance computing clusters (tightly or loosely coupled collections of low-cost, off-the-shelf computers) have largely supplanted proprietary supercomputers. Even with hundreds or thousands of compute nodes, a commodity-hardware HPC cluster costs a fraction as much as the likes of IBM’s Blue Gene or Fujitsu’s RIKEN Super Combined Cluster.
At the same time, Linux features have been greatly expanded and refined. Today, Linux is as capable as any Unix, and because Linux source code is widely and freely available, Linux has also been customized to scale up or scale down to virtually any computer configuration.
Perhaps not surprisingly, the combination of HPC clusters and Linux has become the de facto standard for solving complex computing problems. In academic and research environments, the open, unencumbered, even “home grown” spirit of the two holds great appeal for bleeding-edge yet often budget-constrained scientists.
But what is it about Linux that has made it so successful in HPC? And more importantly, what can its success tell us about how we might approach HPC in the future? Let’s find out.
In the Beginning …
One of the first HPC efforts to use Linux was the Beowulf Project. (For more information, see the sidebar “What is a Beowulf?”) According to project founder Thomas Sterling, Linux wasn’t the most capable operating system in 1993, the year Beowulf was launched, but it was the most appealing:
“To be sure, Linux wasn’t the first Unix-like operating system to run on a PC, and in the beginning, it wasn’t even the best. But unlike anything that had come before it, Linux was the focus and consequence of a revolutionary phenomenon made possible through the Internet: hundreds of coders from around the world, most of which had never met each other, working together, sharing expertise on a new operating system.”
What is a Beowulf?
Beowulf is more a concept or a methodology than a thing. A very good operational definition can be found in one of the first books on the subject, How to Build a Beowulf:
“A Beowulf is a collection of personal computers (PCs) interconnected by widely available networking technology, running any one of several open source Unix-like operating systems.”
So if you want to call your cluster a Beowulf, you need to play by the rules. The key phrases are personal computers, widely available networking, and open source.
* Personal computers could probably be replaced with commodity servers/computers.
* The widely available networking part provides a bit of wiggle room.
One of the major design goals of early Beowulf systems was to build them from components that were as common as possible. Commodity components keep costs low, avoid vendor lock-in, provide an upgrade path, and allow for rapid installation. The problem with using only commodity networking components is that it rules out certain networking technologies that are crucial for performance. With this in mind, Beowulfs can be built from less mainstream technologies, such as Myrinet, QsNet, SCI, or InfiniBand. (The jury is still out on whether InfiniBand is a commodity technology.)
* The last requirement, open source, is probably what differentiates a Beowulf from all other “clustered” systems. An open source operating system allows customization and rapid changes that are virtually impossible with commercial operating systems. HPC clusters running Windows NT or Solaris are not, strictly speaking, Beowulf systems.
The history of the Beowulf project (http://www.beowulf.org) is quite fascinating. If you’re not familiar with the project or its history, see Tom Sterling’s article “Beowulf Breakthroughs” in the June 2003 issue of Linux Magazine (available online at http://www.linux-mag.com/2003-06/breakthroughs_01.html).
As with Sterling, the early users of HPC systems were rooted almost exclusively in Unix, because features such as multiple users, large files, remote access, and so on, weren’t supported in Microsoft Windows, Apple’s MacOS, or other mass-market operating systems. Of course there were many different dialects of Unix, but all were still Unix.
For Unix users, Linux was very attractive because it was literally plug-and-play. Well, to be accurate, it was plug-recompile-and-play. Packages such as PVM and MPI ported very quickly to Linux.
Of course, the cost of Linux was also very attractive… it was free. Cobbling together a few Pentium Pro boxes with MPICH and Linux wasn’t an expensive proposition. Indeed, the combination of cheap processors, cheap peripherals, and Linux provided a very low barrier to entry for cluster computing.
The Open Thing
Of course, the openness of Linux (and Linux distributions) continues to hasten the growth of Linux for HPC clusters. As mentioned, an almost-zero cost of entry makes it easy to test the technology. Once a system is shown to work, the advantages are at times staggering: a ten-fold improvement in price-to-performance over traditional HPC machines isn’t uncommon. People tend to notice those sorts of things.
HPC users also enjoy the ability to build their own kernel, with exactly what they need and no more. (Compute nodes don’t really need sound support, laptop power management, or those very interesting Ham radio extensions.) The compute node can be maximized (or minimized, depending on how you look at it) in any number of ways for the problem at hand.
Indeed, users have the option of using simple, small monolithic kernels (where the needed drivers are compiled in rather than loaded as modules) tailored to their specific needs. A compact kernel leaves more memory for crunching numbers. In fact, kernels can be small enough that compute nodes can run without hard drives at all. With clusters, “less” is often better.
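As a rough illustration of trimming a compute-node kernel, the sketch below scans the lines of a kernel `.config` file and flags options a compute node probably doesn’t need, echoing the sound, laptop power management, and Ham radio examples above. It assumes the usual `.config` format (`CONFIG_FOO=y`, `CONFIG_FOO=m`, or `# CONFIG_FOO is not set`). The function name `audit_config` and the particular list of “unneeded” symbols are hypothetical choices for this example; which options are safe to drop depends on your hardware and kernel version. It also flags `CONFIG_MODULES=y`, since a fully monolithic kernel has loadable-module support switched off.

```python
import re

# Hypothetical list of options a compute node rarely needs (adjust to taste).
UNNEEDED = {
    "CONFIG_SOUND": "sound support",
    "CONFIG_HAMRADIO": "amateur (ham) radio support",
    "CONFIG_APM": "laptop power management (APM)",
}

# Matches enabled options such as "CONFIG_SOUND=y" (built in) or "=m" (module).
# Disabled options appear as "# CONFIG_FOO is not set" and do not match.
ENABLED = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=([ym])$")


def audit_config(lines):
    """Return warnings for a kernel .config supplied as an iterable of lines."""
    enabled = {}
    for line in lines:
        match = ENABLED.match(line.strip())
        if match:
            enabled[match.group(1)] = match.group(2)

    warnings = []
    for symbol, description in UNNEEDED.items():
        if symbol in enabled:
            warnings.append(f"{symbol}={enabled[symbol]}: {description} is enabled")

    # A fully monolithic kernel is built with loadable-module support turned off.
    if enabled.get("CONFIG_MODULES") == "y":
        warnings.append("CONFIG_MODULES=y: kernel is not monolithic")
    return warnings
```

You would call it on your own configuration, for example `audit_config(open(".config"))`, and review each warning before rebuilding; the script only reports, it never edits the file.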
Another advantage of Linux is the ease with which hardware drivers can access kernel internals.
This capability is particularly important with clusters. For example, a common method to optimize communications is called “kernel bypass.” In this method, communication takes place outside the kernel networking stack, and memory is copied (messages are passed) directly between two independent process spaces on different nodes. Implementing this, of course, may require modifying the kernel.
In general, “cluster plumbing” is entirely open on Linux, allowing for customizations, optimizations, and fixes that matter little to the mainstream user. (For an example, see the sidebar “Josip’s Fix.”)
Josip’s Fix
In March 1999, Josip Loncaric found something peculiar in the Linux TCP stack: the kernel seemed to be introducing delays for small packets. While this behavior had minimal impact on pretty much every other corner of the market, the cluster community saw very poor performance for small packets.
The availability of source code allowed Josip to address the issue for two different kernels. His first patch resolved the issue for the 2.0.36 kernel, and a second patch for the 2.2.12/13 kernel allowed the user to tune the short packet behavior.
Cluster users who relied on the kernel TCP implementation were able to patch and rebuild kernels that worked better for small packets. There were no marketing decisions to be made, no release schedule, no non-disclosure agreement to sign, and no drawn-out decision process. Josip just fixed it.
You can read a report about the fix at http://www.icase.edu/coral/LinuxTCP2.html.
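Josip’s patches changed the kernel’s TCP code itself. As a related illustration only (this is not his fix), one well-known source of small-message delay on any TCP stack is Nagle’s algorithm, which holds back small segments while earlier data is still unacknowledged. Applications can switch it off per socket with the standard `TCP_NODELAY` option. The helper names `low_latency_socket` and `nodelay_enabled` in this sketch are hypothetical.

```python
import socket


def low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes immediately instead of coalescing them while
    # earlier data on the connection is still unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock):
    """Report whether TCP_NODELAY is currently set on the socket."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

Disabling Nagle trades a few more small packets on the wire for lower latency, which is usually the right trade for the short messages that many parallel codes exchange. It cannot, however, work around a delay introduced elsewhere in the kernel, which is exactly the kind of problem that required a source-level fix.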
Beyond kernel customizations, entire Linux distributions are now customized for use in HPC. For example, BioBrew is a full Linux distribution tailored to bioinformatics users.
The classic business strategy of “lock-in,” selling a customer something that requires them to continue buying products and services, has fueled the growth of many companies. It’s also been the best source of boat anchors (kersplash!) in the HPC market.
Let’s consider a common scenario. Your organization buys a new supercomputer called the Whopper Z1 from Fly By Night Systems (FBN). The Whopper Z1 runs WOS 1.0, a version of Unix ported to the Z1. The computer works well for the first year, and everyone is happy.
Then, in the second year, you want to add more memory. Well, to keep the service contract intact, you need to buy the memory from FBN Systems. Funny, it looks identical to the memory you bought for your home computer, but it costs ten times more. So, you upgrade the memory, and while you are at it, you upgrade to the next version of WOS (version 2.0). Everything is fine, until year three.
It turns out that the Whopper Z1 is now going “off contract” because its replacement, the Whopper Z2, has been installed. The Whopper Z2 also has a new version of WOS (version 3.0), which doesn’t run on the old Z1. Now the old Whopper Z1 is pretty much useless, and it will be kept online for another year solely to allow everyone to move their codes over to the new machine.
By year four, you can’t really sell the Z1 or even use it because hardware and software support is expensive. (Ah, but if you tied a rope to it, it could indeed be used as a boat anchor.)
Now consider the scenario where Linux had been used as the operating system. Since the source code is available, you can choose to keep the old Whopper Z1 running without a support contract. You can find people who can help you fix things. You may even have some “Linux hackers” on staff because they have been running Linux at home for five years. As it turns out, this is a good thing, because FBN Systems goes out of business. Had you stayed with WOS, you’d be stuck with two large pieces of hardware and a binary version of WOS.
In the end, “vendor lock-in” is always bad for the customer. No one likes to hear, “You can’t do that.” The words “can’t” and “Linux” aren’t often heard in the same sentence.
Ownership and Community
In the absence of anyone saying “No, you can’t,” there are many people saying, “What if?” In a sense, Linux has become the paint with which a technologist can express whatever he or she wants to. Expressing your ideas is tantamount to ownership.
And since many hands have helped create the “practice and art of cluster computing,” you can become a co-owner simply by helping a newcomer with a question on the Beowulf mailing list. The “community knowledge base” is immense and growing every day. If you run into a problem or have a question, rest assured that someone who has suffered a similar fate is almost always just an email away. And, in an open environment, the quality of the answers is high.
Not All is Rosy
In fairness, not everything is rosy in HPC Linux. There are issues unique to this environment that have yet to be addressed.
Perhaps the most important issue is how independent software vendors (ISVs) can target a fast-moving and diverse software environment. It’s not an easy problem, yet it’s a problem that needs to be solved.
In addition, an ISV needs a hard line between its product and the cluster infrastructure. For instance, a misconfigured MPICH library should not be the ISV’s problem (although it often is).
Finding good professional support is also an issue. Clusters are diverse, making support from a single source difficult. New support models that leverage the openness of the cluster infrastructure may be the best way to proceed.
Beyond ISVs and support, another lurking challenge may be entire kernel forks for HPC systems. A highly optimized HPC kernel may diverge so much from the mainline kernel that a separate version is warranted.
It’s Not Really a Linux Thing
Clusters have been and will continue to be built with closed source systems. But open source seems to be a much better way to address a small market with specific needs.
In a way, the success of Linux in the HPC world is as much about Linux’s openness as it is about its Unix heritage. Fundamentally, we all like options… the more the better. Linux on clusters, like Linux on most other things, maximizes choice.
In the HPC world, the decision to use Linux may seem like the natural choice, but remember, “… you didn’t come here to make the choice, you’ve already made it. You’re here to try to understand why you made it. I thought you’d have figured that out by now.”
Douglas Eadline is the Editor-in-Chief of ClusterWorld Magazine (http://www.clusterworld.com), a magazine designed for both the novice and experienced cluster user. ClusterWorld includes a variety of “how to” information that’s similar to what you find in Linux Magazine. Starting in January, ClusterWorld will be running a five-part series on how to build an eight-processor/node cluster for under $2500. You can subscribe at http://www.clusterworld.com. Doug can be reached at email@example.com.