Some relief for cluster consternation.
My daughter started school the other day. She came home and said
the teacher recommended that students get a graphing calculator.
Mind you, it wasn’t the hundred bucks for the calculator that
prompted me to grab a pencil and paper and say, “Back in my
day, this was our graphing calculator. No batteries needed. And you can
even keep the stylus behind your ear.” Instead, it was the idea
that the mastery of pencil and paper was
becoming a lost art. After all, if you’re on a desert island
and need to plot a parabola, where are you going to find a graphing
calculator?
At that point, my wife joined the conversation and mentioned
that my daughter’s teacher probably used a graphing
calculator in high school and college as well. Sigh. Nothing like
feeling old. When I was in my college years, there were classes
devoted to learning how to plot equations on these large,
toggle-switch-laden things called
minicomputers. The art has improved a bit since then, and
graphing calculators are but one example. In my daughter’s school,
graphing calculators are used to teach subjects;
they’re not a subject per se. If I put my analogy hat on for
a moment, I think back to the day when, for all of the associated
consternation, minicomputers let me do better math and science.
Graphing calculators can now handle many of those chores and
presumably will help my daughter do better math and science.
Thinking about clusters, with all of their modern-day consternation
(see the sidebar, “Big Word Alert”), I have to ask
what’s stopping clusters from being a useful tool? Instead of
plotting equations, however, we can calculate and plot entire
solution spaces right in the classroom.
Education is Obvious
One obvious solution is enhanced education. Ah, well,
there’s the catch. Check your favorite search engine for
“Beowulf cluster courses” or “Linux cluster
courses” and some of the top responses are from people
asking for such a course. There were some
hits, however, and I’ll mention those later.
So why no cluster courses? I propose three reasons.
1. First, up until about three
years ago, clusters were a fast-moving target. They’re still
moving about, but not quite as fast. There are now some
“reproducible” methods that include pre-built
distributions and toolkits. Beyond node provisioning, things are a
bit more settled, but compilation and invocation issues can still
vary widely. For instance, multiple MPI
versions can often cause confusion among users unless some type of
environment management is used. Users
usually have to contend with making sure they know where the
various MPI libraries/binaries live on the cluster. (Nothing like
trying to start an LAM/MPI job with an
MPICH version of
2. There’s also the issue
of finding people to teach cluster concepts and practices. Most of
the rugged individualists and cluster pioneers are too busy to
teach or write about clusters. There are exceptions of course, but
in general, with clusters, it seems you’re either plowing the
field or building the plow. Taking time to explain your craft just
doesn’t seem to fit in the workday.
3. The final issue is the scope
of what can be taught. To date, most classes or tutorials that
I’ve seen have been aimed at setting up and administering a
basic cluster. (Interestingly, there’s no “How to Build
a Cluster” tutorial at Supercomputing
2006 this year, although there has been such a class in
the past.)
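To make the first reason a little more concrete, here is a minimal Python sketch (not any particular site's tooling) that reports where MPI commands resolve on the search path and flags the case where they come from different directories. The tool names mpicc and mpirun, and the helper names locate_tools and mixed_installs, are assumptions chosen for illustration. The directory comparison is only a rough heuristic, since a single MPI install can place its binaries in more than one directory.

```python
import os
import shutil

# Assumed wrapper/launcher names; adjust for the MPI builds on your cluster.
MPI_TOOLS = ("mpicc", "mpirun")


def locate_tools(tools=MPI_TOOLS, path=None):
    """Map each tool name to its first match on the search path, or None."""
    return {tool: shutil.which(tool, path=path) for tool in tools}


def mixed_installs(locations):
    """Return True when the tools that were found live in different directories."""
    dirs = {os.path.dirname(p) for p in locations.values() if p}
    return len(dirs) > 1


if __name__ == "__main__":
    found = locate_tools()
    for tool, location in found.items():
        print(f"{tool}: {location or 'not found on PATH'}")
    if mixed_installs(found):
        print("Warning: MPI tools resolve to different directories.")
```

Running the script before compiling or launching a job shows which MPI build the shell will actually pick up. A warning is a hint to check the environment, not proof of a mismatch.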
When one talks about clusters, there are really two sub-groups
that need to be addressed: administrators and users. The
administrators are interested in provisioning the cluster to meet
users’ needs and minimizing the work needed to maintain and
upgrade the cluster. The users, on the other hand, want to run
codes and use the cluster effectively. There is some overlap, but
in general, these two constituencies have very different agendas.
Obviously, the majority of cluster courses have been on
provisioning a cluster, which needs to happen before you can
address the “domain expert” (that is, users)
Fortunately, I believe the impact of these issues is lessening.
Cluster recipes have settled down a bit, and more people know how
to “do clusters.” The Beowulf mailing list
(http://www.beowulf.org/) and Cluster Monkey
(http://www.clustermonkey.net/) are good resources
as well. In my opinion, the real challenge is going to be packaging
and bringing this information to the domain experts. Economic
issues aside, clusters should be as easy to use as a graphing
calculator.
Apprenticeships Do Not Scale Well
As I mentioned above, finding people to lecture about clusters
is difficult. This situation then raises the question, “In the
absence of courses, how does one learn about clusters?”
There seem to be two ways, neither of which scales very well.
First, you have the old-fashioned apprenticeship. Working with
someone who has clusters (and knows what they are doing) is a great
way to learn. The other approach is to set off by yourself and,
with the help of a mailing list or two, build your own cluster.
Both approaches take time. The do-it-yourself approach allows you
to make mistakes, which of course is how you learn about most
things.
However, most cluster people — be they users or
administrators — usually don’t go into such projects
blindly. They bring with them some “carry over” from
other areas of computing. Indeed, a large portion of “cluster
know how” comes from other established areas of computing
that have educational infrastructure (manuals, mailing lists,
freely available software, and even courses) already.
Consider the following topic list:
Message Passing Interface (MPI). Because MPI was around before
clusters hit the big time, there are numerous books and classes
that facilitate learning MPI. Also, Parallel
Virtual Machine was a great way to connect workstations and
learn about parallel computing.
Compilers. Most cluster experts have a good
understanding of compilers and building code. Understanding that
the long stream of error messages can be due to a missing library
(and hence are easily fixed) prevents the bewilderment that comes
with trying to build that new software package in your
System Administration. Opportunities to learn about
operating systems are plentiful. Three-inch thick books are in good
supply, as well as certification classes and training.
Hardware. Most clusters use off-the-shelf hardware.
Resources for understanding commodity hardware are plentiful,
although nothing works like having a motherboard or two with which
to test ideas.
Schedulers. Resource scheduling has been around ever
since people started sharing computers. There are resources to help
learn about schedulers and, like most things, a little hands-on time
Networking. Networking is perhaps the toughest area in which to
find good information, even in cluster courses. In many
other settings, less-than-optimal network performance works quite well for
just browsing the Web or transferring a file. Although much of Linux
networking is plug-and-play, there is room for optimization when it
comes to clusters. High-end interconnect networks have in the past
been even more obscure. Fortunately, the market seems to be
focusing on either 10-Gigabit Ethernet or
InfiniBand solutions, and many of the
high-end network companies are moving in this direction as
well.
In my opinion, a good cluster course puts all this together and
provides insights on how to weave the essential parts of these
components into a well-oiled cluster machine. Of course, there are
some cluster-specific issues that deal with parallel computing,
but a good grasp of the above topics creates a solid
foundation.
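As one small illustration of the carry-over knowledge in the compilers item above, here is a hedged Python sketch that checks for missing libraries before a build starts, rather than after the error messages scroll by. It uses the standard library function ctypes.util.find_library, which reflects roughly what the runtime loader can see and not necessarily the compiler's link-time search path. The helper name missing_libraries and the library list are hypothetical.

```python
from ctypes.util import find_library


def missing_libraries(names, finder=find_library):
    """Return the library names that the finder cannot locate."""
    return [name for name in names if finder(name) is None]


if __name__ == "__main__":
    # Hypothetical dependency list for a package you are about to build.
    wanted = ["m", "z", "fftw3"]
    for name in missing_libraries(wanted):
        print(f"lib{name} not found; install it or point the build at its location")
```

The finder argument exists only so the check can be exercised without depending on what happens to be installed on a given machine.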
Real Cluster Courses
To fill in the gaps and get a chance to ask questions, there are
real cluster courses and places to go to learn about clusters.
The most exciting area is the National Center of Excellence
for HPC Technology (NCEHPCT). The NCEHPCT is a consortium of four
community colleges that develops educational programs in
high-performance computing technology. The
colleges include Maui Community College (Maui, HI), Contra Costa
College (San Pablo, CA), Pellissippi State Technical Community
College (Knoxville, TN), and Wake Technical Community College
(Raleigh, NC). For example, it is now possible to get an associate’s
degree in High Performance Computing (HPC) from Wake Technical
Community College.
If you’re interested in tutorials, you may want to
investigate the Linux Clusters Institute
(http://linuxclustersinstitute.org), which includes
educational sessions as well as technical papers. There is also the
annual IEEE Cluster meeting
(http://www.clustercomp.org/). It tends to be
a bit more research-oriented, but does have some tutorials.
Of course, there is always the annual Supercomputing show
(http://supercomputing.org/). There are some other
short courses as well. Recently, the Advanced Research Computing
Group at Georgetown University
(http://arc.georgetown.edu/) presented “An
Introduction to Beowulf Design, Planning, Building and
Administering”. Some Googling may find other islands of
cluster education in your area.
The final step is figuring out the various levels of HPC cluster
certification, so that when the boss says “Go get me one of
the cluster things and some people to run it,” you can find
real people with cluster skill sets.
Books, Books, Books
Since I’m highlighting some of the available resources for
cluster study, here’s a brief survey of the currently
available books.
There are now eleven cluster books of which I am aware. A group
of four books is based on the efforts of Thomas Sterling. The
first book, “How to Build a Beowulf” by Sterling,
Salmon, Becker, and Savarese (ISBN 0-262-69218-X), is now a bit
outdated. It does have some relevant parts, but most of the
software it presents is now considered old.
The follow-on book by Sterling, “Beowulf Cluster Computing
with Linux”, (2002, MIT Press, ISBN 0-262-69274-0), is a
collection of topics edited by Thomas Sterling. The book contains a
large amount of useful information from prominent community
members. Sterling also edited a book entitled “Beowulf
Cluster Computing with Windows” (ISBN 0-262-69275-9), which
shares some of the content with the Linux book. There is now an
updated edition of the Linux version, edited by William Gropp and
Ewing Lusk (in addition to Sterling). This version provides a very
good, but high-level, view of Linux HPC clustering. It includes
ROCKS and OSCAR coverage plus other important issues (ISBN
0-262-69292-9, 504 pages).
Robert Brown has a freely-available book entitled
“Engineering a Beowulf-Style Compute Cluster,” which
presents the design and construction of Beowulf style clusters. See
A good cluster background book is “In Search of
Clusters” by Gregory Pfister (ISBN 0138997098). The book was
written in the pre-Beowulf era, but has some very good (and
detailed) technical analysis in it.
Several new books have appeared in the last year. A book called
“Building Clustered Linux Systems” by Robert W. Lucke
(ISBN 0-13-144853-6) provides a very good overview of cluster
computing methods and hardware. The book covers a rather wide
range of options, but doesn’t dive too deeply into any one
approach. It is somewhat Hewlett-Packard-focused, as the author
works for HP.
Another book is called “The Linux Enterprise
Cluster” (ISBN 1-59327-036-4) by Karl Kopper. This book
focuses on the enterprise cluster (not HPC) and covers failover,
heartbeat, load balancing, reliable printing and Web serving, and
how to build a job scheduling system. The book has good coverage
and examples. Yet another recent book is called “High
Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and
MPI” by Joseph D. Sloan (ISBN 0-596-00570-9). It is
O’Reilly’s second attempt at a Linux cluster book, yet
many feel that it, too, misses the mark.
There are two other books which I consider rather dated. The
first is called “Linux Cluster Architecture” by Alex
Vrenios (ISBN 0-672-32368-0). This book describes how to build a
small cluster based on Linux. However, it misses a large part of
the software that is used on HPC clusters today. The second book is
called “Linux Clustering” by Charles Bookman (ISBN
1-57870-274-7). This book covers a wide range of Linux cluster
systems and dedicates only a few pages to the HPC area.
Learning about clusters is still not easy, but it’s not as
hard as it used to be. With any luck, my daughter will need a
cluster for college. Then I can pull out some old hardware and show
her how it was done back in the day. Of course, she will open her
64-core laptop, connect to a computational grid, and design a new
protein, all with a few keystrokes, and convince me she
doesn’t need another one of dad’s good ideas.