
Some Relief For Cluster Consternation

What’s stopping clusters from being useful tools?

My daughter started school the other day. She came home and said the teacher recommended that students get a graphing calculator. Mind you, it wasn’t the hundred bucks for the calculator that prompted me to grab a pencil and paper and say, “Back in my day, this was our graphing calculator. No batteries needed. And you can even keep the stylus on your ear.” Instead it was the idea that the mastery of pencil and paper was becoming a lost art. After all, if you’re on a desert island and need to plot a parabola, where are you going to find a graphing calculator?

At that point, my wife joined the conversation and mentioned that my daughter’s teacher probably used a graphing calculator in high school and college as well. Sigh. Nothing like feeling old. When I was in my college years, there were classes devoted to learning how to plot equations on these large, toggle-switch-laden things called minicomputers. The art has improved a bit since then and graphing calculators are but one example. In my daughter’s school, graphing calculators are used to teach subjects; they’re not a subject per se. If I don my analogy hat for a moment and think back to the day, minicomputers, with all of their associated consternation, let me do better math and science.

Graphing calculators can now handle many of those chores and presumably will help my daughter do better math and science. Thinking about clusters, with all of their modern-day consternation (see the sidebar, “Big Word Alert”), I have to ask what’s stopping clusters from being a useful tool? Instead of plotting equations, however, we can calculate and plot entire solution spaces right in the classroom.

Education is Obvious

One obvious solution is enhanced education. Ah, well, there’s the catch. Check your favorite search engine for “Beowulf cluster courses” or “Linux cluster courses” and some of the top responses are from people asking for such a course. There were some hits, however, and I’ll mention those later.

So why no cluster courses? I propose three reasons.

1. First, up until about three years ago, clusters were a fast-moving target. They’re still moving about, but not quite as fast. There are now some “reproducible” methods that include pre-built distributions and toolkits. Beyond node provisioning, things are a bit more settled, but compilation and invocation issues can still vary widely. For instance, multiple MPI versions can often confuse users unless some type of environment management is used (for example, Modules at http://modules.sourceforge.net). Users usually have to make sure they know where the various MPI libraries and binaries live on the cluster. (Nothing like trying to start a LAM/MPI job with an MPICH version of mpirun.)

2. There’s also the issue of finding people to teach cluster concepts and practices. Most of the rugged individualists and cluster pioneers are too busy to teach or write about clusters. There are exceptions of course, but in general, with clusters, it seems you’re either plowing the field or building the plow. Taking time to explain your craft just doesn’t seem to fit in the workday.

3. The final issue is the scope of what can be taught. To date, most classes or tutorials that I’ve seen have been aimed at setting up and administering a basic cluster. (Interestingly, there’s no “How to Build a Cluster” tutorial at Supercomputing 2006 this year, although there has been such a class in years past.)
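The mpirun mix-up in the first point usually comes down to which launcher the shell finds first. Here is a minimal Python sketch, not a tool from any cluster distribution, that lists every executable named mpirun on the search path in lookup order. The function name `find_launchers` is my own invention, and the sketch assumes each MPI installation puts a launcher called `mpirun` in its own directory.

```python
import os


def find_launchers(name="mpirun", path=None):
    """Return every executable called `name` on the search path, in lookup order.

    `path` defaults to the PATH environment variable; pass a string of
    directories joined by os.pathsep to inspect a different search path.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        # Keep only regular files the current user is allowed to execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            if candidate not in found:
                found.append(candidate)
    return found


if __name__ == "__main__":
    hits = find_launchers()
    if not hits:
        print("no mpirun found on PATH")
    for position, launcher in enumerate(hits):
        note = "  <- the shell runs this one" if position == 0 else ""
        print(launcher + note)
```

If more than one path comes back, the first entry is the launcher a bare `mpirun` command will start, which may not belong to the MPI library the code was built against.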

When one talks about clusters, there are really two sub-groups that need to be addressed: administrators and users. The administrators are interested in provisioning the cluster to meet users’ needs and minimizing the work needed to maintain and upgrade the cluster. The users, on the other hand, want to run codes and use the cluster effectively. There is some overlap, but in general, these two constituencies have very different agendas. Obviously, the majority of cluster courses have been on provisioning a cluster, which needs to happen before you can address the “domain expert” (that is, users) issues.

Fortunately, I believe the impact of these issues is lessening. Cluster recipes have settled down a bit, and more people know how to “do clusters.” The Beowulf mailing list (http://www.beowulf.org/) and Cluster Monkey (http://www.clustermonkey.net/) are good resources as well. In my opinion, the real challenge is going to be packaging and bringing this information to the domain experts. Economic issues aside, clusters should be as easy to use as a graphing calculator.

Apprenticeships Do Not Scale Well

As I mentioned above, finding people to lecture about clusters is difficult. This situation then raises the question, “In the absence of courses, how does one learn about clusters?”

There seem to be two ways, neither of which scales very well. First, you have the old-fashioned apprenticeship. Working with someone who has clusters (and knows what they are doing) is a great way to learn. The other approach is to set off by yourself and, with the help of a mailing list or two, build your own cluster. Both approaches take time. The do-it-yourself approach allows you to make mistakes, which of course is how you learn about most things.

However, most cluster people — be they users or administrators — usually don’t go into such projects blindly. They bring with them some “carry over” from other areas of computing. Indeed, a large portion of “cluster know how” comes from other established areas of computing that have educational infrastructure (manuals, mailing lists, freely available software, and even courses) already.

Consider the following topic list:

*Message Passing Interface (MPI). Because MPI has been around since before clusters hit the big time, there are numerous books and classes that facilitate learning MPI. Also, Parallel Virtual Machine (PVM) was a great way to connect workstations and learn about parallel computing.

*Compilers. Most cluster experts have a good understanding of compilers and building code. Understanding that the long stream of error messages can be due to a missing library (and hence are easily fixed) prevents the bewilderment that comes with trying to build that new software package in your environment.

*Operating System Administration. Opportunities to learn about operating systems are plentiful. Three-inch thick books are in good supply, as well as certification classes and training.

*Commodity Hardware. Most clusters use off-the-shelf hardware. Resources for understanding commodity hardware are plentiful, although nothing works like having a motherboard or two with which to test ideas.

*Schedulers. Resource scheduling has been around ever since people started sharing computers. There are resources to help learn about schedulers and like most things, a little hands-on time does wonders.

*Networking. Networking is perhaps the toughest area in which to find good information, even in cluster courses. For many other uses, non-optimal network performance works quite well; just browsing the Web or transferring a file does not demand much. Although much of Linux networking is plug-and-play, there is room for optimization when it comes to clusters. High-end interconnect networks have in the past been even more obscure. Fortunately, the market seems to be focusing on either 10-Gigabit Ethernet or InfiniBand solutions, and many of the high-end network companies are moving in this direction as well.
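To illustrate the compiler point in the list above, here is a small Python sketch that boils a long build log down to the libraries the linker could not find. It assumes the GNU linker, which reports such failures as "cannot find -l<name>". The function name `missing_libraries` and the sample log are made up for this example.

```python
import re

# GNU ld reports an unresolvable -l flag as "cannot find -l<name>".
# Newer releases append ": No such file or directory", which the
# character class below deliberately leaves out of the captured name.
MISSING_LIB = re.compile(r"cannot find -l([A-Za-z0-9_+.\-]+)")


def missing_libraries(build_log):
    """Return the distinct library names the linker could not find, in order."""
    seen = []
    for line in build_log.splitlines():
        match = MISSING_LIB.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# An invented log excerpt, shaped like typical GNU ld output.
sample = """gcc -o solver solver.o -lmpich -lfftw3
/usr/bin/ld: cannot find -lmpich
/usr/bin/ld: cannot find -lfftw3: No such file or directory
/usr/bin/ld: cannot find -lmpich
collect2: error: ld returned 1 exit status
"""

print(missing_libraries(sample))  # prints ['mpich', 'fftw3']
```

Several screens of errors often reduce to a short list like this, and installing the missing libraries or pointing the build at them is usually the whole fix.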

In my opinion, a good cluster course puts all this together and provides insights on how to weave the essential parts of these components into a well-oiled cluster machine. Of course, there are some issues exclusive to clusters that deal with parallel computing, but a good grasp of the above topics creates a solid foundation.

Real Cluster Courses

To fill in the gaps and get a chance to ask questions, there are real cluster courses and places to go to learn about clusters.

The most exciting area is the National Center of Excellence for HPC Technology (NCEHPCT). The NCEHPCT is a consortium of four community colleges that develops educational programs in high performance computing technology (http://highperformancecomputing.org/). The colleges are Maui Community College (Maui, HI), Contra Costa College (San Pablo, CA), Pellissippi State Technical Community College (Knoxville, TN), and Wake Technical Community College (Raleigh, NC). For example, it is now possible to get an Associate’s degree in High Performance Computing (HPC) from Wake Technical Community College.

If you’re interested in tutorials, you may want to investigate the Linux Clusters Institute (http://linuxclustersinstitute.org), which offers educational sessions as well as technical papers. There is also the annual IEEE Cluster meeting (http://www.clustercomp.org/). It tends to be a bit more research-oriented, but does have some tutorials.

Of course, there is always the annual Supercomputing show (http://supercomputing.org/). There are some other short courses as well. Recently, the Advanced Research Computing Group at Georgetown University (http://arc.georgetown.edu/) presented “An Introduction to Beowulf Design, Planning, Building and Administering”. Some Googling may find other islands of cluster education in your area.

The final step is figuring out the various levels of HPC cluster certification, so that when the boss says “Go get me one of the cluster things and some people to run it,” you can find real people with cluster skill sets.

Books, Books, Books

Since I’m highlighting some of the available resources for cluster study, here’s a brief survey of the currently available books.

There are now eleven cluster books of which I am aware. A group of four books is based on the efforts of Thomas Sterling. The first book, “How to Build a Beowulf” by Sterling, Salmon, Becker, and Savarese (ISBN 0-262-69218-X), is now a bit outdated. It does have some relevant parts, but most of the software it presents is now considered old.

The follow-on book, “Beowulf Cluster Computing with Linux” (2002, MIT Press, ISBN 0-262-69274-0), is a collection of topics edited by Thomas Sterling. The book contains a large amount of useful information from prominent community members. Sterling also edited a book entitled “Beowulf Cluster Computing with Windows” (ISBN 0-262-69275-92), which shares some of its content with the Linux book. There is now an updated edition of the Linux version, edited by William Gropp and Ewing Lusk (in addition to Sterling). This version provides a very good but high-level view of Linux HPC clustering. It includes ROCKS and OSCAR coverage plus other important issues (ISBN 0-262-69292-93, 504 pages).

Robert Brown has a freely-available book entitled “Engineering a Beowulf-Style Compute Cluster,” which presents the design and construction of Beowulf style clusters. See http://www.phy.duke.edu/resources/computing/brahma/Resources/beowulf_book.php.

A good cluster background book is “In Search of Clusters” by Gregory Pfister (ISBN 0138997098). The book was written in the pre-Beowulf era, but has some very good (and detailed) technical analysis in it.

Several new books have appeared in the last year. A book called “Building Clustered Linux Systems” by Robert W. Lucke (ISBN 0-13-144853-66) provides a very good overview of cluster computing methods and hardware. The book provides a rather wide coverage of options, but doesn’t dive too deeply into any one approach. It is somewhat Hewlett-Packard-focused, as the author works for HP.

Another book is called “The Linux Enterprise Cluster” (ISBN 0-13-144853-65) by Karl Kopper. This book focuses on the enterprise cluster (not HPC) and covers failover, heartbeat, load balancing, reliable printing and Web serving, and how to build a job scheduling system. The book has good coverage and examples. Yet another recent book is called “High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI” by Joseph D. Sloan (ISBN 0-596-00570-92). It is O’Reilly’s second attempt at a Linux cluster book, yet many feel it too misses the mark.

There are two other books which I consider rather dated. The first is called “Linux Cluster Architecture” by Alex Vrenios (ISBN 0-672-32368-0). This book describes how to build a small cluster based on Linux. However, it misses a large part of the software that is used on HPC clusters today. The second book is called “Linux Clustering” by Charles Bookman (ISBN 1-57870-274-7). This book covers a wide range of Linux cluster systems and dedicates only several pages to the HPC area.

Getting There

Learning about clusters is still not easy, but it’s not as hard as it used to be. With any luck, my daughter will need a cluster for college. Then I can pull out some old hardware and show her how it was done back in the day. Of course, she will open her 64-core laptop, connect to a computational grid, design a new protein with a few keystrokes, and convince me she doesn’t need another one of dad’s good ideas.

Doug has enlisted more monkeys to help him randomly type a book on clusters. More is better. A preview is available at http://www.clustermonkey.net/content/view/128/53/.
