Cluster training for both student and teacher. Plus, re-learning a big lesson the hard way.
Over the years I have made my share of cluster mistakes. Each problem presented an opportunity to learn something new, become a little smarter, and pick up some scar tissue, as it were. I had just such an opportunity this week as I was teaching Intermediate Beowulf: An Introduction to Benchmarking and Tuning as part of the ARC HPC Training at Georgetown University. I’ll get back to my teaching experience in a minute, but first I want to talk about HPC education.
In one sense, discussing the ARC HPC Training could be considered a shameless plug to boost enrollment in the class I teach. (It is the best class ever; you really should sign up for the next one, honest.) However, the ARC Training is a much-needed addition to the HPC market and community. When Jess Cannata and Arnie Miles first discussed teaching the course with me, they explained that there would be a whole series of courses, starting with the basics and then focusing on the important areas of cluster administration and use. Indeed, in addition to the course I teach, they offer Beowulf Design, Planning, Building, and Administering (September 16-19, 2008 and January 26-29, 2009), Grid Engine Configuration and Administration, Intermediate Sun Grid Engine Configuration (October 21-23, 2008), and Introduction to Programming Accelerators and Coprocessors (March 9-11, 2009). Other courses are in the works as well. Having experienced the efforts of the ARC HPC team firsthand, I can attest to the commitment and dedication of all those involved. Of course, there are other HPC training efforts, and we can still use more. HPC is moving faster than ever, and education is one of the gatekeepers to market growth. (Did you get that, vendors?) Speaking of education, that’s a good segue to my recent learning experience.
The course I teach is designed for system administrators or talented end users who want to go a little deeper into cluster performance and tuning. One of the things I try to emphasize is that HPC pushes and uses hardware differently than most other markets. For example, consider a GigE switch. They all do the same thing, right? You just plug in the cables and everything works. That is, unless you bought the switch that makes a carrier-pigeon network look fast when all the ports are running at full speed. I often make the statement, “You can’t build a cluster by consulting those glossy data sheets. The devil, as they say, is in the details. Assumptions can ruin your day and waste time and money.” Great advice; too bad I ignored most of it this past week.
In the course, we use “mini-clusters” that consist of a head node and one worker node. Each node has an on-board NIC, an InfiniBand or Myrinet/10G card, an AMD dual-core processor, memory, and a hard drive. The head node also has a second NIC for LAN access. Basic stuff, really. This class was the third time I had taught the course and the second time we were using this hardware. In terms of software, I decided to use my Fedora-based Cluster RPMS. The packages include things like LAM/MPI, MPICH1 and MPICH2, Open-MPI, the NAS Parallel Benchmark suite, and so on. My previous version was based on Fedora 6, and I used Warewulf (now Perceus) to provision the nodes. In preparing for the class, I thought I would use my new, but untested, Fedora 8 version. I knew the nodes would not boot, but the problem seemed similar to one I had fixed before involving kernel ramdisk block sizes. No worries, I thought, as I jumped in the car and headed to Georgetown.
Shortly after I arrived, Jess and I began our standard “night-before-the-class-getting-everything-working” routine. Admittedly, it is usually a long night, but we have always had the hardware working when the class started the next day. As we started to build the mini-clusters, Jess mentioned that he had purchased two Intel GigE NICs for each node so we could play with some of the driver parameters and see how they compared to the on-board NVIDIA NICs. Simple enough: we would just set the Intel NICs to PXE boot and be on our way. As the nodes booted, we waited for the Intel BIOS message to appear so we could enable PXE booting. It never showed up. Checking the BIOS boot options, there was no indication the card was even present. Darn. Oh well, let’s soldier on, use the Intel NIC for LAN access, and go back to the on-board NVIDIA NIC for the mini-cluster. Great plan, except the Intel NIC would not work. It showed link connectivity to the switch, but ethtool told us “Link detected: no”. Odd; the cards should have worked, as I had used them before in other hardware with the exact same software. Not to worry, we popped in some good old 100BT NICs and were up and running.
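The ethtool check above can be scripted if you want nodes to flag this kind of mismatch themselves. Here is a minimal sketch; the interface name and the captured output are illustrative stand-ins, not from the actual classroom machines:

```shell
# Minimal sketch of the link check we ran. On a live node you would run
# "ethtool eth1" directly; the interface name and sample output below are
# hypothetical examples for illustration.
sample='Settings for eth1:
	Speed: 1000Mb/s
	Duplex: Full
	Link detected: no'

# Pull out the driver's view of the link. A lit LED on the switch combined
# with "Link detected: no" from the driver is exactly the mismatch we hit.
link=$(printf '%s\n' "$sample" | awk -F': ' '/Link detected/ {print $2}')
echo "driver link state: $link"
```

A quick loop over this check on every node before class starts would have surfaced the bad NICs without any cable-wiggling.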
Time to boot the nodes. No joy. The little boot problem I had assumed would be fixed in about ten minutes turned into a big problem. After a few yawns, we decided to just drop back to the Fedora 6 software we had used last time. It was very late, and we could work with the head nodes on the first day in any case and have the clusters ready on the second day.
The old software worked as expected. Well, almost. As part of the class, we compare GigE to InfiniBand and Myricom 10G. We also run some tests using Open-MPI so we can see the differences due to the hardware. Everything had worked in the previous class, so of course it should work this time. It didn’t. The Myrinet software installed and worked fine except for a random mpirun issue. The IB software installed fine and the diagnostics worked, but the MPI programs got stuck. We were not sure what happened, and as far as we could tell everything was the same as last time.
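For the GigE-versus-InfiniBand comparisons, Open-MPI lets you pin a run to one transport through its MCA byte-transfer-layer (btl) parameter, which is how the same binary can be timed over each interconnect. The sketch below only echoes the command lines (a dry run, since there is no cluster behind it); the process count, hostfile, and benchmark binary name are made-up examples, and it assumes the tcp and openib btl components shipped with that era's Open-MPI:

```shell
# Dry-run sketch: selecting the interconnect for an Open-MPI job with
# "--mca btl". NP, the hostfile, and the binary name are example values.
NP=4
HOSTS=hosts       # hostfile listing the head and worker node
APP=./cg.B.4      # e.g. a NAS benchmark binary

# GigE run: restrict Open-MPI to the TCP transport (plus self for loopback)
gige_cmd="mpirun --mca btl tcp,self -np $NP -hostfile $HOSTS $APP"

# InfiniBand run: use the openib transport instead
ib_cmd="mpirun --mca btl openib,self -np $NP -hostfile $HOSTS $APP"

# Echo rather than execute, since this sketch has no cluster to run on
echo "$gige_cmd"
echo "$ib_cmd"
```

Running the same benchmark both ways, with nothing changed but the btl list, is what makes the hardware difference visible rather than buried in the defaults.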
Even with these problems, we were still able to teach a successful course (we had the test data from the previous course). In addition, we managed to demonstrate one valuable lesson: Assumptions can ruin your day and waste time and money. One could conclude that we planned to demonstrate this lesson as part of the course, thus enlightening the students with true experiential learning. Or, on the other hand, one could conclude that the bonehead instructor did not listen to his own advice and assumed things should just work because they had before. I’ll leave the final determination as an exercise for the student. In the meantime, I’ll be testing a few assumptions over the next few days. The next time the course is offered, we’ll present a case study on the importance of eliminating the words “should work” from your vocabulary. And for those who missed it the first time (and the second time): Assumptions can ruin your day and waste time and money. In my case, maybe a tattoo would help.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.