Commodity hardware is the norm in HPC. What about commodity software?
Rummaging through the HPC attic, I found some more material for my HPC Master Series. These are editorials written by HPC leaders and pioneers for ClusterWorld Magazine. In case you missed them, I have already posted Beowulf in Chrysalis by Tom Sterling and The Grid by Ian Foster. Good stuff and worth reading at any time.
This week I want to add an excellent piece by Bill Gropp on commodity software. For those who don’t know, Bill and his crew at Argonne National Lab have brought us the likes of MPICH, MPICH2, and PVFS2. In 2007, Bill left Argonne to join the University of Illinois at Urbana-Champaign as the Paul and Cynthia Saylor Professor in the Department of Computer Science.
Bill’s original editorial is titled “Commodity Software? Is cluster software any good?” and asks some very good questions. Reading it five years later, I was struck by how many of his points are still timely. His main point was that progress in software depends on both standards and the ability to innovate. I thought his example of NFS was right on the mark. (Sorry, you have to read the piece to get the full explanation.)
At the time it was written, the commodity hardware wave was just breaking on the HPC shore. Jump ahead to today and it is clear that many of the proprietary sand castles were washed away by that wave. A similar argument may hold for software. It is well known that writing good parallel HPC software is hard. Indeed, if you have read anything I have written, you will recall that I toss and turn at night thinking about these issues, and multi-core just makes it all the more difficult. I recall hearing about ISVs who chose not to “parallelize their codes” because it is too hard, too expensive, or both. (“Too hard” means they can’t find the people; “too expensive” means that even if they find the people, the work is going to take some time.) And, of course, they want their code to be as flexible as possible so that when the next hardware architecture arrives, they are not rewriting it again. This last goal is extremely difficult to achieve, by the way.
My favorite part of Bill’s discussion is the following: Part of the solution is to emphasize commodity software. That is, software that is written to an agreed upon standard. Applications that use commodity software can pick and choose their software platform in much the same way that commodity hardware makes it possible to pick and choose the hardware platform. But there is danger here too. If we insist on the current set of standards, we stifle innovation and prevent the development of better standards.
Commodity software is not necessarily a new idea, but it is not “just open software.” Commodity software is designed to run on a variety of commodity hardware platforms. As Bill points out, this is tricky because you need to balance innovation and standards. One of Bill’s progeny, MPICH, is a good example of this idea. It was designed to bring the MPI standard (as defined by the MPI Forum) to as many hardware platforms as possible. Before clusters were everywhere, MPICH already supported quite a few parallel machines. As a matter of fact, when MPICH was developed, the “P4 transport layer” used by clusters was almost an afterthought, added to placate the people who were stringing together workstations with tools like PVM.
We can learn about commodity software by looking at MPICH and its successor MPICH2. Of course, the “open” factor played a big role, as the community helped make each package more robust through code contributions, bug reports, and use cases. And like any good commodity implementation, there were other packages that implemented the same standard yet offered a different implementation and feature set. I have test scripts that can run LAM/MPI, MPICH, MPICH2, and Open MPI by simply applying a few sed scripts to a Makefile. (Note that both MPICH and LAM/MPI are in maintenance mode; you should be using MPICH2, Open MPI, or another current MPI implementation for your applications.)
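To make the Makefile trick concrete, here is a minimal sketch of the idea, not my actual test scripts. It assumes a Makefile that names its MPI compiler wrapper and launcher in two variables, MPICC and MPIRUN (hypothetical names), and that each implementation lives under its own install prefix (the /opt paths below are also hypothetical). Because every implementation follows the same MPI standard, only those two lines change; the application source is never touched.

```python
import re

# Hypothetical install prefixes; real paths depend on how each MPI was installed.
MPI_PREFIXES = {
    "lam": "/opt/lam",
    "mpich2": "/opt/mpich2",
    "openmpi": "/opt/openmpi",
}


def retarget_makefile(text: str, mpi: str) -> str:
    """Point the (hypothetical) MPICC and MPIRUN variables at another MPI install.

    This does the same job as a pair of sed substitutions such as
    s|^MPICC *=.*|MPICC = <prefix>/bin/mpicc|
    """
    prefix = MPI_PREFIXES[mpi]
    # Only whole-line variable assignments are rewritten; recipe lines that
    # merely use $(MPICC) start with a tab, so they are left alone.
    text = re.sub(r"^MPICC\s*=.*$", f"MPICC = {prefix}/bin/mpicc", text, flags=re.M)
    text = re.sub(r"^MPIRUN\s*=.*$", f"MPIRUN = {prefix}/bin/mpirun", text, flags=re.M)
    return text


# Example Makefile fragment, initially set up for LAM/MPI.
SAMPLE_MAKEFILE = (
    "MPICC = /opt/lam/bin/mpicc\n"
    "MPIRUN = /opt/lam/bin/mpirun\n"
    "app: app.c\n"
    "\t$(MPICC) -O2 -o app app.c\n"
)

if __name__ == "__main__":
    for name in ("mpich2", "openmpi"):
        print(f"--- {name} ---")
        print(retarget_makefile(SAMPLE_MAKEFILE, name))
```

The design choice is the point: the build and the code stay the same, and only the pointer to the implementation moves, which is exactly what a standard buys you.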
Commercial vendors may not like the idea of commodity software because it makes it hard to lock in customers. I would counter that the trend in HPC is toward commodity software and away from lock-in. The argument is not one of philosophy but of practicality. I believe that without community support it will become cost-prohibitive to offer some packages in the HPC space. Currently, the cost of some software applications outweighs the cost of the hardware by a large margin. Users may begin to weigh the cost of commercial software against the cost of contributing to a commodity/community software project that offers similar functionality. And, because they are helping develop the application, it has the potential to meet their needs better than a commercial product. Vendors that choose not to port their applications to the HPC space may find that the market has moved past them in favor of community software. There is still a commercial angle for commodity/community software, however. Every single production cluster team I have worked with knows the value of application support and will gladly pay for it. At the end of the day, everyone needs results, and delivering results is not a commodity process.