Small HPC

Will multi-core split HPC into two programming camps? Which one will you join?

Later this week I am heading off to my 30-year college reunion (yes, I am that old). I attended Juniata College in central Pennsylvania. Juniata is one of the small liberal arts colleges for which Pennsylvania and the northeast are well known. I recall at one point in my freshman year an upperclassman telling me “You will never pass Organic Chemistry. It is where they weed out the pre-med students. Just forget it.”

As I was in my impressionable year, I became suitably scared. I had one thing in my favor, though. I was not pre-med. I wanted to learn chemistry. In any case, my sophomore year rolled around and I found myself sitting in “Organic” (as it was called) with a bunch of knee-shaking pre-meds. As I learned organic chemistry, I found it not that hard and actually interesting. Good thing I did not take the advice of the upperclassman about Organic or any other higher-level classes. As a matter of fact, to many pre-med students I was somewhat of an anomaly because I passed organic chemistry but never considered medical school. The things you learn in college.

I mention my past experiences with “opinions and advice” because I have been reading comments about parallel computing on the web. I take discussion as a good sign, although, judging by the comments I see on various sites, there is a bit more opinion than experience in what I read. For instance, parallel computing is way too hard for most people … or what’s the big deal, it is rather simple …. Both comments are, in my opinion, far from the truth and don’t reflect actual experience. The web is a great repository for the opinions of the inexperienced. I would caution those new to HPC and parallel computing to be a bit skeptical and form their own opinions.

I have written my fair share of software (both sequential and parallel), and my opinion, based on that experience, goes like this. Writing good software is hard. Period. Writing good parallel software is harder still, but not impossible. Understanding the basics is essential in either case. The basics. You know, the boring stuff, the “oh yeah” stuff.

The advent of multi-core has added a bit of confusion to the basic idea of parallel computing. A little history may help. For the most part, HPC parallel computing used to mean many processors, each with private memory, communicating with other processors. There were variations, and designs that looked like today's multi-core processors, but they were built from discrete processors. The fundamental idea is that communication between processors happens by passing messages. When HPC clusters hit the scene, the predominant design was a dual-processor motherboard (single-core processors), private memory, and some kind of interconnect (e.g., Ethernet or Myrinet).

Today the predominant HPC programming model is MPI. Recall that MPI is a programming API that allows multiple processes to communicate with one another. Mapping the processes to the processors is part of the spawning process. Early parallel computers assumed one process per processor, but clusters changed that idea a bit. With clusters, processors lived in “nodes” and in most cases each node had at least two single-core processors. It is possible to run an MPI job using all the cores on a node, some of the cores on the node, or even by oversubscribing the nodes (running more processes than there are cores).
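The placement choices just described can be made concrete with a small sketch. The helper `place_ranks` below is hypothetical and is not part of MPI or any launcher; it simply assumes processes are assigned to nodes block-wise and wrap around when the job asks for more processes than there are cores.

```python
from collections import Counter


def place_ranks(n_ranks, n_nodes, cores_per_node):
    """Assign each process (rank) to a node, filling one node before the next.

    Hypothetical illustration only: real launchers have their own
    mapping policies and options.
    """
    # Rank r lands on node (r // cores_per_node), wrapping around
    # once every node has been filled.
    placement = [(rank // cores_per_node) % n_nodes for rank in range(n_ranks)]
    load = Counter(placement)
    # A node is oversubscribed when it hosts more processes than cores.
    oversubscribed = any(count > cores_per_node for count in load.values())
    return placement, oversubscribed


if __name__ == "__main__":
    # Two dual-processor nodes, as in an early cluster.
    print(place_ranks(4, 2, 2))  # every core used
    print(place_ranks(2, 2, 2))  # only some cores used
    print(place_ranks(6, 2, 2))  # more processes than cores
```

The returned list gives the node index for each rank, and the flag reports whether any node ended up with more processes than cores.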

MPI is a data-copying protocol. It essentially copies a chunk of memory from one MPI process to another. Each process has exclusive control of all its memory; i.e., no other process can touch it, and data moves between processes only through messages. It is important to understand that when an 8-way MPI code is run on an 8-way multi-core node, memory is still copied from process to process through messages even though the transport mechanism may use shared memory. In a cluster, the transport mechanism between nodes is the interconnect (GigE, 10-Gig, InfiniBand). MPI programs can span processors, nodes, and clusters. If you can send a message, MPI can run. Notice I said run, not run optimally. Running optimally is a bit trickier and very application dependent, but it is under the control of the programmer.
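The copy semantics can be seen in a small sketch. This is not MPI; it is an analogy that uses Python's standard `multiprocessing` module, where two processes with private memory exchange data over a pipe. The function names `worker` and `send_and_receive` are made up for illustration, and the code assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing as mp


def worker(conn):
    """Receive a message, change the local copy, and send the result back."""
    chunk = conn.recv()          # a private copy arrives in this process
    for i in range(len(chunk)):
        chunk[i] *= 2            # modifies the copy only
    conn.send(chunk)
    conn.close()


def send_and_receive(data):
    """Send `data` to a second process and return (original, reply)."""
    # Assumption: a POSIX system where the "fork" start method exists.
    ctx = mp.get_context("fork")
    parent_end, child_end = ctx.Pipe()
    proc = ctx.Process(target=worker, args=(child_end,))
    proc.start()
    parent_end.send(data)        # data is serialized and copied across
    reply = parent_end.recv()
    proc.join()
    return data, reply


if __name__ == "__main__":
    original, reply = send_and_receive([1, 2, 3, 4])
    print(original)  # the sender's memory is untouched
    print(reply)     # the receiver worked on its own copy
```

Both processes here run on the same machine, yet the receiver never sees the sender's list itself, only a copy delivered as a message. That mirrors the point about an 8-way MPI job on an 8-way node.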

There is now a big interest in programming multi-core systems. For non-HPC users these are multi-core workstations, servers, and even desktops. Parallel programs for these systems often use a shared memory or threaded model. Because the number of these systems is increasing, one has to assume that the range of multi-core programming tools will also expand. One of the more popular tools is OpenMP (compiler directives for thread-based parallelism). One important distinction of shared memory models is that they are designed for single memory domain machines (i.e., a single motherboard). As a result, these codes do not work well on clusters unless you are using something that makes a bunch of nodes look like a large SMP system; for an example, see ScaleMP.
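A minimal sketch of the shared memory idea follows. It is not OpenMP; it uses Python's standard `threading` module as a stand-in, and the function name `threaded_sum` is made up for illustration. In standard CPython, threads generally do not execute Python code in parallel, so the sketch shows the memory model rather than a speedup.

```python
import threading


def threaded_sum(data, n_threads=4):
    """Sum `data` with several threads that all read the same list."""
    partial = [0] * n_threads            # shared by every thread
    chunk = (len(data) + n_threads - 1) // n_threads

    def work(tid):
        start = tid * chunk
        end = min(start + chunk, len(data))
        # Reads the shared `data` directly; nothing is copied or sent.
        partial[tid] = sum(data[i] for i in range(start, end))

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Each thread wrote only its own slot, so no lock is needed here.
    return sum(partial)


if __name__ == "__main__":
    print(threaded_sum(list(range(1000))))
```

Every thread works on the same `data` and `partial` objects, which is only possible because they live in one memory domain. Spread the threads across separate nodes and this direct access no longer exists.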

Given that the number of cores in a processor continues to grow (e.g., the new six-core processor from AMD), single memory domains (motherboards) may have anywhere between 12 and 32 cores in the near future. Here is an interesting scenario. Let’s assume that 12- to 32-core systems become commonplace. If this is enough computing power for your tasks, then how will you approach HPC programming? Will you use MPI because you may want to scale the program to a cluster, or will you use something like OpenMP or a new type of multi-core programming tool because it is easier or works better? Could a gulf in HPC programming develop? Perhaps MPI will still be used for “big cluster HPC” and other methods may be used for “small motherboard HPC”. Of course MPI can always be used on small core counts, but will some point-and-click thread-based tool attract more users because “MPI is too hard to program”?

If one assumes that core counts will continue to increase, the 64-core workstation may not be that far off. Back in the day, a 64-processor cluster was something to behold. Many problems still do not scale beyond this limit. Could we see a split in HPC? I don’t know, and don’t read too much into my opinion because there are plenty of devils in the details. Just remember, no matter how you cut it, programming anything well takes diligence and hard work. And don’t let the upperclassman tell you any different.
