Explicit parallel programming presents special challenges for software developers. Now a new group of languages is coming online to address the compounded problem of multi-core processors on high-performance clusters.
In the near future, you may start to hear about programming languages with some strange-sounding names. Of course, it’s almost a requirement to have a unique and slightly esoteric name for a programming language. The new group of languages will be a bit different, however, as their strangeness will continue well beyond their names.
Of course, the interest in new languages, or in existing but obscure languages, is due to (ready? say it with me now) multi-core. Yes, the thing that keeps me awake at night. That carnival of cores going on right now inside your computer that will make HPC mavens of us all.
In case you missed the previews, here’s the story so far. An HPC cluster is parallel computing. A multi-core processor is parallel computing. A cluster of multi-core systems is parallel computing squared. Explicit parallel programming is hard. Hard takes time. Time costs money. Now square the result. (See HPC Guru Don Becker: Why MPI Is Inadequate.)
In the good old days, the cost to create a program was based on a single processor model. It has worked well, for the most part. And as the processors got faster, so did your code. This speed-up is sometimes referred to as a “free lunch” but I don’t see it that way. To me, if I use a programming language, I expect it to keep me above the nuances of individual processors. I mean, who would really want to recode every time a new processor was released?
Back in the day, when every vendor’s computer had its own instruction set, this was not how things worked. Portability quickly became important, and languages like Fortran were implemented as a way to preserve your programming efforts across current and future hardware environments. One of the underlying assumptions and expectations of using Fortran was that a faster machine would result in faster execution of your program.
If you have been living in the single-core world, then the expectation that a “new processor means faster execution” is no longer guaranteed with multi-core and, in most cases, is just not going to happen. Welcome to what keeps me up at night.
In the HPC world, the solution has been to augment existing codes so that they can take advantage of multiple processors. Libraries like PVM and MPI are used to send messages between systems, while pthreads and OpenMP are used to manage cores. These solutions work, but they are expensive in terms of programming time (and finding people who can do the work is another major issue).
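To get a feel for where that programming time goes, here is a minimal sketch in Python of doing the bookkeeping by hand. The function name `sum_of_squares_explicit` and the `num_workers` parameter are made up for illustration and do not come from any library mentioned above. One caveat: in standard CPython the global interpreter lock generally keeps these threads from running Python code at the same time, so treat this as a picture of the bookkeeping, not as a speed-up.

```python
import threading


def sum_of_squares_explicit(values, num_workers=4):
    """Sum x*x over values, with the programmer managing the parallel work."""
    # The programmer decides how the data is split across workers.
    chunk = (len(values) + num_workers - 1) // num_workers
    # One result slot per worker, so no two workers write the same location.
    partials = [0] * num_workers

    def worker(index):
        start = index * chunk
        end = min(start + chunk, len(values))
        total = 0
        for i in range(start, end):
            total += values[i] * values[i]
        partials[index] = total

    # The programmer also creates, starts, and waits for every worker.
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Finally, the partial results are combined by hand.
    return sum(partials)
```

The actual computation is a single line in the inner loop. Everything else is partitioning, launching, waiting, and combining, and all of it has to be rethought when the number of cores or nodes changes.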
The real solution is to provide a programming language (a way to express your problem) that can deliver on the expectation of “new processor means faster execution”. Enter the strange-sounding programming languages.
There are languages like ZPL, Erlang, Fortress, and Haskell, to name only a few. (See MPI-2: The Future of Message Passing.)
These languages, and those like them, will have a much larger role to play than in the past. The strangeness of these languages is not really in the name, but rather in the awkward way of expressing your problem or program. The emphasis seems to be more on what you are trying to do rather than on how to do it. These types of languages are often called declarative because you “declare” what you want to do, but don’t really describe how you want it done. This is different from traditional procedural languages like C or Fortran.
The difference can be illustrated by thinking about repairing an automobile. If you are a procedural programmer you fix the car yourself, making sure the exact procedures/steps are followed. If you are a declarative programmer, you tell the repair shop to “just fix the car.” You don’t care about the procedure or steps involved in fixing the car, you just want it fixed.
The same is true for programming languages. Procedural languages all have some assumptions that stem from a single (Von Neumann) processor chugging away inside the computer. The programmer is responsible for managing the execution of the program. Declarative languages attempt to move away from this assumption and lift the programmer above the hardware and closer to their problem space. Decisions about parallel execution normally left to the programmer in a procedural language can now be hidden from the programmer. The cost of this abstraction is often, but not necessarily, performance and efficiency. It should be mentioned that not all declarative languages operate in parallel, but many of them can be modified to work in this way.
In addition, the declarative programming style is often difficult for many “procedural programmers” to grasp and sometimes places restrictions on what can be done. But once a problem/program is cast in a declarative framework, all the parallel stuff can be done behind the scenes. Indeed, programs can even adapt to various hardware environments at run-time.
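Python is not one of the declarative languages discussed here, but a rough sketch of the idea can be written in it. The names `square`, `sum_of_squares`, and `mapper` below are hypothetical and chosen only for illustration. The program states what to compute (the sum of the squares), and the question of how the work is scheduled is handed to whatever `mapper` is supplied.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def sum_of_squares(values, mapper=map):
    """State what to compute; the supplied mapper decides how it is scheduled."""
    return sum(mapper(square, values))


# Default: the built-in map applies square to each value serially.
serial = sum_of_squares(range(10))

# Same problem statement, but the work is handed to a thread pool.
# The body of sum_of_squares does not change.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = sum_of_squares(range(10), mapper=pool.map)
```

The function never mentions chunks, threads, or joins. In a real declarative language the programmer would not even choose the mapper. The compiler or run-time would make that choice, which is what leaves room for a program to adapt to the hardware it finds itself on.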
So if these languages are so great, why are they not used in HPC? For the most part, HPC is about getting the most out of the hardware. And, as I described, procedural languages are closer to the hardware (specifically closer to a single processor or core). This relationship has worked reasonably well until the “new processor means faster execution” assumption was revoked by multi-core. The complexity and cost to procedurally describe a multi-core and multi-node program is getting rather high. The declarative approach may turn out to be the better bargain in the end. We live in the days of robust and powerful hardware and yet the better it becomes, the harder it seems to be to use. Strange days indeed.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.