
Strange Names for Strange Days

Explicit parallel programming presents special challenges for software developers. Now a new group of languages is coming online to address the compounded problem of multi-core processors on high-performance clusters.

In the near future, you may start to hear about programming languages with some strange-sounding names. Of course, it’s almost a requirement to have a unique and slightly esoteric name for a programming language. This new group of languages will be a bit different, however, as their strangeness continues well beyond their names.

Of course, the interest in new languages, or in existing but obscure languages, is due to (ready, say it with me now) multi-core. Yes, the thing that keeps me awake at night. That carnival of cores going on right now inside your computer that will make HPC mavens of us all.

In case you missed the previews, here’s the story so far. An HPC cluster is parallel computing. A multi-core processor is parallel computing. A cluster of multi-core systems is parallel computing squared. Explicit parallel programming is hard. Hard takes time. Time costs money. Now square the result. (See HPC Guru Don Becker: Why MPI Is Inadequate.)

In the good old days, the cost to create a program was based on a single-processor model. That model worked well, for the most part. And as processors got faster, so did your code. This speed-up is sometimes referred to as a “free lunch,” but I don’t see it that way. To me, if I use a programming language, I expect it to keep me above the nuances of individual processors. I mean, who would really want to recode every time a new processor was released?

Back in the day, when every vendor’s computer had its own instruction set, this was not how things worked. Portability quickly became important, and languages like Fortran were implemented as a way to preserve your programming efforts across current and future hardware environments. One of the underlying assumptions and expectations of using Fortran was that a faster machine would result in faster execution of your program.

If you have been living in the single-core world, be aware that the expectation of “new processor means faster execution” is not guaranteed with multi-core and, in most cases, is just not going to happen. Welcome to what keeps me up at night.

In the HPC world, the solution has been to augment existing codes so that they can take advantage of multiple processors. Things like PVM and MPI are used to send messages between systems, while pthreads and OpenMP are used to manage cores. These solutions work, but they are expensive in terms of programming time (and finding people who know how to use them is another major issue).

The real solution is to provide a programming language (a way to express your problem) that can deliver on the expectation of “new processor means faster execution.” Enter the strange-sounding programming languages.

There are languages like ZPL, Erlang, Fortress, and Haskell, to name only a few. (See MPI-2: The Future of Message Passing.)

These languages, and those like them, will have a much larger role to play than in the past. The strangeness is not really in the names, but rather in the initially awkward way of expressing your problem or program. The emphasis is more on what you are trying to do than on how to do it. These types of languages are often called declarative because you “declare” what you want done, but don’t really describe how you want it done. This is different from traditional procedural languages like C or Fortran.
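To make the what-versus-how distinction concrete, here is a minimal sketch in Haskell (one of the languages named above). The function names are mine, invented purely for illustration; neither comes from any particular library.

    -- Declarative style: state what you want (the sum of the squares)
    -- and let the compiler and run-time decide how to traverse the list.
    sumSquares :: [Int] -> Int
    sumSquares xs = sum (map (^ 2) xs)

    -- The same job spelled out procedurally: an explicit accumulator
    -- standing in for a C- or Fortran-style loop.
    sumSquaresLoop :: [Int] -> Int
    sumSquaresLoop = go 0
      where
        go acc []       = acc
        go acc (x : xs) = go (acc + x * x) xs

The first version commits to nothing about how the list is walked; the second pins the order of operations down step by step, which is exactly the kind of commitment that becomes a liability once there is more than one core to feed.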

The difference can be illustrated by thinking about repairing an automobile. If you are a procedural programmer, you fix the car yourself, making sure the exact procedures/steps are followed. If you are a declarative programmer, you tell the repair shop to “just fix the car.” You don’t care about the procedure or steps involved in fixing the car, you just want it fixed.

The same is true for programming languages. Procedural languages all carry assumptions that stem from a single (von Neumann) processor chugging away inside the computer. The programmer is responsible for managing the execution of the program. Declarative languages attempt to move away from this assumption and lift the programmer above the hardware and closer to the problem space. Decisions about parallel execution that are normally left to the programmer in a procedural language can now be hidden. The cost of this abstraction is often, but not necessarily, performance and efficiency. It should be mentioned that not all declarative languages operate in parallel, but many of them can be modified to work this way.
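As a rough illustration of parallel decisions being hidden (a sketch of my own using GHC Haskell and its parallel package, not something prescribed by any of the languages above), you can mark a map as parallel and leave the scheduling to the run-time system. The heavy function is just a made-up stand-in for real per-element work.

    import Control.Parallel.Strategies (parMap, rdeepseq)

    -- A stand-in for some expensive, pure, per-element computation.
    heavy :: Double -> Double
    heavy x = sum [sin (x * fromIntegral k) | k <- [1 .. 20000 :: Int]]

    main :: IO ()
    main = do
      let xs = [1 .. 2000] :: [Double]
          -- "Evaluate this map in parallel, completely." How many threads
          -- run, and which element lands where, is left to the run-time.
          ys = parMap rdeepseq heavy xs
      print (sum ys)

Built with ghc -threaded and started with +RTS -N, the same source spreads the map over however many cores the run-time is given; nothing in the program says how many threads to use.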

In addition, the declarative programming style is often difficult for many “procedural programmers” to grasp and sometimes places restrictions on what can be done, but once a problem/program is cast in a declarative framework, all the parallel stuff can be done behind the scenes. Indeed, programs can even adapt to various hardware environments at run-time.
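A small, GHC-specific sketch of that run-time adaptation (again my assumption about one way to do it, not a claim about how every such language works): the program can simply ask how many cores it was given and size its work accordingly.

    import Control.Concurrent (getNumCapabilities)

    main :: IO ()
    main = do
      -- How many hardware threads did the run-time get (+RTS -N at launch)?
      -- The same binary adapts to a laptop or to a fat cluster node.
      caps <- getNumCapabilities
      putStrLn ("Running with " ++ show caps ++ " capabilities")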

So if these languages are so great, why are they not used in HPC? For the most part, HPC is about getting the most out of the hardware. And, as I described, procedural languages are closer to the hardware (specifically, closer to a single processor or core). This relationship worked reasonably well until the “new processor means faster execution” assumption was revoked by multi-core. The complexity and cost of procedurally describing a multi-core, multi-node program are getting rather high. The declarative approach may turn out to be the better bargain in the end. We live in the days of robust and powerful hardware, and yet the better it becomes, the harder it seems to be to use. Strange days indeed.

Comments on "Strange Names for Strange Days"

yoof

There was a time when we designed in FORTRAN, but then coded the tricky bits (that needed to be efficient) in assembler macros. I wonder if now we should design with something like LISP (with OO and implicit parallelization) but code the tricky bits in C?

nimityssj

I’ve been pushing parallel languages for a while, and it’s good to see a piece on it. I’m surprised the author didn’t mention MIT’s Cilk, which adds like 3 statements to the C lang. and the runtime parallelizes from there. The Nesl language, from CMU, has been used on numerous clusters, and supposedly can reduce code size significantly: on their quicksort it was “10 lines of code vs. 1700”, compared to C++/MPI. Executed efficiently, too. I believe that if we are to deal with the complexity of parallel hardware, we will need better tools, like these, to do it.

nimityssj

TO YOOF:

That was an excellent idea, and I have been asking the Erlang and Nesl people about this. I figure that if the critical, serial routines could be coded in C/C++ (I prefer C++), then incorporated into the parallel lang. code like how Java binds native code, then we could have the best of both worlds. Well, close enough.

Is there anyone out there who develops with these languages or dev.’s the lang.’s themselves? Would like your input on this, but with Erlang (my favorite for HA systems), there are some limitations to this approach. Perhaps it will be addressed later. Might be better off just modifying a high-performance Common LISP system, like AllegroCL. HPC LISP (*LISP) was one of the first languages for this stuff, used on the Thinking Machines Corp.’s 64K-processor systems.

