Preserving program state is necessary for parallel computation, but should we keep doing it the hard way?
Before I begin yet another discussion of “the ways we don’t know how to describe many things happening at the same time,” I feel obligated to point out the most non-optimal thing I have seen in a while. By “non-optimal” I mean stupid; it is just the politically correct way of saying it. I have told my daughter: never tell anyone they are being stupid, just use the phrase “non-optimal.” Chances are they will not understand it, in which case your intuition was right.
Getting back to today’s non-optimal tidbit. Even with spam filters, I usually get a handful of unwanted emails each day. Today a bunch of emails slipped through with the most hilarious From: and Subject: lines. The messages, it seems, are from United Parcel Service, or as I like to say, UPS. The subject line reads FedEx Tracking N5421062126, or as I like to say, FedEx. Maybe I did not get the memo, but I don't think UPS is delivering FedEx packages. There is a total impedance mismatch between the sender and the subject. This contrivance shows the spammer either has a basic misunderstanding of the current worldwide package delivery system or is just plain non-optimal (or both). And I'm not even going to mention the contents of the email, which references the postal package and October 18st. Of course, I assume the non-optimality will continue as Jane and Joe SixPack click and add yet another piece of crud to their computers.
As I was deleting this and other spam, I was watching a video about a new programming concept called Swarm. When I first read the description, one thing jumped out at me: Swarm embodies the maxim "move the computation, not the data." I thought, "Yes, now this is a good idea: fluid, dynamic computation." About midway through the presentation, Swarm designer Ian Clarke mentioned that he was using Scala 2.8 because it had something called continuations. I must admit I don't know anything about Scala, but Ian's description of continuations made some sense.
Continuations remind me of C Blocks, which I recently discussed in a column about Apple's Grand Central Dispatch. The basic idea is to put a bookmark in your computation so you can come back to it later, send it off to be executed elsewhere, or just wait.
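The "bookmark" idea can be sketched with a Python generator. This is only an analogy and an assumption on my part: a generator is a one-shot, resumable computation, not a full continuation in the Scala sense and not an Apple C Block. Still, it shows a computation pausing, handing control back, and picking up exactly where it left off.

```python
# A minimal sketch of the "bookmark" idea using a Python generator.
# Assumption: a generator is only an analogy here. It is a one-shot,
# resumable computation, not a full continuation or an Apple C Block.

def running_total(values):
    """Yield the running total after each value, pausing between steps."""
    total = 0
    for v in values:
        total += v
        # 'yield' freezes the local state (total, loop position) here,
        # hands control back to the caller, and waits to be resumed.
        yield total


work = running_total([1, 2, 3, 4])
first = next(work)   # run until the first bookmark: total is 1
# ... the caller is free to do something else here ...
second = next(work)  # resume exactly where we left off: total is 3
rest = list(work)    # finish the remaining steps

print(first, second, rest)  # 1 3 [6, 10]
```

Note that the frozen state lives only in this one process; the generator can be resumed later, but not easily shipped somewhere else.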
The idea with Swarm is that the computation can be moved to the data (from computer to computer) because continuations allow you to capture, or freeze, program state and resume it at another time or place. In a previous installment, I talked about program state and how it is the bane of parallel computing. Anything that allows program state migration always piques my interest, but I find it non-optimal to try to capture state in procedural or imperative languages. Capturing state in a procedural language is a delicate procedure: in addition to preserving the state of the execution stack, you must keep track of the memory you have touched and will touch, which is not easy.
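To make "doing it the hard way" concrete, here is a hypothetical Python sketch (it is not how Swarm or Scala work). The programmer spells out every piece of state (here just an index and an accumulator), serializes only that state, and resumes the loop on a machine assumed to already hold the data. The function name `step_sum` and the state layout are my own illustration. The last few lines show that Python will not freeze its own paused computations for you: CPython refuses to pickle a generator.

```python
import pickle

# Hypothetical example: sum a list in steps so the work can be paused,
# serialized, shipped elsewhere, and resumed. In an imperative language
# the programmer has to enumerate every piece of state by hand.

def step_sum(state, data, steps):
    """Advance the summation by at most `steps` items; return new state."""
    i, acc = state["index"], state["acc"]
    end = min(i + steps, len(data))
    while i < end:
        acc += data[i]
        i += 1
    return {"index": i, "acc": acc}


data = list(range(10))
state = step_sum({"index": 0, "acc": 0}, data, steps=4)

# "Freeze" the computation: only the explicit state becomes bytes.
# The data stays put, in the spirit of "move the computation, not the data".
frozen = pickle.dumps(state)

# "Thaw" it on a machine assumed to already hold the same data.
remote_data = list(range(10))
final = step_sum(pickle.loads(frozen), remote_data, steps=len(remote_data))
print(final["acc"])  # 45

# Python's own paused computations are not so portable:
# CPython raises TypeError when asked to pickle a generator object.
gen = (x for x in data)
try:
    pickle.dumps(gen)
    generator_pickled = True
except TypeError:
    generator_pickled = False
print(generator_pickled)  # False
```

The sketch works only because the state is tiny and fully known. Real procedural code has a stack, heap pointers, and open resources, and each of them would need the same hand-written treatment.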
Managing state is the difficult issue with parallel computation. That is, in a procedural language, the programmer must make sure each parallel part of the code does not have side effects -- changing (or not changing) memory values that are used by other parts of the code. If you program using MPI, you are managing state at a very low level: you explicitly copy data or state from one machine to another and make sure each parallel procedure is independent. This approach works quite well for many of the big HPC codes, but it can be a tedious and time-consuming programming effort.
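The explicit-copy style can be illustrated without a cluster. The following is a rough stand-in for an MPI scatter/gather, written with Python threads and queues; it is an assumption-laden toy, not MPI. Each worker receives its own copy of a slice of the data, touches no shared variables, and sends back a partial result that the caller reduces. The names `worker` and `parallel_sum_of_squares` are hypothetical.

```python
import queue
import threading

# A rough stand-in for MPI-style scatter/gather using Python threads and
# queues (assumption: this is not MPI, just the same explicit-copy pattern).
# Each worker gets its own copy of a chunk, shares no variables, and
# returns its result as a message.

def worker(rank, inbox, outbox):
    chunk = inbox.get()                              # explicit "receive"
    outbox.put((rank, sum(x * x for x in chunk)))    # explicit "send"


def parallel_sum_of_squares(data, nworkers=4):
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(nworkers)]
    threads = [threading.Thread(target=worker, args=(r, inboxes[r], outbox))
               for r in range(nworkers)]
    for t in threads:
        t.start()
    # Scatter: copy a distinct slice of the data to each worker.
    for r in range(nworkers):
        inboxes[r].put(list(data[r::nworkers]))
    # Gather: collect one partial result per worker, then reduce.
    partials = [outbox.get() for _ in range(nworkers)]
    for t in threads:
        t.join()
    return sum(p for _, p in partials)


print(parallel_sum_of_squares(range(1, 11)))  # 385
```

Even in this toy, the programmer decides what gets copied where and when; that bookkeeping is exactly the tedium that grows with a real HPC code.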
One possible pathway to reduce the complexity of parallel programming is the use of declarative languages. Pure declarative languages do not have state -- or at least the programmer does not have to worry about it. It makes for a very different way to program, but it also opens up a bunch of possibilities for parallel execution because the programmer is decoupled from managing state or program execution. For this reason, I have advocated looking at functional languages like Erlang or Haskell. I also want to be clear: these languages are not the whole solution, but they offer a fresh, non-procedural approach to concurrent programming.
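Why statelessness helps can be shown even in Python, though Python is not a declarative language and this only mimics the idea. A pure function depends only on its argument and changes nothing outside itself, so a runtime is free to evaluate the calls in any order, on any worker, without changing the answer. The function `collatz_steps` is simply a hypothetical example of a pure computation.

```python
from concurrent.futures import ThreadPoolExecutor

# A minimal sketch of why statelessness helps: a pure function depends only
# on its argument and modifies nothing else, so calls can run in any order,
# on any worker, and the result cannot change.
# (Python is not a declarative language; this only mimics the idea.)

def collatz_steps(n):
    """Pure function: number of Collatz steps for n to reach 1."""
    steps = 0
    while n != 1:  # 'n' and 'steps' are local; nothing outside is touched
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


inputs = range(1, 21)
sequential = list(map(collatz_steps, inputs))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(collatz_steps, inputs))  # results in input order

print(sequential == parallel)  # True
```

Nothing had to be copied, locked, or frozen; the only "state" is the argument passed in and the value returned. In Erlang or Haskell that discipline is the default rather than a convention the programmer must remember.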
If you have been paying attention, you will recall that a few columns ago I made a positive mention of C Blocks, which are a way to capture state in the granddaddy of procedural languages. Indeed, I think C Blocks are a good idea because I consider C to be the universal assembly language. Plus, there is existing code that may benefit from C Blocks. What I think is non-optimal is trying to capture state in many of the new-age procedural languages. Of course, that is my opinion, and I am sure some may disagree.
I'll stop here, as I'm sure I have stirred up enough issues for one column. I still have not reached Best Buy Manager Status on Twitter, but then again, maybe if I posted something I might get more followers. I was going to tweet from the NVIDIA GPU programming conference, but I came down with a cold and really did not feel like doing much of anything. Plus, I'm not sure tweeting about non-optimal issues, like the placement of power outlets in airports or the people in the back of the airplane who stand up and grab their bags as soon as the seat belt light goes off, is worthwhile. In any case, I'm off to pick up my FedEx postal package at the UPS office.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.