Borrowing WiFi, wrapping up vacation, and seeking enlightenment through Erlang.
Through the miracle of broadband, and a neighbor’s wide-open wireless router, I’m still hanging out at the beach. By the time you read this, I’ll be heading back to the stately Eadline manor, but not to mow the stately lawn or pick up two weeks’ worth of newspapers in my stately driveway. I’m actually heading back to attend a series of lectures by the Dalai Lama. Really, I am. He is in town teaching this week at Lehigh University. Enlightenment is not far off, my friends. Of course, I would not have mentioned this if it did not have something to do with parallel computing. Hopefully, I’ll have the link figured out by the end of the column.
Before I dive into this week’s topic, a sidebar or two. Take a minute to check out this recent post about Erlang. It seems Erlang has not gone unnoticed by Amazon, Facebook, and others. (Or perhaps my previous columns were more influential than I thought, wink, wink.) Also, for those who are wondering, Erlang is named after A. K. Erlang, a Danish mathematician, statistician, and engineer. Some also suggest that Erlang is an abbreviation of Ericsson Language, owing to its origin at Ericsson. Convenient in any case.
Getting back to my discussion of Erlang proper, this week I would like to talk about processes and threads. To be a bit more clear, I would like to talk about dynamic methods in parallel computing and how Erlang processes and CUDA threads are very similar in an unknown kind of way. (I mentioned CUDA several weeks ago.) Both are independent portions of a program that can be executed in parallel. Most importantly, however, they don’t have to be executed in parallel. And that, my friends, is the dynamic aspect of these two programming methods. The end user is not responsible for where, and in some sense when, the threads/processes are executed. One advantage of this method is that codes can run on one core or a multitude of cores. Of course, if designed well, the program should run faster (scale) as the number of cores is increased. (Note: CUDA does allow tuning the number of threads per block and the number of blocks making up the grid.)
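To make the "one core or a multitude of cores" idea a bit more concrete, here is a minimal CUDA sketch (not taken from any earlier column) of a kernel written with a grid-stride loop. The kernel name `scale` and the host helper `scale_on_gpu` are hypothetical names chosen for this example. Because each thread strides across the whole array, the same kernel should give the same answer whether it is launched as one block of one thread or as many blocks of many threads; only the launch parameters change.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread handles elements i, i + stride, i + 2*stride, ... so the
// kernel is correct for any grid/block shape, including a single thread.
__global__ void scale(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;   // total number of threads launched
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Hypothetical host helper (an assumption for this sketch): copies the input
// to the GPU, launches the kernel with the requested number of blocks and
// threads per block, and copies the result back into v.
// Returns false if any CUDA call reports an error.
bool scale_on_gpu(std::vector<float> &v, float factor, int blocks, int threads)
{
    float *d = nullptr;
    size_t bytes = v.size() * sizeof(float);
    if (cudaMalloc(&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<blocks, threads>>>(d, static_cast<int>(v.size()), factor);
        // The device-to-host copy waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

Calling `scale_on_gpu(v, 2.0f, 1, 1)` and `scale_on_gpu(v, 2.0f, 8, 128)` on identical inputs should yield identical results. The program says what is concurrent; where and when each thread actually runs is left to the CUDA run-time and the hardware.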
It should be mentioned that Erlang processes are neither operating system processes nor operating system threads, but lightweight, program-controlled processes. Often referred to as green threads, Erlang processes are executed “behind the curtain,” as it were. That is, the user has no control over process execution; however, the user can easily create a large number of processes. For example, on a 2.4 GHz Celeron desktop with 512MB of memory, 20,000 Erlang processes were spawned at an average of 9.2 microseconds per process (wall time). [From Programming Erlang by Joe Armstrong.]
CUDA threads are an entirely different animal. A discussion of the CUDA programming model is beyond the scope of this column; however, CUDA threads are lightweight, can number in the thousands, and are controlled by a run-time environment. The way in which threads are executed might be exactly the same as in Erlang. Then again, it might be completely different. I do not know, and I do not want to know. That is the point. I’ll leave those decisions to the language, which, by the way, understands the program’s concurrency better than the operating system does.
Giving up control is the path to enlightenment. When I write in C, I don’t tell the processor which registers to use or how to allocate memory. The compiler does that for me. When I want to write a program that runs on hundreds or even thousands of processors, I don’t want to assign cores or nodes to parts of my program. Besides being non-portable, it can be time-consuming, and a smart compiler or run-time system could probably do a better job. In both Erlang and CUDA, concurrency is expressed as part of the language. Decisions about parallel execution of the concurrent parts are made dynamically at run time. Personally, I would prefer to have concurrency implied as part of the program, but I don’t think we are there quite yet.
I have written about this issue previously. In a series of three articles (one, two, and three), I discuss some ideas about dynamic parallel execution. I begin by considering the reliability of large systems (i.e., at some point MTBF statistics will limit the reliability). My solution is dynamic program execution that can scale automatically and recover from failure. Of course, dynamic execution requires giving up the control we think we need. Expressing what we want to do, and not how to do it, is what liberates us from low-level hardware issues.
And what does this have to do with the Dalai Lama? Nothing. That is my final point. The Dalai Lama has nothing to do with how Erlang and CUDA processes/threads are executed, and neither should you. In a very Buddhist kind of way, when you move to a higher level, threads happen. Namasté.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.