Threads Happen

Borrowing WiFi, wrapping up vacation, and seeking enlightenment through Erlang.

Through the miracle of broadband, and a neighbor’s wide-open wireless router, I’m still hanging out at the beach. By the time you read this, I’ll be heading back to the stately Eadline manor, but not to mow the stately lawn or pick up two weeks’ worth of newspapers in my stately driveway. I’m actually heading back to attend a series of lectures by the Dalai Lama. Really, I am. He is in town this week, teaching at Lehigh University. Enlightenment is not far off, my friends. Of course, I would not mention this if it did not have something to do with parallel computing. Hopefully, I’ll have the link figured out by the end of the column.

Before I dive into this week’s topic, a sidebar or two. Take a minute to check out this recent post about Erlang. It seems Erlang has not gone unnoticed by Amazon, Facebook, and others. (Or perhaps my previous columns were more influential than I thought — wink, wink.) Also, for those who are wondering, Erlang is named after A. K. Erlang, a Danish mathematician, statistician, and engineer. Some also suggest that Erlang is an abbreviation of “Ericsson Language,” owing to its origins at Ericsson. Convenient in any case.

Getting back to my discussion of Erlang proper, this week I would like to talk about processes and threads. To be a bit clearer, I would like to talk about dynamic methods in parallel computing and how Erlang processes and CUDA threads are very similar in an unknown kind of way. (I mentioned CUDA several weeks ago.) Both are independent portions of a program that can be executed in parallel. Most importantly, however, they don’t have to be executed in parallel. And that, my friends, is the dynamic aspect of these two programming methods. The end user is not responsible for where, and in some sense when, the threads/processes are executed. One advantage of this method is that codes can run on one core or a multitude of cores. Of course, if designed well, the program should run faster (scale) as the number of cores is increased. (Note: CUDA does allow tuning the number of threads per block and the number of blocks making up the grid.)
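The dynamic idea is easy to sketch outside of Erlang or CUDA. Here is a hedged Python analogy (my own illustration, not code from either system): the program declares independent work items, and a pool of unspecified size decides where and when they run. The program text is identical whether one worker or four do the work.

```python
# A sketch of the "dynamic" idea: the program declares independent
# work items; the pool (the runtime) decides where and when they run.
from concurrent.futures import ThreadPoolExecutor

def work(i):
    return i * i  # an independent piece of the computation

items = range(8)

# Same program, one worker or four -- the code does not change.
with ThreadPoolExecutor(max_workers=1) as pool:
    serial = list(pool.map(work, items))
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(work, items))

print(serial == parallel)  # True: placement is the runtime's business
```

The point is that `max_workers` is a runtime knob, not a program change: the concurrency lives in the program, while the parallelism is the runtime’s decision.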

It should be mentioned that Erlang processes are neither operating system processes nor operating system threads, but lightweight, program-controlled processes. Often referred to as green threads, Erlang processes are executed “behind the curtain,” as it were. That is, the user has no control over process execution; however, the user can easily create a large number of processes. For example, on a 2.4 GHz Celeron desktop with 512 MB of memory, 20,000 Erlang processes were spawned at an average of 9.2 microseconds (wall time) per process. [From Programming Erlang by Joe Armstrong.]
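Green threads have rough analogues in other languages. As an illustration only (this is Python’s asyncio, not Erlang), here is a sketch that spawns 20,000 cooperative tasks inside a single OS process. They are cheap precisely because the language runtime, not the kernel, schedules them:

```python
import asyncio

async def proc(i):
    # A trivial stand-in for an Erlang-style lightweight process.
    return i

async def main():
    # Spawn 20,000 lightweight tasks; the event loop schedules them
    # "behind the curtain" -- no OS process or thread per task.
    tasks = [asyncio.create_task(proc(i)) for i in range(20_000)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(len(results))  # 20000
```

Like Erlang processes, the tasks are units the user creates freely, while the runtime decides how they actually execute.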

CUDA threads are entirely different. A discussion of the CUDA programming model is beyond the scope of this column; however, CUDA threads are lightweight, can number in the thousands, and are controlled by a run-time environment. The way in which threads are executed might be exactly the same as in Erlang. Then again, it might be completely different. I do not know, and I do not want to know. That is the point. I’ll leave those decisions to the language, which, by the way, understands the program’s concurrency better than the operating system does.

Giving up control is the path to enlightenment. When I write in C, I don’t tell the processor which registers to use or how to allocate memory. The compiler does that for me. When I want to write a program that runs on hundreds or even thousands of processors, I don’t want to assign cores or nodes to parts of my program. Besides being non-portable, doing so can be time consuming, and a smart compiler or run-time system could probably do a better job. In both Erlang and CUDA, concurrency is expressed as part of the language. Decisions about parallel execution of the concurrent parts are made dynamically at run-time. Personally, I would prefer to have concurrency implied as part of the program, but I don’t think we are quite there yet.

I have written about this issue previously. In a series of three articles (one, two, and three), I discuss some ideas about dynamic parallel execution. I begin by considering the reliability of large systems (i.e., at some point MTBF statistics will limit reliability). My solution is dynamic program execution that can scale automatically and recover from failure. Of course, dynamic execution requires giving up the control we think we need. Expressing what we want to do, and not how to do it, is what liberates us from low-level hardware issues.

And what does this have to do with the Dalai Lama? Nothing. That is my final point. The Dalai Lama has nothing to do with how Erlang and CUDA processes/threads are executed, and neither should you. In a very Buddhist kind of way, when you move to a higher level, threads happen. Namasté.

Comments on "Threads Happen"


Hmm…very interesting. Threads issues out of the court? It’s definitely a two thumbs up.


This is more of a question than a comment. If the Erlang threads/processes are neither operating system processes nor operating system threads, does that mean that they all live inside a single operating system process? If so, how do you take advantage of multiple cores?


Uh, CUDA threads have nothing to do with Erlang threads, except for perhaps the name and the idea of parallelism. CUDA threads are basically an abstraction for what looks like an old-school vector processor like the Cray 1. If you think “I’m programming a much smaller, cheaper, lower-power version of the Cray 1,” you’ll probably write very efficient CUDA code. If you try to write a CUDA program with the idea of Erlang threads (message-passing with a message queue), you probably won’t get very far.


mdavis0452: think of “Erlang thread” as a job. You can multiplex N jobs onto M operating system threads in which N >= M.


mdavis0452: in my reply to your comment above, note that the runtime does the work of multiplexing jobs to threads, rather than you. So replace “[y]ou” with “the runtime job scheduler.” I’m not sure how Erlang’s runtime does it, but plenty of other systems work this way: you put jobs on a work queue, and the runtime pulls jobs off the queue and assigns them to processors.
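The pattern this commenter describes (N jobs multiplexed onto M worker threads via a shared queue) can be sketched in a few lines of Python. This is a generic illustration of the work-queue pattern, not Erlang’s actual scheduler:

```python
import queue
import threading

N_JOBS, M_WORKERS = 20, 3   # N jobs multiplexed onto M OS threads, N >= M
jobs = queue.Queue()
results = queue.Queue()

def worker():
    # Each worker repeatedly pulls a job off the shared queue
    # until the queue is drained, then exits.
    while True:
        try:
            i = jobs.get_nowait()
        except queue.Empty:
            return
        results.put(i * i)

for i in range(N_JOBS):
    jobs.put(i)             # enqueue all jobs before the workers start

threads = [threading.Thread(target=worker) for _ in range(M_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.qsize())  # 20 results, produced by only 3 worker threads
```

The jobs never know, and never need to know, which of the three threads ran them.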



