Pick one: portability or efficiency. Neither is guaranteed when writing explicit parallel code
In case you did not have a chance to read the column from last week, I am taking my yearly vacation at the Jersey Shore. Please refrain from the jokes, lest I pull out the Bruce Springsteen trump card. I try to spend two continuous weeks with family and friends each year. I have found that one week is just too short. I need two weeks. The first week is used to try and forget about all the stuff I did not get done before I threw the laptop in the car and said “let’s go.” The second week is used to try and remember and organize all the stuff I have to do when I get back. My plan usually breaks down somewhere around 10 AM my first day back to work.
This year I had a bunch of writing to do (including this column), so it was kind of a working vacation. Not to worry, I’ll have my feet in the Atlantic Ocean in a few hours. In any case, my dilemma is as follows: write an insightful column quickly and get to the beach. It may surprise some readers, but I do like to research some of the topics I write about. At a minimum, I like to include enough URLs so that if you actually want to investigate a topic further, more information is just a click away. As an aside, I am constantly amazed at how much content on the web has absolutely no external links to supporting material. I thought that was the whole idea. I mean, how hard is it to add a Wikipedia link to a discussion of Clos Networks or some other networking technology?
Back to my dilemma. What can I talk about that will get me to the beach before the water ice guy packs up for the day? Although I don’t like to rehash things I have written about in the past, I will be making an exception this week. Not necessarily because it is easy, but because I think some messages need reinforcing. Therefore, all I have to decide is what message I should hammer home on this July morning.
The answer is simple: understanding the difference between concurrent and parallel. I believe these two terms are often used interchangeably while, in my opinion, they represent two different concepts.
Let’s start with concurrency. A concurrent program or algorithm is one where operations can occur at the same time. For instance, consider a simple integration, where numbers are summed over an interval. The interval can be broken into many smaller sub-intervals whose sums are independent of one another and can therefore be computed concurrently. As I like to say, concurrency is a property of the program. Parallel execution is when the concurrent parts are actually executed at the same time on separate processors. The distinction is subtle, but important: parallel execution is a property of the machine, not the program.
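To make the integration example concrete, here is a minimal Python sketch. Everything in it is my own illustration, not code from any particular library: the names `partial_sum`, `make_tasks`, and `integrate`, the midpoint rule, and the choice of eight sub-intervals are all assumptions. The point is that the list of independent sub-interval sums (the concurrency) is fixed by the program, while how those sums get executed is a separate choice passed in as `mapper`.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_sum(task):
    """Midpoint-rule sum of f over one sub-interval.

    Each task depends on nothing but its own arguments, which is
    what makes the sub-interval sums concurrent.
    """
    f, a, b, steps = task
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def make_tasks(f, a, b, pieces, steps_per_piece):
    """Break [a, b] into equal sub-intervals (hypothetical helper)."""
    width = (b - a) / pieces
    return [
        (f, a + k * width, a + (k + 1) * width, steps_per_piece)
        for k in range(pieces)
    ]


def integrate(f, a, b, pieces=8, steps_per_piece=1000, mapper=map):
    """Sum the independent partial sums.

    `mapper` decides how the tasks run: the built-in `map` runs them
    one after another, an executor's `map` hands them to a pool.
    """
    tasks = make_tasks(f, a, b, pieces, steps_per_piece)
    return sum(mapper(partial_sum, tasks))


# Same program, same concurrent parts, executed serially.
serial = integrate(math.sin, 0.0, math.pi)

# Same program again, with the tasks handed to a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = integrate(math.sin, 0.0, math.pi, mapper=pool.map)

print(serial, pooled)
```

Both calls compute the same answer from the same set of tasks. One caveat: in standard CPython a thread pool generally will not speed up CPU-bound work like this, so the pool here only shows that the mapping onto workers is a separate decision from the concurrency itself. Whether a given mapping actually runs faster depends on the machine.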
If execution efficiency is important (i.e., you want things to go faster by adding more cores), then the question you need to ask is “If I run everything that is concurrent in parallel, will my code run faster?” If the answer were always “yes,” we would not be having this discussion. Since the answer is “no,” the question becomes “What should run in parallel?” The answer is, obviously, the portions of code that lower execution time when run in parallel.
This decision is one of the reasons cluster parallel computing is hard. It really does depend on the machine. Take our integration case. If the integration interval is small, then breaking it up into small sub-intervals and sending them out to other nodes will extend the execution time of the program due to parallel overhead. If the integration interval is huge, then parallel execution may make sense. Because parallel overhead can vary from cluster to cluster, there is no easy way to predict it beforehand. For example, the parallel overhead of sending small packets is larger for GigE than for InfiniBand.
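A back-of-the-envelope model shows why the decision depends on the machine. The sketch below is an assumption on my part, not a measurement: it treats overhead as a fixed cost per worker, and the step time, worker count, and both overhead values are invented numbers for two hypothetical machines. Real overhead has to be measured on the actual cluster.

```python
def serial_time(n_steps, t_step):
    """Estimated time to do every step on one processor."""
    return n_steps * t_step


def parallel_time(n_steps, t_step, workers, overhead_per_worker):
    """Estimated time when the work is split evenly across workers
    and each worker adds a fixed overhead (startup, communication)."""
    return n_steps * t_step / workers + workers * overhead_per_worker


def worth_running_in_parallel(n_steps, t_step, workers, overhead_per_worker):
    """True only when the parallel estimate beats the serial one."""
    return (
        parallel_time(n_steps, t_step, workers, overhead_per_worker)
        < serial_time(n_steps, t_step)
    )


# Invented numbers for illustration only, all in seconds.
T_STEP = 1e-8   # time for one integration step
WORKERS = 8
MACHINES = {
    "low_overhead": 5e-6,    # hypothetical fast interconnect
    "high_overhead": 5e-5,   # hypothetical slow interconnect
}

for n_steps in (1_000, 10_000, 100_000_000):
    for name, overhead in MACHINES.items():
        decision = worth_running_in_parallel(n_steps, T_STEP, WORKERS, overhead)
        print(f"{n_steps:>11} steps on {name}: parallel={decision}")
```

With these made-up numbers, the smallest problem is better left serial on both machines, the largest is worth running in parallel on both, and the middle one pays off only on the low-overhead machine. Same program, same concurrency, different answer.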
The same applies to multi-core. The overhead for thread communication is lower, but there is still overhead (see my HPC Hopscotch for background on SMP memory). There is no free lunch — everyone has to deal with overhead.
In summary, the point I want to make is this: concurrency is a property of the program and parallel execution is a property of the machine. Which concurrent parts should and should not be executed in parallel can only be answered when the exact hardware is known. That, I might add, leads to the most unhappy conclusion when dealing with explicit parallel programming: there is no guarantee of both efficiency and portability with explicit parallel programs. Yes, I know, a sad state of affairs. I’ll let you wrestle with that for a while. In the meantime, I’m going to the beach.