The Total Cost of Parallelization

Before tucking into that next project, developers really need to start asking themselves, "What's this going to cost me?"

I have calmed down since my previous rant. Looking at the comments, it seems I have stepped on more than a few toes. While I can’t address all the comments, I will make a general statement. There were suggestions that solutions already exist, and indeed, I can’t argue with that; you can write parallel programs in shell if you so desire. The big issue is the abstraction layer.

As a developer, I don’t want to think about the number of cores or nodes any more than I have to think about the addressing modes, registers, and floating-point units in the processor. If parallel computing is going to be truly cost effective, then reasonable abstraction layers need to exist. Many developers and most vendors acknowledge the urgency of simplifying parallel programming, lest we face some dire consequences going forward. The solution I have proposed is a shared GPL approach. If anyone has a different idea, I would love to hear it.

Moving on.

What’s This Going to Cost Me?

In the not so distant future — or last week, depending on your development schedule — you may be tasked with building an application for multi-core (i.e., parallel) processors. The first question you may ask yourself is, "What language should I use to code this project?" A good question, and like any good engineer, you should ultimately be considering the cost required to meet the design goals.

Before we go much further, I want to introduce a metric that will become increasingly important as we journey through worlds parallel: “total-cost-of-parallelization,” or TCOP for the time-constrained. This particular metric is similar to “total-cost-of-operation” (TCO) or “total-cost-to-solution” (TCS) and covers the overall cost of any parallel application, including hardware, software, planning, development, and deployment.

Let’s refine our goals so that we can perform a small thought experiment. We want a scalable application that will be capable of running on 4, 8, and eventually 32 cores. Not an unreasonable request given the current multi-core march.

Here is something that may not sit well with most programmers. If your application can scale (i.e. increase performance as you add cores), then single core performance is not all that important. Gasp. Head-shaking.

To explain my point, consider the following scenario. I have two versions of the same program that we’ll call Application X. The first version of Application X is written in C (I consider C the universal assembler language). The second version is written in a scalable language that we will call SL.

The big difference here is that SL (the scalable language) allows Application X to use extra cores if they are available. Because the SL abstraction layer has moved you, as a developer, away from the nitty-gritty details of the processor cores and closer to your application, you should immediately see two benefits: 1) the program should be easier to write, and 2) the source code footprint will be smaller. And, possibly, a third: faster time to deployment.

Now, for the purposes of our thought experiment, let’s further assume that the SL version of Application X is five times (5x) slower than the C version on a single core and scales at about 80%. Let’s look at what happens as we add cores:

Cores    C (seconds)    SL (seconds)
  1          10             50
  4          10             15
  8          10              8
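
If you want to check my arithmetic, the sketch below shows how the SL column was estimated. The 50-second baseline and the 80% efficiency figure are the assumptions of the thought experiment, not measurements, and real scaling curves are rarely this tidy:

    /* Back-of-envelope scaling estimate for the thought experiment above.
     * Assumes a 50-second single-core SL time and roughly 80% parallel
     * efficiency; the table values are these numbers, loosely rounded.
     */
    #include <stdio.h>

    int main(void)
    {
        const double serial_time = 50.0;  /* SL time on one core (seconds) */
        const double efficiency  = 0.80;  /* assumed parallel efficiency */
        const int cores[] = {1, 4, 8, 32};

        for (int i = 0; i < 4; i++) {
            int n = cores[i];
            /* one core runs at full speed; beyond that, apply the efficiency */
            double t = (n == 1) ? serial_time : serial_time / (n * efficiency);
            printf("%2d cores: about %.0f seconds\n", n, t);
        }
        return 0;
    }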

Now the question becomes which is the faster application? Assuming the SL version continues to scale, the version written in C seems to be at a disadvantage.

Of course, one solution could be to make the C code parallel using OpenMP or pthreads. Fine by us; it’s your funeral. And, while you are at it, track your time/cost to make this conversion. All of it. From when you start paging through the OpenMP documents and examples, all the way through debugging and scalability testing. My claim is that this cost is going to be non-trivial.
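
To give you a flavor of what that conversion looks like, here is a hypothetical fragment from Application X. The function and variable names are invented for illustration; the #pragma is standard OpenMP. The directive is the easy part; the expensive part is proving there are no data races and that the thing actually scales:

    /* Hypothetical loop from Application X, parallelized with OpenMP.
     * Compile with something like: gcc -fopenmp
     */
    #include <omp.h>

    void scale_array(double *data, double factor, long n)
    {
        /* Split the iterations across the available cores. Easy here,
         * because the iterations are independent; real code rarely is. */
        #pragma omp parallel for
        for (long i = 0; i < n; i++) {
            data[i] *= factor;
        }
    }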

The High Cost of DIY

Now ask yourself: what gets more expensive every year? Right, people. Conversely, what gets less expensive every year? Right again, hardware. With your engineer hat on, think it over. If you write your version of Application X using an SL (scalable language) from the beginning, your overall costs may end up being much lower because you have eliminated a large amount of expensive “programmer” time. On the other hand, by sticking with the standard language, you have possibly programmed yourself into an expensive corner. Additionally, focusing too much on single-core performance may also limit your options. Of course, you want the code to run as fast as possible on a single core, but we need to weigh that performance against the cost of creating an overall scalable application.
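
To put some numbers on that trade-off, consider a back-of-envelope calculation like the one below. Every figure in it is a placeholder assumption (your rates, hours, and hardware prices will differ); the point is the shape of the comparison, not the totals:

    /* Hypothetical TCOP comparison; all figures are placeholder assumptions. */
    #include <stdio.h>

    int main(void)
    {
        const double rate     = 100.0;   /* programmer cost per hour (assumed) */
        const double c_hours  = 400.0;   /* assumed hours to write and hand-parallelize the C version */
        const double sl_hours = 150.0;   /* assumed hours for the SL version, learning curve included */
        const double extra_hw = 2000.0;  /* assumed cost of extra cores to offset slower SL code */

        printf("C  version TCOP: $%.0f\n", c_hours * rate);
        printf("SL version TCOP: $%.0f\n", sl_hours * rate + extra_hw);
        return 0;
    }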

One could argue that implementing the C program in parallel from the start will provide the needed scalability. Nice thought, but you are just moving the work from the back end to the front end. Explicit parallel programming is expensive. To be fair, there is also the one-time start-up cost of learning the SL that needs to be factored in as well. It all depends on the TCOP (remember that term!) of your project. As the core count continues to grow, so will the cost of programming with your current and comfortable software tools.

Next Time

Last week there was a comment that inquired about the role of Erlang. We’ll tackle languages like Erlang and how they can impact your TCOP next time. If you are well schooled in the procedural approach to programming, you will probably throw a hissy fit.
