Incremental Twiddling

As GPU clusters hit the market, users are finding that small code changes can result in big rewards.

I have a theory. It goes something like this. At one point there was the primal program. Let’s call it hello.c; it does something very simple: it prints Hello World. My theory states that virtually every program is a variation of this program. From the BASIC 10 PRINT "Hello World!" or the Fortran PROGRAM HELLO WRITE (*,100) STOP 100 FORMAT (' Hello World! ' /) END to the classic main() { puts("Hello World!"); return 0; }, most beginners started by adding code to these simple examples.

Don’t believe me? Maybe you used some modern wimpy language. I’m sure you started with a simple example program. Let’s do a survey. How many of you started programming with a one- or two-line program? Raise your hands. Keep your hands up. How about the rest of you? How many of you took a simple existing program and modified it as the basis of a new program? I see lots of hands. Some have both hands raised. Okay, you can put your hands down now.

Building big things through small, manageable changes is a worthwhile method. With regard to programming, I call it incremental twiddling. Take something that works, make a small (reversible) change, make sure it still works the way you expect it to, change some more, repeat. This method is akin to changing only one variable at a time, or, as I like to call it, turning one knob at a time. There is also programming through massive change, which works as follows. Make massive changes. Test the program. Because it does not work, start ripping out the new parts until it works at some level, then add the new parts back in until it breaks again. I often refer to this as massive deconstruction.

These are two very different methodologies, and there is plenty of middle ground. One is built up in small increments; the other is build-up, tear-down, in bigger chunks. The small-increment method is probably how you learned to code. The massive-change method is probably what you tried once or twice when you thought you could code. By the way, sometimes the massive-change method is unavoidable, but it does make for some rather late nights.

So what is my point, besides some kind of slow-and-steady-wins-the-programming-race lesson? Well, it is also my belief that new software technologies that fit into the incremental-change model rather than the massive-change model have the best chance for success. For example, C++ was built from C so that you could start with existing code. Note, I make no judgment as to the value of the new method, just that it is easier to get started. MPI is another example. While MPI programming can become complex, because it is a C, C++, and Fortran library, users can start adding simple calls without disrupting the program too much. I should also note that with MPI programming there are times when the code has to be massively restructured and incremental twiddling is not really an option.
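To make that concrete, here is a minimal sketch (mine, not from any particular tutorial) of what the first MPI twiddle often looks like: the original hello.c with four standard MPI calls dropped in around the print statement. Build it with an MPI compiler wrapper such as mpicc and launch it with mpirun.

    /* hello_mpi.c -- a minimal sketch of twiddling MPI into hello.c
       (illustrative only); build with: mpicc hello_mpi.c -o hello_mpi */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank = 0, size = 1;

        MPI_Init(&argc, &argv);                /* small, reversible change #1 */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I?         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes in all?  */

        printf("Hello World from process %d of %d\n", rank, size);

        MPI_Finalize();                        /* small, reversible change #2 */
        return 0;
    }

Take the four MPI lines back out and you are left with plain hello.c, which is exactly the reversibility that incremental twiddling depends on.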

A historical example of how this works with new technology was the INMOS Transputer, a parallel and blazingly fast chip. When it first appeared, the only programming language supplied by INMOS was OCCAM. Now, I’m all for high-level parallel languages, but when the Transputer came out there was no C or Fortran compiler. Not until some guys whose names I have long forgotten from a company called Logical Systems came out with a C compiler, along with a pre-MPI communications library developed at Pixar (yes, that Pixar), were developers able to twiddle around with some things. Most found that once they could get something working without a large time investment, value could be quantified and decisions about continuing with the technology could be made. I have made similar observations about clusters. The low cost of entry and the ability to twiddle with existing MPI codes allowed people to try a new technology. By the way, the Transputer died because the next-generation processor never quite worked.

All this background is merely an introduction to what I really wanted to mention, but I tend to get nostalgic for the old days. Now then, to the point. As you know, GP-GPUs are receiving a lot of attention, as they should. There are piles of cores in those chips. Both NVidia and ATI/AMD have offerings in this area. However, most people have had more exposure to NVidia than to ATI/AMD. Call it marketing or just a really good idea, but NVidia has lowered the cost and barrier to entry through the introduction of CUDA. For those who don’t know, CUDA is a free (as in beer) C-based parallel computing language that produces code for NVidia GP-GPU units. As with the other cases I have mentioned, CUDA allows one to incrementally twiddle with existing C codes on NVidia hardware. Quite often users state that they can easily try a few things and then, based on their results, see how well GP-GPU processing works for them. Many find that as they continue to twiddle they see more benefit. The key here is low-cost incremental twiddling. ATI/AMD has been supporting the BrookGPU programming language as well, but there does not seem to be as much inertia behind BrookGPU as there is behind CUDA.
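As a rough illustration of that kind of twiddle (a sketch I am supplying here, not NVidia’s example code, with a made-up file name and problem size), consider moving a single existing C loop onto the GPU. The CUDA calls used are the standard ones, and the whole thing builds with nvcc.

    /* saxpy.cu -- a sketch of twiddling one existing C loop onto the GPU.
       The original CPU code was simply:
           for (i = 0; i < n; i++) y[i] = a * x[i] + y[i];
       Build with: nvcc saxpy.cu -o saxpy */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* one thread per element */
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        int n = 1 << 20;                    /* arbitrary problem size */
        size_t bytes = n * sizeof(float);
        float *x = (float *)malloc(bytes);
        float *y = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

        float *d_x, *d_y;                   /* device (GPU) copies of x and y */
        cudaMalloc((void **)&d_x, bytes);
        cudaMalloc((void **)&d_y, bytes);
        cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

        /* the twiddle: the old loop body now runs as n GPU threads */
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

        cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f (expect 4.0)\n", y[0]);

        cudaFree(d_x); cudaFree(d_y);
        free(x); free(y);
        return 0;
    }

The rest of the program is untouched, which is why this style of GPU experiment fits the incremental model so well.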

The low barrier to incremental twiddling also allows for rapid deployment. CUDA was introduced only two years ago and is now taught at over 125 universities worldwide. Recently, NVidia announced the availability of preconfigured GPU clusters, that is, pairing a 1U GPU co-processing unit with each node in the cluster. If your application fits into this model, you can expect to use less space and power in your data center and spend less money for the same level of computing. For example, Amerada Hess found that a 32-node NVidia S1070 cluster provides the same computing power as 2000 CPU servers, using 27 times less power and costing 20 times less.

Of course, your mileage may vary, local and state regulations may apply, limited-time offer, and all the other notices that suggest you do some twiddling of your own before you commit to a new computing platform. In closing, I have skirted one important issue. The incremental twiddling approach has worked well in the single-core world. Imperative languages like C and Fortran work similarly enough to languages like Java, Python, etc. (i.e., they all have a "do loop") that learning through twiddling allows for easy uptake of new languages. The problem, as I see it, is that a truly high-level parallel language, probably declarative in nature, will not afford twiddling in an imperative fashion and thus makes adoption difficult. And there begins another column.

PS I mentioned last time that I’m on Twitter. I also realized I suck at twittering, or tweeting. For those who are following my every tweet, I’ll try and keep you in the loop, as it were.
