What He Said

Another voice of concern over parallel programming. Plus a wild idea.

Recently someone sent me a link to a blog called The Perils Of Parallel. The subtitle has a nice message: Comments related to the book I’m writing: “The End of Computing (As We Knew It)”. It’s about the potential black hole of explicit parallelism into which the industry is heading. Yes, I shouted. An ally in my parallel programming malaise. When I looked at the blog's author, I immediately recognized the name: Greg Pfister. He is the author of a great book called In Search of Clusters, 2/E. The book was first published in 1997, but it is still relevant today. In particular, his discussion of software on clusters and SMP machines makes a very convincing argument that although both are "parallel," there is very little overlap in program design. Essentially, they are two very different computer architectures.

The one blog entry that caught my eye was 101 Parallel Languages Part 2. In this entry, Pfister makes an observation about parallel languages in the HPC sector:

So they’ve got the motivation, they’ve got the skills, they’ve got the tools, they’ve had the time, and they’ve made it work. What programming methods do they use, after banging on the problem since the late 1960s? No parallel languages, to a very good approximation.

Before all the MPI and OpenMP fanboys jump out of their seats, let me say I agree with this quote. OK, now you can jump out of your seats. If you think about it, MPI is pretty much the predominant method used in HPC. And MPI is an API (Application Programming Interface), not a language. For the purposes of sending messages and converting existing codes to parallel, MPI works quite well, and I applaud those who have helped with these efforts. I do think, however, that we need to be looking beyond MPI.
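To make the point concrete, below is a minimal sketch of what message passing with MPI looks like in C (the rank numbers, tag, and value are only illustrative). Notice that all of the parallelism comes from ordinary library calls; the C language itself knows nothing about messages.

    /* Minimal MPI message-passing sketch: rank 0 sends one integer to rank 1.
     * Nothing here is new syntax; parallelism is expressed entirely through
     * calls into the MPI library. Build with an MPI wrapper such as mpicc
     * and run under mpirun with at least two processes. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, value;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            value = 42;                               /* something to send */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            printf("rank 0 of %d sent %d\n", size, value);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 of %d received %d\n", size, value);
        }

        MPI_Finalize();
        return 0;
    }

Everything interesting about the parallelism lives in the library and the runtime, which is exactly why MPI is better described as an API than as a language.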

In other blog entries, Pfister also talks about the "killer parallel application," that is, applications that are "embarrassingly parallel." Such an application has not surfaced yet, and we are all waiting for the Lotus 1-2-3 of parallel computing. I have some thoughts about this type of application, but I'm not sure it will be embarrassingly parallel.

In the past, I have mentioned that Artificial Intelligence (AI) has potential in parallel computing. AI problems are hard, and some require large amounts of computing resources. Please note, although AI has gone through its ups and downs like any other over-hyped technology, it has been making steady progress. And AI means different things to different people. AI is being applied in many places and is solving fuzzy problems that do not lend themselves to strict or formal analysis. Indeed, AI systems often need to search large solution spaces or perform many repetitive tasks, which is a natural fit for parallel computation.
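To see why, consider the simplest case: scoring a huge number of independent candidates and keeping the best one. Below is a toy sketch in C with OpenMP; the score() function is only a placeholder for whatever an AI system would actually evaluate (a heuristic, a board position, a model error).

    /* Toy brute-force search over a large space of candidate solutions.
     * Each candidate is scored independently, so each thread keeps its own
     * best and the per-thread winners are merged at the end.
     * Build with, for example, gcc -fopenmp -O2 search.c */
    #include <stdio.h>

    #define CANDIDATES 10000000L

    /* Placeholder scoring function: any pure function of the candidate. */
    static double score(long i)
    {
        double x = (double)i / CANDIDATES;
        return x * (1.0 - x);                  /* peaks near the middle */
    }

    int main(void)
    {
        long best = 0;
        double best_score = score(0);

        #pragma omp parallel
        {
            long local_best = 0;
            double local_score = score(0);

            /* Each thread scans its share of the candidates. */
            #pragma omp for nowait
            for (long i = 1; i < CANDIDATES; i++) {
                double s = score(i);
                if (s > local_score) {
                    local_score = s;
                    local_best = i;
                }
            }

            /* Merge the per-thread winners. */
            #pragma omp critical
            if (local_score > best_score) {
                best_score = local_score;
                best = local_best;
            }
        }

        printf("best candidate %ld, score %f\n", best, best_score);
        return 0;
    }

The same pattern spreads across cluster nodes just as easily; the hard part is never the loop, it is deciding what to score.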

One interesting aspect of AI is that after an AI method gets absorbed into the mainstream, it becomes, well, "mainstream." As an example, Bayesian statistical analysis started as an AI method, and now it is an important component of email spam filtering. Most people don't refer to their spam filters as intelligent or as possessing some kind of "AI." They just work. In my opinion, another application that is very sophisticated and "intelligent" is a good compiler. The analysis and optimization methods used by today's compilers would probably have been called magic thirty years ago. Today they are standard and often taken for granted. When you type make you unleash a very sophisticated series of events that would be difficult for your average human programmer to understand. As with quantum mechanics, I'm not sure anyone really understands Makefiles in any case, some of which are highly parallel (see the -j option).

Another area that has shown promise on clusters is Genetic Algorithms (GAs). While some may not consider GAs strictly AI, I think they are close enough. GAs are best used to solve difficult search and optimization problems. While the answers may be approximate, the ability to solve really difficult problems with brute-force computing makes them attractive. And GAs are naturally parallel.
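For the curious, a toy GA sketch in C follows (the classic OneMax problem: maximize the number of 1 bits in a string). The algorithm is deliberately crude, but it shows the structure: every fitness evaluation is independent, so that loop parallelizes with a single OpenMP pragma, and on a cluster the same loop could be spread across nodes.

    /* Toy genetic algorithm for OneMax. The interesting part is the shape:
     * the fitness-evaluation loop is naturally parallel.
     * Build with, for example, gcc -fopenmp -O2 ga.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define POP   256              /* population size     */
    #define GENES  64              /* bits per individual */
    #define GENS  200              /* generations to run  */

    static int pop[POP][GENES];
    static int nxt[POP][GENES];
    static int fit[POP];

    static int fitness(const int *g)       /* count the 1 bits */
    {
        int f = 0;
        for (int j = 0; j < GENES; j++)
            f += g[j];
        return f;
    }

    int main(void)
    {
        for (int i = 0; i < POP; i++)      /* random initial population */
            for (int j = 0; j < GENES; j++)
                pop[i][j] = rand() & 1;

        for (int g = 0; g < GENS; g++) {
            /* Evaluate everyone; each evaluation is independent. */
            #pragma omp parallel for
            for (int i = 0; i < POP; i++)
                fit[i] = fitness(pop[i]);

            /* Crude two-way tournament selection plus one-bit mutation. */
            for (int i = 0; i < POP; i++) {
                int a = rand() % POP, b = rand() % POP;
                int parent = (fit[a] > fit[b]) ? a : b;
                memcpy(nxt[i], pop[parent], sizeof(nxt[i]));
                nxt[i][rand() % GENES] ^= 1;
            }
            memcpy(pop, nxt, sizeof(pop));
        }

        int best = 0;
        for (int i = 0; i < POP; i++) {
            int f = fitness(pop[i]);
            if (f > best)
                best = f;
        }
        printf("best fitness after %d generations: %d of %d\n",
               GENS, best, GENES);
        return 0;
    }

Real GA codes add crossover, elitism, and far more expensive fitness functions; the more expensive the fitness function, the bigger the parallel payoff, but the structure stays the same.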

Let's recap. We considered Pfister's well-thought-out discussion of the difficulties of parallel programming. From there I mentioned AI, then compilers, and finally GAs. Do you get where I'm going with this?

What if a high-level program description language were developed? Note I did not say programming language. This description language would allow you to "describe" what you needed to do and not how to do it (as discussed before). This draft description would then be presented to an AI-based clarifier, which would examine the description, look for inconsistencies or missing information, and work with the programmer to create a formal description of the problem. At that point the description would be turned over to a really smart compiler that could target a particular hardware platform and produce the needed optimized binaries. Perhaps a GA could be thrown in to help optimize everything.

This process sounds like it would take a lot of computing resources. Guess what? We have that.

Why not throw a cluster at this problem? Maybe it would take a week to create a binary, but it would be cluster time and not your time. There would be no edit/make/run cycle because the description tells the compiler what the program has to do. The minutiae (or opportunities for bugs) of programming, whether serial or parallel, would be handled by the compiler. Talk about a killer application.

Comments on "What He Said"


Many years ago we at Floating-Point Systems had a more than serious flirtation with functional programming languages, which you have sometimes promoted as a solution to this problem. We defined a functional programming language called "Flo", developed a compiler and parallel hardware to support that language, and declared success. But it never sold (for many reasons, I suppose), so we promptly abandoned the idea in favor of more successful ones. I'm sure we were neither the first nor the last to do so.

Ever since then I have pondered how we solve the problem of managing parallel complexity in large scale programs. Much of the difficulty lies in the nature of specifying what constitutes a correct program. If the specification is too high level, you may get what you asked for but not what you want. But the more detail you supply, the more it becomes like a programming language with all its associated problems and less like a correctness specification.

About the most promising approach I have seen so far is the algorithmic skeleton. My first exposure to that was Murray Cole’s book “Algorithmic Skeletons”. Extending the idea to parallel skeletons is not a far stretch and seems to strike a good balance between high-level specification and efficiency. I’m sure such work is already under way. High level operators are already supported in MPI in the form of reduce, allreduce and other functions. They’re ugly syntactically because they are MPI, but the concepts are there.
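For example, a global sum looks roughly like the minimal sketch below. The reduction pattern is captured in a single call, even though the call itself is verbose; the datatype and operation shown (MPI_DOUBLE, MPI_SUM) are just one combination.

    /* Minimal MPI_Allreduce sketch: every rank contributes a partial value
     * and every rank receives the combined result. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double)(rank + 1);   /* stand-in for a locally computed partial result */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        printf("rank %d sees global sum %g\n", rank, global);

        MPI_Finalize();
        return 0;
    }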

While declaring the ultimate demise of computing due to high levels of parallelism does make for good entertainment, all I would really expect to see is the price of computation fall as parallel resources increase. When that happens we may not have single applications that can span significant fractions of the available compute power, say a million cores in a cluster. But virtually free compute power won't be left idle, either. It will instead be used for lower-valued tasks.

Douglas Pase, PhD


Minor comment: You said that the analysis and optimization done by modern compilers would probably have been called magic thirty years ago. I've been working in compilers for thirty years, and perhaps there are aspects of compilers where that's true (perhaps for very high-level, post-object-oriented languages), but in the optimization, parallelization, and vectorization space, there have been at best incremental improvements. Data-flow analysis, loop parallelization, and vectorization all date back to the 1960s and 1970s. PRE (partial redundancy elimination) goes back to 1978. The more recent widespread use of static single assignment (SSA) gives a convenient representation for analysis, but doesn't give fundamentally better results than data-flow. The only significant improvement that I can think of is whole-program analysis, which was unrealistic 30 years ago because system memory wasn't big enough to hold the results of such analysis. Now it is. For what it's worth.
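As a small hand-written illustration of the kind of classical transformation I mean, here is partial redundancy elimination on a toy function: x + y is computed on only one path before the join, so the optimizer inserts and reuses it so that it is evaluated at most once along any path (the second function is only a sketch of what the compiler effectively produces).

    /* Before PRE: x + y is computed on the taken branch and then again at
     * the return, so it is partially redundant (redundant along one path). */
    int f_before(int x, int y, int c)
    {
        int t = 0;
        if (c)
            t = x + y;
        return t + (x + y);
    }

    /* After PRE the expression is available on every path and computed once. */
    int f_after(int x, int y, int c)
    {
        int e = x + y;       /* inserted so the value is available on both paths */
        int t = c ? e : 0;
        return t + e;        /* the redundant recomputation is gone */
    }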

-Michael Wolfe (Hi Dr. Pase)


Wow. That's an excellent little vignette and a great suggestion for the killer application.

That said, I suppose that I fear change. My mind was honed poring over assembly programmes, optimising them for size and speed.

If I could shave a few instructions, I might take another block (half K) off the image size, a significant consideration when the entire storage was one 35 MB washing machine.

While the "real programme" has been obscured since the creation of the first compiler, I find the idea of "programming" in "PDL" to be truly alien, and a bit freakish.

I’d draw a parallel to modern electronics (especially digital) in which there is no fixed, discrete design.

While ‘twould be truly miraculous to be able to ask my computer to create an ad hoc application based upon my whim of the moment, a part of me will always long for the source listing.

We will have lost something important. We will have also gained more than we lost, but there will still be the ache of amputation.

Ah well, the tyrannosaur was likely no fan of progress.


Russ Bixby, geek of the plains

“What’s the bandwidth of localhost?”


Am I missing something? We have the killer app: weather analysis and other similar things like visual portrayals of Finite Element Analysis and DiffEq runs.

Not to mention my kid’s game machines. 8+1 core Cell processors aren’t going to cut it for very long, and my last perusal of the Stanford Folding@Home READMEs hinted that the graphics processors were scaling core count through the roof and that they were quite happily making use of them.

Another prediction: just as home video became more real and hi-res cameras are now $449 commodities, the Linux and FreeBSD clusters that do animation and architectural rendering will scale to the deskside within the next 5 years.

Sure, these are not going to sell gazillions of PCs or MPPs in the next two weeks, but we will see an explosion of visualization-enabled reality simulations that will start showing up next to desktops and gaming joysticks (hopefully) near you Real Soon Now.

I think back to my first bare board 8086 with 2kx16 of RAM and a similar amount of UV EPROM, and I start to giggle insanely when I realize that the 2GB USB key I toss in my pocket has more CPU juice than it did.

None of which is meant to disagree with anything said. I think the most cogent point you both make is that money people have to see value in the same way that they grokked and then bought spreadsheets on PCs. Having a great analysis won’t save or make money until some bright bulb figures out what to do with it besides play.

– Don Wilde
no egg on my mortarboard, but I do work
for a rather large commodity PC maker


Thanks for the kind words, Douglas. It's good to find a kindred spirit. Everyone else seems to be either ignoring the issue, bypassing it with virtualization, or trying to profit from it by selling lessons in how to program in parallel (not a bad personal solution).

I do say that what everybody uses is MPI (mainly) and OpenMP (some). I just don’t consider those languages. As you say, they’re APIs.

One thing I do want to bring up: when I say "killer app," I think you understand, but I need to emphasize, that it must be a client app, on the grounds that only clients have the volumes to justify the fabs. (See my "Clarifying the Black Hole" post.) There are many parallel server apps; no problem there. (See "IT Departments Should NOT fear Multicore" and elsewhere.) But server volumes aren't even visible next to client volumes.

Right now, it looks like those volumes are going to come from cell phones and other mobile applications, coming up from the bottom like PCs rose to meet workstations, with everybody stopping in the same place. I think I find this depressing. But a 32-way single-chip client is something I find, well, closer to terrifying.

- Greg Pfister
