I’m heading out to the NVidia GPU Technology Conference this week. Notice that the title does not state whether it is a Video or HPC conference and as a matter of fact it will cover both. This concept fascinates me. Dual use hardware is how the whole cluster thing got started. One difference back then was the pioneers were told, “You can’t do that.” Good thing for us they did not listen. Today, there are many HPC products designed for the cluster market. The reality is, however, as technology gets more expensive to produce the only way to make a profit is to sell large quantities. Designed dual use is a great solution to this problem. I am not talking about pulling things from other markets as it was done in the early days, but rather, products that are specifically designed for two markets — in this case HPC and Video processing.
NVidia is a great model for this approach. Not that long ago, they added some HPC features to their products and created software tools to help people use these new features. They also made sure their products were screaming fast for the video market as well. I imagine a possible conversation that went something like this:
You know, if we add a little gizmo here and here the HPC people could use these.
Will it hurt the video performance?
Can we sell more product?
Then let’s do it.
And thus, the GP-GPU (General Purpose Graphical Processing Unit) was born.
One aspect of “designed dual use” with which designers must grapple is feature balance. If the HPC guys had their way, the GP-GPU would be a full-blown data parallel computer on a chip. The HPC market alone does not justify that cost, and thus the HPC side needs to negotiate chip real estate with the video side of things. In a way, designed dual use HPC will never have all the features users want, but most of what we want is better than none of what we want.
From a market perspective, the balancing act goes a bit deeper. The growth of cluster HPC was due in part to the ability of servers, Ethernet, and MPI to deliver pretty good performance to most people. This is a key point that is often missed. Let’s do a little math. Suppose there are 100 HPC users in the world and each year each of them spends $1 million buying HPC hardware, which makes a $100 million market. Let’s further assume that for a certain period of time the market moves along with a handful of companies competing to sell large and fast big iron (million dollar) systems.
Along comes a technology (clusters) that can deliver 80% of the big iron performance for $100,000. Next, assume that 60% of the market is fine taking this reduced performance because it saves money. The traditional HPC ($100 million) market has been reduced to a $40 million market. The term that is often used in this case is “disruptive technology”. Note that better features or performance did not take market away from the incumbents; it was just the opposite. It was the “good enough crowd,” those who could live without the best.
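For those who like to check the arithmetic, here is a minimal Python sketch of the thought experiment above. The function and variable names are my own inventions for illustration, and the numbers are the made-up ones from the example, not real market data.

```python
# Back-of-the-envelope model of the "good enough" market shift.
# All figures are the illustrative ones from the text, not real data.

def market_after_disruption(users, big_iron_price, cluster_price, switch_fraction):
    """Split yearly spending between big iron and clusters (hypothetical helper)."""
    switchers = round(users * switch_fraction)  # users who accept "good enough"
    stayers = users - switchers                 # users who still buy big iron
    return {
        "big_iron": stayers * big_iron_price,
        "cluster": switchers * cluster_price,
    }

USERS = 100
BIG_IRON_PRICE = 1_000_000   # dollars per user per year
CLUSTER_PRICE = 100_000      # dollars per user per year
CLUSTER_PERF_PERCENT = 80    # cluster performance relative to big iron

result = market_after_disruption(USERS, BIG_IRON_PRICE, CLUSTER_PRICE, 0.60)
total = result["big_iron"] + result["cluster"]

# Performance per dollar of a cluster relative to big iron:
# (80% of the performance) / (10% of the price).
perf_per_dollar_gain = (CLUSTER_PERF_PERCENT * BIG_IRON_PRICE) / (100 * CLUSTER_PRICE)

print(f"Big iron market: ${result['big_iron']:,}")
print(f"Cluster market:  ${result['cluster']:,}")
print(f"Total spending:  ${total:,}")
print(f"Cluster performance per dollar: {perf_per_dollar_gain:.0f}x big iron")
```

Under these assumptions the incumbents keep $40 million, the cluster vendors pick up only $6 million, and total spending drops to $46 million, while the switchers get about eight times the performance per dollar. That lopsided result is what makes the technology disruptive.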
When I look at the GP-GPU market, I see some similarities. Yes, it is not the best for all cases, but it may address a large portion of the market that can live with fewer features in exchange for lower cost. In other words, it offers a better price to performance ratio (i.e. dollars per TFLOP).
This trend is why I am watching the GP-GPU thing very carefully. And, like clusters, there is almost no barrier to trying. In the early days, you could cobble together a cluster and see what kind of “ball park” performance you could achieve. With GP-GPU computing, you probably already have a video card that can run CUDA applications. Unlike clusters, however, where MPI programs could be literally recompiled and executed, there is some reprogramming into CUDA required before you can run your codes on NVidia hardware. The good news is you can incrementally add “CUDA-ness” to your program; that is, a complete restructuring is not needed. In terms of cost, it is basically your time. And, if it works for your application set, then you may be looking at quite a performance bump.
As I still must pack my suitcase and finish some emails, I’ll end here for today. I’m sure I’ll have some things to report back from the conference next week, so stay tuned. Besides, my sidekick and fellow HPC scrivener Jeffrey Layton will be there. We will be rooting around for all things HPC. And speaking for myself, I promise we will do a “good enough job” and deliver to 60% of the readers. The rest of you, well, we will cover that some other time.
Update: Due to some scheduling issues, this installment did not get posted until after I returned from the GPU Technology Conference. As I mentioned, I wrote it just before I jumped on the airplane heading to the conference. I think the announcements and the event attendance support my point. As a matter of fact, I was genuinely surprised and delighted to hear about the features of the next generation Fermi GPU. Here are some of the key points: support for ECC memory, 512 cores, 8x double precision performance increase, concurrent (CUDA) kernels, support for C++, and more. In terms of HPC, these could be game changing features. I’ll have more next week. For now, here is your homework: grab the Fermi White Paper and study the new features so that next week, when I jump into my “HPC is at a cross-roads” rant, you will be ready.