Setting expectations in a fast-moving and fast-computing market sector.
The buzz has been loud at times. It almost sounds too good to be true: use your video card for HPC and get a 10, or maybe even 50, times speed-up of your application. Those kinds of comments get my attention. Initially there were some skeptics, but the results keep coming. And the results are not from some academic lab with some esoteric application. The following is a list of some of the areas where General Purpose Graphics Processing Units (GP-GPUs) have made inroads:
- Climate research
- Computational chemistry and biology
- Financial analysis
- Genetic research
- Oil and gas exploration
- Risk assessment
- Seismic processing
- Signal processing
There is talk of a TFLOP or more from a single GPU. Have we opened up a new and lucrative vein in the HPC gold mine? Perhaps.
Turning Down The Buzz
Before I bring out my buckets of cold water, let me state that there is something very interesting going on here. In true commodity cluster fashion, there is a larger market (video cards) driving a technology that can be used by the HPC community. Vendors have recognized that there is another way to use GPUs and have created products designed specifically for the HPC market. I find it telling that both NVidia and AMD/ATI sell GPU-based products with no video connector.
There are a few things to keep in mind when evaluating GP-GPU technology. First and foremost, do not get too caught up in the FLOPS rating. The headline numbers are for single precision FLOPS, and they are theoretical peak numbers. If your application can live with single precision (SP) arithmetic (and many can), then GPU computing is worth a serious look. Due to the GPU design, double precision (DP) performance can be one quarter to one tenth of the single precision number. It is a function of how much double precision hardware can be placed on the GPU. Of course the HPC market wants more DP and the video market wants more SP, so there is a balance. At least the HPC market is getting some attention.
A second point to consider is the nature of your problem. Video cards get their astounding performance from doing things in parallel. They have large numbers of processors that are used for high performance graphics. In the graphics world these are called shaders or stream processors, and in the past they were designed specifically for graphics processing. Modern video chips are built from more general “thread processors” or “stream cores”.
The general purpose nature of these processors is why a GPU is often referred to as a GP-GPU (GP = General Purpose). The processors are programmable and can be used for things other than rendering. A typical GP-GPU may have up to 400 individual cores. In general, the GP-GPU card has its own memory where program and data are stored. Of course the program still starts on the host processor, but data and program information are transferred to the video card to improve execution speed. These transfers are often transparent to the user, however.
In parallel computing parlance, this type of computing is referred to as SIMD (Single Instruction, Multiple Data). A SIMD program works by having a group of parallel processors execute the same instruction on different data. One can easily see how this relates to graphics processing. In HPC, there are many problems that can be solved with a SIMD architecture. There are, however, other HPC applications that require a Multiple Instruction, Multiple Data (MIMD) architecture. The large performance gains often reported for GP-GPUs are for single precision SIMD applications. If your application fits in this category, GP-GPU may be a big win for your HPC needs. If you don’t fit into this category, do not despair. You too may have a chance to pull some cycles out of your video card. It may take a little more effort, but it may be just as rewarding. I’ll talk a little bit more about this in a moment, but first let’s take a look at what is available.
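Two quick checks can help set expectations here. The sketch below is only an illustration in plain Python, not GPU code: the function names are made up for this example, the 1000 GFLOPS figure is a hypothetical round number, and single precision is emulated on the CPU by rounding through the `struct` module. It estimates the DP peak implied by the "one tenth to one quarter" rule of thumb, and it shows how a long running sum behaves in SP versus DP.

```python
import struct


def to_single(x):
    """Round a Python float (an IEEE 754 double) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(value, count):
    """Add `value` to a running total `count` times, rounding to SP after every add."""
    step = to_single(value)
    total = 0.0
    for _ in range(count):
        total = to_single(total + step)
    return total


def sum_double(value, count):
    """The same running sum carried out in double precision."""
    total = 0.0
    for _ in range(count):
        total += value
    return total


def dp_peak_range(sp_peak_gflops):
    """Return (low, high) DP peak using the 1/10 to 1/4 of SP rule of thumb."""
    return sp_peak_gflops / 10.0, sp_peak_gflops / 4.0


if __name__ == "__main__":
    count = 100_000
    exact = 10_000.0  # 0.1 added 100,000 times
    print("SP error:", abs(sum_single(0.1, count) - exact))
    print("DP error:", abs(sum_double(0.1, count) - exact))
    # Hypothetical card rated at 1000 GFLOPS single precision peak.
    print("DP peak range (GFLOPS):", dp_peak_range(1000.0))
```

If the SP error printed for a calculation shaped like yours is acceptable, the headline SP numbers are the relevant ones. If not, plan around the DP range instead.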
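As a rough illustration of the SIMD and MIMD distinction, consider the following plain Python sketch. It only models the two execution patterns; nothing here runs on a GPU, and the function names `simd_saxpy` and `mimd_run` are invented for this example. In the SIMD case a single operation (a*x + y) is applied to every element, which is the pattern a GPU spreads across its many cores. In the MIMD case each worker runs a different operation on its own data.

```python
def simd_saxpy(a, xs, ys):
    """SIMD style: one instruction (a*x + y) applied to every data element.

    On a GPU each element would be handled by its own thread processor;
    here an ordinary loop stands in for the parallel lanes.
    """
    return [a * x + y for x, y in zip(xs, ys)]


def mimd_run(tasks):
    """MIMD style: each worker runs its own function on its own data."""
    return [func(data) for func, data in tasks]


if __name__ == "__main__":
    # Same instruction, different data.
    print(simd_saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    # Different instructions, different data.
    print(mimd_run([(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])]))
```

A problem that looks like the first function, with large arrays and identical work per element, is the kind most likely to see the big reported speed-ups.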
NVidia And CUDA
When one thinks of GP-GPU, NVidia is the first company that comes to mind. They offer a combined hardware and software solution for HPC users. All NVidia graphics hardware from the GeForce 8xxx series onward supports GP-GPU computing and their CUDA programming language (more on CUDA below). NVidia has segmented its product offerings into three categories, however. The most common is the GeForce line, which is designed for media computing. The Tesla line is designed for the HPC audience, and the Quadro line is for high end graphics systems. We will focus on the Tesla series for this article, but keep in mind that many people start out prototyping on existing video cards. The following is the description of a Tesla C1060 PCIe processing card.
|Streaming Processor Cores| |
|Frequency of processor cores| |
|Single Precision floating point performance (peak)| |
|Double Precision floating point performance (peak)| |
|Floating Point Precision|IEEE 754 single & double|
|Total Dedicated Memory| |
|Max Power Consumption|200 W peak, 160 W typical|
|Number of PCIe Slots| |
Table One: NVidia Tesla C1060 Specification
The Tesla 10XX series is the first to support DP arithmetic. If you are really in need of performance, you can put four C1060s in a single PC case or buy one from one of the many vendors who are selling such systems. NVidia also offers the S1070, which is essentially four of the C1060 boards in a 1U rack-mount case. The S1070 must be connected to a host server.