Drawing Conclusions: The Promise of GP-GPU Computing

Setting expectations in a fast-moving and fast-computing market sector.

The buzz has been loud at times. It almost sounds too good to be true: use your video card for HPC and get a 10, or maybe even 50, times speedup of your application. Those kinds of comments get my attention. Initially there were some skeptics, but the results keep coming. And the results are not from some academic lab with some esoteric application. The following is a list of some of the areas where General Purpose Graphics Processing Units (GP-GPUs) have made inroads:

  • Climate research
  • Computational chemistry and biology
  • Financial analysis
  • Genetic research
  • Oil and gas exploration
  • Risk assessment
  • Seismic processing
  • Signal processing

There is talk of a TFLOP or more from a single GPU. Have we opened up a new and lucrative vein in the HPC gold mine? Perhaps.

Turning Down The Buzz

Before I bring out my buckets of cold water, let me state that there is something very interesting going on here. In true commodity cluster fashion, there is a larger market (video cards) driving a technology that can be used by the HPC community. Vendors have recognized that there is another way to use GPUs and have created products designed specifically for the HPC market. I find it telling that both NVidia and AMD/ATI sell GPU-based products with no video connector.

There are a few things to keep in mind when evaluating GP-GPU technology. First and foremost, do not get too caught up in the FLOPS rating. The headline numbers are for single precision FLOPS, and they are theoretical peak numbers. If your application can live with single precision (SP) arithmetic (and many can), then GPU computing is worth a serious look. Due to the GPU design, double precision (DP) performance can be one quarter to one tenth of the single precision numbers. It is a function of how much double precision hardware can be placed on the GPU. Of course, the HPC market wants more DP and the video market wants more SP, so there is a balance. At least the HPC market is getting some attention.
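Whether an application "can live with" single precision is something you can test before buying hardware. The following is a minimal Python sketch, not a GPU program: it emulates single precision on the CPU by rounding every intermediate result to 32 bits, then compares a long running sum against the same sum in double precision. The helper names (to_float32, sum_single, sum_double) and the choice of summing 0.1 one hundred thousand times are illustrative assumptions, not anything taken from a vendor toolkit.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_single(values):
    """Accumulate in emulated single precision: round after every addition."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


def sum_double(values):
    """Accumulate in ordinary double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# Hypothetical workload: add 0.1 one hundred thousand times (exact answer 10000).
values = [0.1] * 100_000
exact = 10_000.0

sp_error = abs(sum_single(values) - exact)
dp_error = abs(sum_double(values) - exact)

print(f"single precision error: {sp_error:.6g}")
print(f"double precision error: {dp_error:.6g}")
```

If the single precision error is acceptable for your problem, the headline SP numbers are relevant to you. If it is not, the DP figure is the one to compare.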

A second point to consider is the nature of your problem. Video cards get their astounding performance from doing things in parallel. They have large numbers of processors that are used for high performance graphics. In the graphics world these are called shaders or stream processors, and in the past they were designed specifically for graphics processing. Modern video chips are built from more general “thread processors” or “stream cores.”

The general purpose nature of these processors is why a GPU is often referred to as a GP-GPU (GP = General Purpose). The processors are programmable and can be used for things other than rendering. In a typical GP-GPU there may be up to 400 individual cores on a single GPU. In general, the GP-GPU card has its own memory where program and data are stored. Of course, the program still starts on the host processor, but data and program information are transferred to the video card to improve execution speed. These transfers are often transparent to the user, however.

In parallel computing parlance, this type of computing is referred to as SIMD (Single Instruction, Multiple Data). A SIMD program works by having a group of parallel processors execute the same instruction but on different data. One can easily see how this relates to graphics processing. In HPC, there are many problems that can be solved with a SIMD architecture. There are, however, other HPC applications that require a Multiple Instruction, Multiple Data (MIMD) architecture. The large performance gains often reported for GP-GPUs are for single precision SIMD applications. If your application fits in this category, GP-GPU may be a big win for your HPC needs. If you don’t fit into this category, do not despair. You too may have a chance to pull some cycles out of your video card. It may take a little more effort, but it may be just as rewarding. I’ll talk a little bit more about this in a moment, but first let’s take a look at what is available.
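To make the SIMD idea concrete, here is a small Python sketch of the pattern, using the familiar "a times x plus y" vector update as the example. It is only an illustration of the programming model: the launch function is a hypothetical stand-in that runs the kernel once per index in an ordinary loop, whereas a GPU would run many of those invocations at the same time. None of the names below come from CUDA or any other toolkit.

```python
def saxpy_kernel(i, a, x, y, out):
    """Work done by one 'thread': the same instructions, applied to element i."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Hypothetical stand-in for a GPU launch.

    Runs the kernel once per index, serially. On a GPU, the n invocations
    would be spread across the available cores and executed in parallel.
    """
    for i in range(n):
        kernel(i, *args)


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(x)

# Every index executes the same instruction stream on its own piece of data.
launch(saxpy_kernel, len(x), 2.0, x, y, out)
print(out)
```

The property that matters is that no invocation depends on the result of another, so the work can be handed out to hundreds of cores at once. A problem where each element needs a different sequence of operations is closer to the MIMD case and maps less naturally onto this model.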

NVidia And CUDA

When one thinks of GP-GPU, NVidia is the first company that comes to mind. They offer a combined hardware and software solution for HPC users. All NVidia graphics hardware from the GeForce 8xxx series onward supports GP-GPU computing and the CUDA programming language (more on CUDA below). NVidia has segmented its product offerings into three categories, however. The most common is the GeForce line, which is designed for media computing. The Tesla line is designed for the HPC audience, and the Quadro line is for high end graphics systems. We will focus on the Tesla series for this article, but keep in mind that many people start out prototyping on existing video cards. The following is the description of a Tesla C1060 PCIe processing card.

Tesla GPUs: 1
Streaming Processor Cores: 240
Frequency of processor cores: 1.3 GHz
Single Precision floating point performance (peak): 933 GFLOPS
Double Precision floating point performance (peak): 78 GFLOPS
Floating Point Precision: IEEE 754 single & double
Total Dedicated Memory: 4 GB GDDR3
Memory Speed: 800 MHz
Memory Interface: 512-bit
Memory Bandwidth: 102 GB/sec
Max Power Consumption: 200 W peak, 160 W typical
System Interface: PCIe x16
Number of PCIe Slots: 2

Table One: NVidia Tesla C1060 Specifications
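It can help to see where peak figures like those in Table One come from, since they are products of a few hardware counts rather than measurements. The Python sketch below reproduces them approximately. The core count, clocks, and bus width are taken from the table. The per-cycle factors are my assumptions and are not stated in the table: three single precision operations per core per cycle, one double precision unit for every eight cores doing two operations per cycle, and two memory transfers per memory clock for GDDR3.

```python
# Figures taken from Table One (Tesla C1060).
CORES = 240
CLOCK_GHZ = 1.3
MEMORY_BUS_BITS = 512
MEMORY_CLOCK_MHZ = 800
QUOTED_SP_GFLOPS = 933
QUOTED_DP_GFLOPS = 78

# Assumptions, not stated in the table.
SP_FLOPS_PER_CORE_PER_CYCLE = 3   # a multiply-add (2) plus a dual-issued multiply (1)
DP_UNITS = CORES // 8             # one double precision unit per group of 8 cores
DP_FLOPS_PER_UNIT_PER_CYCLE = 2   # a fused multiply-add counts as two operations
TRANSFERS_PER_MEMORY_CLOCK = 2    # GDDR3 is double data rate

# Peak rate = number of units x clock x operations per unit per cycle.
peak_sp_gflops = CORES * CLOCK_GHZ * SP_FLOPS_PER_CORE_PER_CYCLE
peak_dp_gflops = DP_UNITS * CLOCK_GHZ * DP_FLOPS_PER_UNIT_PER_CYCLE

# Bandwidth = bus width in bytes x memory clock x transfers per clock.
bandwidth_gb_s = (MEMORY_BUS_BITS / 8) * MEMORY_CLOCK_MHZ * TRANSFERS_PER_MEMORY_CLOCK / 1000

# How much slower is double precision, using the quoted figures?
dp_to_sp_ratio = QUOTED_DP_GFLOPS / QUOTED_SP_GFLOPS

print(f"estimated peak SP: {peak_sp_gflops:.0f} GFLOPS (quoted {QUOTED_SP_GFLOPS})")
print(f"estimated peak DP: {peak_dp_gflops:.0f} GFLOPS (quoted {QUOTED_DP_GFLOPS})")
print(f"estimated memory bandwidth: {bandwidth_gb_s:.1f} GB/sec")
print(f"DP/SP ratio: {dp_to_sp_ratio:.3f}")
```

Under these assumptions the estimates land within a few GFLOPS of the quoted numbers, with the small gap on the SP figure presumably due to the clock being rounded to 1.3 GHz. The ratio works out to roughly one twelfth, which shows how steep the double precision penalty is on this card. None of this says anything about what a real application will achieve, only what the hardware could do if every unit were busy on every cycle.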

The Tesla 10XX series is the first to support DP arithmetic. If you are really in need of performance, you can put four C1060s in a single PC case or buy a system from one of the many vendors who sell such configurations. NVidia also offers the S1070, which is essentially four C1060 boards in a 1U rack-mount case. The S1070 must be connected to a host server.

Comments on "Drawing Conclusions: The Promise of GP-GPU Computing"

prentice

In the sentences

“NVidia has also committed to supporting the recent OpenGL standard.”

and

“AMD/ATI has committed to supporting the new OpenGL standard as well.”

Did you mean to say “OpenCL” instead?

fabianmejia

This article is really interesting.
In a particular project we took the power of the GPU to perform simple transformations and results were encouraging. Main CPU was released from that weight and it was free to perform some other tasks in our application.

Errata:
“The buzz has been loud at times. Almost sounds TOO good to be true.”

http://fabianmejia.blogspot.com

tink

The only problem I have with GPU computing is the thought of tying yourself not only to one vendor, but to one range of their products.

And since someone else did ERRATA:
“Video cards get THEIR astounding performance from doing things in parallel.”

buggsy2

Well, if errata about typos are allowed…no, I won’t start. There are more than a dozen in two web pages.
