Drawing Conclusions: The Promise of GP-GPU Computing

Setting expectations in a fast-moving and fast-computing market sector.

As impressive as the NVidia hardware is, much of the success of NVidia GP-GPU computing can be attributed to the CUDA programming language. CUDA (which originally stood for Compute Unified Device Architecture) is a programming environment designed for GP-GPU computing. It is based on the C programming language and provides a high-level abstraction layer, so the end user does not have to worry about managing threads across the hardware. CUDA does not support recursion, due to memory limitations on the video card, but it presents an easy learning curve for most C programmers. In addition, there are plenty of pre-existing libraries (as source code) available to programmers. Some of these are listed below:

  • Parallel bitonic sort
  • Matrix multiplication
  • Matrix transpose
  • Performance profiling using timers
  • Image convolution
  • 1D DWT using Haar wavelet
  • CUDA BLAS and FFT library usage examples
  • CPU-GPU C- and C++-code integration

  • Binomial Option Pricing
  • Black-Scholes Option Pricing
  • Monte-Carlo Option Pricing

The CUDA toolkit also includes optimized FFT and BLAS libraries. NVidia has also committed to supporting the recent OpenCL standard.

AMD/ATI

While many of the GP-GPU headlines have been about NVidia-based systems, AMD/ATI has not been sitting idle. The new AMD FireStream 9270 is expected to launch in the first quarter of 2009 and boasts over a TFLOP of single precision performance. Double precision performance is expected to be 240 GFLOPS. While the peak performance numbers seem impressive, remember that your application may vary and the only true test is your code. The 9270 is available as a PCIe card; there is no rack-mount solution at this time. The following table summarizes the soon-to-be-released FireStream processor from AMD/ATI.

Number of GPUs 1
Stream Cores 800
Peak compute rate 1.2 TFLOPS (single), 240 GFLOPS (double)
Floating point formats IEEE single & double precision
GPU local memory 2GB GDDR5 SDRAM
Memory interface 256-bit @ 850 MHz
Peak memory bandwidth 108.8 GB/s
System interface PCIe x16 Gen 2
Power consumption 160 watts typical, <220 watts peak
Number of PCIe Slots 2

Table Two: AMD/ATI FireStream 9270 Specification

Unlike NVidia, AMD/ATI did not develop their own programming model. Instead, they have adopted and enhanced the BrookGPU stream processing language. The current AMD/ATI version is called Brook+; like CUDA, it is based on the C language. A good overview can be found on the BrookGPU project page. It is also worth mentioning that the original Brook project supports both ATI and NVidia video cards (from the project web page: ATI 9700 (R300) and above and NVidia 5200 (NV3X) and above should work). AMD/ATI also makes GP-GPU library functions available, including BLAS SGEMM and DGEMM routines. For those interested in historical completeness, ATI had developed an API called Close To Metal, a low-level programming interface for GP-GPU computing. The project was short-lived, however. AMD/ATI has committed to supporting the new OpenCL standard as well.

Intel

One cannot talk about HPC without considering Intel. In terms of GP-GPU processing, Intel has taken a different approach than NVidia and AMD/ATI. Intel has announced the Larrabee processor, which consists of 32 x86 (Pentium P54C) processor cores (later versions are expected to include 48 cores). Larrabee is expected to compete with GP-GPU products from NVidia and AMD/ATI and to use the x86 instruction set with Larrabee-specific extensions. Interesting aspects of Larrabee include cache coherency across all its cores and the absence of specialized graphics hardware. However, recent comments by Intel suggest that Larrabee will not be positioned as a GP-GPU/HPC solution.

Programming Models Galore

There are a number of ways to program GP-GPU computers. Unfortunately, until recently there has not been one method that is portable across all hardware. Also keep in mind that the programming models are designed to work on a single motherboard/GP-GPU system. None of these are distributed programming languages, as they all work within the confines of a single compute node (with the exception of the PGI products; see below). Cluster mavens considering GP-GPU programming may find it helpful to look at Comparison of MPI, OpenMP, and Stream Processing for a comparison of various programming methodologies. While a full discussion of the various programming languages and methodologies would constitute another article (or two), the following table should serve as a brief snapshot of the current state of affairs. Be aware that GP-GPU computing is a rapidly moving market. Indeed, the recent OpenCL spec took less than six months to become a standard. OpenCL (not to be confused with OpenGL) is a low-level specification for GP-GPUs and multi-core CPUs.

OpenCL: Standard API for GP-GPU and multi-core. Large amount of vendor support. Powerful, but low level.
CUDA: Very nice programming model. Abstracts away thread management from the user. Similar to C, but no recursion. Fortran and C++ support is coming. Works on NVidia hardware only. Free binary available; no registration required.
BrookGPU: The Stanford University Graphics group's compiler and run-time implementation of the Brook stream programming language. Still considered "beta" (although it has been around for a while). Supports both NVidia and AMD/ATI hardware. BSD license.
Brook+: Available from AMD/ATI, based on the Stanford University Graphics group's compiler and run-time implementation of the Brook stream programming language. Based on C, but no recursion. AMD/ATI hardware only.
Intel Ct: Currently a research language. Based on C/C++ and designed to support hundreds to thousands of hardware threads. Supports Intel hardware.
RapidMind: Portable API for multi-core, GP-GPU, and Cell processors. Commercial product for C++; non-standard.
PGI: Partial support for automatic use of CUDA primitives by compilers. May be an attractive solution for existing codes.
MOG: Currently a research project that allows MIMD On GP-GPUs (MOG).

Table Three: Programming Tools for GP-GPU hardware

Some Final Words

The most interesting thing about GP-GPU processing is what I call the low barrier of experimentation. Many of the early successes in GP-GPU computing came about because someone became curious about the processing power that might exist in their video card. They set out on their own, at home or in the office, and ported a portion of their code to CUDA or BrookGPU. After what they described as minimal effort, they started to see speed-ups of 5-10 times (maybe more). Convinced they were on to something, they did more porting and in some cases managed to achieve remarkable speed-ups. At that point, they were able to convince their colleagues that GPU processing actually worked.

All of this sounds very similar to how cluster HPC came into the market. Like clusters, the cost to get in the game is minimal. For example, there are over 70 million CUDA-enabled GPUs sitting in workstations out there. If you don't have one, the cost of a basic GeForce video card is less than $100. As for the software, it is freely available. NVidia and AMD/ATI, quite wisely, make the tools available at no cost (and with no registration hassles). It is essentially the same cluster recipe: a low (or no) cost of entry, a possible big pay-off, and some spare time.
It is no wonder people are “playing” with this new technology.

If you are serious about GP-GPU computing (i.e., you want to use it in a production environment), then consider the NVidia Tesla or AMD/ATI FireStream lines of hardware. Although the same GP-GPU may be used in less expensive video cards, the HPC-optimized solutions are more than clever re-packaging. The HPC versions are designed for continuous operation, which requires higher quality parts and attention to cooling. While a typical video card can reach the processing levels of an HPC application, most video processing happens in bursts. There are times when the GP-GPU is waiting for user input — like when you are about to yank that guy out of a car and start your crime spree.

The HPC enabled GP-GPUs are making some serious in-roads. There seems to be some healthy competition and plenty of success stories in the market. A low barrier to entry plus a spectrum of development tools make this a possible addition to your HPC arsenal.

Douglas Eadline is the Senior HPC Editor for Linux Magazine.

Comments on "Drawing Conclusions: The Promise of GP-GPU Computing"

prentice

In the sentences

“NVidia has also committed to supporting the recent OpenGL standard.”

and

“AMD/ATI has committed to supporting the new OpenGL standard as well.”

Did you mean to say "OpenCL" instead?

fabianmejia

This article is really interesting.
In one particular project we harnessed the power of the GPU to perform simple transformations, and the results were encouraging. The main CPU was relieved of that load and was free to perform other tasks in our application.

Errata:
“The buzz has been loud at times. Almost sounds TOO good to be true.”

http://fabianmejia.blogspot.com

tink

The only problem I have with GPU computing is the thought of tying yourself not only to one vendor, but to one range of their products.

And since someone else did ERRATA:
“Video cards get THEIR astounding performance from doing things in parallel.”

buggsy2

Well, if errata about typos are allowed…no, I won’t start. There are more than a dozen in two web pages.
