The rapid growth of the CUDA software model, introduced less than three years ago, is no accident.
In my previous column, I discussed the issue of multi-core (CPU) and streaming-core (GPU) software. I noted that as hardware vendors create and promote solutions for their own platforms, a "balkanization" of software tools results. The column drew some great reader comments as well.
I also mentioned several "vertical" solutions, including MPI, OpenMP, CUDA, and OpenCL. Another new language, called Ct, will be available very soon from Intel (new in the sense that a beta version will be available). Of all these options, the one that seems to have the most "buzz" is CUDA. I often wonder about language adoption, and I am constantly amazed at how fast CUDA has gained ground in the HPC space.
From what I have seen, the "buzz" around CUDA (Compute Unified Device Architecture) is more than marketing hype. People are using it for real work; i.e., it is past the curiosity stage. NVidia recently released some statistics on CUDA usage:
- 2700+ CUDA-related citations on Google Scholar
- 800+ CUDA-related videos on YouTube
- 670+ submissions to CUDA Zone by CUDA community
- 350+ registrants for current CUDA Superhero Challenge
- 300+ universities teaching the CUDA parallel programming model
In addition, I attended the NVidia GPU Technology Conference this past fall, and it was full of CUDA programmers attending workshops, building relationships, and learning as much as they could about this new programming method. "New" is an understatement: CUDA 1.0 was released in June of 2007, less than three years ago.
The rapid uptake of CUDA applications by programmers was no accident. NVidia did an excellent job making CUDA accessible and promoting the toolkit. There are other less obvious factors that, in my opinion, made CUDA a rapid success. For those looking to duplicate the success of CUDA, it may be worth considering the following points.
The CUDA Toolkit Is Free as in Beer
I am a big believer in "free as in speech" software, particularly for HPC. While CUDA is not open source, it is freely available, with no cost and no run-time fees. Indeed, there is no need to register before you can get a copy of the CUDA toolkit: anyone can go to the CUDA download page and click on the version they need. The "non-registration" aspect cannot be overemphasized. The traditional marketing model demands that users be tracked so they can be "sold to" at some point in the future. That approach is a big deterrent for many users who would otherwise devote some time to a new technology. In addition, if you give it away and create resources like the CUDA Zone, successful users often "phone home" with their applications.
CUDA Lightly Touches Standard C
I am convinced that any new programming model that can work within the "C Universe" has a good chance of success. Of course, in HPC the "Fortran Universe" is also important, and I'll speak to that aspect in a moment. The CUDA designers quite wisely did not invent a completely new language. They intentionally designed a minimal set of extensions to standard C so that users could experiment in a familiar environment.
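The extensions are small enough to show in a few lines. A minimal sketch (the kernel name, array size, and launch dimensions here are my own illustrative choices, not from any particular application):

```cuda
#include <stdio.h>

// __global__ marks a function that runs on the GPU -- one of the few
// keywords CUDA adds to standard C.
__global__ void scale(float *data, float factor, int n)
{
    // Each thread computes its own index from CUDA's built-in variables.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main(void)
{
    const int n = 1024;
    float host[1024], *dev;
    for (int i = 0; i < n; i++)
        host[i] = (float)i;

    // Plain C-style calls manage device memory...
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // ...and the triple-angle-bracket launch syntax is the other main
    // extension: run 4 blocks of 256 threads each.
    scale<<<4, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("host[10] = %f\n", host[10]);
    return 0;
}
```

Everything outside the kernel and the launch line is ordinary C, which is precisely why the environment feels familiar.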
Incremental Application Improvement
I have talked to people who experimented with CUDA, and their experience always involved taking small, incremental steps. The integration with standard C allows users to take an existing C program and add CUDA code incrementally. Progressing in this fashion, users can usually see some benefit for a small amount of work; i.e., the user does not have to rip a program apart and rebuild it to see if CUDA and an NVidia GPU will provide some speed-up.
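As a sketch of what "incremental" means in practice, suppose a C program has one hot loop (the SAXPY-style routine below is my own illustrative example, not taken from any specific code). Only that routine changes; the rest of the program is untouched:

```cuda
// Original hot spot in the C program:
//   for (int i = 0; i < n; i++)
//       y[i] = a * x[i] + y[i];

__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

// Drop-in replacement for the loop, callable from the unchanged
// C program: copy in, launch, copy out.
void saxpy_gpu(int n, float a, const float *x, float *y)
{
    float *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, x, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, n * sizeof(float), cudaMemcpyHostToDevice);

    // Round up so every element gets a thread.
    saxpy<<<(n + 255) / 256, 256>>>(n, a, dx, dy);

    cudaMemcpy(y, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}
```

If the speed-up justifies it, the next incremental step is usually keeping data on the device across calls to avoid the copies; if not, the original loop is still there.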
Minimal or Zero Hardware Cost of Entry
Getting people to try anything new is difficult. If new hardware is required to "see if it works for me," then many users may find it difficult to justify betting money on a new idea. In the case of GP-GPUs, many users already had a CUDA-capable video card from NVidia (or could easily get access to one). Thus, it was relatively easy to "try a few things and see if it works for me." If it did work, then investing in much more powerful hardware made sense. Indeed, the nature of CUDA's thread-based processing allows for highly scalable execution. When a minimal hardware cost is combined with a low software modification cost, the carrot of a 10X (or more) performance increase is hard to resist.
NVidia Supports and Promotes CUDA Successes
Taken together, the above factors probably would have pushed CUDA to the forefront of GP-GPU programming on their own. The support from NVidia cannot be overstated, however, and has in all likelihood accelerated the uptake. I consider the growth of MPI on clusters similar to that of CUDA on NVidia GPUs. MPI shares many of the features mentioned above, except that it had no single company pushing the model in a big way (read that as a concerted marketing effort). NVidia did a good job of getting the word out and shining a light on the success stories. For example, there is no MPI Center of Excellence Program as there is for CUDA. And there should be.
The CUDA universe is not done expanding. I recall speaking with the Portland Group (PGI) two years ago about their CUDA-based directives for Fortran. The effort actually started independently of NVidia and was a pleasant surprise to the company. I suspect that if the barrier to entry had been higher (in terms of money or time), PGI might not have jumped in so quickly. PGI and NVidia have since teamed up to offer PGI CUDA Fortran, which is well worth considering if you want to run Fortran codes on NVidia GPUs. PGI offers a 15-day trial that will let you play a bit with the tools.
The success of CUDA should serve as a model for other vendors. Never underestimate the "low cost of playing" with any new technology. There is an obvious pay-off for NVidia: when the new Fermi architecture arrives, the applications and purchase orders will be waiting. If you build and promote open and free playgrounds, the developers will come.