nVidia CUDA may be the rage, but OpenCL is a standard that has some features you may need.
In a previous column, I bemoaned the state of HPC Software. This column was actually a prelude to my column on nVidia CUDA computing. I was particularly impressed at how fast CUDA has gained traction in HPC and other areas. The CUDA wave has definitely hit the beach and I’ll have more on nVidia as the Fermi GPU begins to filter into the HPC trenches. In this column I want to talk about the other GPU language: OpenCL.
Before I launch into OpenCL background, I want to make a prediction. I believe OpenCL will gain acceptance in much the same way nVidia CUDA has. Like CUDA, OpenCL has a freely available SDK (Software Development Kit), is based on the C language, and can be explored using low-cost video hardware. OpenCL brings two other features to the table, however: open standard compliance, and support for both data-parallel (GP-GPU) and task-parallel (CPU) methods. I’ll take a closer look at these below, but first some background will be helpful.
Currently the GP-GPU competition is between AMD/ATI and nVidia. IBM was making some HPC inroads with Cell, but has decided to switch to an OpenCL platform and presumably use AMD/ATI hardware. In addition, the much-discussed Intel Larrabee never really made it out of the gate, so we are left with two main contenders, both of which have a strong desktop market to support development and production costs.
Historically, AMD/ATI supported the BrookGPU model and included an enhanced version in their SDK as Brook+. While Brook+ allowed for GP-GPU computing, the entire industry realized that some form of standard was required. Thus in June 2008, The Khronos Group and a rather impressive group of companies launched the OpenCL Working Group in an effort to create a standard for GPU/CPU programming. (The Khronos Group is a member-funded consortium focused on the creation of royalty-free open standards. In addition to OpenCL, they also maintain the OpenGL graphics standard.) In part due to the contributions from Apple Computer, the OpenCL 1.0 standard was ratified in December 2008, just six months after the Working Group was formed. The list of participating organizations includes 3DLABS, Activision Blizzard, AMD, Apple, ARM, Broadcom, Codeplay, Electronic Arts, Ericsson, Freescale, Fujitsu, GE, Graphic Remedy, HI, IBM, Intel, Imagination Technologies, Los Alamos National Laboratory, Motorola, Movidia, Nokia, NVIDIA, Petapath, QNX, Qualcomm, RapidMind, Samsung, Seaweed, S3, ST Microelectronics, Takumi, Texas Instruments, Toshiba and Vivante. In other words, this is a serious effort.
As a supporter of OpenCL, AMD has recently released the ATI Stream SDK v2.01 for both Linux (RHEL 5.3, Ubuntu 9.10, openSUSE 11.0) and Windows (XP, Vista, and 7). On the Linux side, the software team at AMD/ATI has made some efforts to integrate OpenCL with the current open tool chain. For example, it is now possible to use gdb to debug OpenCL kernels. (In OpenCL, a kernel is the basic unit of executable code and can be thought of as a C function that runs on the GPU or a multi-core CPU.) There is also a Stream KernelAnalyzer that is currently available only for Windows. As with CUDA, OpenCL can be used on existing AMD/ATI video cards, but it is always good to check the system requirements to be sure.
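To make the kernel idea a bit more concrete, here is a minimal sketch of my own, not something taken from the Stream SDK. The comment shows roughly what a vector-add kernel looks like in OpenCL C, using the standard `__kernel` and `__global` qualifiers and the built-in `get_global_id()`. The code under it is plain C that only emulates the execution model: the kernel body runs once per work-item, each with its own index. The names `vec_add`, `vec_add_item` and `run_vec_add` are hypothetical, and the sequential loop stands in for what a real runtime would execute in parallel on the device.

```c
#include <assert.h>
#include <stddef.h>

/*
 * For comparison, the same kernel written in OpenCL C would look roughly like:
 *
 *   __kernel void vec_add(__global const float *a,
 *                         __global const float *b,
 *                         __global float *out)
 *   {
 *       size_t i = get_global_id(0);   // index of this work-item
 *       out[i] = a[i] + b[i];
 *   }
 */

/* Plain-C stand-in for the kernel body: computes ONE output element.
 * The index i plays the role of get_global_id(0). */
void vec_add_item(size_t i, const float *a, const float *b, float *out)
{
    out[i] = a[i] + b[i];
}

/* Hypothetical helper that emulates a one-dimensional launch: the kernel
 * body is invoked once per work-item. A real OpenCL runtime would spread
 * these invocations across the device; here they simply run in order. */
void run_vec_add(size_t global_size, const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < global_size; ++i)
        vec_add_item(i, a, b, out);
}
```

Calling `run_vec_add(4, a, b, out)` on two four-element arrays fills `out` with the element-wise sums. The point is that the kernel itself contains no loop: the loop is supplied by whatever launches it.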
At this point, you may be wondering “What about CUDA and nVidia?” If you read the list of companies involved with the OpenCL specification, you will have noticed that nVidia is part of the Working Group. nVidia has been very vocal about their support for any programming language that allows you to program their GPUs, and they offer their own version of OpenCL (for their hardware).
As stated, the fact that OpenCL is a standard weighs heavily in determining its future. Having the support of pretty much the entire computer/video hardware industry helps a bit as well. From an ISV (Independent Software Vendor) standpoint, OpenCL is the gateway to hybrid (CPU/GPU) computing. As anyone with scar tissue in the HPC industry can tell you, investing resources and time into non-standard APIs (Application Programming Interfaces) is a risky business. MPI was developed for similar reasons (i.e., programmers did not want to recode every time a new parallel computer architecture hit the server room).
One final feature of OpenCL should not be overlooked. As mentioned, OpenCL supports both data-parallelism and task-parallelism. In the hybrid computing world, there is currently an implied assumption that the GPU is a slave to the CPU; that is, the GPU cannot run on its own and must have a CPU present. Given this assumption, one should be able to write OpenCL programs that adapt to the hardware environment and run, at a minimum, on a single CPU core. Of course such a program will run slower, but it will still run. If more cores or GPUs are found in a different hardware environment, then the same OpenCL program should be able to adapt to the new hardware at run-time. The rather distasteful alternative is separate binaries for various combinations of CPU and GPU resources.
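The run-time adaptation idea can be sketched in a few lines of plain C. This is only an illustration of the selection logic, not OpenCL host code. In a real program the device list would come from the OpenCL API (for example `clGetDeviceIDs` with `CL_DEVICE_TYPE_GPU` or `CL_DEVICE_TYPE_CPU`), whereas here it is a hypothetical array of `device_info` records. The "prefer a GPU, otherwise fall back to the CPU" policy is my assumption about what a typical program might want.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classes; in real OpenCL these would correspond to
 * CL_DEVICE_TYPE_CPU and CL_DEVICE_TYPE_GPU. */
typedef enum { DEV_CPU, DEV_GPU } dev_type;

/* Hypothetical summary of one device found at start-up. */
typedef struct {
    dev_type type;
    unsigned compute_units;
} device_info;

/* Return the index of the device to use: the GPU with the most compute
 * units if any GPU is present, otherwise the CPU with the most compute
 * units. Returns -1 if the list is empty. */
int pick_device(const device_info *devs, size_t n)
{
    int best = -1;
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (best < 0) {            /* first device seen so far */
            best = (int)i;
            continue;
        }
        const device_info *cur = &devs[best];
        int better_type = (devs[i].type == DEV_GPU && cur->type == DEV_CPU);
        int same_type = (devs[i].type == cur->type);
        if (better_type ||
            (same_type && devs[i].compute_units > cur->compute_units))
            best = (int)i;
    }
    return best;
}
```

On a machine that reports only a single CPU core, `pick_device` returns that core, so the program still runs. On a machine with a CPU and two GPUs it returns the larger GPU. The same binary handles both cases, which is the whole point.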
Everyone is pretty much convinced at this point that hybrid computing is going to play big in HPC. Like all things software, development tools are constantly behind the hardware advances. While OpenCL does not address off-node computation the way MPI does, it does provide a standard method to move forward with hybrid computing. The other good thing about it is that you can always grab a cheap video card and a free OpenCL SDK to see if it works for your codes. As with any new software model, your biggest investment is time and a few hundred cups of coffee.