Hiding the details of the multi-core and GP-GPU hardware is a really cool goal
In all my worrying about multi-core, I kind of forgot about the whole GP-GPU thing. I knew GP-GPUs were coming and I thought the idea was pretty interesting. I was not sure at the time, however, that GP-GPU processing was going to become mainstream. At present, I am convinced GP-GPU computing is going to become a staple of HPC. My rationale is simple: stream processors (the video card processors) are going to be everywhere in some form, and therefore they will get used. Just as HPC people don’t like to see cores sit idle, they won’t like to see the stream processors on the video hardware sit idle either.
As an example, consider low-end desktop motherboard chip-sets. If you use a GeForce 9400 chip-set for Intel Core processors, you get sixteen stream processors (540 MHz) in your box. If you choose the AMD 790GX, you have forty stream processors (700 MHz) at your disposal. Expect to see GP-GPU capable hardware anywhere there is a video port, which will be just about everywhere.
From a hardware standpoint, multi-core and GP-GPU hardware is a bargain: lots of cores for cheap. But there is the software issue. How does one program such a conglomeration? There is some good news, as the OpenCL language is on its way. OpenCL was initially developed by Apple and is now published by the Khronos Group as an open standard programming API for GP-GPU and multi-core hardware. It is based on the C language (C99), but adds some extensions to support parallel operations. At this point, if you want to play with OpenCL, you can download an x86 version from AMD as part of the ATI Stream Software Development Kit (SDK). Both NVIDIA and AMD/ATI have pledged support for the standard. An important aspect of OpenCL is the ability to choose different resources at run-time. That is, if an OpenCL application is designed correctly, it can probe for available hardware and adjust execution based on the current environment (i.e., run-time binaries can be portable across many different hardware platforms). OpenCL also supports the memory hierarchy often found on GP-GPUs. Because of its complexity, OpenCL is considered a low-level interface and not the best choice for novice programmers. There is a further catch: many HPC applications are already written in Fortran or C, and with a C-based interface only the C programs are possible candidates to port to a GP-GPU. Which is a nice segue into one thing I’m excited about.
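To make the "probe and adjust" idea concrete, here is a minimal sketch in plain C. It does not use the OpenCL API (a real program would query the run-time with calls such as clGetPlatformIDs and clGetDeviceIDs). Instead, the device struct, the score heuristic, and pick_device are all hypothetical stand-ins for what such a probe might report and how an application might act on it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one compute resource found at run-time. */
typedef enum { DEV_CPU, DEV_GPU } dev_kind;

typedef struct {
    dev_kind    kind;
    const char *name;
    int         compute_units;  /* cores or stream processors */
    int         clock_mhz;
} device;

/* Crude throughput guess: units times clock. This is an assumption made
 * for illustration; it ignores memory bandwidth, transfer cost, etc. */
static long score(const device *d)
{
    return (long)d->compute_units * d->clock_mhz;
}

/* Return the index of the highest-scoring device, or -1 if none exist. */
int pick_device(const device *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (best < 0 || score(&devs[i]) > score(&devs[best]))
            best = (int)i;
    return best;
}
```

The same binary would then run its kernels on whatever pick_device returns on a given box. With the chip-set figures quoted earlier, sixteen stream processors at 540 MHz score 8640 and forty at 700 MHz score 28000, so this toy heuristic would favor the latter. A real application would want a much better cost model.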
Anyone who has been in the HPC game should know about the Automatically Tuned Linear Algebra Software (ATLAS) libraries. This software project was developed by Jack Dongarra (the Top500 guy) and his crew at the Innovative Computing Laboratory at the University of Tennessee. ATLAS was needed because crafting optimized linear algebra routines for different processors was tedious (although both AMD and Intel provide hand-tuned libraries for their processors). The nice thing about ATLAS is that after running it on a target platform you end up with an optimized library. It has become so automated that the optimization process can be part of the rpm installation, although you may want to head down to the corner for a cup of coffee and a newspaper while it runs. In summary, ATLAS is a nice piece of work that solves a difficult problem.
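The general idea behind this kind of automatic tuning can be sketched in a few lines of C: try several variants of a kernel on the target machine, time each one, and keep the fastest. The sketch below is only an illustration of that idea, not ATLAS code. The matrix order N, the function names, and the choice of tile size as the only tuning knob are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define N 128  /* matrix order used for the trial runs (assumed) */

/* C = A * B for N x N row-major matrices, computed in bs x bs tiles. */
void matmul_blocked(const double *A, const double *B, double *C, int bs)
{
    assert(bs > 0);
    memset(C, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += bs)
        for (int kk = 0; kk < N; kk += bs)
            for (int jj = 0; jj < N; jj += bs)
                for (int i = ii; i < ii + bs && i < N; i++)
                    for (int k = kk; k < kk + bs && k < N; k++) {
                        double a = A[i * N + k];
                        for (int j = jj; j < jj + bs && j < N; j++)
                            C[i * N + j] += a * B[k * N + j];
                    }
}

/* Time each candidate tile size once and return the fastest one. */
int tune_block_size(const int *cands, size_t n)
{
    static double A[N * N], B[N * N], C[N * N];
    assert(cands != NULL && n > 0);
    for (int i = 0; i < N * N; i++) {
        A[i] = (double)(i % 7);
        B[i] = (double)(i % 5);
    }
    int best = cands[0];
    double best_t = -1.0;
    for (size_t c = 0; c < n; c++) {
        clock_t t0 = clock();
        matmul_blocked(A, B, C, cands[c]);
        double t = (double)(clock() - t0);
        if (best_t < 0.0 || t < best_t) {
            best_t = t;
            best = cands[c];
        }
    }
    return best;
}
```

Calling tune_block_size with candidates such as 8, 16, 32, and 64 returns whichever tile size ran fastest on the machine at hand, so the answer can differ from box to box. A serious tuner would repeat each timing several times and search many more parameters, which is where the coffee break comes from.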
Given the success of ATLAS, I was excited to hear that Dongarra and team are working on the multi-core/GP-GPU issue. They have recently released the first version of MAGMA, or Matrix Algebra on GPU and Multicore Architectures. As stated on the web page, the project’s goal is to develop innovative linear algebra algorithms and to incorporate them into a library that is similar to LAPACK in functionality, data storage, and interface, but targeting the next generation of highly parallel and heterogeneous processors.
For those who don’t know, LAPACK (Linear Algebra Package) is a set of widely used (and well written) subroutines for HPC. LAPACK builds on lower-level BLAS routines, and ATLAS can supply optimized versions of those. The goal of MAGMA is to allow LAPACK users to use subroutines optimized for simultaneous multi-core and GP-GPU execution. That is, as a user you don’t need to know the details of your underlying hardware, only that the LAPACK software is running optimally on it.
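To give a feel for what "similar to LAPACK in data storage and interface" means, here is a small reference solver in C. Its argument list loosely mirrors LAPACK's dgesv routine (column-major matrices, leading dimensions, a pivot array). The name ref_gesv is hypothetical, and unlike the Fortran routine it takes its sizes by value and returns the status code instead of writing it to an info argument. It is a plain Gaussian elimination with partial pivoting, and not how LAPACK or MAGMA implement it.

```c
#include <assert.h>
#include <stddef.h>

static double absd(double x) { return x < 0.0 ? -x : x; }

static void swapd(double *x, double *y) { double t = *x; *x = *y; *y = t; }

/* Solve A * X = B. A is n x n and B is n x nrhs, both column-major, so
 * element (i, j) of A lives at a[i + j * lda]. On return B holds X, A
 * holds the L and U factors, and ipiv holds 1-based pivot rows.
 * Returns 0 on success, or k > 0 if the k-th pivot is exactly zero. */
int ref_gesv(int n, int nrhs, double *a, int lda,
             int *ipiv, double *b, int ldb)
{
    assert(a != NULL && b != NULL && ipiv != NULL);
    assert(n >= 0 && nrhs >= 0 && lda >= n && ldb >= n);

    for (int k = 0; k < n; k++) {
        int p = k;  /* row with the largest entry in column k */
        for (int i = k + 1; i < n; i++)
            if (absd(a[i + k * lda]) > absd(a[p + k * lda]))
                p = i;
        ipiv[k] = p + 1;
        if (a[p + k * lda] == 0.0)
            return k + 1;
        if (p != k) {
            for (int j = 0; j < n; j++)
                swapd(&a[k + j * lda], &a[p + j * lda]);
            for (int j = 0; j < nrhs; j++)
                swapd(&b[k + j * ldb], &b[p + j * ldb]);
        }
        for (int i = k + 1; i < n; i++) {
            double m = a[i + k * lda] / a[k + k * lda];
            a[i + k * lda] = m;  /* keep the multiplier as part of L */
            for (int j = k + 1; j < n; j++)
                a[i + j * lda] -= m * a[k + j * lda];
            for (int j = 0; j < nrhs; j++)
                b[i + j * ldb] -= m * b[k + j * ldb];
        }
    }
    /* Back substitution with the upper triangle U. */
    for (int j = 0; j < nrhs; j++)
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i + j * ldb];
            for (int c = i + 1; c < n; c++)
                s -= a[i + c * lda] * b[c + j * ldb];
            b[i + j * ldb] = s / a[i + i * lda];
        }
    return 0;
}
```

The caller only sees the argument list. If the body were swapped for a tuned multi-core or GP-GPU implementation with the same arguments and the same storage layout, the calling program would not have to change, which is the attraction of keeping a LAPACK-like interface.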
There is no doubt the MAGMA project is tackling a difficult problem. I expect the results to take some time, but I am sure there will be great strides made with this project. As an end user, you can expect to benefit from this work in the future. And, as a member of the community, you can help the project right now. If you have the right hardware, why not pull down a version and play with it? Your feedback will be important to the MAGMA team and will help them build a better package. Don’t worry though, your hands won’t get burned touching this MAGMA unless you are holding your video card.