My yearly check-in with Intel's lead software director/evangelist covers compilers, AMD processors, new tools, benchmarks, and other developments that make ordering your bits a bit easier
As we are now officially in the holiday season and done with SC10, I can only think of one thing: next year’s SC11 in Seattle. Just kidding. I’m still digesting SC10 and still surprised we are heading back to Seattle next year. My understanding is that SC tries to alternate between the east and west coasts each year. Let’s see: SC10 in New Orleans, SC09 in Portland, SC08 in Austin, SC07 in Reno, SC06 in Tampa, SC05 in Seattle. I see a little bias toward the left side of the Mississippi River. Apart from Tampa, the last time SC came anywhere near the east coast and the Northeast Corridor was SC04 in Pittsburgh. And now we loop back to Seattle next year. I’m not even going to guess who Microsoft will foist on us for the keynote.
Back to New Orleans. On Monday of SC10, before all the hoopla began, I managed to have lunch with James Reinders of Intel. James’s title is “Director, Evangelist, Intel Software.” He has also written quite a lot about parallel computing and blogs about many of the major software advances coming out of Intel. Each year I check in with him to see what is new and interesting.
Before we got to the new developments, I had a question I wanted to ask. I often talk with end users, and there has been some debate about using the Intel compilers (C/C++, Fortran) on non-Intel platforms (i.e., AMD). Some suggest that Intel is biased against AMD hardware and probably does not optimize code as well for it as it does for its own processors. Indeed, many recommend the AMD Open64 Compiler Suite for this reason. (AMD Open64 is freely available.) That would make sense, if it were true. I asked Jim flat out whether this rumor had any basis in fact. His response was something to the effect of, “We work to win business by striving to offer the best performance of any compiler or library; please let us know if you find we do not.” Then he showed me some numbers.
According to those numbers, the new Intel compilers (version 12.0) were 1.4 times faster on an AMD Opteron than the best of Portland Group and Microsoft Visual Studio on the specfp*_base2006 (C/C++) benchmark. Results for specint*_base2006 were 1.1 times better than the best of the other compilers. The tests were run on Windows, so I won’t make any claims about Linux performance, except to say that, other than managing memory, the OS does not have a big influence on floating point performance. The takeaway: based on these results, there is no penalty for running the Intel compilers on AMD hardware. And if you find the Intel compilers on the losing end of your AMD (or, for that matter, Intel) processor benchmark, Intel wants to know about it. By the way, the new 12.0 Intel compilers show improved results on Intel processors as well.
Moving on, I want to highlight some of the new features that may interest HPC users, but first I’ll note some of the big-picture topics we talked about. Of course, one of the recent developments in HPC has been the use of GP-GPUs to accelerate Single Instruction Multiple Data (SIMD) programs. Jim and I share the opinion that parallelism needs to be approached at a higher level of abstraction than many of the tools currently in use provide. One such approach, newly offered by Intel, is Intel Cilk Plus, which is based on the Cilk (pronounced like “silk”) project from M.I.T. Aside from a rather elegant parallelization model, Cilk does something any respectable parallel language must do: it makes parallel execution decisions at run-time. If you are wondering why people care about abstraction level, remember this: a low abstraction level means higher software costs, and a high abstraction level means lower software costs. Ask anyone who writes all their software in assembly language about that concept.
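To make the run-time point a little more concrete, here is a minimal C sketch of the Cilk fork-join style, using the classic recursive Fibonacci example. It assumes a compiler with Cilk Plus support that provides `<cilk/cilk.h>` and defines the `__cilk` macro; on any other C compiler the keywords are defined away, leaving the "serial elision," an ordinary serial program that computes the same result. The key idea is that `cilk_spawn` only permits the spawned call to run in parallel; Cilk’s work-stealing runtime decides at run-time whether it actually does.

```c
#include <assert.h>

/* Assumption: a Cilk Plus compiler defines __cilk and ships <cilk/cilk.h>.
   Otherwise, define the keywords away so the same source builds as its
   serial elision (a plain C program with identical results). */
#ifdef __cilk
#include <cilk/cilk.h>
#else
#define cilk_spawn
#define cilk_sync
#endif

/* Recursive Fibonacci in fork-join style. */
long fib(int n) {
    assert(n >= 0);
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1); /* may run in parallel with the next line */
    long y = fib(n - 2);            /* the parent keeps working meanwhile */
    cilk_sync;                      /* wait for the spawned call before using x */
    return x + y;
}
```

For example, `fib(20)` returns 6765 with or without Cilk. Real code would usually switch to a plain serial loop below some cutoff, since spawning tiny tasks costs more than it saves.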
There was also some repackaging of all the tools. While I am not one to try to explain the marketing reasons for such a change, there are now two major bundles, or suites. The first is Intel Parallel Studio XE, which includes Intel Composer XE (compilers, MKL, and other libraries), Intel VTune Amplifier XE, and Intel Inspector XE. The second is Intel Cluster Studio, which includes Intel Composer XE (compilers, MKL, and other libraries), Intel MPI (which now scales beyond 50K cores), and Intel Trace Analyzer and Collector. All components are also available as separate products. There is also an Intel Parallel Studio (no “XE,” for extreme edition) for Microsoft Visual Studio. Check the product site for more information.
On to the news you might find most interesting: Intel Fortran Compiler XE 12.0 now has complete support for the Fortran 2003 standard and some support for the Fortran 2008 standard, including Co-Array Fortran. The compiler also adds vector optimizations for Intel AVX and help with auto-parallelization. Good stuff.
One other interesting development is frame analysis in Intel VTune Amplifier XE. A frame is a repeatable unit of the program marked by a special API call. Using frames allows one to sort and filter execution information. Rather than focusing on function and call stack traces right away, frames let you focus on the individual repetitions of the framed region that run slowly. Frame analysis was inspired by the needs of video applications, such as games, that must meet a latency target for each video frame.
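The frame idea can be sketched in plain C. This is not the VTune API, which has its own instrumentation calls; it is a hypothetical stand-in in which `run_frames` times each repetition of a unit of work, and `slow_frames` filters out the frames that miss a latency target and sorts them worst first. The names `frame_record`, `run_frames`, and `slow_frames` are illustrative only, and `clock()` measures processor time rather than wall-clock time, which is an assumption that only suits CPU-bound work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical per-frame record; illustrative only, not an Intel API. */
struct frame_record {
    size_t index; /* which repetition (frame) this was */
    double ms;    /* how long it took, in milliseconds */
};

/* Run `work` n times, treating each call as one frame, and record each
   frame's duration. clock() reports processor time, not wall-clock time. */
void run_frames(void (*work)(size_t frame), size_t n, double *durations_ms) {
    for (size_t i = 0; i < n; i++) {
        clock_t start = clock(); /* frame begins */
        work(i);
        clock_t end = clock();   /* frame ends */
        durations_ms[i] = (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
    }
}

/* qsort comparator: longer frames sort first. */
static int by_ms_desc(const void *a, const void *b) {
    double x = ((const struct frame_record *)a)->ms;
    double y = ((const struct frame_record *)b)->ms;
    return (x < y) - (x > y);
}

/* Filter: copy frames slower than target_ms into out (capacity >= n),
   then sort them worst first. Returns the number of slow frames. */
size_t slow_frames(const double *durations_ms, size_t n, double target_ms,
                   struct frame_record *out) {
    assert(durations_ms != NULL && out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (durations_ms[i] > target_ms) {
            out[count].index = i;
            out[count].ms = durations_ms[i];
            count++;
        }
    }
    qsort(out, count, sizeof out[0], by_ms_desc);
    return count;
}
```

For example, with frame times of 10, 20, 15, 40, and 16 ms and a 16 ms target, `slow_frames` reports frame 3 (40 ms) and then frame 1 (20 ms); the 16 ms frame exactly meets the target and is not flagged.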
In closing, I should mention that we had lunch at Emeril’s in New Orleans. You know, the “BAM!” guy. Unfortunately, I was not all that hungry and had a standard salad. It was good, but I probably missed my chance at a crawdad casserole XE (extreme edition) or some other New Orleans treat.