May’s Law and Parallel Software

This little-known "law" is a corollary to the more famous Moore's Law of semiconductor growth.

My last column created some interest outside of the Linux Magazine domain. In addition to being accused of shilling ARM processors, I heard from those who thought my prediction of ARM-based supercomputers quite absurd. Of course, I have been wrong before, but not this time.

How can I be so sure? Let’s take a look at some evidence. Consider Exhibit A: take a look at this ARM patch entry on the Open MPI developers list. I have it on good authority that the ARM patch was rather complex and needed no modification. Either my rant about cell phone clusters is more truth than fiction, or someone out there is interested in using ARM cores to do HPC.

Moving on to Exhibit B, I give you NVIDIA’s Project Denver. The marriage of a CPU and a GP-GPU makes a lot of sense; AMD seems to think so. From an HPC point of view, it is really a processor and an array processor. The array processor (like a math co-processor) does the numerical heavy lifting for the CPU. In many cases, scientific codes are dominated by array operations, and the “host processor” does more bookkeeping, set-up, and tear-down than actual number crunching. In addition, the new Fusion-type processors will have a shared-memory design between the CPU and GPU, so there will be no need to move data on and off the GPU over the PCIe bus.

Still not convinced? Have a look at the Marvell ARMADA XP processor. It has up to four ARM cores running at 1.6 GHz with 2 MB of L2 cache and four GigE MACs, and it supports a 64-bit DDR2/DDR3 ECC memory interface. Finally, last fall ARM announced the Cortex-A15 MPCore, which can have one to eight cores and a total address space of one terabyte. These are not phone or pad processors.

All these ARM designs give me visions of small “cell phone” sized modules with 16-32 cores, 32 GBytes of memory, and maybe an SSD, plugged into a high-speed backbone network. Perhaps SiCortex was on to something: a pile of low-power processors, a fast interconnect, and a good software stack.

What about programming? How will these devices be programmed? In particular, will you code a multi-core/GPU phone like a cluster node? And, more importantly, what tools will you use?

This is where all the hardware excitement meets the cold reality of parallel programming. We are all familiar with “Moore’s Law” (transistor density doubles every two years). Many people have probably not heard of “May’s Law” (for the purist, you can substitute “trend” for “law”). In any case, David May states his law as follows: “Software efficiency halves every 18 months, compensating for Moore’s Law.”

Think about it. Every new generation of hardware introduces some new form of hardware optimization. Usually these optimizations can be handled by the compiler, which has become quite complex as a result. Compilers hit a wall with parallel computing, however. When more cores started showing up, due to Moore’s Law and some laws of physics, writing software got more complex. When GP-GPUs started showing up, programming became much more difficult (i.e., it takes more work to get your problem to run efficiently on the new hardware).

For example, consider a matrix multiplication code. In almost all cases, writing a parallel matrix multiplication expands the size of the program code. Look at any OpenMP, MPI, CUDA, or OpenCL version and compare it to the serial counterpart. The least explosive, and the most restrictive, is OpenMP, which tries to help the compiler by providing directives. Other methods tend to expand the code much further, like using assembly language instead of C or Fortran. Also, I want to be clear: I fully support the use of OpenMP, MPI, CUDA, OpenCL, etc. I just wish parallel programming were easier.
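To make the comparison concrete, here is a minimal sketch of the OpenMP case (the matrix size, layout, and function name are my own choices, purely for illustration):

    #define N 512

    /* c = a * b for N x N matrices stored in row-major order.
     * The single pragma below is the entire parallel "change";
     * remove it and this is the plain serial version.
     * Build with an OpenMP-capable compiler, e.g. gcc -fopenmp. */
    void matmul(const double *a, const double *b, double *c)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += a[i * N + k] * b[k * N + j];
                c[i * N + j] = sum;
            }
        }
    }

Compare that one-line change with an MPI version, which needs explicit decomposition and message passing, or a CUDA/OpenCL version, which needs kernels, memory transfers, and launch code.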

That software always trails hardware is well known in this business. That new hardware complicates software development is less talked about. In the case of parallel computing, which is a difficult nut to crack, software tools are slow to emerge (if they emerge at all) and are getting more complex.

Are there any alternatives? Yes, but there is no clear path forward. First, let me mention that I am a big fan of functional languages like Erlang and Haskell; I have written about this concept in the past. I also like the work the Portland Group is doing with its PGI Accelerator compilers, which allow OpenMP-like compiler directives to be added to existing code; currently they support C and Fortran on CUDA-based hardware. Intel is promoting Cilk Plus (pronounced “silk”) as a solution. It is based on the MIT Cilk project. Cilk has some nice features, and it includes a runtime system that takes care of details like load balancing, synchronization, and communication between cores. It works by augmenting existing C/C++ code with just three new keywords.
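As a taste of how small that change is, here is a minimal Cilk Plus sketch using the three keywords (the toy functions are my own; it assumes a Cilk Plus-capable compiler such as Intel’s):

    #include <cilk/cilk.h>

    /* Divide-and-conquer Fibonacci: cilk_spawn lets the first recursive
     * call run in parallel; cilk_sync waits for it. The runtime handles
     * load balancing across the cores. */
    long fib(int n)
    {
        if (n < 2)
            return n;
        long x = cilk_spawn fib(n - 1);
        long y = fib(n - 2);
        cilk_sync;
        return x + y;
    }

    /* The third keyword, cilk_for, parallelizes an ordinary loop. */
    void scale(double *v, double s, int n)
    {
        cilk_for (int i = 0; i < n; i++)
            v[i] *= s;
    }

Strip out the keywords (and turn cilk_for back into a plain for) and you are back to the serial code, which is exactly the appeal.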

As software progress crawls along, I am convinced that future large-scale HPC applications will include dynamic, fault-tolerant runtime systems. The user needs to be lifted away from low-level responsibility so they can focus on the application and not on the complexity of the next hardware advance.

Douglas Eadline is the Senior HPC Editor for Linux Magazine.

Comments on "May’s Law and Parallel Software"

coolul007

Unfortunately, MPI and others want to start a religion rather than solve a problem. Yes, communication is the key to all HPC. Creating a communication interface with more instructions than the host language does not endear one to writing programs. Whatever happened to “send and receive”?

    markhahn

    If your code only uses send/recv, it’s either trivial or has reimplemented some of the higher-order communication forms that MPI provides. Reimplementing is not the greatest sin, but it often results in poorer versions. For instance, do your collectives scale as well as MPI’s? Do they take advantage of hardware support that the interconnect may provide?
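    To illustrate the point, here is a minimal sketch (function names made up) of a hand-rolled global sum next to the equivalent collective:

        #include <mpi.h>

        /* Hand-rolled reduction: rank 0 receives one value from each of
         * the other ranks in turn -- P-1 sequential steps. */
        double manual_sum(double local, int rank, int size)
        {
            double total = local;
            if (rank == 0) {
                for (int src = 1; src < size; src++) {
                    double v;
                    MPI_Recv(&v, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                             MPI_STATUS_IGNORE);
                    total += v;
                }
            } else {
                MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
            return total;
        }

        /* The collective: one call, typically tree-based, and free to use
         * whatever reduction support the interconnect hardware offers. */
        double collective_sum(double local)
        {
            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            return total;
        }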

zakhurlifesbane

Gee, Erlang is parallel and fault-tolerant. I am thinking some hardware needs to “back off” in terms of how it manages cache, and that processors need dedicated roles, in the sense that some (like GPUs) do computation and others do logic, setup, cleanup, AND cache management. Hardware cache management is optimized for linear computing and not always for that. Put it back under program control (well, sort of: like setting up DMAs and defining limit addresses for incoming, outgoing, and working cache, with roll-in/roll-out of each). CPU cache is many times faster than main memory and needs good management.

HPC will use all of those, but we are still a long way from any really viable AI except in limited areas. Many of those require breakthroughs in other areas of understanding, such as speech-to-text, where hardware can hear a lot more than the human ear and we need to focus on what makes speech meaningful.

linus_cal

Great post, Deadline… your prediction couldn’t have come at a better time.
Let the wars begin…
http://blogs.forbes.com/briancaulfield/2011/03/17/intel-snaps-up-mobile-multimedia-specialist-silicon-hive/

wgodoy

I think the author of this article needs to take a look at Java threads; they are not even mentioned. Java threads (which are not hard to program) combined with MPI give a boost to HPC and more control over your code than OpenMP.

Jonathan May

David May would have been a good person to interview about this! I am sure he has quite a strong view – particularly given your very specific interpretation of his law.

David very publicly takes the view that parallel programming, concurrency, real-time software, and event-driven programming are all easy: simple problems to solve given the right tools. In fact, much of his commercial work (the transputer, Occam, XMOS, etc.) and his academic work has been about making highly efficient, easy-to-use concurrent devices, compilers, and languages.

Can I suggest a follow-up article where you interview him? I can arrange an introduction if you like.

    pavlik

    Surely, why not an Occam revival? Great idea, great language! Ahead of his time?


    Agreed, Jonathan. Perhaps he and Carl Sassenrath, CTO of REBOL Technologies and lead Amiga OS tech, can talk now that REBOL is going open source: http://rebol.com/
    “Fighting software complexity…
    Software systems have become too complex, layers upon layers of complexity, each more brittle and vulnerable to failure. In the end software becomes the problem, not the solution. We rebel against such complexity, fighting back with the most powerful tool available: language itself.”

    Given that this article is now over 12 months old, perhaps Douglas and Linux Mag might like to get these two long-time legends together for a retrospective on how they can help the new massive ARM Linux commercial vendors do it right and give the world a new direction in KISS.


