Cloud and multi-core offer new modes of High Performance Computing. Will they suit your needs?
I’ll get back to my coverage of R real soon, but I wanted to continue my thoughts on Cloud HPC. In addition, one of the reasons I need to postpone that article again is that my small personal Limulus cluster had to be taken apart, measured, checked, and reassembled. I use this cluster to try things (like R) and to develop software. I am working with a sheet metal fabricator on the next (and final) revision of case modifications. I also installed a new kernel that caused some USB issues. I resolved the issue by using a different cable, but the old kernel still works fine with both cables. Go figure. Without USB I cannot control the power to the nodes (unless I rewire some things), so it was slow going for a while. In any case, I had more thoughts about Cloud HPC as well.
First, some equal time. After last week’s column on the new Amazon Cluster EC2 offering, I received a note from Adaptive Computing (previously Cluster Resources), best known for their Moab and Torque packages, announcing support for EC2 Cluster instances. It seems a cluster in the Cloud may not be too far off for many people. Which brings me to my latest prognostication. I propose there may be another type of split in the HPC world, but first a little background on the first possible “split.”
In a previous column, I suggested that with increased core counts HPC programming models would diverge. This split is due to the inability of many HPC applications to scale above 32 cores and the ability of modern processors to put 48 cores in a single SMP domain. I assumed that a thread-based model (OpenMP) would be adopted by many “small” applications and that a distributed model based on the current message passing (MPI) methodology would continue to be used on clusters. I even set up a poll that asks, “How would you use a 48 core PC (4P with 12 core Magny-Cours) next to your desk?” Interestingly, 40% of the 99 respondents said they would use this machine to run all their applications (parallel and sequential) and 25% said they would use a combination of this machine and a cluster to run their applications.
These results correspond to a previous poll about the scalability of applications, where roughly 55% of 175 respondents said they use fewer than 32 cores. IDC has reported that 57% of all HPC applications/users surveyed use 32 processors (cores) or less. Thus, as I suggested previously, we could be in for a change in the way HPC applications are written: threaded (OpenMP) for low core counts, distributed (MPI) for large core counts (although MPI will work well in both cases).
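The rule of thumb above can be written down as a few lines of Python. This is only a sketch of the argument, not a sizing tool: the function name `suggest_model` is made up for illustration, and the default limit of 48 cores is simply the single SMP domain size mentioned earlier.

```python
def suggest_model(cores_needed: int, smp_cores: int = 48) -> str:
    """Suggest a programming model from the number of cores a job needs.

    smp_cores is an assumption: the largest core count available in one
    SMP domain (48 for the 4P Magny-Cours machine discussed above).
    """
    if cores_needed < 1:
        raise ValueError("cores_needed must be at least 1")
    if cores_needed <= smp_cores:
        # The job fits in one shared-memory machine, so threads are an
        # option. MPI would also work here, as noted in the text.
        return "threaded (OpenMP)"
    # The job spans more than one machine, so it needs message passing.
    return "distributed (MPI)"


if __name__ == "__main__":
    for cores in (8, 32, 48, 256):
        print(cores, "->", suggest_model(cores))
```

The point of the sketch is that the decision hinges on one number, the size of the SMP domain, and that number keeps growing.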
If we were to assume half of the cluster users may go away due to multi-core, then what happens to the rest of the market? Note, I don’t think this will actually happen, because there are issues of support, power, cooling, utilization, etc. that get factored into desk-side systems, but for the sake of provocative journalism, let’s assume a large portion of the HPC market migrates to SMP systems. What about the other 50% of the market? According to my little survey, 25% of respondents would use both a desk-side system and a cluster. Enter Cloud HPC.
An HPC Cloud option could be very attractive to this dual use population. Consider that this user could test, develop, and run applications “desk side” and when they need more cycles, off-load jobs to their local cluster, to a cluster in the Cloud, or to both. Here is where it gets interesting. If Cloud HPC works for your application(s), it is almost always going to be cheaper and easier than buying a new cluster.
Indeed, one of the issues that still plagues the market is cluster administration and support. I often get inquiries asking if I know of competent cluster administrators. This shortage is often considered a “hold back” to market growth. Procuring and running a local cluster requires people, power, cooling, space, and time. The total cost of ownership for a cluster can be several times the hardware cost. Cloud HPC reduces many of these costs and removes the need to find qualified people to administer a cluster.
Getting back to my little Limulus machine. Right now it has 10 cores and very soon will be upgraded to 16. I could very easily test, develop, and run applications locally that I want to scale up to large numbers of nodes. Once the applications are ready to run, I could submit jobs to the Cloud using the same scheduler that I use to run jobs on my local system (in my case Sun Grid Engine). Because I had a platform to test and run codes before I ran them on a larger number of cores, I could make better use of my Cloud cycles (i.e., I would not need the Cloud to debug or test codes). Having a local machine allows me to “own the reset switch,” as it were. The same could be said for 48 (or 24) core desk-side SMP machines. My local desk-side HPC machine has become a portal to the HPC Cloud.
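Here is a rough sketch of that “portal” idea in Python. Everything in it is hypothetical: the `Resource` class, the `choose_resource` function, and the 128-core Cloud figure are made up for illustration, and it does not call Sun Grid Engine or any real scheduler. It only shows the policy of staying local when the job fits and spilling over to external cycles when it does not.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Resource:
    name: str
    cores: int


def choose_resource(
    cores_needed: int, resources: Sequence[Resource]
) -> Optional[Resource]:
    """Return the first resource, in preference order, that has enough cores.

    Returns None when no listed resource is large enough.
    """
    if cores_needed < 1:
        raise ValueError("cores_needed must be at least 1")
    for resource in resources:
        if resource.cores >= cores_needed:
            return resource
    return None


# Preference order: the desk-side machine first, then external cycles.
PREFERENCE = [
    Resource("desk-side", 16),  # the upgraded Limulus core count
    Resource("cloud", 128),     # assumed size, for illustration only
]

if __name__ == "__main__":
    for cores in (8, 64, 256):
        target = choose_resource(cores, PREFERENCE)
        print(cores, "->", target.name if target else "nothing big enough")
```

A local cluster would simply be another entry in the list, placed wherever it belongs in the preference order.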
And, finally, where does this leave the local data center clusters? There are many cases where local clusters make sense, including security, privacy, reliability, and performance. For instance, the current EC2 offering from Amazon is based on 10 GigE, which may not be adequate for many users who require InfiniBand performance. I like to think that Cloud HPC will augment and expand what is already there.
The end result is a lower cost of entry for HPC, which allows HPC use to increase. Based on these trends, there may be three “modes of HPC” emerging. The first will be the traditional local cluster with which we are all familiar. The second may be the desk-side SMP or cluster systems. These systems will handle small jobs, fewer than 32 cores, and may include both OpenMP and MPI applications. The third mode could be the desk-side/Cloud users. These users will employ their desk-side system as a primary HPC resource and expand to use external (Cloud or local cluster) cycles as needed. Either way, HPC is getting closer to the desktop.