To the casual observer, cloud computing often looks like grid computing, but from an HPC perspective it is not even close.
A while back, I read “Cloud computing is grid, but easier.” Maybe. I believe cloud computing is different and not all that close to grid. In addition, I believe the challenges presented by creating grids coupled with virtualization actually opened the door for cloud computing. I’ll back up my arguments in a minute, but first I invite you to take a look at a grid article from 2004 written by Ian Foster — the architect of the grid concept.
So what is a grid? Foster defined three criteria for a grid:
- A grid must coordinate resources that are not subject to centralized control.
- A grid must use standard, open, general-purpose protocols and interfaces.
- A grid must deliver nontrivial qualities of service (e.g., relating to response time, throughput, availability, and security) for co-allocating multiple resource types to meet complex user demands.
Sounds great in principle. In practice, however, it can be rather hard to do. At one point there was the promise of ubiquitous grid computing: compute cycles would be delivered like a standard utility. If one needed some cycles, one would just “pull” them from the grid like electricity. That general usage mode never really materialized. Instead, more specialized grids developed. There are some successful and noteworthy grids, for example TeraGrid, Open Science Grid, and LHC Grid.
In particular, the physics grids, which are often built around grand experiments like the LHC (Large Hadron Collider), allow almost instantaneous worldwide transmittal and processing of results. Indeed, the LHC data processing system is a worldwide grid. Even as the technology matured, however, true general-purpose utility computing remained out of reach. The most successful grids, like the science grids, tend to be specialized or support a focused application set.
The difficulty in implementing a utility type grid is largely due to managing the details of the execution environments.
Grid works at the “library level”, where users are provided a known “environment” in which their applications should be able to run. Because grids are “open”, the variations in end points can be significant and small details matter. For example, ensuring consistent software libraries across hardware domains can present problems. Other aspects such as administrative domains, security, and data transfer are part of the grid environment as well.
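To make the “small details” point a bit more concrete, here is a minimal Python sketch of the kind of check a grid operator might run to spot library differences between end points. Everything in it is an assumption for illustration: the function name `find_mismatches`, the site names, and the library versions are all made up, and a real grid would gather this information through its own information services rather than a hand-written dictionary.

```python
from collections import defaultdict


def find_mismatches(sites):
    """Report libraries whose version differs, or that are missing, across sites.

    `sites` maps a (hypothetical) site name to a dict of {library: version}.
    Returns {library: {version_or_None: [site, ...]}}, where None means the
    library is not installed at those sites.
    """
    versions = defaultdict(lambda: defaultdict(list))

    # Record which sites carry which version of each library.
    for site, libs in sites.items():
        for lib, ver in libs.items():
            versions[lib][ver].append(site)

    # Record sites where a library seen elsewhere is missing entirely.
    all_libs = set(versions)
    for site, libs in sites.items():
        for lib in all_libs - set(libs):
            versions[lib][None].append(site)

    # Keep only libraries that are not identical everywhere.
    return {
        lib: {ver: sorted(names) for ver, names in by_ver.items()}
        for lib, by_ver in versions.items()
        if len(by_ver) > 1
    }


if __name__ == "__main__":
    # Made-up inventory for three imaginary sites.
    inventory = {
        "site_a": {"libfftw": "3.2", "openmpi": "1.3"},
        "site_b": {"libfftw": "3.2", "openmpi": "1.2"},
        "site_c": {"libfftw": "3.2"},
    }
    for lib, by_ver in find_mismatches(inventory).items():
        print(lib, by_ver)
```

Even in this toy inventory, one library out of two is inconsistent across three sites. Multiply that by dozens of libraries and administrative domains and you can see where the effort goes.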
Enter the cloud. In a sense, cloud computing offers what grid cannot: a predictable execution environment. Thanks to virtualization, the exact execution environment can be created and cloned in the cloud. Grid attempts to link geographically distributed hardware, each with its own unique execution environment. Obviously, attempts are made to create uniform execution environments, but a large part of grid software is devoted to publishing these environments so that domains may interact. A cloud, on the other hand, is an expandable hardware platform on which a virtual environment is created on demand.
There are other differences, but I trust you get the idea. In terms of HPC, I think clouds have unintentionally done some “sleight of hand” when one looks at the parallel computing performance picture. Where grids paid attention to certain HPC performance guarantees, clouds, in order to be easy to use, have declined to offer such guarantees. In particular, HPC requires a predictable and guaranteed level of I/O, both for storage and compute traffic. Unless a cloud has been specifically designed for HPC, the user cannot expect consistent or high performance. There are two papers which discuss this very idea. The first paper looks at “Benchmarking Amazon EC2 for High-performance Scientific Computing” and the second paper asks, “Can Cloud Computing Reach The TOP500?” While you are at it, check out our take on HPC in the Cloud.
As you read the preceding paragraphs, you may think I am not in favor of cloud computing. On the contrary, I believe it is an interesting model, but it is not a solution for HPC (in its current form). And, if you build an HPC cloud, it would start to look more and more like a cluster that supports virtualization on the nodes, which comes with its own performance issues (I/O and memory virtualization). If you have many smallish HPC jobs (i.e., those that could run on an 8 core node) then clouds may be a possibility, but like anything HPC, at the end of the day it is about price to performance. My issue with clouds is that they are often categorized as “grid like” and are then somehow (incorrectly) considered “HPC like.” Cloud offers utility computing like grid promised, but has pushed the application layer further away from the hardware. HPC practitioners spend a lot of time making sure the application is as close to the hardware as possible.
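The price-to-performance argument is simple arithmetic, and a small Python sketch may help. The function names and every number below are assumptions made up for illustration, not measurements or real prices. The idea is that a cheaper node-hour only wins if the virtualization and I/O slowdown does not eat the savings.

```python
def cost_per_job(price_per_node_hour, nodes, runtime_hours):
    """Total cost of one job: price x node count x wall-clock hours."""
    return price_per_node_hour * nodes * runtime_hours


def break_even_slowdown(cloud_price, cluster_price):
    """Runtime multiplier at which the cloud job costs the same as the
    cluster job, assuming the same node count on both sides."""
    return cluster_price / cloud_price


if __name__ == "__main__":
    # Hypothetical prices per node-hour and a hypothetical 10 hour job
    # that fits on a single 8 core node.
    cluster_price = 0.50
    cloud_price = 0.25
    cluster_hours = 10.0

    limit = break_even_slowdown(cloud_price, cluster_price)
    print("cloud may run up to", limit, "times slower before it costs more")

    for slowdown in (1.0, 1.5, 3.0):
        cloud_cost = cost_per_job(cloud_price, 1, cluster_hours * slowdown)
        cluster_cost = cost_per_job(cluster_price, 1, cluster_hours)
        print(slowdown, cloud_cost, cluster_cost)
```

For a single-node job the slowdown is often modest, which is why smallish jobs are the plausible cloud candidates. For tightly coupled multi-node jobs the slowdown is exactly the unpredictable part.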
Before I sign off for this week, it should be noted that there are some HPC applications that do not require predictable I/O. SETI@home and Folding@home are two good examples. These applications could easily run in a cloud (in a sense they do run in the Internet cloud). Keep in mind they have been designed to work in a robust distributed fashion and are not virtualized. Clouds can be enticing and even enabling for some applications, but remember a collection of nodes does not a cluster make.
PS I am still Twittering. So far the only value I see is the ability to shoot a news story or tech update out to my crew of 23 “followers”. I decided that when I hit 50 followers, I’ll give away a free classic “Beowulf Underground” T-shirt to one of the flock.