HPC without coding in MPI is possible, but only if your problem fits into one of several high-level frameworks.
Most people know Cellular Automata (CA) from the Game of Life, developed by mathematician John Conway at Cambridge in the late 1960s and popularized by Martin Gardner in Scientific American. The concept is quite simple: a grid of cells, each with a state. The state of a cell is determined by a set of rules whose inputs are the states of neighboring cells. Each generation represents one application of the rules to all cells. The Game of Life uses a simple set of rules and helps illustrate some basic ideas behind CA. With the right set of rules, a CA can exhibit complex behavior. This complexity from simplicity is what makes CA so interesting.
To illustrate a CA, the rules for Life are given below. Consider a square 2D grid (although CAs have also been run on triangular and hexagonal grids, and in one and three dimensions). In the Game of Life, a cell is either alive (on state) or dead (off state). An initial pattern is usually placed on the grid and then the automaton is started. Each generation represents one pass over the entire grid.
- If a space is populated (on state):
  - Any cell with one or zero live neighbors dies (death from loneliness).
  - Any cell with four or more live neighbors dies (death from overpopulation).
  - Any cell with two or three live neighbors survives.
- If a space is unpopulated (off state):
  - Any empty cell with exactly three live neighbors becomes populated (birth).
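The rules above fit in a few lines of code. Here is a minimal sketch of one Life generation in Python; the grid size, the `step` function name, and the "blinker" test pattern are illustrative choices, and edges are simply treated as dead cells.

```python
# A minimal sketch of one generation of Conway's Life.
# Cells outside the grid are treated as dead (no wrap-around).

def step(grid):
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # count the eight neighbors, clipping at the grid edges
            live = sum(grid[rr][cc]
                       for rr in range(max(0, r - 1), min(rows, r + 2))
                       for cc in range(max(0, c - 1), min(cols, c + 2))
                       if (rr, cc) != (r, c))
            if grid[r][c]:  # survival with two or three live neighbors
                new[r][c] = 1 if live in (2, 3) else 0
            else:           # birth with exactly three live neighbors
                new[r][c] = 1 if live == 3 else 0
    return new

# A "blinker": three cells in a row oscillate between a horizontal
# and a vertical bar every generation.
grid = [[0, 0, 0],
        [1, 1, 1],
        [0, 0, 0]]
grid = step(grid)
print(grid)  # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Running `step` twice brings the blinker back to its starting position, which is the kind of small, checkable behavior that makes Life a good first CA experiment.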
If you want to play with Life online, you can go to sites like this one and enter patterns to see what happens. One thing to note is that small variations in patterns can cause huge changes in outcomes.
A popular and controversial book by Stephen Wolfram called A New Kind of Science was mostly about the properties and possibilities of CAs. In general, CA works well when you want to model complex things with lots of little parts. Of course there are other ways to model complex things, but CA has the advantage that it starts out with “simple rules” and generates complex results. There are CAs for fluid flow, ecosystems, populations, environmental systems, urban growth, and even traffic, among other areas. CAs are often used to study “emergent systems,” that is, systems that start out simple but then self-organize.
As you can imagine, a CA can be executed in parallel. There are really no restrictions on how far a neighbor interaction can reach, but in general most interactions are very local, i.e., they take place between close neighbors. This type of interaction keeps the dataflow local and thus makes parallel execution possible. If the grid is broken into parts, only the boundaries need to communicate. Like Genetic Algorithms, CAs are naturally parallel and can take advantage of the cheap and plentiful parallel computing cycles that clusters offer. There are even parallel CA programming languages like CAOS (which stands for Cells, Agents and Observers for Simulation).
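The "only the boundaries need to communicate" idea can be sketched in a few lines. Here the grid is split into horizontal strips, and each strip is padded with copies of its neighbors' edge rows (often called ghost or halo rows). The function names and worker count are illustrative; on a real cluster the copies would travel as MPI messages rather than Python list assignments.

```python
# Sketch of domain decomposition for a parallel CA: split the grid
# into strips, then exchange only the one-row boundaries.

def split_into_strips(grid, nworkers):
    rows = len(grid)
    size = rows // nworkers
    return [grid[i * size:(i + 1) * size] for i in range(nworkers)]

def exchange_boundaries(strips):
    # Each strip gets a copy of its neighbors' edge rows (ghost rows),
    # so a full generation can then be computed with no further
    # communication. Missing neighbors are treated as dead cells.
    padded = []
    for i, strip in enumerate(strips):
        width = len(strip[0])
        top = strips[i - 1][-1] if i > 0 else [0] * width
        bottom = strips[i + 1][0] if i < len(strips) - 1 else [0] * width
        padded.append([top] + strip + [bottom])
    return padded

grid = [[0] * 4 for _ in range(8)]
grid[3] = [1, 1, 1, 1]          # a live row right at a strip boundary
strips = split_into_strips(grid, 2)
padded = exchange_boundaries(strips)
print(padded[1][0])  # ghost row seen by the second worker: [1, 1, 1, 1]
```

Per generation, each worker sends and receives only two rows regardless of how tall its strip is, which is why local rules make CAs scale so well.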
There is a certain appeal to using CA as a tool to simulate large systems. Indeed, CAs can even produce pseudo-randomness as part of their behavior. Cellular automata provide an alternative method to study the behavior of dynamical systems. Traditionally such systems are studied with differential equations. While differential equations are the workhorse of such systems, difficulties arise from non-linear behavior, round-off errors, and variations in initial conditions. Because CAs are “simple rules,” they are easier to analyze than differential equations and can tolerate modifications without affecting the stability of the analysis.
Keep the CA approach in mind when you have some extra cycles and want to play with a new idea. For instance, what if you tried to simulate a cluster queuing algorithm? You never know what may happen. The often-noted problem with CAs, which is similar to Genetic Algorithms, is that you can find something that works, but you are not sure why. If you want to find more information about CAs, have a look at a 2003 Survey of Cellular Automata or take a gander at the old but still useful Cellular Automata FAQ.
Our last sledgehammer approach to HPC is currently quite popular. If you have searched for anything on the web, you have probably used a MapReduce algorithm. The idea is basically the same thing *nix users have been doing for years, but MapReduce does it on a much larger scale. A good example is given by Steve Staso of Sun Microsystems, Inc. in a Hadoop Primer. (Hadoop is a free implementation of MapReduce; see below.) Consider the following “piped” set of commands:
cat * | grep <pattern> | sort | uniq -c | cat > file
Nothing fancy there. Now try this on 20 TBytes of data. MapReduce works by distributing the work across multiple servers. Think of it as a partitioned file system where files are broken into large blocks. Each block is often replicated on several data nodes (for reliability). With this arrangement, searching can be done in parallel. Figure One shows the general MapReduce algorithm.
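The partitioned, replicated layout is easy to picture in miniature. In this sketch a file is cut into fixed-size blocks and each block is placed on several data nodes; the tiny block size, the node names, and the round-robin placement policy are all made up for illustration (HDFS defaults and its rack-aware placement are considerably more involved).

```python
# Toy illustration of partitioned, replicated block storage.

BLOCK_SIZE = 8          # bytes per block (tiny, for illustration)
REPLICATION = 3         # copies of each block

def split_blocks(data, size=BLOCK_SIZE):
    # cut the file into fixed-size blocks (last one may be short)
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_blocks(blocks, nodes, copies=REPLICATION):
    # round-robin placement: block i lands on `copies` consecutive nodes,
    # so losing any single node still leaves copies of every block
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [nodes[(i + j) % len(nodes)] for j in range(copies)]
    return placement

data = b"twenty terabytes of log data, in miniature"
blocks = split_blocks(data)
placement = place_blocks(blocks, ["node1", "node2", "node3", "node4"])
print(len(blocks))   # 6 blocks of up to 8 bytes
print(placement[0])  # ['node1', 'node2', 'node3']
```

Because every block lives on several nodes, a search can run on whichever copy is closest, and a failed node costs a re-run of a few tasks rather than the whole job.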
Figure One: MapReduce Algorithm
In terms of our command string above, the MapReduce algorithm can be thought of as the following:

cat * | grep  | sort    | uniq -c | cat > file
input | map   | shuffle | reduce  | output
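The same pipeline can be sketched as three Python functions, one per phase. This is a toy, single-process word count, not Hadoop's actual API; the function names and the sample input are illustrative. Map emits key/value pairs, shuffle groups them by key (the job `sort` does in the pipe), and reduce aggregates each group (like `uniq -c`).

```python
from collections import defaultdict

def map_phase(lines):
    # map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # shuffle: group values by key, bringing equal keys together
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reduce: aggregate each key's values into a single count
    return {key: sum(values) for key, values in groups.items()}

lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

The parallelism comes from the phase boundaries: map calls are independent per input split, and reduce calls are independent per key, so both can be farmed out across a cluster.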
A MapReduce operation works with key/value pairs. It is primarily used for log processing, web search indexing and ranking, and ad-hoc data queries. Because the search is distributed and independent, MapReduce algorithms can have near-linear scalability. As mentioned, the idea is simple: by breaking the data up across servers (hard drives) you can spin many disks at the same time. Another MapReduce trick is to move the processing to the data and not the other way around, which drastically improves performance. Another key aspect of MapReduce is the assumption that the data is static, i.e., it is not going to change while the search is in process. Because the data are often stored redundantly, there is also a level of fault tolerance in a MapReduce cluster.
The most popular MapReduce package is the Apache Foundation Hadoop project. (Named for project creator Doug Cutting’s child’s stuffed elephant.) Hadoop has two major components: the Hadoop Distributed File System (HDFS) and the MapReduce engine with its JobTracker and TaskTracker.
HDFS is built from a cluster of DataNodes. DataNodes can talk to each other to re-balance data, to move copies around, and to keep the replication of data high. There is also a central NameNode that serves as both the directory namespace manager and the “inode table” for HDFS. Currently the NameNode is a single point of failure in HDFS.
The MapReduce engine sits above HDFS. Clients submit MapReduce jobs to the JobTracker, which attempts to keep the work as close to the data as possible. HDFS is rather unique in that it is “rack aware.” The JobTracker pushes work out to available TaskTracker nodes in the cluster. This step is where MapReduce gets its performance: the search is done by multiple tasks at once. If an individual TaskTracker fails or times out, that part of the job is rescheduled. If the JobTracker fails, the entire job is lost and must be resubmitted. Hadoop is an active project and represents a powerful way to search through reams of data quickly. The trick, by the way, is HDFS. Hadoop has even been integrated with Sun Grid Engine.
If I Had A Hammer
While the three methods I mention above are not typical uses for HPC clusters, they are nonetheless capable of throwing large amounts of compute power at an appropriate problem. In each case, the user is also removed from much of the parallel computing minutia all too familiar to many HPC users. Such an advantage only really works if you swing your hammer at a round peg in a round hole. If you have a square peg and a round hole, a sledgehammer may make it fit, but it may not give the best result. Swing carefully.
Douglas Eadline is the Senior HPC Editor for Linux Magazine.