Emailing HPC

Email is not unlike MPI. The similarities may help non-geeks understand parallel computers a little better.

Parallel computing seems like a new concept to many people. In particular, multi-core now requires those who program for a living to grasp the basic ideas behind parallel computing. The concept of “parallel” should not be a surprise to anyone, however. The world is a very parallel place. We humans are social animals and must work together to solve many common problems, which is a pretty good definition of parallel computing.

News stories often include a quote that compares the speed of a new parallel computer to that of a single processor (e.g., “If this program were run on a single processor, it would take 100 years to finish.”). That sounds like breakthrough work, but no one mentions that the building that holds the cluster would take one person 50 years to build. Those types of statements seem silly. So why do parallel computers seem so exotic?

Working together seems natural and obvious. Indeed, parallel computing has been around for a while, but not in the mainstream. When venturing into parallel computing, there are concepts that are very familiar to any construction crew (like those building a new computer center). Amdahl’s Law of parallel computing is one example. It basically says that the part of a job that must be done one step at a time limits how much faster the whole job can go, no matter how many workers you add. Anyone who has worked on a team project understands this. Message passing is another concept with which everyone is familiar, probably more than they know. I am thinking specifically of email, which is probably the largest parallel human/silicon computer on the planet. Think about it. There are bidirectional messages connecting meat bag processors (people). Not only are commands sent and acknowledged, but data are also sent over this channel. For instance, consider a common email conversation:

Taylor sends to Lauren: Add your results (a command) to the attached document (the data)
Lauren responds: Okay
Lauren sends to Taylor: The new version (a command) is attached (the data)
Taylor responds: Okay

Email also works as a universal transport mechanism. Pictures, documents, spreadsheets, music, all are sent via email. Actually with the right client it can be a very smart mechanism because it will often associate an application with the data payload.

The similarities to a typical HPC program are interesting. A large amount of collaborative (parallel) work is done via email. For instance, a document is sent to a group of people for processing (MPI_BCAST). They each complete their part and send it back to the team leader (MPI_SEND), who then does a reduction of the data. There may also be some two-way communication with various individuals (MPI_SEND/MPI_RECV).
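The broadcast, send-back, and reduce pattern described above can be sketched with nothing more than threads and queues from the Python standard library. This is an illustration of the pattern, not real MPI: the names `collaborator`, `run`, and `NUM_WORKERS`, and the choice of "sum your share of the numbers" as the work, are all assumptions made for the example.

```python
import queue
import threading

NUM_WORKERS = 4  # hypothetical team size


def collaborator(rank, inbox, leader_inbox):
    """One team member: wait for the document, do a share, mail it back."""
    document = inbox.get()                   # blocks until the broadcast arrives
    my_part = document[rank::NUM_WORKERS]    # assumed split: every Nth item
    leader_inbox.put((rank, sum(my_part)))   # send the partial result to the leader


def run(document):
    """Team leader: broadcast the document, collect replies, reduce them."""
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    leader_inbox = queue.Queue()
    workers = [
        threading.Thread(target=collaborator, args=(rank, inboxes[rank], leader_inbox))
        for rank in range(NUM_WORKERS)
    ]
    for worker in workers:
        worker.start()

    for inbox in inboxes:                    # the "broadcast" step
        inbox.put(document)

    partials = [leader_inbox.get() for _ in range(NUM_WORKERS)]  # replies, any order
    for worker in workers:
        worker.join()
    return sum(value for _, value in partials)  # the "reduction" step


total = run(list(range(1, 101)))
print(total)  # 5050
```

The replies arrive in whatever order the collaborators finish, just as they would in an inbox. The reduction does not care, because addition gives the same answer in any order.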

Like an MPI message, email is “sent” when it is copied from the user to the host for transport. There are no “blocking” sends or receives or barriers, but these kinds of things could be worked out by the human endpoint processors. MPI requires a more explicit send/receive structure, where sends and receives need to be matched. Email is totally asynchronous.
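The difference between checking a mailbox and waiting on a message can be sketched with a standard-library queue. This only mimics the two styles of receive; it is not MPI, and the helper names `check_mail` and `wait_for_mail` are made up for the illustration.

```python
import queue

mailbox = queue.Queue()


def check_mail(box):
    """Email style: look in the mailbox and return at once, even if it is empty."""
    try:
        return box.get_nowait()
    except queue.Empty:
        return None


def wait_for_mail(box, timeout=1.0):
    """Blocking style: stop and wait until a message arrives.

    Raises queue.Empty if nothing shows up before the timeout.
    """
    return box.get(timeout=timeout)


first = check_mail(mailbox)          # nothing has been sent yet, so this is None
mailbox.put("results attached")      # the "send" returns right away
second = wait_for_mail(mailbox)      # the message is already there
```

A receiver that only ever calls `check_mail` never stalls, but it also never knows when the answer it needs has arrived. That coordination is the part the human endpoints would have to work out themselves.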

The MPI_COMM_WORLD equivalent in email is literally the whole world, and therein lies a problem: a message will be accepted from anywhere, which is how we get spam. In general, MPI has a defined subset of endpoints, but unexpected messages can still be received if the endpoint is waiting for input. Email is in a constant receive loop.

Similar to an MPI program, data movement can be optimized by observing how it is used. For example, it is not uncommon for someone to create excessive email traffic by sending a large video file to all their friends. This type of broadcast is allowed in the email protocol. By placing the video in one location and broadcasting a short pointer to the video (e.g., posting on YouTube and sending the URL to your friends), the sender lets the endpoints use this data as they see fit. A similar method is the use of Google Docs. A central copy (global data) is located in one location and pointers to the data are sent to the endpoints (collaborators). The endpoints can download some, none, or all of the global data. Similar concepts are used in parallel programming, where optimal data movement can improve performance.
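A little arithmetic shows why the pointer approach helps. The sketch below counts bytes moved under each strategy; the sizes and head counts (a 50 MB video, 20 friends, a 100-byte URL, 5 friends who actually watch) are invented numbers for illustration, and the function names are hypothetical.

```python
def bytes_moved_by_attachment(payload_bytes, recipients):
    """Every recipient gets a full copy, whether they want it or not."""
    return payload_bytes * recipients


def bytes_moved_by_pointer(payload_bytes, pointer_bytes, recipients, viewers):
    """One upload to a central location, a short pointer to everyone,
    and a full download only for those who choose to fetch it."""
    upload = payload_bytes
    pointers = pointer_bytes * recipients
    downloads = payload_bytes * viewers
    return upload + pointers + downloads


video = 50_000_000   # assumed 50 MB video
url = 100            # assumed 100-byte link

attached = bytes_moved_by_attachment(video, recipients=20)
linked = bytes_moved_by_pointer(video, url, recipients=20, viewers=5)

print(attached)  # 1000000000
print(linked)    # 300002000
```

With these made-up numbers the pointer approach moves less than a third of the data. If every friend watched the video, it would move slightly more than attaching it, so the saving depends entirely on how the data is actually used.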

While I am not advocating “email based HPC,” I do think that it may be possible to build a simple generalized protocol on top of email that allows one to “play” and learn about parallel computing. For instance, a participating client (like your desktop) could handle “MPI-Mail” from other machines. If it were a matrix multiplication application and your client had the right handler program, it would just “know” what to do with it, process it, and send a result email back to the sender (or another system designated by the sender). The human never sees the email; it is all done in the background.
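To make the idea a bit more concrete, here is a minimal sketch of what an “MPI-Mail” handler might look like, using Python's standard `email` package to build and read messages. No such protocol exists; the `MPI-Mail: <operation>` subject convention, the JSON body, the use of `Reply-To` to designate another system, and every function name here are assumptions made up for this example. Nothing is actually sent over a network.

```python
import json
from email.message import EmailMessage


def make_task(sender, recipient, a, b):
    """Build a task email asking the recipient to multiply matrices a and b."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "MPI-Mail: matmul"      # hypothetical subject convention
    msg.set_content(json.dumps({"a": a, "b": b}))
    return msg


def matmul(a, b):
    """Plain-Python matrix multiply on lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hypothetical handler table: operation name -> function of the decoded payload.
HANDLERS = {"matmul": lambda data: matmul(data["a"], data["b"])}


def handle(msg):
    """Process one incoming message; return a reply, or None if it is not for us."""
    prefix, _, operation = str(msg["Subject"]).partition(": ")
    if prefix != "MPI-Mail" or operation not in HANDLERS:
        return None                          # ordinary mail: leave it for the human
    data = json.loads(msg.get_content())
    reply = EmailMessage()
    reply["From"] = str(msg["To"])
    # Send the result to the system designated by the sender, else to the sender.
    reply["To"] = str(msg.get("Reply-To", msg["From"]))
    reply["Subject"] = "MPI-Mail: result"
    reply.set_content(json.dumps(HANDLERS[operation](data)))
    return reply


task = make_task("leader@example.org", "worker@example.org",
                 [[1, 2], [3, 4]], [[5, 6], [7, 8]])
reply = handle(task)
result = json.loads(reply.get_content())
print(result)  # [[19, 22], [43, 50]]
```

A real client would also need to fetch mail, authenticate senders, and refuse work it did not agree to, which is a good share of the pile of things glossed over here.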

Similar to BOINC, there could be a general framework for applications. Those that are developed using “MPI-Mail” could be run on “willing” clients. Of course, I glossed over a pile of things that would need to be addressed, but in principle it would make an interesting senior project.

Finally, what I just described could also be thought of as a “botnet” programming language, so keeping it as a local academic exercise may be the way to go. In any case, the next time someone asks you, “What is MPI and parallel computing?”, just reply, “Email for computer processors, without spam.”
