The Spread Toolkit
This month's installment of "Do It Yourself" switches gears a bit. Rather than focus on an application, this month's column looks at a development library called the Spread Toolkit, a powerful network communication system. Spread isn't a new project. It's existed in one form or another for roughly five years. Strangely, during all that time, it hasn't received the attention it deserves.
Spread was developed at the Center for Networking and Distributed Systems at Johns Hopkins University, with partial funding from the Defense Advanced Research Projects Agency (DARPA, the same folks who brought us the Internet) and the National Security Agency (NSA, the same folks who brought us Security Enhanced Linux: http://www.linux-mag.com/2001-09/se_linux_01.html). It consists of a code library and a server process designed to solve a number of the problems that arise naturally when you try to build reliable, high-performance, network-based applications. Pick the Spread library appropriate for your project (there are interfaces in C/C++, Java, Perl, Python, PHP, and Ruby), and your code can efficiently and reliably communicate with one or more peers on the network.
Break It Down
The project’s Web site describes Spread this way:
Spread is a toolkit that provides a high performance messaging service that’s resilient to faults across external or internal networks. Spread functions as a unified message bus for distributed applications and provides highly tuned application-level multicast and group communication support. Spread services range from reliable message passing to fully ordered messages with delivery guarantees, even in case of computer failures and network partitions.
That’s quite a dense description of what Spread provides. Let’s have a closer look at some of Spread’s core features.
- Multiplexing network connections. If a distributed application runs 10 processes on each of 30 different servers, that's 300 processes, and connecting every pair directly would take 300 × 299 / 2 = 44,850 connections. Spread enables any two processes to communicate without establishing a dedicated connection for each process pair.
- Scalability. Spread’s use of multicast UDP (a bandwidth-efficient mechanism for broadcasting data) provides scalability. Need to double the number of machines in your cluster? No problem. Spread can scale with you.
- Reliable message delivery. Spread makes sure you don’t lose messages. This may not seem terribly impressive until you realize that UDP is an inherently unreliable protocol. Unlike TCP, UDP has no acknowledgements and no provisions for re-sending missed packets. Spread handles the hard work of ensuring delivery over a potentially unreliable network.
- Sequenced delivery. This goes hand-in-hand with reliable message delivery. Spread makes sure that messages are presented to your application in the proper order — even if they arrived completely out of sequence.
- Fail-over. Spread provides the infrastructure for building network services that can survive the failure of several nodes. By using Spread, you needn’t worry about all the details of sharing state and coordinating communication among the servers.
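To make the reliable and sequenced delivery ideas concrete, here is a minimal toy sketch in Python. It is not Spread's actual protocol or API: the `lossy_channel`, `Receiver`, and `send_reliably` names are hypothetical, and the "network" is just a function that drops and shuffles packets. It only shows the general technique of numbering messages, buffering early arrivals, and resending whatever is still missing.

```python
import random


def lossy_channel(packets, drop_rate, rng):
    """Simulate a UDP-like network: drop some packets, shuffle the rest."""
    survivors = [p for p in packets if rng.random() >= drop_rate]
    rng.shuffle(survivors)
    return survivors


class Receiver:
    """Buffers out-of-order packets and releases them in sequence order."""

    def __init__(self):
        self.next_seq = 0      # next sequence number the application expects
        self.pending = {}      # seq -> payload, received but not yet deliverable
        self.delivered = []    # payloads handed to the application, in order

    def on_packet(self, seq, payload):
        if seq < self.next_seq or seq in self.pending:
            return             # duplicate; ignore it
        self.pending[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1

    def missing(self, total):
        """Sequence numbers not yet received (what would be requested again)."""
        return [s for s in range(self.next_seq, total) if s not in self.pending]


def send_reliably(messages, drop_rate=0.3, seed=1):
    """Send until everything arrives; return (delivered, rounds needed)."""
    rng = random.Random(seed)
    packets = list(enumerate(messages))   # attach a sequence number to each
    receiver = Receiver()
    outstanding = packets
    rounds = 0
    while outstanding:
        rounds += 1
        for seq, payload in lossy_channel(outstanding, drop_rate, rng):
            receiver.on_packet(seq, payload)
        # Resend only what the receiver reports as missing.
        wanted = set(receiver.missing(len(packets)))
        outstanding = [p for p in packets if p[0] in wanted]
    return receiver.delivered, rounds


if __name__ == "__main__":
    msgs = [f"message {i}" for i in range(10)]
    delivered, rounds = send_reliably(msgs)
    print(delivered == msgs, "after", rounds, "round(s)")
```

The application only ever sees `delivered`, which comes out complete and in order no matter how badly the simulated network scrambles things. That bookkeeping is the kind of work a toolkit like Spread takes off your hands.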
That’s just a small sampling of the capabilities you get with Spread. It’s also worth pointing out that Spread works on most popular platforms (Linux, BSD, Windows, etc.), so it’s an excellent choice for cross-platform, distributed applications.
Spreading the Love
One of the more popular Spread-related projects is mod_log_spread (http://www.backhand.org/mod_log_spread). As you might guess, it’s an Apache module that enables “logging” of web traffic across a Spread network. It can be a real time saver if you’d otherwise have to automate the log collection process across a large cluster of web servers.
Using Spread, you can designate a small number of hosts for real-time log collection and let Spread worry about making sure all the data gets there. Think of it as adding syslog’s remote logging capabilities to Apache, with the added benefit of guaranteed delivery: you won’t miss log entries.
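The collection pattern itself is easy to picture with a small in-process sketch. This is not how mod_log_spread is implemented, and it doesn't use Spread's API. `GroupBus`, `LogCollector`, the `apache-logs` group name, and the host names are all hypothetical stand-ins, meant only to show web servers multicasting log lines to a named group while a couple of collector hosts that joined the group each receive every line.

```python
from collections import defaultdict


class GroupBus:
    """Hypothetical in-process stand-in for a group messaging bus."""

    def __init__(self):
        self._members = defaultdict(list)   # group name -> member callbacks

    def join(self, group, callback):
        """Register a callback to receive everything sent to the group."""
        self._members[group].append(callback)

    def multicast(self, group, sender, message):
        # Every member of the group sees every message, in the same order.
        for callback in self._members[group]:
            callback(sender, message)


class LogCollector:
    """A collector host: joins the log group and records what it receives."""

    def __init__(self, bus, group):
        self.lines = []
        bus.join(group, self._on_message)

    def _on_message(self, sender, message):
        self.lines.append(f"{sender} {message}")


def demo():
    bus = GroupBus()
    # Two collectors for redundancy; both joined the same (made-up) group.
    collectors = [LogCollector(bus, "apache-logs") for _ in range(2)]
    # Three web servers each send one log line to the group.
    for host in ("web1", "web2", "web3"):
        bus.multicast("apache-logs", host, '"GET / HTTP/1.0" 200')
    return [c.lines for c in collectors]


if __name__ == "__main__":
    for lines in demo():
        print(lines)
```

Notice that the web servers never need to know which hosts are collecting, or how many. They address a group, and adding another collector is just one more `join`.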
Java hackers might find the JMS4Spread (http://www.spread.org/JMS4Spread) project interesting. Using Spread for network communication, JMS4Spread provides a completely decentralized JMS (Java Message Service) implementation.
Do you have an idea for a project we should feature? Drop a note to firstname.lastname@example.org and let us know.