Shifting Applications Into Gearman

Gearman enables a new level of software abstraction. With this lightweight infrastructure you can outsource work to better-suited computers, run tasks in parallel, and combine code written in different computer languages.

In real estate, it’s location, location, location.

In computer science, it’s abstraction, abstraction, abstraction.

For example, a CPU is billions of transistors, but its intricacies are veiled by assembly language. Or, consider assembly language: it manipulates memory, registers, and the stack, but such tediousness is disguised by the compiler. And so on, and so on. Programming interfaces hide implementations; objects encapsulate data structures; data structures model the problem… and, well, you get the idea.

Today, even the computer itself is abstracted. The fundamental tenet and real promise of cloud computing is independence from the machinery.

For example, instead of referring a number-crunching task to servers groucho, chico, and harpo, your application can delegate work to the ensemble marx_brothers.

Of course, it takes a lot of shenanigans to realize the marx_brothers. It’s still a collection of machines (physical or virtual) running dedicated processes, the workload must be balanced, and an additional zeppo might boot from time to time to help out. But the hassle is abstracted from your application.

Better yet, at least for a Web application, the cloud can absorb work that might otherwise bog down machines devoted to data persistence or user interaction.

For instance, consider the chain of events sparked at the close of an auction on eBay: all losing bidders receive notification of the final terms; the buyer and seller receive confirmation of the sale; and the buyer receives an invoice. None of this work need occur “live”; it can occur in the equivalent of a computing back office.

And that’s the premise of Gearman, a lightweight infrastructure to outsource work to additional machinery: your very own cloud. Gearman is a matchmaker (picture a geeky Chuck Woolery) or, perhaps better, a recruiter that connects workers with employers. Gearman can shift tasks to better-suited computers, run tasks in parallel, distribute copious work, and combine code written in different computer languages.

Gearman was originally written by Danga Interactive, the progenitors of memcached. Like memcached, Gearman is simple to use and reliable, and it continues to evolve to meet real-world challenges. Specifically, contributors hope to add persistence this week to keep the work queue intact in case of failure. Ongoing work also aims to replicate the queue among actors for resilience and failover. (This new work is slated to debut at the upcoming OSCON 2009.)

Here, let’s deploy Gearman on a single machine to demonstrate its capabilities. Extending Gearman from one machine to many is a snap, as you’ll see.

Installing Gearman

A Gearman configuration requires three components: a client, a worker, and the Gearman daemon.

  • The client requests work. More specifically, the client requests a task by name, such as resample or render, and provides the raw materials (an image, a scene, URLs, and so on).
  • The worker does the heavy lifting. Each worker can perform at least one kind of task, such as render.
  • The daemon brokers the transactions between the two other constituencies. It registers workers, accepts requests from clients, and matches each request with a worker that can perform it.
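To make the division of labor concrete, here is a toy, single-process model of the three roles in plain Ruby. It is only a sketch of the idea, not how gearmand is implemented: ToyBroker, register, submit, and dispatch are hypothetical names, and a real daemon talks to clients and workers over the network rather than through method calls.

```ruby
# A toy, single-process model of Gearman's three roles. ToyBroker and its
# methods are hypothetical names for illustration; this is not the Gearman
# API, and a real daemon talks to clients and workers over the network.
class ToyBroker
  def initialize
    @abilities = {}                          # function name => worker block
    @queues = Hash.new { |h, k| h[k] = [] }  # function name => pending jobs
  end

  # Worker role: announce that this worker can perform the named task.
  def register(name, &work)
    @abilities[name] = work
  end

  # Client role: request a task by name and hand over the raw data.
  # The returned job hash receives :result once a worker has run it.
  def submit(name, data)
    job = { name: name, data: data, result: nil }
    @queues[name] << job
    job
  end

  # Daemon role: match each pending job with a worker registered for it.
  def dispatch
    @queues.each do |name, jobs|
      work = @abilities[name]
      next unless work                       # no worker yet: jobs stay queued
      while (job = jobs.shift)
        job[:result] = work.call(job[:data])
      end
    end
  end
end

broker = ToyBroker.new
job = broker.submit('reverse', 'Hello world')  # the client arrives first
broker.dispatch
p job[:result]                                  # => nil
broker.register('reverse') { |data| data.to_s.reverse }
broker.dispatch
p job[:result]                                  # => "dlrow olleH"
```

The client names the task it wants ('reverse'), never a machine; the broker alone decides which registered worker runs it, and a job submitted before any matching worker exists stays pending until one registers.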

You must write the client and worker; Gearman libraries exist for Perl (CPAN), PHP (PEAR), Python (Subversion repository), and Ruby (Subversion repository).

The daemon is written in C, is open source, and can be downloaded directly from the project page. Ubuntu users can install Gearman via APT with sudo apt-get install gearman-server.

If you choose to build from source, the process is typical and easy, as the package has no unusual prerequisites. The latest version of the Gearman source is v0.5.

$ wget http://code.launchpad.net/gearmand/trunk/0.5/+download/gearmand-0.5.tar.gz
$ tar xfzv gearmand-0.5.tar.gz
$ cd gearmand-0.5
$ ./configure
  …
$ make
  …
$ sudo make install
  …
$ sudo /usr/local/sbin/gearmand -u root -d

(By default, Gearman is installed in /usr/local and associated subdirectories. If you want to install it into /usr, use ./configure --prefix=/usr instead.)

With the Gearman daemon up and running, you can now create a worker and a client. Before writing any code, try Gearman from the command line.

$ # Launch a worker, providing service 'count'
$ gearman -w -f count wc &

$ # Launch a client, requesting service 'count', sending data 'Hello World'
$ gearman -f count "Hello World"
       0       2      11

Cool!
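The command-line worker runs wc, feeding it each job's data on standard input, and returns whatever wc prints. The three numbers are therefore wc's line, word, and byte counts for Hello World: 0 lines (the data has no trailing newline), 2 words, and 11 bytes. A rough Ruby equivalent of that arithmetic, assuming plain ASCII text (wc_counts is a hypothetical helper, not part of Gearman):

```ruby
# Approximates what wc reports for a payload: the number of newline
# characters, the number of whitespace-separated words, and the byte count.
# wc_counts is a hypothetical helper for illustration only.
def wc_counts(data)
  [data.count("\n"), data.split.size, data.bytesize]
end

p wc_counts("Hello World")    # => [0, 2, 11]
```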

You can start the worker and the client in any order. If no suitable worker is available, the daemon holds the job and the client waits for service (for some reasonable period).

Writing a Gearman Client and Worker

The next steps are to create a client and worker in code. Here, let’s use Ruby to create two simple programs (the Perl, Python, and PHP implementations are identical in spirit and complexity).

As a convenience, let’s grab Andy Triggs’s Gearman classes from his repository on GitHub.

$ git clone git://github.com/andyt/gearman-ruby.git
$ cd gearman-ruby
$ ls
examples	gearman.gemspec	lib		test

Assuming you have installed the Gearman classes in a directory on Ruby’s load path, so that any Ruby program can require them, here is the client, saved as simple.rb.

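require 'gearman'
Gearman::Util.debug = true    # print debug log messages

server = 'localhost:4730'     # the daemon's host and standard port
# Second argument: a prefix added to function names; it must match the worker's
client = Gearman::Client.new(server, 'example')
taskset = Gearman::TaskSet.new(client)

# Request the 'reverse' function, passing the first command-line argument as data
task = Gearman::Task.new('reverse', ARGV[0])
task.on_complete {|d| puts d }   # print the worker's reply

taskset.add_task(task)
taskset.wait(10)              # wait up to 10 seconds for the task to finish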

The standard port for Gearman is 4730. This client looks for a worker providing reverse. It passes the sole command-line argument as data, waits up to 10 seconds for the task to finish, and prints the result when a reply arrives.

Here is the worker, saved as simple_worker.rb.

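require 'gearman'
Gearman::Util.debug = true

server = 'localhost:4730'

# Use the same 'example' prefix as the client
worker = Gearman::Worker.new(server, 'example')

# Register the 'reverse' ability; the block's return value is sent back as the result
worker.add_ability('reverse') do |data, job|
  data.to_s.reverse
end

# Ask the daemon for jobs forever
loop { worker.work }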

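The worker registers the ability reverse, which merely converts the incoming data to a string and reverses its characters, and then loops forever waiting for jobs. Here’s the output of launching the worker in the background and then running the client.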

$ ruby simple_worker.rb &
20090513 063030 Connecting to server localhost:4730
20090513 063030 Sending grab_job to localhost:4730
20090513 063030 Got no_job from localhost:4730
20090513 063030 Sending pre_sleep and going to sleep for 30 sec

$ ruby simple.rb 'Hello world'
20090513 063059 Using socket #<TCPSocket:0x16bd4> for localhost:4730
20090513 063059 Sending grab_job to localhost:4730
20090513 063059 Got job_created with handle H:black-2.local:10 from localhost:4730
20090513 063059 Got noop from localhost:4730
20090513 063059 Got job_assign with handle H:black-2.local:10 and 11 byte(s) from localhost:4730
20090513 063059 Sending work_complete for H:black-2.local:10 with 11 byte(s) to localhost:4730
20090513 063059 Sending grab_job to localhost:4730
20090513 063059 Got no_job from localhost:4730
20090513 063059 Got work_complete with handle H:black-2.local:10 and 11 byte(s) of data from localhost:4730
20090513 063059 Sending pre_sleep and going to sleep for 30 sec

dlrow olleH

Very cool.
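The packet names in the debug log (grab_job, job_created, noop, job_assign, work_complete, and so on) come from Gearman’s binary wire protocol, which is also what lets clients and workers written in different languages interoperate. As the protocol is commonly documented, every packet starts with a 12-byte header: a 4-byte magic code (\0REQ for requests, \0RES for responses), then a 4-byte packet type and a 4-byte payload length, both big-endian integers, followed by the packet’s arguments separated by NUL bytes. Here is a minimal Ruby sketch of that framing, assuming those layouts and type numbers; encode_packet and decode_packet are hypothetical helpers, not code from the gearman-ruby classes.

```ruby
# Sketch of Gearman's binary packet framing, assuming the header layout and
# type numbers from the Gearman protocol description. encode_packet and
# decode_packet are hypothetical helpers, not part of gearman-ruby.
PACKET_TYPES = {
  can_do: 1, pre_sleep: 4, noop: 6, submit_job: 7, job_created: 8,
  grab_job: 9, no_job: 10, job_assign: 11, work_complete: 13
}.freeze

# Build one packet: 4-byte magic ("\0REQ" or "\0RES"), big-endian packet
# type and payload size, then the arguments joined by NUL bytes.
def encode_packet(magic, type, *args)
  payload = args.join("\0").b
  magic.b + [PACKET_TYPES.fetch(type), payload.bytesize].pack('NN') + payload
end

# Parse one packet. arg_count limits the split so that the last argument
# (usually the job data) may itself contain NUL bytes.
def decode_packet(bytes, arg_count)
  magic = bytes[0, 4]
  type_num, size = bytes[4, 8].unpack('NN')
  args = bytes[12, size].split("\0", arg_count)
  [magic, PACKET_TYPES.key(type_num), args]
end

# SUBMIT_JOB carries the function name, a unique ID (empty here), and the data.
pkt = encode_packet("\0REQ", :submit_job, 'reverse', '', 'Hello world')
p pkt.bytesize                # => 32
p decode_packet(pkt, 3)
```

Because the framing is just bytes on a socket, any language that can open a TCP connection to port 4730 and pack big-endian integers can speak it.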

Lots of Uses

Although not shown here, you can easily connect a command-line worker with a Ruby client and vice versa. You can also write a Python worker to satisfy Perl clients. This cross-language mixing is especially appealing because you can choose the language, library, and classes best suited to each task. For example, you could pair Perl’s rich suite of CPAN email modules with a Rails Web application.

Abstraction, abstraction, abstraction.

The sky is the limit, and the pending new features mentioned at the outset (persistence and replication of the work queue) should shift Gearman into overdrive. Happy tinkering.
