Paul Fremantle explains the concept of an Enterprise Service Bus (ESB) and how to use Apache Synapse, an open source enterprise service bus that will help your organization work more efficiently.
Apache Synapse is a simple lightweight ESB that works well on Linux. In this article we will take a look at the project, work through a simple scenario, and look at how it works in a Linux environment. You might be asking, “well, what the heck is an ESB anyway?” Read on, and all will become clear.
What is Synapse?
If there were a prize for the least clearly defined phrase in software technology, Enterprise Service Bus (ESB) would be a strong contender. (Right along with “middleware” and “service-oriented architecture,” but that’s a discussion for another time.) And while there are a number of commercial products and open source projects called ESBs, there isn’t a single approach that covers all of them.
When I’m asked to define Apache Synapse in just a few words, I say it’s a smart, open source router that can transform messages, monitor traffic and connect different systems. In addition, it’s fairly simple to have several Synapse nodes work in concert. Just as a set of network routers forms the backbone of a network, a set of Synapse nodes forms the backbone of a “bus.”
Effectively, Synapse is a process that runs, listens for incoming messages, does useful things to those messages, and then sends them on. What are those useful things? Synapse can log the messages, validate them according to schemas, authenticate and authorize, route them — including doing content-based routing, transform them from one format to another, and switch protocols. It can also do things like load-balancing, failover and throttling. And it is very extensible, so you can add your own logic — or, in Synapse jargon, “mediators.”
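To make the listen, mediate, send cycle concrete, here is a minimal sketch of a mediator chain in Python. It is not Synapse code (Synapse mediators are written in Java), and the Message class, the three mediator functions and the run_chain helper are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Message:
    """Hypothetical stand-in for a message flowing through the bus."""
    to: str
    body: str
    log: List[str] = field(default_factory=list)


# A mediator takes a message and returns it (possibly changed),
# or returns None to drop the message and stop the chain.
Mediator = Callable[[Message], Optional[Message]]


def log_mediator(msg: Message) -> Optional[Message]:
    msg.log.append(f"to={msg.to} body={msg.body}")
    return msg


def validate_mediator(msg: Message) -> Optional[Message]:
    # Drop empty messages; a real ESB would validate against a schema.
    return msg if msg.body.strip() else None


def route_mediator(msg: Message) -> Optional[Message]:
    # Content-based routing: choose the destination from the body.
    if "urgent" in msg.body:
        msg.to = "queue:priority"
    return msg


def run_chain(msg: Message, chain: List[Mediator]) -> Optional[Message]:
    """Apply each mediator in order; stop early if one drops the message."""
    current: Optional[Message] = msg
    for mediator in chain:
        current = mediator(current)
        if current is None:
            return None
    return current


chain = [log_mediator, validate_mediator, route_mediator]
routed = run_chain(Message(to="queue:default", body="urgent: restock"), chain)
dropped = run_chain(Message(to="queue:default", body="   "), chain)
```

Here the first message is logged, passes validation and is re-addressed to the priority queue, while the blank message is dropped by the validation step. Adding your own logic means adding one more function to the list, which is roughly the role a custom mediator plays.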
Runtime and Deployment
Apache Synapse is written in Java and requires a Java Virtual Machine (JVM) to run. It needs JDK 1.5, and typically uses 30-100 MB of memory depending on load and message size. While Synapse is written in Java, it has been designed with the idea in mind that some users might not be Java experts.
Synapse comes with a Linux daemon shell script, so that it can be deployed to load at startup time like any other daemon. It can also be started using a standard shell script or batch file. Under Windows it can be deployed as a Windows Service. Being an open source project, it ships with full source code, but is also available as pre-compiled binaries for Linux, Solaris and Windows. Typically installation is as simple as downloading, unpacking the archive (tar.gz or zip), and running the startup script.
A First Example
The simplest example is pretty boring. The most basic thing you can do with Synapse is to take an XML message sent via HTTP request, log it, and send it on to another server. In this case we will “hard-code” the address of the target server. Figure 1 shows the configuration in question.
Figure 1: Logging HTTP Requests
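<definitions xmlns="http://ws.apache.org/ns/synapse">
  <proxy name="XMLProxy">
    <target>
      <endpoint><address uri="http://remoteserver.com/xmltarget"/></endpoint>
      <inSequence><log level="full"/></inSequence>
      <outSequence><send/></outSequence>
    </target>
  </proxy>
</definitions>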
In this simple example the proxy tag defines three main things: the endpoint to which messages will be sent, what happens to incoming messages (log) and to responses (just send the response back).
A More Interesting Example
Now let’s look at a more interesting example. If you’re familiar with Google Docs you will know that you can create a shared spreadsheet that is accessible as a CSV file. Suppose we want to read this CSV file, convert it to XML messages, and send these out, one per row, to another server. For example, a user might add a new row to the spreadsheet using their browser, and that new row might be used to kick off a process in an enterprise system running on a mainframe.
In order to implement this scenario, we are going to use a combination of simple techniques:
- A simple user-supplied task will grab the CSV from Google at regular intervals.
- A filter will look for these messages.
- A CSV-to-XML mediator will convert the CSV into a common XML format.
- The built-in iterate mediator will split this into separate messages, one per row.
- The send mediator will publish these messages to an open source broker — Apache ActiveMQ — via the JMS interface.
The very first thing is to grab the CSV from Google Docs. A simple task does this:
<task class="org.apache.synapse.contrib.tasks.HTTPGetTask" \
<property name="to" value="urn:csv"/>
<property name="HttpURL" \
The task is executed every 60 seconds (trigger interval=60), and it injects a new message into Synapse containing the contents of the HttpURL. Ideally, Google Docs would set the right HTTP headers (Last-Modified, ETag) to indicate whether there was any change in the spreadsheet, but at the moment they don’t, so we simply publish the contents once a minute, irrespective of changes (though it’s actually simple to write a mediator that drops the message if it’s the same as the last one).
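The idea of dropping a message that is identical to the previous one can be sketched in a few lines of Python. This is only an illustration of the technique, not a Synapse mediator: the DropUnchanged class and its mediate method are hypothetical, and I am assuming that comparing a SHA-256 digest of the payload is an acceptable test for “the same as the last one.”

```python
import hashlib
from typing import Optional


class DropUnchanged:
    """Hypothetical mediator: pass a payload on only if it differs from the last one."""

    def __init__(self) -> None:
        self._last_digest: Optional[str] = None

    def mediate(self, payload: bytes) -> Optional[bytes]:
        digest = hashlib.sha256(payload).hexdigest()
        if digest == self._last_digest:
            return None  # unchanged since the previous poll: drop it
        self._last_digest = digest
        return payload


# Three polls of the spreadsheet; the second returns identical content.
dedupe = DropUnchanged()
polls = [b"a,b\n1,2\n", b"a,b\n1,2\n", b"a,b\n1,2\n3,4\n"]
forwarded = [p for p in polls if dedupe.mediate(p) is not None]
```

Only the first and third polls are forwarded. Storing a digest rather than the whole payload keeps the memory cost constant however large the spreadsheet grows.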
In addition, we target the message to the virtual address “urn:csv”. We do this because we don’t want to send it out yet; we still have processing to do.
The next thing is to match any messages that are targeted at this virtual address:
<filter source="get-property('To')" regex="urn:csv">
Anything matching this will then execute the following logic:
<class name="org.apache.synapse.mediators.contrib.FlatPackFileToXML">
<property name="ParserType" value="delimited_titles"/>
<iterate xmlns:rs="http://ws.apache.org/synapse/ns/rowset" \
continueParent="false" \
<target sequence="publish"/>
What’s happening here is that first we convert the CSV to XML. The converter is told to expect the CSV to have a row of titles, and delimited fields. Then, we iterate over the resulting XML, creating a new message every time we hit an XML element called row inside the rowset element. For each new message, we execute a sequence called publish.
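The convert-then-split step can be sketched in Python using only the standard library. This is an illustration of the technique rather than what the FlatPack mediator actually emits: the element layout (a rowset element containing row elements, with one child per column title) is my assumption, I have left out the XML namespace for brevity, and the function names are hypothetical. It also assumes the column titles are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import List


def csv_to_rowset(csv_text: str) -> ET.Element:
    """Convert CSV with a title row into <rowset><row>...</row></rowset>."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rowset = ET.Element("rowset")
    for record in reader:
        row = ET.SubElement(rowset, "row")
        for title, value in record.items():
            # Assumes each title is a valid XML element name.
            ET.SubElement(row, title).text = value
    return rowset


def split_rows(rowset: ET.Element) -> List[str]:
    """Emit one standalone XML message per <row>, like an iterate step."""
    return [ET.tostring(row, encoding="unicode") for row in rowset.findall("row")]


sheet = "item,qty\nwidget,3\ngadget,5\n"
messages = split_rows(csv_to_rowset(sheet))
```

For the two-row sheet above, messages holds two XML strings, one per row, each of which could then be handed to whatever publishes it.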
Finally, the publish sequence sends the message to a new topic:
<property action="set" name="OUT_ONLY" value="true"/>
Note that I deleted some of the incredibly long JMS URL so that it would fit on the page! Before you run this, you need to copy the extra mediator JARs into the Synapse lib directory, and also follow the instructions on enabling JMS in the samples guide.
The result of this is a simple 25-line XML file which pulls together a set of reusable mediators and tasks into a useful integration. Now that we’ve looked at some configuration examples, let’s return to the runtime.
Asynchronous / Non-blocking
For those of you who are experienced Java users, Synapse also runs in the popular Apache Tomcat server as well as other enterprise Java application servers. However, it’s worth pointing out that Synapse does not, by default, use the standard HTTP engine provided by Tomcat. Why not? The Servlet model which Tomcat uses is based on a blocking IO model.
In this model, the server uses one thread for every incoming connection. While this is acceptable (if not perfect) for running Web applications, the model falls down badly for a router or ESB like Synapse, where the number of connections that can be processed concurrently would be limited by the number of threads. Effectively the threads would end up blocked waiting for responses from remote systems.
Instead, Synapse has its own HTTP and HTTPS server code, based on the Apache HTTPCore project. Because the Synapse HTTP code is completely non-blocking, Synapse can support very high levels of concurrent connections with a smallish thread pool. We have tested it with 5,000 concurrent TCP/IP connections and a thread pool of around 50 without problems.
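The difference between the two models is easy to demonstrate. The sketch below uses Python’s asyncio rather than the Java non-blocking IO that Synapse is built on, so treat it as an analogy: proxy_request and serve are hypothetical names, and the 50 ms sleep stands in for a slow remote system. The point is that a single thread can hold 5,000 requests in flight at once, because waiting for a reply does not occupy a thread.

```python
import asyncio
import threading


async def proxy_request(request_id: int, state: dict) -> int:
    """Simulate forwarding one request and waiting for the remote reply."""
    state["threads"].add(threading.get_ident())
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    await asyncio.sleep(0.05)  # stands in for the slow remote system
    state["in_flight"] -= 1
    return request_id


async def serve(connections: int) -> dict:
    """Handle all connections concurrently on one event loop."""
    state = {"in_flight": 0, "peak": 0, "threads": set()}
    replies = await asyncio.gather(
        *(proxy_request(i, state) for i in range(connections))
    )
    state["replies"] = len(replies)
    return state


stats = asyncio.run(serve(5000))
```

After the run, stats records how many requests were in flight at the peak and how many distinct threads handled them. In a thread-per-connection server the same workload would need one thread for each waiting request.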
We have talked about HTTP, HTTPS, JMS, and file-system support, but it would be a very limiting ESB that didn’t support other protocols. Out of the box, Synapse also supports SOAP, FTP, SFTP, Mail (SMTP/POP3/IMAP), and databases via the JDBC API.
Via JMS, Synapse supports various message-oriented middleware products such as IBM WebSphere MQ and Tibco, as well as the Apache ActiveMQ we used for the example. An interface to the Jabber/XMPP protocol is in the works, and the model is extensible so that other transports can be added by anyone.
Support for different message formats includes standard HTTP requests (GET, POST), SOAP 1.1 and 1.2, XML, JSON, and binary message formats. For SOAP the support includes full security (authentication, encryption, signatures) as well as reliable messaging.
Streaming XML support
Synapse was originally written with a strong focus on XML messages. Of course it can handle other messages (text, binary, fixed-field, etc.), but the aim has always been to support XML particularly well. The reason for this is two-fold. Firstly, XML is actually fairly hard to support efficiently. Many other ESBs, when faced with an XML message, parse the whole message into memory, which is slow and often unnecessary.
Synapse streams XML messages instead, and only parses them as needed. For example, suppose the message is being routed based on an HTTP header. In this case, Synapse has no need to look inside the XML, and it can simply be streamed from the input socket to the output socket.
However, Synapse is a little cleverer than just this. Suppose we want to route a message based on data in the first few XML tags. Synapse can parse just enough XML to decide where the message is going, write out those first tags onto the output socket, and then stream the rest. The result is that Synapse is very efficient in memory, using half as much as other systems with the same workload.
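Here is a minimal sketch of that “parse just enough” idea in Python. It is not how Synapse is implemented (Synapse is Java, and its internals are not shown in this article); it uses the standard library’s XMLPullParser, and the route_and_stream function, the dest tag and the sink names are all hypothetical. The parser is fed small chunks only until the routing element closes; the buffered chunks are then written to the chosen output and the remainder is copied across without being parsed at all.

```python
import io
import shutil
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Tuple


def route_and_stream(source: BinaryIO, sinks: Dict[str, BinaryIO],
                     routing_tag: str, chunk_size: int = 64) -> Tuple[str, int]:
    """Parse only until routing_tag closes, then copy the rest unparsed.

    Returns the routing value and the number of bytes the parser was fed.
    """
    parser = ET.XMLPullParser(events=("end",))
    buffered = []
    route = None
    while route is None:
        chunk = source.read(chunk_size)
        if not chunk:
            raise ValueError(f"<{routing_tag}> not found")
        buffered.append(chunk)
        parser.feed(chunk)
        for _event, element in parser.read_events():
            if element.tag == routing_tag:
                route = (element.text or "").strip()
                break
    sink = sinks[route]
    head = b"".join(buffered)
    sink.write(head)  # write out the part we had to look at
    shutil.copyfileobj(source, sink, chunk_size)  # stream the rest untouched
    return route, len(head)


message = b"<order><dest>warehouse</dest>" + b"<line>item</line>" * 200 + b"</order>"
sinks = {"warehouse": io.BytesIO(), "billing": io.BytesIO()}
route, parsed = route_and_stream(io.BytesIO(message), sinks, "dest")
```

In this run the routing decision is made after the parser has seen only a small prefix of the message, yet the chosen sink still receives every byte of the original.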
The other reason for focusing on XML was to have a single common format that could be supported across many different systems. By making it simple and efficient to deal with XML messages, Synapse encourages the use of XML as a common format. The aim is to ensure that departments, business units, web sites or other parties can easily communicate via XML, so that other parties don’t need to understand legacy formats to communicate with them.
Other Linux Linkages
I already mentioned that you can run Synapse as a Linux daemon. In addition, the logging system is syslog-enabled, so it is fairly simple to integrate Synapse into your existing log monitoring and management infrastructure.
If you are using Web Services Security (WSS) for authentication and authorization, then a recently published plugin supports linking this to the Linux Pluggable Authentication Modules (PAM), meaning you can use your existing user database to authorize requests.
I hope this has whetted your appetite to find out more about Apache Synapse. If you want more information, you can find it at the Apache Synapse Web site. In the resources section you can also find links to presentations, and the rest of the code to try out the sample described.