At a summit of open source leaders convened at the O’Reilly Open Source Convention in July, I asked everyone what they thought was the most significant work of open source development in the past year. None of them came up with the answer I was looking for, yet all of them agreed once I proposed it: the work of James Kent, who wrote the genome assembler that allowed the Human Genome Project to finish its work three days before the private effort by Celera Genomics, thus ensuring that the genome sequence remains in the public domain. Kent wrote the 10,000-line program in a month, “because of his concern that the genome would be locked up by commercial patents if an assembled sequence was not made publicly available for all scientists to work on.” (The New York Times, February 13, 2001, http://www.nytimes.com/2001/02/13/health/13HERO.html).
This story is significant for several reasons:
1. It reminds us that new frontiers, like bioinformatics and genomics, should be on the open source radar. Our goal should be to reach outside the hardcore computing industry to those in other fields who share our vision.
2. It illustrates that we need to think about more than open source code; we need to think about open data.
3. It illustrates that in the future, the public domain may be undermined more by patents than by software whose source code is kept private.
Fortunately, while bioinformatics is not yet on the radar of many developers, the decentralized, bottom-up world of open source means that good things can happen anyway. In addition to the work of hackers like Kent, Perl is used extensively by bioinformaticists; Linux clustering provides the heavy-duty computing power required, at far lower cost than commercial supercomputers. Open source works so well precisely because you don’t need someone else’s permission to get started. You put your work out, others build on it, and before long, you have the classic stone soup of the fairy tale.
That said, if commercial interests succeed in locking up crucial data or techniques through patents, the “open sourcing” of biology and other scientific fields may be slowed.
Patents are also a growing threat to existing open source projects. While we haven’t yet seen an actual patent infringement case brought against an open source project, there are troubling signs. For example, SMB 2.0, the next version of Microsoft’s file and printer sharing protocol (whose predecessor was reverse-engineered by the Samba team so that Unix/Linux machines could interoperate with Microsoft networks), appears to rely on a patented password-changing mechanism. Developers worry that this patent, or other patents they don’t yet know about, could make it harder to write code that interoperates with proprietary software.
Regardless of the details of any particular set of patents or laws, it’s clear that the landscape is changing. What was once simple — free the source code and you free developers and users to learn from, enhance, and redistribute the software — is growing more complex.
Another area of immense change is the way software itself is being delivered. So much of the thinking of both the free software and open source communities is rooted in the idea of software modules being distributed and installed on local machines. But in the age of the Internet, many applications are simply run remotely.
Right now, many of these applications are Web-based and consumed through the browser, but with the rise of what are now being called Web Services, we are starting to think of Internet sites as if they were large-scale software components. One of the biggest challenges facing the open source community is to create a next-generation operating system for the Internet that defines open and interoperable rules for creating and using such components. XML is a big part of this story, so XML and various XML-related open source projects should very much be on everyone’s radar. Support for XML-RPC and SOAP in Perl and Python is an exciting first step.
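As a rough illustration of treating a remote site as a software component, here is a minimal sketch using Python’s present-day standard-library XML-RPC modules (`xmlrpc.server` and `xmlrpc.client`, not the libraries available when this was written). The `add` procedure and the local server are hypothetical stand-ins for a real Web service, and SOAP is not shown.

```python
import threading
from xmlrpc.client import ServerProxy, dumps
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """Hypothetical remote procedure; any XML-RPC client can call it."""
    return a + b


# Bind to 127.0.0.1 on port 0 so the OS picks a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add, "add")
host, port = server.server_address

# Serve requests from a background thread so this thread can act as the client.
worker = threading.Thread(target=server.serve_forever, daemon=True)
worker.start()

# The XML document a client sends over HTTP for the call add(2, 3).
request_xml = dumps((2, 3), methodname="add")
print(request_xml)

# The client needs only a URL and a method name; it knows nothing about
# how, or in what language, the server is implemented.
with ServerProxy(f"http://{host}:{port}/") as proxy:
    result = proxy.add(2, 3)
print(result)

server.shutdown()
server.server_close()
```

Because the two ends agree only on the XML wire format, the server could be replaced by one written in Perl, or any other language with an XML-RPC library, without changing the client.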
Building a single-vendor Internet operating system that spans everything from the largest Web sites to the smallest interconnected devices is clearly the goal of Microsoft’s .NET project. However, I’m not convinced we need to rush to create open source or free software equivalents of all the .NET frameworks and services. What we most need to do is encourage bottom-up competing solutions to real-world problems, connected by an Internet-style architecture that allows those independent solutions to interoperate, both with each other and with proprietary offerings from the likes of Microsoft and AOL.
What we don’t need is a single, monolithic Linux and open source strategy. There is a history of battles between Microsoft and other companies, from Novell to Netscape, in which each competitor’s “top-down strategy” has foundered on the rocks of Microsoft’s entrenched monopoly position and superior execution of a winner-takes-all strategy.
The technologies that have done best are those that avoided that game entirely and instead countered Microsoft’s “embrace and extend” strategy with one of “open up and connect.” Netscape abandoned the open protocols of the early Web in favor of an embrace-and-extend strategy of its own, which failed despite a staggering early lead in the browser market. By contrast, Apache, which hewed strictly to interoperability and open protocols, has maintained and even extended its lead over Microsoft’s competing offerings.
Perhaps it seems I’m jumping tracks from open source to interoperability. However, I’m convinced the two are inextricably linked, because open architectures, like those of Unix/Linux and the Internet, are what make bottom-up open source development possible. Well-designed open source projects have what you might call an architecture of participation, one in which the protocols between participating programs are well defined, so that the individual programs can work together despite being developed independently. Contrast this with the code “commingling” that judges have found so troubling in the Microsoft antitrust case.
The Internet as it now exists has laid down the fundamental rules of such an interoperable system. One key element of the Internet approach was first articulated in 1980 by Jon Postel in RFC 761, the TCP specification, and was repeated and expanded in RFC 1122, “Requirements for Internet Hosts.” It is known as the “robustness principle”: “be conservative in what you do, be liberal in what you accept from others.”
Overtones of the golden rule aside, this is a cornerstone of interoperability, and the opposite of the games we’ve watched for the last few years, as vendors have built competing implementations that are reckless in extending standards but parsimonious in recognizing competing variations.
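To make the robustness principle concrete, here is a minimal sketch in Python over a hypothetical, simplified `Name: value` header format loosely modeled on Internet message headers. The functions `parse_headers` and `emit_headers` are illustrative only, not taken from any RFC or library, and the format ignores real-world details such as continuation lines and repeated headers. The reader tolerates sloppy input; the writer always produces one canonical form.

```python
from __future__ import annotations


def parse_headers(text: str) -> dict[str, str]:
    """Be liberal in what you accept: tolerate CRLF or LF line endings,
    stray whitespace, mixed-case names, blank lines, and junk lines."""
    headers: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if ":" not in line:
            continue  # skip blank or malformed lines rather than rejecting the input
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def emit_headers(headers: dict[str, str]) -> str:
    """Be conservative in what you do: always emit one canonical layout,
    with Title-Case names, a single space after the colon, and CRLF endings."""
    lines = []
    for name in sorted(headers):
        canonical = "-".join(part.capitalize() for part in name.split("-"))
        lines.append(f"{canonical}: {headers[name]}\r\n")
    return "".join(lines)


# Sloppy input, as some other implementation might send it.
sloppy = "content-TYPE:text/plain\n\n  X-Mailer :  demo  \r\nthis line is garbage\n"
parsed = parse_headers(sloppy)
canonical_text = emit_headers(parsed)
print(parsed)
print(repr(canonical_text))
```

Note that the canonical output parses back to the same result, so a peer following the same rules never has to rely on anyone else’s leniency.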
Many open source projects have worked to create interoperability where none was planned. For example, Jabber has been at the forefront of efforts to bring interoperability to instant messaging systems. So far AOL, the market leader, has tried to stop Jabber and similar efforts to open up IM, but I believe it is coming to realize that interoperability may be its only effective weapon in the coming battle with Microsoft’s next-generation instant messaging software.
It is in this context that we need to understand Mono, Miguel de Icaza’s effort to reimplement parts of the .NET framework. Some critics have complained that Mono is bad because it aids Microsoft’s plan to make the .NET framework ubiquitous. But that’s precisely the point. The framework will become ubiquitous to the extent that independent implementations can be interoperable.
Once you accept interoperability and open access as the shared cornerstone of both open source and the Internet architecture, and measure projects by how well they work with others, you may end up with a different filter for deciding what’s most important for open source developers to focus on.
One thing the open source community and its media boosters could do better is recognize and embrace its innovative new projects, rather than keeping so much of the focus on the established projects. The names of James Kent and James Clark, heroes of open source bioinformatics and XML respectively, should be as familiar to us as Linus Torvalds and Larry Wall. In addition, decentralized networking projects such as Freenet, Gnutella, Jabber, JXTA, and BEEP should be as familiar to open source developers as Linux, Perl, and Apache.
It’s important to remember that in an interoperable world, a program doesn’t need to “win” by achieving dominant market share in order to have an impact. In fact, the essence of interoperability is choice. If developers honor the robustness principle and make software that works well with the software of others, it’s ultimately the users who win.
Tim O’Reilly is founder and president of O’Reilly & Associates Inc. O’Reilly is sponsoring a Bioinformatics Technology Conference in Tucson, AZ, Jan. 28-31.