Where did Apache come from? How has a diverse group of programmers spread out all over the world created a program that captured the majority market share in this important software category?
Before there was an Apache Project, there was the Web server
I built and tested at the National Center for Supercomputing Applications (NCSA)
at the University of Illinois at Urbana-Champaign. At that time, the most
widely deployed Web server was produced by the CERN group in Switzerland, which
is also where the World Wide Web had been invented. Marc Andreessen was working on
the Mosaic Web browser (the precursor to the Netscape browser) at the NCSA and
found the existing CERN server to be large and cumbersome. I took a look at the
source code and agreed. My work as a systems administrator at the NCSA had
prepared me for the task of writing a server, so when Marc asked me to write a
new one, I agreed to.
The CERN server was based upon a protocol called the “Hypertext Transfer
Protocol” (HTTP). I downloaded the specifications for HTTP and started working on
the NCSA HTTPD. The D in HTTPD stands for “daemon”. Under UNIX, a daemon is a
server program — a program that runs continuously and waits for other programs
to connect to it. The protocol at that time was around version 0.9 or 1.0, and
was very simple. I found that it wasn’t the protocol itself that was interesting,
but rather all the things that could be built around it. For example, one day
Marc was tossing around some ideas and he started talking about having a URL (a
“Uniform Resource Locator”, which is just a standardized method for addressing
items on the Web) that could run a program. At first I didn’t understand what he
was talking about, but when it hit me, it was a huge idea! It was the idea that
eventually led to clickable image-maps and made HTML fill-out-forms possible.
After that discussion I began working on the “htbin” interface, which later
evolved into the “Common Gateway Interface”, or CGI. These interfaces specified
a standard method for a Web server and an external program to pass information
to each other. The “external program” can perform any sort of arbitrary
processing on the data passed to it and then return the processed data to the Web
server for transmission back to a Web browser. Thus the server became almost
infinitely extensible and flexible in its ability to process data.
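To make the hand-off concrete, here is a minimal sketch of an external program in the CGI style, written in Python. It assumes the conventions of the later CGI specification: the server passes the part of the URL after the "?" in the QUERY_STRING environment variable, and the program writes a header, a blank line, and then the document to standard output. The function name and the "name" form field are hypothetical and are not taken from the NCSA code.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def render_response(environ):
    """Build a CGI-style response from the environment the server provides."""
    # The server places the part of the URL after '?' in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # 'name' is a hypothetical form field used only for illustration.
    name = query.get("name", ["world"])[0]
    body = "<html><body><p>Hello, %s!</p></body></html>\n" % escape(name)
    # A CGI program emits its headers, a blank line, then the document body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, the real process environment carries the request.
    sys.stdout.write(render_response(os.environ))
```

The server itself needs to know nothing about what the program does; it only sets up the environment, runs the program, and relays whatever comes back to the browser.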
Nowadays, when the Apache guys want to modify the server, they discuss their
ideas on the Apache mailing lists, but at that time, changes to the servers or
the protocols were discussed on the “www-talk” list. That is where htbin and CGI
were formally worked out.
Although it was a lot of work, and could be stressful, the NCSA server was a
very enjoyable project. I often looked back on it fondly as I worked at Netscape.
Especially on the www-talk list, I felt a sense of community. There was also a
sense of that in the newsgroups, and in the e-mail I would receive from the
people using the software. I got a lot of ideas and feedback from them. One of
the best ideas I received was from Charles Henrich. He had developed a mechanism
for embedding program output in HTML files. The server would scan a page as it
was sent out, and replace certain tokens with the contents of a file or with the
output of a program. He called the mechanism “server side includes”, although I
later renamed it “server parsed HTML”. Unfortunately I renamed it after it had
been released for some time, and both names stuck.
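The scan-and-replace idea can be sketched in a few lines of Python. This is an illustration only, not the NCSA implementation: it assumes comment-style tokens of the form <!--#include file="..." --> and <!--#exec cmd="..." -->, and it takes two caller-supplied functions (hypothetical hooks named read_file and run_command) in place of real file and program access.

```python
import re

# Assumed token syntax for this sketch: an HTML comment carrying a directive,
# either include file="..." or exec cmd="...".
DIRECTIVE = re.compile(r'<!--#(include|exec)\s+(file|cmd)="([^"]*)"\s*-->')


def expand(page, read_file, run_command):
    """Return page with each token replaced by file contents or program output.

    read_file and run_command are hypothetical hooks supplied by the caller,
    so the sketch does not touch a real filesystem or shell.
    """
    def substitute(match):
        directive, _attribute, value = match.groups()
        if directive == "include":
            return read_file(value)
        return run_command(value)

    # Everything outside the tokens is passed through unchanged.
    return DIRECTIVE.sub(substitute, page)
```

A server using this approach would call something like expand() on each parsed page just before sending it, which is why the cost of scanning every page was a reasonable thing to be wary of.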
At first I was somewhat leery about the idea of a server parsing HTML files,
but this mechanism has now become one of the key components of Web servers. It is
widely used for advertising and a variety of other applications. It was the
connection with the users of NCSA HTTPD through the www-talk mailing list and the
newsgroups that made these innovations possible. Creating the server in that
environment was a very enjoyable experience.
What a mccool guy: Rob McCool wrote the original NCSA server that eventually became the Apache Project.
By this time, I had begun working for Netscape, where I was responsible for
creating an all new HTTP server. The very early versions of what became the
Netscape server were developed on a 486 running Linux. Although I never directly
contributed to the development of Linux, I had downloaded one of the Slackware
1.x releases and installed it on my home computer. It ran surprisingly smoothly
on the small machine I had, and I also found a great many similarities between
the community of Linux users and the community I had been a part of while working
on the NCSA server. This sense of familiarity led me to begin writing the new
Netscape server under this OS. Unfortunately, a corporate-level decision was made
to prioritize other OSs over Linux, and the server slowly drifted away from being
buildable under it.
Because the Netscape server would not be Open Source, I needed an alternate
method to allow users to directly modify its behavior. The method also needed to
be faster than CGI or server parsed HTML. This led me to create the Netscape
Server API (NSAPI). Through it, people could write standalone modules that could
be loaded into the server via a configuration file. Thus people would have a
standard framework within which they could modify the behavior of the server.
When the Apache Module API came to be, it was based in part on some of the ideas
NSAPI went part of the way towards providing the benefits of Open
Source, but not all the way. One of the problems was documentation; NSAPI was
difficult to articulate. A lot of the time when users would ask questions, I
needed to go look at the source code to find the answer. Allowing users more
access to the source code would have made many things much easier. I also
received regular requests from users for source code to various pieces of the
server. The most frequently requested pieces were the CGI engine and the server
parsed HTML engine. In hindsight, I think releasing the source code for the
Netscape server would have helped our users quite a bit.
There had been a strong sense of collaboration and community surrounding the
development of the NCSA HTTPD. Still, I had been the primary driving force behind
it, and when I left for Netscape, it was left somewhat adrift while NCSA tried to
find new people to work on it. At the same time, I became less active on the NCSA
newsgroups and mailing lists, as developing the new Netscape server took all of
my time. I answered questions here and there, but mostly forwarded them back to
the NCSA. I stopped interacting directly with the people who used either of my
servers, both the NCSA HTTPD and the Netscape HTTPD, because there were too many
of them and because I didn’t have the time. I missed interacting with them, and I
missed the feedback they gave.
As time went on, a number of users began patching the NCSA server code to do
things that they wanted or needed it to do. I’m not sure of exactly how or when
this happened, but they collected their patches together and released their new
server as “a patchy server”, or “Apache”. I was happy to see that, because it
meant the community spirit that had surrounded the NCSA HTTPD server was
Part of the reason Apache was possible is that Marc had earlier decided to
release NCSA HTTPD under a “do anything you want with it” license. The Mosaic
browser had been released under a fairly restrictive license, but I liked the
idea of an unrestricted license for the server. From what I understood, Marc’s
decision to make the server license unrestricted was based on problems he had
encountered with restrictive licenses for Gopher servers. Gopher was a pre-WWW
system that was similar to the Web in functionality, but used an entirely text
Seeing what the Apache team was doing made me miss being a part of the HTTP
server development community. I wanted to help out. I also thought that since the
NCSA code could be horrendously confusing, I might be able to lend some insight
to the Apache developers. I wondered if it was a conflict of interest to be
involved with the Apache project, but at that point Netscape did not consider any
of the free servers to be competitors. We were focused mainly on products from
companies like Spyglass, Open Market, and Microsoft.
I joined the Apache development mailing list, and explained why I had done
things in a particular way. It was enjoyable for a while, but I soon began to
notice rising tensions. Various publications such as Netcraft began making direct
comparisons of market share between the Netscape HTTPD and Apache. With the
rising tensions and the increasing comparisons, it started to seem like there was
a definite conflict of interest if I remained on the mailing list. At the same
time, the Apache code base had been mostly rewritten, which meant that my
usefulness was nearing an end. So I removed myself from the list.
Whether I’m on the mailing list or not, I’m glad the Apache project is here.
I’m thrilled to see that it’s getting a lot of attention from IBM and others.
It’s very well deserved and the team has done some incredible things with the
original source code. I was pleased when the project started, because it meant
that the community spirit that had fostered the NCSA HTTPD would not die out.
With Apache, people continue to have an open, simple web server that they can use
and modify, which is exactly what I had set out to create at NCSA.
Rob McCool was the author of the original Web server, the NCSA HTTPD. He went on
to write the Netscape Web server. Rob can be reached at
Too many chiefs? The Apache team poses with its well-known feather at ApacheCon ’98.
What is it that makes an Open Source project so
successful that it can compete in the same product space as multibillion-dollar
corporations? It isn’t the price — many organizations have switched to Apache
after they had already purchased a competing product. It isn’t the sheer number
of contributors — the Apache development team is tiny compared to those of
Netscape’s Enterprise Server or Microsoft’s IIS. It isn’t just because the source
is open — most Open Source projects never succeed this far. What, then, are the
secrets to Apache’s success?
When the project began in February 1995, the most popular Web server software
was the public domain HTTP daemon developed by Rob McCool at the National Center
for Supercomputing Applications (NCSA). However, development of NCSA HTTPD had
stalled after Rob joined Netscape in mid-1994. Many webmasters had developed
their own extensions and bug fixes that were in need of a common distribution. A
small group of these webmasters gathered together via private e-mail for the
purpose of coordinating their changes (in the form of “patches”). Brian
Behlendorf volunteered the use of his server as a shared information space,
hosting a public mailing list for communication and providing logins for the
developers, including Robert S. Thau, Rob Hartill, Randy Terbush, David Robinson,
Cliff Skolnick, Andrew Wilson, and myself.
Apache has never been organized around a single person or primary
contributor. As a result, the first problem we had to solve was determining how
to make decisions and coordinate our separate efforts into a unified product. In
contrast to the popular press’ descriptions of Open Source hackers, the original
Apache Group consisted of three doctoral students and one Ph.D. in Computer
Science, three professional software developers, and a software business owner.
There was no shortage of ego, but thankfully there was no shortage of experience
and reasoned debate either. We were located in California, Nebraska, Boston, and
Britain, so regular meetings in person or by teleconference were out of the
question. Furthermore, we each had (at least) one other “real” job. We
collaborated on producing and supporting the Apache server out of enlightened
self-interest: By pooling our efforts, the resulting product was much more
functional and robust than anything we could have produced alone. However, this
also meant that none of us could devote large blocks of time to the project in a
consistent or planned manner.
In order for the project to succeed, our development process (the procedures
by which we make decisions and coordinate our efforts) had to reflect this
globally distributed and volunteer organizational environment. There was no
Apache CEO, president, or manager to turn to for making decisions. Instead, we
needed to determine how to make group decisions without using synchronous
communication and in a way that would interfere as little as possible with the
What we came up with was a system of voting via e-mail that was based on
minimal quorum consensus. Each independent developer could vote on any issue
facing the project by sending mail to the mailing list with a “+1” (yes) or “-1”
(no) vote. In order to change the code base, three positive votes were required
and there could be no negative votes (vetoes). For other decisions, a minimum of
three “+1” votes and an overall positive majority were required. Anyone on the
mailing list could vote, thus expressing their opinion on what the group should
do, but only votes cast by the Apache Group members were considered binding.
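The two rules can be written down as a short Python sketch. The function names and the dictionary format are hypothetical, and “overall positive majority” is interpreted here as more binding “+1” votes than binding “-1” votes, which is an assumption rather than a documented definition.

```python
def binding_tally(votes, group_members):
    """Count binding +1 and -1 votes; only group members' votes are binding.

    votes maps a voter's name to +1 or -1.
    """
    plus = sum(1 for voter, v in votes.items() if voter in group_members and v == 1)
    minus = sum(1 for voter, v in votes.items() if voter in group_members and v == -1)
    return plus, minus


def code_change_approved(votes, group_members):
    """A code change needs at least three binding +1 votes and no vetoes."""
    plus, minus = binding_tally(votes, group_members)
    return plus >= 3 and minus == 0


def other_decision_approved(votes, group_members):
    """Other decisions need at least three binding +1 votes and a positive majority."""
    plus, minus = binding_tally(votes, group_members)
    return plus >= 3 and plus > minus
```

Note that a “-1” from someone outside the group is still visible on the list as an opinion, but in this sketch it neither vetoes a code change nor counts against a majority.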
The voting system had a number of interesting properties. By setting the
minimum at three positive votes, only a subset of the group has to be involved in
any decision. This allows developers to focus on the project when they have the
time, without blocking overall progress on any given day. At the same time, the
minimum vote enforces a high degree of peer review over all code changes. Veto
power is used sparingly and only when backed by a good explanation, primarily as
a means of preventing software bloat (the addition of too many features to the
core code) and ensuring that no change would knowingly break a significant
Software professionals are often surprised that so many developers can be
given veto power over a project and not have it be continuously deadlocked over
design conflicts. Apache can do this because it is more than just a software
product — it is a community. Each of us invests a considerable amount of time
maintaining the software through the mailing lists, and recognizes the similar
efforts of the other group members, so none of us is inclined to block progress
on the server code for anything but the most important issues. Since we have
widely varying backgrounds in computing and represent different types of Apache
customers, each group member focuses on reviewing and enforcing design criteria
that match their own expertise.
Voting does have its drawbacks, however. During periods of rapid and focused
development, voting can become a barrier and a frequent source of friction among
the developers. What should be a fun project would sometimes become bogged down
in the details of decision by committee. We have since improved our development
process as new tools became available and as the context of the project and
membership of the Apache Group changed over time.
In 1996, we switched to using CVS (Concurrent Versions System)
(http://www.cyclic.com/cvs/info.html) for managing our shared information
space, including product distributions and web site content. CVS allows a central
repository to maintain consistency and simplifies the task of committing changes
from many remote developers. Changes to the repository are summarized and sent to
a mailing list, thus notifying the other developers of the content being changed.
Consequently, we now review changes after they are committed to the source, thus
streamlining the approval of simple fixes. Anonymous access to the current source
code base is provided to the public, enabling a wider testing audience and
satisfying those who like to live on the cutting edge.
Although the Apache Group makes decisions as a whole, any actual work on the
project is done by individuals. The group does not write code, design solutions,
document products, or provide support to our customers — individual people do
that. The group provides an environment for collaboration and an excellent
testing ground for ideas and code, but the creative energy needed to solve a
particular problem, redesign a piece of the system, or fix a given bug is almost
always contributed by an individual volunteer working on his own, for his own
purposes, and not at the behest of the group. Competitors mistakenly assume that
Apache will be unable to take on new or unusual tasks because of the perception
that we act as a group. They fail to see, however, that by remaining open to new
contributors, the group has an unlimited supply of innovative ideas. The
individuals who choose to pursue those ideas are the real driving force behind
In order to better serve the Apache community and provide legal protection
for the Apache projects, the Group has recently incorporated as a not-for-profit
corporation called The Apache Software Foundation. The new Foundation will
concentrate on the business side of Apache — responding to licensing requests,
overseeing user conferences, maintaining intellectual property rights, providing
support equipment and software, eventually hiring a systems administrator rather
than relying solely on volunteers, and providing an entity that can receive and
properly manage financial contributions to the projects. The process of managing
the HTTP Server Project will not be changed, since technical decision-making
within the project has always been separate from the business side of the Apache
Group. Other Open Source projects will soon be joining this umbrella
organization, each with their own project management committee, and share in the
benefits of a common infrastructure and support environment. Our hope is that The
Apache Software Foundation will become the catalyst for many collaborative
development projects in the future, with a potential for success equal to that of
the HTTP Server Project.
Roy T. Fielding is a doctoral candidate, co-founder of the Apache HTTP Server
Project, and a Director of the Apache Software Foundation. He can be reached at
Feeling blue (chip): Brian Behlendorf was instrumental in bringing IBM on board the Apache project.
By March of 1998, not only had Apache solidified its
position as the world’s most popular Web server, it had also started garnering a
fair bit of attention from the media — not just the “techie” press, but also
traditional media like The Wall Street Journal and ABC News. Despite all of this
attention, most software companies still seemed to be pretending Apache didn’t
even exist, which was funny, since many of them ran it on their own Web sites.
On a cold Thursday afternoon, I received an email from someone at IBM I’d
never conversed with before — not an unusual occurrence, as people were pitching
things to me all the time, but this one stood out for obvious reasons:
I started working with IBM several months ago and made a number of
suggestions concerning the use of Apache by IBM. One of those ideas has resulted
in a project that we would like to discuss with you. As a result of my
suggestions, a project was developed concerning the Apache Server. Since I
recommended that we talk to you concerning the implications of the project, they
have asked me to contact you and see if we could get an hour of your time for
some feedback concerning our ideas and their impact on the Apache
I said sure, and we met a few days later over Italian food. I still expected
at this point that IBM was going to ask about how to better support Apache in
their “Web Objects Manager” product, or whether they could distribute it on a CD,
or something else equally mundane.
I wasn’t prepared for the bombshell they were about to drop at that dinner
meeting. In short, paraphrased: “IBM would like to formally adopt Apache as its
Web server platform, across all its server offerings — both hardware and
software. We want to do this right. How do we do this?”
Now, I’d gone out to a number of companies, even large ones, extolling the
virtues of using Apache on their Web sites. I’d sat around playing “what if”
games with other engineers, trying to see if Apache represented something really
new and sustainable, or if it was just a fluke byproduct of a bunch of motivated
programmers attacking an interesting problem. We’d debated on the various Apache
lists the idea of formally incorporating — not because we wanted to go into
business together (a company with 18 CTOs just wouldn’t work), but more to
protect the intellectual property and brand in Apache if we ever needed to — but
movement on that front had stalled because no one saw it as necessary. But even
through all this, nobody anticipated any of the “big guys” ditching whatever
server efforts they may have had and adopting Apache as a platform.
The questioning began close to home. I left the meeting sounding positive,
but internally I was divided. The development processes we’d developed required
trust on many levels, trust enforced with accountability (through mailing lists)
and reversibility (through CVS). Our method of adding new developers was
informal. Most developers had some sort of financial motivation to be involved,
yet overall the main driver at that point was that working on and being involved
with Apache was “fun”. Could it still be “fun” with an 800-pound gorilla
involved? Could we reconcile the fact that up to this point, it had been a group
of individuals working together, yet this was a company asking to join us? Would
this mean we’d become essentially unpaid developers for someone else?
After a serious discussion with a couple of close advisers, and looking at a
number of Open Source resources just starting to emerge at the time, I came to
the conclusion that there had to be a way to do this; and if there wasn’t, it was
worth trying. If it didn’t work, the worst that could happen is that we’d press
the “reset” button and fork the code base, just as we had with the NCSA server.
I had also become convinced of IBM’s sincerity, and that they were one of the
ideal companies to be making a move like this. Their revenue base had shifted
from software to services over the last few years, and their whole “e-business”
push seemed to be much more about support, strategy and integration than about
packaged software. To them, it really doesn’t matter if the underlying Web server
is Apache or Netscape or Microsoft’s IIS — it just has to work. The revenue they
would earn from a proprietary server product would be very small compared to the
ongoing support and services for which they would charge money.
It was also clear that IBM had a lot of what Apache could use. They had an
abundance of developer talent, particularly those who had worked on “servers” of
one type or another for a long time. They had a huge patent portfolio which would
make for a good legal defense should Apache be attacked on patent grounds.
Finally, if any company could legitimize the use of non-proprietary software to
Fortune 100 companies, it was IBM.
After a confirmation e-mail, I was on my way to Raleigh, NC, for a meeting
with various IBMers about how this might work.
Again, IBM beat my expectations. I frankly didn’t know what to expect –
whether I’d be thrown in front of programmers fearing I was proposing taking away
their jobs, or marketers who would be trying to steal the Apache brand, or
lawyers who would be asking Apache developers to sign away our rights and ideas.
Instead, I met engineers who knew Apache pretty well already and who were eager
to demonstrate a caching module for the NT kernel that significantly accelerated
Apache on NT. I met marketers who were very sensitive to not only the Apache
Group’s domain over the Apache brand, but also to the Native American Apache
tribe. I also met lawyers who had not only read the Apache license and the GPL,
but were also clued into the distributed and volunteer nature of the project.
More specifically, the meeting focused on how the “Open Source” side of the
development model would mesh with the parts of their business that weren’t
completely open. For example, the cache accelerator. (IBM had some interest in
keeping that closed-source, since there were patents being used that they didn’t
want to give away for free, and they felt there was an advantage over Microsoft
that they were interested in maintaining.) I said that was fine with us, so long
as Apache on NT ran just as well without the commercial module, albeit more
slowly. We discussed how they could use the bug database we’d set up in Apache to
help deal with AIX-specific bugs. We talked about
testing…internationalization…support for their mainframe systems, like the
AS/400 and OS/390. I made clear that anything they saw as a business need they
were going to have to build themselves; they couldn’t rely on us to build it
for them. They completely understood that, even before I said it.
On the marketing side, they wanted to make a tremendous amount of noise about
this when the time was right. What was important to me and to the other
developers was that, above all, there be no perception of exclusivity or special
treatment of IBM by the Apache developers. IBM had to be an equal partner in
this; after all, they were setting the template by which I was hoping other
companies would follow suit with us and with other Open Source projects. This
even got down to the nitty gritty of marketing terms — they couldn’t call their
derivative product “IBM’s Apache”, or even “A Version of Apache produced by IBM”;
the most they could say was “…, based on Apache”, to show a clear distinction
between what was IBM’s and what was the Group’s.
Of course, one crucial component to the marketing side is whether this
endeavor was something the Apache Group could decide to endorse. That decision
wasn’t up to me or any single person — it had to be a decision made by the core
developers. It was also unclear what “endorse” meant — a joint press release?
Giving them CVS commit privileges?
It was time to bring this to the Apache Group. However, the “Group” had been
only very loosely defined over the course of its life; it varied from “those who
had CVS commit privs” to “those who had accounts on apache.org” to “those who
were on the apache-core discussion forum”. There were also some members who had
not been active in a long time. Thus, we needed to tighten up the definition of
what the “Group” meant and what its relationship to the larger development
community was. This was a significant event in the history of the Apache Group –
in some sort of strange derivative of the Schrödinger’s Cat paradox, the Apache
Group’s identity more fully emerged as there came to be a need for it.
It was clear, as well, that IBM wanted to be seen as a peer in this
community. To them, that meant being able to be a member of the Apache Group. Up
to this point, we had relied on a meritocratic process for bringing new
developers on, and in the opinion of many, it had served us well. In fact, in my
first meeting with IBM and in subsequent e-mails, I hammered home that the Apache
Group and IBM wouldn’t have any formal relationship until we’d seen a solid
string of contributions from IBM developers through the standard Apache
development forums. It took them a few months to ramp up to this point, as it
would take anyone; but finally, by June, there was a consensus that two
particular developers at IBM had contributed sufficiently in the public
development arenas, and they could be invited into the Apache Group.
Reaching that consensus wasn’t easy; it involved a fair bit of soul-searching
by members of the group as to what they all wanted to see out of the project.
Some reiterated that they were working on this only for fun, and anything that
made it less fun would cause them to leave. Others were concerned that the 800-lb
gorilla might turn the code base into an unrecognizable soup of bad ‘C’ code to
fulfill certain marketing obligations that would run counter to common sense or
good design. We had always praised working, elegant code over almost anything
else — including speed, functionality, and user-friendliness. Were these all
impossible to reconcile?
In the end, we decided to give it a try. We worked out a press release with
IBM’s PR department and went live with it June 22nd at a Web conference in San
Francisco. Throughout the press conference that followed, it was clear that IBM
“got it” when it came to working as a peer in this process, as even the execs
were clear that it was a joining of peers, and not a standard business “deal”.
Later on, IBM offered to host an Apache Group meeting in June in San
Francisco, as well as help pay the airfare for some developers to attend. This
was the first time many of us had met in person. We spent two days talking about
how we might officially incorporate the Apache Group and another two days
discussing architectural ideas for Apache 2.0.
Since that time, IBM has helped the project tremendously, first with a
significant number of enhancements and fixes for the Windows NT port of Apache,
and now with the multithreaded prototype for Apache 2.0. They have set the gold
standard for how to work as a peer within an Open Source development project –
which is surprising, perhaps, given that a few years ago the world was ready to
write them off as a 1970s mainframe business culture incapable of tackling the
1990s Internet-frenzied business environment. Whether the accountants think the
Apache strategy has been monetarily successful for IBM, I can’t say; but judging
from the other announcements IBM has made since then with respect to Linux, I
think the Open Source community has found an extremely powerful ally.
Brian Behlendorf was one of the original members of the Apache team. He now
works for O’Reilly & Associates, designing the next generation of Open Source
development tools. He can be reached at