[Figure: At Your Service. Most Linux distributions will install the Open Source Apache Web server by default.]
Someone recently asked me to help her install a Web server
on her Linux system. I simply pointed her Web browser at http://localhost, and
up came a “Test Page” from Red Hat’s Apache package, displaying the banner “It
The point is that most Linux distributions already include the Apache Web
server package and will install it either by default or through a simple
selection from the installation scripts. So if you have installed Linux, no
additional work is required to install a Web server; you’ve already done it.
It sometimes happens, however, that your distribution may not have included
the particular Web server you want to use. Or you may want to upgrade an existing
Web server to a newer version. In that case, you may have to install one of the
many Linux Web server packages using your distribution’s package management
facilities, or by grabbing a tarball of the Web server’s sources, compiling them,
and installing them manually.
There is not enough space to go into great detail on these alternatives.
Fortunately, even a manual package install is rather simple. To install a new Web
server package from an RPM file, you would use a command like:
Similar commands would be used on Debian and Slackware systems, which each
have their own package management commands.
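As a rough sketch of what these commands look like, the following shell function prints (rather than runs) a typical install command for each package format. The package file name is hypothetical and install_cmd is a helper invented here for illustration; rpm -Uvh, dpkg -i, and installpkg are the usual install commands on RPM-based, Debian, and Slackware systems respectively.

```shell
# Print the usual install command for a package file, chosen by extension.
# This is a dry run: it only echoes the command so you can review it first.
install_cmd() {
    case "$1" in
        *.rpm) echo "rpm -Uvh $1" ;;      # Red Hat and other RPM-based systems
        *.deb) echo "dpkg -i $1" ;;       # Debian
        *.tgz) echo "installpkg $1" ;;    # Slackware
        *)     echo "unknown package format: $1" >&2; return 1 ;;
    esac
}

install_cmd apache-example.rpm
```

Printing the command instead of executing it keeps the sketch safe to try as an ordinary user; the real commands need root privileges.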
In the most basic case, running a Web server under Linux is simply a matter
of providing some HTML files (Web pages). In other words, you can just drop
your Web pages into the DocumentRoot directory. That’s usually located in /home/httpd/html, and it’s always defined in the Web server’s
configuration files. Those configuration files are usually located in /etc/httpd/conf. You can generally find your DocumentRoot
using a command like:
grep -i documentroot \
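This kind of lookup can be tried safely against a scratch copy first. The sketch below writes a small sample configuration into a temporary directory (the file contents and the /home/httpd/html value are only examples) and extracts the DocumentRoot setting from it. On a real system you would point grep at your own configuration directory instead.

```shell
# Build a throwaway config directory so the example has something to search.
conf_dir=$(mktemp -d)
cat > "$conf_dir/httpd.conf" <<'EOF'
ServerName localhost
# DocumentRoot: the directory out of which documents are served
DocumentRoot "/home/httpd/html"
EOF

# -i ignores case, -h suppresses file names; the pattern skips comment lines.
docroot=$(grep -ih '^[[:space:]]*documentroot' "$conf_dir"/*.conf |
          awk '{print $2}' | tr -d '"')
echo "DocumentRoot is $docroot"
rm -r "$conf_dir"
```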
Once you’ve found or re-defined your DocumentRoot, you can focus on composing
your content using any HTML or plain text editor. Netscape Communicator’s Page
Composer and AdvaSoft’s asWedit are a couple of popular HTML editors that run
under Linux. The HTML editors perform syntax checking and generally make it
easier for the novice to write valid HTML. Keep in mind that HTML is really just
normal text, so any text editor, such as emacs or vi (my personal favorite), works
just as well.
Creating content is the most important aspect of running any Web server,
regardless of what platform you run it on. Also keep in mind that good writing is
hard work. Many webmasters get so caught up in the latest Web fads, expending
much of their effort building and maintaining elaborate Web-based applications,
that the basics of well-written content are often forgotten.
Although I think that good Web sites should be based on solid foundations of
well-written content, most good sites also have at least some “active”
The traditional CGI (common gateway interface) and SSI (server-side includes)
mechanisms are the best known and most widely supported methods for generating
dynamic Web pages and for interacting with users (generally through HTML forms).
There are many “canned” CGI scripts available on the Web. The best places to look
for them are Matt’s Script Archive at http://www.worldwidemart.com/scripts/
and, since most CGI programming is done in Perl, the CPAN (the Comprehensive
Perl Archive Network of ftp and Web mirrors) at http://www.cpan.org or http:
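For readers who have not seen one, a CGI program is just an executable that writes an HTTP header, a blank line, and then the page body to standard output. The sketch below is a minimal example written as a shell function so it can be run from the command line. The function name and page text are made up for illustration; QUERY_STRING is the environment variable a Web server sets for a CGI request.

```shell
# Minimal CGI response: header, blank line, then the HTML body.
cgi_hello() {
    printf 'Content-Type: text/html\r\n\r\n'
    printf '<html><body>\n'
    printf '<h1>Hello from CGI</h1>\n'
    # The query string is passed by the server in the environment.
    # It is printed as-is here; a real script should escape it first.
    printf '<p>Query string: %s</p>\n' "${QUERY_STRING:-none}"
    printf '</body></html>\n'
}

# Simulate a request from the command line.
QUERY_STRING='name=test' cgi_hello
```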
In addition to the traditional SSI, Apache supports XSSI (an extended SSI),
PHP3 (“personal home page”), embPERL, and other server-driven scripting
[Figure: Where’s the Action? The browser can simply display the page from code executed on the server (L), or the browser/Java virtual machine can execute and display a dynamic page (R).]
All of these server script interpreters work by reading source files, usually
a combination of HTML intermixed with programming statements. They execute the
embedded scripting commands and may insert additional HTML into the contents of
the file as it is sent to the browser that requested the URL. The
resulting effect can be as simple as one of those ubiquitous “hit counters” or
as elaborate as anything that your Web browser can possibly display.
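The mechanism can be imitated in a few lines of shell. This sketch is not how Apache or any of these interpreters is implemented; it only illustrates the idea of filling in a page as it is served. The @HITS@ placeholder, the file names, and the render_page function are all invented for the example.

```shell
# Serve a template, replacing the @HITS@ marker with a running count.
render_page() {
    template=$1
    counter_file=$2
    # Start from 0 if the counter file does not exist yet.
    count=$(cat "$counter_file" 2>/dev/null || echo 0)
    count=$((count + 1))
    echo "$count" > "$counter_file"      # remember the new count
    sed "s/@HITS@/$count/g" "$template"  # emit the finished HTML
}

work=$(mktemp -d)
echo '<p>You are visitor number @HITS@.</p>' > "$work/page.tmpl"
render_page "$work/page.tmpl" "$work/hits"
render_page "$work/page.tmpl" "$work/hits"
```

Each call prints the page with the next number filled in. A counter on a busy site would also need file locking, which is omitted here.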
The Apache Web server doesn’t include all of the different programming
languages by default, however. That would bloat the installation unnecessarily
since few webmasters use more than a couple of these languages in their Web
Also, Apache is modular — you can compile literally hundreds of modules into
your Web daemon. In addition, new versions of Apache support DSOs (dynamic shared
objects), which are conceptually similar to Linux loadable kernel modules, and
can be loaded by Apache without recompilation. This is far more convenient than
the old “rebuild” method, making Web server configuration much easier for those
users unfamiliar with source compilation. There’s even an Apache module that
checks “bad” URLs for common misspellings before returning one of those infamous
“404 File not Found” errors.
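As a hedged illustration, loading a module as a DSO usually takes one LoadModule line plus whatever directives the module itself provides. The fragment below uses mod_speling, Apache's spelling-correction module. The modules/ path is an assumption and differs between installations, and some Apache versions need an additional AddModule line. The shell wrapper simply writes the fragment to a scratch file so it can be reviewed before being merged into a real configuration.

```shell
# Write an example DSO configuration fragment to a scratch file.
fragment=$(mktemp)
cat > "$fragment" <<'EOF'
# Load the spelling-correction module at startup (path is an example).
LoadModule speling_module modules/mod_speling.so
# Ask the module to try to correct misspelled URLs.
CheckSpelling On
EOF
cat "$fragment"
```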
Unfortunately, space doesn’t allow me to give more than a superficial
overview of Apache. The canonical source for Apache information is the Apache Web
site at http://www.apache.org. Another good source of Apache information is
http://www.apacheweek.com, a weekly webzine of tutorials, features, and news on
the topic [See the Apache article]. Keep in mind that Apache is only
one of many Web daemon packages available for Linux and UNIX. Chances are good
that you can find one that meets your needs precisely.
Note that some forms of dynamically generated content don’t require any
support from Apache (or any other Web server). It’s possible, for example, for
simple shell and Perl scripts to generate new pages. You can launch these by cron and/or at (the standard UNIX
periodic and one-time command scheduling services), or invoke them by any process
you like. Programs like Mon, BigBrother and MRTG (the multi-router traffic
grapher) generate Web pages to help sysadmins monitor their servers and
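A sketch of such a script, using a made-up function name and a temporary output file: it writes a small status page containing the host name, the date, and disk usage. The crontab line in the comment shows how it might be scheduled; both the schedule and the script path are examples only.

```shell
# Generate a static status page; no Web server support is needed.
# Example crontab entry to refresh it every ten minutes:
#   0,10,20,30,40,50 * * * * /usr/local/bin/make_status.sh
make_status_page() {
    out=$1
    {
        echo '<html><head><title>System status</title></head><body>'
        echo "<h1>Status of $(uname -n)</h1>"
        echo "<p>Generated: $(date)</p>"
        echo "<pre>$(df -k /)</pre>"
        echo '</body></html>'
    } > "$out"
}

page=$(mktemp)
make_status_page "$page"
cat "$page"
```

In practice the output file would live under your DocumentRoot, so the Web server serves it like any other static page.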
Other forms of active content that don’t require Web daemon support include
client side of the Web connection, and the Web server simply transfers the code
to the browser just as it would any other data. Many animated pictures on the Web
rely on a feature of GIF (Graphics Interchange Format) images which allows a
single GIF file to contain multiple picture frames. Most GIF viewers, including
those in the common Web browsers, will cycle through the frames, which can be set
in an endless loop.
Access to most Web pages is anonymous. However, there are times when you’ll
wish to restrict access to selected pages on your site. Even some of the oldest
servers support a form of authentication called AuthType Basic. This involves
some editing of the httpd.conf files and the creation of
a .htpasswd file, which contains user names and password
hashes (similar to those in the /etc/passwd file).
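A minimal sketch of that setup, with hypothetical paths, realm name, and user name. The directives shown (AuthType, AuthName, AuthUserFile, Require) are standard Apache ones, and htpasswd is the utility shipped with Apache for creating the password file. The shell wrapper only writes the directives to a scratch file for review.

```shell
# Write example access-control directives to a scratch file.
auth_conf=$(mktemp)
cat > "$auth_conf" <<'EOF'
<Directory "/home/httpd/html/private">
    AuthType Basic
    AuthName "Members Only"
    AuthUserFile /etc/httpd/conf/.htpasswd
    Require valid-user
</Directory>
EOF
cat "$auth_conf"

# The password file would then be created with Apache's htpasswd tool:
#   htpasswd -c /etc/httpd/conf/.htpasswd alice
```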
One big problem with AuthType Basic, and with simple HTML forms and CGI
scripts used for custom-written authentication, is that neither is secure. When users
provide passwords to their browser, the browser relays them in clear text over
the intervening networks. As a result, this form of authentication, like plain
old telnet and ftp, is vulnerable to sniffing.
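To see how little protection this offers: Basic authentication sends the user name and password joined by a colon and base64-encoded, which is an encoding rather than encryption. The sketch below, using made-up credentials and assuming the GNU base64 utility is available, shows that anyone who captures the header can reverse it.

```shell
# What the browser sends: "user:password", base64-encoded.
credentials='alice:secret'
encoded=$(printf '%s' "$credentials" | base64)
echo "Authorization: Basic $encoded"

# What a sniffer can do with the captured header value.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "Recovered: $decoded"
```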
It’s possible to solve this problem using SSL (Secure Sockets Layer). A few
SSL implementations exist for Apache. Raven from Covalent Technologies
(http://www.covalent.net/raven/ssl/) is one commercial module. C2Net Software’s
Stronghold (http://www.c2.net/products/sh2/) is a commercially enhanced version
of the Apache Web server that comes with integrated SSL support. Red Hat also
sells a Secure Web Server based on Apache. Alternatively, you can obtain
mod_ssl, a free module available at http://www.engelschall.com/sw/mod_ssl/. Mod_ssl
acts as an interface between Apache and Eric A. Young’s SSLeay libraries.
Using any of these, it’s possible to provide encrypted security between your
Web server and any browser that supports SSL. A less widely known feature of
SSLv3 is support for client-side certificates. So far these are a bit of a hassle
to set up, so it’s not surprising that they aren’t widely used. The worst part of
SSL certificates, for the client or server, is dealing with the bureaucratic
details of purchasing or obtaining one from a CA (certificate authority).
In short, there are many ways to provide secure, user-authenticated
(non-anonymous) access to your sensitive pages, and to protect the information
that your users and customers are sending to you via their Web forms.
Admittedly, it’s an oversimplification to say that Linux-based Web servers
are “ready to run, right out of the box.” Networking connectivity must be
considered. Although a Web server running on localhost will work just fine even
if no network connections exist, this doesn’t do anybody any good except for the
one person at the console.
Therefore, you first have to have a network connection established between
your system and the people to whom you wish to make Web pages available. This
task is accomplished automatically by the installation programs for every major
Linux distribution. Most commonly, hosts are assigned “static” (manually chosen)
IP addresses. Although you could certainly run a Web server on a “dynamically”
issued IP address (i.e., using DHCP, the Dynamic Host Configuration Protocol) it
would be pretty confusing to your users unless it was done in conjunction with
some form of dynamic DNS.
In any event, when you can access any systems on the LAN, then any of those
LAN systems can access any services on your system. So, you can use the systems
at your desktop to publish pages for others on the LAN to see. In small
workgroups, there is often no need to provide a dedicated machine for an intranet
Providing public access to your server is more involved. Many organizations
limit Internet access to their networks using packet filters, firewalls, and
proxies. This is generally done in the interests of security. Additionally, a growing number of sites use
“reserved private net” addresses (RFC 1918) and IP masquerading (a form of network
address translation). These techniques reduce the need for multiple public IP
addresses, which are often subject to monthly charges when provided by ISPs.
These technologies also allow your client workstations to “see” the Internet
without allowing systems from “out there” to access your services.
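For reference, RFC 1918 reserves three blocks for private use: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. A small shell test such as the following (the function name is invented here) can tell whether a dotted-quad address falls in one of them and therefore cannot be reached directly from the public Internet.

```shell
# Succeed if the address lies in one of the RFC 1918 private ranges.
# The input is assumed to be a well-formed dotted quad; it is not validated.
is_private_ip() {
    case "$1" in
        10.*)                                   return 0 ;;  # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  return 0 ;;  # 172.16.0.0/12
        192.168.*)                              return 0 ;;  # 192.168.0.0/16
        *)                                      return 1 ;;
    esac
}

for ip in 10.1.2.3 172.18.23.45 192.168.0.10 192.0.2.1; do
    if is_private_ip "$ip"; then
        echo "$ip is in a private range"
    else
        echo "$ip is not in a private range"
    fi
done
```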
For a publicly accessible Web server, you need to have a public IP address
(or you’d have to use some exotic form of “reverse proxy,” something way beyond
the scope of this column), and have to place your server on a network segment
which allows browser requests to be routed to you. In other words, you have to put
it “outside the firewall” or “on the exposed segment.” The intricacies of
configuring firewalls and routing tables are an article all by themselves.
Another issue beyond the scope of this piece relates to your server’s name.
Obviously you’ll want to have a valid DNS name in order to serve Web pages to the
public. It’s possible for your users to enter URLs with “hard coded” IP
addresses. However, you’ll hardly be the next Yahoo! or eBay if you force all of
your visitors to visit http://172.18.23.45/…
DNS services will usually not be provided by your own Web server. More often,
small sites will have their ISP provide and manage the DNS “namespace.” To be in
control of your own domain names you must obtain a “delegation” for your domain
from the organization that controls its parent domain. For those using the common
.com, .org, and .net domains, that means a visit to the InterNIC. Webmasters of
departmental servers might gain a subdomain delegation from the IT departments of
their organizations. International regions have their own domain authorities.
Fees and procedures for registering your nameservers in these domains differ.
To have a public Web server, you must have a dedicated, full-time connection
to the Internet. It just doesn’t make sense to have a site with “hours of
operation” since this is, after all, the World Wide Web — it’s always the middle
of the day somewhere. For homes and small offices, DSL and “Centrex ISDN” are the
most common sorts of low-cost dedicated connection. Some areas also have various
sorts of wireless/radio IP services available for reasonable rates. Although it’s
possible to run a Web server on any TCP/IP connection, even a little 28.8 or 14.4
modem/PPP line, it’s just not reasonable.
In many cases, it makes great economic sense to “outsource” your Web services
– to have your site virtual hosted or co-located.
In these cases you don’t bother with the hassles of “bringing the packets to
your site.” You just send your Web site to existing routes. Most ISPs provide
basic virtual hosting for very low prices. You can have any provider on the
Internet provide your virtual hosting, so you can use another ISP (or a work,
school, or “FreeNet” account) to get online and use that connection to upload
your content, upgrade your system, and otherwise maintain your Web site.
Of course if you have special needs — for example, you want to provide
access to more than a hundred megabytes of documents or images, or you need to do
secure transaction processing, or allow access to custom CGI programs — then
virtual hosting is not for you. Your virtual hosted Web site shares a system with
other users, possibly hundreds of them. In addition you have no control over what
operating system the ISP uses, how or when backups are done, etc.
Another option that still “sends the system to the packets” is to co-locate.
This means sending your computer system to your ISP with your OS and other software
pre-installed and configured. They, in turn, place it on their racks and plug it
in. Essentially you’re renting shelf space on their network. Typically they
provide power and an Ethernet connection. You are responsible for everything
With co-location you have considerably more control. You can usually install
any operating system and software you want and you aren’t sharing your processing
power and memory with any of your ISP’s other customers. However, co-location is
more expensive than virtual hosting, typically costing over $100 per month.
System stability is a major concern when considering co-location. When
you virtual host, your ISP maintains the system, which they own. Most run Linux,
FreeBSD, Solaris or some other form of UNIX, which makes them relatively stable,
and most ISPs have competent system administrators on staff to monitor the
servers. However, when you co-locate, you become your own sysadmin. You will want
to set up your system to be as stable as possible because any system failure,
crash, or lockup may involve inconvenience and delay before you can gain physical
access to the system.
Another problem with co-location is the matter of providing backups. The
recommended approach is to have the co-located host “mirror” a local server,
getting all of its updates and content from there. If done properly, the “master”
can be shipped to the ISP when the “mirror” fails.
Luckily, Linux, like other forms of UNIX, is ideal for remotely located
systems. Extremely robust, Linux allows a sysadmin to perform almost any operation
remotely about as easily and reliably as from the console. Linux systems with
uptimes of over a year are fairly common and Linux systems can reliably and
automatically recover from most reboots and power interruptions.
This kind of stability, and the flexibility I discussed earlier, goes a long
way towards explaining why Linux is the premier choice for Web server platforms.
A recent survey (http://www.leb.net/hzo/ioscount) of Web servers in Europe found
that about 28% of all Web servers there are running Linux. That’s more than any
other OS, and the numbers are probably similar world-wide. Netcraft surveys
(http://www.netcraft.co.uk) show that Apache serves more Web pages than all
other Web server daemons combined.
Linux is obviously popular as a Web server platform, and recent endorsements
by companies like Dell, IBM, and Oracle, along with the new features and
performance in the recently released 2.2 kernel, will undoubtedly accelerate
Linux’s already phenomenal growth.
Of course, your choice of Web server should not be dictated by popularity or
market share. You’ll want to make a decision based on your specific needs. Linux
and Apache were built to serve the needs of their developers. As they matured,
they gained market share by meeting the real needs of users — not through
advertising, marketing, bundling, or the propagation of FUD (Fear, Uncertainty,
and Doubt). If you require stability, power, flexibility, and economy in a
Web-server platform, you won’t go wrong by choosing the Linux/Apache
Jim Dennis has written articles for Sysadmin Magazine and Linux Journal,
and maintains the monthly “Answer Guy” Q & A feature at Linux Gazette.
He can be reached at firstname.lastname@example.org.