Linux has always made a great file server, but now a whole new breed of open source filesystems is taking the file server idea to the next level. We show you what they are and tell you how they work.
Way back in the day, if you wanted to share files between two computers, you copied them onto a floppy disk and walked them over to the other computer. This was known as “sneaker net.” What a difference a couple of years makes. Today, personal computers running Microsoft Windows and Apple’s MacOS inherently support sharing disks and directories with other systems of the same type. And of course, the ability to share disks, directories, and files over a network has always been a hallmark of Unix and Unix-like systems (such as Linux).
Unfortunately, all of that convenience does not come without a price. Sharing filesystems over networks, and managing updates to files and directories in those filesystems, becomes more complex when many computers and users can access those shared resources. Filesystems that can be shared over networks are more properly known as “distributed filesystems” because shared files and directories are available on (or distributed across) many different computers on a network. Most distributed filesystems are examples of client/server computing, where servers export files and directories and accept updates to them, while clients import files and directories and send updates.
The history of distributed filesystems on commercial Unix and Unix-like systems began with the proprietary Domain network filesystem introduced on Apollo workstations in the early 1980s. It continues through Sun Microsystems’ NFS (Network File System), introduced in the mid-1980s, and culminates in more sophisticated, higher-performance network-based filesystems such as IBM’s AFS, Carnegie Mellon University’s CODA, and Stelias Computing’s InterMezzo. NFS was the first open implementation of a networked filesystem; its specifications and protocols were publicly available, and NFS was therefore quickly supported on many different types of computer systems.

Distributed filesystems provide significant advantages to almost everyone who uses or manages computers. They enable users to access their data files in exactly the same way from different computers. If the machine on your desk fails, just use another; your files are still intact and safe on the centralized file server. Maximizing information sharing and availability on college campuses was the genesis of many of the projects that led to the filesystems described in this article.
For system administrators, storing important data on centralized file servers, rather than on individual desktop systems, simplifies administrative tasks such as backing up and restoring files. It also reduces hardware costs for desktop systems and centralizes standard system administration tasks such as creating and deactivating accounts, monitoring filesystem use, and so on. IS and IT managers who are responsible for enterprise computing services may already be using a distributed filesystem (specifically NFS or Samba), but there are many alternatives with better performance and more features.
Principles of Distributed Filesystems
Distributed filesystems must provide two basic features:
Transparent Access to User Files from Different Computers: This means that users are not aware of whether their files are local or located on a remote server. Storing files on centralized file servers frees users from having to access their files from specific machines. Users of well-designed distributed filesystems should be able to log in on any machine in their computing environment and access and update their files in exactly the same way as they would on any other machine.
Network-Oriented Authentication: If users can log in on any machine and access their files transparently from those machines, replicating authentication (password, group, etc.) information on every machine quickly becomes an administrative nightmare. For this reason, networked filesystems typically go hand-in-hand with a networked authentication mechanism; this supersedes traditional sources of login information such as /etc/passwd and /etc/group on Linux/Unix systems. NIS, NIS+, and Kerberos are examples of such network-oriented authentication mechanisms.
Another important feature of modern distributed filesystems is replication. This is the ability of a distributed filesystem to provide online copies (replicas) of existing portions of the filesystem. Should a file server that exports a primary portion of the distributed filesystem become unavailable, replicas are instantly and transparently available to users without interrupting their work.
Support for disconnected operation is related to replication. “Disconnected operation” is the term used to describe the ability to access and work on one’s files when not connected to a server. Local copies of files and directories are automatically resynchronized with the copies on the server when connectivity is restored. This is especially important to laptop computer users.
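To make the idea of resynchronization concrete, here is a minimal sketch in Python. It is a toy model and not how Coda or InterMezzo actually work: the class name, the plain dictionary standing in for the server, and the file path are all assumptions made for illustration. It deliberately ignores the hard part, which is detecting conflicts with changes made on the server while the client was away.

```python
class DisconnectedClient:
    """Toy client that logs updates while offline and replays them later."""

    def __init__(self, server_files):
        self.server_files = server_files   # dict standing in for the server
        self.local = dict(server_files)    # local copies of the server's files
        self.connected = True
        self.pending = []                  # (path, data) updates made offline

    def write(self, path, data):
        self.local[path] = data
        if self.connected:
            self.server_files[path] = data
        else:
            self.pending.append((path, data))   # remember for later

    def reconnect(self):
        # Replay logged updates in order; a real system must also detect
        # conflicts with changes made on the server in the meantime.
        self.connected = True
        for path, data in self.pending:
            self.server_files[path] = data
        self.pending.clear()


server_files = {"/home/pat/todo.txt": "buy disks"}
laptop = DisconnectedClient(server_files)
laptop.connected = False                      # unplug the laptop
laptop.write("/home/pat/todo.txt", "buy disks, label tapes")
before = server_files["/home/pat/todo.txt"]   # server still has the old text
laptop.reconnect()
after = server_files["/home/pat/todo.txt"]    # server now has the update
print(before, "->", after)
```

The update made while the laptop was unplugged reaches the server only when reconnect() replays the log.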
Now that we’ve talked about what distributed filesystems are, let’s take a look at some of the products that are available for Linux today (and are under active development). The Finding the Right Filesystem table on page 66 provides a high-level comparison of them based on common issues in distributed computing.
Implementing Distributed Filesystems — Locks And Callbacks
The machines participating in a distributed filesystem fall into two classes: servers, which deliver files and directories and accept updates to them, and clients, which request files and directories and deliver updates to them. Distributed filesystems use a variety of mechanisms to guarantee that any local data on clients is the same as the data on the remote file server. In the distributed filesystem biz, this is known as “consistency.”
The crudest way to do this is to lock a file on a server when a client requests it. This means that only one client can check out a file at any given time. Only slightly less crude is separating these into read and write locks. Any number of clients can request a copy of a file to read it, but only one client can request a copy of the file to be able to write to it. This almost guarantees consistency problems (assuming that the client with the write lock updates the file while other clients are viewing read-only versions of the file). How are the clients with read-only versions of the file notified when the file changes?
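The read/write lock scheme can be sketched in a few lines of Python. This is an illustrative toy, not the locking code of any real filesystem; the class, method, client, and path names are all invented for the example.

```python
class FileLockTable:
    """Toy server-side lock table: many readers, one writer per file."""

    def __init__(self):
        self.readers = {}   # path -> set of client names holding read locks
        self.writer = {}    # path -> client name holding the write lock

    def acquire_read(self, path, client):
        # Read locks are always granted here, even while a writer exists,
        # which is the source of the stale-copy problem described above.
        self.readers.setdefault(path, set()).add(client)
        return True

    def acquire_write(self, path, client):
        # Only one client may hold the write lock at a time.
        holder = self.writer.get(path)
        if holder is not None and holder != client:
            return False
        self.writer[path] = client
        return True

    def release_write(self, path, client):
        if self.writer.get(path) == client:
            del self.writer[path]


locks = FileLockTable()
granted_a = locks.acquire_write("/shared/report.txt", "clientA")
granted_b = locks.acquire_write("/shared/report.txt", "clientB")
reader_ok = locks.acquire_read("/shared/report.txt", "clientC")
print(granted_a, granted_b, reader_ok)
```

The second writer is refused, but the reader is still handed a copy, and nothing in this table tells that reader when clientA later changes the file.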
Distributed filesystems that support disconnected operation (where clients can edit files while disconnected from a file server, and must therefore use copies of the files that they already have) have their own sets of problems. However, for distributed filesystems with a continuous network connection to a file server, a mechanism called a “callback” provides a great solution to this problem. A callback is essentially just a software promise made by a distributed file server to notify a client whenever changes are made to a file that the client has received from that file server. If the version of the file on the file server changes, the file server breaks the callback, telling the client that it must retrieve the file again.
The callback mechanism ensures that clients always have the most up-to-date version of a file. As explained in the Your Cache Ain’t Nothing But Trash sidebar on page 68, making sure that file servers notice when clients have updated files depends on when clients write modified files back to a file server.
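As a rough illustration of the callback promise, consider the following Python sketch. It is a simplified model, not the AFS protocol: the class and method names are assumptions, clients write through to the server immediately, and the "network" is just a method call.

```python
class CallbackServer:
    """Toy file server that promises to notify clients when a file changes."""

    def __init__(self):
        self.files = {}       # path -> current contents
        self.callbacks = {}   # path -> set of clients holding a callback

    def fetch(self, path, client):
        # Handing out a file also registers a callback promise for the client.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data, writer=None):
        self.files[path] = data
        # Break the callback for every other client holding a copy.
        for client in self.callbacks.pop(path, set()):
            if client is not writer:
                client.callback_broken(path)
        if writer is not None:
            self.callbacks.setdefault(path, set()).add(writer)


class Client:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.copies = {}   # path -> locally held copy

    def read(self, path):
        if path not in self.copies:          # no valid copy: ask the server
            self.copies[path] = self.server.fetch(path, self)
        return self.copies[path]

    def write(self, path, data):
        self.copies[path] = data
        self.server.store(path, data, writer=self)

    def callback_broken(self, path):
        # The server says our copy is stale; drop it so the next read refetches.
        self.copies.pop(path, None)


server = CallbackServer()
server.store("/shared/notes.txt", "v1")
alice, bob = Client("alice", server), Client("bob", server)
first = bob.read("/shared/notes.txt")     # bob gets v1 and a callback
alice.write("/shared/notes.txt", "v2")    # server breaks bob's callback
second = bob.read("/shared/notes.txt")    # bob must fetch again and sees v2
print(first, second)
```

Without the broken callback, bob's second read would have been satisfied from his stale local copy.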
AFS
AFS (which used to stand for Andrew File System but is now an acronym without an expansion) was originally a 1983 research project sponsored by IBM at Carnegie Mellon University (CMU). AFS grew to support thousands of users at CMU before it was turned into a commercial product in 1989 by Transarc Corporation (which was later acquired by IBM).
Sets of administratively-related AFS clients and servers, known as AFS cells, are located all over the world and can easily be configured to intercommunicate, exchange Kerberos authentication information, and share data over the Internet. AFS file servers provide their own volume management services and formats using system support for logical volume management (LVM) whenever available. On systems using LVM, exported disk partitions can actually consist of several server partitions that can span multiple physical devices but are exported as a single entity. (Logical Volume Management for Linux is actively under development and is a default portion of kernel versions 2.3.49 and later. See this month’s Guru Guidance on page 78.)
AFS clients pioneered persistent client-side caching to increase performance, preserve state information across system restarts, and reduce client startup time. A persistent cache is able to survive across reboots of a client. For more information about caching in distributed filesystems, see the Your Cache Ain’t Nothing But Trash sidebar on page 68.
In early September 2000, IBM announced that it was releasing the AFS source code to the open source community under the IPL (IBM Public License, which is essentially IBM’s version of the GNU General Public License, or GPL, but protects IBM’s patents). IBM will still sell and support an official version of AFS, but the open source version will remain open source and free.
An AFS Advisory Board consisting of IBM personnel, academics, and customer representatives will consider new additions to the supported IBM version of AFS, as well as consider migrating portions of code back into the official version from the open source version. One of the biggest oversights in AFS has always been its lack of support for disconnected operations. Hopefully, this omission will soon be properly addressed by one of the new versions of AFS.
CODA
Coda is a distributed filesystem with its origin in AFS v2. It has been under development at Carnegie Mellon University since 1987. Coda shows its AFS roots in its support for server replication and for persistent, client-side caching that helps minimize client restart times, while providing for effective scalability. Unlike AFS, Coda provides support for disconnected operations for mobile computing. Because Coda is an academic project, its source code and binaries are freely available under a liberal license: most of the code is under the GPL, while portions (the libraries) are under the Lesser GPL (LGPL).
GFS
The Global Filesystem (GFS) is a shared-disk filesystem for clusters of Linux systems. The nodes in a GFS cluster share the same storage devices, accessing those devices at the block-level rather than at the file and directory level (as with most of the other filesystems discussed in this article). For this reason, GFS requires using low-level disk drivers rather than file-level client and server daemons. The binaries for several disk drivers are available at the GFS download URL (See Web Links, pg. 70). A GFS filesystem appears to be local to each node, with file access across the cluster synchronized by GFS.
GFS is an open source, 64-bit SAN (Storage Area Network) filesystem with an impressive list of industry sponsors (including Veritas Software, StorageTek, NASA Ames Research Center Mass Storage Systems Group, Seagate Technologies, and EMC). SAN solutions are a hot topic in the centralized storage and distributed filesystems world because their block-level disk access makes them inherently independent of processor type and system. All actual disk access is done by the low-level drivers, which simply fetch and store data and therefore make no assumptions about data types and organization.
InterMezzo
InterMezzo is a new distributed filesystem with a focus on high availability, flexible replication of directories, disconnected operation, and a persistent cache. InterMezzo is an open source project, currently available for Linux kernel versions 2.2 and 2.3.
InterMezzo was inspired by CMU’s Coda, but is not based on the Coda source code. One of the founders of Stelias Computing (the company that oversees the active development of InterMezzo) was the head of the Coda project at CMU for several years.
On both clients and servers, InterMezzo uses an underlying journaled filesystem that stores information about local filesystem changes in a log stored within an ext2fs filesystem. InterMezzo provides a set of wrapper functions that enable it to interoperate with other journaled filesystems (such as SGI’s XFS or IBM’s JFS).
The TurboLinux, Red Hat, Mandrake, and SuSE Linux distributions have all expressed interest in InterMezzo, but are not, at this point, active users of this new distributed filesystem.
NFS
NFS is a distributed filesystem that provides transparent access to files residing on remote disks. Developed at Sun Microsystems in the early 1980s, the NFS protocol has since been revised and enhanced a number of times. It is available on all Unix systems and even for Microsoft Windows. The specifications for NFS have been publicly available since its inception, which is what allowed NFS to become the de facto standard distributed filesystem.
Despite its use everywhere, NFS has some problems. First, it is stateless: the server keeps no record of which clients hold copies of which files, so it cannot notify them when a file changes, and NFS clients do not maintain a cache that provides useful information across reboots. Next, NFS performance is average at best. Recent versions of Linux (using kernel versions 2.2 or later) provide knfsd, an NFS daemon that largely runs within the Linux kernel and offers better performance than standard NFS daemons. Finally, NFS has primitive locking mechanisms that are not very useful if many client applications are simultaneously accessing data files.
NFS is unquestionably the most popular distributed filesystem around, because it works with almost every operating system, is easy to set up and administer, and provides a better solution than the alternative of not sharing files.
Your Cache Ain’t Nothing But Trash
Actually, we hope not. A cache is the portion of the client’s local disk that contains copies of the files and directories that the client has requested from file servers. Whenever an application running on the client requests a file that is actually located on a remote file server, the request first passes through the cache. If the file is not already located in the cache, or if it is there but the callback to the file server has been broken (see the previous sidebar), the client requests the file from the server again. A new version of the file is stored in the cache, and a pointer to it is returned to the application that requested it.
Caching improves the speed and efficiency of accessing files that are actually located on a distributed file server. If multiple applications or users repeatedly ask for data from the same file, that file is already in the cache on the local disk. This reduces both the load on the network and the time it takes to satisfy users’ requests for that file. Whenever a user modifies the cached file or directory, the callback between the client and the file server is broken, and the file server retrieves the updated version from the client that made the changes. The server then breaks the callback between itself and any other clients that have older cached copies of the file. (Warning: this is an oversimplification meant to introduce you to the notion of how caching can improve performance and is not designed to replace an upper-level course in computer science!) Figure Three shows the same thing as Figure Two (pg. 64), but with a cache added appropriately.
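The lookup path described above can be sketched as follows in Python. This is only a toy under stated assumptions: the class name, the in-memory dictionary standing in for both the local disk cache and the remote server, and the file path are all invented for illustration.

```python
class CachingClient:
    """Toy client cache: serve from the cache unless the callback is broken."""

    def __init__(self, fetch_from_server):
        self.fetch_from_server = fetch_from_server  # callable: path -> data
        self.cache = {}          # path -> cached data
        self.valid = set()       # paths whose callback is still intact
        self.server_fetches = 0

    def request(self, path):
        if path in self.cache and path in self.valid:
            return self.cache[path]              # cache hit, no network traffic
        self.server_fetches += 1                 # miss or broken callback
        self.cache[path] = self.fetch_from_server(path)
        self.valid.add(path)
        return self.cache[path]

    def break_callback(self, path):
        self.valid.discard(path)                 # server says the copy is stale


remote = {"/proj/spec.txt": "draft 1"}
client = CachingClient(lambda path: remote[path])
client.request("/proj/spec.txt")
client.request("/proj/spec.txt")     # second request is served from the cache
fetches_before_change = client.server_fetches
remote["/proj/spec.txt"] = "draft 2"
client.break_callback("/proj/spec.txt")
latest = client.request("/proj/spec.txt")
print(fetches_before_change, client.server_fetches, latest)
```

Two requests cost only one trip to the server; the third costs a second trip only because the callback was broken in between.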
Another way in which caching data from file servers can improve performance is by preserving and re-using the cache when you restart your system. The theory here is that users of a client will probably resume working on the same files that they were previously working on. If those files are still in the cache and haven’t changed on the file server (i.e., the callbacks can be re-established), then the client can use the contents of the cache without retrieving the files from the file server. This is known as a “persistent” cache, not because it’s irritating or obnoxious, but because its contents persist and are often reusable across system reboots. In the absolute worst case, your cache ain’t nothing but trash, and every file in the cache must be discarded. In this case, the client doesn’t retrieve anything, but simply discards the current contents of the cache and proceeds normally, retrieving and caching any files that users of the client request.
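A persistent cache can be modeled in Python as a file that survives a restart and is revalidated before use. The sketch below makes several assumptions that are not taken from any real filesystem: the cache is a single JSON file, each entry carries a version number, and "re-establishing a callback" is reduced to comparing that number with the server's current one.

```python
import json
import os
import tempfile


def save_cache(cache_file, cache):
    # cache maps path -> {"version": int, "data": str}
    with open(cache_file, "w") as handle:
        json.dump(cache, handle)


def load_and_revalidate(cache_file, server_versions):
    """Reload the cache after a restart, keeping only entries still current."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file) as handle:
        cache = json.load(handle)
    return {
        path: entry
        for path, entry in cache.items()
        if server_versions.get(path) == entry["version"]
    }


with tempfile.TemporaryDirectory() as workdir:
    cache_file = os.path.join(workdir, "cache.json")
    save_cache(cache_file, {
        "/home/pat/a.txt": {"version": 3, "data": "unchanged"},
        "/home/pat/b.txt": {"version": 1, "data": "old"},
    })
    # After the "reboot": b.txt has moved on to version 2 on the server.
    server_versions = {"/home/pat/a.txt": 3, "/home/pat/b.txt": 2}
    survivors = load_and_revalidate(cache_file, server_versions)

print(sorted(survivors))
```

Only the unchanged file survives revalidation; the stale entry is discarded and would be fetched again on its next use.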
NFS uses a cache, but its cache is designed to allow client systems to continue working while files are being saved back to the file server. An NFS cache does not persist across reboots. Distributed filesystems such as AFS, CODA, and InterMezzo make heavy use of caching to improve performance when they can communicate directly with a file server. When CODA or InterMezzo is running disconnected from a file server, it depends on the data in the cache. Once it can reconnect to the file server, it resynchronizes the cache with the data on the server. This topic is extremely complex. See Web Links (pg. 70) for more information.
Summary And Conclusions
Distributed filesystems are worth their weight in gold to both users and system administrators. NFS is still the most popular distributed filesystem available today, because it is free and is delivered with most types of systems. However, the variety of distributed filesystems available for Linux under the GPL or similar open-source licenses is about to change all of that. Being able to install and run AFS, CODA, or InterMezzo on your Linux system can provide substantial performance and usability improvements over NFS. Thanks to Linux and the open source movement, you no longer have to accept the limitations of NFS, using it simply because it is there.
Managing distributed filesystems more advanced than NFS requires some administrative overhead. For example, installing and using AFS requires learning AFS-specific commands and backup procedures. All distributed filesystems, even NFS, require some administrative configuration on both clients and servers. All of the advanced distributed filesystems discussed in this article require installing kernel patches or compiling and installing loadable kernel modules, which is not necessarily for the faint of heart or the novice administrator. As binary RPMs for these filesystems become available, installing and using modern distributed filesystems will become much easier.
Finding the Right Filesystem
| Filesystem | Supported Platforms | Security System | Scalability | Client Cache Location | Persistent Cache | Availability | Licensing |
| AFS | All Unix Platforms, Red Hat Linux, Windows 95, 98, NT, and 2000 | Kerberos | Excellent | local disk directory | Yes | Current | IBM Public License |
| CODA | Linux, NetBSD, FreeBSD, Windows 95, Windows NT | Kerberos | Yes | local disk directories or partitions | Yes | Current | GPL, Lesser GPL |
| GFS | Linux | Any | Yes | N/A | N/A | Current | GPL |
| InterMezzo | Linux Kernels v2.2 or better | NONE | Yes | local disk directories or partitions | Yes | Current | GPL |
| NFS | All Unix Platforms and Linux distributions, Windows 95, 98, NT, and 2000 | yp/NIS, NIS+ | Acceptable | local disk directory | No | Current | Linux versions GPL, others proprietary but w/ open specs |
Web Links
AFS
Coda
GFS
InterMezzo
NFS for Linux (NFS and KNFS)
Installing And Configuring an InterMezzo Server And Client
Using Stelias Computing’s InterMezzo distributed file system as an example, this sidebar explains how to actually get a modern distributed filesystem up and running. We selected InterMezzo for a variety of reasons. It is available under the GPL, its lento cache manager is a Perl program, it is simpler to configure than most distributed filesystems, and it is more actively under development than some of the other distributed filesystems discussed in this article. This HOWTO sidebar uses Stelias’ 0.02 software release as an example, but the 0.9 release of InterMezzo is nearing final release candidacy.
Requirements
You’ll need at least two computers running Linux, one to act as an InterMezzo file server and at least one to act as a client of that server. When using the 0.02 release of InterMezzo, these should all be uni-processor systems; even multi-processor systems running in single-processor mode may use untested code paths through the kernel (as we learned).
You must have the Linux kernel sources installed in /usr/src/linux (which can be a symlink, as it normally is) on each machine where you will install InterMezzo. The kernel configuration file (.config) in that directory on each machine must be correctly configured to match the kernel that is running on that system. The kernel shipped with Red Hat 6.2 was release 2.2.14-5.0, which we recommend using if you want to guarantee that things work for you exactly as described in this HOWTO.
For optimal experimentation with InterMezzo, you’ll need to have a spare partition on each client and server. The examples later in this HOWTO explain how to experiment with InterMezzo even if you don’t have a spare partition available, but it would be better if you did.
If you don’t have a spare partition on your server and client(s) that you can dedicate to InterMezzo, the kernel running on each of your systems must be compiled with support for the “loopback device.” This is selectable using the Block Devices -> Loopback Device Support option from make menuconfig or make xconfig. I prefer to build this into the kernel rather than making it a loadable module. If you need to rebuild your kernel to support this device, you’ll need to run make dep and then make install to rebuild and install the new kernel.
STEP 1: Get the Source, Luke! Retrieve the Following Packages:
The first three are fixes and/or enhancements to Perl itself; the fourth is the source code for Intermezzo 0.02.
STEP 2: Build and Install InterMezzo and Perl Enhancements
Make a directory somewhere; make that directory your working directory and extract the contents of each of these using a command like:
gzip -cd filename | tar xvf -
Use the su command to become the root user on your system. Next, cd to each of the directories containing the Perl enhancements. Make and install the software using the following commands:
perl Makefile.PL
make install
Finally, cd to the intermezzo directory (created for you when you unpacked the file intermezzo-0.02-0.tgz). Make and install the software using the commands:
STEP 3: Configure Your System for InterMezzo
Modify your system’s kernel module configuration using the following commands:
echo "alias char-major-185 presto" >> /etc/conf.modules
echo "alias InterMezzo presto" >> /etc/conf.modules
Next, configure your system’s module dependencies correctly:
STEP 4: Reboot
If you have rebuilt your kernel to add loopback support, you’ll need to reboot. You’ll also want to verify that you haven’t accidentally broken anything (such as any other aspect of network support) while building the new kernel. In general, rebooting after reorganizing module dependencies and configuration is a good idea.
STEP 5: Create the InterMezzo Configuration Files on Each System
After rebooting, log in as root. Create the directory /etc/intermezzo. You will create three files in this directory on your server and on each client.
This file contains a single line that lists some information – the name of the system, the name of the device used for InterMezzo, and the system’s IP address. For example, on my server, distfs, this line looks like:
distfs /dev/intermezzo0 192.168.6.166
You’ll get the name of the InterMezzo device from the /etc/fstab entry that you’ll create later in this section.
This file is a database of key/value pairs (“hashes” in Perl speak) that identify the InterMezzo file servers in your environment. In my case, this is:
{
  distfs => {
    ipaddr => "192.168.6.166"
  }
};
/etc/intermezzo/voldb
Like the serverdb file, this file is a database of key/value pairs. The pairs identify the servers and clients that replicate files and folders from the InterMezzo device on the server (and are therefore known as “replicators”). In a setup with one client (test) and one server (distfs), where the name of the shared volume is shared, this would look like the following. If you’re adding multiple clients, the name of each client must be enclosed within quotation marks and separated from the next by a comma.
{
  shared => {
    servername => "distfs",
    replicators => [ "test" ]
  }
};
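To show what a multiple-client replicators list looks like, here is a small Python helper that prints a voldb entry in the layout shown above. The helper itself and the extra client names test2 and test3 are purely hypothetical; only the one-client entry comes from the setup described in this sidebar, and you should check the output against your own InterMezzo release before using it.

```python
def render_voldb(volume, server, clients):
    """Return voldb text in the key/value layout shown above."""
    # Each client name is quoted, and names are separated by commas.
    replicators = ", ".join('"%s"' % name for name in clients)
    return (
        "{\n"
        "  %s => {\n"
        '    servername => "%s",\n'
        "    replicators => [ %s ]\n"
        "  }\n"
        "};\n" % (volume, server, replicators)
    )


# "test2" and "test3" are made-up client names for illustration only.
voldb_text = render_voldb("shared", "distfs", ["test", "test2", "test3"])
print(voldb_text)
```

The replicators line in the output lists all three client names, each in quotation marks and separated by commas.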
Linux Magazine /
November 2000 / FEATURES
Distributed Filesystems for Linux