The Linux File Access Primitives

One of the most important abstractions of the POSIX API is the file. While nearly all operating systems provide files for permanent storage, all versions of UNIX provide access to most system resources through the file abstraction.

More concretely, this means that Linux uses the same set of system calls to
provide access to devices (such as floppy disks and tape devices), networking
resources (most commonly TCP/IP connections), system terminals, and even kernel
status information. Thanks to their ubiquity, fluency in file-related system
calls is important for every Linux programmer. Let’s examine the basic concepts
behind the file API and describe the most important file-related system calls.

Linux provides many different kinds of files. The most common type is simply
called a regular file, which stores hunks of information for later access. The
vast majority of files you work with, such as executables (e.g.,
/bin/vi), data files (e.g., /etc/passwd), and system libraries
(e.g., /lib/libc.so.6), are all regular files. Usually these reside
somewhere on disk, but that is not necessarily the case (as we’ll see).

Another type of file is the directory, which contains a list of other files
and their locations. When you use the ls command to list the files in a
directory, it opens the file for that directory and prints out information on all
of the files mentioned in it.

Other files include block devices (which represent filesystem-cached devices
such as hard drives), character devices (which represent uncached devices like
tape drives, mice, and system terminals), pipes and sockets (which allow
processes to talk to one another), and symbolic links (which allow files to be
given more than one name in the directory hierarchy).

Most files have one or more symbolic names which refer to them. These
symbolic names are a set of strings delimited by the / character, and
identify the file to the kernel. These are the pathnames with which all Linux
users are quite familiar; for example, the pathname
/home/ewt/article refers to the file that contains the text of this
article on my laptop. No two files share the same name (a single file can have
more than one name, however), so a pathname uniquely identifies a single file.

Each file a process has access to is identified by a small nonnegative
integer, called a “file descriptor”. File descriptors are created by system calls
which open files and are inherited by new processes which are forked off from the
current process. That is, when a process starts a new program, the original
process’s open files are normally inherited by the new program.

By convention, most programs reserve the first three file descriptors (0, 1,
and 2) for special purposes: access to the so-called standard input, standard
output, and standard error streams. File descriptor 0 is standard input, where
many programs expect to receive input from the outside world. File descriptor 1
is standard output. Most programs display normal output there. For output related
to error conditions, file descriptor 2 (standard error) is used.

Anyone comfortable with Linux shells has seen the use of the standard in,
out, and error file descriptors. Normally, the shell runs commands with file
descriptors 0, 1, and 2 all referring to the shell’s terminal. When the
> character is used to instruct the shell to send a program’s output
to another file, the shell opens that file as file descriptor 1 before invoking
the new program. This causes the program to send its output to the given file
rather than the user’s terminal — the beauty is that this is transparent to the
program itself!

Similarly, the < character instructs the shell to use a
particular file as file descriptor 0. This forces the program to read its input
from that file — in both cases, any errors from the program will still appear on
the terminal, as those are sent to standard error on file descriptor 2. (Under
the “bash” shell, you can redirect standard error using 2> rather
than >.) This type of file redirection is one of the most powerful
features of the Linux command line.
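
To make the redirection mechanism concrete, here is a minimal sketch of what a shell might do for "command > file". It is only an illustration under stated assumptions: run_redirected() is a hypothetical helper name, and fork(), dup2(), execvp(), and waitpid() are standard calls that this article does not otherwise cover. Real shells do considerably more work than this.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with standard output redirected to the file at `path`.
 * Returns the command's exit status, or -1 on failure.
 * run_redirected() is a hypothetical helper, not a library function. */
int run_redirected(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: open the target file and make it file descriptor 1. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            _exit(127);
        if (dup2(fd, 1) < 0)
            _exit(127);
        if (fd != 1)
            close(fd); /* descriptor 1 now refers to the file */
        execvp(argv[0], argv);
        _exit(127); /* reached only if execvp() failed */
    }

    /* Parent: wait for the command to finish. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The program started by execvp() simply writes to file descriptor 1 as usual; it never learns that the descriptor refers to a file rather than the terminal.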

Before using any file-related system calls, programs should include
<fcntl.h> and <unistd.h>; these provide the
function prototypes and constants for the most common file routines. In the
example code below, we’ll assume that each program begins with the following
lines (Figure 2 also needs <stdio.h> for printf()):

#include <fcntl.h>
#include <unistd.h>

First, let’s look at how to read and write from a file. Intuitively enough,
the read() and write() system calls are the most common ways of
doing this. Both system calls expect three arguments: the file descriptor to
access, a pointer to the information to read or write, and the number of
bytes which should be read or written. The number of bytes which were
successfully read or written is returned, or -1 if an error occurred. Figure 1
illustrates a simple program which reads a line from standard input (file
descriptor 0) and writes it to standard output (file descriptor 1).

Figure 1

int main(void) {
    char buf[100];
    ssize_t num;

    num = read(0, buf, sizeof(buf));
    if (num < 0)
        return 1; /* read() failed */
    write(1, "I got: ", 7); /* Length of "I got: " is 7! */
    write(1, buf, num);
    return 0;
}

There are two things worth noting about this process. First, we asked
read() to return 100 characters, but if we run this program, we only get
input until the user presses the “enter” key. Many file operations work on a
best-effort basis: they try to return all of the information the program asks
for, but these may succeed only partially. By default, the terminal is configured
to return from a read() call as soon as a \n is available
(which is generated by pressing the “enter” key). This is actually quite
convenient, since most users expect programs to be line-oriented anyway. Regular
data files don’t show this kind of behavior, however, and relying on it may cause
unexpected results.
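
Because of this best-effort behavior, careful programs check how much was actually transferred and try again for the rest. The following is a minimal sketch of that idea for write(); write_all() is a hypothetical helper name, and retrying on EINTR (a call interrupted by a signal) is an assumption about what the caller wants.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, retrying after partial writes.
 * Returns 0 on success, or -1 on error (errno is set by write()).
 * write_all() is a hypothetical helper, not a library function. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: try again */
            return -1;
        }
        p += n;           /* skip past the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

A call such as write_all(1, "I got: ", 7) behaves like the plain write() in Figure 1, except that a short write is completed rather than silently ignored.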

The other thing to notice is that we didn’t have to write a \n after
displaying our output. The read() call gave us the \n from the
user, and we just write() that \n back to standard out. If
you’d like to see what happens without that newline, try changing the final
write() call to:

write(1, buf, num - 1);

One last point about this simple example: at no point does buf
contain an actual C string! C strings are terminated by a single \0
character which marks the end of the string. As read() doesn’t add a
\0 to the end of the buffer, using strlen() (or any other C
string function) on a read() buffer would be a big mistake! This
behavior allows read() and write() to manipulate data which
includes \0 characters, which is an impossibility for normal string functions.
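
If you do need a C string, you have to add the terminator yourself. Here is a minimal sketch of one way to do that; read_string() is a hypothetical helper name, and it assumes the caller is content to lose one byte of buffer space to the terminating \0.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read at most size - 1 bytes from fd into buf and add a terminating \0.
 * Returns the number of bytes read (0 at end of file), or -1 on error.
 * read_string() is a hypothetical helper, not a library function. */
ssize_t read_string(int fd, char *buf, size_t size)
{
    ssize_t num;

    if (size == 0)
        return -1; /* no room even for the terminator */

    num = read(fd, buf, size - 1);
    if (num < 0)
        return -1;

    buf[num] = '\0'; /* now buf is a real C string */
    return num;
}
```

Keep in mind that if the data itself contains \0 bytes, string functions such as strlen() will stop at the first one, so the return value remains the only reliable length.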

The read() and write() system calls work on the vast
majority of files. They don’t work on directories, which should be accessed
through special functions such as readdir(). Also, read() and
write() don’t work for certain types of sockets.

Some files, such as regular files and block device files, use the concept of
a file pointer. It specifies where in the file the next read() call will
read from, and where the next write() call will write to. After a
read() or a write(), the file pointer is advanced (internally,
by the kernel) by the number of characters which were processed. This makes it
easy to read all of the data in a file with a simple loop [See Figure 2].

Figure 2

char buffer[1024];
ssize_t num;

/* read() returns 0 at end of file and -1 on error; stop on either */
while ((num = read(0, buffer, sizeof(buffer))) > 0) {
    printf("got some data\n");
}

This loop will read all of the data on standard in, automatically advancing
the kernel’s internal file pointer after every read. When the file pointer is at
the end of the file, the read() will return 0 and the loop will exit.
Some files (such as character devices — the terminal is a good example) don’t
have a file pointer per se, so on them this program will continue running until
the user provides an end of file marker (by pressing “Ctrl-D”).
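
The same loop can do real work with very little extra code. As a minimal sketch, the hypothetical helper below (count_bytes() is not a library function) reads a descriptor until end of file and reports how many bytes went by, distinguishing end of file from an error.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read fd until end of file and return the total number of bytes read,
 * or -1 if read() reported an error.
 * count_bytes() is a hypothetical helper, not a library function. */
long count_bytes(int fd)
{
    char buffer[1024];
    long total = 0;
    ssize_t num;

    /* Each read() picks up where the previous one stopped. */
    while ((num = read(fd, buffer, sizeof(buffer))) > 0)
        total += num;

    return num < 0 ? -1 : total;
}
```

Calling count_bytes(0) on a terminal keeps reading until the user presses Ctrl-D, just like the loop in Figure 2.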

Now that we’ve seen how to read and write from a file, the next thing to
learn is how to open a new file. There are different ways of opening different
types of files; the only one we’ll discuss here is opening files that are
represented in the filesystem through a pathname, including regular files,
directories, device files, and named pipes. While some socket files have path
names, those must be opened through an alternate method.

Disclaimers aside, the open() system call allows programs to access
most system files. open() is an unusual system call as it takes either
two or three arguments:

int open(const char *pathname,
         int flags);

int open(const char *pathname,
         int flags,
         int perm);

The first form is more common; it opens a file which already exists. The
second form should be used when the file may need to be created. The third
argument specifies the access permissions that the new file should be given.

The first parameter to open() is the full path name as a normal C
string (that is, terminated with a \0). The second parameter specifies
how the file should be opened. It must include exactly one of the access modes
O_RDONLY, O_WRONLY, or O_RDWR, optionally bitwise ORed with any of the other
flags:

O_RDONLY: The file may only be read

O_WRONLY: The file may only be written to

O_RDWR: The file may be read from or written to

O_APPEND: Every write is appended to the end of the file (used together
with O_WRONLY or O_RDWR)

O_CREAT: If the file does not already exist, it should be created

O_EXCL: If the file already exists, fail rather than create it (should
only be used with O_CREAT)

O_TRUNC: If the file already exists, remove all data from it (this is
similar to creating a new file)

The third parameter to open() is needed only when O_CREAT
is used; it specifies the file permissions as a number, which is the same format
as the numeric permissions argument to the chmod command. The
permissions specified to open() are affected by the user’s
umask, which lists the permission bits that should be removed from
every new file. Most programs creating files call
open() with a third argument of 0666, enabling the user to
control their default permissions through the umask (the umask
command of most shells can change this).
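
To see the umask at work, consider this minimal sketch. It creates a file with a requested mode of 0666 under a chosen umask and reports the permissions the file actually received. mode_after_umask() is a hypothetical helper name, and the umask() and fstat() calls it relies on are standard but not otherwise covered in this article.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with a requested mode of 0666 while `mask` is the umask.
 * Returns the permission bits the new file received, or -1 on error.
 * mode_after_umask() is a hypothetical helper, not a library function. */
int mode_after_umask(const char *path, mode_t mask)
{
    struct stat st;
    mode_t old = umask(mask); /* install the mask, remember the old one */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);

    umask(old);               /* restore the caller's umask */
    if (fd < 0)
        return -1;

    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return (int)(st.st_mode & 0777);
}
```

With a umask of 022 the write bits for group and others are removed, so a request for 0666 normally yields a file with mode 0644.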

Figure 3

int fd;

fd = open("myfile", O_RDWR | O_CREAT | O_TRUNC, 0666);
if (fd < 0) {
    /* Some error occurred */
    /* … */
}


For example, Figure 3 shows how to open a file for reading and writing,
creating it if it doesn’t exist, and discarding any data which is in it if it
already exists.

open() returns a file descriptor which references the file. Recall
that file descriptors are always >= 0; if open() returns a
negative value an error occurred and the global variable errno contains
the UNIX error code describing the problem. open() will always return
the smallest number that it can; for example, if file descriptor 0 is not being
used, open() will always return 0.
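
As a minimal sketch of error handling around open(), the hypothetical helper below (open_or_report() is not a library function) turns errno into a readable message. It assumes strerror() from <string.h>, which this article does not otherwise discuss, and that printing to standard error is acceptable for the program at hand.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open `path` read-only. On failure, print the reason to standard error
 * and return -1 with errno preserved.
 * open_or_report() is a hypothetical helper, not a library function. */
int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        int saved = errno; /* fprintf() may change errno */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(saved));
        errno = saved;
        return -1;
    }
    return fd;
}
```

Opening a file, closing it, and opening it again in a single-threaded program returns the same descriptor number both times, since the lowest free descriptor is always handed out.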

When a process is finished with a file, it should close it through the
close() system call, which takes the form:

int close(int fd);

The file descriptor to close is the only argument to close(), and it
returns 0 on success. While it may seem odd for close() to fail, if the
file descriptor refers to a file on a remote server, say, and the system cannot
properly flush its caches, close() can actually fail. When a process
terminates, the kernel automatically closes any files left open.

The final common file operation is moving the file pointer. This only makes
sense for files with file pointers (naturally), and attempting this on
inappropriate files will return an error. The lseek() system call is
used for this purpose:

off_t lseek(int fd,
off_t pos,
int whence);

The off_t type is a signed integer type used for file offsets; historically it
was a long int, which is where the “l” in lseek comes from. lseek() returns the
final position of the file’s file pointer relative to the start of the file, or
-1 if there was an error. This system call expects the file descriptor whose
file pointer is being moved as the first argument, and the position in the file
to move it to as the second. The last argument describes how the file pointer is
moved:

SEEK_SET moves it to pos bytes from the beginning of the file

SEEK_END moves it to pos bytes from the end of the file (a negative pos
lands before the end)

SEEK_CUR moves it pos bytes toward the end of the file from its
current position (a negative pos moves it backward)
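
One common use of lseek() is finding out how large a file is. The sketch below shows one way this might be done with the three whence values; file_size() is a hypothetical helper name, and it assumes the descriptor refers to a file that has a file pointer.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the size in bytes of the file behind fd, leaving the file pointer
 * where it was. Returns -1 on error (for example, if fd is a pipe).
 * file_size() is a hypothetical helper, not a library function. */
off_t file_size(int fd)
{
    off_t here, end;

    here = lseek(fd, 0, SEEK_CUR); /* moving 0 bytes reports the position */
    if (here == (off_t)-1)
        return -1;

    end = lseek(fd, 0, SEEK_END);  /* the end offset equals the file size */
    if (end == (off_t)-1)
        return -1;

    if (lseek(fd, here, SEEK_SET) == (off_t)-1) /* restore the position */
        return -1;

    return end;
}
```

Because the helper restores the original position, any reads and writes that follow continue exactly where they would have otherwise.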

The combination of open(), close(), write(), read(),
and lseek() provides the basic file access API for Linux. While there are
numerous other functions which manipulate files, those described here are used
most of the time.

Most programmers use the familiar ANSI C library file functions (such as
fopen() and fread()), rather than the lower-level system calls
described here. fopen() and fread() are, as you would expect,
implemented on top of these system calls in a user-level library. Still, it’s not
uncommon to see usage of the low-level system calls, especially in more complex
programs. By familiarizing yourself with these routines and interfaces you’ll be
on your way to becoming a true UNIX hacker.

Erik Troan is a developer for Red Hat Software and co-author of the book Linux
Application Development. He can be reached at ewt@redhat.com.
