Learn how Linux manages open files and explore a number of utilities to watch files as they grow.
If you’re more of a Linux user than a programmer, you may not have given much thought to how the operating system handles files. As a user, you simply give a filename to a program (through the “Open” command on a menu, on the command line, or some other way) and the file is (hopefully) accessed the way it’s supposed to be.
However, programmers who perform low-level file access (for instance, seeking to a particular point in a file or rewinding the file to its start) have to understand more about Linux file handling. That’s where we’re headed this month: seeing how Linux handles files that have been opened by a program, and learning how you can take advantage of this even if you don’t usually write programs that handle files.
Along the way, we’ll look at tail -f, MultiTail, and less +F. These three programs can show you what’s happening to a file as it grows (as data is added to the end of the file). They’re handy for viewing log files and monitoring a long-running process.
Let’s start with a quote from the Linux man page for open(), the low-level system call that programmers can use to open a file:
The open() system call is used to convert a pathname into a file descriptor (a small, non-negative integer for use in subsequent I/O as with read, write, etc.). When the call is successful, the file descriptor returned will be the lowest file descriptor not currently open for the process. This call creates a new open file, not shared with any other process. (But shared open files may arise via the fork() system call.) The new file descriptor is set to remain open across exec functions (see fcntl()). The file offset is set to the beginning of the file.
As in most man pages, there’s a lot of information packed into that paragraph!
When you give a file’s pathname (like /a/b/afile, ../afile, or simply afile) to a program, the program opens the file to access its contents. A file can be opened for reading, for writing, or (in some cases) for both. When the Linux kernel opens a file, it returns a file descriptor (one of the numbers 3, 4, 5, and so on), which your program uses to refer to that file. The file stays open until the program closes it or until the process ends. (This rule, that a file stays open until it’s closed or the process ends, can be taken to extremes. See the sidebar “Removing Open Files?”)
Every process normally starts with three file descriptors already assigned: the standard input (stdin) is file descriptor (fd) 0, the standard output (stdout) is fd 1, and the standard error (stderr) is fd 2. So, if a program issues an open() call on file foo and no other files have been opened so far, the contents of foo are accessible through fd 3.
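As a rough illustration of the three standard descriptors, the following shell sketch sends one line to fd 1 and one line to fd 2, then captures each stream separately. The function name report and the variable names are invented for this example.

```sh
# Hypothetical helper: writes one line to stdout (fd 1) and one to stderr (fd 2).
report() {
    echo "normal output" >&1
    echo "error message" >&2
}

# Capture stdout only; stderr is discarded.
out_only=$(report 2>/dev/null)

# Capture stderr only: point fd 2 at the capture pipe first,
# then send fd 1 to /dev/null.
err_only=$(report 2>&1 >/dev/null)

echo "stdout carried: $out_only"
echo "stderr carried: $err_only"
```

The order of the two redirections in the second capture matters: the shell processes them left to right, so fd 2 is attached to the capture pipe before fd 1 is pointed elsewhere.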
After a file is opened, the kernel also tracks its file offset. This is the point in the file where the process is currently reading or writing. It’s kind of like putting a bookmark in a (paper) book to hold your place. Each time you read more and then stop reading for a while, you move the bookmark along toward the end of the book. Later, when you want to read some more, the bookmark has held your place. The file offset works the same way. You can move the file offset ahead by reading or writing data, or you can move it directly, without transferring any data, with a system call like lseek().
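The bookmark idea can be seen from the shell, too. In this sketch, fd 3 is opened once and read twice; the second read picks up where the first one stopped, because the offset belongs to the open file rather than to the individual read command. The temporary file and the variable names are only for illustration, and the example assumes the mktemp utility is available.

```sh
# Create a small three-line file to read from.
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

# Open the file for reading on fd 3; the offset starts at the beginning.
exec 3< "$tmpfile"

# Each read consumes one line and leaves the offset just past it.
read -r first <&3
read -r second <&3

# Close fd 3 and clean up.
exec 3<&-
rm -f "$tmpfile"

echo "first read:  $first"
echo "second read: $second"
```

If each read had instead used < "$tmpfile", the file would have been reopened every time, the offset would have started over, and both variables would have held the first line.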
Bourne-type shells (bash, for instance) let you open files and access them by their fd. We saw this in the June 2004 column, “Execution and Redirection,” available online at http://www.linux-mag.com/2004-06/power_01.html.
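Here is a minimal sketch of that shell feature, assuming a POSIX-style shell and the mktemp utility; the fd number 4 and the variable names are arbitrary choices for this example. The file is opened once for writing, written to twice through its descriptor, and then closed.

```sh
# Temporary file standing in for a log file.
log=$(mktemp)

# Open the file for writing on fd 4.
exec 4> "$log"

# Both writes go through the same open file, so the second
# one starts where the first one ended.
echo "first" >&4
echo "second" >&4

# Close fd 4, then read the result back and clean up.
exec 4>&-
content=$(cat "$log")
rm -f "$log"

echo "$content"
```

Because both echo commands share one open file, the second line lands after the first. Had each command reopened the file with > "$log" instead, the second write would have truncated the file and replaced the first line.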
File This Under Linux
In Linux, nearly all input and output (I/O) is done via “files” (streams of characters, accessed through a file descriptor with a file offset pointer), although many of those “files” are actually pipes, disk drives, and other character sources or sinks.
With that in mind, let’s look again at a redirected-input while loop from the May 2004 column, “Great Command-line Combinations,” which you can read at http://www.linux-mag.com/2004-05/power_01.html:
find /proj -type d -print |
while read dir