Spring Cleaning, Geek Style

How to clean the clutter that consumes your computer.

If you’re like me, you have Linux systems that have been in continuous operation for years. To be sure, you’ve probably rebooted innumerable times, performed system upgrades, replaced hard drives, and so on, but certain key directories might have been created many years ago. In fact, this is one of Linux’s strong points: the operating system is robust enough that it can work for years on end without a fresh re-installation.

The trouble with this longevity is that the operating system tends to collect junk, such as temporary files that hang around like perpetual unwanted house guests, log files that grow bigger than dinosaurs, packages that you installed and then never used, and so on.

This month’s column is dedicated to cleaning the clutter from your computer. The techniques shown apply to system files; keeping your home directory clean is a chore left to you.

Why Clean House?

You might wonder, though, what the fuss is about. Who cares if your /tmp directory has a few files that were created when Bill Clinton was President, or if you’ve got so many fonts installed that a complete printed set of font samples would be thicker than an unabridged dictionary? After all, with modern hard disks routinely topping 100 GB, you don’t need to worry about disk space, right?

In fact, excesses can cause a number of problems:

*Wasted disk space. Even with huge hard disks, wasted disk space can be a problem, particularly if the space wasted is in multimedia files or other big space consumers. Wasted disk space can also be a problem if you’ve split your system into multiple small partitions; too many useless files on a small partition can become a problem even if you’ve got lots of space on another partition. Using logical volume management (LVM), as described in the April 2006 “Guru Guidance,” can help you work around such problems.

*File access times. Every file, no matter its size, requires a directory entry, and to access files, Linux must sift through these entries. If a large enough number of useless files collect, system performance drops whenever Linux tries to access other files in the same directory as the useless ones.

*Locating files. If you need to locate a file, your task will be complicated by the presence of useless files. Reading dozens of filenames can take time. If you use tools such as grep to search files’ contents, the search will be slower if the software has to read through hundreds of irrelevant files.

*Security. Unnecessary files, and particularly unnecessary program files, can pose a security risk. If your system shouldn’t be running, say, a File Transfer Protocol (FTP) server, it’s far safer to not have any FTP software installed on your system than to have it installed but not used. Installed software might be accidentally started when it shouldn’t be. Even non-server programs can pose a security risk if they’re run by root or have security-related bugs.

For these reasons, then, you should periodically clean house. Doing so keeps your system performing well and makes it easier for you to do your work.

How Do Unwanted Files Accumulate?

Before you go digging into your directory tree to locate unwanted files, you should have some understanding of how these files are created. A little knowledge here helps you efficiently locate the files, and may be helpful in reducing the buildup of these files in the future.

Several sources of such files are common:

*User programs. The /tmp and /var/tmp directories are home to miscellaneous temporary files. These directories are world-writable and are intended as places in which user programs may create files that shouldn’t stick around forever. Some programs even create entire directory trees in temporary directories. A wide variety of user programs make use of temporary directories. Sometimes a filename gives a clue as to what program created the file, but sometimes the files have bizarre names that aren’t helpful in identifying the creator.

*Spool files. Certain servers and other system programs create temporary files in subdirectories of /var/spool. Examples include the Linux printing system (most often CUPS), the mail server, and Samba, but don’t consider this list complete. In most cases, the program that created a spool file deletes it when it’s no longer necessary. Sometimes this doesn’t happen when it should, though, resulting in files cluttering the spool directory. On the other hand, some spool files have legitimate long lifetimes. Users might intentionally leave mail in their inboxes, for instance, resulting in mail spool files with long lifetimes.

*Log files. Files in /var/log hold records of server, kernel, and other system activities. Log files are normally rotated and eventually deleted by cron jobs, but if these are misconfigured, some log files may grow to ridiculous sizes.

*Kernels. If you upgrade or rebuild your kernel, the kernel files themselves can become a problem. This issue can be particularly important if you use a small /boot partition.

*Accidents. Sometimes a typo or other accident can cause detritus to accumulate. You might accidentally move a file into one directory when you’d intended to move it into another one. If you don’t immediately detect and correct the problem, the file can hang around in the wrong directory for a very long time. Keeping your working time as root to a minimum can help you avoid this problem, or at least keep the misplaced files within your home directory.

*Package system files. Some package tools, such as the Advanced Package Tools (APT), Yellow Dog Updater, Modified (Yum), and Portage, download package files and store them in subdirectories of /var. (Portage also uses /usr/portage/distfiles for this purpose.) These systems don’t usually clean up after themselves, because you might delete a package and then want to re-install it, at which point having the original package file on disk is helpful. Over time, though, having too many package files on the system can become a problem. You can generally use the package system itself to clean up its package files, but sometimes manual cleaning is necessary.

*Installed packages. Useless package files can be a problem, but unwanted installed packages can be at least as bad. Unused packages can accumulate in just about any directory tree. They’re typically installed by automated or semi-automated tools. For instance, after upgrading one of my systems to Fedora Core 5, I discovered over 800 MB of OpenOffice.org packages (mostly foreign language dictionaries) that I would never be likely to use. Unfortunately, some packages that provide features you don’t want may be flagged as dependencies of packages you do need, so removing them is sometimes risky or impossible.

This list can help you track down files you might want to delete. Note, for instance, that the /tmp and /var directories are popular locations for the accumulation of junk. Few programs store temporary files in /usr, /opt, /etc, or most other directories, so these programs aren’t likely to create detritus in these locations. Unwanted packages are an exception to this rule.
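As a quick check for the overgrown log files described above, a short shell function can list files above a size threshold. This is a minimal sketch, not part of any standard toolset: the name big_files and the default threshold of roughly 100 MB are assumptions chosen for illustration.

```shell
#!/bin/sh
# big_files is a hypothetical helper name, not a standard command.
# Usage: big_files DIRECTORY MIN_SIZE_KB
big_files() {
    dir=${1:-/var/log}
    min_kb=${2:-102400}     # assumed threshold: roughly 100 MB
    # -xdev keeps find on one filesystem; -type f selects regular files;
    # -size +Nk matches files larger than N kilobytes (the k suffix is
    # understood by GNU find). du -k prints each match's size in KB, and
    # sort -rn puts the largest file first.
    find "$dir" -xdev -type f -size +"${min_kb}k" -exec du -k {} + 2>/dev/null |
        sort -rn
}
# Example (run as root to read every log): big_files /var/log 102400
```

The function only reports; whether a large log should be rotated, truncated, or left alone is still a decision for you.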

Locating Temporary Files to be Deleted

You shouldn’t just delete anything you don’t recognize in /tmp or /var, though. These directories hold many legitimate files, and possibly even large legitimate files. A few tips and techniques will help you identify those files that you can safely delete.

The first tool for tracking down unwanted files is du. This command displays the disk space used in particular directories. You can pass it the -s option to have du display a summary for just the directories it’s given. If you pass a wildcard, the result summarizes the disk usage of each file or subdirectory in a directory, as shown in Listing One.

Listing One: The du program may be used to learn where disk space is being consumed

# du -s /var/*
107774 /var/cache
146827 /var/db
0 /var/empty
12 /var/games
40 /var/gdm
27144 /var/lib
0 /var/lock
10170 /var/log
0 /var/mail
65 /var/run
12 /var/spool
0 /var/state
63698 /var/tmp
1446 /var/www

The du program can take a while to run if the directory tree you specify has lots of files. Listing One indicates that /var/db, /var/cache, and /var/tmp consume the most space within the /var directory tree. Although a directory can be filled with many small files and consume less space than a directory with just one or two big ones, tracking disk use is a good way to start hunting for files that are worth deleting.
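Output like that in Listing One is easier to scan when it is sorted by size. The following sketch wraps du in a small function that does so; the name biggest_dirs and its defaults are assumptions made for this illustration.

```shell
#!/bin/sh
# biggest_dirs is a hypothetical helper name, not a standard command.
# Usage: biggest_dirs DIRECTORY COUNT
biggest_dirs() {
    dir=${1:-/var}      # directory to examine (default: /var)
    count=${2:-5}       # how many entries to show (default: 5)
    # -s prints one total per argument and -k reports sizes in 1 KB
    # blocks. Errors from unreadable directories are discarded.
    # sort -rn orders the totals numerically, largest first.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}
# Example (run as root for complete totals): biggest_dirs /var 3
```

You can then call the function again on whichever subdirectory tops the list to drill down further.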

The system that produced the output shown in Listing One has legitimate long-term uses for the files in /var/db and /var/cache. That leaves /var/tmp for cleanup. Further tests showed that much of the space in that directory tree was being consumed by /var/tmp/portage/xsane-0.991. This directory held an incomplete Gentoo Portage build of the XSane 0.991 package. Because the package didn’t compile, Gentoo’s emerge program left all the package files on the disk without deleting them. Although XSane is a relatively small program, if this had happened with a really big program, such as Mozilla or X.org-X11, the wasted disk space could become a problem. In this case, deleting the files poses no problem; at worst, Portage will have to uncompress the files again. In fact, deleting all the directories in /var/tmp/portage is a safe action, assuming no package updates are in progress.

Identifying files that have been left around can sometimes be a problem. What program created /tmp/03a0hf4j.bin? What does this file contain? Can it safely be deleted? You’ll need to do some detective work to answer these questions.

One clue, of course, lies in the files’ names. Filename extensions often provide clues: .pdf denotes Portable Document Format (PDF) files; .gz indicates files compressed with gzip; .rpm is used by RPM files, and so on. The .bin extension in /tmp/03a0hf4j.bin is not very helpful, though, and the part of the filename before the extension is even less informative.

In cases such as this, you might want to use the file utility, which looks in the file for “magic numbers” (byte sequences that are used almost exclusively by particular file types). If file finds a magic number, it displays whatever information it can about the file in question:

$ file /tmp/03a0hf4j.bin
/tmp/03a0hf4j.bin: gzip compressed data, from Unix, last modified: Thu May 11 15:41:22 2006

This output indicates that the file is probably a gzip-compressed file, despite its filename extension. Ordinarily, gunzip only works on files with certain extensions, and .bin isn’t one of them. Thus, you should copy or rename the file to uncompress it. Alternatively, you can use pipes and redirection to get around the limit:

$ cat /tmp/03a0hf4j.bin | gunzip - > 03a0hf4j.bin.unzip

This command creates a new file called 03a0hf4j.bin.unzip, which is an uncompressed version of the original. You can then use file on it:

$ file 03a0hf4j.bin.unzip
03a0hf4j.bin.unzip: POSIX tar archive

At this point, you can list the tarball’s contents or even extract it to fully identify it and decide whether or not to delete it. This example held software; the file was probably left behind by a Web browser after downloading the software from a Web site.
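If you only want to see what a gzip-compressed tarball contains, you can skip the intermediate uncompressed copy. The function below is a sketch of that approach; list_gzip_tar is a hypothetical name used only for this illustration.

```shell
#!/bin/sh
# list_gzip_tar is a hypothetical helper name, not a standard command.
# Usage: list_gzip_tar FILE
list_gzip_tar() {
    # Feeding the file to gunzip on standard input sidesteps its
    # filename-extension check. -c writes the uncompressed data to
    # standard output, and tar -tf - lists the archive read from
    # standard input without extracting anything.
    gunzip -c < "$1" | tar -tf -
}
# Example: list_gzip_tar /tmp/03a0hf4j.bin
```

Nothing is written to disk, so this is a reasonable first look before you decide whether the file deserves a full extraction.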

Removing Unnecessary Packages

Unnecessary packages can consume huge amounts of disk space. Worse, they pose a security risk, as described earlier. Thus, you should take care to eliminate unnecessary packages from your computer.

One of the best tools for accomplishing this goal is a GUI package browser, such as YaST2 for SuSE, Synaptic for Debian and other systems running APT, Yumex (shown in Figure One) for Fedora and other systems running Yum, and Kuroo for Gentoo. Using these tools, you can browse a list of installed packages, reading package descriptions and marking packages for deletion.

Figure One: Yumex is a typical GUI front-end to a package manager

As an example, consider deleting packages using Yumex:

1. Select the Remove icon in the left pane of the main window. After a few seconds, a list of packages appears.

2. Select one or more packages for removal by clicking the boxes to the left of their names. You can either browse for packages or use the Filter field to search for particular packages by name.

3. Click the Add to Queue button. Red “X” icons appear in place of the document icons to the left of the package names.

4. Click the Queue icon in the left pane of the window. You should see the packages you selected under the heading Packages to remove.

5. Click the Process Queue button. Yumex displays a dialog box asking for confirmation. When you click OK, the program proceeds to delete your selected packages, then updates its package information.

Other GUI package managers accomplish the same task and require more-or-less the same actions.

While you’re cleaning up unnecessary packages, remember to clean out your APT, Yum, or Portage collection of downloaded package files. For APT, type apt-get clean; for Yum, type yum clean packages; and for Portage, remove the contents of /usr/portage/distfiles.
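Before running any of these cleanup commands, it can be worth checking how much space the downloaded package files actually occupy. The sketch below reports the size of each cache directory that exists. The function name cache_sizes is hypothetical, and the APT and Yum paths in the example are common defaults assumed here, not locations given in this column; adjust them for your distribution.

```shell
#!/bin/sh
# cache_sizes is a hypothetical helper name, not a standard command.
# Usage: cache_sizes DIRECTORY...
cache_sizes() {
    for d in "$@"; do
        # Skip directories that don't exist on this system.
        if [ -d "$d" ]; then
            # -s prints a single total; -h uses human-readable units.
            du -sh "$d" 2>/dev/null
        fi
    done
    return 0
}
# Example; the first two paths are assumed defaults for APT and Yum:
# cache_sizes /var/cache/apt/archives /var/cache/yum /usr/portage/distfiles
```

Running the same command after the cleanup shows how much space you recovered.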

Automatic Housecleaning

Ideally, your system should do a certain amount of housecleaning automatically. System startup scripts and cron jobs often check for and remove files in /tmp; servers should delete their spool files when they’re no longer useful; and so on. Unfortunately, these scripts sometimes don’t work as intended.

If you like, you can create your own scripts to perform automatic or semi-automatic housecleaning and call them from your own local startup scripts or from cron jobs. You should, however, be very cautious about doing this. You don’t want to have a cron job delete a temporary file that’s still in use, for instance. For this reason, the safest approach is to either not do automatic maintenance at all or to have your scripts search for files that are good candidates for deletion and email a list to you. You can then delete them manually.

Criteria for automatic deletion, or for being added to a list of files to check, are likely to focus on the age of the files in question. Ordinarily, files in temporary storage locations are likely to hang around for a few seconds, minutes, or perhaps hours. A months-old file in /tmp is most likely abandoned and can be deleted.

Programs that run for extended periods, such as servers, are exceptions to this rule; if they create temporary files, those files probably shouldn’t be touched.
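The cautious approach described above, in which a script lists candidates rather than deleting them, might be sketched as follows. The function name list_stale_files and the 30-day default are assumptions for illustration, and the function deliberately deletes nothing.

```shell
#!/bin/sh
# list_stale_files is a hypothetical helper name, not a standard command.
# Usage: list_stale_files DIRECTORY DAYS
list_stale_files() {
    dir=${1:-/tmp}
    days=${2:-30}       # assumed age threshold, in days
    # -xdev stays on one filesystem; -type f selects regular files;
    # -mtime +N matches files last modified more than N days ago.
    # The function only prints names; it never removes anything.
    find "$dir" -xdev -type f -mtime +"$days" -print 2>/dev/null
}
# Example: list_stale_files /tmp 30
```

From a cron job, you could pipe the output to a mail command (for instance, mail -s "Stale files in /tmp" root, assuming local mail delivery works on your system) and review the list before deleting anything by hand.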

Roderick W. Smith is the author or co-author of over a dozen books, including Degunking Linux and Linux Power Tools. He can be reached at rodsmith@rodsbooks.com.
