
How Old is that Data on the Hard Drive?

The vast amount of data being stored today naturally leads to files sitting unused for longer and longer periods of time. A new tool, agedu, can quickly tell you which data on your file system is lying fallow.

Agedu – A Tool for Displaying Data Usage

File systems can hold anywhere from hundreds to millions of files. Tracking how they are used can be very difficult, bordering on impossible. Fortunately, a tool named agedu can give you a quick glimpse of the “age” of the data on a per-directory basis.

Agedu is simple to install, configure, and run. You build it with the familiar ./configure, make, make install sequence used by most open-source code. Built this way, it installs into /usr/local, so you need to be root for the install step. Alternatively, you can install it into your own account by pointing configure at a directory under your home directory, for example ./configure --prefix=/home/laytonjb/bin/agedu-r8442. Then create an alias to the executable, for example by adding the following line to your .bashrc file:

alias agedu=/home/laytonjb/bin/agedu-r8442/bin/agedu

(Note: the version of agedu tested in this article is r8442).

Once agedu is installed, there are a number of things you can do with it. The first step is to create an index of all the files and their sizes in the directory tree. All subsequent queries run against the index, which is much faster than repeatedly scanning the file system. For each directory, agedu sums the storage used by all of the files and directories below it. Once the index is built, you can “query” it for a variety of information. Agedu even includes a basic web server so it can display the results graphically.

To create an index of a directory tree, just run:

$ agedu -s /home/laytonjb

The "-s [directory]" option writes an index file named agedu.dat to the current directory. (If the index file sits inside a directory being scanned, agedu ignores it.)
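To make the scan-once, query-many idea concrete, here is a minimal Python sketch of the same approach. It is an illustration under stated assumptions, not agedu's implementation or its index format: it keeps an in-memory list instead of writing agedu.dat, it uses each file's apparent size (st_size) rather than whatever agedu actually counts, and the names `Entry`, `build_index`, and `total_under` are hypothetical.

```python
import os
import stat
from dataclasses import dataclass


@dataclass
class Entry:
    path: str     # full path of a regular file
    size: int     # apparent size in bytes (st_size)
    atime: float  # last access time, seconds since the epoch
    mtime: float  # last modification time, seconds since the epoch


def build_index(root, index_name="agedu.dat"):
    """Scan `root` once and return one Entry per regular file.

    A file called `index_name` is skipped, mirroring agedu's habit of
    ignoring its own index when it lies inside the scanned tree (matching
    by file name here is a simplification).
    """
    index = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == index_name:
                continue
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, device nodes, ...
            index.append(Entry(path, st.st_size, st.st_atime, st.st_mtime))
    return index


def total_under(index, directory):
    """Answer a query from the index alone: bytes stored below `directory`."""
    prefix = os.path.join(directory, "")  # add a trailing separator
    return sum(e.size for e in index if e.path.startswith(prefix))
```

The disk is walked only inside `build_index`; `total_under` can then be asked about any directory without touching the file system again, which is the property that makes index-based queries fast.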

Once the index is created you can query it. A great way to get started is to use the HTML display capabilities.

$ agedu -w

Agedu will print out a URL that you can then copy into your browser. For example,

$ agedu -w
Using Linux /proc/net magic authentication
URL: http://127.164.152.163:51107/

Below is a screenshot of the resulting page in a web browser.

Figure 1: Agedu Screenshot Using Access Time (atime)

The web page displays the age of the files in each directory, with red marking the oldest and green the newest, and it orders the directories by the total space they use. In this example, the first directory holds the vast majority of the used space and also most of the oldest files (the laptop used for the screenshot is only about 7 months old, so this directory holds almost all of the oldest files). The page also shows each directory's total space on the far left and its percentage of the overall space on the far right. When you are finished with the web page, stop agedu by pressing Ctrl-C.
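The layout described above, with directories ranked by total size, a percentage share for each, and a green-to-red age coloring, can be approximated with two small Python helpers. This is only a sketch: the linear color ramp and the names `rank_directories` and `age_color` are hypothetical, not taken from agedu's source.

```python
def rank_directories(totals):
    """Sort {directory: bytes} largest first, adding each one's percentage."""
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, size, 100.0 * size / grand_total if grand_total else 0.0)
        for name, size in ranked
    ]


def age_color(timestamp, newest, oldest):
    """Map a timestamp to an RGB hex string: newest is green, oldest is red."""
    span = newest - oldest
    frac = (newest - timestamp) / span if span > 0 else 0.0
    frac = min(max(frac, 0.0), 1.0)  # clamp timestamps outside the range
    return f"#{round(255 * frac):02x}{round(255 * (1 - frac)):02x}00"
```

For example, `rank_directories({"BG": 20, "BENCHMARKS": 70, "RESEARCH": 10})` puts BENCHMARKS first with a 70% share.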

Notice that the very top of the page states that the data age is based on the access time (atime). This is the default setting. However, you can just as easily use the modification time (mtime) instead (note: at this time, ctime is not an option).
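The difference between the default and `--mtime` comes down to which timestamp from the file's inode is used to measure age. A minimal Python sketch of the two choices, using a hypothetical helper `age_days`, follows. One practical note: on file systems mounted with noatime, atime is never updated on reads, and with relatime it is updated at most about once a day unless the file has changed, so mtime can be the steadier measure on such systems.

```python
import os
import time


def age_days(path, use_mtime=False, now=None):
    """Age of a file in days.

    use_mtime=False measures from the last access time (atime), agedu's
    default; use_mtime=True measures from the last modification time
    (mtime), the timestamp `agedu --mtime` uses instead.
    """
    st = os.stat(path)
    stamp = st.st_mtime if use_mtime else st.st_atime
    now = time.time() if now is None else now
    return (now - stamp) / 86400.0
```

A file read yesterday but last written a year ago therefore looks about one day old by atime and about 365 days old by mtime.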

$ agedu --mtime -s /home/laytonjb
Built pathname index, 14078 entries, 931214 bytes of index
Faking directory atimes
Building index
Final index file size = 1932408 bytes
$ agedu -w
Using Linux /proc/net magic authentication
URL: http://127.164.152.163:54491/

The first command builds the index. You can then display it graphically, as in the second command, or query it in other ways. Figure 2 below is a screenshot of the resulting web page.

Figure 2: Agedu Screenshot Using Modify Time (mtime)

Again, the “BENCHMARKS” directory has not been touched in about 7 months, that is, since the file system was created. Also note that the web page says “last-access” even though the data is based on the last-modified time. This appears to be a bug in the code.

In addition to the HTML output, you can also query the index for text output, which is great for scripting. For example,

$ agedu -s /home/laytonjb
$ agedu -t /home/laytonjb

This sends a text summary of the space usage to stdout (recall that each directory's total includes its subdirectories as well).
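Because each directory's figure includes everything beneath it, every file's size has to be credited to its own directory and to each of its ancestors. Here is a minimal Python sketch of that roll-up, assuming the per-file sizes are already known (for example, from a single scan); `rollup` is a hypothetical name, not an agedu function.

```python
import os
from collections import defaultdict


def rollup(file_sizes, root):
    """Return {directory: total bytes} where each total includes subdirectories.

    `file_sizes` yields (path, size) pairs for files under `root`.
    """
    root = os.path.normpath(root)
    totals = defaultdict(int)
    for path, size in file_sizes:
        directory = os.path.dirname(os.path.normpath(path))
        while True:
            totals[directory] += size  # credit this level...
            if directory == root or directory == os.path.dirname(directory):
                break  # stop at the scan root (or the file system root)
            directory = os.path.dirname(directory)  # ...then move up one level
    return dict(totals)
```

With files of 10, 5, and 1 bytes at /home/u/a, /home/u/x/b, and /home/u/x/y/c, the totals come out as 16 for /home/u, 6 for /home/u/x, and 1 for /home/u/x/y.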

By default, agedu uses the oldest file to set the age scale shown in the web output. With the text option, you can instead query the index for data older than an age you choose, independent of that scale. For example, to find the amount of space in each directory taken up by data older than 6 months:

$ agedu -s /home/laytonjb
$ agedu -a 6m -t /home/laytonjb
12          /home/laytonjb/.adobe
192         /home/laytonjb/.cache
16          /home/laytonjb/.compiz
32          /home/laytonjb/.config
112         /home/laytonjb/.dvdcss
140         /home/laytonjb/.fontconfig
88          /home/laytonjb/.gconf
32          /home/laytonjb/.gnome2
12          /home/laytonjb/.gnupg
304         /home/laytonjb/.gstreamer-0.10
4           /home/laytonjb/.local
16          /home/laytonjb/.macromedia
1036        /home/laytonjb/.mozilla
12          /home/laytonjb/.mplayer
28          /home/laytonjb/.nautilus
8           /home/laytonjb/.pulse
4           /home/laytonjb/.update-manager-core
4988140     /home/laytonjb/BENCHMARKS
20188       /home/laytonjb/BG
412         /home/laytonjb/CLUSTERBUFFER
6580        /home/laytonjb/POISSON_HOME
160         /home/laytonjb/RESEARCH
5017632     /home/laytonjb

This shows, for each directory that holds data older than 6 months, how much space that old data occupies. This capability can be extremely useful for finding directories with very old data. From a system administrator's perspective, a prime example would be to first scan all home directories to find where the oldest data lives, and then use agedu to examine those user directories for really old data. Agedu can also be run from a script, daily, weekly, or monthly, that creates a report of the directories with the oldest data. You can then decide what to do with that data, such as archiving it.
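The `-a 6m` query can be approximated in Python: skip every file newer than the cutoff, then credit the remaining sizes to each directory and its ancestors, so only directories that actually hold old data appear, as in the listing above. The age syntax handled here (a number followed by d, w, m, or y) and the 30-day month and 365-day year are assumptions of this sketch, and agedu's own date arithmetic may differ. `parse_age` and `old_data_by_directory` are hypothetical names.

```python
import os
import time
from collections import defaultdict

DAY = 86400
# Assumed unit lengths for this sketch; agedu's own arithmetic may differ.
UNIT_SECONDS = {"d": DAY, "w": 7 * DAY, "m": 30 * DAY, "y": 365 * DAY}


def parse_age(spec):
    """Convert an age such as '6m' or '2w' into seconds."""
    if len(spec) < 2 or not spec[:-1].isdigit() or spec[-1] not in UNIT_SECONDS:
        raise ValueError(f"unrecognized age: {spec!r}")
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def old_data_by_directory(entries, spec, root, now=None):
    """Bytes held in files older than `spec`, per directory.

    `entries` yields (path, size, timestamp) tuples; each old file's size is
    credited to its directory and every ancestor up to `root`.
    """
    now = time.time() if now is None else now
    cutoff = now - parse_age(spec)
    root = os.path.normpath(root)
    totals = defaultdict(int)
    for path, size, stamp in entries:
        if stamp >= cutoff:
            continue  # touched recently enough; not counted
        directory = os.path.dirname(os.path.normpath(path))
        while True:
            totals[directory] += size
            if directory == root or directory == os.path.dirname(directory):
                break  # reached the scan root (or the file system root)
            directory = os.path.dirname(directory)
    return dict(totals)
```

A scheduled report script could call a function like this once per run and write out the resulting totals.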

Savvy administrators reading this article will quickly realize that users could simply use the touch command to update the atime and mtime of their data, obscuring the real access and modification times. However, running agedu reports fairly often can catch users doing this. It doesn't stop them, but at least you have a record of it, and if they become abusers of space, you can talk to them and show them the reports.
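The loophole is easy to demonstrate. In Python, `os.utime(path, None)` does what `touch` does to an existing file: it sets both atime and mtime to the current time, so a year-old file instantly looks new to any atime- or mtime-based report. The helper names `touch` and `days_since` below are hypothetical. One detail worth knowing: changing timestamps this way also sets the file's ctime to the current time, and ordinary system calls cannot set ctime back, so ctime can hint that a file was touched, although, as noted earlier, agedu does not currently offer a ctime mode.

```python
import os
import time


def touch(path):
    """Like `touch path` on an existing file: set atime and mtime to now."""
    os.utime(path, None)


def days_since(path, field="st_atime", now=None):
    """Days elapsed since the given stat timestamp (st_atime or st_mtime)."""
    now = time.time() if now is None else now
    return (now - getattr(os.stat(path), field)) / 86400.0
```

Backdate a file by a year with `os.utime`, call `touch` on it, and `days_since` drops from about 365 to under 1 for both timestamps.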

Despite the tone of this article, users are not evil in any sense. But having data to show users why they should compress or delete data is much more effective than simply demanding that they delete it. In addition, you can use these reports to identify users who need space and then work with them to understand how they are using it. These reports also help when requesting more space, because you can explain how space is being used, who is using it, and how fast it is growing.
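Tracking how fast space is growing only needs two saved reports. The sketch below parses text in the format of the `agedu -t` listing shown earlier (a size column, whitespace, then a path) and lists the change for each path, largest growth first. The parsing rules are inferred from that sample output rather than from agedu documentation, sizes are compared in whatever unit the reports use, and `parse_report` and `growth` are hypothetical names.

```python
def parse_report(text):
    """Parse lines of the form '<size> <path>' into {path: size}."""
    sizes = {}
    for line in text.splitlines():
        fields = line.strip().split(None, 1)  # keep spaces inside the path
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip blank or unexpected lines
        sizes[fields[1]] = int(fields[0])
    return sizes


def growth(old_text, new_text):
    """Per-path change between two reports, largest increase first."""
    old, new = parse_report(old_text), parse_report(new_text)
    changes = {
        path: new.get(path, 0) - old.get(path, 0)
        for path in old.keys() | new.keys()
    }
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)
```

Saving the text output from each scheduled run and comparing successive pairs this way would give a simple per-directory growth trend to bring to such a discussion.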

Summary – Check Your Space Usage Today!

As noted in the introduction, studies have shown that storage systems hold data that has not been accessed in a very long time. While there are legitimate reasons for keeping data online, knowing how much data has gone unaccessed for some time at least provides evidence for adding either more storage or archiving capabilities.

Agedu can also identify the users who are using the most space, so they can be asked to delete or compress data they are not using or have not touched in some time. (It is fairly convincing to ask users to compress files when you can show them how much disk space they are using and how long it has been since they last accessed those files.) This information can also justify requests for more space or, perhaps even more importantly, help track trends in data usage.

Agedu is a tool that can give you a quick overview of your disk usage as a function of time. The tool is remarkably easy to build and use and has a great deal of flexibility including the ability to be used in scripts. As pointed out, the scripting capability can be used in a variety of ways to help administrators. Even for the casual home user, this information can be very useful in understanding why your disks are getting so full and perhaps even more useful when asking the household finance committee to flip for a new 2 TB SATA drive (or two).
