FS_scan: Getting Detailed with Your Data

Need details on your file system's data? FS_scan allows you to dig deep into your storage and gives you the ability to perform trend analysis on the results.

Just last week we walked you through a new tool, agedu, that gives you a snapshot view of your file system. agedu produces a very nice graphical display that provides an overview of the age (based on either change or access time) and size of your data. However, there are times when you need or want more detail on the data that’s sitting in your storage. This time around we’ll look at a new tool, FS_scan, that does precisely that.

FS_scan recursively scans a directory tree to give you a detailed view of your data. In particular, it tells you the dates and ages of your files, the average ages of the files in a given directory, and the oldest files in the directory tree. It also produces a CSV file that you can open in a spreadsheet. With this information you get a very detailed view of the state of your storage, along with the ability to do a trend analysis of the resulting data (e.g., How fast is it changing? How often are files accessed? How often is data modified?).

Let’s dive in and see what our data’s doing.


When You Just Need More Details

Remember that when talking about the data on your storage there are three dates (or three ages) to consider: (1) the date last accessed, or the access age, (2) the date last modified, or the modify age, and (3) the date last changed, or the change age. The change date is updated when a file's metadata, such as its permissions or owner, changes, as well as when its contents change. Because all three dates or ages can be very important, it is difficult to quantify how data is being used from any single one of them. agedu is a great tool for getting a quick glimpse of the access age or change age of the file system being examined, but it is only a glimpse of the state of the file system. If you want to create a more detailed report, or monitor the file system over time for a trend analysis, you need more detailed information than agedu can provide at this time.
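To make the idea of an "age" concrete, here is a minimal sketch that computes the three ages of a single file as the difference between the current time and each timestamp. It is written for Python 3, and the function name file_ages_in_days is our own invention for illustration; it is not part of agedu or FS_scan.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def file_ages_in_days(path, now=None):
    """Return the access, modify, and change ages of path, in days."""
    if now is None:
        now = time.time()
    st = os.stat(path)
    # Each timestamp is in seconds since the epoch; age = now - timestamp.
    return {
        "access": (now - st.st_atime) / SECONDS_PER_DAY,
        "modify": (now - st.st_mtime) / SECONDS_PER_DAY,
        "change": (now - st.st_ctime) / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    # Demo: report the three ages of the current working directory.
    for name, days in sorted(file_ages_in_days(".").items()):
        print("%-6s age: %.2f days" % (name, days))
```

Passing an explicit value for now lets you compute every age in a scan against the same reference time, which keeps the numbers comparable across a long-running walk.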

One option for getting more detailed information is the stat command in Linux, which reports the status of files or of the file system itself. For example, the output from stat looks like the following:

$ stat *
  File: `~storage002.html'
  Size: 11472     	Blocks: 24         IO Block: 4096   regular file
Device: 811h/2065d	Inode: 3220767     Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/laytonjb)   Gid: ( 1000/laytonjb)
Access: 2009-05-24 17:19:52.000000000 -0400
Modify: 2009-05-24 17:19:52.000000000 -0400
Change: 2009-05-24 17:19:52.000000000 -0400
  File: `storage002.html'
  Size: 11285     	Blocks: 24         IO Block: 4096   regular file
Device: 811h/2065d	Inode: 3220766     Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/laytonjb)   Gid: ( 1000/laytonjb)
Access: 2009-05-24 17:13:13.000000000 -0400
Modify: 2009-05-24 16:02:27.000000000 -0400
Change: 2009-05-24 16:02:27.000000000 -0400

Or you can get a glimpse of the file system status using the “-f” option.

$ stat -f *
  File: "~storage002.html"
    ID: f11c91747fe09927 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 37263886   Free: 33094551   Available: 31216553
Inodes: Total: 9396224    Free: 9106153
  File: "storage002.html"
    ID: f11c91747fe09927 Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 37263886   Free: 33094551   Available: 31216553
Inodes: Total: 9396224    Free: 9106153

Both options provide useful information. The first option, stat, gives the access, modify, and change dates for each file, as well as the uid, gid, size, and permissions. The second option, stat -f, gives additional information, including the file system type and the fundamental block size. However, if you want to use the stat command to gather detailed information, you will have to run it over the entire directory tree, parse the output, and assemble it into a usable form.

Python has a nice module, the os module, that can easily walk a file system and gather virtually all of the same information that the stat command produces. Even better, this module is part of the Python standard library, so it is available wherever Python is installed, which includes most distributions. This can easily form the basis of a tool to walk a file system and gather detailed file information.
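As a rough illustration of how much of the stat -f output is reachable from Python, the sketch below uses os.statvfs, which is available on Unix-like systems. It is written for Python 3, and the function name and dictionary keys are our own labels rather than anything defined by stat. One known gap: os.statvfs does not report the file system type (the ext2/ext3 field above).

```python
import os


def filesystem_summary(path="."):
    """Collect roughly the numeric fields that `stat -f` prints for path."""
    vfs = os.statvfs(path)  # Unix-only call
    return {
        "block_size": vfs.f_bsize,
        "fundamental_block_size": vfs.f_frsize,
        "blocks_total": vfs.f_blocks,
        "blocks_free": vfs.f_bfree,
        "blocks_available": vfs.f_bavail,  # free blocks for non-root users
        "inodes_total": vfs.f_files,
        "inodes_free": vfs.f_ffree,
        "name_max": vfs.f_namemax,
    }


if __name__ == "__main__":
    for key, value in sorted(filesystem_summary(".").items()):
        print("%-24s %d" % (key, value))
```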

Python Modules to the Rescue

One of the functions in the os module is called “walk” (os.walk). It lets you easily walk a directory tree (i.e., recursively visit every directory in the tree) and get the names of the subdirectories and files in each one. The simple example below is a modified version of one from the Python 2.6.2 documentation.

#!/usr/bin/python

import os
from os.path import join, getsize

for root, dirs, files in os.walk('.'):
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"

This quick code snippet displays the number of bytes taken by non-directory files in each directory under the starting directory (the current working directory). This simple snippet can form the basis of a script that walks through a directory tree and gathers information about the files. A quick note: the snippet has no exception handling, and exceptions are quite possible in practice, for example when a file cannot be read or disappears while the tree is being scanned.
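One way to add that missing exception handling is sketched below. It is written for Python 3, and the function name directory_sizes and the shape of its return value are our own choices for illustration. It relies on two standard behaviors: os.walk reports directories it cannot list to an optional onerror callback, and os.path.getsize raises OSError when a file cannot be examined.

```python
import os
from os.path import join, getsize


def directory_sizes(top):
    """Walk top and return (results, skipped_dirs).

    results holds one (directory, total_bytes, file_count, unreadable_files)
    tuple per directory; skipped_dirs lists directories os.walk could not read.
    """
    results = []
    skipped_dirs = []

    def remember(err):
        # os.walk passes an OSError; its filename is the offending directory.
        skipped_dirs.append(err.filename)

    for root, dirs, files in os.walk(top, onerror=remember):
        total = 0
        unreadable = 0
        for name in files:
            try:
                total += getsize(join(root, name))
            except OSError:
                # File vanished mid-scan, dangling symlink, or no permission.
                unreadable += 1
        results.append((root, total, len(files), unreadable))
    return results, skipped_dirs
```

Calling directory_sizes('.') returns both lists, so a report can show the per-directory totals and also say which parts of the tree were not counted.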

With the ability to walk a directory tree, you can open each file in a directory and gather statistics on it. The os module also has a function called os.fstat that gives you most of the information that the stat command produces. Extending the previous example a bit results in the following.

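#!/usr/bin/python

import os
from os.path import join, getsize

for root, dirs, files in os.walk('.'):
    # Per-directory summary, as in the previous example
    print root, "consumes",
    print sum(getsize(join(root, name)) for name in files),
    print "bytes in", len(files), "non-directory files"
    for file in files:
        # Full path to the file, built from the current directory and name
        fileloc = join(root, file)
        # os.fstat needs a file descriptor, so open the file read-only
        FILE = os.open(fileloc, os.O_RDONLY)
        junk = os.fstat(FILE)
        size = junk[6]     # size in bytes
        atime = junk[7]    # last access, in seconds since the epoch
        mtime = junk[8]    # last modification, in seconds since the epoch
        ctime = junk[9]    # last change, in seconds since the epoch
        uid = junk[4]      # numeric user id of the owner
        gid = junk[5]      # numeric group id of the owner
        print "   File: %s size: %s atime: %s mtime: %s ctime: %s" % (file,size,atime,mtime,ctime)
        os.close(FILE)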

In the second for loop, the full path to the file (fileloc) is created from the current directory in the tree (root) and the file name (file). Notice that the os.fstat function returns a tuple-like object of attributes that can be indexed by position. For example, it returns the access time (atime), the modify time (mtime), and the change time (ctime), all in seconds since the epoch. Other attributes include the size in bytes (size), the uid (uid), and the gid (gid).
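The same attributes can also be read by name instead of by index, and without opening the file at all, by calling os.lstat on the path. The sketch below is written for Python 3; the function name describe and the date format are our own choices. It also converts the epoch seconds into readable local dates.

```python
import os
import time

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def describe(path):
    """Return size, owner ids, and formatted timestamps for path."""
    # os.lstat takes a path, so no open/close is needed, and it reports on
    # a symbolic link itself rather than on the file the link points to.
    st = os.lstat(path)

    def as_date(seconds):
        return time.strftime(DATE_FORMAT, time.localtime(seconds))

    return {
        "size": st.st_size,             # same value as index 6
        "uid": st.st_uid,               # index 4
        "gid": st.st_gid,               # index 5
        "atime": as_date(st.st_atime),  # index 7
        "mtime": as_date(st.st_mtime),  # index 8
        "ctime": as_date(st.st_ctime),  # index 9
    }


if __name__ == "__main__":
    for key, value in sorted(describe(".").items()):
        print("%-6s %s" % (key, value))
```

Named attributes make the code easier to read than bare indices, and skipping the open avoids failures on files you are allowed to stat but not allowed to read.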

The previous example serves as a quick introduction to what you can do with Python using the modules in the standard library. In particular the os module has many functions that are useful for getting detailed information about files.
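To show how these pieces could be combined into the kind of report described at the start of the article, here is a hedged sketch that walks a tree, records one row per file, computes the average modify age per directory, and writes the rows to a CSV file. It is not FS_scan's actual code. It is written for Python 3, and the function names and CSV column names are assumptions made for illustration.

```python
import csv
import os
import time


def scan_tree(top):
    """Return one (directory, name, size, atime, mtime, ctime) row per file."""
    rows = []
    for root, dirs, files in os.walk(top):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            rows.append((root, name, st.st_size,
                         st.st_atime, st.st_mtime, st.st_ctime))
    return rows


def average_mtime_age_days(rows, now=None):
    """Map each directory to the mean modify age, in days, of its files."""
    if now is None:
        now = time.time()
    ages = {}
    for directory, _name, _size, _atime, mtime, _ctime in rows:
        ages.setdefault(directory, []).append((now - mtime) / 86400.0)
    return {d: sum(v) / len(v) for d, v in ages.items()}


def write_csv(rows, csv_path):
    """Write the rows, with a header line, to csv_path."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["directory", "name", "size", "atime", "mtime", "ctime"])
        writer.writerows(rows)
```

With the rows in hand, the oldest file by modify time is simply min(rows, key=lambda r: r[4]), and running the scan on a schedule and keeping each CSV gives you the raw material for a trend analysis.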
