Power Tools: Piles of Files

Using text and utilities to organize and access files.

Linux runs on text. Configuration files are often human-readable text. Many other files contain text, too, and text often flows through Standard I/O connections. Linux has powerful utilities to handle text; you can also use a scripting language.

The names of files and their locations (pathnames) are also usually text. So, the techniques you use to process text can also be used to process files.

(Of course, if looking through file listings and clicking on some of them is the best way to find what you want, Linux has GUI browsers like Nautilus and Konqueror.)

This article covers ways to make lists of files — on-the-fly or in another file — then narrow the list to just what you’re looking for. We’ll use lots of shell loops with redirected I/O; if you need an introduction, see the sections “Let a Loop Do The Work” in Great Command-line Combinations.

When the Name Isn’t Enough

The third article in the Filenames by Design series shows ways to find files by name when those files are part of a thoughtfully-designed system. If you’re like me, though, you can only wish that all of your files were in a system that makes everything easy to find. (Some projects are carefully planned. Others are 3 a.m. hacks that you can’t finish neatly before the next crisis hits.)

Attributes like the last-modification timestamp or the size can help you find a file that’s hidden like a needle in a haystack. See the sidebar “Some File Attributes” for suggestions. Of course, attributes aren’t always enough.

One of my favorite quick ways to save files from a project is to make a tar(1) archive in gzip(1) format with a name like project-name_1996-02-15.tar.gz and transfer it into a directory named tarballs on my main system. That’s great if I remember the name of the project or when I worked on it. More likely, though, I’ve forgotten what year it was or what conference I was about to attend when I wrote that file with the example I’m looking for. It’s time for power tools.

(By the way, this is a specific example of a general technique. These ideas also work for single files that aren’t in an archive.)

Start by thinking where the data might be — and, once you find some likely spots, what tools could extract it. Here we’re looking for gzipped files. Uncompressing each file onto the disk and searching through it can take a lot of disk space. But the GNU zcat(1) utility (also known as gunzip -c) reads a compressed file in various formats, uncompresses its contents on-the-fly, and writes them to standard output. That lets you avoid temporary files by writing data into a pipe.
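
As a minimal sketch of that idea, the commands below build a tiny gzipped file in a scratch directory and search it through a pipe. The sample text, the search string and the variable names are made up for illustration; gunzip -c stands in for zcat, as noted above.

```shell
# Build a small gzipped file in a scratch directory (hypothetical sample data).
workdir=$(mktemp -d)
printf 'alpha\nneedle in a haystack\nomega\n' | gzip -c > "$workdir/sample.txt.gz"

# Uncompress to standard output and search the stream.
# Nothing is written to disk: grep reads the uncompressed text from the pipe.
matches=$(gunzip -c "$workdir/sample.txt.gz" | grep -c 'needle')
echo "matching lines: $matches"

# Clean up the scratch directory.
rm -r "$workdir"
```

Because the uncompressed data exists only in the pipe, the disk never holds more than the compressed file.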

Some File Attributes

The name and the contents aren’t the only way to find the file you want. A file also has attributes — the last modification time, for instance. You can find many attributes with utilities like ls(1) and stat(1); there are other suggestions below.

Here are some attributes you might want to search for:

  • The filename.
  • The “extension”, like .jpg for a JPEG-format photo.

    (Note that Linux itself doesn’t have actual filename extensions the way Microsoft Windows does. The applications you use may care how a filename ends, what sort of data the file contains and how that data is structured, but Linux doesn’t. The file is just a sequence of data bits. Linux doesn’t regulate whether, say, a JPEG photo is stored in a file whose name ends with the four characters .jpg. A Power Tools column has more about file “types” under Linux.)

  • Part or all of the file’s pathname (one or more of the names of directories that hold the file).
  • The file’s three timestamps: last modification of the file contents, last “change” (to the file’s metadata, not to the file contents), and last access.
  • The file length.
  • If it’s a text file, the number of lines and/or words. (For instance, you might be looking for files with more than 1,000 lines.)

    The wc(1) utility can count lines and words, as long as the file is plain text (with no non-text coding added by, say, a word processing program). The section “Data Is Just Data” of the Power Tools column Performing Data Surgery explains how Linux text files are structured.

  • Is the file actually a symbolic or hard link?
  • Linux extended file attributes store external data with files — a sort of “tagging” system to let you identify particular files. Not all utilities support attributes, but Z shell does. Also see the manpages for chattr(1) and lsattr(1).
  • If the file contains particular words, strings, or characters, grep(1) and friends can probably find them.

A scripting language with flexible searching can be a good choice for complex tests and searches for non-textual data. (One of those languages is Perl.)
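
A few of the attributes in this sidebar can be tested straight from the shell. The sketch below is one possible approach, not the only one: it builds a scratch directory with made-up filenames, then uses find(1) and wc(1) to pick out files by line count, link type and modification time. The -printf action is specific to GNU find.

```shell
# Scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
seq 1 1500 > "$workdir/long.txt"          # a text file with 1,500 lines
printf 'one\ntwo\n' > "$workdir/short.txt"
ln -s long.txt "$workdir/link-to-long"    # a symbolic link

# Regular files with more than 1,000 lines: wc -l counts the lines in each.
long_files=$(
  find "$workdir" -type f |
  while IFS= read -r f
  do
    lines=$(wc -l < "$f")
    if [ "$lines" -gt 1000 ]
    then
      basename "$f"
    fi
  done
)
echo "more than 1,000 lines: $long_files"

# Symbolic links only.
links=$(find "$workdir" -type l -exec basename {} \;)
echo "symbolic links: $links"

# Regular files modified within the last day, with their length in bytes
# (GNU find's -printf: %s is the size, %f is the name without directories).
find "$workdir" -type f -mtime -1 -printf '%s bytes  %f\n'

rm -r "$workdir"
```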

We’ll be searching tar archives. What’s in a tarball? It’s a series of entries, each made up of a file’s metadata followed by that file’s content. We want to find string(s) somewhere in the content of one of those files. A quick-and-dirty technique is to search the entire tarball for the string you’re looking for, filtering the search results to keep non-text characters from messing up the screen. (You may not need tar unless you’re extracting a file from the archive.) Let’s start with that:

$ cd tarballs
$ for file in *1996* *usenix*
> do
>   zcat "$file" |
>   grep -i -H --label="$file" 'pattern'
> done | cat -v
Binary file ora_1996-04-15.tar.gz matches
Binary file usenix_1999.tar.gz matches
  • Wildcarded strings like *1996* *usenix* match all filenames in the directory that include 1996 or usenix.

    If that list might contain duplicates, you could either use a more specific wildcard pattern or start the loop this way:

    for file in $(/bin/ls -d1 *1996* *usenix* | uniq)

    • /bin/ls -d1 (that’s a digit 1) lists the matching filenames, one per line, in sorted order. Using /bin/ls bypasses any alias you might have for ls. The -d option tells ls to list directory names instead of their contents.
    • The uniq utility removes duplicate entries from a sorted list.
  • In the loop, zcat opens each file.
  • The uncompressed tarball is filtered through grep, which does a case-insensitive (-i) search for pattern.
  • Because grep is reading from the pipe, it doesn’t see the tarball’s filename. Adding --label="$file" makes grep output the filename, expanded by the shell from $file. (The --label option seems to also require -H… on grep version 2.5.1, at least.)
  • The loop’s output (actually, the standard output of all of the grep processes in the pipe) is piped to cat -v. This makes sure that your screen won’t turn into mush if the search matches a line containing non-textual data — such as a filename, surrounded by control characters, embedded in a file’s metadata.

    The cat -v trick is a good one. It actually wasn’t needed here, though, because grep decided that the tarballs were “binary” files — that is, the first few bytes were non-textual — so it output “Binary file file matches”. Adding the option --binary-files=text tells grep to show the matching lines anyway. We’ll try that next.
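
Here is a self-contained sketch of the same loop with --binary-files=text added, run against a throwaway tarball so you can try it safely. The archive name, the directory layout and the sample text are all hypothetical. The name deliberately contains both 1996 and usenix, so both wildcards match it and the ls | uniq variant of the loop has a duplicate to remove.

```shell
# Build a throwaway tarball in a scratch directory (all names are made up).
workdir=$(mktemp -d)
mkdir "$workdir/project"
printf 'first line\nThe Pattern appears here\nlast line\n' > "$workdir/project/notes.txt"
tar -C "$workdir" -czf "$workdir/usenix_1996-01-22.tar.gz" project
rm -r "$workdir/project"

# Run the search in a subshell so the caller's working directory is unchanged.
results=$(
  cd "$workdir" || exit 1
  # Both wildcards match the same name; uniq drops the repeated entry.
  for file in $(ls -d1 *1996* *usenix* | uniq)
  do
    zcat "$file" |
    grep -i -H --label="$file" --binary-files=text 'pattern'
  done | cat -v
)
printf '%s\n' "$results"

rm -r "$workdir"
```

With the option in place, grep prints each matching line, prefixed with the label, instead of the one-line “Binary file” notice. If a match lands on a line that includes part of a tar header, cat -v shows the control characters in a visible form such as ^@.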
