Filenames by Design, Part Two

Continuing our series on how to take full advantage of your filesystem with tips and tricks for the newbie and old pro alike.

The first article in this series showed ways to use the names of files and directories as a simple database, to organize collections of data and find them quickly from a GUI menu, a program, or the command line. Although you don’t need to read that article to understand this one, it’s a good idea to review the four points in the filesystem introduction, especially the point about pathnames.

This month we’ll look at ways to build those database-like systems. Some of these techniques are so specialized that they’re intended more as examples (an idea of what’s possible) than as solutions to everyday problems.

If you won’t be using the filesystem as a database, you might still be interested in the ways we stretch the shells and utilities. You’ll see curly-brace and arithmetic evaluation operators, along with printf(1), to build whole directory trees with just a few commands. We’ll also use the almost-forgotten (but useful!) bc and jot utilities.

Order, order…

When you’re planning a new project, or if you need to organize a mess of files, you’ll probably sit down and think of a logical system. The principles in the previous column can help. For instance, sorting filenames with shell wildcards may be easier if names have the same length. (So, if the names have digits, add leading zeroes as needed to give numbers the same number of digits in all filenames.)
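
Here's a quick sketch of why equal-length names matter. It assumes a scratch directory made with mktemp, and the names part1, part2, and part10 are made up for illustration:

```shell
# Work in a scratch directory so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Unpadded numbers: wildcards sort character by character, so 10 lands before 2.
touch part1 part2 part10
unpadded=$(echo part*)

# Zero-padded numbers of equal width sort the way you would count them.
rm part*
for n in 1 2 10
do
  touch "part$(printf '%02d' "$n")"
done
padded=$(echo part*)

echo "unpadded: $unpadded"
echo "padded:   $padded"
```

The first echo lists part10 ahead of part2; the second lists part01, part02, part10 in counting order.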

An obvious way to make a series of directories is with a loop in the shell (or another scripting language). Build the directory names from one or more variables. For instance, to make directories named foo1 through foo9, bar1 through bar9, and so on, try the nested for loops below. (You can type these directly at a shell prompt, by the way; you don’t need to make a script file. And indentation isn’t required.)

for name in foo bar baz ...
do
  for num in 1 2 3 4 5 6 7 8 9
  do
    mkdir "$name$num"
  done
done

Most shells have curly-brace operators that create a series of space-separated words. (The bash manpage explains these in the section “Brace Expansion”.) Braces are a great time-saver when you’re creating a series of names. You could write the previous example as:

mkdir {foo,bar,baz}{1,2,3,4,5,6,7,8,9}

Q: Wouldn’t the shell wildcard operator [1-9] be simpler in the previous example than the curly-brace operators {1,2,3,4,5,6,7,8,9}?

A: Wildcard operators only match existing filenames. The curly-brace operators create strings, not filenames — so you can use them to create filenames that don’t exist yet. (The shell doesn’t “know” it’s creating directories; it simply outputs space-separated names that are passed to mkdir.)
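
You can watch the difference in a small experiment. This sketch assumes bash with its default settings (nullglob and failglob off) and an empty scratch directory; it uses only foo1 through foo3 to keep the output short:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Nothing exists yet, so the wildcard matches nothing and bash passes it through unchanged.
wild_before=$(echo foo[1-3])

# Braces build the strings whether or not such files exist.
braces=$(echo foo{1,2,3})

# Create the directories, then try the wildcard again: now it has names to match.
mkdir foo{1,2,3}
wild_after=$(echo foo[1-3])

echo "wildcard before mkdir: $wild_before"
echo "braces:                $braces"
echo "wildcard after mkdir:  $wild_after"
```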

Multiple levels

You can build trees (more than one level) using the mkdir option -p. It creates intermediate directory levels as needed. For instance, here’s one way to build all of the structure in last month’s example Figure One:

for top in archive browsing current
do
  let i=0
  while [[ i -le 99 ]]
  do
    printf -v level2 '%02d' "$i"
    mkdir -p $top/$level2/{0,1,2,3,4,5,6,7,8,9}00
    let i++
  done
done

The bash builtin printf, with its -v option, stores a two-character directory name in $level2. (%02d adds a leading zero if $i is less than 10.) This makes all directory names the same length, which, as mentioned earlier, is important for sorting. So, when $top is archive and $i is 0, mkdir creates archive/00/000, archive/00/100, and so on, through archive/00/900.
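
The let command, [[ ]], printf -v, and the braces are not available in every Bourne-type shell. Here's a sketch of the same idea using only POSIX features, under a couple of assumptions of ours: it runs in a scratch directory, and the variable last (not part of the loop above) is set to 11 so the demo finishes quickly. Set last=99 for the full tree.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

last=11   # demo size (our assumption); use 99 to build the whole structure

for top in archive browsing current
do
  i=0
  while [ "$i" -le "$last" ]
  do
    # Command substitution captures printf's output, so -v isn't needed.
    level2=$(printf '%02d' "$i")
    for hundreds in 0 1 2 3 4 5 6 7 8 9
    do
      mkdir -p "$top/$level2/${hundreds}00"
    done
    i=$((i + 1))   # POSIX arithmetic expansion in place of let i++
  done
done
```

This version is slower than the brace version because it runs mkdir once per directory, but it should behave the same in any POSIX shell.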

You could actually do the same thing by replacing the inner loop with curly-brace operators, as the next example shows.

Tip: Copy the brace pattern {0,1,2,3,4,5,6,7,8,9} with your mouse or your editor, then paste it as many times as needed.

for top in archive browsing current
do mkdir -p $top/{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}/{0,1,2,3,4,5,6,7,8,9}00
done

To see what mkdir commands the shell will run, add echo before mkdir -p. If you’re feeling really maniacal ;-), you don’t need the loop at all. Just replace $top with {archive,browsing,current}. You’ll create the whole tree with a single mkdir. (Don’t carry this “too far,” though. On some systems, a very long list of names will make the command fail with an error like “Argument list too long”.)
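
Before running a giant single-command version, it can help to see how many names the braces will produce. This bash sketch stores the expanded words with set -- instead of running mkdir; the variable names count, first, and final are ours:

```shell
# set -- stores the expanded words as positional parameters; nothing is created.
set -- {archive,browsing,current}/{0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}/{0,1,2,3,4,5,6,7,8,9}00

count=$#
first=$1
# Walk the list to pick up the last word.
for final in "$@"; do :; done

echo "mkdir would receive $count arguments"
echo "first: $first"
echo "last:  $final"

# The system's limit, in bytes, on the combined size of arguments and environment.
getconf ARG_MAX
```

Comparing the size of the list with the ARG_MAX figure gives a rough idea of how much headroom you have on your system.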

The jot utility is great for creating series of numbers, names, and much more. It’s often used to generate a list of for-loop parameters. For example, to make empty files named aa/data00.html through aa/data62.html, bb/data00.html through bb/data62.html, …, up to zz/data00.html through zz/data62.html, try:

for d in $(jot -w '%c' 26 a)
do
  mkdir $d$d
  touch $d$d/data{0,1,2,3,4,5}{0,1,2,3,4,5,6,7,8,9}.html
  touch $d$d/data6{0,1,2}.html
done

(We used two separate touch commands to help the example fit the page. You could use just one.) jot outputs 26 letters a through z. The argument to its -w option uses printf-style formatting. The 26 specifies the number of repetitions and the a is the starting value; jot increments by default. Instead of jot, you could use curly-brace operators {aa,bb,cc,...,zz}.
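
Not every system has jot installed. If yours doesn't, here's one possible substitute for the letter list, a sketch that leans on printf's octal escapes and assumes an ASCII-compatible character set. It runs in a scratch directory and only makes the directories:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

code=97                      # character code for "a"
while [ "$code" -le 122 ]    # 122 is "z"
do
  # The inner printf turns the code into three octal digits (97 becomes 141);
  # the outer printf interprets \141 as the character "a".
  d=$(printf "\\$(printf '%03o' "$code")")
  mkdir "$d$d"
  code=$((code + 1))
done

ls
```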

Finally, here's a way to add default text to each of those files -- so you could later edit them with an HTML editor (or do a global edit with a script). Make a template file; let's call it template.html:

<html>
<head>
<title>Data File NNN</title>
</head>
<body>
<h1>Data File NNN</h1>
<p>Data set NNN will be added soon.</p>
</body>
</html>

Then use sed to read that file and output it, with the current file number in place of each NNN string, into each new HTML file:

for d in $(jot -w '%c' 26 a)
do
  mkdir $d$d
  for n in {0,1,2,3,4,5}{0,1,2,3,4,5,6,7,8,9} 6{0,1,2}
  do
    sed "s/NNN/$n/" template.html > $d$d/data$n.html
  done
done
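
A note on the sed command: without the g flag, s replaces only the first match on each line. That is enough for template.html, where no line has more than one NNN. If your own template puts the marker twice on a line, add g. This sketch compares the two forms on a made-up one-line template in a scratch directory:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A hypothetical template line with NNN twice.
echo '<p>Data set NNN (file dataNNN.html)</p>' > template.html

n=07
first_only=$(sed "s/NNN/$n/" template.html)    # first NNN on the line only
every_one=$(sed "s/NNN/$n/g" template.html)    # g: every NNN on the line

echo "$first_only"
echo "$every_one"
```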

Renaming existing files

Here's an example for Bourne-type shells like bash. You have a directory full of files with random names. You'd like to add a three-digit prefix to each filename so they'll sort in a predefined order. (Of course, if ls and shell wildcards already output the filenames in the order you want, you don't need techniques like this.)

  1. Save the filenames into a temporary file, then edit the file and put the names into the order you'd like:

    $ ls > /tmp/files
    $ vi !$
    vi /tmp/files

    The history operator !$ gets the filename from the previous command line. We're using vi, but any plain-text editor will do. If you're on a shared system, you might want a more secure location than /tmp.

  2. Read the filenames from the ordered list in /tmp/files and generate mv commands that rename them with a three-digit prefix and an underscore (_). (This scheme is arbitrary, of course. Use whatever will work best for you.) We'll use a shell loop, but you could use any scripting language. The loop doesn't rename anything itself; it writes the mv commands into /tmp/renamer so you can check them first. Here printf is making three-character zero-padded prefixes:

    $ i=0
    $ while read -r oldfile
    > do
    >   printf -v prefix '%03d' $((i++))
    >   echo "mv -i '$oldfile' '${prefix}_$oldfile'"
    > done </tmp/files >/tmp/renamer

    This uses the bash operator $((i++)) to increment $i after getting its value. If anything else isn't familiar, there are details in the first part of Details of the file-renaming loops.

  3. Here's another way. It uses jot (mentioned earlier in this article) to generate just enough hexadecimal prefixes, just long enough, for all of the filenames in /tmp/files.

    file_count=$(wc -l < /tmp/files)
    max_hex=$(echo -e "obase=16\n${file_count}-1" | bc -q)
    prefix_width=${#max_hex}
    for prefix in $(jot -w "%0${prefix_width}x" "$file_count" 0)
    do
      read -r oldfile
      echo "mv -i '$oldfile' '${prefix}_$oldfile'"
    done </tmp/files >/tmp/renamer

    For instance, if there are 255 files, the prefixes will need to have two hex digits. The first file will be renamed to 00_somefile and the last will be fe_somefile (fe hex is 254 decimal).

    There's a lot of shell hackery here. (If it seems too obscure, you can always use another language...) You'll find details in the second half of Details of the file-renaming loops.

  4. When you're happy with /tmp/renamer, run it with a command like this:

    $ sh -ve /tmp/renamer
    mv -i 'foo' '000_foo'
    ...
    mv -i 'a file' '124_a file'
    $

    The sh option -v shows each command line before running it, which makes it easy to keep track of where you are and to know which command produced any error message. The -e option makes the shell exit immediately if any of the commands returns a non-zero status -- for instance, if one of the files doesn't exist.

  5. Once you're more confident, or for short jobs, you can omit the script file (here, /tmp/renamer) and have the loop run mv commands directly. (The command-line editing features in most shells make this easy to do -- and also make it easy to modify and re-use previous loops.) Ending the mv command line with || break will terminate the while loop if an mv command returns non-zero status -- as the -e option did for the script file. That's a good safety measure.

    Also, in this case, don't use the mv option -i because, if mv prompts you "overwrite somefile?" you won't be able to answer since stdin has been redirected from /tmp/files. (This "shouldn't be a problem" after you've gotten some experience. The -i "ask me first" options to mv, cp, and rm are for wimps, anyway. :)
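
Step 5 above describes the direct version without showing it. Here's a minimal sketch of what that loop might look like. The scratch directory, the sample filenames, and the $list variable are stand-ins of ours for your real directory and /tmp/files:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Hypothetical files, plus an ordered list kept outside the directory being renamed.
touch zebra 'a file' mango
list=$(mktemp)
printf '%s\n' mango zebra 'a file' > "$list"

i=0
while IFS= read -r oldfile
do
  prefix=$(printf '%03d' "$i")
  i=$((i + 1))
  # No -i here: stdin is the list, so mv could not read an answer.
  # "|| break" stops the loop at the first rename that fails.
  mv "$oldfile" "${prefix}_$oldfile" || break
done < "$list"

ls
```

Setting IFS to empty for read keeps any leading or trailing spaces in a filename intact.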

Making complex filenames

The end of the first article in this series showed complex filenames like 0012345_04_2568x3915_q75.jpg (which holds photo 12345, version 4, tells that the photo is 2568x3915 pixels and was saved at 75% quality as a JPEG file). As we'll see next time, complex filenames can help you find or identify files quickly without reading the file (or another database) for often-needed meta-information.

That complex filename came from a nawk(1) script that reads a directory full of digital camera files with names like DSC_0001.JPG and generates mv commands -- basically like this:

photo_info=.../photo_info.nawk
for oldfile in DSC_????.JPG
do
  # ... set $basenum
  nawk_out=$(nawk -f "$photo_info" "$oldfile")
  mv -i "$oldfile" "${basenum}_01_${nawk_out}.jpg"
done

The nawk script parses the output of the ImageMagick identify utility to get meta-information from $oldfile.
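
The nawk script itself isn't shown here, but assembling the final name from its pieces is simple with printf. This sketch uses a hypothetical helper, photo_name, that we made up for illustration; in a real script the width, height, and quality would come from identify rather than being typed in:

```shell
# Hypothetical helper: builds a name like 0012345_04_2568x3915_q75.jpg from its parts.
# Arguments: photo number, version, width, height, JPEG quality.
photo_name() {
  printf '%07d_%02d_%sx%s_q%s.jpg\n' "$1" "$2" "$3" "$4" "$5"
}

newname=$(photo_name 12345 4 2568 3915 75)
echo "$newname"
```

One caveat: in most shells, printf's %d reads a number with a leading 0 as octal, so digits taken from a name like DSC_0008.JPG need their leading zeros stripped before they're passed to a helper like this.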

Here's one more idea: use the line-numbering utility nl(1) to add a four-digit number before each filename from /tmp/files. Use a shell while loop with a read command to read the number into $num and the filename into $name. The mv option -v shows what's happening:

nl -n rz -w 4 < /tmp/files |
while read -r num name
do
  mv -v "$name" "${num}_${name}"
done
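
As with the earlier loops, a dry run is a good idea before renaming anything. This sketch prints the mv commands instead of running them; the three sample names and the $list variable are stand-ins of ours for a real /tmp/files:

```shell
list=$(mktemp)
printf '%s\n' mango zebra 'a file' > "$list"

# printf stands in for mv here, so this only shows what would happen.
preview=$(
  nl -n rz -w 4 < "$list" |
  while read -r num name
  do
    printf "mv '%s' '%s'\n" "$name" "${num}_${name}"
  done
)
echo "$preview"
```

Note that nl starts counting at 1, so the first file gets 0001, not 0000 as in the earlier loops. By default nl also skips empty lines when numbering.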

Wrapping up

One last note: Linux extended file attributes can store extra data about files -- and help you avoid overly-long filenames. However, not all utilities (or GUI applications!) support them. The Z shell does, though; see Extended File Attributes and ZSH for details.

To summarize: This article shows some techniques to create systems of files and directories with meta-organization to help you find the data you want quickly. Of course, there are a lot of ways to organize data; these are just a few examples. Although most examples use the shell, you may pick another scripting language to do the job better.

The third column in this series will show ways to use shells and utilities to find the data you want. Once you've set up a system, though, you don't have to access it with a shell or a utility. For instance, the directories and files can be opened from a GUI menu (on an application like the GIMP photo editor, for instance).

Please note: For an extended discussion of the file-renaming loops noted above, click "Next" below.

Comments on "Filenames by Design, Part Two"

maximd

Hello, this is a great article and in an attempt to share more knowledge about command line tips, here’s this one:

With Bash (this tip does not work with Csh for exemple), instead of using {1,2,3,4,5,6,7,8,9} you can simply type {1..9}.

Give it a try in a Bash shell with:

echo {1..9}

frases

Great article, Jerry. Learned some new tricks!

Maximd,
Interesting..your shortcut didn’t work in my Bash shell on my MacBook:
MACLT:~/Documents$ echo $SHELL
/bin/bash
MACLT:~/Documents$ echo {1,9}{a,b,c}
1a 1b 1c 9a 9b 9c
MACLT:~/Documents$ echo {1..9}
{1..9}

scott

wirawan0

Commenting on maximd’s comment:

> With Bash (this tip does not work with Csh for exemple), instead of using {1,2,3,4,5,6,7,8,9} you can simply type {1..9}.

This is correct, but it only works for Bash 3.0 and above.

jp

Thanks, maximd and wirawan0. I need to take a closer look at bash 3.

I’m guessing that bash got the {1..9} expansion from the Z Shell, which has had it for quite a while. Here’s some info from the zshexpn(1) manpage:

An expression of the form '{n1..n2}', where n1 and n2 are integers, is expanded to every number between n1 and n2 inclusive. If either number begins with a zero, all the resulting numbers will be padded with leading zeroes to that minimum width. If the numbers are in decreasing order the resulting sequence will also be in decreasing order.

If a brace expression matches none of the above forms, it is left unchanged, unless the BRACE_CCL option is set. In that case, it is expanded to a sorted list of the individual characters between the braces, in the manner of a search set. '-' is treated specially as in a search set, but '^' or '!' as the first character is treated normally.

Jerry

akton

The seq command is also nice for generating sequences. Instead of typing {1,2,3,4,5,6,7,8,9}, you can type $(seq 1 9)
