Filenames by Design, Part Two

Continuing our series on how to take full advantage of your filesystem with tips and tricks for the newbie and old pro alike.

Extended Sidebar: Details of the File-Renaming Loops

Earlier in the article I showed two shell loops that rename an ordered list of files (read from /tmp/files). Each loop builds a series of shell commands to rename the files with a prefix of either three decimal digits or a variable number of hexadecimal digits. The prefix ensures file sorting order for shell wildcards and ls(1). Each loop writes a shell script to /tmp/renamer that renames the files.

This “sidebar” page has more information about those two loops.

First loop: numeric prefixes


$ let i=0
$ while read -r oldfile
> do
>   printf -v prefix '%03d' $((i++))
>   echo "mv -i '$oldfile' '${prefix}_$oldfile'"
> done </tmp/files >/tmp/renamer

Some of the techniques above might not be familiar. Here’s a rundown:

  • The > prompt is a Bourne shell secondary prompt. The shell is waiting for you to complete something it’s missing — in this case, the rest of the while statement. If your shell is set up to edit multi-line input, you should be able to edit any line in the loop before you press ENTER on the done line. (If you aren’t comfortable typing multiple lines at a shell prompt, you can make a throw-away script file instead and run it with bash scriptname.)
  • The variable i stores the file prefix number. (Note that with a redirected-I/O loop like this one, some shells will increment variables within the loop but not pass any changes out of the loop. That is, after the loop runs, $i may still be 0.)
  • The while loop reads existing filenames one by one from /tmp/files and stores each name in the variable oldfile.
    (Notice the redirection at the end of the loop: the whole loop reads from /tmp/files and writes to /tmp/renamer.) The -r option keeps read from changing the input text by treating backslashes as escape characters; -r isn’t needed on some versions of read.
  • The bash operator $((i++)) does the arithmetic operation between the double parentheses and returns the result. Here we’re using the postfix operator ++ to increment $i after passing its value to printf. The formatting specification %03d prints this value in a 3-character field, left-padded with one or two zeroes if needed.

    We’re using a version of printf that’s built into bash. From other shells, you can use the external version of printf(1):


      prefix=`printf '%03d' "$i"`


    If you don’t have printf and/or the $((i++)) operator, you can use a case statement and expr. This also lets you be sure $i isn’t empty and hasn’t overflowed:


      case "$i" in
      ?) prefix=00$i ;;
      ??) prefix=0$i ;;
      ???) prefix=$i ;;
      *) echo "case: bad length: '$i'" 1>&2; break ;;
      esac
      i=`expr "$i" + 1`

  • The echo outputs an mv command line to rename $oldfile to a new name starting with $prefix and an underscore. These command lines are collected from echo’s standard output into the file /tmp/renamer.

    Notice the quoting: outer double quotes (") allow variable substitution between them — but suppress the special meaning of single quotes ('). So $oldfile and ${prefix} are expanded as part of two single-quoted arguments. Those single quotes protect the filenames stored in /tmp/renamer from any interpretation by the shell that reads them. (You can see the result below, at the less command.)
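The caution in the second bullet, that a loop with redirected I/O may not pass variable changes back out, is easy to check in whatever shell you use. Here is a minimal sketch. The function name count_in_loop is made up for this example, and a here-document stands in for /tmp/files.

```shell
# Hypothetical helper: count lines in a loop whose input is redirected,
# then report the counter's value *after* the loop has ended.
count_in_loop() {
  i=0
  while read -r line
  do
    i=$((i + 1))
  done <<'EOF'
foo
bar
data
EOF
  echo "$i"
}

count_in_loop
```

In bash this prints 3, so the counter survives the loop. If your shell prints 0 instead, it ran the loop in a subshell, and you shouldn't rely on $i after the done line.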

Here’s a look at the two files we made, /tmp/files and /tmp/renamer:


$ head -3 /tmp/files
foo
bar
data
$ less /tmp/renamer
mv -i 'foo' '000_foo'
mv -i 'bar' '001_bar'
mv -i 'data' '002_data'
...
mv -i 'a file' '124_a file'
mv -i 'How
many
people?' '125_How
many
people?'

Note that the single-quoting even protects filenames containing spaces, newlines, and wildcard characters like ?. (Yes, a multi-line filename is legal on Linux filesystems.) This keeps the shell from breaking the last two filenames into pieces and from interpreting the question mark (?) as a wildcard.
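If you'd like to see that protection at work before touching real files, a trial run in a throwaway directory is one option. The sketch below makes some assumptions: the function name try_renamer and the scratch filenames are invented for the example, mktemp -d is assumed to be available (it is on typical Linux and macOS systems), and the loop uses the command-substitution form of printf so that it also runs in shells other than bash.

```shell
# Hypothetical trial run in a throwaway directory, so no real files are touched.
try_renamer() {
  scratch=$(mktemp -d) || return 1
  (
    cd "$scratch" || exit 1
    touch 'foo' 'a file' 'what?'
    printf '%s\n' 'foo' 'a file' 'what?' > files.list

    # Same idea as the first loop, written with the external-printf form.
    i=0
    while read -r oldfile
    do
      prefix=$(printf '%03d' "$i")
      i=$((i + 1))
      printf '%s\n' "mv -i '$oldfile' '${prefix}_$oldfile'"
    done < files.list > renamer

    sh renamer               # run the generated mv commands
    rm files.list renamer    # leave only the renamed files
    printf '%s\n' *          # list them in wildcard (sorted) order
  )
  rm -rf "$scratch"
}

try_renamer
```

The three names come back as 000_foo, 001_a file and 002_what?, one per line, which shows that the space and the question mark passed through the generated script intact.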

(You might want to modify the script that builds /tmp/renamer, or simply edit /tmp/renamer by hand, to make a more “agreeable” new filename without spaces or special characters, for instance 125_How_many_people.)
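One possible way to build such a name inside the script is a small filter made from tr. This is only a sketch: the function name agreeable is hypothetical, and the choice of which characters to keep (ASCII letters, digits, dot, underscore and hyphen) is an assumption you may want to change.

```shell
# Hypothetical helper: turn an awkward filename into an "agreeable" one.
agreeable() {
  # 1. Turn spaces and newlines into underscores.
  # 2. Delete everything except ASCII letters, digits, dot, underscore, hyphen.
  printf '%s' "$1" |
    LC_ALL=C tr ' \n' '__' |
    LC_ALL=C tr -cd 'A-Za-z0-9._-'
}

agreeable 'a file'; echo
```

In the renaming loop, the new name could then be written as ${prefix}_$(agreeable "$oldfile"). Because every prefix is different, two files whose cleaned-up names happen to match still get different new names.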

Q: Our method of single-quoting the mv arguments fails for one particular character. Which one?

A: If a filename contains a single quote character ('), its mv command will have unmatched single quotes.

Writing shell commands to generate other shell commands that can deal with filenames containing any arbitrary mixture of single and double quotes (especially with backslashes before them) can be a challenge. A simple workaround is adding a test to warn you that you’ll need to edit /tmp/renamer by hand:


while read -r oldfile
do
  case "$oldfile" in
  *\'*) echo "WARNING: fix the unprotected ' in \"$oldfile\"." 1>&2 ;;
  esac
...


Or you could modify the script to handle filenames containing single quotes and double quotes. (We don’t shy away from the good, the bad, or the ugly here in the Power Tools column. :) It’s probably easier to use a different programming language than the shell, possibly for the whole job. But here’s a workaround. It uses sed, which doesn’t treat quotes specially. (If you have a simpler way using only shell code, tell me! Please be sure it handles filenames containing the two-character sequence \".)


while read -r oldfile
do
  printf -v prefix '%03d' $((i++))
  case "$oldfile" in
  *[\'\"]*)
    # Make mv -i "$oldfile" "${prefix}_$oldfile"
    echo -E "$oldfile" | sed \
    -e 's/\\/&&/g' \
    -e 's/"/\\"/g' \
    -e 's/.*/"&"/' \
    -e h \
    -e "s/^\"/\"${prefix}_/" \
    -e x -e G -e 's/\n/ /' \
    -e 's/^/mv -i /'
    ;;
  *)
    # Output single-quoted arguments:
    echo "mv -i '$oldfile' '${prefix}_$oldfile'"
    ;;
  esac
done </tmp/files >/tmp/renamer


A detailed explanation would take pages. Briefly, though:

  • If $oldfile contains a single or double quote, the sed script runs, reading $oldfile from its standard input. Otherwise, a simple echo outputs an mv command line as the original script did.
  • Each sed expression starts with its -e option.
  • We double each backslash (\) character, add a backslash before each double quote, add double quotes around the old filename (s/.*/"&"/), then copy the quoted filename into sed’s hold buffer (the h command).
  • Next we make the second argument for mv by adding ${prefix}_ to the start of the quoted filename (which is still in sed’s pattern buffer). Note the double quotes around this sed expression; they let the shell replace ${prefix} with the current prefix value before sed starts.
  • The next-to-last line has three commands: x swaps the hold and pattern buffers, G appends the hold buffer to the pattern buffer after a newline, and s/\n/ / replaces that newline with a space, which joins the two arguments on a single line.
  • The last command adds mv -i before the two arguments. By default, sed writes its pattern buffer to stdout after the last editing command. This adds the mv command to /tmp/renamer.
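A different approach, not the one used in the script above, is to stay with single quotes and rewrite each embedded single quote as the four characters '\'' (close the quoted string, add an escaped quote, reopen the string). Here is a minimal sketch, assuming one-line filenames; the function name sq_quote is invented for this example, and it still relies on sed.

```shell
# Hypothetical helper: print its argument as one single-quoted shell word.
# Each embedded ' becomes '\'' (close quote, escaped quote, reopen quote).
sq_quote() {
  printf '%s\n' "$1" |
    sed -e "s/'/'\\\\''/g" -e "s/^/'/" -e "s/\$/'/"
}

oldfile="Jerry's \"notes\""
prefix=007
printf '%s\n' "mv -i $(sq_quote "$oldfile") $(sq_quote "${prefix}_$oldfile")"
```

With this helper the case statement would no longer be needed, because every filename takes the same path. Nothing is special to the shell inside single quotes, so backslashes, double quotes, dollar signs and backquotes in a filename need no extra handling.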

Second loop: hexadecimal prefixes

file_count=$(wc -l < /tmp/files)
max_hex=$(echo -e "obase=16\n${file_count}-1" | bc -q)
prefix_width=${#max_hex}
for prefix in $(jot -w "%0${prefix_width}x" "$file_count" 0)
do
  read -r oldfile
  echo "mv -i '$oldfile' '${prefix}_$oldfile'"
done </tmp/files >/tmp/renamer

Here’s the rundown:

  • $file_count holds the number of filenames. (Using < means wc will only output the line count, not the filename /tmp/files.)
  • $max_hex is the largest hex value that $prefix will have. We get this by sending two commands to the standard input of bc(1), the calculator utility:

    1. obase=16 sets the output base to hexadecimal. (The default is decimal, base 10.)
    2. The second command is n-1, where n is the number of filenames. (The shell expands ${file_count} into n.) For example, if there are 256 filenames, bc will get the command 256-1. It will return FF, which means the prefixes will run from 00 (the minimum) to FF (255 decimal).

    Because bc(1) reads commands from its standard input, we’re sending the commands with echo; its -e option converts the \n escape sequence into a newline. The -q option makes bc “quiet” so it only outputs results.

  • $prefix_width uses the bash string-length operator ${#parameter} to get the number of characters in $max_hex. This tells us the number of hex characters needed to hold the largest prefix number.
  • Finally (whew!) we tell jot to generate hex numbers in the printf-like specification %0nx. For instance, if ${prefix_width} is 4, then the second argument passed to jot will be %04x.
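If jot or bc isn't installed on your system, the same width calculation can be sketched with printf alone. The function names hex_width and hex_prefixes are hypothetical, and the code assumes there is at least one filename.

```shell
# Hypothetical helper: how many hex digits are needed for prefixes 0 .. count-1?
hex_width() {
  max_hex=$(printf '%x' "$(( $1 - 1 ))")
  echo "${#max_hex}"
}

# Hypothetical helper: print "count" zero-padded hex prefixes, one per line.
hex_prefixes() {
  width=$(hex_width "$1")
  n=0
  while [ "$n" -lt "$1" ]
  do
    printf "%0${width}x\n" "$n"
    n=$((n + 1))
  done
}

hex_prefixes 3
```

In the second loop, $(hex_prefixes "$file_count") could stand in for the jot command substitution. Like jot with %x, this prints lowercase hex digits.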

Jerry Peek is a freelance writer and instructor who has used Unix and Linux for more than 25 years. He's happy to hear from readers; see http://www.jpeek.com/contact.html.

Comments on "Filenames by Design, Part Two"

maximd

Hello, this is a great article and in an attempt to share more knowledge about command line tips, here’s this one:

With Bash (this tip does not work with Csh, for example), instead of using {1,2,3,4,5,6,7,8,9} you can simply type {1..9}.

Give it a try in a Bash shell with:

echo {1..9}

frases

Great article, Jerry. Learned some new tricks!

Maximd,
Interesting..your shortcut didn’t work in my Bash shell on my MacBook:
MACLT:~/Documents$ echo $SHELL
/bin/bash
MACLT:~/Documents$ echo {1,9}{a,b,c}
1a 1b 1c 9a 9b 9c
MACLT:~/Documents$ echo {1..9}
{1..9}

scott

wirawan0

Commenting on maximd’s comment:

> With Bash (this tip does not work with Csh, for example), instead of using {1,2,3,4,5,6,7,8,9} you can simply type {1..9}.

This is correct, but it only works for Bash 3.0 and above.

jp

Thanks, maximd and wirawan0. I need to take a closer look at bash 3.

I’m guessing that bash got the {1..9} expansion from the Z Shell, which has had it for quite a while. Here’s some info from the zshexpn(1) manpage:

An expression of the form `{n1..n2}', where n1 and n2 are integers, is expanded to every number between n1 and n2 inclusive. If either number begins with a zero, all the resulting numbers will be padded with leading zeroes to that minimum width. If the numbers are in decreasing order the resulting sequence will also be in decreasing order.

If a brace expression matches none of the above forms, it is left unchanged, unless the BRACE_CCL option is set. In that case, it is expanded to a sorted list of the individual characters between the braces, in the manner of a search set. `-' is treated specially as in a search set, but `^' or `!' as the first character is treated normally.

Jerry

akton

The seq command is also nice to generate sequences. Instead of typing {1,2,3,4,5,6,7,8,9} , you can type $(seq 1 9)
