Filenames by Design, Part Two

Continuing our series on how to take full advantage of your filesystem with tips and tricks for the newbie and old pro alike.

Extended Sidebar: Details of the File-Renaming Loops

Earlier in the article I showed two shell loops that rename an ordered list of files (read from /tmp/files). Each loop builds a series of shell commands to rename the files with a prefix of either three decimal digits or a variable number of hexadecimal digits. The prefix ensures that shell wildcards and ls(1) sort the files in the order you chose. Each loop writes a shell script to /tmp/renamer that renames the files.

This “sidebar” page has more information about those two loops.

First loop: numeric prefixes


$ let i=0
$ while read -r oldfile
> do
>   printf -v prefix '%03d' $((i++))
>   echo "mv -i '$oldfile' '${prefix}_$oldfile'"
> done </tmp/files >/tmp/renamer

Some of the techniques above might not be familiar. Here’s a rundown:

  • The > prompt is a Bourne shell secondary prompt. The shell is waiting for you to complete something it’s missing — in this case, the rest of the while statement. If your shell is set up to edit multi-line input, you should be able to edit any line in the loop before you press ENTER on the done line. (If you aren’t comfortable typing multiple lines at a shell prompt, you can make a throw-away script file instead and run it with bash scriptname.)
  • The variable i stores the file prefix number. (Note that with a redirected-I/O loop like this one, some shells will increment variables within the loop but not pass any changes out of the loop. That is, after the loop runs, $i may still be 0.)
  • The while loop reads existing filenames one by one from /tmp/files and stores each name in the variable oldfile. (Notice the redirection at the end of the loop: it supplies the standard input and output for every command inside the loop.) The -r option keeps read from treating backslashes in the input text as escape characters; -r isn’t needed on some versions of read.
  • The bash operator $((i++)) does the arithmetic operation between the double parentheses and returns the result. Here we’re using the postfix operator ++ to increment $i after passing its value to printf. The formatting specification %03d prints this value in a 3-character field, left-padded with one or two zeroes if needed.

    We’re using a version of printf that’s built into bash. From other shells, you can use the external version of printf(1):


      prefix=`printf '%03d' "$i"`


    If you don’t have printf and/or the $((i++)) operator, you can use a case statement and expr. This also lets you be sure $i isn’t empty and hasn’t overflowed:


      case "$i" in
      ?) prefix=00$i ;;
      ??) prefix=0$i ;;
      ???) prefix=$i ;;
      *) echo "case: bad length: '$i'" 1>&2; break ;;
      esac
      i=`expr "$i" + 1`

  • The echo outputs an mv command line to rename $oldfile to a new name starting with $prefix and an underscore. These command lines are collected from echo’s standard output into the file /tmp/renamer.

    Notice the quoting: outer double quotes (") allow variable substitution between them — but suppress the special meaning of single quotes ('). So $oldfile and ${prefix} are expanded as part of two single-quoted arguments. Those single quotes protect the filenames stored in /tmp/renamer from any interpretation by the shell that reads them. (You can see the result below, at the less command.)
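To see the counter, the padding, and the quoting working together, here is a small sketch with made-up values (a counter of 7 and a filename containing a space). It uses the external printf and plain $((i + 1)) arithmetic, so it should run in any POSIX-style shell and not only in bash:

```shell
# Sketch: build one mv command line the way the loop does.
# The counter and filename below are made-up sample values.
i=7
oldfile='a file'

# Portable zero-padding: command substitution around printf(1).
prefix=$(printf '%03d' "$i")

# Portable increment (no ++ operator needed).
i=$((i + 1))

# Double quotes outside, single quotes inside: the variables expand,
# and the single quotes are written out literally.
line="mv -i '$oldfile' '${prefix}_$oldfile'"
echo "$line"
```

The echo prints mv -i 'a file' '007_a file', which is the form of each line in /tmp/renamer.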

Here’s a look at the two files we made, /tmp/files and /tmp/renamer:


$ head -3 /tmp/files
foo
bar
data
$ less /tmp/renamer
mv -i 'foo' '000_foo'
mv -i 'bar' '001_bar'
mv -i 'data' '002_data'
...
mv -i 'a file' '124_a file'
mv -i 'How
many
people?' '125_How
many
people?'

Note that the single-quoting even protects filenames containing spaces, newlines, and wildcard characters like ?. (Yes, a multi-line filename is legal on Linux filesystems.) This keeps the shell from breaking the last two filenames into pieces and from interpreting the question mark (?) as a wildcard.

(You might want to modify the script that builds /tmp/renamer, or simply edit /tmp/renamer by hand, to make a more “agreeable” new filename without spaces or special characters, for instance 125_How_many_people.)
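If you’d like to try the whole cycle without touching real files, here is a sketch that runs in a scratch directory. The directory from mktemp -d and the names files.list and renamer are stand-ins I’ve assumed for /tmp/files and /tmp/renamer; the loop itself is the portable variant of the first loop:

```shell
# Sketch: generate a renamer script in a scratch directory and run it.
# workdir, files.list and renamer are temporary stand-ins, not the
# article's real /tmp/files and /tmp/renamer.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Three sample files; two have names the shell would normally mangle.
touch 'foo' 'a file' 'who?'
printf '%s\n' 'foo' 'a file' 'who?' > files.list

i=0
while read -r oldfile
do
  prefix=$(printf '%03d' "$i")
  i=$((i + 1))
  echo "mv -i '$oldfile' '${prefix}_$oldfile'"
done < files.list > renamer

# Review the generated script first, then let sh run it.
cat renamer
sh renamer < /dev/null
ls
```

Reading the generated script with cat (or less) before running it is the point of the two-step design: nothing is renamed until you’ve seen every command.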

Q: Our method of single-quoting the mv arguments fails for one particular character. Which one?

A: If a filename contains a single quote character ('), its mv command will have unmatched single quotes.

Writing shell commands to generate other shell commands that can deal with filenames containing any arbitrary mixture of single and double quotes (especially with backslashes before them) can be a challenge. A simple workaround is adding a test to warn you that you’ll need to edit /tmp/renamer by hand:


while read -r oldfile
do
  case "$oldfile" in
  *\'*) echo "WARNING: fix the unprotected ' in \"$oldfile\"." 1>&2 ;;
  esac
...


Or you could modify the script to handle filenames containing single quotes and double quotes. (We don’t shy away from the good, the bad, or the ugly here in the Power Tools column. :) It’s probably easier to use a different programming language than the shell — possibly for the whole job.
But here’s a workaround. It uses sed, which doesn’t treat quotes specially. (If you have a simpler way using only shell code, tell me! Please be sure it handles filenames containing the two-character sequence \".)


while read -r oldfile
do
  printf -v prefix '%03d' $((i++))
  case "$oldfile" in
  *[\'\"]*)
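    # Make mv -i "$oldfile" "${prefix}_$oldfile"
    # The third expression also escapes $ and ` so the double quotes
    # in the generated command can't expand them.
    echo -E "$oldfile" | sed \
    -e 's/\\/&&/g' \
    -e 's/"/\\"/g' \
    -e 's/[$`]/\\&/g' \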
    -e 's/.*/"&"/' \
    -e h \
    -e "s/^\"/\"${prefix}_/" \
    -e x -e G -e 's/\n/ /' \
    -e 's/^/mv -i /'
    ;;
  *)
    # Output single-quoted arguments:
    echo "mv -i '$oldfile' '${prefix}_$oldfile'"
    ;;
  esac
done </tmp/files >/tmp/renamer


A detailed explanation would take pages. Briefly, though:

  • If $oldfile contains a single or double quote, the sed script runs, reading $oldfile from its standard input. Otherwise, a simple echo outputs an mv command line as the original script did.
  • Each sed expression starts with its -e option.
  • We double each backslash (\) character, add a backslash before each double quote, add double quotes around the old filename (s/.*/"&"/), then copy the quoted filename into sed’s hold buffer.
  • Next we make the second argument for mv by adding ${prefix}_ to the start of the quoted filename (which is still in sed’s pattern buffer). Note the double quotes around this sed expression; they let the shell replace ${prefix} with the current prefix value before sed starts.
  • The next-to-last line has three commands that swap the hold and pattern buffers, then join them onto a single line.
  • The last command adds mv -i before the two arguments. By default, sed writes its pattern buffer to stdout after the last editing command. This adds the mv command to /tmp/renamer.
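For comparison, here is one possible shell-only approach, offered as a sketch rather than a tested replacement for the sed version. It keeps the single-quoted style and rewrites each embedded single quote as the four characters '\'' (close the quote, add an escaped quote, reopen the quote). Inside single quotes the shell treats backslashes and double quotes literally, so the two-character sequence \" needs no special handling. The function name sq_quote and its sq_ variables are hypothetical names of my own:

```shell
# Sketch: quote any string for the shell by wrapping it in single quotes
# and rewriting each embedded ' as '\'' (close, escaped quote, reopen).
# sq_quote is a hypothetical helper name, not part of the original loops.
sq_quote() {
  sq_rest=$1
  sq_out=
  while :
  do
    case "$sq_rest" in
    *\'*)
      sq_head=${sq_rest%%\'*}      # text before the first '
      sq_rest=${sq_rest#*\'}       # text after the first '
      sq_out="$sq_out$sq_head'\\''"
      ;;
    *)
      sq_out="$sq_out$sq_rest"
      break
      ;;
    esac
  done
  printf "'%s'\n" "$sq_out"
}

# Sample values standing in for one pass through the loop.
oldfile="Bob's \"big\" file"
prefix=042
echo "mv -i $(sq_quote "$oldfile") $(sq_quote "${prefix}_$oldfile")"
```

With a helper like this, the case statement in the loop would no longer be needed: every filename could go through the same echo line.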

Second loop: hexadecimal prefixes

file_count=$(wc -l < /tmp/files)
max_hex=$(echo -e "obase=16\n${file_count}-1" | bc -q)
prefix_width=${#max_hex}
for prefix in $(jot -w "%0${prefix_width}x" "$file_count" 0)
do
  read -r oldfile
  echo "mv -i '$oldfile' '${prefix}_$oldfile'"
done </tmp/files >/tmp/renamer

Here’s the rundown:

  • $file_count holds the number of filenames. (Using < means wc will only output the line count, not the filename /tmp/files.)
  • $max_hex is the largest hex value that $prefix will have. We get this by sending two commands to the standard input of bc(1), the calculator utility:

    1. obase=16 sets the output base to hexadecimal. (The default is decimal, base 10.)
    2. The second command is n-1, where n is the number of filenames. (The shell expands ${file_count} into n.) For example, if there are 256 filenames, bc will get the command 256-1. It will return FF, which means the prefixes will run from 00 (the minimum) to FF (255 decimal).

    Because bc(1) reads commands from its standard input, we’re sending the commands with echo; its -e option converts the \n escape sequence into a newline. The -q option makes bc “quiet” so it only outputs results.

  • $prefix_width uses the bash string-length operator ${#parameter} to get the number of characters in $max_hex. This tells us the number of hex characters needed to hold the largest prefix number.
  • Finally (whew!) we tell jot to generate hex numbers in the printf-like specification %0nx. For instance, if ${prefix_width} is 4, then the second argument passed to jot will be %04x.
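Since bc and jot may not be installed on every system, here is a sketch of the same width calculation using only printf and shell arithmetic. The value 256 for file_count is a sample I’ve assumed; in the loop it would come from wc -l:

```shell
# Sketch: compute the hex prefix width with printf alone (no bc or jot).
# file_count is a sample value; in the loop it comes from wc -l.
file_count=256

# The largest prefix is file_count - 1, printed in hexadecimal.
max_hex=$(printf '%X' $((file_count - 1)))
prefix_width=${#max_hex}

# Print a few prefixes in the same %0Nx format that jot is given.
for n in 0 10 255
do
  printf "%0${prefix_width}x\n" "$n"
done
```

With 256 files, max_hex is FF and prefix_width is 2, so the three sample prefixes print as 00, 0a, and ff.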
