Wizard Boot Camp, Part 10: Utilities You Should Know

The time has come to leave Hogwarts, young wizard! We wrap up our ten-part series on becoming a command-line wizard with a look at more utilities you should know.

This third and final article about utility programs covers a few more that you should know about, along with some not-so-obvious ways to use them.

csplit, split

Some large files — including archives, multipart email messages, business information for a large set of customers, and data files with repeating patterns — may need to be split into smaller “chunks” for reorganization, easier storage or transport. That’s what csplit(1) is for. Give it a pattern, an offset, a repetition count, and/or a line number, and it will parse an input file into a series of smaller output files.

By default, csplit’s output files are named xx00, xx01, and so on. To rearrange a big file, you can simply cat those output files together in a different order. Here’s a simple example using the shell’s curly-brace operator:

$ csplit oldfile '/pattern/' '{4}'
  ...creates files xx00 - xx05...
$ cat xx0{5,0,2,1,3,4} > newfile
$ rm xx0[0-5]

A shell wildcard such as xx0[502134] wouldn’t work with cat here, because the shell expands wildcard matches in sorted (alphanumeric) order, regardless of the order inside the brackets; brace expansion keeps the order you typed. We’ll see more about csplit in a future column, part of a new series about handling text files.
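To make the split-and-reorder idea concrete, here is a minimal sketch that runs in a scratch directory. The input file, its contents, and the ^Chapter pattern are made up for illustration. A repeat count of {2} applies the pattern three times in all, so csplit writes four pieces: the text before the first match, plus one piece per match.

```shell
# Work in a scratch directory so the xx* pieces don't land anywhere important.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: a preface followed by three "chapters".
printf '%s\n' 'Preface' 'Chapter A' 'a1' 'Chapter B' 'b1' 'Chapter C' 'c1' > oldfile

# Split before each line matching ^Chapter: once, plus 2 repeats.
# -s suppresses the byte counts that csplit normally prints.
csplit -s oldfile '/^Chapter/' '{2}'

# xx00 = preface, xx01 = chapter A, xx02 = chapter B, xx03 = chapter C.
# Reassemble with the last chapter moved up to follow the preface.
cat xx00 xx03 xx01 xx02 > newfile
cat newfile
rm xx0[0-3]
```

The pieces are listed explicitly here so the sketch also works in shells that lack brace expansion; in bash, cat xx0{0,3,1,2} does the same thing.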

In these days of multi-gigabyte files and terabyte (or larger) filesystems, a file can still be “too big.” For instance, if you’re trying to send a huge file over a network connection that often times out, even the automatic-retry ability of some file transfer utilities won’t get all of the file through.

Splitting the file into smaller chunks, then reassembling the pieces on the other end, can save headaches and time. The split utility is great for that. It splits input — either files or stdin — into equal-sized files named (by default) xaa, xab, xac, and so on. (The last file may be smaller.) For instance, on the sending side, split the file into 1-Megabyte chunks:

$ ls -l kcpr.mp3
-rw-r--r-- ... 70535208 ... kcpr.mp3
$ split -b1m kcpr.mp3
$ ls -l x??
-rw-r--r-- ...  1048576 ... xaa
-rw-r--r-- ...  1048576 ... xab
  ...
-rw-r--r-- ...   280616 ... xcp

After you’ve transmitted all of the files, do a quick check on the receiving side to be sure each piece has the same size it had on the sending side. Then cat them together.

% ls -l x??
-rw-r--r-- ...  1048576 ... xaa
  ...
-rw-r--r-- ...   280616 ... xcp
% cat x?? > kcpr.mp3
% rm x??

Using a checksum program like md5sum on both the original file and the reconstructed version can give you more confidence.
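Here is a small round-trip sketch of that workflow, done on one machine for simplicity. The file names and the tiny 2,500-byte "big file" are stand-ins, and it assumes md5sum is installed (some systems ship md5 or another checksum tool instead). In a real transfer you would run the first checksum on the sending side, the second on the receiving side, and compare the two by eye.

```shell
# Scratch directory, so that x?? matches only the pieces made here.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a large file: 2,500 random bytes.
head -c 2500 /dev/urandom > original.bin

# Sending side: record a checksum, then split into 1,000-byte pieces.
sum_before=$(md5sum < original.bin)
split -b 1000 original.bin
ls -l x??            # xaa and xab hold 1,000 bytes each; xac holds the last 500

# Receiving side: reassemble the pieces and checksum the result.
cat x?? > rebuilt.bin
sum_after=$(md5sum < rebuilt.bin)

if [ "$sum_before" = "$sum_after" ]; then
    echo "checksums match"
else
    echo "checksums differ; retransmit the pieces" >&2
fi
```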


file

In general, Linux doesn’t require filename extensions such as .exe or .txt to know what to do with a file. Executable files (files whose execute bit was set by, for instance, chmod) start with a magic number, a short byte sequence that identifies the file’s format. The best-known magic number is probably the two-byte #!, which lets you specify the file’s interpreter (/bin/sh, /usr/bin/perl, etc.).

The file utility guesses what type of data a file holds, and it recognizes many types of files. That includes unidentified or mis-identified files you receive attached to an email message. For example:

$ file mystery-file.dat
mystery-file.dat: PDF document, version 1.3
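You can peek at a magic number yourself. The sketch below creates two throwaway files with no helpful extension (both names are invented for this example), reads the first two bytes of the script, and then asks file for its guess if file is installed. The exact wording of file’s answers varies between versions, so none is shown.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files without meaningful extensions.
printf '#!/bin/sh\necho hello\n' > demo-script
printf '%%PDF-1.3\n' > demo-data

# The first two bytes of the script are its magic number.
magic=$(head -c 2 demo-script)
echo "$magic"

# od -c shows the leading bytes of the other file one character at a time.
head -c 5 demo-data | od -c

# Let file guess both types, if it is available on this system.
if command -v file >/dev/null 2>&1; then
    file demo-script demo-data
fi
```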

locate, updatedb

If your system runs the updatedb utility (from cron or otherwise), you’ll have a database on your system that lets you find files or directories by name much faster than a command like find / -name .... The locate utility searches that database. For instance, to find all files and directories whose name includes examples:

$ locate examples

The locate manpage has details — including how to use wildcard characters to restrict matching. But some uses aren’t quite so obvious. If you think about the syntax of the pathnames you want, you can often use string-matching to find them. To locate any file in a directory named examples, for instance, use a search pattern that matches a directory name in a pathname:

$ locate /examples/

To do more sophisticated pattern-matching, you can filter locate’s output through a tool like grep. You can even dump the entire database:

$ locate / | grep ...

To search the contents of files or directories with a certain name, pipe the list of pathnames to xargs grep. For instance, to search all files with a name including foo for text containing bar, try this:

$ locate -0 foo | xargs -0 grep -Hs bar | cat -v
/usr/local/bin/ascript:echo "back at the bar..."
Binary file /usr/share/emacs/21.4/lisp/mail/footnote.elc matches

We’re using the -0 (zero) option for both locate and xargs; it separates pathnames with NUL characters, which avoids problems caused by “special” characters in filenames. The grep option -H tells grep to always output a filename (even if xargs happens to pass only one filename to grep). And the grep option -s keeps grep silent about arguments that are directories or unreadable files. The cat -v avoids sending “unprintable” characters to your terminal.
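To see why the NUL separators matter, here is a self-contained sketch. Because a locate database may not exist (or may be stale) on a test system, it swaps in find -print0 as the source of NUL-separated pathnames; the xargs -0 and grep half of the pipeline is the same. The file names are invented, and one deliberately contains a space.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files with "foo" in their names; only one mentions "bar".
printf 'back at the bar\n' > 'foo notes.txt'
printf 'nothing to see here\n' > foo-other.txt

# find stands in for locate here: -print0 ends each pathname with a NUL,
# so xargs -0 hands "foo notes.txt" to grep as a single argument.
matches=$(find . -type f -name '*foo*' -print0 | xargs -0 grep -Hs bar)
echo "$matches"
```

Without -print0 and -0, xargs would split the first name at the space and grep would look for two files, ./foo and notes.txt, that don’t exist.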

You may want to run updatedb multiple times to make more than one locate database: one for all users, one for system files, one for each user’s home directory (readable only by that user), and so on. In that case, you and other users may want to set the LOCATE_PATH environment variable to tell locate which databases to search:

$ grep LOCATE_PATH /etc/profile
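As a rough sketch of such a setting: LOCATE_PATH is a colon-separated list of database files, much like PATH. Both database locations below are hypothetical, so substitute wherever your own updatedb runs write their output, and check your locate manpage to confirm that your version reads this variable.

```shell
# Hypothetical database locations; adjust to match your updatedb runs.
system_db=/var/lib/locate/system.db
home_db="$HOME/.locate.db"

# Join them with a colon, then export so that locate can see the setting.
LOCATE_PATH="$system_db:$home_db"
export LOCATE_PATH
echo "$LOCATE_PATH"
```

A line like the last assignment could go in /etc/profile for all users, or in a user’s own shell startup file for a private database.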


look

Everyone knows about the grep utilities. Less well-known is look. It searches the first characters on each line of data, like a grep search starting with the anchor character ^ (caret). look defaults to a linear (sequential) search, or uses a binary search with its -b option. A binary search rapidly searches a sorted data file, even a very large one.

By default, look searches the system word list. That’s handy for checking spelling. If there are words that start with the argument you type, you’ll see them:

$ time look gas

real    0m0.088s
user    0m0.065s
sys     0m0.009s
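You can also point look at your own sorted file by naming it after the search string. The sketch below builds a tiny word list (invented for this example) and runs the equivalent anchored grep, then runs look itself if it is installed. Note that vegas contains gas but doesn’t start with it, so a prefix search skips it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny, hypothetical word list, sorted in plain byte order.
printf '%s\n' vegas glass gasoline gas gasket | LC_ALL=C sort > words

# The grep equivalent of a look search: anchor the string at line start.
prefix_matches=$(grep '^gas' words)
echo "$prefix_matches"

# If look is installed, it should list the same words.
if command -v look >/dev/null 2>&1; then
    look gas words
fi
```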
