What’s GNU in Old Utilities, Part Five: sort
For decades, sort has been extended over and over again to make it more and more useful. Here’s the fifth in an ongoing series about new features in familiar utilities.
Wednesday, February 15th, 2006
This month, in the fifth article of a series on new features added to utilities by GNU programmers and others, let’s look at sort. What’s new? There are new ways to sort that aren’t just lexicographic. For example, you can sort on month abbreviations (jan, jun) and on general numeric formats, including floating-point and exponential notation. You can also handle NUL-terminated records, which is great for sorting filenames to pass to xargs. (All examples are based on GNU sort version 5.2.1 from the Debian stable distribution.)
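Here is a quick sketch of those three non-lexicographic modes: -M for month names, -g for general numbers, and -z for NUL-terminated records. It assumes GNU sort and forces the C locale so the results are predictable. The input values are arbitrary, chosen only for illustration.

```shell
#!/bin/sh
# Assumption: GNU sort. Use the C locale so month names and byte
# collation don't depend on the system's locale settings.
LC_ALL=C
export LC_ALL

# -M: compare leading month abbreviations (JAN < FEB < ... < DEC),
# ignoring case; a key that isn't a month sorts before JAN.
printf 'jun\nfeb\njan\ndec\n' | sort -M
# prints: jan feb jun dec (one per line)

# -g: general numeric sort parses floating point and exponents,
# so 1e3 is treated as 1000 rather than as 1.
printf '1e3\n2.5\n-4\n10\n' | sort -g
# prints: -4 2.5 10 1e3 (one per line)

# -z: records end with NUL instead of newline, so a name containing
# a space stays one record; xargs -0 reads the same format.
printf 'zeta.txt\0my notes.txt\0alpha.txt\0' | sort -z | xargs -0 printf '[%s]\n'
# prints: [alpha.txt] [my notes.txt] [zeta.txt] (one per line)
```

In real use the NUL-separated list would typically come from something like `find . -type f -print0 | sort -z | xargs -0 ls -ld`. In locales other than C, GNU sort may match that locale’s own month abbreviations with -M, so the first example’s input is kept in English on purpose.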
Forget What You Know
If you learned to use sort quite a while ago, you may not recognize its most recent incarnations. Some command-line options have changed, and the GNU version no longer truncates long lines of data, a longstanding bug in early utilities. Additionally, GNU sort now follows POSIX rules, which are mostly the same as the older System V rules but quite different from the old BSD rules. For instance, your LC_COLLATE locale setting now affects the sort order, whereas older versions of sort simply compared bytes in native byte order.
To get the old sort order, set the LC_ALL environment variable to C. Because LC_ALL overrides LC_COLLATE and the other LC_* settings, this restores plain byte-order comparison. Listing One shows an example: the first command displays an unsorted file; the second command temporarily sets LC_ALL to C and sorts the file in the old native byte order; and the third command sorts the file with the system default locale, here en_US with ISO-8859-1 encoding.