List Manipulation

Although many of my columns deal with entire programs, I find that people still send me email about the basics. So, this month, I thought I'd address an issue that people seem to keep asking about: basic list manipulation.

Your Order, Please

One very common task with lists is selection: finding items in a list that meet a particular condition. For example, let’s find all the odd elements in a list @input:

my @output;
foreach (@input) {
  if ($_ % 2) {
    push @output, $_;
  }
}

Of course, you can shorten and clean this up a bit by using a “backwards” if:

my @output;
foreach (@input) {
  push @output, $_ if $_ % 2;
}

Or you can shorten the loop up a different way using a “backwards” foreach:

my @output;
$_ % 2 and push @output, $_ foreach @input;

But alas, you can’t nest the backwards if and the backwards foreach. It’s been argued that this leads to the potential for much abuse, and is thus not permitted. Even the use of and here as a stand-in for conditional execution is arguably obfuscated enough.

The previous code works, but it’s like using a hammer when a screwdriver would suffice. A more elegant solution is Perl’s grep operator, which does a fine job of selecting elements from a list:

my @output = grep $_ % 2, @input;

Each element of @input is placed temporarily in $_, and then the $_ % 2 expression is evaluated for a true/false value. When the expression is true, the corresponding element of the list is included in the output list. Thus, you get the odd-valued elements in a manner even shorter than before.
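To make the selection concrete, here is a small sketch with a made-up @input (the sample values are an assumption for illustration, not part of the original example). It also shows that grep in scalar context returns the number of matches rather than the matches themselves:

```perl
use strict;
use warnings;

# Sample data, chosen only for illustration.
my @input = (3, 8, 5, 12, 7, 10);

# grep aliases each element to $_ and keeps those for which
# the expression is true (a nonzero remainder means odd).
my @output = grep $_ % 2, @input;

# In scalar context, grep yields the count of matching elements.
my $count = grep $_ % 2, @input;

print "@output\n";    # prints "3 5 7"
print "$count\n";     # prints "3"
```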

What if you wanted just the odd-positioned elements? That’s a bit trickier, but still not very hard, if you do it in two steps. First, construct a list of all the odd-position indices:

my @odd_indices = grep $_ % 2, 0..$#input;

The snippet above constructs the list of all the indices using 0..$#input, and then throws away the even numbers as before. Next, you need a slice of the array with just those indicated elements:

my @output = @input[@odd_indices];

And that’s it! Of course, you can even eliminate the intermediate variable, at the expense of a bit of complexity, in:

my @output = @input[grep $_ % 2, 0..$#input];
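As a quick check of the two-step and one-step forms, the following sketch uses a made-up list whose values name their own positions (the data is an assumption for illustration):

```perl
use strict;
use warnings;

# Each element names its own index, so the result is easy to verify.
my @input = qw(zero one two three four five);

# Two steps: pick the odd indices, then slice the array with them.
my @odd_indices = grep $_ % 2, 0..$#input;
my @two_step    = @input[@odd_indices];

# One step: the grep supplies the slice indices directly.
my @one_step = @input[grep $_ % 2, 0..$#input];

print "@odd_indices\n";    # prints "1 3 5"
print "@two_step\n";       # prints "one three five"
print "@one_step\n";       # prints "one three five"
```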

Now, let’s consider the opposite problem: collect the indices of the elements that are odd in the array. Again, it’s a matter of understanding the right indirections. Start with 0..$#input, and then see which of those results in an odd array value:

my @indices_of_odd = grep $input[$_] % 2, 0..$#input;

Of course, from here, it’s a simple step to actually look at those elements:

my @output = @input[@indices_of_odd];

But the point of this snippet is to fetch the indices, not the final @output, which we derived much more easily above.
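A short sketch with made-up data (an assumption for illustration) shows the difference between testing the index and testing the value stored at that index:

```perl
use strict;
use warnings;

# Sample data: the odd values sit at the even positions 0, 2, and 4.
my @input = (3, 8, 5, 12, 7, 10);

# Test the value at each index, but keep the index itself.
my @indices_of_odd = grep $input[$_] % 2, 0..$#input;

# Slicing with those indices recovers the odd values.
my @output = @input[@indices_of_odd];

print "@indices_of_odd\n";    # prints "0 2 4"
print "@output\n";            # prints "3 5 7"
```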

The expression for the grep can get pretty complex. Usually if the expression is something more complicated than a single operator, I drop down into the block-form of grep:

my @output = grep { $_ % 2 } @input;

Notice that there’s no comma between the closing brace and the list. If you add one there, Perl thinks you were trying to create an anonymous hash for the expression, which would always be true and thus rather pointless. Occasionally, Perl guesses wrong anyway about whether that’s an anonymous hash or block, so you have to help it along. The simplest way is to add a leading plus for a hash or a “slightly-trailing” semicolon for a block:

grep + { anon hash } …

grep { ; code block } …

The block can be arbitrarily complex, including having local variables and arbitrary control structures.

Like a subroutine, the last expression evaluated in the block is the one that matters. But unlike a subroutine, you’re not permitted to use a return from the block, so choose your logic carefully.
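As a sketch of a more involved block, the example below keeps only the elements that look like odd integers. The sample data and the variable names $is_integer and $keep are assumptions made for illustration. Since return is not available, the decision is stored in a variable that is named as the block’s final expression:

```perl
use strict;
use warnings;

# Mixed sample data: numbers, numeric strings, and non-numeric strings.
my @input = (7, "apple", 10, "13", -5, "2.5");

my @odd_integers = grep {
    my $is_integer = /^-?\d+$/;    # lexical variable, private to the block
    my $keep = 0;
    if ($is_integer) {
        $keep = $_ % 2 ? 1 : 0;    # only test values known to be integers
    }
    $keep;                         # last expression evaluated decides
} @input;

print "@odd_integers\n";    # prints "7 13 -5"
```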

For example, to implement the Unix rm command with the -i selective delete switch, it’s merely a bit of code:

unlink $_ or warn "Cannot delete $_: $!"
  foreach grep {
    print "$_? ";
    <STDIN> =~ /^y/i;
  } glob "*";

Let’s look at this one from the end to the beginning, because it uses a bunch of right-to-left operators.

First, the glob returns a list of all of the names in the current directory that don’t begin with a dot.

Next, the grep evaluates the print with $_ set to each name, and then waits for a response on STDIN.

If the response begins with the letter y (case ignored), the last expression evaluated in the block is true, and thus that particular item is selected for the output. Otherwise, the item is simply discarded.

Next, the foreach takes each item of the list returned by grep, and evaluates the logical or expression with the value in $_. If the unlink succeeds, the warn is skipped. Otherwise, the message is printed, and we move on.

The code is arguably convoluted with grep. You might find it easier to read in a forward form:

foreach (glob "*") {
  print "$_? ";
  next unless <STDIN> =~ /^y/i;
  next if unlink $_;
  warn "Cannot delete $_: $!";
}

However, one difference between this code and the previous snippet is that this code deletes the files as you go along, instead of selecting all the files first, then deleting them in a batch.


Another common operation performed on lists is transforming them. For example, suppose the task was to ensure that all the items in a list were definitely odd by multiplying them by two and adding one:

my @output;
foreach (@input) {
  push @output, $_ * 2 + 1;
}

But again, this is such a common operation that there’s a nice shortcut in Perl, the map operator:

my @output = map $_ * 2 + 1, @input;

And like its grep cousin, there’s also a block form:

my @output = map { $_ * 2 + 1 } @input;

But unlike grep, the map operator’s last (or only) expression is evaluated in a list context. If the result is multiple elements (or empty), the output list is longer (or shorter) than the input list:

my @numbers_and_odds = map { $_, $_ * 2 + 1 } @input;

Above, two values are added to the resulting list for each input value. The output list is twice as big as the input list.
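The list-context behavior can be sketched with made-up data (an assumption for illustration). The first map returns two values per input. The second returns an empty list for even inputs, so the one call both filters and transforms:

```perl
use strict;
use warnings;

my @input = (1, 2, 3, 4);

# Two results per element: the output is twice as long as the input.
my @numbers_and_odds = map { $_, $_ * 2 + 1 } @input;

# An empty list contributes nothing, so even inputs vanish
# while each odd input contributes two values.
my @expanded = map { $_ % 2 ? ($_, $_ * 10) : () } @input;

print "@numbers_and_odds\n";    # prints "1 3 2 5 3 7 4 9"
print "@expanded\n";            # prints "1 10 3 30"
```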

One slightly-more-practical example is to get a list of filenames and their sizes from the current directory, cached into a hash:

my @names_and_sizes = map { $_, -s $_ } glob "*";
my %sizeof = @names_and_sizes;

The intermediate array has alternating names and sizes, which is the right shape to be assigned to the hash. Of course, the intermediate variable isn’t needed here either:

my %sizeof = map { $_, -s $_ } glob "*";
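The same key-and-value shape works for any per-item computation. The sketch below swaps the file-size test for the built-in length function so that it runs without touching the filesystem. The word list and the name %length_of are assumptions made for illustration:

```perl
use strict;
use warnings;

# In-memory stand-in for the list of filenames.
my @words = qw(apple fig banana);

# Each element contributes a key and a value; the flat list of
# pairs is the right shape to initialize a hash.
my %length_of = map { $_ => length $_ } @words;

foreach my $word (sort keys %length_of) {
    print "$word: $length_of{$word}\n";
}
# prints:
#   apple: 5
#   banana: 6
#   fig: 3
```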

The number of elements returned doesn’t have to be constant, either. For example, the expression:

my @fields = split " ", $_;

returns the whitespace-delimited elements of the string in $_. In fact, it appears so often that this is the default:

my @fields = split;

If you did this with $_ set to each of the lines of a file, and then concatenated the results, you’d have a single list of all the words of the file. But this is also what map will do for you directly:

my @all_words = map split, <INPUT>;

Each line is placed in $_, the split breaks the line into words, and the results are concatenated into one final output!

What if we wanted a (so-called) two-dimensional array of lines and words? No problem. Just put an anonymous array constructor around the output of each split.

my @two_d_words = map [split], <INPUT>;

Now $two_d_words[3] is an arrayref of the words on the fourth line (counting 0, 1, 2, 3, and so on), and $two_d_words[3][2] is the third word on the fourth line, if any, or undef otherwise.
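Here is a sketch of both the flat and the two-dimensional forms, using an in-memory array of lines in place of the INPUT filehandle (the sample text is an assumption for illustration). The block form of map is used here; it behaves the same as the expression form shown above:

```perl
use strict;
use warnings;

# Stand-in for the lines read from a file, newlines included.
my @lines = (
    "the quick brown fox\n",
    "jumps over\n",
    "the lazy dog\n",
);

# Flat: the words of every line, concatenated into one list.
my @all_words = map { split } @lines;

# Nested: one array reference per line.
my @two_d_words = map { [split] } @lines;

print scalar(@all_words), "\n";    # prints "9"
print "$two_d_words[2][1]\n";      # prints "lazy"
print defined $two_d_words[1][2] ? "defined" : "undef", "\n";    # prints "undef"
```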

List Reduction Surgery

Another common list operation is reduction: converting a list into a single scalar. A few common reductions are built in to Perl, such as joining the elements into a single string with a common glue:

my $result = join ", ", @input;

But what if you wanted the items summed instead of being concatenated? You could hand-write the code like so:

my $result = 0;
$result += $_ for @input;

But it might be easier to see both the concatenation and summation as examples of a generic reduce operator:

sub join { my $glue = shift; reduce { "$a$glue$b" } @_ }

sub sum { reduce { $a + $b } @_ }

This reduce operator works by placing the first and second elements of the list into the placeholders of $a and $b, and then taking the scalar result as $a and the next element as $b, until the entire list has been processed.
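The description above can be sketched in pure Perl. The name my_reduce is hypothetical, and this is only an illustration of the idea, not how any real module implements it. It assumes the caller lives in the same package as the subroutine, since $a and $b are package variables:

```perl
use strict;
use warnings;

# my_reduce is a hypothetical name for this sketch. The (&@) prototype
# lets callers pass a bare block, in the style of grep and map.
sub my_reduce(&@) {
    my ($code, @list) = @_;
    return unless @list;    # nothing to reduce
    local ($a, $b);         # protect any existing values of $a and $b
    $a = shift @list;       # the first element seeds the running result
    foreach my $next (@list) {
        $b = $next;
        $a = $code->();     # the block's result becomes the new $a
    }
    return $a;
}

my @input = (1, 2, 3, 4);
my $sum    = my_reduce { $a + $b } @input;
my $joined = my_reduce { "$a, $b" } @input;

print "$sum\n";       # prints "10"
print "$joined\n";    # prints "1, 2, 3, 4"
```

With a one-element list, the block never runs and that element is returned unchanged.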

Although this hypothetical reduce operator isn’t built in to Perl the way that grep and map are provided, there’s a very fast C-coded version in the CPAN in the List::Util module. And as of Perl 5.8, List::Util is provided as part of the core distribution. Thus, you merely need to install a recent version of Perl, or the CPAN module, to say …

use List::Util qw(reduce);
my $sum = reduce { $a + $b } @input;

… to get the sum of the values. (The List::Util module provides a sum function directly, but it’s nice to know how simple the definition is.) We can also do more esoteric things, like compute the product of the values …

my $factorial = reduce { $a * $b } 1..$n;
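One detail worth a sketch: List::Util’s reduce returns undef when given an empty list, which is what 1..$n produces when $n is 0. Passing the identity value (1 for a product) as the first element avoids that. The wrapper name factorial below is hypothetical, used only for illustration:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# factorial is a hypothetical helper name for this sketch.
# The leading 1 seeds the product, so an empty range still yields 1.
sub factorial {
    my $n = shift;
    return reduce { $a * $b } 1, 1..$n;
}

# Without the seed, an empty list gives undef.
my $unseeded = reduce { $a * $b } 1..0;

print factorial(5), "\n";    # prints "120"
print factorial(0), "\n";    # prints "1"
print defined $unseeded ? "defined" : "undef", "\n";    # prints "undef"
```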

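Or follow a list of values as a series of hash keys to obtain a reference to the final element. Because each step returns a reference to a hash slot, the accumulator is dereferenced as $$a, and the list starts with a reference to a reference:

my $location = reduce { \($$a->{$b}) } \\%hash, @keys;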

You can also use this to drill down to an arbitrary hash location. For example, you could evaluate the equivalent of …

$info{Flintstone}{Fred}{Age} = 25;

… as:

my @keys = qw(Flintstone Fred Age);
my $location = reduce { \($$a->{$b}) } \\%info, @keys;
$$location = 25;

Thanks to Perl’s autovivification, the sub-hash elements are populated automatically, resulting in a nice reference to de-reference when reduce is finished.
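The drill-down can be traced end to end with the following sketch. It assumes the form in which the accumulator is a reference to the current slot: $$a dereferences it to reach the hash, which is why the starting value is \\%info, a reference to a reference. The second key list is made up for illustration:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my %info;    # starts out empty
my @keys = qw(Flintstone Fred Age);

# Each step turns a reference to one slot into a reference to the
# slot one level deeper, autovivifying the hashes along the way.
my $location = reduce { \($$a->{$b}) } \\%info, @keys;
$$location = 25;

print "$info{Flintstone}{Fred}{Age}\n";    # prints "25"

# The same expression reaches an element that already exists.
my $same_slot = reduce { \($$a->{$b}) } \\%info, qw(Flintstone Fred Age);
print "$$same_slot\n";                     # prints "25"
```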


So, there you have it. A few simple tricks with Perl’s lists, using a few built-in and easily-accessible operators. Until next time, enjoy!

Randal Schwartz is the chief Perl guru at Stonehenge Consulting and the author of many books on Perl. He can be reached at merlyn@stonehenge.com.
