How Perl Handles References

You need references. Everybody programming in Perl does, since they are one of the basics of the language. A bit like C's pointers, references can be used to refer to all sorts of other things, including scalars, arrays, hashes, filehandles, typeglobs, subroutines, and synthetic data structures. If C calculates addresses and dereferences pointers with & and *, respectively, Perl does much the same with \ and $.

Why use references? First, they go great with subroutine calls. The usual way to pass an array or hash between a subroutine and its caller involves pushing around all the values of the array or hash. You can get the same or better results instead by passing just the single reference to the array or hash.

Also, references are necessary for complex data structures. The values of arrays and hashes must be scalars, so there’s no direct way to make hashes of hashes. However, references have the syntax of scalars, so you can make arrays of references to arrays of…. Before long, you’re writing code with variables like


$history{recent}[$western]{traffic}

Looking at References

Let’s look at an example. Suppose I want to have a “reverse chomp” operator that would add a newline to every element of an array. I could write the code as follows:


for $element (@array) {
    $element .= "\n";
}

And while this would certainly work, it locks me into a specific variable named @array. If I wanted to make a general subroutine, I’d be out of luck without references (unless I wanted to do something evil and non-scalable like alter @_ directly). A reference permits the selected variable to be changed at will, using an additional level of indirection. Consider the following code:


$this_reference = \@named_array;

Here, the \ operator “takes a reference to” the named @named_array variable. The value is called an “array reference,” or arrayref for short. (Occasionally this is incorrectly called a listref.) The reference fits nearly anywhere a scalar value fits, and so we’ve shoved it into $this_reference. This arrayref “points at” @named_array.

To use the reference we must dereference it. Let’s set @named_array to the values of 1 through 10, but using the reference:


# @named_array	= (1..10);
@{$this_reference} = (1..10);

The syntax for dereferencing is to write the operation as we would without the reference, but then replace the name of the variable with a block of code (enclosed in braces) returning a reference to the variable. So, we’ve now got a piece of code that affects @named_array, at least this time. However, with a different array reference stored in $this_reference, the same code affects a different variable:


$this_reference = \@another_array;
@{$this_reference} = (1..10);

Now we’ve set @another_array to those 10 values. We can even use the reference syntax to access individual elements:


# $another_array[2] = "three";
${$this_reference}[2] = "three";

Again, replace the name with a block returning the thing holding the reference, and we get the dereferencing form.

So, we can start to see how to make our “unchomp” work. We’ll write the code so that it uses a reference, and pass that reference as a parameter:


sub unchomp {
    my $ref = shift;
    for $element (@{$ref}) {
        $element .= "\n";
    }
}

And then call it with a reference to the array we want unchomped:


unchomp(\@named_array);
unchomp(\@another_array);

The reference passes as a single parameter, which is then shifted into $ref and dereferenced into the foreach loop. Bingo.
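
To see the whole thing run under strict and warnings, here is a minimal sketch. The @colors array and its contents are made-up sample data, and the loop variable is declared with my so that strict is happy; otherwise it is the same unchomp:

```perl
use strict;
use warnings;

# Append a newline to every element of the array that $ref points at.
sub unchomp {
    my $ref = shift;
    for my $element (@{$ref}) {
        $element .= "\n";    # $element is an alias for the real array element
    }
    return;
}

my @colors = ("red", "green");    # hypothetical sample data
unchomp(\@colors);                # pass one scalar: the arrayref
print @colors;                    # each value now ends in a newline
```

The change sticks because the foreach variable aliases each element of the dereferenced array, so the caller's array is what gets modified.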

Since the reference fits into a scalar variable, can we have a list element be a reference? Certainly:


for $aref (\@named_array, \@another_array) {
    unchomp($aref);
}

In fact, we can even store this list into another array:


@do_these = (\@named_array, \@another_array);
for $aref (@do_these) {
    unchomp($aref);
}

But what have we done? We now have an array, each element of which is an arrayref, which can in turn be dereferenced to access the individual elements. So, what does it look like to access each layer?


@do_these # two elements, each an arrayref
@{$do_these[0]} # @named_array
${$do_these[0]}[3] # $named_array[3]
@{$do_these[1]}[4,5] # @another_array[4,5]
$#{$do_these[0]} # $#named_array

Some people call this structure a “list of lists,” but that’s pretty loose, since really it’s an array of arrayrefs. Perl does not have “lists of lists.”

Now, let’s simplify the syntax a bit. The rules above always work for dereferencing (replace the name with a block), but can start looking pretty ugly for common things. First, if the expression inside the block is only a simple scalar variable, we can lose the curly braces. Thus, we can change @{$aref} to @$aref, but we have to leave @{$do_these[0]} alone.

There’s another shortcut available for accessing array elements through a reference. In place of:


${WHATEVER}[WHEREVER]

we can always write:


WHATEVER->[WHEREVER]

The -> operator followed by square brackets means to treat the previous value as an arrayref, dereference it, and then select the requested element. Thus, we can rewrite ${$aref}[2] as $aref->[2] and ${$do_these[0]}[3] as simply $do_these[0]->[3].

Finally, if the arrow ends up between subscripts, we can drop the arrow safely:


$do_these[0][3]

Which looks vaguely C-like. Cool. We cannot remove the arrow between $aref and [2] in the previous example, though, because that would be looking at an element of @aref, not at all what we want.
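
A quick way to convince yourself that these spellings really are interchangeable is to try them side by side. This is only a sketch; the array contents and the result variable names are invented for the illustration:

```perl
use strict;
use warnings;

my @named_array = (10, 20, 30, 40);    # hypothetical sample data
my $aref        = \@named_array;
my @do_these    = ($aref);

# Three spellings of the same element through a simple scalar reference
my $block_form = ${$aref}[2];
my $short_form = $$aref[2];
my $arrow_form = $aref->[2];

# Between subscripts, the arrow is optional
my $with_arrow    = $do_these[0]->[3];
my $without_arrow = $do_these[0][3];

# Last index through a reference, the equivalent of $#named_array
my $last_index = $#{$do_these[0]};
```

All three forms in the first group fetch 30, both forms in the second group fetch 40, and $last_index ends up as 3.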

Did we need the named arrays here to set up @do_these? Nope. We can also use an “anonymous array constructor”:


$do_these[0] = [1..10];

Here, the value 1..10 is computed in a list context, then placed into an array structure. A reference to this array is returned as the value of the square brackets and placed into $do_these[0]. Except for the fact that we don’t have a named array anymore, the rest of the code would run identically. We could even initialize the entire array as:


@do_these = ([1..10], [11..20]);

And we get two different 10-element arrays, held as arrayrefs in @do_these. Note that the placement of square brackets and parens here is essential; swapping them would have gotten us into a mess.
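
To make the “mess” concrete, here is a small sketch of what swapping the brackets and parens does. The variable names are invented for the comparison:

```perl
use strict;
use warnings;

my @nested  = ([1..10], [11..20]);    # two arrayrefs, ten values each
my @swapped = [(1..10), (11..20)];    # ONE arrayref holding a flat list of twenty values

my $nested_count  = scalar @nested;           # 2
my $swapped_count = scalar @swapped;          # 1
my $inner_count   = scalar @{$swapped[0]};    # 20
```

The parens only group; they never create structure. It is the square brackets that build a new array and hand back a reference to it.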

Adding elements to an array has always been a “self-extend” operation in Perl. Assigning to elements that don’t yet exist causes the array to be autoextended:


@a = ();
$a[3] = "barney";
$a[7] = "dino";

And we end up with:


@a = (undef, undef, undef,
    "barney", undef, undef,
    undef, "dino");

Notice that the intermediate elements are automatically assigned undef. Similarly, any variable used as if it were an arrayref, but which does not yet contain anything (or just undef), is automatically stuffed with an arrayref to an empty anonymous array. This process, called “autovivification,” makes populating so-called “multidimensional” arrays trivial:


@a = ();
$a[3]->[2] = "hello";
# same as:
# $a[3] = [];
# $a[3]->[2] = "hello";

This even works on multiple levels:


@a = ();
$a[2]->[4]->[5]->[3] = "foo";
# or $a[2][4][5][3] = "foo";

Very nice.
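
If you want to peek at what autovivification actually built, the ref function reports what kind of reference a value holds. A minimal sketch, using a hypothetical array named @cells:

```perl
use strict;
use warnings;

my @cells;                      # hypothetical name; starts out empty
$cells[2][4][5][3] = "foo";     # three levels of arrayrefs spring into existence

my $top_length = scalar @cells;               # 3, because index 2 is now the last one
my $kind       = ref $cells[2];               # "ARRAY"
my $leaf       = $cells[2][4][5][3];          # "foo"
my $untouched  = defined $cells[0] ? 1 : 0;   # 0, elements before index 2 stay undef
```

Only the path we assigned through gets created; the neighbors remain undef until something stores into them.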

Arrays aren’t the only things that can be referenced. Hashes are another popular target:


%last_name = (
    "fred" => "flintstone",
    "wilma" => "flintstone",
    "barney" => "rubble",
);
$hashref = \%last_name;
@firsts = keys %{$hashref};

That last line can be written as keys %$hashref as well, using the same abbreviations given earlier. Accessing an element can also be abbreviated:


# looking at $last_name{"fred"}:
${$hashref}{"fred"}
# removing optional {}'s:
$$hashref{"fred"}
# or switching to arrow form:
$hashref->{"fred"}

We can put an arrayref as a hash value:


$score{"fred"} = [180, 150, 165];
$score{"barney"} = [172, 190, 158];

and then access that with everything we’ve seen:


@fred_scores = @{$score{"fred"}};
${$score{"fred"}}[2] = 168; # fix 165 to 168
$score{"fred"}->[2] = 168; # same thing
$score{"fred"}[2] = 168; # same thing

Note that we can drop an arrow between either kind of subscript.
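
Here is a short sketch that walks such a hash of arrayrefs and totals each bowler's games. The %total hash and the extra game pushed onto barney's list are additions of mine for the illustration:

```perl
use strict;
use warnings;

my %score = (
    fred   => [180, 150, 165],
    barney => [172, 190, 158],
);

$score{fred}[2] = 168;              # fix 165 to 168
push @{ $score{barney} }, 201;      # hypothetical fourth game, added through the reference

my %total;                          # hypothetical summary hash
for my $name (sort keys %score) {
    my $sum = 0;
    $sum += $_ for @{ $score{$name} };    # dereference to get the whole array
    $total{$name} = $sum;
}
```

Anywhere an array name would go (after push, inside a foreach), the @{ ... } block wrapped around the hash element does the job.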

Like arrayrefs, hashrefs can also appear from nowhere through autovivification:


%bytes = ();
# …
$bytes{$src}{$dest} += $count;

This creates a hash of hashrefs, with each hashref being added only when a new $src shows up, and each second-level hash element being added for a new $dest for that $src.
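
As a rough sketch of how that counter might be fed and read back, assume the traffic arrives as a list of [source, destination, byte count] records. The @records data and the host names are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical traffic records: [source, destination, byte count]
my @records = (
    ["alpha", "beta",  100],
    ["alpha", "beta",   50],
    ["alpha", "gamma",  10],
    ["beta",  "alpha",   7],
);

my %bytes;
for my $record (@records) {
    my ($src, $dest, $count) = @$record;
    $bytes{$src}{$dest} += $count;    # the inner hashref appears the first time $src is seen
}

# Walk both levels to report the totals
for my $src (sort keys %bytes) {
    for my $dest (sort keys %{ $bytes{$src} }) {
        printf "%s to %s: %d\n", $src, $dest, $bytes{$src}{$dest};
    }
}
```

One caution: merely testing a deep element, as in exists $bytes{x}{y}, also autovivifies $bytes{x}, so check the outer key first if stray empty hashes would bother you.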

By the way, hashrefs can also be generated by anonymous hash constructors:


$hashref = {
    "fred" => "flintstone",
    "barney" => "rubble",
    "betty" => "rubble",
};

The value inside the braces is evaluated like the right side of a hash assignment (list context, alternating key/value pairs). A hash is built, and a reference to that hash is returned. So, to build a reference to the scores above, we could do this:


$game = {
    "fred" => [180, 150, 165],
    "barney" => [172, 190, 158],
};

Which could have been part of the league scores:


$week[0] = $game;
$week[1] = {
    "fred" => [201, 188, 65],
    "barney" => [189, 252, 99],
};
# or more directly:
@week = ({
    "fred" => [180, 150, 165],
    "barney" => [172, 190, 158],
}, {
    "fred" => [201, 188, 65],
    "barney" => [189, 252, 99],
});

Now we can get the score for game $i of week $j for fred with $week[$j-1]{"fred"}[$i-1]. We are subtracting 1 here because Perl begins its count at 0 instead of 1.
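
Plugging in real numbers makes the off-by-one bookkeeping easier to trust. A small sketch, with $j and $i chosen arbitrarily and the result variable names made up:

```perl
use strict;
use warnings;

my @week = (
    { fred => [180, 150, 165], barney => [172, 190, 158] },
    { fred => [201, 188, 65],  barney => [189, 252, 99]  },
);

my ($j, $i) = (2, 1);                              # week 2, game 1 (both counted from 1)
my $fred_score = $week[$j-1]{fred}[$i-1];          # 201

my $barney_week1_game3 = $week[0]{barney}[2];      # 158
my $weeks_played       = scalar @week;             # 2
```

Reading the expression left to right mirrors the structure: an array element, then a hash element, then an array element, with the arrows between subscripts left out.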

Less frequently used, but still just as cool, are scalar references (scalarrefs):


$that = \$scalar_var;
$$that = 17; # $scalar_var = 17

Scalarrefs autovivify as well, although that’s not particularly impressive:


$that = undef;
$$that = 3; # anonymous var becomes 3
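
One place scalarrefs earn their keep is letting a subroutine change a caller's scalar. The sketch below uses a made-up helper named bump and also shows the autovivified case:

```perl
use strict;
use warnings;

# Hypothetical helper: add one to whatever scalar $ref points at.
sub bump {
    my $ref = shift;
    $$ref += 1;
    return;
}

my $scalar_var = 16;
bump(\$scalar_var);        # $scalar_var is now 17

my $that;                  # starts out undef
$$that = 3;                # autovivifies an anonymous scalar holding 3
my $kind = ref $that;      # "SCALAR"
```

Note that the autovivifying assignment is allowed even under use strict, since $that is undef rather than a string being used as a variable name.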

Anonymous data structures can also occur when a variable goes out of scope:


my $x;
{
    my $prince = "van gogh";
    $x = \$prince;
}

Here, $x is pointing to what is now an anonymous string, the artist formerly known as $prince. This frequently happens when returning a data structure reference from a subroutine:


sub marine {
    my %things;
    # …
    return \%things;
}

The return value here will be a hashref, now pointing into the anonymous value. New invocations of the subroutine create new instances of %things. Memory for the previous return values is reclaimed only when the last reference is removed.
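
To check that each call really does hand back a separate hash, here is a sketch in which marine takes a depth argument. That parameter and the depth key are my own additions, standing in for whatever the elided body would fill in:

```perl
use strict;
use warnings;

sub marine {
    my ($depth) = @_;                     # hypothetical parameter, not in the original
    my %things = (depth => $depth);       # a fresh %things on every call
    return \%things;
}

my $first  = marine(10);
my $second = marine(20);

$first->{depth} = 99;                     # changes only the first hash

my $second_depth = $second->{depth};          # still 20
my $distinct     = ($first != $second) ? 1 : 0;   # 1, the references point at different hashes
```

Each hash stays alive for as long as $first or $second (or a copy of either) still refers to it.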

So, that should give you a good start into references. If you are interested in further information, you can check the documentation that comes with Perl, especially perlref, perllol, and perldsc. You might also want to take a look at chapter four of my book Programming Perl, Second Edition, which is published by O’Reilly & Associates. In a future Perl of Wisdom column, I’ll look at how references can also be made to subroutines and filehandles.

Until then, enjoy!





Randal L. Schwartz is the chief Perl guru at Stonehenge Consulting and co-author of Learning Perl and Programming Perl, both published by O’Reilly & Associates. He can be reached at merlyn@stonehenge.com.
