dcsimg

Fit to be Tied, Part Two

Want to tie a complex data type? Here’s how, courtesy of Perl guru Randal Schwartz.
Last month, I introduced the tie operator, illustrating how tie can be used with a scalar variable to give additional behaviors to getting and setting the value of the variable. This time, let’s continue and look at other data types that can be tied.
Another common Perl data structure to tie is a hash. This code…
tie %h, MyTie, @list;
… translates to:
MyTie->TIEHASH(@list);
Again, the constructor is expected to return an object, which becomes the secret object of the tied variable. And, as before, you can call tied to get at that object.
The FETCH() method now receives a parameter, because it needs to know which key to fetch. That means…
$x = $h{SomeKey};
… turns into:
$x = tied(%h)->FETCH("SomeKey");
Storing a value now gets an additional parameter as well. The two parameters are the key and the new value. So, the code…
$h{SomeKey} = "newvalue";
… turns into:
tied(%h)->STORE("SomeKey", "newvalue");
It’s possible to use the same class for tying both scalars and hashes, but you’ll have to keep track of whether you were called with TIESCALAR or TIEHASH to know what kind of FETCH() and STORE() to do. That can get messy, so another strategy is to use your public class as a dispatch to the proper private implementation class:
package MyTie;
sub TIESCALAR {
my $class = shift;
$class->scalar_class->TIESCALAR(@_);
}
sub scalar_class { "MyTie::Scalar" }
sub TIEHASH {
my $class = shift;
$class->hash_class->TIEHASH(@_);
}
sub hash_class { "MyTie::Hash" }
package MyTie::Scalar;
# normal tied scalar stuff here
package MyTie::Hash;
# normal tied hash stuff here
Here, the class methods scalar_class() and hash_class()) get the name of the actual class to be used. This lets someone subclass this class, and provide their own derived associated classnames as well, simply by overriding these methods as needed.
Let’s create a basic tied hash that acts like a normal tied hash, putting the methods in MyTie::Hash. First, create a data structure to store the associated keys and values of an “inner” hash. An appropriate structure for that is simply another hash, so create one as an instance of the inner hidden object:
package MyTie::Hash;
sub TIEHASH {
my $class = shift;
bless {
Value => {}, # our hash value
}, $class;
}
Now when you call…
tie %h, MyTie::Hash;
… you get the hidden object holding an empty hash.
To handle a store, create a STORE() method, just as with the tied scalar:
package MyTie::Hash;
sub STORE {
my ($self, $key, $value) = @_;
$self->{Value}->{$key} = $value;
}
STORE() now gets three arguments instead of two, because the code has to know which key gets the corresponding value. Additionally, there’s an extra corresponding argument to FETCH() to select which key is needed:
package MyTie::Hash;
sub FETCH {
my ($self, $key) = @_;
return $self->{Value}->{$key};
}
With just these methods, you get a basic functional hash:
tie my %h, MyTie::Hash;
$h{"fred"} = "flintstone"; # calls STORE
$h{"barney"} = "rubble";
print $h{"barney"}, "\n"; # calls FETCH, producing "rubble"
Hash elements can also be deleted and checked for existence, so…
delete $h{SomeKey}
… turns into:
tied(%h)->DELETE("SomeKey")
And…
exists $h{SomeKey}
… turns into:
tied(%h)->EXISTS("SomeKey")
To implement those in your basic hash, just add the corresponding methods:
package MyTie::Hash;
sub DELETE {
my ($self, $key) = @_;
return delete $self->{Value}->{$key};
}
sub EXISTS {
my ($self, $key) = @_;
return exists $self->{Value}->{$key};
}
So far, this may not seem very interesting. The code just implements a much slower implementation of a basic hash. So, let’s do something that a normal hash can’t. Let’s make a case-insensitive hash by converting all keys to lowercase before being used with the inner secret hash:
package MyTie::CaseFoldingHash;
sub TIEHASH { # unchanged from before
my $class = shift;
bless {Value => {}}, $class;
}
sub STORE {
my ($self, $key, $value) = @_;
$self->{Value}->{lc $key} = $value;
}
sub FETCH {
my ($self, $key) = @_;
return $self->{Value}->{lc $key};
}
sub DELETE {
my ($self, $key) = @_;
return delete $self->{Value}->{lc $key};
}
sub EXISTS {
my ($self, $key) = @_;
return exists $self->{Value}->{lc $key};
}
By adding lc in front of each key access and consistently interpreting the key in this manner, case no longer matters. Here’s proof:
tie my %lastname, MyTie::CaseFoldingHash;
$lastname{"Fred"} = "Flintstone"; # stores at "fred"
print $lastname{"FRED"}, "\n"; # fetches "Flintstone"
Both Fred and FRED were lowercased to fred, so the two accesses connect with the same inner hash element.
Hashes can also be accessed with keys, values, and each. How are these mapped to the tied hash? It’s a bit tricky, so let’s break it down starting with each.
The first time each is called on a tied hash, the FIRSTKEY() method is called on the inner object and is expected to return some key of the hash. Each subsequent call to each triggers a call to NEXTKEY() on the inner object, which is expected to return the next key. If there are no keys to be delivered, either of these methods can return undef.
The keys and values methods are implemented as if calling each repeatedly until an undef result is returned, so once FIRSTKEY() and NEXTKEY() are implemented, you’ll get all three of each, keys, and values for free, as well as the “flattening” of a hash in a list context (@b=%a;).
For a simple hash, you must return any key of the inner hash for FIRSTKEY(), and all the remaining keys for NEXTKEY() until you’ve iterated through the hash. Well, this is exactly what calling each on the inner hash does, so let’s take that shortcut:
sub FIRSTKEY {
my ($self) = @_;
each %{ $self->{Value} };
}
sub NEXTKEY {
my ($self) = @_;
each %{ $self->{Value} };
}
And this works fine for the basics, such as:
for my $lastnames (values %lastname) { ... }
while (my($k, $v) = each %lastname) { ... }
my @flat = %lastname;
my $keycount = keys %lastname;
However, this approach doesn’t work when each fails to reach the end of the hash before one of the other hash operations is called. For example, presuming that the hash has more than four elements, calling each three times does what you’d expect:
each %lastname; # calls FIRSTKEY
each %lastname; # calls NEXTKEY
each %lastname; # calls NEXTKEY
NEXTKEY() is now ready to deliver the fourth key, but if you call keys instead…
my @keys = keys %lastname;
… Perl calls FIRSTKEY(), which gets the fourth key (oops!) and then calls NEXTKEY() repeatedly to get the remaining keys.
The problem is how each is used in the FIRSTKEY() method. Its necessary to reset the each iterator on the inner hash. A quick way to to do that is to call keys on that hash in a scalar context:
sub FIRSTKEY {
my ($self) = @_;
scalar keys %{ $self->{Value} }; # reset iterator
each %{ $self->{Value} };
}
You’ll find that this works much better, as it’s consistent with normal hashes. In the case-insensitive hash, calling keys will now show a series of lowercased keys.
But how can you preserve the original case of the assignment, while still letting accesses of either case connect to the same element? To do that, you need to store the original key as well. One option is to organize the inner hash with values that are two-element arrays containing the original key and its corresponding value. Let’s see how that looks and works:
package MyTie::CasePreservingHash;
sub STORE {
my ($self, $key, $value) = @_;
$self->{Value}->{lc $key} = [$key, $value];
}
sub FETCH {
my ($self, $key) = @_;
return $self->{Value}->{lc $key}->[1];
}
So a store creates a two element array, keeping the original key handy, while a fetch retrieves the value regardless of the key’s case. So far, so good. How about DELETE() and EXISTS()? Turns out, they don’t change a bit. Good.
But FIRSTKEY() and NEXTKEY() are a bit trickier. Both need to walk the inner hash, but not return its keys (which are already lowercased). Instead, both need to return element 0 of the array referenced by the value of that hash, like so:
sub FIRSTKEY {
my ($self) = @_;
scalar keys %{ $self->{Value} }; # reset iterator, as before
if (my($k, $v) = each %{$self->{Value}}) {
# we have a valid key, so return the unmangled real key
return $v->[0];
} else { # hash must be empty
return undef;
}
}
And unfortunately, NEXTKEY() is nearly identical, except for the absence of resetting the iterator:
sub NEXTKEY {
my ($self) = @_;
if (my($k, $v) = each %{$self->{Value}}) {
# we have a valid key, so return the unmangled real key
return $v->[0];
} else { # hash must be empty
return undef;
}
}
That probably means some refactoring is in order:
sub reset_keys {
scalar keys %{ shift->{Value} };
}
sub next_indirect_key {
my $self = shift;
if (my($k, $v) = each %{$self->{Value}}) {
# we have a valid key, so return the unmangled real key
return $v->[0];
} else { # hash must be empty
return undef;
}
}
sub FIRSTKEY {
my $self = shift;
$self->reset_keys;
return $self->next_indirect_key;
}
sub NEXTKEY {
my $self = shift;
return $self->next_indirect_key;
}
There, that feels better. It’s probably also easier to subclass.
Using this tie class definition, you can now perform case-insensitive access, but preserve the initially assigned case of the keys:
tie my %lastname, MyTie::CasePreservingHash;
# set up values
$lastname{"Randal"} = "Schwartz";
$lastname{"Tom"} = "Phoenix";
$lastname{"brian"} = "foy";
# show that the case is preserved:
print "first names:\n",
map(" $_\n", sort keys %lastname);
# show that the access is insensitive:
print "Tom’s last name is $lastname{’tom’}\n";
That code yields:
first names:
Randal
Tom
brian
Tom’s last name is Phoenix
Case is indeed preserved, but the access is case-insensitive.
Other actions for hashes include clearing the hash out and using the hashname in a scalar context. These map to the CLEAR() and SCALAR() methods, respectively.
Besides tying hashes, you can also tie arrays, and filehandles. For further information on how to do that, see the perltie manpage.
Until next time, enjoy!

Randal Schwartz is the chief Perl guru at Stonehenge Consulting.

Comments are closed.