HEAD: A Taste of Perl 6

DECK: A dozen tempting new features of the forthcoming Perl 6

TOC_LINE: Perl 6 will be the language's first major revision since the early 90's. We look at why, how, who, what, and when.

AUTHOR: Damian Conway

I<"Perl 5 was my rewrite of Perl. I want Perl 6 to be the community's rewrite of Perl...and of the community."> -- Larry Wall, State of the Onion speech, OSCon 2000.

SUBHEAD: If it ain't broke, why fix it?

First of all, Perl 5 I<isn't> broke. Those of us who are working on the design of Perl 6 are doing so precisely because we like Perl 5 so much. We like it so much that we use it every day, for everything from filtering our mail, to maintaining our servers, to formatting this very paragraph. It's I<because> we like Perl 5 so much that we want it to be even better.

Perl 5's goal was to make "easy things easy, and hard things possible". It does that very well, but we believe it could do more. We believe it could make easy things trivial, hard things easy, and impossible things merely hard.

Moreover, our love of Perl doesn't blind us to its flaws. Those C<$>, C<@>, and C<%> prefixes on variables are confusing; some of its other syntax is unnecessarily cluttered; it lacks some basic language features (like named subroutine parameters, or strong typing, or even a simple case statement); its OO model isn't really strong enough for most production environments; and the list goes on.

So the Perl 6 design process is about keeping what works in Perl 5, fixing what doesn't, and adding what's missing. That means there will be a few fundamental changes to the language, a larger number of extensions to existing functionality, and a handful of completely new features.

This article showcases some of the ways in which these modifications, enhancements, and innovations will work together to make the future Perl even more insanely great. Without, we hope, making it even more greatly insane.
SUBHEAD: Sigils simplified

Let's start with those mysterious C<$>'s, C<@>'s, C<%>'s, C<&>'s and C<*>'s that can be such a source of grief for newcomers to Perl (and can occasionally trip up experts too!). They're called "sigils" and the most important news is that Perl 6 will keep most of them.

We did consider removing them completely (as some people requested), but we concluded that they are far too valuable to remove. They make it much easier to interpolate a variable into a character string or regular expression. And they provide important sanity checks on the arguments of various built-ins (finding logical errors like C and C).

But we're modifying how sigils relate to their variable, in a way that actually reduces mistakes, rather than causing them. In Perl 5, the type of sigil a variable requires depends on how it's being used, and in particular what kind of value that usage is supposed to produce. Consider the code in Listing 1.

[ BEGIN Listing 1 -- Accessing hashes and arrays in Perl 5 ]

C<
    01  # Perl 5 code...
    02
    03  print keys %hash;
    04  print $hash{"name"};
    05  print @hash{"name", "rank", "cereal preference"};
    06
    07  print @array;
    08  print $array[0];
>

[ END Listing 1 ]

If you're using a hash as a full hash (as in line 3), you use the normal "hash sigil" (C<%>). But if you're looking up a single entry in the hash (line 4), and hence expect a single scalar value back, you use the "scalar sigil" (C<$>). And if you're looking up several entries at once (line 5), and expecting to get back a list of values, you use the "array sigil" (C<@>). Likewise, when you want the full array (as in line 7), you use C<@>, but when you want just a single element of it (line 8) you use C<$>.

It's a logical enough system -- at least until we throw subroutine references and method calls into the mix, at which point it breaks down completely. More importantly, it doesn't fit well into many people's brains.
That's because sigils act rather like English demonstrative articles ("that value", "these values", "those values"). But an English article always agrees with the underlying plurality of the object it demonstrates, I<not> with the plurality of the bit(s) of the object you're currently interested in. So when you say "pass me those apples" and "pass me one of those apples" the "those" stays plural, whether you're asking for one or all of the fruits. But in Perl the equivalent requests are:

C<
    pass(@apples);        # "Pass those apples"
    pass($apples[1]);     # "Pass one of that apples"
>

To most programmers that's simply counter-intuitive.

So Perl 6 will change how sigils operate. In the new version of Perl, they'll cease to be adjectives and become an indivisible part of the nouns themselves. Listing 2 shows what that means for the five C<print> statements shown in Listing 1.

[ BEGIN Listing 2 -- Accessing hashes and arrays in Perl 6 ]

C<
    01  # Perl 6 code...
    02
    03  print keys %hash;
    04  print %hash{"name"};
    05  print %hash{"name", "rank", "cereal preference"};
    06
    07  print @array;
    08  print @array[0];
>

[ END Listing 2 ]

The previous complicated rules about selecting the sigil according to the nature of the value(s) being returned are replaced with a single rule that simply says: "If it's a hash, use C<%>; if it's an array, use C<@>; if it's a scalar, use C<$>. Always."

Not only is that vastly easier to teach, to learn, and to remember, it also has the elegant side-effect of silently fixing one of the most common mistakes made by Perl programmers.

Many programs include code that locates and returns a particular data structure (say a hash) by reference. That reference is then usually stored in a local scalar variable, through which the hash is later accessed. Like so:

C<
    $data = locate_hash_for("required data");

    # then later...

    print $data{"particular_entry"};
>

In Perl 5 that's a nasty and subtle error. The first line stores a reference to a hash in the scalar variable C<$data>.
But later a particular entry is looked up in the hash C<%data>. Same name, different variables. In fact, the C<$data{...}> syntax has I<nothing> to do with the variable C<$data>. Instead, it means: "Look up the entry in C<%data>, using the C<$> prefix on the variable because it's returning a single value".

What the programmer should have written was:

C<
    print $data->{"particular_entry"};
>

which dereferences the hash reference in C<$data> (using the C<< -> >> operator) and then looks up the particular entry. But people just don't think that way. So they're constantly being bitten by this mistake.

Fortunately, in Perl 6, it's not a mistake at all. Because the sigil is determined by the variable, rather than the value(s) it's providing, C<$data> I<always> means the scalar variable. So C<$data{"particular_entry"}> always means "look up the entry in the hash whose reference is stored in C<$data>".

We suspect that when people port their code from Perl 5 to Perl 6 many of these kinds of hidden bugs will simply "evaporate", because the language semantics will have been changed to match how people actually think and code.

SUBHEAD: A Swiss Army case statement

Perl's problem isn't that it doesn't have a case statement. Its problem is that, because it doesn't have one standard case statement, people have invented 23 alternative "case patterns". Everything from the pedestrian:

C<
    # Perl 5 code...

    $val = 'G4';

    if    ($val eq 'A4') { print "paper" }
    elsif ($val eq 'B4') { print "prior" }
    elsif ($val eq 'C4') { die "BOOM!" }
    else                 { print "huh??" }
>

to the baroque:

C<
    # Perl 5 code...

    $val = 'G4';

    ({ 'A4' => sub { print "paper" },
       'B4' => sub { print "prior" },
       'C4' => sub { die "BOOM!" },
     }->{$val} || sub { print "huh??" }
    )->();
>

In Perl 6, there's no need (nor any temptation) to jury-rig such awkward, inefficient solutions. Instead, there is a single, standard, built-in control statement that does the job:

C<
    # Perl 6 code...

    $val = 'G4';

    given $val {
        when 'A4' { print "paper" }
        when 'B4' { print "prior" }
        when 'C4' { die "BOOM!" }
        default   { print "huh??" }
    }
>

The C<given> statement associates the value in C<$val> with the special Perl "current topic" variable C<$_>. This association lasts for the duration of the associated block. Then each successive C<when> statement within the block compares its associated value (C<'A4'>, C<'B4'>, etc.) against the current value of C<$_>. The first C<when> statement whose value matches C<$_> has its associated block executed, after which control passes straight to the end of the surrounding block.

Though this seems no different in essence from a case statement in many other languages, there is far more power here than first meets the eye. The way each C<when> compares its value against C<$_> is determined by the (runtime) types of the two values being compared.

In the above example, each C<when> value is a string, so the string in C<$val> is compared against each of them using Perl's C<eq> string comparison operator. However, if the code had been:

C<
    # Perl 6 code...

    $val = 'G4';

    given $val {
        when 'A4' { print "paper" }
        when %B4  { print "prior" }
        when /C4/ { die "BOOM!" }
        when &D4  { print "huh??" }
    }
>

then the first C<when> would still compare against C<$val> using C<eq>, but the second C<when> would treat the string in C<$_> as a key into the C<%B4> hash and consider the match successful if the corresponding element in the hash contained a true value. The third C<when>, on the other hand, would note that C<$_> was being compared against a regular expression, so it would use pattern matching to compare the two. And the final C<when>, finding a subroutine (C<&D4>), would pass C<$_> as an argument to that subroutine and consider the match successful if C<&D4> returned a true value.

It may sound complex, but it isn't really. In all instances, the C<when> just automatically chooses the most appropriate way to compare the "given" value (i.e. C<$_>) against the current case. In other words, it just Does What You Mean.
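The type-directed comparison described above can be roughly imitated in Perl 5, which may help make the four rules concrete. What follows is only a minimal sketch under stated assumptions, not the way Perl 6 implements C<when>: the subroutine names C<matches> and C<classify>, the sample hash, and the sample test values are all hypothetical, invented purely for illustration.

```perl
# Perl 5 code: an illustrative emulation, not the Perl 6 implementation.
use strict;
use warnings;

# Hypothetical helper: choose a comparison according to the runtime
# type of the case value, mirroring the four rules in the text.
sub matches {
    my ($topic, $case) = @_;
    my $type = ref $case;
    return ($case->($topic) ? 1 : 0) if $type eq 'CODE';    # subroutine: call it
    return ($topic =~ $case ? 1 : 0) if $type eq 'Regexp';  # pattern: match it
    return ($case->{$topic} ? 1 : 0) if $type eq 'HASH';    # hash: true entry?
    return ($topic eq $case ? 1 : 0);                       # string: use eq
}

# Hypothetical driver: try each case in order; the first match wins.
sub classify {
    my ($val) = @_;
    my %B4 = (B4 => 1);                       # assumed sample data
    my @cases = (
        [ 'A4',                    'paper' ],
        [ \%B4,                    'prior' ],
        [ qr/C4/,                  'boom'  ],
        [ sub { $_[0] =~ /^D/ },   'huh'   ],
    );
    for my $pair (@cases) {
        my ($case, $result) = @$pair;
        return $result if matches($val, $case);
    }
    return 'no match';
}

print "$_ => ", classify($_), "\n" for qw(A4 B4 XC4 D9 G4);
```

The dispatch on C<ref> is the part that Perl 6 makes unnecessary: there, each C<when> performs the equivalent selection itself, so the cases can be written directly rather than wrapped in references.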
A more practical example of the power, flexibility, and convenience of this approach can be seen in the following code, which guesses an encoding scheme based on the first character in the data:

C<
    given $first_char {
        when [0..9]     { $guess = 'dec' }
        when /<[A-F]>/  { $guess = 'hex' }
        when &is_ASCII  { $guess = '7-bit' }
        when %known     { $guess = %known{$_} }
        default         { die Cannot::Guess }
    }
>

Interestingly, because a C<given> statement is really a function -- one that returns the final value from any successful nested C<when> statement -- this example could also have been written as:

C<
    $guess = given $first_char {
        when [0..9]     { 'dec' }
        when /<[A-F]>/  { 'hex' }
        when &is_ASCII  { '7-bit' }
        when %known     { %known{$_} }
        default         { die Cannot::Guess }
    }
>

Considering Perl's predominantly C/Unix background, people often wonder why we chose C<given> and C<when> as keywords, rather than C<switch> and C<case>. There were two reasons. Firstly, because C<given> and C<when> read much more naturally, and are therefore much easier for non-C/Unix programmers (now the majority of Perl's users) to understand. But more importantly, we chose new keywords because the constructs they label are vastly more powerful than a mere C<switch> statement.

For a start, a C<when> always compares its value against C<$_>, whether or not it's inside a C<given>. It doesn't care whether C<$_> was set by something else entirely. So a C<when> can be used in I<any> context that has an active C<$_>, not just inside a C<given>.

For example, in Perl a C<for> loop successively aliases C<$_> to each value in the list it's iterating over, so in Perl 6 it's possible to combine looping and selection in a very efficient and readable manner:

C<
    # Perl 6 code...

    for (@events) {
        when Mouse::Over        { change_focus($_) }
        when Mouse::Click       { make_selection() }
        when Window::Enter      { change_focus($_) }
        when Window::Close      { delete_window() }
        when /unknown\s+event/  { log_event($_) }
    }
>

Here, the first four cases have class names as their values, so their C<when> statements attempt to match each event object by checking whether it belongs to that class. The last C<when>, on the other hand, uses a pattern match as its test. So that C<when> treats the event object as a string (by implicitly invoking the object's coercion-to-string method) and then checks whether that coerced string matches the specified pattern.

Just as a C<when> doesn't have to be associated with a C<given>, so too a C<given> doesn't have to depend on nested C<when>s. A C<given> statement always sets C<$_>, whether or not that C<$_> is ever examined by a C<when>. So, within a C<given> block, I<any> of the Perl constructs that default to operating on C<$_> -- including the new unary dereference operator (C<.>) -- can be used. As Listing 3 illustrates, that gives Perl 6 the equivalent (and more) of a Pascalish C<with> statement.

[ BEGIN Listing 3 -- Another use for C<given> in Perl 6 ]

C<
    01  given $obj_ref {
    02      .synchronize();
    03      %data = .get_data;
    04      given %data {
    05          .{name} = uc .{name};
    06          .{addr} //= "unknown";
    07          print;
    08      }
    09      .set_data(%data);
    10  }
>

[ END Listing 3 ]

Within the lexical scope of its associated block, the outer C<given> (line 1) aliases C<$_> to an object reference. Then lines 2, 3, and 9 use the new "unary dot" notation to call the C<synchronize>, C<get_data>, and C<set_data> methods of that object, without having to explicitly and repeatedly re-refer to the C<$obj_ref> variable.

Similarly, the inner C<given> (line 4) lexically aliases C<$_> to the C<%data> hash, enabling its various entries to be accessed without having to explicitly write C<%data> everywhere. So too, the C<print> statement at line 7 defaults to printing C<$_>, and hence prints the updated hash.

The Perl 5 equivalent of this code is considerably more cluttered with repeated referents, as Listing 4 demonstrates.
[ BEGIN Listing 4 -- Repetitious referents in Perl 5 ]

C<
    01  do {
    02      $obj_ref->synchronize();
    03      %data = $obj_ref->get_data;
    04      do {
    05          $data{name} = uc $data{name};
    06          $data{addr} = "unknown" if !defined $data{addr};
    07          print %data;
    08      };
    09      $obj_ref->set_data(%data);
    10  };
>

[ END Listing 4 ]

Perl has waited a long time for a real case statement, and the one it's finally getting is the most powerful, flexible, and generalized tool we could invent.

SUBHEAD: Takes all types

Data types and type specifications take a much more prominent role in Perl 6. It isn't that Perl 5 is weakly typed (as many people seem to think). It's just that it lacks some important compile-time type-specification mechanisms.

For example, in Perl 5 there's no (easy) way to set up a variable that is only permitted to store integers. Or only references to objects of class Widget. Or an array whose elements must be character strings. Or a hash whose values must be references to arrays of numbers.

In Perl 6 there is. When declaring a variable (with a C<my> or C<our> keyword), the type of the variable can be specified immediately after the keyword:

C<
    my Int $number;
    my Widget $obj_ref;
    my Str @strings;
    my Array of Num %counters;
>

In fact, Perl 6 variables can be more precisely typed than variables in most other languages, because Perl 6 allows you to specify both what kinds of values a variable can contain and how the variable itself is actually implemented. For example, a declaration like:

C<
    my Num @observations is SparseArray;
>

specifies that the C<@observations> variable is required to store numbers, but takes the necessary internal structure and behaviour to do so from the C<SparseArray> class (rather than from the usual C<Array> class).

Explicit typing extends to Perl 6 subroutines as well. For example:

C<
    sub Num mean(Int @vals) {
        return sum(@vals)/@vals;
    }
>

specifies that the C<mean> subroutine takes an array of integers and returns a number.
We could also write that:

C<
    sub mean(Int @vals) returns Num {
        return sum(@vals)/@vals;
    }
>

This extended form is handy when the return type is more complicated. For example, the following subroutine definition specifies that C<hist> takes an array of integers and returns another array of integers (namely, the frequency histogram it creates):

C<
    sub hist(Int @vals) returns Array of Int {
        my Int @histogram;
        for @vals { @histogram[$_]++ }
        return @histogram;
    }
>

The C<Array of Int> notation is an example of the way that compound types are composed in Perl 6. The compound type is constructed by passing the "inner" type (the one after the C<of>) to the constructor of the "outer" type (the one before the C<of>). This allows the outer type (in this example, C<Array>) to determine how to implement the required storage and behaviour to allow it to hold integers.

Note that most of the subroutines shown throughout this article have proper named parameter lists. Those parameters, just like all other variables, may be given simple or compound types, just as the C<@vals> parameter was in the first line of C<hist>.

That's "I<may> be given", not "I<must> be given". Explicit typing is entirely optional in Perl 6. It's still perfectly valid to specify a subroutine with neither formal parameter list, nor return type:

C<
    sub hist {
        my @histogram;
        for @_ { @histogram[$_]++ }
        return @histogram;
    }
>

Of course, without named parameters, you have to access the subroutine's arguments via the standard C<@_> variable. And now there's no guarantee that each element of that argument list is an integer suitable for indexing the C<@histogram> array.

But the point to remember is that this "untyped" version of C<hist> isn't untyped at all. It's simply using the default types (as Perl 5 always does). We could get precisely the same "untyped" effect with explicit typing:

C<
    sub hist(Scalar @_) returns Array of Scalar {
        my Scalar @histogram;
        for @_ { @histogram[$_]++ }
        return @histogram;
    }
>

So strong typing isn't optional in Perl 6. Only I<explicit> strong typing is.
In the absence of type specifications, Perl will simply use its default types. And, in those situations where more type-precision is called for, you can explicitly provide it.

This is a distinctly Perlish approach to typing: it doesn't get in the way unnecessarily, it tries to "Do The Right Thing" automatically, and it provides a specific syntax to override the defaults for those times when Doing the Right Thing isn't quite the right thing to do.

SUBHEAD: Avaunt, foul homonyms!

One of the few ways in which Perl 5's "natural language" approach doesn't seem to work so well is in the naming of some of its built-in control structures and functions.

For example, the C<do> keyword has three entirely unrelated purposes (it marks expression blocks, calls subroutines, and loads source code from a separate file). Likewise, the C<eval> keyword is used both to compile and execute text strings and to intercept exceptions. The C