A Tour of Parrot

Besides serving as what is hoped to be an appropriate and effective platform for Perl 6 and other languages, Parrot is a grand experiment. Is there a better way to write a compiler than using the lex and yacc model? Can languages interoperate at a deeper level than sharing a set of calling conventions? Here are some answers.

Like the best April Fool’s jokes, Parrot took on a life of its own.

Simon Cozens, the editor of Perl.com, published a joint interview with Perl designer Larry Wall and Python designer Guido van Rossum announcing that both languages were to merge into a single language named “Parrot” (http://www.perl.com/pub/2001/04/01/parrot.htm). As Perl 6 was under heavy design at the time, it seemed obvious to Larry (or so said the interview) that merging with Python was appropriate. Within a week, Simon owned up to the hoax (http://www.oreillynet.com/pub/a/oreilly/news/parrotstory_0401.html).

Somehow, though, the idea stuck. Perhaps merging two languages with different philosophies and designs is a bad idea, but allowing them to interoperate on a single virtual machine — to use Perl libraries from Python, and vice versa, in the same program — might have tremendous benefits. Besides, Perl 6 needed a new virtual machine (VM) anyway, as the internals of Perl 5 were aging and it wasn’t clear that the code could cleanly or easily support the new features of Perl 6. Thus, work actually began on Parrot, a virtual machine designed to allow dynamic languages to interoperate.

Features of the Design

The short list of useful languages includes Python, Ruby, Lua, Tcl, PHP, Scheme, and JavaScript/ECMAScript. A dream list might include older languages, such as PL/1, Perl 1, and Smalltalk. Parrot’s original architect, Dan Sugalski, even ported an aging 4GL compiler to Parrot to provide a better platform for a client.

With a VM that supports all of those languages, there’s no reason not to allow people to build new languages, especially little languages designed for a simple goal or aimed at one particular problem domain. The tools required to build a big language like Perl 6 should certainly be powerful and effective enough to build a smaller language.

Beyond better compiler tools, code reuse is also nice. Why should every language implementor have to port to various Unix- like platforms with different notions of how complex numbers work? Why would anyone want to write yet another “mark and sweep” garbage collector? How many regular expression libraries or Unicode collation routines do you want to debug? (If any of these tasks sound like fun, Parrot could use you!)

However, there are already plenty of good virtual machines in the world. Why build a new one? Parrot’s design choices are different from those of other systems, for good reasons.

Registers

A simple compiler design uses a stack to pass arguments and results. For example, imagine a simple calculator that multiplies the numbers 6 and 9. While you might write this operation as 6 x 9 or 6*9, the compiler likely represents this operation in a tree structure, where the root is the multiply operator and the children are the integers 6 and 9:

multiply  |  +-- 6      |  +-- 9

When the virtual machine executes these operations, it first executes the operation of each child and then executes the operation of the parent. This depth-first traversal allows each child node to be evaluated first — for instance, perhaps the child nodes of the multiple operator are expressions themselves, such as 2 x 3 — so that the machine evaluates all of the elements of the expression in the proper order.

CPUs and virtual machines don’t necessarily run trees, though. Trees are effective representations of programs, but they aren’t always the most concise or efficient form for execution. A simple assembly language’s representation of this program might be:

push 6push 9multiply

That is, first push the arguments onto the stack, then multiply the two topmost entries together. In addition to performing the computation, multiply also affects the contents of the stack. Why? Because multiply might look like:

pop :r1pop :r2_asm_mulpush :r0

Here, multiply pop s the first argument off the stack and stores it in an actual machine register, does the same for the second argument, calls the native multiply operation, and then pushes the result, found in register 0, onto the stack again.

Because every operator uses the stack, it is simple to translate from the tree form of operations shown earlier into expressions, and it’s easier to perform nesting operations. 2 x 3 would already be on the stack in the correct place, if that expression tree replaced the constant 6.

Perl 5 uses this system, walking the optree in execution order and pushing to and popping off of a stack. Perl 5 consequently spends a lot of time manipulating the stack. Every built-in operation accesses the stack. While this model is conceptually simple, supporting variadic or complex arguments (such as those passed by reference or aliased parameters) becomes more difficult. Perl 5′s reference-counting scheme is also sometimes at odds with the stack.

Parrot chooses an alternate approach: It uses registers, as does the CPU of your system. To multiply two numbers, Parrot puts the numbers in two appropriate registers, and invokes the multiply operator. It performs the multiplication and puts the result in a third register.

Why is this useful? First, it removes some complexity from built-in operators. Instead of manipulating the stack, they can access the memory locations of registers directly. While it’s possible to access the stack as if it were a series of contiguous registers, the conceptual overhead of pushing several elements onto the stack before a call, popping those elements off of a call, pushing several elements onto the stack before a return, and popping those elements off of the stack after a return, is significantly different than accessing specific registers.

Second, in certain cases, registers can map directly to actual CPU registers, bypassing the stack-to-CPU operation in the multiply example. (Simple operators executed often in tight loops — CPU-bound number-crunching or bit-twiddling — benefit most from this optimization.)

The final advantage is greater. Parrot uses a system known as Continuation-Passing Style, or CPS, to pass control between operations. A continuation represents a point of control — think of it as the context of an operation. CPS passes a continuation as an invisible parameter to all operations. In Parrot, this continuation includes the register set. User code running on Parrot can even grab the current continuation and manipulate it to create its own flow of control, and can even interrupt an operation in progress and replay it later.

A stack machine could indeed simulate registers through simplified stack offset access, but there are memory and garbage collection implications. This approach would also complicate the use of CPS, at least in allowing user-level code to access the current continuation. For Parrot, the register model was simpler to implement.

Unlike most CPUs in existence today, Parrot provides an unlimited number of potential registers. Parrot’s internal bytecode compiler determines the optimal amount of registers necessary for any particular compilation unit.

Parrot Magic Cookies

Parrot has four base types: integers, floating point numbers, strings, and Parrot Magic Cookies. Everything that’s not one of the former three types is a PMC, including everything from complex numbers, to aggregate data structures, continuations, and objects (in the object-oriented sense).

Parrot also has four types of registers: integer, number, strings, and PMCs. Where a stack-based machine might push mixed arguments onto a stack before a function call, Parrot stores incoming arguments in specific registers. The function knows to look in appropriately typed registers for the correct parameters, obviating the need to check types (or assuming correctness) when popping arguments off of the stack.

A PMC is an object, in the sense that each one encapsulates its own data and interacts with the rest of the world through well-defined interfaces. This is an idea borrowed, in part, from Perl 5, which defines all language-visible data structures in terms of Scalar Values or SVs. Every type of value in Perl 5 is an SV or a descendent of an SV that adds behavior. For example, Array Values, or AVs, support aggregate operations and indexed access to data.

Unfortunately, Perl 5 reaches into the internals of SVs everywhere. This forces all current and future data structures to fit in a certain layout and structure in memory. There’s little encapsulation at the lowest levels. Supporting new operations or changing the way Perl behaves for even just one specific instance of a data structure is complex and difficult.

PMCs fix that design flaw. Every PMC supports a minimal set of behavior known as the vtable interface. Like classes, PMCs can extend this interface with other behavior and can inherit from other PMCs, but the uniform access principle and the enforced encapsulation remains. It’s not important to know how a PMC implements a behavior, merely that it does.

Consider creating an object that supports indexed access into a collection, such as an array. In Perl 5, this requires setting a flag marking a special type of magic behavior on the SV. Every operation in Perl 5 that accesses an SV’s internals has to check for specific magic flags. If any are present, the operation must perform different behavior. Multiply every operation that accesses the internals of an SV by over three-hundred operators in Perl 5 by several different types of magic and you can readily see the problem.

In contrast, a PMC’s vtable maps its required behaviors to code that performs the appropriate behaviors for that PMC. Changing the behavior of a particular PMC requires you to change only the association for that behavior. None of the code that merely uses the PMC knows that the behavior has changed at all. Parrot’s operators are blissfully unaware of the internal structure of PMCs.

PMCs play important parts in many of Parrot’s best features. In particular, the vtable approach allows PMCs written in multiple languages to interact well. Though the syntax for working with integers or lists or large numbers (to say nothing of classes and objects and functions) differ between Perl 5, Perl 6, Python, Ruby, Scheme, and Lisp, language-specific PMCs still maintain the same interface to get a raw string from a PMC, to invoke a PMC representing a function, or to look up a method on an object. Though interoperability is still a difficult problem to solve, providing a complete and basic set of behavior removes some difficulties.

Compiler Tools

Though a virtual machine’s internal instructions often resemble an assembly programming language — and especially in Parrot’s case, with its pervasive use of registers — there’s no requirement that its native programming language provide only the features of an assembly language.

Parrot’s native language is PIR, the Parrot Internal Representation. It’s an object-oriented, garbage-collected assembly language with continuations, coroutines, and complex data structures. Anything you can do in a higher level language, you can do in PIR; that’s how other languages run on Parrot.

PIR is an interesting platform for development. Though it supports all of Parrot’s interesting features, it’s still a line-oriented assembly language. Certain operations are more tedious than they would be in a traditional programming language. In particular, working with nested data structures can take multiple lines of code compared to Perl, Python, or Ruby. As well, there’s little support for advanced flow control (beyond jumps and branches) and no support for nested expressions.

PIR is sufficient for writing compiler tools, however. Parrot includes a suite of tree transformation tools, as well as a powerful grammar system based on Perl 6 grammars (http://dev.perl.org/perl6/doc/design/syn/S05.html).

To implement a language on Parrot, you first write a grammar describing the syntax of the language. PGE, the Parrot Grammar Engine, compiles this grammar into a Parrot program, which builds a parse tree from source code. Each node in the tree has a name corresponding to the rules and tokens in your grammar.

Names are important. TGE, the Tree Grammar Engine, uses a set of rules to transform one type of tree into a second type of tree, one node at a time. A very simple language might need only one transformation from the parse tree into a tree that can become PIR code or Parrot bytecode, while a complex language such as Perl 6 may have at least four stages, with additional optimization stages possible.

Why not translate directly from source code into machine instructions? It’s possible to do so for very simple cases, but general purpose programming languages are often too complex to translate with such ease.

Let’s look at a simple case. A parser for the very simple multiplication language shown earlier could be as concise as:

  rule operator {    | '*'    | 'x'  }  rule term {    | <integer>    | <number>  }  rule expression {    | <term>    | <expression> <operator> <expression>  }

An operator is a multiplication symbol, either the lower-case x or the * character. Within the grammar, the vertical bar, or pipe character, marks alternatives.

A term can be an integer or a number. (Assume that these are rules built into Parrot, and that they conform to your idea of how an integer and a floating-point number should look.)

An expression can be either a term or an expression followed by an operator and another expression. This isn’t necessarily the optimal way to write a grammar, but it demonstrates essential features of the grammar.

When PGE parses the program 6 x 9, it produces a parse tree that resembles this:

+- expression (6 x 9)|+--- term (6)|+--- operator (x)|+--- term (9)

Compare that to the optree representation earlier, which had the term s as the children of the multiply operator. Again, it’s possible to transform a tree this simple into the optree shown earlier with some degree of ease. Consider a more complex program, however:

6 * 3 * 3

There are two possibilities for the parse tree, depending on an important characteristic of the parser (which is unimportant to this specific expression since the order of mathematical operation is arbitrary). It could resemble…

+- expression (6 x 3 x 3)|+--+ expression (6 x 3 )|  ||  +--- term (6)|  ||  +--- operator (x)|  ||  +--- term (3)|+--- operator (x)|+--- term (3)

… or it could resemble this:

  +- expression (6 x 3 x 3)  |  +--- term (3)  |  +--- operator (x)  |  +--+ expression (3 x 3)     |     +--- term (3)     |     +--- operator (x)     |     +--- term (3)

In both cases, the nested expression complicates transformation into machine instructions. It’s still fairly easy to produce the correct answer because both operations are multiplication, and it doesn’t matter if you multiply six by three or three by three first, as you still get 42 as the answer (at least in this programming language).

What if you follow the precedence rules of arithmetic and introduce a different operator? Both the order of parsing and the order of evaluation matter! A translation from the input form to machine instructions grows more difficult.

With Parrot’s compiler tools, you write one rule for each type of token you need to transform. For this language, it would look something like:

transform past (expression) {    op      = node[ 'operator' ]    l_exp   = node[ 'expression'; 0 ]    r_exp   = node[ 'expression'; 1 ]        l_node  = tree.transform( l_exp )    r_node  = tree.transform( r_exp )        past_op = PAST::Op.new( :name( op.value ) )        past_op.add_child( l_node )    past_op.add_child( r_node )        return past_op}

That’s not exactly the Parrot syntax, but it’s similar. TGE starts at the top node of the tree to transform and calls the transformation rule that corresponds to the name of the node. That transformation rule creates nodes in the transformed tree by collecting and transforming information from the input tree into a form more suitable for producing PIR or Parrot bytecode. In this case, the first step is to turn a parse tree into a Parrot abstract syntax tree (PAST).

For every expression, this rule creates a PAST::Op object, which represents an operator. The name of this object comes from its value in the parse (x or *). Each op object has two children, either PAST::Val objects created by the term rule, or PAST::Op objects called from processing the expression rule again. The transform method calls on the expressions retrieved from the node itself produce either node appropriately.

The resulting parent PAST::Op returns to whichever rule called this rule. The final tree may resemble:

+-- PAST::Op (name: x)|+---- PAST::Val (6)|+---- PAST::Val (9)

This is much closer to the eventual virtual machine operations. Converting this to simple machine code is, again, trivial. Why bother with tree transformations, when transliteration would suffice? The PAST for the nested operations (6 x 3 x 3) is instructive:

+-- PAST::Op (name: x)|+---- PAST::Val (6)|+-+-- PAST::Op (name: x)    |    +---- PAST::Val (3)    |    +---- PAST::Val (3)

Generating the correct output while respecting mathematical precedence requires evaluating the expression 3 x 3 before the expression 6 x nested result.

A second tree transformation stage works much like first. It converts PAST to POST (Parrot Opcode — or Output — Syntax Tree). POST is close enough to PIR code that a final tree transformation can produce PIR code which you can inspect, compile, or run directly. A planned transformation layer will produce Parrot bytecode directly, which can run on machines without a Parrot compiler, just the runtime libraries.

Fortunately, most of the languages hosted on Parrot have similar requirements and can share most of the node types and many of the transformation and optimization rules.

Because all Parrot-hosted languages eventually compile down to Parrot bytecode (whether directly or through PIR), they can interoperate at this level as well. For example, the Pheme language — a Scheme variant — has a self-hosting test suite. All of the tests are themselves valid Pheme code that takes advantage of a testing library written in PIR and made available to Pheme through a few lines of glue code. itself all written in PIR.

By the way, while you cannot actually define generic functions in Pheme (also sometimes called multiple dispatch, where the type of the parameter — string, number, object — determines which variant of several synonymously-named functions to call), it can use them.

Parrot supports generic functions natively, and the PIR test library takes advantage of this feature. In particular, the is() function, which compares two items for equality, has several different implementations. One compares two integers; one compares two floating-point numbers; and one compares two strings.

Pheme knows nothing about this system. Just call in with two arguments, and Parrot calls the correct version of the generic function appropriately:

  (is (/ 999  3)  333) "integer division")  (is (/ 105 10) 10.5) "integer division producing a float")

(Pheme’s source code resembles the operator-centric tree in some ways but not in others.)

It’s not important to understand generic functions or the mechanics and semantics of the Pheme language. What is important point is that compiler writers and language designers can take advantage of Parrot features even if those features are not part of the languages themselves.

Building Parrot

Building Parrot requires a working Perl 5.8 installation. 5.8.6 and newer are better, but 5.8.0 should work. For best results, install the Bundle::Parrot module from the CPAN first (see http://search.cpan.org/perldoc?Bundle:: Parrot, or read perldoc CPAN for assistance).

Parrot also requires a working C compiler and environment, including a make utility, a linker, and the appropriate system headers. GCC 3.x works, though GCC 4.x is better.

The most recent released version of Parrot is always available within a day or two of release from http://www.parrotcode.org/source.html. That page also describes how to download the current version from the public Subversion repository. If you have Subversion installed, getting the tree is as easy as:

$ svn co https://svn.perl.org/parrot/trunk \   parrot 

Unlike most new free software programs, Parrot does not use autoconf and automake. Instead, configure Parrot with:

$ perl Configure.pl 

This probes your system and create necessary Makefile s. From there, build the system with make and run the tests with make test. In a few minutes, you should have a shiny new Parrot to play with.

If not, please read README and docs/intro.pod for advice and instructions on reporting bugs and getting assistance from either the mailing list ( parrot-porters@perl.org) or IRC (#perl on irc.perl.org).

If you’ve installed Bundle::Parrot or if you already have Test::TAP::HTMLMatrix installed, run the full test suite for Parrot itself or for the bundled languages hosted on Parrot with make smoke and make languages-smoke. These targets collect your results and report them to the Parrot smoke server (http://smoke.parrotcode.org/smoke), which allows the development team to catch regressions and failures quickly.

What’s in the Tree

Parrot is a large project, and it has several distinct pieces. The source tree reflects this.

  • src contains most of the source code for Parrot itself. For the most part, each subsystem has its own subdirectory: io, gc (memory management), jit (the Just-In-Time compiler to native machine instructions).

  • src/pmc contains .pmc files, which are pre-processed files that become C code that implements the various core PMCs. Read the documentation with perldoc pmc_name.pmc.

  • src/ops contains .ops files, which are pre-processed files that become C code that implements various core operations. Read this documentation with perldoc ops_category.ops.

  • languages contains several languages implemented in part or in whole on top of Parrot. In particular, abc is the easiest example to understand, while perl6 is the most complex, and either punie or tcl is the furthest along.

  • compilers contains various compilers and compiler tools, of which pge and tge are the most important right now.

  • compilers/imcc contains Parrot’s internal PIR compiler.

  • docs contains developer documentation. Of particular note is docs/pdds, which contain the Parrot design documents, the specifications for various Parrot subsystems.

  • t contains several subdirectories of automated test files to exercise Parrot’s defined and existing behavior, as well as adherence to coding standards. (Each language itself has its own test suite in a subdirectory of t.)

  • lib contains Perl modules written and included to assist in building, configuring, and testing Parrot.

  • runtime contains PIR modules written to assist in testing or using Parrot, as well as a small standard library.

  • examples contains standalone example code, including graphical games written using Parrot.

It seems like a lot of code, and it is. As of this writing, there are nearly 3,200 files in the tree. Yet the design of Parrot means that compiler writers rarely need to know the internals of Parrot, beyond where to find documentation on the features and the interfaces of specific operators and PMCs.

Concluding Thoughts

At the end of 2006, Parrot moved to a monthly release cycle, with a dedicated Bug Day conducted in #parrot on irc.perl.org the Saturday before the release. Since that time, development velocity has increased rapidly. By the time you read this, Parrot will likely have its underlying object system implemented fully, with the compiler tools and Perl 6 using this system. (This is the milestone for the 0.5.0 release.)

There are plenty of other available development opportunities, from testing and patching with various compilers and operating systems, to minor Perl and C programming tasks to major cleanups and feature additions.

Besides serving as what is hoped to be an appropriate and effective platform for Perl 6 and other languages, Parrot is a grand experiment. Is it possible to build a large, cross-platform system in C by using pervasive code generation? What concurrency models are appropriate in a virtual machine? Can languages interoperate at a deeper level than sharing a set of calling conventions? Are there better ways to write compilers than using the lex and yacc model?

Parrot’s design reflects some unconventional choices. The design and implementation process hasn’t always been easy, but given current commit rates and the incredible amounts of progress made in recent months, it may be time to give Parrot another look — or a first look.

Fatal error: Call to undefined function aa_author_bios() in /opt/apache/dms/b2b/linux-mag.com/site/www/htdocs/wp-content/themes/linuxmag/single.php on line 62