Programming PIR: Build applications in Parrot’s native programming language

Parrot is not just a virtual machine for running dynamic languages; it also includes several tools for building dynamic languages, including a grammars engine and a tree transformation system. Currently, all of these tools are available through PIR, Parrot’s native programming language. PIR is an assembly language that more or less resembles the actual operations the virtual machine performs when it executes a program. It’s possible to represent every program written in a high-level language hosted on Parrot as a PIR program. You just don’t always want to.

(Actually, Parrot bytecode, the actual instructions Parrot executes, is itself a binary form of an even lower-level language called PASM, or Parrot Assembly. PIR stands for Parrot Intermediate Representation. You can program directly in PASM if you like, but PIR adds a few syntactic elements that make it much more pleasant, in particular a simpler syntax for function calls and parameters. You can safely ignore the existence of PASM.)

The PIR layer is also the point at which most language interoperability will occur. That is, because all code running on Parrot is Parrot bytecode, calling a Perl 6 function from a Cardinal (Ruby on Parrot) program that returns an object written in pure PIR should look to Parrot as if all three languages were PIR. (The semantics of native types in each language may differ, and rightfully so, but language interoperability should never change the semantics of data that moves across language barriers.)

In any case, whatever your interest in Parrot, learning PIR is valuable. If you want to write a compiler, right now you need at least some ability to write PIR to maintain your compiler. If you’d like to help develop Parrot, most of the test suite uses PIR very heavily. If you’re curious about virtual machines or have never used an assembly language before, Parrot runs on multiple platforms and can help demonstrate advanced language concepts in a different way.

Opcodes

The core syntactic element of the PIR language is an opcode. Nearly everything else is a hint to the compiler. Even the simple “Hello, world!” program shows the opcode form:

print "Parrot wants a cookie!\n"

(If you’re reading this by a computer with Parrot compiled and working, don’t try to run this program yet.)

In this case, print is the opcode. It takes one argument, a string. If you look in Parrot’s IO ops documentation (found in the source tree in docs/ops/io.pod, after a successful build of Parrot), you’ll see several variants of the print opcode, varying only in the number and types of arguments. For example, print can also take a single integer, a floating point value (or number), or a PMC. Thus you can also write…


print 77
print 4.2
print some_pmc

… provided, of course, that you have already declared a PMC named some_pmc and that it can stringify itself appropriately. That will make more sense momentarily. Unlike Perl, however, print does not take a list of items to print. This is an invalid operation in PIR:

# invalid PIR code
print 1, 2, 3

Parrot enforces strict typing on its opcodes; the PIR compiler yields an error if you use an opcode with invalid arguments. Though this design decision may seem complex and arbitrary, it actually simplifies most of the code representing Parrot opcodes: it performs very little type checking at run time and can be small and fast, which is important, because a normal program may execute millions of operations.

All of the work that happens in a PIR program occurs in opcodes. Once you know the form of opcodes, you can decipher the arguments. First, you have to understand a little bit about how Parrot processes those arguments.

Registers

As was mentioned last month, Parrot is a register-based machine, not a stack-based machine. This affects opcodes strongly. In the print example, the four single-argument variants of print each take a different argument type in a different register. Each prints the contents of the appropriate register to standard output. The basic way of writing these four variants is:

print S0      # print a string
print I0      # print an integer
print N0      # print a number
print P0      # print a PMC's string representation

Each register — string, integer, number, or PMC — has a unique name, starting with the first letter of the type of value it holds and followed by one or more digits. You can refer to registers directly, as in this example, but that practice has some limitations. (In particular, it can change the behavior of Parrot’s register allocator such that your program works less efficiently than it could otherwise. Worse, using registers with high numbers can make your program use more memory than it needs.) A better alternative is to use symbolic registers. The only syntactic difference is that symbolic registers look like $S0, $I0, $N0, and $P0. That is, they have a leading $ character. The Parrot compiler is free to renumber and shuffle these registers into literal registers as necessary, and does so as efficiently as possible.

All of this knowledge should suffice to understand a slightly longer Parrot program:

set $S0, "Parrot wants a cookie!"
length $I0, $S0
print $S0
print " is "
print $I0
print " characters long!\n"
split $P0, ' ', $S0
set $I1, $P0
div $N0, $I0, $I1
print "There are "
print $I1
print " words and around "
print $N0
print " characters per word.\n"

The program creates a string in virtual string register $S0. It takes the length of the string in characters and stores that value in virtual integer register $I0, then prints a message to that effect. Next, the program splits the string on the space character and stores the resulting PMC in the virtual PMC register $P0. Finally, it stores the number of elements in the PMC in $I1 and divides the total number of characters in the original string by the number of words in the sentence, to give a not-quite-useful statistic.

There’s one tricky opcode in this program, set $I1, $P0. It may help to think of this as a sort of cast or a context-based coercion, in Perl 5 terms. It also demonstrates an important property of opcodes: the first register in a multi-register opcode is always the destination. You can read this operation as “set the integer register $I1 from the PMC in $P0.”

What will $I1 contain? It must be an integer, or this program would never work. Like evaluating an array in scalar context in Perl 5, this operation produces the number of elements in the array. However, the context here is an integer context and the relevant opcode is set $I1, $P0. Internally, Parrot calls a vtable entry on the PMC to fetch the integer value of the PMC. It doesn’t matter what type the PMC is; all PMCs have this vtable entry. (Some of them may throw exceptions, but none crash Parrot.)

Additional Syntax

Though the standard opcode form wins points for regularity, it’s not always the clearest syntax. This is particularly true for assignments, such as length $I0, $S0 or set $I1, $P0. PIR allows an alternate form for such opcodes; this form resembles that of a procedural language more directly:

$I0 = length $S0
...
$I1 = $P0

In general, this syntax works for all opcodes that take a destination register. For the sake of maintainability, use it only where it makes the most sense.

Speaking of maintainability, you’re probably sick of reading and remembering what $I0 is versus $I1. That’s why PIR has named registers. Think of them like variable names. Start by declaring the names:

.local string exclamation
.local int    num_characters, num_words
.local num    chars_per_word
.local pmc    words

Each declaration starts with the .local directive; next comes the register type (necessary because the names have no other connection to the register type). The final component is the actual identifier itself, which follows standard rules of identifier naming — word characters only, no overlap allowed with keywords (in this case, opcodes), and must start with a letter character.

Combining only these two additional syntactic features makes the silly counting example much more readable, if slightly longer:

.local string exclamation
exclamation = "Parrot wants a cookie!"
.local int num_characters
num_characters = length exclamation
print exclamation
print " is "
print num_characters
print " characters long!\n"
.local pmc words
words = split ' ', exclamation
.local int num_words
num_words = words
.local num chars_per_word
chars_per_word = num_characters / num_words
print "There are "
print num_words
print " words and around "
print chars_per_word
print " characters per word.\n"

If you’ve read very carefully, you probably noticed that this example replaced the use of the div opcode with an infix /. Parrot allows this for certain mathematical operators. While the resulting code is still somewhat verbose, it’s fairly readable for an assembly language. The only obvious point of complaint may be that you can only put a single expression on a line; this makes the logical print statements span multiple lines. If you can deal with that limitation, you’re well on your way to becoming a PIR programmer.

Compilation Units

Having gone through all of that work to show examples of working, if trivial, PIR programs, it’s time to confess that, as shown, they don’t actually work. A crucial point has been omitted to lead up to another piece of syntax. Fortunately, it resembles syntax you’ve already seen.

All PIR code must belong to a compilation unit. In terms of existing programming languages, this means that all code must be part of a function or subroutine. It can’t float freely around in the aether somewhere. Thus, you must define a subroutine to hold even the simple “Hello, world!” program:

.sub 'main'
  print "Now /this/ is a working PIR program!\n"
.end

Like register name declarations, subroutine declarations start with a leading period. Next comes the name of the subroutine, in this case main. Unlike register names, subroutine names can contain all sorts of interesting characters, including spaces and punctuation — if you quote them. If you make a habit of quoting subroutine names, you’ll save yourself trouble later. Consistency helps tremendously when programming in an assembly language.

As in many languages that force all code into compilation units, there must be a single, unambiguous entry point into a program. In Parrot, that’s the main subroutine. Unlike C and Java, for example, the actual name of this subroutine doesn’t matter. If you have more than one subroutine in a file, the first one that appears is the main subroutine. (If you have only one subroutine in a file, it becomes the main subroutine by default.) Otherwise, you can mark a subroutine explicitly as the main subroutine, in which case it can appear anywhere in the file.

.sub 'print_world'
  print "world!"
.end
.sub 'main' :main
  print_hello()
  print_world()
  print "\n"
.end
.sub 'print_hello'
  print "Hello, "
.end

The marker is the :main adverb attached to the first line of the subroutine declaration. This example also shows how to call subroutines. Simply refer to them by name and add any arguments in the parenthesized list.

More on PMCs

The examples shown last month used an Array PMC, but glossed over most of the type’s behavior. The split opcode returns an array-like PMC of some specific type.

...
.local pmc words
words = split ' ', exclamation
$S0   = typeof words
print "Array is a "
print $S0
print " PMC\n"
...

When given a PMC as an argument, the typeof operator returns a string representing the specific type of that PMC. If you think of PMCs as objects in PIR, you’re mostly right. In this particular case, the result is ResizableStringArray, which means that it contains only String PMCs and that you can add or remove elements arbitrarily; there’s no fixed size.

Aggregate PMCs contain only PMCs or strings as values, never numeric primitives. The built-in Hash and Array PMCs autobox integers and numbers when storing them and autounbox them when retrieving those values. You generally don’t have to think about this, but if you were curious about how Parrot handles pass-by-value versus pass-by-reference semantics, this will give you an interesting hint.
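To see the boxing in action, here is a minimal sketch that stores a native integer in a Hash and then fetches it back both as an integer and as a PMC. The cat_ages hash and the age stored in it are made up for illustration, and the syntax assumes a Parrot of the same vintage as the rest of this article.

```pir
.sub 'main' :main
    .local pmc cat_ages
    cat_ages = new 'Hash'

    cat_ages['Jack'] = 7          # store a native integer; the Hash boxes it

    .local int age
    age = cat_ages['Jack']        # fetch the value back as a native integer

    .local pmc boxed_age
    boxed_age = cat_ages['Jack']  # fetch the same value as a PMC

    $S0 = typeof boxed_age        # ask the PMC what type it is
    print "Jack is "
    print age
    print ", stored in a PMC of type "
    print $S0
    print "\n"
.end
```

Which form you get back depends only on the type of the destination register, not on how you stored the value.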

You also saw that retrieving the integer value of an array PMC gives you the number of elements in the array. Similarly, assigning an integer value to the PMC sets the number of elements in the array:

...
.local int num_words
num_words = words     # words is 4, so num_words is now 4
words     = 2         # words is now 2; the final two elements are gone
num_words = words     # num_words is now 2
...

This is useful if you want to pre-extend an array (especially a non-resizable array). It demonstrates that many of the assignment possibilities from primitives to PMCs are bidirectional. In the same way, creating PMCs from primitives and retrieving primitive values from PMCs should look natural:

.local pmc boxed_string
boxed_string    = new 'String'
boxed_string    = 'Brad and Jack are cats'
.local string cat_fact
cat_fact        = boxed_string
.local pmc boxed_integer
boxed_integer   = new 'Integer'
boxed_integer   = 42
.local int ultimate_answer
ultimate_answer = boxed_integer

You get the idea. The only new syntax there is the use of the new opcode. It creates a PMC of the named type. (The same syntax instantiates objects from classes defined in Parrot, but you don’t need to know that to use PMCs. Objects and classes in Parrot are themselves PMCs.)

With that digression complete, it’s possible to talk about more interesting features of aggregate PMCs. Getter and setter access for array and hash elements uses the keyed syntax:

.local pmc cat_colors
cat_colors         = new 'Hash'
cat_colors['Jack'] = 'brownish gray'
cat_colors['Brad'] = 'brown, dark brown, and white'
$S0                = cat_colors['Jack']
print "Jack is "
print $S0
print "\n"

Similarly, access array elements numerically with numeric keys between square brackets:

.local pmc book_list
book_list    = new 'Array'
book_list    = 10
book_list[0] = 'Nine Princes in Amber'
book_list[1] = 'The Guns of Avalon'
...
book_list[9] = 'Prince of Chaos'
$S0          = book_list[4]
print "Book 5 (arrays start at 0) is '"
print $S0
print "'\n"

Here, the only line worth noting is the third one, which sets the size of the array to 10 elements. It’s also possible to create a ResizableStringArray and push the books onto it instead. Only the initialization step would change; the array access remains the same and the assignment of 10 would be unnecessary.
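A rough sketch of that ResizableStringArray variant might look like the following. It assumes the push opcode with a PMC destination and a string value, and it pushes only the two titles named above rather than guessing at the rest of the list.

```pir
.sub 'main' :main
    .local pmc book_list
    book_list = new 'ResizableStringArray'

    # No size assignment is needed; the array grows with each push.
    push book_list, 'Nine Princes in Amber'
    push book_list, 'The Guns of Avalon'

    # Element access is the same as with a fixed-size array.
    $S0 = book_list[1]
    print "Book 2 (arrays start at 0) is '"
    print $S0
    print "'\n"
.end
```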

Accessing aggregates with literal keys is easy enough, but it’s not always sufficient. Sometimes you need a little bit more indirection. That’s why you can use values in registers (named or not) or PMCs as keys:

.local string cat_name
cat_name = 'Brad'
$S0      = cat_colors[cat_name]
print cat_name
print " is "
print $S0
print "\n"
...
.local pmc book_number
book_number = new 'Integer'
book_number = 9
$S0         = book_list[book_number]
print "Book "
print book_number
print " is '"
print $S0
print "'\n"

Similar to Perl 5, it’s important to be mindful of the context of particular operations. For example, sometimes it’s better to be explicit about retrieving the string value from a PMC than relying on Parrot to do it for you automatically. It’s not that Parrot can’t or won’t do so, but that it’s not always obvious when looking at PIR code what you expect to happen. You may choose to eschew the use of numbered registers, even symbolic registers, except in well-known idioms, and then almost never anything more than $S0 or $I0.

Sometimes even accessing elements through variables isn’t enough; sometimes you need to iterate over the contents of an aggregate. In those cases, you need a little bit of control flow and the Iterator PMC. Hang on, there’s still a little bit of syntax remaining. First, here’s how to iterate through an array:

.local pmc iter
iter  = new 'Iterator', book_list
iter_loop:
    unless iter goto iter_end

    .local pmc title
    title = shift iter

    print title
    print "\n"

    goto iter_loop
iter_end:

The first new piece of syntax is the use of the additional argument to the new opcode for an Iterator. Pass in the PMC over which you wish to iterate. The rest of the code is a standard iterator idiom.

The outdented lines with colons are labels, as you might find in C or similar languages. At this point, you may have forgotten briefly that PIR is still an assembly language. Let this goto-based control flow remind you that it is, and deliberately so. These labels exist as targets for the two goto operations.

Once you have created an iterator, its boolean value is true if there are elements remaining in the iteration. Think of this like a foreach loop in languages that support such syntaxes for collections. The unless...goto... statement checks the truth or falsehood of the register given as its first argument when deciding whether to jump to the label given as the second argument. While there are elements left in the iteration, iter is true in a boolean context, and Parrot continues to the next statement. When the iterator is exhausted, iter is false in a boolean context and Parrot jumps to the iter_end label.

Within the body of the loop (it may not look much like a loop to you now, but it is a loop, and this is the standard Parrot idiom for iteration over an array), you can shift the current element off of the iterator. This operation must be destructive, or else the iteration would never complete. Of course, it’s only destructive to the iterator itself, not to the aggregate or to any other iterators on the aggregate. (If that doesn’t mean anything in particular to you, just remember that it probably does exactly what you expect.)

If you want to iterate over the indices of an array, you don’t even need an iterator. Create a similar loop, checking a counter against the number of elements of the array as your break condition.
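As a sketch of that counter-based loop, the following walks a small array by index. The label names and the two-element array are only for illustration; the loop relies on nothing beyond the integer value of the array and keyed access, both shown earlier.

```pir
.sub 'main' :main
    .local pmc book_list
    book_list = new 'ResizableStringArray'
    push book_list, 'Nine Princes in Amber'
    push book_list, 'The Guns of Avalon'

    .local int index, count
    .local string title
    count = book_list             # number of elements in the array
    index = 0

  index_loop:
    if index >= count goto index_end   # break condition

    title = book_list[index]
    print index
    print ": "
    print title
    print "\n"

    inc index
    goto index_loop
  index_end:
.end
```

Unlike the iterator version, this loop gives you the position of each element for free, at the cost of managing the counter yourself.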

Iterating over a hash is similar to iterating over array elements, except that the iterator returns the keys of the hash, not the values. Also, most Parrot hashes are unordered (except for OrderedHash, but you don’t need to know that right now).

.local pmc iter
iter = new 'Iterator', cat_colors
iter_loop:
    unless iter goto iter_end

    .local pmc key, value
    key   = shift iter
    print key
    print " is "
    value = iter[key]
    print value
    print "\n"

    goto iter_loop
iter_end:

One new syntactic element here is the declaration of multiple named registers in a single line. One interesting point is that you can access the value through the iterator without having to refer to the original hash. This is convenient and allows you to pass around the iterator by itself and still have access to all of the elements you want without also having to pass around the hash.

(Slightly) Advanced Subroutines

Even with the ability to deal with aggregate PMCs dynamically, programs with hard-coded data aren’t very useful. Useful programs almost universally take some sort of user input. The easiest place to do so is from the command line. As in C, all arguments on the command-line become parameters to the main function in an array. It’s easy to write a program that greets the user whose name is the single argument to the program:

.sub 'main' :main
    .param pmc args

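    .local pmc name
    name = shift args    # the first element is the program's own name; discard it
    name = shift args    # the next element is the first real argument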

    print "Hello, "
    print name
    print "!\n"
.end

There are two new features in this program. The first is the use of the .param directive, which both declares a named register and identifies a required subroutine parameter. Thus, main takes one argument, an array PMC. (This is a special case within Parrot; your main doesn’t always have to take this argument.) Unlike C, there’s no argc parameter; you can get the number of arguments in args by examining its integer value.
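Here is one way you might use that integer value as a stand-in for argc, sketched under the assumption (consistent with the longer example later in this article) that the first element of args is the program name. The usage message and label names are invented for the example.

```pir
.sub 'main' :main
    .param pmc args

    .local int argc
    argc = args                   # the array's integer value is its element count

    if argc < 2 goto usage        # only the program name was supplied

    .local string name
    name = args[1]                # args[0] holds the program name
    print "Hello, "
    print name
    print "!\n"
    goto done

  usage:
    print "Please pass a name on the command line.\n"
  done:
.end
```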

The same .param mechanism works for declaring subroutine parameters throughout Parrot subroutines. As with declaring .local variables, you must provide the type of the register and a unique identifier following the naming guidelines. Unlike .local variables, the order and placement of the declarations matters. To a certain degree, Parrot’s default argument passing is positional. That is, it works like you expect; the first PMC parameter in the list at the point of the call is the first one you get in the parameter list.

Interestingly, order matters only within a register set. These two parameter lists are equivalent:

.param int count
.param pmc item_to_repeat
...
.param pmc item_to_repeat
.param int count

Enough discussion; here’s one more piece of code that should help fix these ideas in your head:

.sub 'main' :main
    .param pmc args

    .local string progname
    progname = shift args

    .local int arg_count
    arg_count = args

    .local pmc funcs
    funcs = get_funcs()

    .local int counter
    counter   = 0

    loop_start:
        if counter == arg_count goto loop_end
        .local pmc arg
        arg  = args[counter]

        .local int idx
        idx  = counter % 2

        .local pmc func
        func = funcs[idx]

        func( counter, arg )

        inc counter
        goto loop_start
    loop_end:
.end
.sub 'even'
    .param int counter
    .param pmc arg

    print counter
    print " is "
    print arg
    print "\n"
.end
.sub 'odd'
    .param int counter
    .param pmc arg

    print arg
    print " is at "
    print counter
    print "\n"
.end
.sub 'get_funcs'
    .local pmc func, funcs
    funcs    = new 'Array'
    funcs    = 2
    func     = find_global 'even'
    funcs[0] = func
    func     = find_global 'odd'
    funcs[1] = func

    .return( funcs )
.end

Yes, it’s a silly example, but it demonstrates many of the features explained in this article. As usual, there are new pieces of PIR here.

The get_funcs subroutine shows the use of the find_global opcode. This takes the string name of a globally-stored PMC and returns it, if it exists. In this case, it looks up the Sub PMCs for the even and odd functions. These aren’t just function pointers; they’re first-class objects in Parrot. The .return() directive is how you return values from Parrot functions. This one returns only one value, but you can return as many as you like.
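Returning more than one value might look like the following sketch. The quotient_and_remainder subroutine is hypothetical, and the example assumes PIR's parenthesized result list on the left-hand side of a call for collecting multiple return values.

```pir
.sub 'main' :main
    .local int quotient, remainder

    # Collect both return values in a parenthesized result list.
    (quotient, remainder) = quotient_and_remainder(17, 5)

    print "quotient: "
    print quotient
    print ", remainder: "
    print remainder
    print "\n"
.end

.sub 'quotient_and_remainder'
    .param int dividend
    .param int divisor

    $I0 = dividend / divisor      # integer division, since all registers are ints
    $I1 = dividend % divisor

    .return( $I0, $I1 )           # two values, returned in order
.end
```

The values come back positionally, in the same way that arguments go in.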

Within main, the loop looks up either the odd or even Sub PMC in the array returned from get_funcs and invokes it through the PMC, passing the counter and the argument itself. (If you noticed that there is a potential syntactical ambiguity where local register names may conflict with function names, you’re right. Parrot provides other features which ameliorate the problem somewhat, but this PIR feature is currently an open topic for discussion.)

Conclusion

All of this knowledge is still only a fraction of the power and features of PIR (try saying that about most assembly languages). Yet it’s almost enough to write useful programs. If you’d like to experiment with other features, skim the opcode and PMC documentation in docs/ops/*.pod and docs/pmc/*.pod in a Parrot tree after compilation. Of the other documentation, docs/intro.pod, docs/faq.pod, and docs/gettingstarted.pod are decent places to continue your journey. As well, O’Reilly Media has generously donated the copyright of their Perl 6 and Parrot Essentials to The Perl Foundation, and the latest version of the Parrot chapters is available in the Parrot tree under docs/book/*.pod.

Of course, the documentation of an active project is always in flux. As with other free software projects, the Parrot developers welcome your questions and contributions.

Just imagine, if Parrot provides all of these features natively — and they’re only a fraction of the features Parrot provides — what powerful features will the next generation of dynamic languages support, and how easy will it be to build them on Parrot?
