Think writing a compiler is difficult? It is—unless you use Parrot, a complete compiler construction kit. With Parrot, crafting a new programming language is as easy as authoring a new website.
If you look back over the history of programming languages, surprisingly few new features have been added in the past four decades. Even the most exciting features of modern dynamic languages existed in early languages such as Lisp and Smalltalk. Certainly, the features have been combined in different ways, and there’s progress in the implementation details or interface, but truly new ideas are in short supply.
And yet, there are signs that the current set of programming languages and language features is inadequate for both pressing and emerging problems. For example, support for concurrency is woefully deficient. Hardware is moving rapidly down the path of multiple cores, but software already lags far behind. Few applications take full advantage of the computing power available to them.
To be sure, writing concurrent software is hard, but for all the wrong reasons. Humans are perfectly capable of parallelism (as anyone who has watched TV while reading email, eating a sandwich, and drinking a soda knows). Computers are perfectly capable of parallelism, too. But the available abstractions for humans to tell computers how to behave concurrently are hopelessly primitive. What concurrent programming needs is a flash of inspiration like the one that turned hash tables from an abstract theoretical feature into an easy to use, everyday, don’t-even-think-about-it Dictionary in Python or Hash in Perl.
But programming languages are slow to evolve. Refinement is easier than invention, and a known quantity is a safer investment than speculative research. Academic work gives weight to ideas held to be true by a large number of people, favoring ideas that have been around a while. Corporate R&D often goes unreleased, especially the new ideas that might turn out to be a competitive advantage.
There are practical hindrances to advancement as well. The state of the art has grown to the point that it can easily take 10 years to implement a thoroughly modern language with features like garbage collection, exceptions, events, objects, support for multiple character sets, and concurrency. Simply reaching the state of the art is such a monumental achievement that few projects look for what might be beyond.
Adoption patterns also play a part. Language users need more than just a bare language—they need an established body of libraries to solve real-world problems. It’s often said that Perl’s greatest advantage is CPAN, a large repository of user-contributed libraries. But CPAN represents millions of hours of developer time (a conservative estimate of 30 hours per library would be 2 million hours). It takes time to build up enough users to write the libraries, so the language can be useful enough to attract more users.
Changing the Game
So how do you accelerate programming language development? How do you change the game?
Ignoring the pace of academic research and corporate R&D policy, the practical problems are solvable. Parrot’s approach is to bundle up fundamental features into a flexible tool set, and provide access to libraries across multiple languages.
When you get right down to it, it’s a waste for each new language to spend 10 years reimplementing basics like garbage collection. It’s equally a waste to have each language reimplement the same basic modules for interacting with a database, parsing JSON, processing HTML templates, receiving and sending email, handling international dates, and all the other hundreds of little tasks developers need to do every day.
It’s much better to reuse a core implementation of garbage collection. It’s much better to just use the libraries that already exist in other languages. You might use Python’s libraries for natural language parsing, Perl’s libraries for manipulating text files, and PHP for the web interface, and tie them all together in a new language, all running on the same virtual machine.
When you cut out the 10-year core implementation hit and the time for library development, and add in a few more next-generation tools, you begin to measure time-to-completion for new languages in weeks instead of years. Parrot’s record for a language implementation was a conference session where the speaker implemented the basics of the LOLCODE language from scratch as a live demonstration. It was a 5-minute lightning talk.
What’s the impact of accelerating language development like this? The effects are just becoming clear. The past few years have seen substantial growth in the idea of altering the syntax of a language in minor ways to make an advanced feature of a particular library easier to understand. This trend was popularized by the domain-specific languages of Ruby on Rails, but it’s not unique to Ruby or Rails. That’s the first step down the evolutionary path, and an easy one because it keeps the familiar underpinnings of an existing language.
The next step is new languages. Good tools lower the risk of trying out new ideas. Spending 10 years implementing a radical new concept in programming languages, alien to users of existing languages, is a daunting prospect. Spending a few weeks, on the other hand, is quite tolerable. Successful experiments can quickly go on to solve more problems, while the unsuccessful ones can be quickly set aside, applying the lessons learned to the next language.
Dynamic to the Core
Implementing a compiler tool set for multiple languages isn’t a new idea. C compilers have been supporting both C and C++ for decades. The GNU Compiler Collection (GCC) even supports more archaic half-relatives like Fortran and Ada.
What’s new about Parrot is taking the extreme dynamic approach to multi-lingual virtual machines. In many ways, the problem Parrot is trying to solve is substantially simpler than virtual machines like Sun’s JVM or Microsoft’s CLR, which are now working to wed dynamic language features into a largely static system. Their greatest pain points are at the boundaries between static and dynamic.
Static vs. Dynamic
The key difference between static and dynamic languages is whether they perform various actions like subroutine construction, type definition, class creation, or dispatch calculation at compile-time or runtime.
An extreme case of a static language would compile everything down to machine code and allow no variation at runtime, not even user input. An extreme case of a dynamic language would provide no pre-compiled behavior, implementing the entire system, the language, and all features on the fly every time a program ran.
No existing language is entirely static or entirely dynamic; most sit near a conservative midpoint between the two extremes.
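As a small illustration of the dynamic end of that spectrum, the following Python sketch builds a class while the program is running and then adds a method to it after an instance already exists. The helper `make_class` and the names `Point` and `norm_squared` are made up for this example; a more static language would typically fix both the class and its methods at compile time.

```python
def make_class(name, field_names):
    """Build a new class at runtime from a name and a list of field names."""
    def __init__(self, *values):
        # Store each positional value under the matching field name.
        for field, value in zip(field_names, values):
            setattr(self, field, value)
    # type(name, bases, namespace) creates a class object on the fly.
    return type(name, (), {"__init__": __init__})

# Class creation happens here, at runtime.
Point = make_class("Point", ["x", "y"])
p = Point(3, 4)

# A method added after the instance was created; dispatch is resolved
# when the call is made, so the existing instance picks it up.
Point.norm_squared = lambda self: self.x ** 2 + self.y ** 2
result = p.norm_squared()
```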
Because of this focus on dynamic languages, Parrot’s architecture is substantially different from that of its peers. Many static virtual machines are stack-based, meaning they perform low-level operations by pushing values onto a global stack data structure and popping them off again for the operation. Parrot is register-based, storing all values for a particular subroutine in a small structure local to that subroutine. You can think of a register set as something like an array, where you can store multiple values at the same time, and immediately access or modify any of them by an index number (their position in the array). Registers give a modest speed gain, since operations can directly manipulate the values stored in registers, and save the CPU time spent pushing and popping the stack. Registers are also a security advantage, localizing data storage to each subroutine, making it harder to accidentally or maliciously write over critical data with the wrong values. (Didn’t your mother tell you not to use globals?)
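To make the contrast concrete, here is a toy sketch in Python of the two styles computing 2 + 3. It is not Parrot code: the instruction names (PUSH, ADD, SET) and the functions `run_stack` and `run_register` are invented for illustration and do not correspond to real Parrot or JVM opcodes.

```python
def run_stack(program):
    """Stack machine: operands are pushed, then popped by each operation."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()

def run_register(program, num_registers=3):
    """Register machine: operations name their operands by index."""
    regs = [0] * num_registers  # local to this one "subroutine"
    for op, *args in program:
        if op == "SET":
            dest, value = args
            regs[dest] = value
        elif op == "ADD":
            dest, a, b = args
            # Reads and writes registers directly; nothing is pushed or popped.
            regs[dest] = regs[a] + regs[b]
    return regs

stack_program = [("PUSH", 2), ("PUSH", 3), ("ADD",)]
register_program = [("SET", 0, 2), ("SET", 1, 3), ("ADD", 2, 0, 1)]

stack_result = run_stack(stack_program)
registers = run_register(register_program)
```

In the register version the two inputs are still sitting in registers 0 and 1 after the addition, so a later instruction could reuse them by index without pushing them again.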
Many static virtual machines use that same global stack for control flow. They handle subroutine returns by pushing a return address onto the stack together with the arguments for the call. When the subroutine is ready to return, it pops the address off the stack and jumps to it. Mixing up your user data and control flow data this way may sound a little dangerous. It is. “Stack overflow”, “stack underflow”, and “stack smashing” are well-known terms to developers and black-hat hackers, the source of many bugs and security flaws in current software.
Parrot uses the continuation for flow control, instead of a return address on a global stack. A continuation is a kind of subroutine that takes a snapshot of the state of the virtual machine when it’s created, and restores it when called. Parrot handles subroutine returns by creating a continuation when it makes a call, passing it along as a hidden argument to the call. When it’s ready to return, it calls the continuation (just like it would call an ordinary subroutine), which restores the control flow back to the point right after the original subroutine call.
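The return mechanism described above can be sketched with a miniature interpreter in Python. This is a simplified model under stated assumptions, not Parrot's implementation: the `Frame` and `Continuation` classes, the four instructions (`set`, `add`, `call`, `return`), and the `arg` register convention are all hypothetical. The point to notice is that there is no shared stack; each call creates a continuation object and hands it to the callee, and returning means invoking that object.

```python
class Continuation:
    """Records which frame to resume, where, and where the result goes."""
    def __init__(self, frame, resume_at, result_reg):
        self.frame = frame
        self.resume_at = resume_at
        self.result_reg = result_reg

class Frame:
    """One subroutine activation with its own private registers."""
    def __init__(self, code, return_cont=None):
        self.code = code
        self.regs = {}
        self.pc = 0
        self.return_cont = return_cont  # the "hidden argument"

def run(subs, entry):
    frame = Frame(subs[entry])
    while True:
        op, *args = frame.code[frame.pc]
        frame.pc += 1
        if op == "set":
            reg, value = args
            frame.regs[reg] = value
        elif op == "add":
            dest, a, b = args
            frame.regs[dest] = frame.regs[a] + frame.regs[b]
        elif op == "call":
            result_reg, name, arg_reg = args
            # Capture the point right after the call as a continuation.
            cont = Continuation(frame, frame.pc, result_reg)
            callee = Frame(subs[name], return_cont=cont)
            callee.regs["arg"] = frame.regs[arg_reg]
            frame = callee
        elif op == "return":
            value = frame.regs[args[0]]
            cont = frame.return_cont
            if cont is None:  # returning from the entry point
                return value
            # Invoking the continuation restores the caller's state.
            frame = cont.frame
            frame.pc = cont.resume_at
            frame.regs[cont.result_reg] = value

subs = {
    "main": [("set", "x", 20), ("call", "y", "add_one", "x"),
             ("add", "z", "y", "y"), ("return", "z")],
    "add_one": [("set", "one", 1), ("add", "out", "arg", "one"),
                ("return", "out")],
}
answer = run(subs, "main")
```

Because the callee only ever sees the continuation object, it has no raw return address to overwrite, which is the encapsulation argument made in the text.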
Continuation-based control flow is more secure both because it avoids mixing user data and control flow data, and because the control flow data itself is an encapsulated subroutine object rather than a raw memory address. Continuations were traditionally considered slow, because in stack-based virtual machines a continuation has to copy the entire current stack to properly capture and restore “the state of the virtual machine”. Eliminating the stack makes continuations quite speedy.
Powerful Compiler Tools
Dynamic languages are notoriously difficult to parse, largely because they emphasize syntax that’s easily understood by humans, instead of encouraging human programmers to think more like machines. Some people blame dynamic languages for being too flexible or too context-dependent, but the problem isn’t the languages; it’s the available tools. Dynamic languages need dynamic parsing tools.
Parrot provides two important pieces here, a grammar engine and a set of base compiler libraries.
- The Parser Grammar Engine (PGE) is a full recursive descent parser and operator precedence parser, with the ability to dynamically switch between the two. It’s powerful, but not difficult to use. You work with it by defining a series of rules that look a lot like simple regular expressions, telling it how to match, for example, a multiplication operation or a subroutine call in your language.
- The Parrot Compiler Toolkit (PCT) provides a lightweight language to instruct Parrot how to transform the raw parse result into a language-neutral abstract syntax tree. The backend components of the toolkit turn the abstract syntax tree into native bytecode, so the compiler writer never has to touch anything deeper than the abstract syntax tree.
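The division of labor between the two pieces can be sketched in plain Python. This is not PGE or PCT syntax, and every name here (`tokenize`, `parse_term`, `parse_expression`, `evaluate`, the tuple-shaped tree nodes) is an assumption made for the example. It shows a recursive descent rule for terms handing off to an operator precedence loop for infix operators, producing a small abstract syntax tree that a separate backend then consumes.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_term(tokens):
    """Recursive descent: a term is a number or a parenthesized expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        node = parse_expression(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return node
    if token.isdigit():
        return ("num", int(token))
    raise SyntaxError(f"unexpected token {token!r}")

def parse_expression(tokens, min_prec=1):
    """Operator precedence: fold in operators that bind tightly enough."""
    left = parse_term(tokens)
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = ("binop", op, left, right)
    return left

def evaluate(node):
    """Stand-in backend: walks the tree instead of emitting bytecode."""
    if node[0] == "num":
        return node[1]
    _, op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b if op == "-" else a * b

tree = parse_expression(tokenize("1 + 2 * 3"))
value = evaluate(tree)
```

Only the parsing functions know about the surface syntax, and only `evaluate` knows what to do with the tree, which mirrors the idea that a compiler writer works up to the abstract syntax tree and no deeper.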
The unique components for every language are its syntax (the parser) and its semantics (the abstract syntax tree). The base libraries of the toolkit also supply an extensive set of default features for the language implementations, like interactive command-line and debugging tools for the stages of compilation. The 5-minute language implementation was possible because of these default features, plus a script included in the Parrot distribution that generates a working new language shell, ready to be extended for your language.
Openness Drives Innovation
Parrot has had the rare privilege over its eight-year lifetime of seeing its primary goal move from “crazy idea” to the hot center of commercial and academic research. The world changed as the Parrot developers worked, and the team contributed to that change. Sun and Microsoft were some of the strongest critics of the whole idea of a virtual machine for dynamic languages when Parrot got started. Would they now be proudly proclaiming their support for dynamic languages if Parrot never existed? No one can say for sure, but it seems unlikely.
It’s a strong case for openness. The fact that Parrot is completely open, both the specification and the implementation, is a unique strength in pushing forward the state of the art in virtual machines, languages, and language implementations, not only in open source, but also in academic and corporate contexts. Parrot stands apart from the clash of corporate titans, but at the same time influences them. As Parrot developers, we get to take in the best ideas from a broad range of sources, and contribute back our own innovations. It’s exhilarating to be involved.
You may be interested in the nitty gritty details of how programming languages are made. You may be interested in performance or security. Or you may just be interested in getting the job done well and quickly so you can head out for a beer. Whatever your interest is, it’s worth taking a look at Parrot. You might find it a valuable addition to your toolkit.
is the lead developer of the Parrot project and plans on experimenting with concurrency features, first as extensions to existing languages, and then as the core principle of new languages.