The computing world is undergoing a sea change. As computers become more powerful, high performance computing, an endeavor historically relegated to national labs and government institutions, is becoming mainstream. For example, on a daily basis, search engine companies around the world employ farms of machines executing parallel code. Additionally, global networks of privately-owned computers harness processing power - think Folding@home and SETI@home - tackling some of the most computationally challenging problems of modern science. At the same time, microchip manufacturers, in a continuing effort to improve performance, are designing new chips that contain increasing numbers of cores (processors) on a single chip.
Ironically and unfortunately, modern programming languages are ill-equipped to keep pace with these changes. To exploit those cores, programs must execute not only in parallel, but with a degree of parallelism that varies from platform to platform. Portable programs cannot be tied to a specific number of cores, because that number will vary across machines.
The same issue arises when writing programs for large clusters: Programs must execute in parallel on a large and varying number of processors. Most languages don’t directly support any notion of parallelism, requiring the programmer to graft additional notation such as MPI or OpenMP onto programs. Even languages with support for parallelism, such as Java, support “heavyweight” notions of parallelism, where threads are explicitly defined and allocated, and the cost of creating a new thread is typically high.
Moreover, to write effective parallel programs, the programmer needs help. It’s notoriously easy to inadvertently create race conditions when writing parallel programs. Existing language features designed to control parallelism, such as locks, are inadequate because they combine poorly. If an application uses two libraries, where each maintains a locking discipline over its respective data structures, there’s no way to ensure that the composition of these libraries in an application remains free of race conditions or deadlocks, since the libraries do not use the same locks.
Compiler writers have tried to find ways to execute programs in parallel automatically, without altering the semantics of the underlying language. Unfortunately, this kind of “autoparallelization” is difficult to do because the language semantics are often violated by parallelization in minor ways.
To address these shortcomings of modern programming languages, Sun Labs is designing Fortress, a new programming language for scientific computing. Fortress was initially developed as part of the DARPA HPCS project to achieve new levels of productivity and performance in high performance computing. It is now an open source project with a prototype implementation freely available at http://fortress.sunsource.net. An active community of programmers external to Sun contributes code and provides suggestions for refining the language semantics.
Fortress includes built-in support to easily define and control parallelism in a portable manner, and also includes many other language features that make it well-suited for both scientific and general-purpose programming. For the scientific programmer, Fortress provides a syntax based on standard mathematical notation, and includes support for concepts that are important for scientific computation, such as vectors, matrices, and physical units (meters, seconds, and so on). For general-purpose programming, Fortress includes a trait-based type system that supports multiple implementation inheritance, first-class overloaded functions with multiple dynamic dispatch, and various facilities for “programming in the large”, such as contracts, automated unit testing, and a component system for developing, deploying, and upgrading libraries and other programs.
Let’s explore the features of Fortress, as well as the principles underlying the design of those features, and how you can participate in its continued development and use.
“To do for Fortran what Java did for C”
The name “Fortress” is derived from the intent to produce a “secure Fortran,” a language for scientific computation that provides built-in support for parallelism, along with abstraction and type safety in line with modern programming language principles, without compromising performance.
Fortress is intended to enable scientific programmers to write safer, more efficient and more portable code by providing many of the benefits that the Java programming language brought to C programmers: a rich object-oriented type system with static type checking, a virtual execution environment with automatic memory management and dynamic compilation, extensive libraries, and built-in support for parallel programming.
Indeed, Fortress goes significantly beyond the Java on each of these aspects. However, Fortress is not derived from Fortran, nor from the Java programming language. Rather, it is an entirely new language designed to meet the needs of scientific programmers.
High-level High-performance Computing
Unlike most parallel languages that augment a sequential language with parallel features or annotations, Fortress is implicitly parallel wherever possible, and provides constructs and annotations to serialize execution when necessary. As a result, a compiler or virtual execution environment need not concern itself with determining whether executing a program in parallel is permissible, only whether doing so is advantageous. (To be sure, this latter question is not trivial; parallel execution might not improve performance if all available processors are already fully-utilized, for example, or if moving data to an underutilized processor is more costly than doing the computation.)
For example, the arguments to a function or method (or the operands of an operator) may all be evaluated in parallel. If the order of evaluation is important, the arguments must be explicitly ordered as statements in a block. Similarly, the iterations of a for loop may be executed in parallel; operations in different iterations may be freely interleaved.
Thus, the for loop…
for i <- 1:5 do
print(i " ")
print(i " ")
end
… prints each integer from 1 to 5 twice, but they may occur in any order, as in:
2 1 3 3 5 2 5 4 1 4
If the loop must be executed sequentially, the programmer must explicitly say so, as in:
for i <- sequential(1:5) do
print(i " ")
print(i " ")
end
Parallelism in for loops is controlled by generators, which produce the values that the loop iterates over. In the examples above, the generator 1:5 generates the integers from 1 to 5 in parallel, while the generator sequential(1:5) generates the integers in order. Generators, like most types, are provided by libraries and include aggregate types such as arrays and sets. Some common generators are shown in Table One. All these generators generate values in parallel.
| Table One: Common Generators in Fortress |
| Generator |
Purpose |
j:k |
A range: the integers from j to k |
j#n |
n consecutive integers starting from j |
{0, 1, 4, 9} |
The elements of a set or list |
a.indices |
The indices of an array a |
Generators are also used in comprehensions and other reductions, which compute some function of each of the generated values and combines the results into a single value, as in:
{ i^2 | i <- 1:5 }
PRODUCT[i <- 1:n] i
C[i,k] = SUM[j <- 1:n] A[i,j] B[j,k], i <- 1:m, k <- 1:p
A naive implementation of a reduction may compute the individual results in parallel, but serialize the combining operations. If, as in all the examples above, the combining operation is associative and commutative, or it has other nice properties, it is possible to do much better. This is especially important for scientific programming, in which such reductions are common. The Fortress libraries support many different reductions, depending on the properties satisfied by the operation.
Note that Fortress specifies when evaluation may proceed in parallel (or rather, when it must not), but does not require it to do so. If there are no spare processors, for example, or distributing the computation is more expensive than doing it, or for any other reason, an implementation may execute parallel tasks sequentially, and in any order. Thus, a programmer should allow as much parallelism as possible, giving the implementation maximal flexibility to execute the program efficiently. Using a technique called work stealing, an implementation can effectively distribute parallel tasks to multiple processors with minimal overhead.
Parallel execution of tasks accessing shared memory can result in surprising behavior due to race conditions. In existing languages, these are typically avoided using locks, which limit the parallelism possible and introduce a host of software engineering problems. To avoid these problems, Fortress supports transactional memory through the use of atomic expressions. An atomic expression appears to be executed in isolation: the evaluation of an atomic expression never appears to be interleaved with operations due to other tasks. Thus, other threads never observe a partially-completed atomic expression. Atomic expressions often allow greater parallelism than locks through the use of speculative execution, and free the programmer from the burden of managing locks correctly.
On large machines, another important factor is data distribution: The cost of access to memory is extremely nonuniform, so it is crucial to place data “near” to the processor that uses it. To manage this, Fortress provides distributions, which specify “regions” of the machine where data resides and computation occurs. These regions are arranged in a hierarchy that abstractly represents the relative cost of access: computation can cheaply access data in its own or nearby regions. As with parallelism, distributions are specified by generators, so that programmers need not think about data locality unless they design their own distributions, which are provided in libraries. Distributions are only advisory, but they are a way for library writers to provide guidance to implementations as to how data should be distributed, a task that thus far compilers have not been able to do very well.
Fortress implementations are likely to aggressively use both static and dynamic compilation to improve performance, and the deployment model of Fortress is designed to support this. For example, Fortress runs in a managed environment, and thus benefits from automatic memory management (garbage collection) and just-in-time compilation. On the other hand, Fortress limits visibility into components via application programming interfaces (APIs), permitting a static compiler to optimize code within a component, as long as it preserves the behavior observable through its APIs. Furthermore, Fortress statically links components, enabling aggressive cross-component optimization at link time.
Mathematical Notation
Perhaps the most striking difference between Fortress and most existing programming languages is its syntax. As much as possible, Fortress syntax resembles the standard mathematical notation used by scientists. Fortress includes rich support for entering and displaying Unicode characters.
For example, Fortress allows the use of Greek letters (and many other character sets) as identifiers, braces to enclose sets, summation notation, and hundreds of different operators, including the use of simple juxtaposition to denote multiplication (as in x y) or function application (as in sin x), depending on the types of the operands. Furthermore, Fortress specifies rules for rendering programs according to standard mathematical conventions, and for entering programs easily with a standard keyboard.
For example, the righthand operand of the exponentiation operator is rendered as a superscript, and indices of matrix element are rendered as subscripts. Also, identifiers are rendered according to rules that allow programmers to use variables “decorated” with mathematical glyphs such as primes, “hats”, and overbars.
Using mathematical notation reduces the overhead for scientists to learn Fortress, because it looks like the equations already used in publications or when writing on paper or a whiteboard. Ideally, programs that look like the equations can reduce errors introduced in “translating” those equations into program code, and will enable scientists to more quickly find any errors that are introduced. Also, mathematical notation has developed over hundreds of years, and at its best, compactly and effectively expresses what needs to be computed, rather than how to compute it efficiently. The latter task is handled by libraries that implement the operations used in the equations, and are written by programmers who are experts in efficient algorithms, rather than scientists who are expert in their discipline.
As an example of Fortress code, Listing One shows a function that performs a matrix/vector” conjugate gradient” calculation (based on the NAS” CG” conjugate gradient benchmark for parallel algorithms). The function conjGrad takes a matrix and a vector, performs a number of calculations, and returns two values: a result vector and a measure of the numerical error. Most of the calculations either multiply a vector by a constant and add the result to another vector, or calculate the dot product of two vectors. In one place, however, a vector is multiplied by the given matrix; this is the operation that takes the most time and, fortunately, provides the greatest opportunities for speedup through parallel computation.

Listing One: A function that performs a matrix/vector “conjugate gradient” calculation
In Fortress, the function conjGrad has not only two value parameters A and x, but also four static parameters Elt, N, Mat, and Vec. The parameter N may be any natural number (nonnegative integer). The other three static parameters are types; Elt may be any numeric type, Mat may be any type that is an N- by- N matrix with elements of type Elt; and Vec may be any type that is a length- N vector with elements of type Elt. Because the length N of the vector is a parameter, conjGrad may be used with vectors (and matrices) of different sizes; because N is a static parameter, the Fortress compiler can check at compile time that the matrix is square and that its size matches the length of the vector.
One can specify all the type information explicitly when calling conjGrad on, for example, a 1000-by-1000 matrix B and a length-1000 vector y:
(z: RR64^1000, norm: RR64) = conjGrad[RR64, 1000, RR64^(1000 BY 1000), RR64^1000](B, x)
But that’s usually unpleasantly verbose. If the programmer omits the static type arguments…
(z, norm) = conjGrad(B, x)
… the Fortress implementation infers the necessary type information from the value arguments B and x. This is typical of Fortress coding style: programmers of library functions pay careful attention to type information so that application programmers need not worry much about types. As a result, type parameters appear extensively in library code, but are typically omitted and inferred in application code.
Fortress distinguishes between defining variables and assigning new values to variables. If you use a simple equals sign (=) to give a variable a value, that defines the variable and gives it a value, and that value cannot change; the variable is read-only, and the type of the variable is the type of the value. In the conjGrad example, q is defined to be equal to A p; for the rest of that block, q cannot change. (Of course, when the loop iteration ends, the do block completes and q becomes undefined; the next loop iteration starts a new execution of the do block and therefore creates a new definition for q.)
To define a variable that can be updated, you must use := instead of =, and you must specify the type of the variable explicitly. Then := can be used as an assignment operator to update the variable. In the conjGrad example, z is a vector variable that is initially zero but can be updated, and indeed it is updated in the loop body. Unlike q, the value of z persists from one iteration of the loop to the next. (By the way, the loop is sequential because the function seq converts the generator 1:25 into a sequential generator that guarantees to perform loop iterations in ascending order. Without this use of seq, the loop iterations may be performed in parallel if there are extra processors available.)
There is no special return keyword; the value of a function is provided by the expression after the = in the definition, and the value of a do block is the last expression in the block. So the value of conjGrad is a 2-tuple, that is, a pair whose first item is the vector z and whose second item is the result of the matrix norm expression.
There is no explicit multiplication operator for multiplying a matrix by a vector; as in mathematics, one simply writes a juxtaposition of the matrix and the vector: A x. But even though there is no symbol in the code, nevertheless juxtaposition is an operator just as real as + and /, and user-written library code can provide new definitions for this juxtaposition operator.

Listing Two: Part of the implementation of a “dense matrix” object
Fortress is a very intriguing language. I eagerly await its completion. For example, in Fortress, tuples — including function argument lists — can be built/executed in parallel. There is a nice parallel block statement
do …
also do …
also do …
end;
Check out more at the fortress project site. If you do download the Fortress project to try at home, please be aware that (at the time of this article’s publication), the language and runtime are still under development. In particular, the runtime is still all interpreted and therefore is fairly slow - certainly not comparable with Fortran… yet.
If you explore Fortress and learn more, please contribute to the Fortress wiki at http://projectfortress.sun.com/Projects/Community/wiki , so others can benefit from what you learn. The folks on the fortress mailing list (see the project home page for links) are also very responsive and helpful.
Good initiative, well designed, but extremely unreadable.