The X=X+1 Issue

A trip down memory (addressing) lane for parallel programming.

Like many people my age, my first programming experience was with the BASIC language. I recall using an ASR 33 teleprinter to create/edit/run BASIC programs. I would marvel at how the cylindrical print head would dance and bang across the paper, printing error messages from my ten-line programs. Of course, at 300 baud you had plenty of time to watch things happen. Back in the day, BASIC was usually the first computer language you would learn. There were other, more “advanced” languages like FORTRAN or COBOL that were deemed too difficult for the novice. (By the way, we capitalized Fortran back then.)

I was a senior in high school at the time and had the requisite college-prep math (algebra, trigonometry, geometry). I was interested in learning how computers worked and took the BASIC programming course. (Actually, I wanted to build my own HAL 9000, but that is another story.) In any case, the teacher started talking about variables and writing things like:

Let X = 1

on the blackboard. Then something new was presented. Something weird like:

Let X = X + 1

Wait a minute, I thought, if X = 1 then how can X = X + 1, or 1 = 1 + 1? That whole idea ran counter to what I had just been taught in algebra. I don’t think I was the only one confused. The teacher proceeded to explain why you can do this with a computer. The word variable means it can be assigned at any time to anything on the right-hand side of the equal sign, including itself.

The ability to assign a memory location with variable data is the cornerstone of computer programming. It turns out the ability to re-assign that same memory location is perhaps one of the biggest sources of consternation in the parallel programming world. At the time, multiple assignment seemed like (and was) a good idea. Coupled with a looping control structure, massive calculations could be performed. For instance, one could write a program to sum the numbers from 1 to 100 (and beyond, with a simple variable change):

LET SUM = 0
FOR I = 1 TO 100
  LET SUM = SUM + I
NEXT I

While incredibly useful, multiple assignment creates “state” within the program because the values of variables can change as the program runs. The typical programming language allows you to manage the state of program variables; i.e., the memory location that holds the variable SUM changes over time. As any programmer will tell you, program state is easily managed on a single von Neumann CPU through the plethora of programming languages. (Some may argue this point, however.) On a large number of CPUs, managing coupled program states becomes extremely difficult. In a parallel environment, it is up to the programmer to manage the state of I and SUM. In a sequential single-CPU environment, the programmer can assume I = 100 after the loop completes. In a parallel environment, where the loop may have been broken into parts, I may equal 10 on one CPU and 20 on another. Of course, the example is simple and obvious. Creating large, complex parallel applications does not present such luxuries, however.
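
To see why shared state is such trouble, here is a minimal sketch in Haskell (the language I’ll use for examples in this column; the names are my own) of two threads splitting the loop while sharing a single mutable SUM cell. Each update is a separate read followed by a write, so one thread can overwrite the other’s work. Built with GHC’s -threaded option, the program may print less than the correct 5050:

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)
import Data.IORef (newIORef, readIORef, writeIORef)

main :: IO ()
main = do
  sumRef <- newIORef (0 :: Int)  -- the shared SUM memory location
  done   <- newEmptyMVar
  -- Split the 1..100 loop across two threads, as a parallel runtime might.
  forM_ [[1 .. 50], [51 .. 100]] $ \chunk ->
    forkIO $ do
      forM_ chunk $ \i -> do
        s <- readIORef sumRef       -- read the current state
        writeIORef sumRef (s + i)   -- non-atomic update: the other thread
                                    -- may have written in between
      putMVar done ()
  replicateM_ 2 (takeMVar done)     -- wait for both threads to finish
  print =<< readIORef sumRef        -- 5050 only if we got lucky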

This type of “state management” programming has been very successful and is often called “procedural” or imperative programming. The programmer provides statements (the procedure) that change the program state. Adapting procedural programs to parallel environments can range from easy to difficult. Given a typical sequential C program with data structures and pointers, how easy do you think it would be to manage the program state across 16 cores? Could you be sure that those indirect pointers are pointing to the correct data at all times? And what if there is a serialized loop dependency that requires synchronization across all cores?

The simple, powerful idea of multiple assignment has come back to bite us in the asymptote. Who could have guessed? Some people not only guessed, but did not like the whole multiple assignment thing in the first place. What if, they suggested, we only allow single assignment of variables, like we do in algebra? Can we do computing in this fashion and avoid managing program state? Their answer takes the form of declarative programming. Instead of describing “how” (managing state), the programmer specifies “what” the program must do. By moving to the “what,” the programmer relinquishes state management duties. Research and case studies have shown declarative programming to be more productive than procedural programming: in general, it requires fewer lines of code (and therefore offers fewer opportunities for bugs) and yields higher programmer efficiency. Even with these impressive results, the majority of the programming community has continued to use X, where X is a single assignment of your favorite procedural language (C, C++, Perl, Python, Java, Fortran, etc.). For many, the legacy cost of moving away from the established methods is considered too high to facilitate any kind of switch. It has long been my belief, however, that parallel (i.e., multi-core) software development costs may require a reevaluation of this conclusion.


Declarative languages fall into three basic categories: functional, logic, and constraint. I’ll talk about the differences in the next column, but each of these allows only single assignment to variables. This concept is perhaps the most difficult to grasp because it does not allow looping control structures, which is actually good, because loops are the “how” stuff from which we want to move away. So “how” does one “do a loop” with single assignment? The answer is recursion, but not procedural recursion. Let’s think about “what” we want to do first; then we will worry about the “how” part. When we add the numbers from 1 to 100, we can start at 100, and we know our answer is 100 plus the sum of 1 to 99. Continuing, we know that the sum of 1 to 99 is 99 plus the sum of 1 to 98, and so on. When we reach the number 1, we are done: the sum of the numbers from 1 to 1 is 1. I have represented these facts as pseudo-code rules to help the procedural side of your brain.

SUM(1) THEN RETURN 1
SUM(I) THEN NEW_I = I - 1; RETURN {I + SUM(NEW_I)}

If I ask for SUM(100), the rules I have proposed will recursively find the total. How do I represent these rules in a declarative language? I just did. The pseudo-code rules are essentially a declarative program. Each recursive step assigns NEW_I only once. Note that each new invocation of a rule uses new variables, even though they have the same names. Something else to notice: there are two definitions of the SUM function. The first only works when the argument is 1. The second works for everything else. And notice that my rules tell the computer “what” to do, not “how” to do it.
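
To give the procedural side of your brain something more concrete, here is a direct transliteration of the two rules into Haskell, a functional (and therefore single-assignment) language. The function name sumTo is my own choice, and the binding NEW_I is simply inlined as (i - 1):

sumTo :: Integer -> Integer
sumTo 1 = 1                    -- SUM(1) THEN RETURN 1
sumTo i = i + sumTo (i - 1)    -- SUM(I) THEN RETURN {I + SUM(NEW_I)}

Evaluating sumTo 100 yields 5050. Pattern matching plays the role of the two rules: the first equation applies only when the argument is 1, and the second applies to everything else.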

The procedural programmer in you is probably jumping up and down screaming “recursion is inefficient, looping is better, you are going to run out of memory.” Take a breath, code boy, it will be all right. Recall, I said we are not going to worry about the “how,” just the “what.” We have described “what” we want with recursive rules; how the rules are executed is up to the declarative language. In reality, any good declarative language recognizes recursive rules and executes them efficiently, as though they were actually a loop. But then, we don’t worry about the “how.” Think about this example for a minute. It is concise, clean, and more like the algebra we learned many years ago. Most importantly, it is “closer” to our problem.
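
For the curious, the standard trick a compiler (or programmer) uses here is tail recursion. Below is a sketch, assuming GHC and its BangPatterns extension, that carries the running total in an accumulator so the recursive call is the last thing the function does and can be executed as a loop:

{-# LANGUAGE BangPatterns #-}

sumTo' :: Integer -> Integer
sumTo' n = go n 0
  where
    -- The call to go is in tail position, so the compiler can run it as
    -- a loop, reusing one stack frame. The bang (!) keeps the running
    -- total evaluated as we go instead of piling up deferred additions.
    go 1 !acc = acc + 1                -- base case: fold in the final 1
    go i !acc = go (i - 1) (acc + i)   -- tail call: no work left after it

Note that each invocation of go still binds fresh copies of i and acc, so single assignment is preserved; the compiler, not the programmer, turns the recursion into iteration.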

We’ll let that percolate for this week. The take-away is that multiple assignment is bad for parallel, and single assignment is good for parallel. We’ll see how this works out in upcoming columns. And you thought BASIC was the bomb.

Douglas Eadline is the Senior HPC Editor for Linux Magazine.

Comments on "The X=X+1 Issue"

fernandoaren

10 Neat.
20 And now you tell me there’s something other than BASIC
30 Geez.
40 GOTO 10

tphillips

Maybe it’s just me, or maybe it’s just a clumsy example, but it looks to me as if you are telling the computer how to do the sum. You are just telling it how, using recursion rather than looping.

I’m not saying it’s completely equivalent, but it doesn’t seem to be clearly superior.

Not in this case, anyway.

ruzam

I’m sure there will be more of a point to this later, but all you’ve demonstrated is recursive style vs non-recursive style, which doesn’t have much of anything to do with imperative vs declarative style.

Your SUM() example could just as easily be coded as a ‘single assignment’ recursive function in any procedural language that supports recursion (in as few lines, with as much programmer efficiency). But as you say, “how the rules are executed is up to the declarative language.” How the rules are executed is also up to the imperative language. Each coding method can be equally tuned (or broken) for parallel execution.

lilmikey93

This seems fitting.

Hey Hey 16k.

http://www2.b3ta.com/heyhey16k/

Enjoy the video.

gsmyth

Is multiple assignment bad in parallel computing? In parallel programming we would essentially look at what level of granularity we want for breaking down the data to be processed over the recursive loops, and assign the computations to threads over various cores. The issue would be synchronising the partial results into a shared location; the synchronisation overhead in this case, although necessary, would slow the computation down (assuming thread-safe libraries).
Or perhaps a high level of granularity would be better and would increase the time period between synchronisation points. This discussion may be useful to mention and comment on further in the article.
Checking metrics via sar or clock execution times will determine how well the granularity is performing.

avagothen

Very nice idea on a very real problem.
Unfortunately, your recursive solution will not run any faster on a parallel machine, as each iteration depends on, and so has to wait for, the result of the next before completing. Likewise, each iteration can only be spawned by the one before.

Any code inserted between “NEW_I = I - 1” and “RETURN…” could be run in parallel, but only if NEW_I isn’t (or can’t be) changed within that block. It would take some dandy compiler or interpreter to work out if/how to parallelise that.

Could such a piece of software be the solution to this whole problem? If so, the procedural/declarative distinction would become largely irrelevant. More likely, the programmer has to be smart, not the software.

On the plus side, it does abstract the problem by a level and gets people thinking about it. Hmmm…

rosbif

300 baud ???

Boy, your ASR33 was _fast_ ;-)

Mine was 110 baud :(

kavadias

Interesting!! Your definition of SUM(1) and SUM(I) is close to the definition
SUM(I,J) IF (I == J) THEN RETURN J
         ELSE RETURN { SUM( I, (I+J)/2 ) + SUM( ((I+J)/2)+1, J ) }

that is easier to parallelize. In this sense it might make it easier for programmers to learn how to write parallelizable code. But this can be done all the same in imperative programming, although it may take some (more?) training, and that is only because of the more abstract approach to “what” instead of “how”…
Other than that, I’m very curious what you can make of not using multiple assignment. My SUM(I,J) (implicitly) introduces a tree of intermediate variables that would help parallelization. This is because the recursion is closer to the divide-and-conquer style of parallelization…

loup-vaillant

For those who talk about recursive vs iterative: the point isn’t one being more declarative than the other. The point is we don’t need while loops. Therefore, we don’t need that evil multiple assignment [1]. If you ban it, or at least isolate it in well-defined parts of the program, you will enable or simplify many optimizations. Being more declarative is just a bonus (a huge one, by the way).

Now, using C or C++ without multiple assignment is hardly more suicidal than declarative. Functional languages, on the other hand, make this possible, thanks to “first class” functions.

Sure, this particular example was bad. A better one would be:

sum(n) { return fold(+, 0, list(1, n)); }

where we assume that the + function is associative. list() just generates a list, and fold() applies a binary, associative operation to a list. For example, sum(4) could be computed as ((1+2)+3)+4 or as (1+2)+(3+4). The first can be optimized for a single thread and low stack usage, and the second can be parallelized. In both cases, you can even avoid the construction of the list (it has been done). What makes these optimizations possible is the high abstraction of this code. Such abstraction is nearly impossible if the compiler has to worry about the order of execution. If multiple assignment weren’t banned or contained, it would have to.

Therefore, multiple assignment is bad for parallel.

[1] http://theroleplayingprogrammer.blogspot.com/2009/06/assignment-considered-harmfull.html

AdmnUnpwnd

Thank you very much! I was thinking about asking my mathematics teacher about this. So the ‘=’ in BASIC, or in any other language, is like the STO -> (store) key in TI BASIC.

