Understanding of both use and implementation issues of parallel languages can
be enhanced by exercises involving language implementation. While it is not
feasible to include ab initio language development in an introductory
one-semester course on parallel computing, it is possible to give students
experience in some of the central issues by providing a framework for building
interpreters. The endeavor of building parallel interpreters in existing
high-level languages is called "meta-programming" because it focuses on building
programs for executing application programs, rather than on the applications
themselves. This paper presents one strategy for parallel meta-programming:
an interpretive framework, described in some detail, based on C++ and Solaris
threads and implementing a flexible language with Lisp-like syntax. A side
benefit of providing this kind of experience is that a parallel
meta-interpreter can serve as the framework for a "coordination language" for
routines and programs implemented in diverse languages. As such, it gives both
students and faculty a malleable, and potentially portable, tool for further
experimentation. It is hoped that this approach will enable objective
comparison of features found in different languages.
Topics offered in an introductory course on parallel computation might well include aspects of algorithms, complexity theory, languages, architecture, systems, and performance evaluation. We focus here on the language aspect, being attracted to languages because of their utility: languages form a middle ground on which effort for parallel computation can be concentrated. If the effort is successful, many algorithms expressed in the language can be executed in parallel. Languages ideally abstract the techniques used in parallel computation, in a manner that obviates their repeated reimplementation.
Of course, no language with universal appeal has been designed to date.
However, it seems clear to us that language, not hardware, has become the
number one impediment to the widespread use and acceptance of parallel
computing. If the overhead of making an application parallel could be lessened
through making the parallel aspect of the language more transparent, this would
be a major step. A platform for experimentation, using meta-programming, can
help us arrive at this goal. We see meta-programming not as a replacement for
the many excellent language development efforts already underway, but as a way
of enhancing our understanding of them by simulating what they accomplish on a
common platform.
Some of the author's early investigations in this area [Keller 1989] were based on a simple set of intercommunication primitives added to Quintus Prolog (known as the Quintus Multiprocessing Package, provided by Quintus for the Sequent multiprocessor [Quintus 1989]). We discovered that by using a small set of primitives, we could build more complex software architectures: for example, knowledge-servers that represent results of parallel search processes, Linda-like servers [Carriero and Gelernter 1990], and systems for both OR-parallel and AND-parallel pure Prolog [Keller 1992]. While such systems made no attempt to compete with multi-man-year projects for high-performance parallel Prologs, they did provide several features for us:
In this paper, we will illustrate the construction of a usable interactive parallel programming language based on an interpreter. The platform is Sparc Solaris using multithreading for true concurrency. While this particular system is currently limited to Sun workstations and servers, it can be adapted to other systems offering some form of multithreading. There is some indication [Powell et al. 1991] that the form of threads used is close to the emerging Posix standard for threads.
The implementation
language we use is C++ [Stroustrup 1991], with our own package for polymorphic lists
as in Lisp. The example object language has Lisp-like syntax, and the set of
parallel features being implemented is flexible. We use a functional
programming dialect as the current object language for the sake of
illustration, but the approach is by no means limited to functional
programming. Our approach is thus in the spirit
of [Kamin 1990] which provides comparative study of many programming languages
using interpreters, all based on S-expression syntax.
It has been appreciated for some time that functional languages have the key virtue of functionality or determinacy when used in a parallel execution setting. Our work in this area goes back to the University of Utah, 1979-86, with the proposed AMPS (Applicative Multiprocessing System) and Rediflow (Reduction + Dataflow) architectures and their attendant languages. In a functional language, arguments to a function can be executed in any order, or in parallel, with no effect on the result of the function: there simply are no visible side-effects. When this aspect of function evaluation is leveraged in the context of functions which iterate or map over large data structures, a powerful source of parallelism accrues.
While it is not our current belief that contemporary parallel languages should be limited to functional programming, neither should they exclude this very substantial source of parallelism which comes at little cost. Also included here is the ability to do parallel pipelining across streams of data.
What we describe here is a pedagogical approach to building such a parallel functional language. This approach was inspired in part by Peter Henderson's work on LispKit, a sequential Lisp interpreter written in Pascal [Henderson 1980]. We exploit the availability of multithreading on the common Sparc Solaris platform [Sunsoft 1993]. From another viewpoint, the language can be seen as a more user-friendly way to use threads than the interface provided by the vendor.
It is very easy to add new primitives to our language, any of which can be backed up by calls to arbitrary C or C++ functions. Thus, from another viewpoint, the object language can be used as a "coordination" or "glue" language to provide multithreaded execution of independently-developed code.
We start with a couple of views of the object language, which we call "Lisa" for its resemblance to Lisp (or, more closely, Scheme [Abelson and Sussman 1984]). We are using Lisp-like (S-expression) syntax. Part of the reason for use of S-expressions is that our focus is on parallel semantic nuances, not syntactic ones. We don't wish to stop and think up a new notation for every new concept, nor do we want to play with the parser at every step. Such syntactic considerations can always be done after the fact.
In Lisa S-expressions we have:
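In outline, an expression is either an atom or a parenthesized application. The following sketch is representative; the details are at the pleasure of the meta-programmer, as noted below:

Expression ::= Number | String       constants, evaluating to themselves
            |  Identifier            a variable, looked up in the environment
            |  (Operator Expression ... Expression)
                                     application of Operator to the operands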
Note that operators need not represent true functions, although they most often do. This is just an example. Since the interpreter, and thus the language definition, is at the pleasure of the user, any of these things can be modified.
Some simple S-expressions in Lisa with their meanings:
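For instance (the particular operators shown are illustrative):

(+ 2 3)               evaluates to 5
(* (+ 1 2) 4)         evaluates to 12
(if (< 1 2) 10 20)    evaluates to 10, since (< 1 2) is true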
Variables can be bound either globally, as the result of applying a user-defined function, or in a 'let' expression. For example,

(* a b)

evaluates to 27 where a is bound to 9 and b to 3.
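For instance, the same bindings could be established locally; the exact 'let' syntax below, following Scheme, is our assumption:

(let ((a 9) (b 3))
  (* a b))

which also evaluates to 27.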
In Lisa, functions are first-class citizens. They are represented internally as closures, carrying their free-variable environment of definition with them. Functions can be returned as results. The following Lisa definition defines 'double' to take a function argument and return a function (represented by the lambda expression) which composes its argument with itself.
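A plausible rendering, assuming 'define' introduces global bindings as in Scheme:

(define double
  (lambda (f)
    (lambda (x) (f (f x)))))

Thus ((double g) y) computes (g (g y)) for any one-argument function g.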
In the language Lisa presented so far, no parallel execution is evident. We
now discuss how parallelism is added. We try to be conservative, making each
addition go as far as possible. The first extensions to the language add
parallel multithreaded execution in the form of sow and reap primitives. The
analogy with planting crops can be used:
(sow Expression)
starts the evaluation of Expression (the crops)
but does not wait for them to grow. Instead, it
returns an identifier (like a claim-check, at the risk of mixing metaphors)
which can be used when the value is
wanted later on. The counterpart of sow, which gets the actual value is
(reap Expression)
where Expression should evaluate to the claim-check. The claim-check identifies
a structure in memory allocated for the purpose, known as a "suspension" or
"future". Such structures are common in parallel implementations of functional
languages (cf. [Friedman and Wise 1976], [Keller, Lindstrom, and Patil 1979],
[Halstead 1984]).
The following code uses the claim-check idea to
compute the product of a range of numbers in parallel,
by dynamically building a computation tree. (CAUTION: This paper contains
both Lisa and C++ code. You can tell them apart by noting that Lisa code
always starts with a left parenthesis, while C++ code never does.):
A Lisa example using sow and reap:
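The following is a representative reconstruction; the name 'prod' and the 'let' syntax are our assumptions. Each half of the range is sown as a separate thread, and the claim-checks are reaped when the product is formed:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid)))
                (right (sow (prod (+ mid 1) hi))))
            (* (reap left) (reap right)))))))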
It is easy to see that one of the two sown threads is unnecessary, since the sowing thread must wait for one of the sown threads shortly after creating it. At the expense of destroying symmetry, we could optimize the above code to:
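In sketch form, sowing only one of the two halves and computing the other in the current thread:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid))))
            (* (prod (+ mid 1) hi) (reap left)))))))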
Of course, at some point, it is not worth it to create any more threads, as the balance of work to be done is too small. We could handle this by introducing a minimum-grain parameter Cutoff:
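One way this might look, with 'seqprod' as an assumed sequential helper used below the cutoff:

(define seqprod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (* lo (seqprod (+ lo 1) hi)))))

(define prod
  (lambda (lo hi)
    (if (< (- hi lo) Cutoff)
        (seqprod lo hi)
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid))))
            (* (prod (+ mid 1) hi) (reap left)))))))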
For reasons of good design, we try as much as possible to make the interpreter's implementation factorable into a sequential part and a parallel part. The sequential interpreter is represented by the function 'eval' of two arguments: an expression and an environment. The environment provides the interpretation for any free variables occurring in the expression. This is closely modeled after the classical Lisp 1.5 interpreter [McCarthy 1962]. Lisp-like data structures are implemented as a polymorphic data type we call "poly". The abstract grammar for polys, prior to the introduction of the facets discussed in this paper, is:
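Schematically, reconstructed from the variants enumerated later and written in an informal BNF:

poly     ::=  long  |  double  |  string  |  polylist
polylist ::=  ( poly* )          a list of zero or more polys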
Parallel aspects of the interpreter are implemented using Solaris threads. Threads exist in the context of a common memory space of a process. In Solaris, a thread is created using a call to thr_create, which specifies a procedure to be executed and an argument to that procedure. Because of uniformity considerations, the designers of Solaris threads require the procedure's argument to be of type (void*), meaning that a generic pointer will be passed, but no type checking is provided. The type-safety problems that this can produce are another reason for using an interpreter as a first exposure to threads, rather than programming using threads directly.
The first thing done when sow is called is to create a "future" struct which will contain the actual expression to be evaluated, the environment, and various other necessities to be described. In our initial development, we passed the address of a future struct via thr_create to an interpreter procedure thr_eval, which calls the main interpreter function eval re-entrantly. When the evaluation of the expression in the future is complete, procedure thr_eval stores the result in the future.
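A minimal sketch of this initial scheme; the function name sow_expression and the thr_eval declaration are our assumptions, while thr_create is the actual Solaris primitive:

#include <thread.h>

extern void* thr_eval(void* fut);   /* calls eval re-entrantly on the future */

void sow_expression(future* f)
{
    thread_t tid;
    /* The (void*) cast satisfies thr_create's generic interface,
       but defeats type checking, as noted above. */
    thr_create(0, 0, thr_eval, (void*)f, 0, &tid);
}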
A function executing reap on the future will be forced to wait until the
value has been computed by thr_eval. While a first approximation to this
synchronization can be achieved by the built-in function thr_wait which awaits
the completion of a designated thread, one flaw in this usage is that only one thread can
wait for the completion of a given thread at a time. In general, we want it to
be the case that multiple threads can reap the value of a given thread. It
thus becomes necessary to use another mechanism to implement the reap feature.
For this purpose, Solaris' "condition variable" abstraction was extremely
convenient. It allows multiple threads to wait until a condition becomes true
using cond_wait. The provider of the value, thr_eval, can awaken all
waiting threads by using cond_broadcast. Moreover, condition variables
are nicely coupled with mutex variables so that race problems are avoided.
As useful as sow and reap are, reap is something we'd like to bury quickly, because it is not pleasant to have to keep track of whether or not a result is a claim check. What we'd like is to modify the interpreter so that, if a value is a future, the value is reaped first, while if not, the value is simply used. This can be handled at the interpreter level by introducing the nuance 'hard_eval' as a variant of eval. Whereas eval treats a claim check as a constant, hard_eval forces a claim check to be turned into a value before proceeding. The meta-programmer can then selectively apply hard_eval or eval to achieve special effects. For example, the built-in arithmetic functions would apply hard_eval to their arguments, whereas the 'if' construct would use hard_eval only for its first argument, and "soft" eval for the other two. User-defined functions use soft eval to evaluate their arguments, relying on functions in the body to apply hard_eval when it becomes necessary.
A consequence of the presence of sow is that normal-order evaluation can be achieved for selective arguments, although the evaluator is basically an applicative-order one. Indeed, we could usefully add another operator, 'defer' which packages the argument in a future but does not evaluate it in a separate thread. The evaluation is done by whichever thread requires the argument first. If other threads try to reap the value meanwhile, they will wait using the condition variable mechanism mentioned earlier.
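For instance, assuming a 'cons' primitive over polys, defer gives lazy tails, so an unbounded list is elaborated only as it is consumed:

(define ints-from
  (lambda (n)
    (cons n (defer (ints-from (+ n 1))))))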
The modification of the interpreter to eliminate the need for explicit reap calls could be left as an exercise for the advanced student, although subtleties may make it preferable that it be given as a primitive. In any case, once this step is taken, the code takes on a more pleasant appearance. For example, the second parallel range product now becomes:
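In sketch form, the sown claim check is now passed directly to *, whose hard_eval forces it:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (* (prod (+ mid 1) hi) (sow (prod lo mid)))))))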
As another example, consider the following parallel merge sort:
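A representative sketch, with 'split' (dividing a list into two halves) and 'merge' (combining two sorted lists) as assumed helpers:

(define msort
  (lambda (L)
    (if (null L)
        L
        (if (null (rest L))
            L
            (let ((halves (split L)))
              (merge (sow (msort (first halves)))
                     (msort (second halves))))))))

One half is sorted in a sown thread while the current thread sorts the other; the comparisons inside merge force the sown result when it is needed.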
In the process of cleaning up the Lisa interpreter, it was deemed necessary to provide a type-safe implementation of both futures and closures, our initial implementation having been done by using lists to simulate these constructs. This clean-up was done by adding new variants to the existing poly variants, which already included types long, double, string, and polylist. Although it drifts away from the kind of things we would consider fair as student exercises, we felt it was important to have an implementation in which futures and closures were unforgeable data objects. In the process of implementing futures, it became clear that the appropriate way to handle reaping is as an object-oriented method for polys. It also became clear that we could take away the dependence on eval and make futures, reap, and sow into more of a free-standing concept. This led to a new constructor for class poly, which takes the following arguments:
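From the usage described below, a plausible set is: the futurand, a two-argument C++ function over polys to be applied; a flag (the constructor's second argument) selecting threaded (sow) versus deferred evaluation; and the two poly arguments to which the futurand will be applied. In outline, with assumed names:

poly(poly (*futurand)(poly, poly), int threaded, poly x, poly y);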
The use of a two-argument futurand is purely for convenience in the current interpreter application. A futurand could, in principle, have any number of arguments. It is also possible to package multiple arguments as one by creating a list. In this paper, however, a two-argument futurand has the greatest utility, and we use only this number of arguments to avoid clutter.
One consequence of the introduction of futures into the poly class is that most of the code relating to parallelism is now in the data structures rather than in the interpreter. One might say that this is in fact where it should be, since from an application viewpoint, most parallelism naturally accrues out of data considerations anyway.
A second, unanticipated, benefit is that we can now "compile" parallel
functions into the meta-level, rather than requiring them to go through the
interpreter. This includes functions which operate on, or produce, infinite
lists by using the deferring form of futures. This yields a significant
speed benefit, stripping away the interpretive overhead. By presenting a few
examples, the student can see the connection and develop his own code. A
possible exercise is to develop a compiler from the Lisp dialect to the C++
meta-code.
The internal struct for a future with a 2-argument futurand looks like:
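A sketch with assumed member names; 'poly' is the polymorphic value class described earlier, and the mutex and condition variable are the Solaris primitives discussed above:

#include <thread.h>
#include <synch.h>

struct future {
    poly (*futurand)(poly, poly); // the two-argument function to apply
    poly arg1, arg2;              // its arguments
    poly value;                   // the result, once computed
    int done;                     // nonzero once 'value' is valid
    mutex_t lock;                 // protects 'done' and 'value'
    cond_t ready;                 // broadcast when 'value' becomes available
    future(poly (*f)(poly, poly), int threaded, poly x, poly y);
    poly reap();                  // wait for, then return, 'value'
};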
The only place a new thread is created in the Lisa implementation is in this constructor, which is in turn called by one of the poly constructors.
Each thread created by a future constructor executes the following code. The
call to cond_broadcast
awakens all threads waiting for the value of the future.
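A sketch of that thread body; the function name future_run is our assumption:

void* future_run(void* arg)
{
    future* f = (future*)arg;
    poly result = f->futurand(f->arg1, f->arg2);  // the actual evaluation
    mutex_lock(&f->lock);
    f->value = result;
    f->done = 1;
    cond_broadcast(&f->ready);   // wake every thread waiting in reap
    mutex_unlock(&f->lock);
    return 0;
}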
The method reap reaps the value of a future struct. The
cond_wait
primitive in Solaris threads provides for there being
multiple waiting threads on a single future.
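In sketch form, using the struct above:

poly future::reap()
{
    mutex_lock(&lock);
    while (!done)                   // guards against spurious wakeups
        cond_wait(&ready, &lock);   // any number of threads may block here
    poly result = value;
    mutex_unlock(&lock);
    return result;
}

(The deferred case, in which the first reaping thread performs the evaluation itself, is omitted from this sketch.)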
From the viewpoint of the user of polys, a constructor is
added for futures. Depending on the second argument, the constructor
creates a thread for evaluation or simply defers it.
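A sketch of the pair of constructors; the member name 'fut' and the initialization details are our assumptions, while the second-argument flag follows the description above:

// The future constructor: the only place a thread is created.
future::future(poly (*f)(poly, poly), int threaded, poly x, poly y)
    : futurand(f), arg1(x), arg2(y), done(0)
{
    mutex_init(&lock, USYNC_THREAD, 0);
    cond_init(&ready, USYNC_THREAD, 0);
    if (threaded) {                 // sow: evaluate in a fresh thread
        thread_t tid;
        thr_create(0, 0, future_run, (void*)this, 0, &tid);
    }                               // else defer until the first reap
}

// The poly constructor simply wraps a new future.
poly::poly(poly (*f)(poly, poly), int threaded, poly x, poly y)
{
    fut = new future(f, threaded, x, y);
}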
The following comparison shows how the object-level code translates into
meta-level code:
object-level range:
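(The following definitions are representative sketches; 'range' is taken to produce the integers from lo to hi, with a deferred tail, and '()' is assumed to denote the empty list.)

(define range
  (lambda (lo hi)
    (if (> lo hi)
        ()
        (cons lo (defer (range (+ lo 1) hi))))))

meta-level range (a hand-compiled C++ sketch; the accessor as_long, the cons function over polys, and the constructor's argument order are assumptions):

// The deferred tail is a poly future built with the non-threaded
// (defer) form of the constructor; it is elaborated on first reap.
poly range(poly lo, poly hi)
{
    if (lo.as_long() > hi.as_long())
        return poly();                      // the empty list
    return cons(lo, poly(range, 0 /* defer */,
                         poly(lo.as_long() + 1), hi));
}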
Given the parallel meta-interpreter as a base, it should be possible to assign a variety of exercises involving modifications and enhancements which explore various parallel language paradigms. These include:
In the course of providing a vehicle for demonstrating various concepts in
parallel language implementation, we have developed an approach based on
construction of an interpreter for parallel languages. In the process, we
have seen how a set of self-contained abstractions at the meta-level can
provide for greater modularity and efficiency of language implementation,
while at the same time providing flexibility for exploration of language
concepts, our original goal.
Hal Abelson and Gerald Sussman. Structure and interpretation of computer programs. MIT Press (1984).
Guy E. Blelloch. Vector models for data parallel computing. MIT Press (1990).
Nicholas Carriero and David Gelernter. How to write parallel programs - A first course. MIT Press (1990).
Daniel P. Friedman and David S. Wise. CONS should not evaluate its arguments. in S. Michaelson and R. Milner (eds.), Automata, Languages and Programming. Edinburgh University Press (1976).
Al Geist, et al. PVM: Parallel virtual machine - A users' guide and tutorial for networked parallel computing. MIT Press (1994).
Robert Halstead. Implementation of Multilisp: Lisp on a multiprocessor. ACM Symposium on Lisp and Functional Programming (1984).
Peter Henderson. Functional programming - Application and implementation. Prentice-Hall (1980).
Samuel N. Kamin. Programming languages - An interpreter-based approach. Addison-Wesley (1990).
R.M. Keller, G. Lindstrom and S. Patil. A loosely-coupled applicative multi-processing system. AFIPS Proceedings. 613-622 (June 1979).
R.M. Keller. Divide and CONCer: Data structuring for applicative multiprocessing systems. Proceedings of the 1980 Lisp Conference, 196-202 (Aug. 1980).
R.M. Keller and F.C.H. Lin. Simulated performance of a reduction-based multiprocessing system. Computer, 17, 7, 70-82 (July 1984).
R.M. Keller. Distributed computation by graph reduction. Systems Research, 2, 4, 285-295, Pergamon press (1985).
R.M. Keller and M.R. Sleep. Applicative caching. ACM Transactions on Programming Languages and Systems, 8, 1, 88-108 (January 1986).
R.M. Keller, J.W. Slater, and K.T. Likes. Overview of Rediflow II Development. in Fasel and Keller (eds.), Graph reduction, Lecture Notes in Computer Science, 279, 203-214 (September 1986).
R.M. Keller. Rediflow architecture prospectus. University of Utah, Department of Computer Science, Technical Report UUCS-85-105 (August 1985); reprinted in S.S. Thakkar (ed.), Dataflow and Reduction Architectures, 366-394, IEEE (1987).
R.M. Keller. A position on multiprocessing in Prolog. in Proc. Joint Japanese/American Workshop on parallel knowledge systems and logic programming, Argonne National Laboratory ANL-89/43 (1989).
R.M. Keller. Meta-parallel programming. in J. Tanaka, et al. (eds.), Workshop on future directions of parallel programming and architecture. ICOT TM-1185, Tokyo (1992).
John McCarthy, et al. The Lisp 1.5 programmer's manual. MIT Press (1962).
M.L. Powell, et al. SunOS multi-thread architecture. Proc. Usenix Winter '91 (1991).
Quintus Corp. The Quintus Multiprocessing Package. (1989).
Bjarne Stroustrup. The C++ programming language, 2nd ed. Addison-Wesley (1991).
Sunsoft. SunOS 5.2 Guide to Multi-Thread Programming. (1993).