Understanding of both use and implementation issues of parallel languages can
be enhanced by exercises involving language implementation. While it is not
feasible to include ab initio language development in an introductory
one-semester course on parallel computing, it is possible to give students
experience in some of the central issues by providing a framework for building
interpreters. The endeavor of building parallel interpreters in existing
high-level languages is called "meta-programming" because it focuses on building
programs for executing application programs, rather than on the applications
themselves. This paper presents one strategy for parallel meta-programming:
an interpretive framework, described in some detail, based on C++ and Solaris
threads and implementing a flexible language with Lisp-like syntax. A side
benefit of providing this kind of experience is that a parallel
meta-interpreter can serve as the framework for a "coordination language" for
routines and programs implemented in diverse languages. As such, it gives both
students and faculty a malleable, and potentially portable, tool for further
experimentation. It is hoped that this approach will enable objective
comparison of features found in different languages.
Topics offered in an introductory course on parallel computation might well include aspects of algorithms, complexity theory, languages, architecture, systems, and performance evaluation. We focus here on the language aspect, being attracted to languages because of their utility: languages form a middle ground on which effort for parallel computation can be concentrated. If the effort is successful, many algorithms expressed in the language can be executed in parallel. Languages ideally abstract the techniques used in parallel computation, in a manner that obviates their repeated reimplementation.
Of course, no language with universal appeal has been designed to date.
However, it seems clear to us that language, not hardware, has become the
number one impediment to the widespread use and acceptance of parallel
computing. If the overhead of making an application parallel could be lessened
through making the parallel aspect of the language more transparent, this would
be a major step. A platform for experimentation, using meta-programming, can
help us arrive at this goal. We see meta-programming not as a replacement for
the many excellent language development efforts already underway, but as a way
of enhancing our understanding of them by simulating what they accomplish on a
common platform.
Some of the author's early investigations in this area [Keller 1989] were based on a simple set of intercommunication primitives added to Quintus Prolog (known as the Quintus Multiprocessing Package, provided by Quintus for the Sequent multiprocessor [Quintus 1989]). We discovered that by using a small set of primitives, we could build more complex software architectures: for example, knowledge-servers that represent results of parallel search processes, Linda-like servers [Carriero and Gelernter 1990], and systems for both OR-parallel and AND-parallel pure Prolog [Keller 1992]. While such systems made no attempt to compete with multi-man-year projects for high-performance parallel Prologs, they did provide several features for us:
In this paper, we will illustrate the construction of a usable interactive parallel programming language based on an interpreter. The platform is Sparc Solaris using multithreading for true concurrency. While this particular system is currently limited to Sun workstations and servers, it can be adapted to other systems offering some form of multithreading. There is some indication [Powell et al. 1991] that the form of threads used is close to the emerging Posix standard for threads.
The implementation
language we use is C++ [Stroustrup 1991], with our own package for polymorphic lists
as in Lisp. The example object language has Lisp-like syntax, and the set of
parallel features being implemented is flexible. We use a functional
programming dialect as the current object language for the sake of
illustration, but the approach is by no means limited to functional
programming. Our approach is thus in the spirit
of [Kamin 1990] which provides comparative study of many programming languages
using interpreters, all based on S-expression syntax.
It has been appreciated for some time that functional languages have the key virtue of functionality or determinacy when used in a parallel execution setting. Our work in this area goes back to the University of Utah, 1979-86, with the proposed AMPS (Applicative Multiprocessing System) and Rediflow (Reduction + Dataflow) architectures and their attendant languages. In a functional language, arguments to a function can be executed in any order, or in parallel, with no effect on the result of the function: there simply are no visible side-effects. When this aspect of function evaluation is leveraged in the context of functions which iterate or map over large data structures, a powerful source of parallelism accrues.
While it is not our current belief that contemporary parallel languages should be limited to functional programming, neither should they exclude this very substantial source of parallelism which comes at little cost. Also included here is the ability to do parallel pipelining across streams of data.
What we describe here is a pedagogical approach to building such a parallel functional language. This approach was inspired in part by Peter Henderson's work on LispKit, a sequential Lisp interpreter written in Pascal [Henderson 1980]. We exploit the availability of multithreading on the common Sparc Solaris platform [Sunsoft 1993]. From another viewpoint, the language can be seen as a more user-friendly way to use threads than the interface provided by the vendor.
It is very easy to add new primitives to our language, any of which can be backed up by calls to arbitrary C or C++ functions. Thus, from another viewpoint, the object language can be used as a "coordination" or "glue" language to provide multithreaded execution of independently-developed code.
We start with a couple of views of the object language, which we call "Lisa" for its resemblance to Lisp (or, more closely, Scheme [Abelson and Sussman 1984]). We are using Lisp-like (S-expression) syntax. Part of the reason for use of S-expressions is that our focus is on parallel semantic nuances, not syntactic ones. We don't wish to stop and think up a new notation for every new concept, nor do we want to play with the parser at every step. Such syntactic considerations can always be done after the fact.
In Lisa S-expressions we have:
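In outline, an expression is either an atom or a parenthesized application. The following sketch is representative; the details are at the pleasure of the meta-programmer, as noted below:

Expression ::= Number | String       constants, evaluating to themselves
            |  Identifier            a variable, looked up in the environment
            |  (Operator Expression ... Expression)
                                     application of Operator to the operands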
Note that operators need not represent true functions, although they most often do. This is just an example. Since the interpreter, and thus the language definition, is at the pleasure of the user, any of these things can be modified.
Some simple S-expressions in Lisa with their meanings:
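For instance (the particular operators shown are illustrative):

(+ 2 3)               evaluates to 5
(* (+ 1 2) 4)         evaluates to 12
(if (< 1 2) 10 20)    evaluates to 10, since (< 1 2) is true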
Variables can be bound either globally, as the result of applying a user-defined function, or in a 'let' expression. For example,

(* a b)

evaluates to 27 where a is bound to 9 and b to 3.
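For instance, the same bindings could be established locally; the exact 'let' syntax below, following Scheme, is our assumption:

(let ((a 9) (b 3))
  (* a b))

which also evaluates to 27.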
In Lisa, functions are first-class citizens. They are represented internally as closures, carrying their free-variable environment of definition with them. Functions can be returned as results. The following Lisa definition defines 'double' to take a function argument and return a function (represented by the lambda expression) which composes its argument with itself.
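A plausible rendering, assuming 'define' introduces global bindings as in Scheme:

(define double
  (lambda (f)
    (lambda (x) (f (f x)))))

Thus ((double g) y) computes (g (g y)) for any one-argument function g.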
In the language Lisa presented so far, no parallel execution is evident. We
now discuss how parallelism is added. We try to be conservative, making each
addition go as far as possible. The first extensions to the language add
parallel multithreaded execution in the form of sow and reap primitives. The
analogy with planting crops can be used:
(sow Expression)
starts the evaluation of Expression (the crops)
but does not wait for them to grow. Instead, it
returns an identifier (like a claim-check, at the risk of mixing metaphors)
which can be used when the value is
wanted later on. The counterpart of sow, which gets the actual value is
(reap Expression)
where Expression should evaluate to the claim-check. The claim-check identifies
a structure in memory allocated for the purpose, known as a "suspension" or
"future". Such structures are common in parallel implementations of functional
languages (cf. [Friedman and Wise 1976], [Keller, Lindstrom, and Patil 1979],
[Halstead 1984]).
The following code uses the claim-check idea to
compute the product of a range of numbers in parallel,
by dynamically building a computation tree. (CAUTION: This paper contains
both Lisa and C++ code. You can tell them apart by noting that Lisa code
always starts with a left parenthesis, while C++ code never does.):
A Lisa example using sow and reap:
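The following is a representative reconstruction; the name 'prod' and the 'let' syntax are our assumptions. Each half of the range is sown as a separate thread, and the claim-checks are reaped when the product is formed:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid)))
                (right (sow (prod (+ mid 1) hi))))
            (* (reap left) (reap right)))))))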
It is easy to see that one of the two sown threads is unnecessary, since the sowing thread must wait for one of the sown threads shortly after creating it. At the expense of destroying symmetry, we could optimize the above code to:
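In sketch form, sowing only one of the two halves and computing the other in the current thread:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid))))
            (* (prod (+ mid 1) hi) (reap left)))))))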
Of course, at some point, it is not worth it to create any more threads, as the balance of work to be done is too small. We could handle this by introducing a minimum-grain parameter Cutoff:
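One way this might look, with 'seqprod' as an assumed sequential helper used below the cutoff:

(define seqprod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (* lo (seqprod (+ lo 1) hi)))))

(define prod
  (lambda (lo hi)
    (if (< (- hi lo) Cutoff)
        (seqprod lo hi)
        (let ((mid (/ (+ lo hi) 2)))
          (let ((left (sow (prod lo mid))))
            (* (prod (+ mid 1) hi) (reap left)))))))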
For reasons of good design, we try as much as possible to make the interpreter's implementation factorable into a sequential part and a parallel part. The sequential interpreter is represented by the function 'eval' of two arguments: an expression and an environment. The environment provides the interpretation for any free variables occurring in the expression. This is closely modeled after the classical Lisp 1.5 interpreter [McCarthy 1962]. Lisp-like data structures are implemented as a polymorphic data type we call "poly". The abstract grammar for polys, prior to the introduction of the facets discussed in this paper, is:
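Schematically, reconstructed from the variants enumerated later and written in an informal BNF:

poly     ::=  long  |  double  |  string  |  polylist
polylist ::=  ( poly* )          a list of zero or more polys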
Parallel aspects of the interpreter are implemented using Solaris threads. Threads exist in the context of a common memory space of a process. In Solaris, a thread is created using a call to thr_create, which specifies a procedure to be executed and an argument to that procedure. Because of uniformity considerations, the designers of Solaris threads require the procedure's argument to be of type (void*), meaning that a generic pointer will be passed, but no type checking is provided. The type-safety problems that this can produce are another reason for using an interpreter as a first exposure to threads, rather than programming using threads directly.
The first thing done when sow is called is to create a "future" struct which will contain the actual expression to be evaluated, the environment, and various other necessities to be described. In our initial development, we passed the address of a future struct via thr_create to an interpreter procedure thr_eval, which calls the main interpreter function eval re-entrantly. When the evaluation of the expression in the future is complete, procedure thr_eval stores the result in the future.
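A minimal sketch of this initial scheme; the function name sow_expression and the thr_eval declaration are our assumptions, while thr_create is the actual Solaris primitive:

#include <thread.h>

extern void* thr_eval(void* fut);   /* calls eval re-entrantly on the future */

void sow_expression(future* f)
{
    thread_t tid;
    /* The (void*) cast satisfies thr_create's generic interface,
       but defeats type checking, as noted above. */
    thr_create(0, 0, thr_eval, (void*)f, 0, &tid);
}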
A function executing reap on the future will be forced to wait until the
value has been computed by thr_eval. While a first approximation to this
synchronization can be achieved by the built-in function thr_wait which awaits
the completion of a designated thread, one flaw in this usage is that only one thread can
wait for the completion of a given thread at a time. In general, we want it to
be the case that multiple threads can reap the value of a given thread. It
thus becomes necessary to use another mechanism to implement the reap feature.
For this purpose, Solaris' "condition variable" abstraction was extremely
convenient. It allows multiple threads to wait until a condition becomes true
using cond_wait. The provider of the value, thr_eval, can awaken all
waiting threads by using cond_broadcast. Moreover, condition variables
are nicely coupled with mutex variables so that race problems are avoided.
As useful as sow and reap are, reap is something we'd like to bury quickly, because it is not pleasant to have to keep track of whether or not a result is a claim check. What we'd like is to modify the interpreter so that, if a value is a future, the value is reaped first, while if not, the value is simply used. This can be handled at the interpreter level by introducing the nuance 'hard_eval' as a variant of eval. Whereas eval treats a claim check as a constant, hard_eval forces a claim check to be turned into a value before proceeding. The meta-programmer can then selectively apply hard_eval or eval to achieve special effects. For example, the built-in arithmetic functions would apply hard_eval to their arguments, whereas the 'if' construct would use hard_eval only for its first argument, and "soft" eval for the other two. User-defined functions use soft eval to evaluate their arguments, relying on functions in the body to apply hard_eval when it becomes necessary.
A consequence of the presence of sow is that normal-order evaluation can be achieved for selective arguments, although the evaluator is basically an applicative-order one. Indeed, we could usefully add another operator, 'defer' which packages the argument in a future but does not evaluate it in a separate thread. The evaluation is done by whichever thread requires the argument first. If other threads try to reap the value meanwhile, they will wait using the condition variable mechanism mentioned earlier.
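For instance, assuming a 'cons' primitive over polys, defer gives lazy tails, so an unbounded list is elaborated only as it is consumed:

(define ints-from
  (lambda (n)
    (cons n (defer (ints-from (+ n 1))))))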
The modification of the interpreter to eliminate the need for explicit reap calls could be left as an exercise for the advanced student, although subtleties may make it preferable that it be given as a primitive. In any case, once this step is taken, the code takes on a more pleasant appearance. For example, the second parallel range product now becomes:
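In sketch form, the sown claim check is now passed directly to *, whose hard_eval forces it:

(define prod
  (lambda (lo hi)
    (if (>= lo hi)
        lo
        (let ((mid (/ (+ lo hi) 2)))
          (* (prod (+ mid 1) hi) (sow (prod lo mid)))))))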
As another example, consider the following parallel merge sort:
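A representative sketch, with 'split' (dividing a list into two halves) and 'merge' (combining two sorted lists) as assumed helpers:

(define msort
  (lambda (L)
    (if (null L)
        L
        (if (null (rest L))
            L
            (let ((halves (split L)))
              (merge (sow (msort (first halves)))
                     (msort (second halves))))))))

One half is sorted in a sown thread while the current thread sorts the other; the comparisons inside merge force the sown result when it is needed.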
In the process of cleaning up the Lisa interpreter, it was deemed necessary to provide a type-safe implementation of both futures and closures, our initial implementation having been done by using lists to simulate these constructs. This clean-up was done by adding new variants to the existing poly variants, which already included types long, double, string, and polylist. Although it drifts away from the kind of things we would consider fair as student exercises, we felt it was important to have an implementation in which futures and closures were unforgeable data objects. In the process of implementing futures, it became clear that the appropriate way to handle reaping is as an object-oriented method for polys. It also became clear that we could take away the dependence on eval and make futures, reap, and sow into more of a free-standing concept. This led to a new constructor for class poly, which takes the following arguments:
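From the usage described below, a plausible set is: the futurand, a two-argument C++ function over polys to be applied; a flag (the constructor's second argument) selecting threaded (sow) versus deferred evaluation; and the two poly arguments to which the futurand will be applied. In outline, with assumed names:

poly(poly (*futurand)(poly, poly), int threaded, poly x, poly y);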
The use of a two-argument futurand is purely for convenience in the current interpreter application. A futurand could, in principle, have any number of arguments. It is also possible to package multiple arguments as one by creating a list. In this paper, however, a two-argument futurand has the greatest utility, and we use only this number of arguments to avoid clutter.
One consequence of the introduction of futures into the poly class is that most of the code relating to parallelism is now in the data structures rather than in the interpreter. One might say that this is in fact where it should be, since from an application viewpoint, most parallelism naturally accrues out of data considerations anyway.
A second, unanticipated, benefit is that we can now "compile" parallel
functions into the meta-level, rather than requiring them to go through the
interpreter. This includes functions which operate on, or produce, infinite
lists by using the deferring form of futures. This yields a significant
speed benefit, stripping away the interpretive overhead. By presenting a few
examples, the student can see the connection and develop his own code. A
possible exercise is to develop a compiler from the Lisp dialect to the C++
meta-code.
The internal struct for a future with a 2-argument futurand looks like:
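A sketch with assumed member names; 'poly' is the polymorphic value class described earlier, and the mutex and condition variable are the Solaris primitives discussed above:

#include <thread.h>
#include <synch.h>

struct future {
    poly (*futurand)(poly, poly); // the two-argument function to apply
    poly arg1, arg2;              // its arguments
    poly value;                   // the result, once computed
    int done;                     // nonzero once 'value' is valid
    mutex_t lock;                 // protects 'done' and 'value'
    cond_t ready;                 // broadcast when 'value' becomes available
    future(poly (*f)(poly, poly), int threaded, poly x, poly y);
    poly reap();                  // wait for, then return, 'value'
};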
The only place a new thread is created in the Lisa implementation is in this constructor, which is in turn called by one of the poly constructors.
Each thread created by a future constructor executes the following code. The
call to cond_broadcast
awakens all threads waiting for the value of the future.
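A sketch of that thread body; the function name future_run is our assumption:

void* future_run(void* arg)
{
    future* f = (future*)arg;
    poly result = f->futurand(f->arg1, f->arg2);  // the actual evaluation
    mutex_lock(&f->lock);
    f->value = result;
    f->done = 1;
    cond_broadcast(&f->ready);   // wake every thread waiting in reap
    mutex_unlock(&f->lock);
    return 0;
}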
The method reap reaps the value of a future struct. The
cond_wait
primitive in Solaris threads provides for there being
multiple waiting threads on a single future.
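In sketch form, using the struct above:

poly future::reap()
{
    mutex_lock(&lock);
    while (!done)                   // guards against spurious wakeups
        cond_wait(&ready, &lock);   // any number of threads may block here
    poly result = value;
    mutex_unlock(&lock);
    return result;
}

(The deferred case, in which the first reaping thread performs the evaluation itself, is omitted from this sketch.)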
From the viewpoint of the user of polys, a constructor is
added for futures. Depending on the second argument, the constructor
creates a thread for evaluation or simply defers it.
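A sketch of the pair of constructors; the member name 'fut' and the initialization details are our assumptions, while the second-argument flag follows the description above:

// The future constructor: the only place a thread is created.
future::future(poly (*f)(poly, poly), int threaded, poly x, poly y)
    : futurand(f), arg1(x), arg2(y), done(0)
{
    mutex_init(&lock, USYNC_THREAD, 0);
    cond_init(&ready, USYNC_THREAD, 0);
    if (threaded) {                 // sow: evaluate in a fresh thread
        thread_t tid;
        thr_create(0, 0, future_run, (void*)this, 0, &tid);
    }                               // else defer until the first reap
}

// The poly constructor simply wraps a new future.
poly::poly(poly (*f)(poly, poly), int threaded, poly x, poly y)
{
    fut = new future(f, threaded, x, y);
}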
The following comparison shows how the object-level code translates into
meta-level code:
object-level range:
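(The following definitions are representative sketches; 'range' is taken to produce the integers from lo to hi, with a deferred tail, and '()' is assumed to denote the empty list.)

(define range
  (lambda (lo hi)
    (if (> lo hi)
        ()
        (cons lo (defer (range (+ lo 1) hi))))))

meta-level range (a hand-compiled C++ sketch; the accessor as_long, the cons function over polys, and the constructor's argument order are assumptions):

// The deferred tail is a poly future built with the non-threaded
// (defer) form of the constructor; it is elaborated on first reap.
poly range(poly lo, poly hi)
{
    if (lo.as_long() > hi.as_long())
        return poly();                      // the empty list
    return cons(lo, poly(range, 0 /* defer */,
                         poly(lo.as_long() + 1), hi));
}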
Given the parallel meta-interpreter as a base, it should be possible to assign a variety of exercises involving modifications and enhancements which explore various parallel language paradigms. These include:
In the course of providing a vehicle for demonstrating various concepts in
parallel language implementation, we have developed an approach based on
construction of an interpreter for parallel languages. In the process, we
have seen how a set of self-contained abstractions at the meta-level can
provide for greater modularity and efficiency of language implementation,
while at the same time providing flexibility for exploration of language
concepts, our original goal.
Hal Abelson and Gerald Sussman. Structure and interpretation of computer programs. MIT Press (1984).
Guy E. Blelloch. Vector models for data parallel computing. MIT Press (1990).
Nicholas Carriero and David Gelernter. How to write parallel programs - A first course. MIT Press (1990).
Daniel P. Friedman and David S. Wise. CONS should not evaluate its arguments. in S. Michaelson and R. Milner (eds.), Automata, Languages and Programming. Edinburgh University Press (1976).
Al Geist, et al. PVM: Parallel virtual machine - A users' guide and tutorial for networked parallel computing. MIT Press (1994).
Robert Halstead. Implementation of Multilisp: Lisp on a multiprocessor. ACM Symposium on Lisp and Functional Programming (1984).
Peter Henderson. Functional programming - Application and implementation. Prentice-Hall (1980).
Samuel N. Kamin. Programming languages - An interpreter-based approach. Addison-Wesley (1990).
R.M. Keller, G. Lindstrom and S. Patil. A loosely-coupled applicative multi-processing system. AFIPS Proceedings. 613-622 (June 1979).
R.M. Keller. Divide and CONCer: Data structuring for applicative multiprocessing systems. Proceedings of the 1980 Lisp Conference, 196-202 (Aug. 1980).
R.M. Keller and F.C.H. Lin. Simulated performance of a reduction-based multiprocessing system. Computer, 17, 7, 70-82 (July 1984).
R.M. Keller. Distributed computation by graph reduction. Systems Research, 2, 4, 285-295, Pergamon press (1985).
R.M. Keller and M.R. Sleep. Applicative caching. ACM Transactions on Programming Languages and Systems, 8, 1, 88-108 (January 1986).
R.M. Keller, J.W. Slater, and K.T. Likes. Overview of Rediflow II Development. in Fasel and Keller (eds.), Graph reduction, Lecture Notes in Computer Science, 279, 203-214 (September 1986).
R.M. Keller. Rediflow architecture prospectus. University of Utah, Department of Computer Science, Technical Report UUCS-85-105 (August 1985); reprinted in S.S. Thakkar (ed.), Dataflow and Reduction Architectures, 366-394, IEEE (1987).
R.M. Keller. A position on multiprocessing in Prolog. in Proc. Joint Japanese/American Workshop on parallel knowledge systems and logic programming, Argonne National Laboratory ANL-89/43 (1989).
R.M. Keller. Meta-parallel programming. in J. Tanaka, et al. (eds.), Workshop on future directions of parallel programming and architecture. ICOT TM-1185, Tokyo (1992).
John McCarthy, et al. The Lisp 1.5 programmer's manual. MIT Press (1962).
M.L. Powell, et al. SunOS multi-thread architecture. Proc. Usenix Winter '91 (1991).
Quintus Corp. The Quintus Multiprocessing Package. (1989).
Bjarne Stroustrup. The C++ programming language, 2nd ed. Addison-Wesley (1991).
Sunsoft. SunOS 5.2 Guide to Multi-Thread Programming. (1993).