Report Number: CSL-TR-95-676
Institution: Stanford University, Computer Systems Laboratory
Title: The COOL Parallel Programming Language: Design,
Implementation, and Performance
Author: Chandra, Rohit
Date: January 1995
Abstract: Effective utilization of multiprocessors requires that a
program be partitioned for parallel execution, and that it
execute with good data locality and load balance. Although
automatic compiler-based techniques to address these concerns
are attractive, they are often limited by insufficient
information about the application. Explicit programmer
participation is therefore necessary for programs that
exploit unstructured task-level parallelism. However, support
for such intervention must balance ease of use against giving
the programmer a sufficient degree of control.
In this thesis we present the programming language COOL, which
extends C++ with simple and efficient constructs for writing
parallel programs. COOL is targeted towards programming
shared-memory multiprocessors. Our approach emphasizes the
integration of concurrency and synchronization with data
abstraction. Concurrent execution is expressed through
parallel functions that execute asynchronously when invoked.
Synchronization for shared objects is expressed through
monitors, and event synchronization is expressed through
condition variables. This approach provides several benefits.
First, integrating concurrency with data abstraction allows
construction of concurrent data structures that have most of
the complex details suitably encapsulated. Second, monitors
and condition variables integrated with objects offer a
flexible set of building blocks that can be used to build
more complex synchronization abstractions. Synchronization
operations are clearly identified through attributes and can
be optimized by the compiler to reduce synchronization
overhead. Finally, the object framework supports abstractions
to improve the load distribution and data locality of the
program.
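The abstract does not show COOL's actual syntax, but the encapsulation idea it describes can be sketched in standard C++: a monitor-style object that hides its mutex and condition variable behind its interface. This is an illustration only, not COOL code; the class name and methods (MonitorQueue, put, get) are invented for the example, whereas COOL would express the same thing with mutex attributes on member functions and condition variables integrated with the object.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical sketch: a monitor-style concurrent queue in plain C++.
// The synchronization details are encapsulated inside the abstraction,
// as the thesis advocates for COOL's object-integrated monitors.
template <typename T>
class MonitorQueue {
public:
    // A "mutex" operation: exclusive access while modifying the object.
    void put(T item) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push(std::move(item));
        nonempty_.notify_one();  // event synchronization for waiters
    }

    // Blocks until an item is available, like waiting on a condition.
    T get() {
        std::unique_lock<std::mutex> lock(m_);
        nonempty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex m_;                      // hidden inside the object
    std::condition_variable nonempty_;  // hidden inside the object
    std::queue<T> items_;
};
```

Callers see only put and get; the locking protocol cannot be misused from outside, which is the benefit claimed for building concurrent data structures on object-integrated monitors.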
Besides these mechanisms for exploiting parallelism, COOL
also provides support for the programmer to address
performance issues, in the form of abstractions that can be
used to supply hints about the objects referenced by parallel
tasks. These hints are used by the runtime system to schedule
tasks close to the objects they reference, and thereby
improve data locality. The hints are easily supplied by the
programmer in terms of the objects in the program, while the
details of task creation and scheduling are managed
transparently within the runtime system. Furthermore, the
hints do not affect the semantics of the program and allow
the programmer to easily experiment with different
optimizations.
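One way to picture how such hints could drive scheduling is a runtime with per-processor task queues, where each task carries the "home" processor of the object it references and is enqueued there. This is a minimal sketch under invented names (AffinityScheduler, Task, object_home); it is not the COOL runtime's actual interface, only an illustration of scheduling tasks close to the objects they reference.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch of affinity-hint scheduling. A task's hint names
// the processor whose local memory holds the object it mostly touches;
// the scheduler places the task on that processor's queue, so workers
// draining their own queues tend to run tasks with local data.
struct Task {
    std::function<void()> body;
    int object_home;  // hint: home processor of the referenced object
};

class AffinityScheduler {
public:
    explicit AffinityScheduler(int nprocs) : queues_(nprocs) {}

    // Schedule the task near the object it references. The hint only
    // affects placement, never the result of running the task.
    void spawn(Task t) {
        queues_[t.object_home % queues_.size()].push(std::move(t));
    }

    // A worker on processor p drains its own queue (good data locality).
    void run_local(int p) {
        auto& q = queues_[p];
        while (!q.empty()) {
            q.front().body();
            q.pop();
        }
    }

private:
    std::vector<std::queue<Task>> queues_;
};
```

Because the hint changes only where a task runs, removing or altering it leaves the program's semantics untouched, which is what makes this style of optimization safe to experiment with.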
COOL has been implemented on several shared-memory machines,
including the Stanford DASH multiprocessor. We have
programmed a variety of applications in COOL, including many
from the SPLASH parallel benchmark suite. Our experience has
been promising: the applications are easily expressed in
COOL, and perform as well as hand-tuned codes using
lower-level primitives. Furthermore, supplying hints has
proven to be an easy and effective way of improving program
performance. This thesis therefore demonstrates that (a) the
simple but powerful constructs in COOL can effectively
exploit task-level parallelism across a variety of
application programs, (b) an object-based approach improves
both the expressiveness and the performance of parallel
programs, and (c) improving data locality can be made simple
through a combination of programmer abstractions and smart
scheduling mechanisms.
http://i.stanford.edu/pub/cstr/reports/csl/tr/95/676/CSL-TR-95-676.pdf