Report Number: CSL-TR-95-676
Institution: Stanford University, Computer Systems Laboratory
Title: The COOL Parallel Programming Language: Design, Implementation, and Performance
Author: Chandra, Rohit
Date: January 1995
Abstract: Effective utilization of multiprocessors requires that a program be partitioned for parallel execution, and that it execute with good data locality and load balance. Although automatic compiler-based techniques to address these concerns are attractive, they are often limited by insufficient information about the application. Explicit programmer participation is therefore necessary for programs that exploit unstructured task-level parallelism. However, support for such intervention must balance ease of use against giving the programmer a sufficient degree of control. In this thesis we present the programming language COOL, which extends C++ with simple and efficient constructs for writing parallel programs. COOL is targeted towards programming shared-memory multiprocessors. Our approach emphasizes the integration of concurrency and synchronization with data abstraction. Concurrent execution is expressed through parallel functions that execute asynchronously when invoked. Synchronization for shared objects is expressed through monitors, and event synchronization is expressed through condition variables. This approach provides several benefits. First, integrating concurrency with data abstraction allows the construction of concurrent data structures in which most of the complex details are suitably encapsulated. Second, monitors and condition variables integrated with objects offer a flexible set of building blocks that can be used to build more complex synchronization abstractions. Synchronization operations are clearly identified through attributes and can be optimized by the compiler to reduce synchronization overhead. Finally, the object framework supports abstractions to improve the load distribution and data locality of the program.
Besides these mechanisms for exploiting parallelism, COOL also helps the programmer address performance issues through abstractions for supplying hints about the objects referenced by parallel tasks. These hints are used by the runtime system to schedule tasks close to the objects they reference, thereby improving data locality. The hints are easily supplied by the programmer in terms of the objects in the program, while the details of task creation and scheduling are managed transparently within the runtime system. Furthermore, the hints do not affect the semantics of the program, allowing the programmer to experiment easily with different optimizations. COOL has been implemented on several shared-memory machines, including the Stanford DASH multiprocessor. We have programmed a variety of applications in COOL, including many from the SPLASH parallel benchmark suite. Our experience has been promising: the applications are easily expressed in COOL, and perform as well as hand-tuned codes using lower-level primitives. Furthermore, supplying hints has proven to be an easy and effective way of improving program performance. This thesis therefore demonstrates that (a) the simple but powerful constructs in COOL can effectively exploit task-level parallelism across a variety of application programs, (b) an object-based approach improves both the expressiveness and the performance of parallel programs, and (c) improving data locality can be made simple through a combination of programmer abstractions and smart scheduling mechanisms.