Report Number: CSL-TR-94-634
Institution: Stanford University, Computer Systems Laboratory
Title: Architectural and Implementation Tradeoffs for
Multiple-Context Processors
Author: Laudon, James P.
Date: September 1994
Abstract: Tolerating memory latency is essential to achieving high
performance in scalable shared-memory multiprocessors. In
addition, tolerating instruction (pipeline dependency)
latency is essential to maximize the performance of
individual processors. Multiple-context processors have been
proposed as a universal mechanism to mitigate the negative
effects of latency. These processors tolerate latency by
switching to a concurrent thread of execution whenever one of
the threads blocks due to a high-latency operation. The
multiple-context processors built so far, however, either have a
high context-switch cost, which prevents them from tolerating
short latencies (e.g., those due to pipeline dependencies), or
require excessive concurrency from the software.
We propose a multiple-context architecture that combines full
single-thread support with cycle-by-cycle context
interleaving to provide lower switch costs and the ability to
tolerate short latencies. We compare the performance of our
proposal with that of earlier approaches, showing that our
approach offers substantially better performance for parallel
applications. We also explore using our approach for
uniprocessor workstations, an important environment for
commodity microprocessors. We show that our approach also
offers much better performance for multiprogrammed
uniprocessor workloads.
Finally, we explore the implementation issues for both our
proposed and existing multiple-context architectures. One of
the larger costs for a multiple-context processor arises in
providing a cache capable of handling multiple outstanding
requests, and we propose a lockup-free cache which provides
high performance at a reasonable cost. We also show that the
amount of processor state that needs to be replicated to
support multiple contexts is modest and the extra complexity
required to control the multiple contexts under both our
proposed and existing approaches is manageable. The
performance benefits and reasonable implementation cost of
our approach make it a promising candidate for addition to
future microprocessors.
http://i.stanford.edu/pub/cstr/reports/csl/tr/94/634/CSL-TR-94-634.pdf