Report Number: CSL-TR-92-550
Institution: Stanford University, Computer Systems Laboratory
Title: Cache Coherence Directories for Scalable Multiprocessors
Author: Simoni, Richard
Date: October 1992
Abstract: Directory-based protocols have been proposed as an
efficient means of implementing cache coherence in
large-scale shared-memory multiprocessors. This
thesis explores the trade-offs in the design of cache
coherence directories by examining the organization
of the directory information, the options in the
design of the coherency protocol, and the implementation
of the directory and protocol.
The traditional directory organization that maintains a full
valid bit vector per directory entry is unsuitable for
large-scale machines due to high storage overhead. This
thesis proposes several alternate organizations. Limited
pointers directories replace the bit vector with several
pointers that indicate those caches containing the data.
Although this scheme performs well across a wide range of
workloads, its performance does not improve as the read/write
ratio becomes very large. To address this drawback, a
dynamic pointer allocation directory is proposed. This
directory allocates pointers from a pool to particular memory
blocks as they are needed. Since the pointers may be
allocated to any block on the memory module, the probability
of running short of pointers is very small. Among the set of
possible organizations, dynamic pointer allocation lies at an
attractive cost/performance point.
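
The storage argument above can be illustrated with a rough back-of-the-envelope sketch. The cache count, block size, and pointer count below are illustrative assumptions, not figures from the thesis, and the function names are hypothetical. The sketch ignores per-entry state bits such as a dirty bit and does not model dynamic pointer allocation.

```python
def full_bit_vector_bits(num_caches: int) -> int:
    """Full bit vector: one valid bit per cache in every directory entry."""
    return num_caches


def limited_pointer_bits(num_caches: int, num_pointers: int) -> int:
    """Limited pointers: each pointer must be wide enough to name any cache."""
    # Bits needed to encode cache IDs 0 .. num_caches - 1 (at least 1 bit).
    pointer_width = max(1, (num_caches - 1).bit_length())
    return num_pointers * pointer_width


def overhead_percent(entry_bits: int, block_bytes: int) -> float:
    """Directory bits per entry as a percentage of the data bits in one block."""
    return 100.0 * entry_bits / (block_bytes * 8)


if __name__ == "__main__":
    # Assumed configuration: 1024 caches, 16-byte blocks, 4 pointers per entry.
    caches, block_bytes, pointers = 1024, 16, 4
    full = full_bit_vector_bits(caches)
    limited = limited_pointer_bits(caches, pointers)
    print(f"full bit vector:  {full} bits/entry, "
          f"{overhead_percent(full, block_bytes):.2f}% overhead")
    print(f"limited pointers: {limited} bits/entry, "
          f"{overhead_percent(limited, block_bytes):.2f}% overhead")
```

Under these assumed parameters the full bit vector grows linearly with the number of caches, while the limited pointers entry grows only with the logarithm of that number, which is the scaling concern the abstract raises.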
Measuring the performance impact of three coherency protocol
features makes the virtues of simplicity clear. Adding a
clean/exclusive state to reduce the time required to write a
clean block results in only modest performance improvement.
Using request forwarding to transfer a dirty block directly to
another cache that has requested it yields similar results.
For small cache block sizes, write hits to clean blocks can
be simply treated as write misses without incurring significant
extra network traffic. Protocol features designed to improve
performance must be examined carefully, for they often
complicate the protocol without offering substantial benefit.
Implementing directory-based coherency presents several
challenges. Methods are described for preventing deadlock,
maintaining a model of parallel execution, handling subtle
situations caused by temporary inconsistencies between cache
and directory state, and tolerating out-of-order message
delivery. Using these techniques, cache coherence can be
added to large-scale multiprocessors in an inexpensive yet
effective manner.
http://i.stanford.edu/pub/cstr/reports/csl/tr/92/550/CSL-TR-92-550.pdf