Report Number: CSL-TR-95-670
Institution: Stanford University, Computer Systems Laboratory
Title: Design and Analysis of Update-Based Cache Coherence Protocols for Scalable Shared-Memory Multiprocessors
Author: Glasco, David Brian
Date: June 1995
Abstract: This dissertation examines the performance difference between invalidate-based and update-based cache coherence protocols for scalable shared-memory multiprocessors. The first portion of the dissertation reviews cache coherence. First, chapter 1 describes the cache coherence problem and identifies the two classes of cache coherence protocols, invalidate-based and update-based. The chapter also reviews bus-based protocols and reviews the additional requirements placed on the protocols to extend them to scalable systems. Next, chapter 2 reviews two latency tolerating techniques, relaxed memory consistency models and software-controlled data prefetch, and examines their impact on the cache coherence protocols. Finally, chapter 3 reviews the details of three invalidate-based protocols defined in the literature and defines two new update-based protocols. The second portion of this dissertation examines the performance differences between invalidate-based and update-based protocols. First, chapter 4 presents the methodology used to examine the performance of the protocols. This presentation includes a discussion of the simulation environment, the simulated architecture and the scientific applications. Next, chapter 5 describes and analyzes the performance of two enhancements to the update-based cache coherence protocols. The first enhancement, a fine-grain or word based synchronization scheme, combines data synchronization with the data. This allows the system to take advantage of the fine-grain data updates which result from the update-based protocols. The second enhancement, a write grouping scheme, is necessary to reduce the network traffic generated by the update-based protocols. Next, chapter 6 presents and discusses the simulated results that demonstrate that update-based protocols, with the two enhancements, can significantly improve the performance of the fine-grain scientific applications examined compared to invalidate-based protocols. Chapter 7 examines the sensitivity of the protocols to changes in the architectural parameters and to migratory data. Finally chapter 8 discusses how the choice of protocols affect the correctness, cost and efficiency of the cache coherence mechanism. Overall, this work demonstrates that update-based protocols can be used not only as a coherence mechanism, but also as a latency reducing and tolerating technique to improve the performance of a set of fine-grain scientific applications. But as with other latency reducing techniques, such as data prefetch, the technique must be used with an understanding of its consequences.