Report Number: CSL-TR-96-688
Institution: Stanford University, Computer Systems Laboratory
Title: OS Support for Improving Data Locality on CC-NUMA Compute Servers
Author: Verghese, Ben
Author: Devine, Scott
Author: Gupta, Anoop
Author: Rosenblum, Mendel
Date: February 1996
Abstract: The dominant architecture for the next generation of cache-coherent shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. Given the large remote access latencies, data locality is potentially the most important performance issue. In compute-server workloads, when processes are moved between nodes for load balancing, the OS needs to perform page migration and page replication to maintain data locality. Through trace analysis and actual runs of realistic workloads, we study the potential improvements in performance provided by OS-supported dynamic migration and replication. Analyzing our kernel-based implementation of the policy, we provide a detailed breakdown of the costs and point out the functions that consume the most time. We study alternatives to using full cache-miss information to drive the policy, and show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses are inconsistent as an approximation for cache misses. Finally, our workload runs show that OS-supported dynamic page migration and page replication can substantially increase performance, by as much as 29%, in some workloads.
http://i.stanford.edu/pub/cstr/reports/csl/tr/96/688/CSL-TR-96-688.pdf