Report Number: CS-TR-90-1330
Institution: Stanford University, Department of Computer Science
Title: Parallel ICCG on a hierarchical memory
multiprocessor - addressing the triangular solve bottleneck
Author: Rothberg, Edward
Author: Gupta, Anoop
Date: October 1990
Abstract: The incomplete Cholesky conjugate gradient (ICCG) algorithm
is a commonly used iterative method for solving large sparse
systems of equations. In this paper, we study the parallel
solution of sparse triangular systems of equations, the most
difficult aspect of implementing the ICCG method on a
multiprocessor. We focus on shared-memory multiprocessor
architectures with deep memory hierarchies. On such
architectures we find that previously proposed
parallelization approaches result in little or no speedup.
The reason is that these approaches cause significant
increases in the amount of memory system traffic as compared
to a sequential approach. Increases of as much as a factor of
10 on four processors were observed. In this paper we propose
new techniques for limiting these increases, including data
remappings to increase spatial locality, new processor
synchronization techniques to decrease the use of auxiliary
data structures, and data partitioning techniques to reduce
the amount of interprocessor communication. With these
techniques, memory system traffic is reduced to as little as
one sixth of its previous volume. The resulting speedups are
greatly improved as well, although they are still much less
than linear. We discuss the factors that limit further
speedups. We present both simulation results and results of
experiments on an SGI 4D/340 multiprocessor.
http://i.stanford.edu/pub/cstr/reports/cs/tr/90/1330/CS-TR-90-1330.pdf