Report Number: CS-TR-90-1305
Institution: Stanford University, Department of Computer Science
Title: A comparative evaluation of nodal and supernodal parallel
sparse matrix factorization: detailed simulation results
Author: Rathberg, Edward
Author: Gupta, Anoop
Date: February 1990
Abstract: In this paper we consider the problem of factoring a large
sparse system of equations on a modestly parallel
shared-memory multiprocessor with a non-trivial memory
hierarchy. Using detailed multiprocessor simulation, we study
the behavior of the parallel sparse factorization scheme
developed at the Oak Ridge National Laboratory. We then
extend the Oak Ridge scheme to incorporate the notion of
supernodal elimination. We present detailed analyses of the
sources of performance degradation for each of these schemes.
We measure the impact of interprocessor communication costs,
processor load imbalance, overheads introduced in order to
distribute work, and cache behavior on overall parallel
performance. For the three benchmark matrices which we study,
we find that the supernodal scheme gives a factor of 1.7 to
2.7 performance advantage for 8 processors and a factor of
0.9 to 1.6 for 32 processors. The supemodal scheme exhibits
higher performance due mainly to the fact that it executes
many fewer memory operations and produces fewer cache misses.
However, the natural task grain size for the supernodal
scheme is much larger than that of the Oak Ridge scheme,
making effective distnbution of work more difficult,
especially when the number of processors is large.
http://i.stanford.edu/pub/cstr/reports/cs/tr/90/1305/CS-TR-90-1305.pdf