Report Number: CS-TR-90-1305
Institution: Stanford University, Department of Computer Science
Title: A comparative evaluation of nodal and supernodal parallel sparse matrix factorization: detailed simulation results
Author: Rathberg, Edward
Author: Gupta, Anoop
Date: February 1990
Abstract: In this paper we consider the problem of factoring a large sparse system of equations on a modestly parallel shared-memory multiprocessor with a non-trivial memory hierarchy. Using detailed multiprocessor simulation, we study the behavior of the parallel sparse factorization scheme developed at the Oak Ridge National Laboratory. We then extend the Oak Ridge scheme to incorporate the notion of supernodal elimination. We present detailed analyses of the sources of performance degradation for each of these schemes. We measure the impact of interprocessor communication costs, processor load imbalance, overheads introduced in order to distribute work, and cache behavior on overall parallel performance. For the three benchmark matrices which we study, we find that the supernodal scheme gives a factor of 1.7 to 2.7 performance advantage for 8 processors and a factor of 0.9 to 1.6 for 32 processors. The supemodal scheme exhibits higher performance due mainly to the fact that it executes many fewer memory operations and produces fewer cache misses. However, the natural task grain size for the supernodal scheme is much larger than that of the Oak Ridge scheme, making effective distnbution of work more difficult, especially when the number of processors is large.
http://i.stanford.edu/pub/cstr/reports/cs/tr/90/1305/CS-TR-90-1305.pdf