Report Number: CS-TR-92-1401
Institution: Stanford University, Department of Computer Science
Title: The performance impact of data reuse in parallel dense
Cholesky factorization
Author: Rothberg, Edward
Author: Gupta, Anoop
Date: January 1992
Abstract: This paper explores performance issues for several prominent
approaches to parallel dense Cholesky factorization. The
primary focus is on issues that arise when blocking
techniques are integrated into parallel factorization
approaches to improve data reuse in the memory hierarchy. We
first consider panel-oriented approaches, where sets of
contiguous columns are manipulated as single units. These
methods represent natural extensions of the column-oriented
methods that have been widely used previously. On machines
with memory hierarchies, panel-oriented methods
significantly increase the achieved performance over
column-oriented methods. However, we find that
panel-oriented methods do not expose enough concurrency for
problems that one might reasonably expect to solve on
moderately parallel machines, thus significantly limiting
their performance. We then explore block-oriented approaches,
where square submatrices are manipulated instead of sets of
columns. These methods greatly increase the amount of
available concurrency, thus alleviating the problems
encountered with panel-oriented methods. However, a number of
issues, including scheduling choices and block-placement
issues, complicate their implementation. We discuss these
issues and consider approaches that solve the resulting
problems. The resulting block-oriented implementation yields
high processor utilization levels over a wide range of
problem sizes.
http://i.stanford.edu/pub/cstr/reports/cs/tr/92/1401/CS-TR-92-1401.pdf