Report Number: CSL-TR-97-719
Institution: Stanford University, Computer Systems Laboratory
Title: Automatic Computation and Data Decomposition for Multiprocessors
Author: Anderson, Jennifer-Ann Monique
Date: March 1997
Abstract: Memory subsystem efficiency is critical to achieving high performance on parallel machines. The memory subsystem organization of modern multiprocessor architectures makes their performance highly sensitive to both the distribution of the computation and the layout of the data. A key issue in programming these machines is selecting the computation and data decomposition, the mapping of the computation and data, respectively, across the processors of the machine. A popular approach to the decomposition problem is to require programmers to perform the decomposition analysis themselves, and to communicate that information to the compiler using language extensions. This thesis presents a new compiler algorithm that automatically calculates computation and data decompositions for dense-matrix scientific codes. The core of the algorithm is based on a linear algebra framework for expressing and calculating decompositions. Since the best decompositions may change as different phases of the program are executed, the algorithm also considers re-organizing the data dynamically. The analysis is performed both within and across procedure boundaries so that entire programs can be analyzed. We evaluated the effectiveness of the algorithm by applying it to a suite of benchmark programs. We found that our decomposition analysis and optimization can lead to significant increases in program performance.