Report Number: CSL-TR-00-807
Institution: Stanford University, Computer Systems Laboratory
Author: Liao, Shih-Wei
Date: August 2000
Abstract: Shared-memory multiprocessors that use the latest microprocessors are becoming widely used both as compute servers and as desktop computers. But the difficulty in developing parallel software is a major obstacle to the effective use of the multiprocessors to solve a single task. To increase the productivity of multiprocessor programmers, we developed an interactive interprocedural parallelizer called SUIF Explorer. Our experience with SUIF Explorer also helps to identify missing interprocedural analyses that can significantly improve an automatic parallelizer. As a parallel programming tool, the Explorer actively guides the programmers in the parallelization process using a set of advanced static and dynamic analyses and visualization techniques. Our interprocedural program analyses provide high- quality information that restricts the need for user assistance. The Explorer is also the first tool to apply slicing analysis to aid the programmer in uncovering program properties for interactive parallelization. These static and dynamic analyses minimize the number of lines of code requiring programmer assistance to produce parallel codes for real-world applications. As a tool for finding missing compiler techniques, SUIF Explorer helps the compiler researchers design the next-generation parallelizer. Our experience with the Explorer shows that interprocedural array liveness analysis is an enabler of several important optimizations, such as privatization and array contraction. We developed and evaluated an efficient context-sensitive and flow-sensitive interprocedural array liveness algorithm and integrated it into the parallelizer. We use the liveness information to enable contraction of arrays that are not live at loop exits, which results in a smaller memory footprint and better cache utilization. The resulting codes run faster on both uni- and multi- processors. Another key interprocdural analysis which we developed and evaluated is the array reduction analysis. Our reduction algorithm extends beyond previous approaches in its ability to locate reductions to array regions, even in the presence of arbitrarily complex data dependences. To exploit the multiprocessors effectively, the algorithm can locate interprocedural reductions, reduction operations that span multiple procedures. In summary, we successfully apply the Explorer to help the user develop parallel codes effectively and to help the compiler researcher develop the next-generation parallelizer.