Report Number: CS-TR-89-1261
Institution: Stanford University, Department of Computer Science
Title: Pipelined parallel computations, and sorting on a pipelined hypercube.
Author: Mayr, Ernst W.
Author: Plaxton, C. Greg
Date: May 1989
Abstract: This paper brings together a number of previously known techniques in order to obtain practical and efficient implementations of the prefix operation for the complete binary tree, hypercube and shuffle exchange families of networks. For each of these networks, we also provide a "pipelined" scheme for performing k prefix operations in O(k + log p) time on p processors. This implies a similar pipelining result for the "data distribution" operation of Ullman [16]. The data distribution primitive leads to a simplified implementation of the optimal merging algorithm of Varman and Doshi, which runs on a pipelined model of the hypercube [17]. Finally, a pipelined version of the multi-way merge sort of Nassimi and Sahni [10], running on the pipelined hypercube model, is described. Given p processors and n < p log p values to be sorted, the running time of the pipelined algorithm is O(log2 p/log((p log p)/n)). Note that for the interesting case n = p this yields a running time of 0(log2 p/log log p), which is asymptotically faster than Batcher's bitonic sort[3].
http://i.stanford.edu/pub/cstr/reports/cs/tr/89/1261/CS-TR-89-1261.pdf