Report Number: CS-TR-91-1351
Institution: Stanford University, Department of Computer Science
Title: Sequence vs. pipeline parallel multiple joins in Paradata
Author: Z hu, Liping
Author: Keller, Arthur M.
Author: Wiederhold, Gio
Date: February 1991
Abstract: In this report we analyze and compare hash-join based
parallel multi-join algorithms for sequenced and pipelined
processing. The BBN Butterfly machine serves as the host for
the performance analysis. The sequenced algorithm handles the
multiple join operations in a conventional sequenced manner,
except that it distributes the work load of each operation
among all processors. The pipelined algorithms handle the
different join operations in parallel, by dividing the
processors into several groups, with the data flowing through
these groups.
The detailed timing tests revealed the bus/memory contention
that grows linearly with the number of processors. The
existence of such a contention leads to an optimal region for
the number of processors, given the join operands fixed. We
present the analytical and experimental formulae for both
algorithms, which incorporate this contention. We discuss the
way of finding an optimal point, and give the heuristics for
choosing the best processor's partition in pipelined
processing.
The study shows that the pipelined algorithms produce the
first joined result sooner than the sequenced algorithm and
need less memory to store the intermediate result. The
sequenced algorithm, on the other hand, takes less time to
finish the whole join operations.
http://i.stanford.edu/pub/cstr/reports/cs/tr/91/1351/CS-TR-91-1351.pdf