Report Number: CS-TR-96-1570
Institution: Stanford University, Department of Computer Science
Title: Optimization of SQL Queries for Parallel Machines
Author: Hasan, Waqar
Date: May 1996
Abstract: Parallel execution offers a method for reducing the response time of queries against large databases. We address the problem of parallel query optimization: Given a declarative SQL query, find a procedural parallel plan that delivers the query result in minimal time. We develop optimization algorithms using models that incorporate both sources and obstacles to speedup. We address independent, pipelined and partitioned parallelism. We incorporate inherent constraints on available parallelism and the extra cost of parallel execution. Our models are motivated by experiments with NonStop SQL, a commercial parallel DBMS. We adopt a two-phase approach to parallel query optimization: JOQR (join ordering and query rewrite), followed by parallelization. JOQR minimizes total work. Then, parallelization spreads work among processors to minimize response time. For JOQR, we model communication costs and abstract physical characteristics of data as colors. We devise tree coloring and reordering algorithms that are efficient and optimal. We model parallelization as scheduling a tree whose nodes represent operators and edges represent parallel/precedence constraints. Computation/communication costs are represented as node/edge weights. We prove worst-case bounds on the performance ratios of our algorithms and measure average cases using simulation. Our results enable the construction of SQL compilers that effectively exploit parallel machines.