Report Number: CS-TR-97-1598
Institution: Stanford University, Department of Computer Science
Title: Query Planning and Optimization in Information Integration
Author: Duschka, Oliver M.
Date: December 1997
Abstract: Information integration systems provide uniform user
interfaces to varieties of different information sources. Our
work focuses on query planning in such systems. Query
planning is the task of transforming a user query,
represented in the user's interface language and vocabulary,
into queries that can be executed by the information sources.
Every information source might require a different query
language and might use different vocabularies.
We show that query plans with a fixed number of database
operations are insufficient to extract all information from
the sources, if functional dependencies or limitations on
binding patterns are present. Dependencies complicate query
planning because they allow query plans that would otherwise
be invalid. We present an algorithm that constructs query
plans that are guaranteed to extract all available
information in these more general cases. This algorithm is
also able to handle datalog user queries.
We examine further extensions of the languages allowed for
user queries and for describing information sources:
disjunction, recursion and negation in source descriptions,
negation and inequality in user queries. For these more
expressive cases, we determine the data complexity required
of languages able to represent "best possible" query plans.