Report Number: CS-TR-97-1595
Institution: Stanford University, Department of Computer Science
Title: Maintaining data warehouses under limited source access
Author: Huyn, Nam
Date: September 1997
Abstract: A data warehouse stores views derived from data that may not
reside at the warehouse. Using these materialized views, user
queries can be answered quickly because querying the external
sources where the base data reside is avoided. However, when
the sources change, the views in the warehouse can become
inconsistent with the base data and must be maintained. A
variety of approaches have been proposed for maintaining
these views incrementally. At one end of the spectrum,
the required view updates are computed without restricting
which base relations can be used. View maintenance with this
approach is simple but can be expensive, since it may involve
querying the external data sources. At the other end of the
spectrum, additional views are stored at the warehouse to
make sure that there is enough information to maintain the
views without ever having to query the data sources. While
this approach saves on external source access, it may require
a large amount of information to be stored and maintained at
the warehouse. In this thesis, we propose an intermediate
approach to warehouse maintenance based on what we call
Runtime View Self-Maintenance, where the views are
incrementally maintained without using all the base relations
but without requiring additional views to facilitate
maintenance. Under limited information, however, maintaining
a view unambiguously may not always be possible. Thus, the
main questions in runtime view self-maintenance are:
- View self-maintainability. Under what conditions (on the
given information) can a view be maintained unambiguously
with respect to a given update?
- View self-maintenance. If a view can be maintained
unambiguously, how do we maintain it using only the given
information?
The information we consider using for maintaining a view
includes:
- At least the contents of the view itself and the update
instance
- Optionally, the contents of other views in the warehouse,
functional dependencies the base relations are known to
satisfy, a subset of the base relations, and partial
contents of a base relation.
Developing efficient complete solutions for the runtime
self-maintenance of conjunctive-query views is the main focus
and the main contribution of this thesis.
http://i.stanford.edu/pub/cstr/reports/cs/tr/97/1595/CS-TR-97-1595.pdf