Report Number: CS-TR-97-1595
Institution: Stanford University, Department of Computer Science
Title: Maintaining data warehouses under limited source access
Author: Huyn, Nam
Date: September 1997
Abstract: A data warehouse stores views derived from data that may not reside at the warehouse. Using these materialized views, user queries can be answered quickly because querying the external sources where the base data reside is avoided. However, when the sources change, the views in the warehouse can become inconsistent with the base data and must be maintained. A variety of approaches have been proposed for maintaining these views incrementally. At one end of the spectrum, the required view updates are computed without restricting which base relations can be used. View maintenance with this approach is simple but can be expensive, since it may involve querying the external data sources. At the other end of the spectrum, additional views are stored at the warehouse to make sure that there is enough information to maintain the views without ever having to query the data sources. While this approach saves on external source access, it may require a large amount of information to be stored and maintained at the warehouse. In this thesis, we propose an intermediate approach to warehouse maintenance based on what we call Runtime View Self-Maintenance, where the views are incrementally maintained without using all the base relations but without requiring additional views to facilitate maintenance. Under limited information, however, maintaining a view unambiguously may not always be possible. Thus, the main questions in runtime view self-maintenance are: (1) View self-maintainability: under what conditions (on the given information) can a view be maintained unambiguously with respect to a given update? (2) View self-maintenance: if a view can be maintained unambiguously, how do we maintain it using only the given information?
The information we consider using for maintaining a view includes: (a) at least the contents of the view itself and the update instance; and (b) optionally, the contents of other views in the warehouse, functional dependencies the base relations are known to satisfy, a subset of the base relations, and partial contents of a base relation. Developing efficient complete solutions for the runtime self-maintenance of conjunctive-query views is the main focus and the main contribution of this thesis.