Report Number: CS-TR-99-1623
Institution: Stanford University, Department of Computer Science
Title: Efficient Maintenance and Recovery of Data Warehouses
Author: Labio, Wilburt Juan
Date: August 1999
Abstract: Data warehouses collect data from multiple remote sources and
integrate the information as materialized views in a local
database. The materialized views are used to answer queries
that analyze the collected data for patterns and trends.
This type of query processing is often called on-line
analytical processing (OLAP).
The warehouse views must be updated when changes are made to
the remote information sources. Otherwise, the answers to
OLAP queries are based on stale data. Answering OLAP queries
based on stale data is clearly a problem, especially if OLAP
queries are used to support critical decisions made by the
organization that owns the data warehouse. Because the
primary purpose of the data warehouse is to answer OLAP
queries, only a limited amount of time and/or resources can
be devoted to the warehouse update. Hence, we have developed
new techniques to ensure that the warehouse update can be
performed efficiently.
Also, the warehouse update is not devoid of failures. Since
only a limited amount of time and/or resources are devoted to
the warehouse update, it is most likely infeasible to restart
the warehouse update from scratch. Thus, we have developed
new techniques for resuming failed warehouse updates.
Finally, warehouse updates typically transfer gigabytes of
data into the warehouse. Although the price of disk storage
is decreasing, there will be a point in the ``lifetime'' of a
data warehouse when keeping and administering all of the
collected data is unreasonable. Thus, we have investigated
techniques for reducing the storage cost of a data warehouse
by selectively ``expiring'' information that is not needed.