Report Number: CS-TR-99-1623
Institution: Stanford University, Department of Computer Science
Title: Efficient Maintenance and Recovery of Data Warehouses
Author: Labio, Wilburt Juan
Date: August 1999
Abstract: Data warehouses collect data from multiple remote sources and integrate the information as materialized views in a local database. The materialized views are used to answer queries that analyze the collected data for patterns, and trends. This type of query processing is often called on-line analytical processing (OLAP). The warehouse views must be updated when changes are made to the remote information sources. Otherwise, the answers to OLAP queries are based on stale data. Answering OLAP queries based on stale data is clearly a problem especially if OLAP queries are used to support critical decisions made by the organization that owns the data warehouse. Because the primary purpose of the data warehouse is to answer OLAP queries, only a limited amount of time and/or resources can be devoted to the warehouse update. Hence, we have developed new techniques to ensure that the warehouse update can be done efficiently. Also, the warehouse update is not devoid of failures. Since only a limited amount of time and/or resources are devoted to the warehouse update, it is most likely infeasible to restart the warehouse update from scratch. Thus, we have developed new techniques for resuming failed warehouse updates. Finally, warehouse updates typically transfer gigabytes of data into the warehouse. Although the price of disk storage is decreasing, there will be a point in the ``lifetime" of a data warehouse when keeping and administering all of the collected is unreasonable. Thus, we have investigated techniques for reducing the storage cost of a data warehouse by selectively ``expiring'' information that is not needed.