Report Number: CS-TN-96-36
Institution: Stanford University, Department of Computer Science
Title: Efficient Snapshot Differential Algorithms for Data
Warehousing
Author: Garcia-Molina, Hector
Author: Labio, Wilburt Juan
Date: June 1996
Abstract: Detecting and extracting modifications from information
sources is an integral part of data warehousing. For
unsophisticated sources, in practice it is often necessary to
infer modifications by periodically comparing snapshots of
data from the source. Although this snapshot differential
problem is closely related to traditional joins and
outerjoins, there are significant differences, which lead to
simple new algorithms. In particular, we present algorithms
that perform (possibly lossy) compression of records. We also
present a window algorithm that works very well if the
snapshots are not "very different." The algorithms are
studied via analysis and an implementation of two of them;
the results illustrate the potential gains achievable with
the new algorithms.
http://i.stanford.edu/pub/cstr/reports/cs/tn/96/36/CS-TN-96-36.pdf