Report Number: CS-TR-93-1494
Institution: Stanford University, Department of Computer Science
Title: Index Structures for Information Filtering Under the Vector
Space Model
Author: Yan, Tak W.
Author: Garcia-Molina, Hector
Date: December 1993
Abstract: With the ever increasing volumes of information generation,
users of information systems are facing an information
overload. It is desirable to support information filtering as
a complement to traditional retrieval mechanism. The number
of users, and thus profiles (representing users' long-term
interests), handled by an information filtering system is
potentially huge, and the system has to process a constant
stream of incoming information in a timely fashion. The
efficiency of the filtering process is thus an important
issue.
In this paper, we study what data structures and algorithms
can be used to efficiently perform large-scale information
filtering under the vector space model, a retrieval model
established as being effective. We apply the idea of the
standard inverted index to index user profiles. We devise an
alternative to the standard inverted index, in which we,
instead of indexing every term in a profile, select only the
significant ones to index. We evaluate their performance and
show that the indexing methods require orders of magnitude
fewer I/Os to process a document than when no index is used.
We also show that the proposed alternative performs better in
terms of I/O and CPU processing time in many cases.
http://i.stanford.edu/pub/cstr/reports/cs/tr/93/1494/CS-TR-93-1494.pdf