Report Number: CS-TN-95-21
Institution: Stanford University, Department of Computer Science
Title: Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies
Author: Gravano, Luis
Author: Garcia-Molina, Hector
Date: April 1995
Abstract: As large numbers of text databases have become available on the Internet, it has become harder to locate the right sources for a given query. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are potentially the most useful for a given query. gGlOSS extends our previous work, which focused on databases using the Boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small that it could be widely replicated, even at end-user workstations.