Report Number: CS-TN-94-10
Institution: Stanford University, Department of Computer Science
Title: Precision and Recall of GlOSS Estimators for Database Discovery
Author: Tomasic, Anthony
Author: Gravano, Luis
Author: Garcia-Molina, Hector
Date: July 1994
Abstract: The availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS (Glossary-Of-Servers Server), keeps statistics on the available databases to decide which ones are potentially useful for a given query. In this paper, we present different query-result size estimators for GlOSS, and we evaluate them with metrics based on the precision and recall concepts of text-document information-retrieval theory. Our generalization of these metrics uses different notions of the set of relevant databases to define different query semantics.