Report Number: CS-TN-94-10
Institution: Stanford University, Department of Computer Science
Title: Precision and Recall of GlOSS Estimators for Database
Discovery
Author: Tomasic, Anthony
Author: Gravano, Luis
Author: Garcia-Molina, Hector
Date: July 1994
Abstract: The availability of large numbers of network information
sources has led to a new problem: finding which text
databases (out of perhaps thousands of choices) are the most
relevant to a query. We call this the text-database discovery
problem. Our solution to this problem, GlOSS
(Glossary-Of-Servers Server), keeps statistics on the
available databases to decide which ones are potentially
useful for a given query. In this paper we present different
query-result size estimators for GlOSS and we evaluate them
with metrics based on the precision and recall concepts of
text-document information-retrieval theory. Our
generalization of these metrics uses different notions of the
set of relevant databases to define different query
semantics.
http://i.stanford.edu/pub/cstr/reports/cs/tn/94/10/CS-TN-94-10.pdf