Our work is driven by the vision of a Global InfoBase (GIB): a ubiquitous and
universal information resource, simple to use, up to date, and comprehensive.
The project consists of four interrelated thrusts:
(i) Combining Technologies: integrating technologies for information
retrieval, database management, and hypertext navigation, to achieve a
"universal" information model;
(ii) Personalization: developing tools for personalizing information;
(iii) Semantics: using natural-language processing and structural techniques
for analyzing the semantics of Web pages; and
(iv) Data Mining: designing new algorithms for mining information in order to
synthesize new knowledge.
This project is supported by the National Science Foundation under Grant EIA
Students (full-time and part-time, grad and undergrad)
Sriram Raghavan and Hector Garcia-Molina. Integrating
Diverse Information Management Systems: A Brief Survey. Proceedings of
the IEEE Data Engineering Bulletin, December 2001.
Arvind Arasu and Hector Garcia-Molina. Extracting
structured data from web pages. Proceedings of the ACM SIGMOD
International Conference on Management of Data, June 2003.
Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, and Sriram
Raghavan. Searching the Web.
ACM Transactions on Internet Technology, 1(1):2-43, August 2001.
Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer
Widom. Models and issues in
data stream systems. Proceedings of the ACM Symposium on Principles of
Database Systems, pages 1-16, 2002.
Brian Babcock and Chris Olston. Distributed
top-k monitoring. Proceedings of the ACM SIGMOD International Conference
on Management of Data, June 2003.
Cheng Yang. "Music
Database Retrieval Based on Spectral Similarity." In International
Symposium on Music Information Retrieval, October 2001.
T. Haveliwala. Search Facilities for Internet Relay Chat. Proceedings of the
Joint Conference on Digital Libraries (Poster session), 2002.
C. Olston and J. Widom. Best-Effort
Cache Synchronization with Source Cooperation. To appear: SIGMOD 2002.
C. Olston and J. Widom. Approximate
Caching for Continuous Queries over Distributed Data Sources. February
2002 Technical Report.
C. Olston, B. T. Loo and J. Widom. Adaptive
Precision Setting for Cached Approximate Values. Proceedings of the ACM SIGMOD
International Conference on Management of Data, May 2001.
D. Klein and T. Haveliwala. Concise Labeling of Document Clusters.
Submitted. Technical Report, Stanford University, April 2002.
Sriram Raghavan and Hector Garcia-Molina. Crawling
the hidden Web. Proceedings of the 27th International Conf. on Very
Large Databases (VLDB), pp. 129-138, September 2001.
T. Haveliwala. Topic-Sensitive
PageRank. Proceedings of the Eleventh International World Wide Web
Conference, 2002.
T. Haveliwala, A. Gionis, D. Klein, and P. Indyk. Evaluating
Strategies for Similarity Search on the Web. Proceedings of the Eleventh
International World Wide Web Conference, 2002.
D. Klein, S. Kamvar, and C. Manning. From
Instance-level Constraints to Space-level Constraints: Making the Most of
Prior Knowledge in Data Clustering. Proceedings of the Nineteenth
International Conference on Machine Learning, 2002.
S. Kamvar, D. Klein, and C. Manning. Interpreting
and Extending Classical Agglomerative Clustering Algorithms using a
Model-Based Approach. Proceedings of the Nineteenth International
Conference on Machine Learning, 2002.
Glen Jeh and Jennifer Widom. SimRank:
A Measure of Structural-Context Similarity. Technical Report, Computer
Science Department, Stanford University, 2001.
Glen Jeh and Jennifer Widom. Scaling
Personalized Web Search. In Proceedings of the 12th International World
Wide Web Conference.
Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning. Named
Entity Recognition with Character-Level Models. In CoNLL 2003.
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H.
Golub. Extrapolation
methods for accelerating PageRank computations. In Proceedings of the
12th International World Wide Web Conference, May 2003.
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H.
Golub. Exploiting the block
structure of the web for computing PageRank. Technical Report, Computer
Science Department, Stanford University, 2003.
C. Olston, J. Jiang, and J. Widom. Adaptive
filters for continuous queries over distributed data streams.
Proceedings of the ACM SIGMOD International Conference on Management of
Data, June 2003.
Sriram Raghavan and Hector Garcia-Molina. Representing
Web graphs. Proceedings of the IEEE International Conference on Data
Engineering, March 2003.
Sriram Raghavan and Hector Garcia-Molina. Complex
queries over Web repositories. To appear in the Proceedings of the 29th
International Conference on Very Large Databases (VLDB), September 2003.
- Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning. Named
Entity Recognition with Character-Level Models. Proceedings of the Seventh Conference on Natural Language
Learning, 2003, pp. 180-183.
- Sepandar D. Kamvar, Dan Klein and Christopher D. Manning. Spectral
Learning. Proceedings of the
International Joint Conference on Artificial Intelligence, 2003.
- Dan Klein and Christopher D. Manning.
Accurate Unlexicalized Parsing. Proceedings
of the Association for Computational Linguistics, 2003.
- Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer.
Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network.
Proceedings of Human Language Technology/North American Association
for Computational Linguistics 2003.
- Dan Klein and Christopher D. Manning.
A* Parsing: Fast Exact Viterbi Parse Selection.
Proceedings of Human Language Technology/North American Association
for Computational Linguistics 2003.
- Dan Klein and Christopher D. Manning.
Fast Exact Inference with a Factored Model for Natural Language Parsing.
To appear in Suzanna Becker, Sebastian Thrun, and Klaus Obermayer (eds),
Advances in Neural Information Processing Systems 15
(NIPS 2002).
Sites relevant to the project include the DB Group home page, the Infolab
home page, the NLP Group home page, and the Digital Libraries project home
page.
Last modified: July 7th 2003