Our work is driven by the vision of a Global InfoBase (GIB):
a ubiquitous and universal information resource, simple to use, up to date, and
comprehensive. The project consists of four interrelated thrusts:
(i) Combining Technologies: integrating technologies for information
retrieval, database management, and hypertext navigation, to achieve a
"universal" information model;
(ii) Personalization: developing tools for personalizing information
management;
(iii) Semantics: using natural-language processing and structural
techniques for analyzing the semantics of Web pages; and
(iv) Data Mining: designing new algorithms for mining information in
order to synthesize new knowledge.
This project is supported by the National Science Foundation under Grant IIS-0085896.
Faculty
Students (full-time, part-time, grad, undergrad and alums)
- A. Arasu and H.
Garcia-Molina. Extracting
structured data from web pages. Proceedings of the ACM SIGMOD
International Conference on Management of Data, June 2003.
- A. Arasu, J. Cho,
H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the Web. ACM
Transactions on Internet Technology, 1(1):2-43, August 2001.
- B. Babcock,
Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data
stream systems. Proceedings of the ACM Symposium on Principles of
Database Systems, pages 1-16, 2002.
- Brian Babcock and
Chris Olston. Distributed
top-k monitoring. Proceedings of the ACM SIGMOD International
Conference on Management of Data, June 2003.
- P. Ganesan, H. Garcia-Molina,
and J. Widom. Exploiting Hierarchical Domain Structure to Compute Similarity.
ACM Transactions on Information Systems (TOIS), Vol. 21, No. 1, 2003, pp.
64-93.
- T. Haveliwala, Sepandar
Kamvar, Dan Klein, Christopher Manning, and Gene Golub. Computing PageRank using
Power Extrapolation. Preprint,
July 2003.
- T. Haveliwala, Sepandar D.
Kamvar, and Glen Jeh. An
Analytical Comparison of Approaches to Personalizing PageRank.
Preprint, June 2003.
- T. Haveliwala. Topic-Sensitive PageRank.
Proceedings of the Eleventh International World Wide Web Conference, 2002.
- T. Haveliwala, A.
Gionis, D. Klein, and P. Indyk. Evaluating Strategies for
Similarity Search on the Web. Proceedings of the Eleventh
International World Wide Web Conference, 2002.
- T. Haveliwala. Search
Facilities for Internet Relay Chat. Proceedings of the Joint Conference on
Digital Libraries (Poster session), 2002.
- Glen Jeh and
Jennifer Widom. SimRank:
A Measure of Structural-Context Similarity. Technical Report, Computer
Science Department, Stanford University, 2001.
- Glen Jeh and
Jennifer Widom. Scaling
Personalized Web Search. In Proceedings of the 12th International
World Wide Web Conference, May 2003.
- G. Jeh and J. Widom. Mining the Space of
Graph Properties. In Proceedings of the Tenth ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, Seattle, Washington,
August 2004.
- S. D. Kamvar, D. Klein, and
C. Manning. Interpreting
and Extending Classical Agglomerative Clustering Algorithms using a
Model-Based Approach. Proceedings of the Nineteenth International
Conference on Machine Learning, 2002.
- S. D. Kamvar, Taher H.
Haveliwala, Christopher D. Manning, and Gene H. Golub. Extrapolation methods for
accelerating PageRank computations. In Proceedings of the 12th
International World Wide Web Conference, May 2003.
- S. D. Kamvar,
Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Exploiting the block
structure of the web for computing PageRank. Technical Report,
Computer Science Department, Stanford University, 2003.
- S. D. Kamvar, Dan Klein and
Christopher D. Manning. Spectral
Learning. Proceedings of the
International Joint Conference on Artificial Intelligence, 2003.
- S. D. Kamvar and Taher H.
Haveliwala. The
Condition Number of the PageRank Problem. Preprint, June 2003.
- Taher H. Haveliwala and
Sepandar D. Kamvar. The
Second Eigenvalue of the Google Matrix. Preprint, April 2003.
- D. Klein and T.
Haveliwala. Concise Labeling of Document Clusters. Submitted. Technical
Report, Stanford University, April 2002.
- D. Klein, Joseph Smarr, Huy
Nguyen, and Christopher D. Manning. Named Entity
Recognition with Character-Level Models. Proceedings of the Seventh Conference on Natural Language
Learning (CoNLL), 2003, pp. 180-183.
- D. Klein and Christopher D.
Manning. Accurate
Unlexicalized Parsing.
Proceedings of the Association for Computational Linguistics, 2003.
- D. Klein, S. Kamvar, and C.
Manning. From
Instance-level Constraints to Space-level Constraints: Making the Most of
Prior Knowledge in Data Clustering. Proceedings of the Nineteenth
International Conference on Machine Learning, 2002.
- D. Klein and Christopher D.
Manning. Fast Exact
Inference with a Factored Model for Natural Language Parsing. To appear in Suzanna Becker, Sebastian
Thrun, and Klaus Obermayer (eds), Advances in Neural Information Processing
Systems 15 (NIPS 2002).
- D. Klein and Christopher D.
Manning. A* Parsing:
Fast Exact Viterbi Parse Selection.
Proceedings of Human Language Technology/North American Association
for Computational Linguistics 2003.
- C. Olston and J.
Widom. Best-Effort
Cache Synchronization with Source Cooperation. To appear: SIGMOD 2002.
- C. Olston and J.
Widom. Approximate
Caching for Continuous Queries over Distributed Data Sources.
Technical Report, February 2002.
- C. Olston, B. T.
Loo and J. Widom. Adaptive
Precision Setting for Cached Approximate Values. Proceedings of the ACM SIGMOD
International Conference on Management of Data, May 2001.
- C. Olston, J. Jiang, and J.
Widom. Adaptive filters
for continuous queries over distributed data streams. Proceedings of
the ACM SIGMOD International Conference on Management of Data, June 2003.
- S. Raghavan and H.
Garcia-Molina. Crawling
the hidden Web. Proceedings of the 27th International Conf. on Very
Large Databases (VLDB), pp. 129-138, September 2001.
- S. Raghavan and H.
Garcia-Molina. Integrating
Diverse Information Management Systems: A Brief Survey.
IEEE Data Engineering Bulletin, December 2001.
- S. Raghavan and H.
Garcia-Molina. Representing
Web graphs. Proceedings of the IEEE International Conference on Data
Engineering, March 2003.
- S. Raghavan and H.
Garcia-Molina. Complex
queries over Web repositories. To appear in the Proceedings of the
29th International Conference on Very Large Databases (VLDB), September
2003.
- Q. Su and J. Widom. Indexing Relational
Database Content Offline for Efficient Keyword-Based Search. Submitted
for conference publication, May 2004.
- K. Toutanova, Dan Klein,
Christopher D. Manning, and Yoram Singer.
Feature-Rich
Part-of-Speech Tagging with a Cyclic Dependency Network. Proceedings of Human Language
Technology/North American Association for Computational Linguistics 2003.
- Cheng Yang. Music
Database Retrieval Based on Spectral Similarity. In
International Symposium on Music Information Retrieval, October 2001.
- U. Srivastava, J. Widom, K.
Munagala, R. Motwani. Query
Optimization over Web Services. Technical Report, October 2005.
- Z. Gyongyi, H. Garcia-Molina, and
J. Pedersen. Combating Web Spam with TrustRank. Proceedings of the
International Conference on Very Large Databases (VLDB), Toronto,
Canada, August 29, 2004.
- Jenny Finkel, Shipra Dingare,
Huy Nguyen, Malvina Nissim, Christopher Manning, and Gail Sinclair. 2004. Exploiting
Context for Biomedical Entity Recognition: From Syntax to the Web.
Joint Workshop on Natural Language Processing in Biomedicine and its
Applications at Coling 2004.
- Shipra Dingare, Jenny Finkel,
Malvina Nissim, Christopher Manning, and Claire Grover. 2004. A System For
Identifying Named Entities in Biomedical Text: How Results From Two
Evaluations Reflect on Both the System and the Evaluations. In The
2004 BioLink meeting: Linking Literature, Information and Knowledge for
Biology at ISMB 2004. Republished as Shipra Dingare, Malvina Nissim, Jenny
Finkel, Christopher Manning, and Claire Grover. 2005. Comparative and
Functional Genomics 6: 77-85.
- Galen Andrew, Trond Grenager,
and Christopher Manning. 2004. Verb Sense and
Subcategorization: Using Joint Inference to Improve Performance on
Complementary Tasks. EMNLP 2004, pp. 150-157.
WWW
Sites relevant to the project include: DB Group home page, Infolab home page, NLP Group home page, Digital Libraries project home page.
Final Report
Progress Report, 2004
Progress Report, 2003
Progress Report, 2002
Progress Report, 2001
Last modified: Feb 28, 2006