Christopher Manning
Assistant Professor of Computer Science and Linguistics
Dept. of Computer Science, Gates Building 4A, 353 Serra Mall,
Stanford, CA 94305-9040, USA
manning@cs.stanford.edu
When people see web pages, they understand their meaning. When computers see
web pages, they get only words and HTML tags. We'd like computers to see meanings
as well, so that computer agents could more intelligently process the web. These
desires have led to XML, RDF, agent markup languages, and a host of other technologies
which attempt to impose more syntax and semantics on the web -- to make life
easier for agents. Now, while some of these technologies are certain to see
a lot of use (XML), and some of the others may or may not, I think their
proponents all rather miss the mark in believing that the solution lies in mandating
standards for semantics. This will fail for several reasons: (i) a lot of the
meaning in web pages (as in any communication) derives from the context -- what
the philosophy of language tradition calls pragmatics; (ii) semantic needs
and usages evolve (like languages) more rapidly than standards do (cf. the
Académie française); (iii) meaning transfer frequently has to occur across the
subcommunities that are currently designing *ML languages, at which point all
the problems reappear and the current proposals do little to help; and (iv)
a lot of the time people simply won't use the standards, just as newspaper
advertisements rarely contain spec sheets. I will argue that, yes, agents need
knowledge, ontologies, etc., to interpret web pages, but the aim necessarily
has to be to design agents that can interpret information in context, regardless
of the form in which it appears. For that goal, work in natural language
processing is of some use, because that field has long been dealing with the
uncertain contextual interpretation of ambiguous information. In case the abstract
so far hasn't made it obvious: I intend this as more a pontifical than a technical
talk, but I will discuss a little of the relevant natural language processing technology.
Christopher Manning is assistant professor of computer science and linguistics at Stanford University. Previously, he held faculty positions at Carnegie Mellon University and the University of Sydney. His research interests include probabilistic models of language and statistical natural language processing, constraint-based theories of grammar, parsing systems, computational lexicography, information extraction and text mining, and topics in syntactic theory and typology.