Report Number: CS-TR-72-264
Institution: Stanford University, Department of Computer Science
Title: An artificial intelligence approach to machine translation.
Author: Wilks, Yorick A.
Date: February 1972
Abstract: The paper describes a system of semantic analysis and generation, programmed in LISP 1.5 and designed to pass from paragraph length input in English to French via an interlingual representation. A wide class of English input forms will be covered, but the vocabulary will initially be restricted to one of a few hundred words. With this subset working, and during the current year (71-72), it is also hoped to map the interlingual representation onto some predicate calculus notation so as to make possible the answering of very simple questions about the translated matter. The specification of the translation system itself is complete, and its main points of interest that distinguish it from other systems are: i) It translates phrase by phrase -- with facilities for reordering phrases and establishing essential semantic connectivities between them -- by mapping complex semantic structures of "message" onto each phrase. These constitute the interlingual representation to be translated. This matching is done without the explicit use of a conventional syntax analysis, by taking as the appropriate matched structure the "most dense" of the alternative structures derived. This method has been found highly successful in earlier versions of this analysis system. ii) The French output strings are generated without the explicit use of a generative grammar. This is done by means of STEREOTYPES: strings of French words, and functions evaluating to French words, which are attached to English word senses in the dictionary and built into the interlingual representation by the analysis routines. The generation program thus receives an interlingual representation that already contains both French output and implicit procedures for assembling the output, since the stereotypes are in effect recursive procedures specifying the content and production of the output word strings. Thus the generation program at no time consults a word dictionary or inventory of grammar rules.
It is claimed that the system of notation and translation described is a convenient one for expressing and handling the items of semantic information that are ESSENTIAL to any effective MT system. I discuss in some detail the semantic information needed to ensure the correct choice of output prepositions in French, a vital matter inadequately treated by virtually all previous formalisms and projects.
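
The stereotype idea in point ii) can be illustrated with a minimal sketch. It is written in Python rather than the LISP 1.5 of the report, and it is not the report's actual data format: the names `realize`, `preposition_for_goal`, `GO_TO`, and the `goal_type` feature are all hypothetical, chosen only to show how a list mixing French words with functions that evaluate to French words can be expanded recursively without any dictionary lookup at generation time.

```python
from typing import Any, Dict, List

# Assumption: a stereotype is a list whose items are either literal French
# words or functions that, given some semantic context, return a further
# stereotype. This layout is illustrative, not the one used in the report.
Stereotype = List[Any]


def realize(stereotype: Stereotype, context: Dict[str, str]) -> List[str]:
    """Expand a stereotype into French words; no dictionary is consulted."""
    words: List[str] = []
    for item in stereotype:
        if callable(item):
            # A function item evaluates to more stereotype material,
            # which is expanded in turn (hence the recursion).
            words.extend(realize(item(context), context))
        else:
            words.append(item)
    return words


def preposition_for_goal(context: Dict[str, str]) -> Stereotype:
    # Hypothetical rule: choose the output preposition from a semantic feature.
    return ["à"] if context.get("goal_type") == "city" else ["en"]


# Hypothetical stereotype attached to one sense of English "go to".
GO_TO: Stereotype = ["aller", preposition_for_goal]

print(" ".join(realize(GO_TO + ["Paris"], {"goal_type": "city"})))
print(" ".join(realize(GO_TO + ["France"], {"goal_type": "country"})))
```

In this sketch the choice between "à" and "en" is made by a function stored inside the stereotype itself, which is one plausible reading of how semantic information could drive the choice of output prepositions.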