Report Number: CS-TR-97-1587
Institution: Stanford University, Department of Computer Science
Title: Ensembles for Supervised Classification Learning
Author: Matan, Ofer
Date: March 1997
Abstract: This dissertation studies the use of multiple classifiers (ensembles or committees) in learning tasks. Both theoretical and practical aspects of combining classifiers are studied. First we analyze the representational ability of voting ensembles. A voting ensemble may perform either better or worse than each of its individual members. We give tight upper and lower bounds on the classification performance of a voting ensemble as a function of the classification performances of its individual members. Boosting is a method of combining multiple "weak" classifiers to form a "strong" classifier. Several issues concerning boosting are studied in this thesis. We study SBA, a hierarchical boosting algorithm proposed by Schapire, in terms of its representation and its search. We present a rejection boosting algorithm that trades-off exploration and exploitation: It requires fewer pattern labels at the expense of lower boosting ability. Ensembles may be useful in gaining information. We study their use to minimize labeling costs of data and to enable improvements on performance over time. For that purpose a model for on-site learning is presented. The system learns by querying "hard" patterns while classifying "easy" ones.