Report Number: CS-TR-95-1560
Institution: Stanford University, Department of Computer Science
Title: Wrappers for Performance Enhancement and Oblivious Decision
Graphs
Author: Kohavi, Ron
Date: September 1995
Abstract: In this doctoral dissertation, we study three basic problems
in machine learning and two new hypothesis spaces with
corresponding learning algorithms. The problems we
investigate are: accuracy estimation, feature subset
selection, and parameter tuning. The latter two problems are
related and are studied under the wrapper approach. The
hypothesis spaces we investigate are: decision tables with a
default majority rule (DTMs) and oblivious read-once decision
graphs (OODGs).
For accuracy estimation, we investigate cross-validation and
the .632 bootstrap. We show examples where they fail and
conduct a large-scale study comparing them. We conclude that
repeated runs of five-fold cross-validation give a good
tradeoff between bias and variance for the problem of model
selection used in later chapters.
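As an illustration of this estimator, the following is a minimal
Python sketch of repeated five-fold cross-validation, paired with
a toy nearest-centroid inducer. The function names, the default
number of repetitions, and the toy inducer are illustrative
stand-ins, not the implementation described in the thesis.

    import numpy as np

    def repeated_kfold_accuracy(induce, X, y, k=5, repeats=10, seed=0):
        """Average accuracy over `repeats` random k-fold cross-validation runs.
        `induce(X_train, y_train)` must return a predict(X) function."""
        rng = np.random.default_rng(seed)
        scores = []
        for _ in range(repeats):
            order = rng.permutation(len(y))
            for test_idx in np.array_split(order, k):
                train_idx = np.setdiff1d(order, test_idx)
                predict = induce(X[train_idx], y[train_idx])
                scores.append(np.mean(predict(X[test_idx]) == y[test_idx]))
        return float(np.mean(scores)), float(np.std(scores))

    def nearest_centroid_inducer(X_train, y_train):
        """Toy induction algorithm: predict the class of the nearest centroid."""
        classes = np.unique(y_train)
        centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
        def predict(X):
            dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            return classes[dists.argmin(axis=1)]
        return predict

    # Usage: acc, sd = repeated_kfold_accuracy(nearest_centroid_inducer, X, y, k=5)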
We define the wrapper approach and use it for feature subset
selection and parameter tuning. We relate definitions of
feature relevancy to the set of optimal features, which is
defined with respect to both a concept and an induction
algorithm. The wrapper approach requires a search space,
operators, a search engine, and an evaluation function. We
investigate all of them in detail and introduce compound
operators for feature subset selection. Finally, we abstract
the search problem into search with probabilistic estimates.
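To make the wrapper idea concrete, here is a minimal Python sketch
of forward feature subset selection, where each candidate subset is
scored by the induction algorithm's estimated accuracy when trained
on that subset alone. The thesis studies richer search engines
(best-first search) and compound operators; the hill-climbing
search and the names below are simplified, illustrative choices,
and eval_fn can be the repeated_kfold_accuracy sketched above.

    def wrapper_forward_selection(induce, X, y, eval_fn):
        """Greedy wrapper search over feature subsets: starting from the
        empty set, repeatedly add the single feature whose inclusion most
        improves estimated accuracy; stop when no addition helps.
        `eval_fn(induce, X_subset, y)` returns (accuracy_estimate, std)."""
        selected, best_acc = [], 0.0
        improved = True
        while improved:
            improved, best_candidate = False, None
            for f in range(X.shape[1]):
                if f in selected:
                    continue
                acc, _ = eval_fn(induce, X[:, selected + [f]], y)
                if acc > best_acc:
                    best_acc, best_candidate = acc, f
            if best_candidate is not None:
                selected.append(best_candidate)
                improved = True
        return selected, best_acc

    # Usage: subset, acc = wrapper_forward_selection(
    #     nearest_centroid_inducer, X, y, repeated_kfold_accuracy)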
We introduce decision tables with a default majority rule
(DTMs) to test the conjecture that feature subset selection
is a very powerful bias. The accuracy of induced DTMs is
surprisingly high, and we conclude that this bias is
extremely important for many real-world datasets. We show
that the resulting decision tables are very small and can be
succinctly displayed.
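A DTM classifier is simple enough to sketch directly: project every
training instance onto the chosen feature subset, store the majority
class for each observed projection, and fall back to the overall
majority class for unseen projections. The Python below is an
illustrative sketch with hypothetical names, assuming discrete
feature values; it is not the code used in the thesis.

    from collections import Counter, defaultdict

    def induce_dtm(X_train, y_train, feature_subset):
        """Decision table with a default majority rule (DTM) over a
        fixed feature subset of discrete-valued features."""
        default = Counter(y_train).most_common(1)[0][0]  # overall majority class
        cells = defaultdict(Counter)
        for row, label in zip(X_train, y_train):
            cells[tuple(row[f] for f in feature_subset)][label] += 1
        table = {key: counts.most_common(1)[0][0] for key, counts in cells.items()}
        def predict(X):
            return [table.get(tuple(row[f] for f in feature_subset), default)
                    for row in X]
        return predict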
We study properties of oblivious read-once decision graphs
(OODGs) and show that they do not suffer from some inherent
limitations of decision trees. We describe a general
framework for constructing OODGs bottom-up and specialize it
using the wrapper approach. We show that the graphs produced
use fewer features than C4.5, the state-of-the-art
decision tree induction algorithm, and are usually easier for
humans to comprehend.
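For reference, the defining structural properties of an OODG, that
every node at a given level branches on the same feature (oblivious)
and that each feature appears on at most one level (read-once), can
be captured in a few lines. The class below is an illustrative
sketch of the data structure only, not the thesis's bottom-up
construction algorithm; the parity example shows a concept that
forces decision trees to replicate subtrees yet stays compact as an
OODG.

    class OODG:
        """Oblivious read-once decision graph: level i branches on
        feature_order[i], so every path tests the features in the same
        order, each at most once."""

        def __init__(self, feature_order, levels, leaf_labels):
            # levels[i] is a list of nodes at level i; each node is a dict
            # mapping a value of feature_order[i] to a node index at level
            # i + 1 (indices at the last level point into leaf_labels).
            self.feature_order = feature_order
            self.levels = levels
            self.leaf_labels = leaf_labels

        def classify(self, instance):
            node = 0  # single root node at level 0
            for feature, level in zip(self.feature_order, self.levels):
                node = level[node][instance[feature]]
            return self.leaf_labels[node]

    # Example: a two-level OODG computing the parity (XOR) of binary
    # features 0 and 1.
    xor_graph = OODG(
        feature_order=[0, 1],
        levels=[[{0: 0, 1: 1}],                  # root splits on feature 0
                [{0: 0, 1: 1}, {0: 1, 1: 0}]],   # two nodes split on feature 1
        leaf_labels=[0, 1],
    )
    assert xor_graph.classify([1, 1]) == 0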
http://i.stanford.edu/pub/cstr/reports/cs/tr/95/1560/CS-TR-95-1560.pdf