Report Number: CS-TN-94-12
Institution: Stanford University, Department of Computer Science
Title: Cross-Validated C4.5: Using Error Estimation for Automatic
Parameter Selection
Author: John, George H.
Date: October 1994
Abstract: Machine learning algorithms for supervised learning are in
wide use. An important issue in the use of these algorithms
is how to set the parameters of the algorithm. While the
default parameter values may be appropriate for a wide
variety of tasks, they are not necessarily optimal for a
given task. In this paper, we investigate the use of
cross-validation to select parameters for the C4.5 decision
tree learning algorithm. Experimental results on five
datasets show that when cross-validation is applied to
selecting an important parameter for C4.5, the accuracy of
the induced trees on independent test sets is generally
higher than the accuracy when using the default parameter
value.
URL: http://i.stanford.edu/pub/cstr/reports/cs/tn/94/12/CS-TN-94-12.pdf
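
The parameter-selection idea summarized in the abstract can be illustrated with a minimal sketch: estimate accuracy by k-fold cross-validation for each candidate parameter value, then keep the value with the best estimate. This is not the report's code. The names `k_fold_indices`, `cv_accuracy`, `select_parameter`, and `fit_knn` are hypothetical, the toy 1-D nearest-neighbour learner is only a stand-in for C4.5, and the fold count and candidate values are assumptions, since the abstract names neither the tuned parameter nor the number of folds.

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split range(n) into k disjoint, interleaved test folds (assumes n >= k)."""
    return [list(range(i, n, k)) for i in range(k)]


def cv_accuracy(fit, param, X, y, k=10):
    """Mean held-out accuracy of fit(X_train, y_train, param) over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train only on the examples outside the current fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], param)
        hits = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def select_parameter(fit, candidates, X, y, k=10):
    """Return the candidate value with the highest cross-validated accuracy."""
    return max(candidates, key=lambda p: cv_accuracy(fit, p, X, y, k))


def fit_knn(X_train, y_train, n_neighbors):
    """Toy 1-D nearest-neighbour learner: a stand-in for C4.5, not C4.5 itself."""
    def predict(x):
        # Indices of the n_neighbors training points closest to x.
        nearest = sorted(range(len(X_train)),
                         key=lambda i: abs(X_train[i] - x))[:n_neighbors]
        votes = [y_train[i] for i in nearest]
        # Majority vote among the neighbours' labels.
        return max(set(votes), key=votes.count)
    return predict
```

A call such as `select_parameter(fit_knn, [1, 3, 5], X, y, k=4)` returns whichever candidate scored best on the held-out folds. Because only the training data is used for selection, an independent test set remains available for the final accuracy comparison against the default value, as the abstract describes.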