In this paper we argue that, under a broad range of
circumstances, all data reduction techniques will result
in some decrease in tree size with little impact
on accuracy. Section 2 offers detailed empirical evidence
for the validity of this claim, but an intuitive
feeling for why it might be true can be grasped by
looking at Figure 1. The figure shows plots of tree
size and accuracy as a function of training set size for
the UC Irvine australian dataset. c4.5 was used to
generate the trees (Quinlan 1993) and each plot corresponds
to a different pruning mechanism: error-based
(ebp, the c4.5 default) (Quinlan 1993), reduced error
(rep) (Quinlan 1987), minimum description length
(mdl) (Quinlan & Rivest 1989), cost-complexity with
the 1se rule (ccp1se) (Breiman et al. 1984), and cost-complexity
without the 1se rule (ccp0se). On the
left-hand side of the graphs, no training instances are
available and the best one can do with test instances is
to assign them a class label at random. On the right-hand
side of the graph, the entire dataset (excluding
test instances) is available to the tree building process.
Movement from the left to the right corresponds
to the addition of randomly selected instances to the
training set. Alternatively, moving from the right to
the left corresponds to removing randomly selected instances
from the training set.
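Learning curves of this kind can be generated with a procedure along the following lines. The sketch below is only illustrative: it uses scikit-learn's CART implementation as a stand-in for c4.5 (whose pruning behavior differs), and it assumes the australian dataset has already been loaded into a feature matrix X and label vector y; all function and variable names are ours, not part of the original experiments.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def learning_curve(X, y, fractions=np.linspace(0.1, 1.0, 10), seed=0):
    rng = np.random.RandomState(seed)
    # Hold out a fixed test set; the remainder is the pool of training instances.
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    order = rng.permutation(len(X_pool))  # random order of instance addition
    sizes, tree_sizes, accuracies = [], [], []
    for frac in fractions:
        n = max(1, int(frac * len(X_pool)))
        idx = order[:n]  # moving right on the curve = adding randomly selected instances
        tree = DecisionTreeClassifier(random_state=seed).fit(X_pool[idx], y_pool[idx])
        sizes.append(n)
        tree_sizes.append(tree.tree_.node_count)       # proxy for tree size
        accuracies.append(tree.score(X_test, y_test))  # accuracy on the held-out test set
    return sizes, tree_sizes, accuracies

Plotting tree_sizes and accuracies against sizes yields one pair of curves per pruning configuration; truncating the random order to a smaller prefix corresponds to the removal of randomly selected instances described above.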