In all five graphs in Figure 1, accuracy peaks with
small numbers of training instances, thereafter remaining
almost constant. Surprisingly, tree size continues
to grow nearly linearly in three of the graphs. This growth
continues despite two important facts: (1) accuracy
has ceased to increase; and (2) c4.5 is pruning the
trees to avoid overfitting. The graphs clearly show that
overfitting is occurring, and it gets worse as the size
of the training set increases. For example, with ebp,
accuracy peaks after only 25% of the available training
instances are seen. The tree at that point contains 22
nodes. When 100% of the available training instances
are used in tree construction, the resulting tree contains
64 nodes. Despite a nearly 3-fold increase in size over
the tree built with 25% of the data, the accuracies of
the two trees are statistically indistinguishable.
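
The arithmetic behind the ebp example can be checked with a short sketch. The function name `growth_summary` and its parameters are hypothetical, introduced only for illustration; the node counts (22 and 64) and the training fractions (25% and 100%) are the ones quoted above.

```python
def growth_summary(nodes_early, frac_early, nodes_full, frac_full=1.0):
    """Summarize tree growth between two training-set fractions.

    Hypothetical helper for illustration; not part of c4.5.
    """
    # Size of the larger tree relative to the smaller one
    ratio = nodes_full / nodes_early
    # Nodes added after accuracy has already peaked
    extra_nodes = nodes_full - nodes_early
    # Average nodes added per additional percentage point of training data
    per_point = extra_nodes / ((frac_full - frac_early) * 100)
    return ratio, extra_nodes, per_point


# Values taken from the ebp example in the text
ratio, extra, per_point = growth_summary(22, 0.25, 64)
print(f"size ratio: {ratio:.2f}x, extra nodes: {extra}, "
      f"nodes per percentage point: {per_point:.2f}")
```

Under these figures the full tree is roughly 2.9 times the size of the 25% tree, and all 42 additional nodes are added with no statistically distinguishable gain in accuracy.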