There has been difficulty in finding natural theoretical
models that provide a precise yet useful language in which
to discuss the performance of top-down decision tree
learning heuristics, to compare variants of these heuristics,
and to compare them to learning algorithms that use entirely
different representations and approaches.
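To fix ideas, here is a minimal sketch in Python of the generic top-down heuristic (all names are ours, and the information-gain splitting criterion is only illustrative: C4.5 uses the closely related gain ratio, and CART uses the Gini index):

    import math
    from collections import Counter

    def entropy(labels):
        # Binary entropy of the empirical label distribution at a node.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def best_split(examples):
        # Greedy step: choose the attribute whose split most reduces
        # impurity, i.e. maximizes information gain.
        base, n = entropy([y for _, y in examples]), len(examples)
        best, best_gain = None, 0.0
        for i in range(len(examples[0][0])):
            left = [y for x, y in examples if x[i] == 0]
            right = [y for x, y in examples if x[i] == 1]
            gain = base - (len(left) / n) * entropy(left) \
                        - (len(right) / n) * entropy(right)
            if gain > best_gain:
                best, best_gain = i, gain
        return best

    def grow(examples):
        # Top-down induction: split, then recurse on each branch.
        attr = best_split(examples)
        if attr is None:  # pure node, or no split helps: emit a majority leaf
            return Counter(y for _, y in examples).most_common(1)[0][0]
        return (attr,
                grow([(x, y) for x, y in examples if x[attr] == 0]),
                grow([(x, y) for x, y in examples if x[attr] == 1]))

    # Example: learning 'x0 AND x1' recovers the target tree exactly.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    assert grow(data) == (0, 0, (1, 0, 1))  # split on x0, then on x1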
For example, even if we make the rather favorable assumption that there
is a small decision tree labeling the data and that the inputs
are distributed uniformly (thus, we are in the well-known
probably approximately correct or PAC model [18], with
the additional restriction to the uniform distribution), the
problem of finding any efficient algorithm with provably
nontrivial performance remains an elusive open problem in
computational learning theory.
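Concretely, the criterion at issue is the standard one (the notation here is ours, not this paper's): given random examples drawn uniformly from {0,1}^n and labeled by an unknown decision tree f of size s, the learner must output, with probability at least 1 − δ, a hypothesis h satisfying

    \Pr_{x \sim \{0,1\}^n} \left[ h(x) \neq f(x) \right] \;\le\; \epsilon,

in time polynomial in n, s, 1/ε, and 1/δ.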
Furthermore, superpolynomial lower bounds for this same problem have been proven
for a wide class of algorithms that includes the top-down
decision tree approach (and also all variants of this approach
that have been proposed to date) [2]. The positive results
for efficient decision tree learning in computational learning
theory all make extensive use of membership queries [4, 5,
11, 14], which provide the learning algorithm with black-box
access to the target function (experimentation), rather than
only an oracle for random examples. Clearly, the need
for membership queries severely limits the potential application
of such algorithms, and they seem unlikely to encroach
on the popularity of the top-down decision tree algorithms
without significant new ideas.
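The distinction between the two access models can be made concrete with a minimal sketch (in Python; the oracle names EX and MQ are standard, but the toy target is ours): a PAC learner sees only the passive oracle EX, whereas a query-based learner may also evaluate the target at points of its own choosing via MQ.

    import random

    N = 3  # number of Boolean attributes (toy value)

    def f(x):
        # Hypothetical hidden target: the three-node decision tree
        # 'if x[0] then x[1] else x[2]'.
        return x[1] if x[0] else x[2]

    def EX():
        # Random-example oracle: draws x uniformly over {0,1}^N and
        # returns the labeled pair (x, f(x)); the only access assumed
        # in the uniform-distribution PAC setting above.
        x = tuple(random.randint(0, 1) for _ in range(N))
        return x, f(x)

    def MQ(x):
        # Membership-query oracle: black-box evaluation of f at a
        # point chosen by the learner (experimentation).
        return f(x)

    sample = [EX() for _ in range(5)]  # passive: uniform labeled draws

    # Active: flip one bit of a chosen point at a time, probing the
    # target's sensitivity, which is impossible with EX alone.
    x = (1, 0, 1)
    flips = {i: MQ(x[:i] + (1 - x[i],) + x[i + 1:]) for i in range(N)}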
In summary, it seems fair to say that, despite their other successes, the models of computational
learning theory have not yet provided significant
insight into the apparent empirical success of programs like
C4.5 and CART.