We reviewed and experimentally compared the main approaches for learning
probability trees, including a novel variant based on the Bayesian Information
Criterion (BIC). We conclude that, overall, the C4.4-approach performs best and
the C4.5-approach second best. However, C4.5 trees are much smaller than C4.4
trees. Interestingly, when the number of classes is low, BIC performs equally
well. An additional advantage of BIC is that its trees are considerably smaller
than those of the C4.5- and C4.4-approaches. When the number of classes is too
high (≥ 8 in our experiments), BIC fails because its trees are too small.
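The BIC-based variant can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's exact formulation: it assumes a multinomial likelihood per leaf, `C - 1` free parameters per leaf for `C` classes, and a split-acceptance rule that compares the BIC of the split against the unsplit node. All function names are hypothetical.

```python
import math

def log_likelihood(counts):
    """Multinomial log-likelihood of the class counts at one leaf."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)

def bic(leaves, num_classes):
    """BIC score of a (sub)tree given its leaf class-count vectors.

    Each leaf contributes (num_classes - 1) free parameters; the
    penalty is (k / 2) * log(N) over all N training examples.
    """
    n_total = sum(sum(counts) for counts in leaves)
    k = len(leaves) * (num_classes - 1)
    ll = sum(log_likelihood(counts) for counts in leaves)
    return ll - 0.5 * k * math.log(n_total)

# Accept a candidate split only if it improves the BIC score
# (hypothetical counts for a two-class problem):
root = [40, 60]                 # class counts before the split
split = [[35, 5], [5, 55]]      # class counts in the two children
accept = bic(split, 2) > bic([root], 2)
```

Under such a rule, the `log(N)` penalty grows with every extra leaf, which is consistent with BIC producing considerably smaller trees; with many classes the per-leaf parameter count `C - 1` makes the penalty so large that almost no splits are accepted, matching the observed failure for ≥ 8 classes.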