We present a new method for top-down induction of decision trees (TDIDT) with multivariate binary splits at the nodes. The primary contribution of this work is a new splitting criterion called soft entropy, which is continuous and differentiable with respect to the parameters of the splitting function. Using simple gradient descent to find multivariate splits and a novel pruning technique, our TDIDT-SEH (Soft Entropy Hyperplanes) algorithm is able to learn very small trees with better accuracy than competing learning algorithms on most datasets examined.
The process of finding a splitting function at a node of a decision tree is a search problem, and we choose to view it as unconstrained parametric function optimization over the space of hyperplane weight vectors $w \in \mathbb{R}^n$. Our objective function is soft entropy, a new continuous approximation to the entropy measure (Quinlan 1986). Soft entropy was chosen for two reasons. First, it is well established that entropy is a good splitting criterion (Buntine & Niblett 1992). Second, softness is important for good generalization in continuous spaces, as shown in Figure 1.
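To make the optimization view concrete, the following is a minimal sketch of a differentiable splitting criterion of this kind. It assumes a sigmoid membership function sigma(w . x) for the soft split and uses a central-difference numerical gradient for brevity; neither choice, nor the function names soft_entropy and fit_split, comes from the paper, so this should be read as an illustration under those assumptions rather than the TDIDT-SEH implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_entropy(w, X, y, n_classes):
    """Weighted entropy of a soft split: sigma(w . x) assigns each example
    fractionally to both branches, so the class counts -- and hence the
    entropy -- vary smoothly and differentiably with w (assumed form)."""
    p_left = sigmoid(X @ w)            # soft membership in the left branch
    counts = np.zeros((2, n_classes))  # soft class counts per branch
    for k in range(n_classes):
        mask = (y == k)
        counts[0, k] = p_left[mask].sum()
        counts[1, k] = (1.0 - p_left[mask]).sum()
    h = 0.0
    for branch in counts:
        n = branch.sum()
        if n > 0:
            p = branch[branch > 0] / n
            h += (n / len(y)) * -(p * np.log2(p)).sum()
    return h

def fit_split(X, y, n_classes, lr=0.5, steps=300, eps=1e-5):
    """Plain gradient descent on w. A central-difference gradient keeps the
    sketch short; an analytic gradient would be used in practice."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(steps):
        grad = np.empty_like(w)
        for i in range(len(w)):
            d = np.zeros_like(w)
            d[i] = eps
            grad[i] = (soft_entropy(w + d, X, y, n_classes) -
                       soft_entropy(w - d, X, y, n_classes)) / (2 * eps)
        w -= lr * grad
    return w

# Usage: append a constant feature so the hyperplane has a threshold term.
X = np.array([[0.1, 1.0], [0.2, 1.0], [0.8, 1.0], [0.9, 1.0]])
y = np.array([0, 0, 1, 1])
w = fit_split(X, y, n_classes=2)
print(w, soft_entropy(w, X, y, 2))
```

Because every example contributes fractionally to both branches, the criterion has no discontinuities at the decision boundary, which is what makes gradient descent applicable.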
Related work is similar overall, but relies on hard splitting criteria: the OC1 algorithm of Murthy et al. (1993) uses entropy in its standard hard form, and Brodley and Utgoff (1992) describe algorithms using classification error, also a hard splitting criterion.