Several authors have considered the problem of constructing tree-structured classifiers with linear discriminants [81] at each node. You and Fu [379] used a linear discriminant at each node of the decision tree, computing the hyperplane coefficients with the Fletcher-Powell descent method [104]. Their method requires that the best set of features at each node be prespecified by a human. Friedman [110] reported that applying Fisher's linear discriminant, instead of atomic features, at some internal nodes was useful in building better trees. Qing-Yun and Fu [289] also describe a method for building linear discriminant trees; their method uses multivariate stepwise regression both to optimize the structure of the decision tree and to choose the subsets of features used in the linear discriminants. More recently, the use of linear discriminants at each node was considered by Loh and Vanichsetakul [216]. Unlike [379], the method in [216] chooses the variables at each stage automatically, according to the data and the type of splits desired. Other features of the tree-building algorithm in [216] are that (1) it yields trees with univariate splits, linear combination splits, or splits on linear combinations of polar coordinates, and (2) it allows both ordered and unordered variables in the same linear split. The use of linear discriminants in decision trees has also been considered in the remote sensing literature [158]. A method for building linear discriminant classification trees, in which the user decides at each node which classes need to be split, is described in [350]. John [166] recently considered linear discriminant trees in the machine learning literature.
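To make the common idea concrete, the sketch below computes a two-class Fisher linear discriminant split at a single tree node: a sample x goes left when w . x <= t and right otherwise. This is a minimal, generic illustration, not the procedure of any of the cited methods, which differ precisely in how they select features and optimize the hyperplane; the function name `fisher_split` and the small ridge term added for invertibility are our assumptions.

```python
import numpy as np

def fisher_split(X, y):
    """Two-class Fisher linear discriminant split (w, t) for one tree node.

    Route a sample x to the left child when w @ x <= t, else to the right.
    Minimal sketch: real tree-building methods add feature selection and
    other refinements around this basic computation.
    """
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter matrix, regularized so it is invertible.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)   # Fisher direction S_W^{-1}(mu1 - mu0)
    t = w @ (mu0 + mu1) / 2.0            # threshold: midpoint of projected means
    return w, t

# Usage: split the samples reaching a node, then recurse on each child.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.r_[np.zeros(50, dtype=int), np.ones(50, dtype=int)]
w, t = fisher_split(X, y)
left_mask = X @ w <= t
```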
Linear machines [271] extend linear discriminants to the multiclass case: they are linear structures that can discriminate among multiple classes. In the machine learning literature, Utgoff et al. explored decision trees that use linear machines at the internal nodes [32,79].
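For concreteness, a linear machine keeps one weight vector per class and assigns a sample x to the class k maximizing w_k . x. The sketch below trains one with the classic absolute error-correction rule (on a mistake, the true class's weight vector is moved toward the sample and the winning class's away from it); this basic rule is guaranteed to converge only when the classes are linearly separable, and the tree-based methods cited above build on more elaborate training procedures. The function names here are our own.

```python
import numpy as np

def train_linear_machine(X, y, n_classes, epochs=100):
    """Train a linear machine: one weight vector per class, with the
    classic error-correction (multiclass perceptron) update rule."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias term
    W = np.zeros((n_classes, Xb.shape[1]))
    for _ in range(epochs):
        mistakes = 0
        for x, c in zip(Xb, y):
            j = int(np.argmax(W @ x))          # class currently winning
            if j != c:                         # misclassified sample:
                W[c] += x                      #   reward the true class
                W[j] -= x                      #   penalize the winner
                mistakes += 1
        if mistakes == 0:                      # converged (separable data)
            break
    return W

def predict(W, X):
    """Assign each sample to the class with the largest discriminant value."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W.T, axis=1)
```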