Mathematical Programming:
Linear programming has been used for building adaptive classifiers since late 1960s [156]. Given two possibly interesecting sets of points, Duda and Hart [81] proposed a linear programming formulation for finding the split whose distance from the misclassified points is minimized. More recently, Mangasarian and Bennett used linear and quadratic programming techniques to build machine learning systems in general and decision trees in particular [228,21,19,226,20]. Use of zero-one integer programming for designing vector quantizers can be found in [213]. Brown and Pittard [34] also employed linear programming for finding optimal multivariate splits at classification tree nodes. Almost all the above papers attempt to minimize the distance of the misclassified points from the decision boundary. In that sense, these methods are more similar to perceptron training methods [244], than to decision tree splitting criteria. Mangasarian [227] describes a linear programming formulation to minimize the number of misclassified points instead of the geometric distance.