Decision Tree Learning
Decision tree learning recursively splits the entire sample space into sub-spaces small enough to be described by a simple model [12]. The root node (the first node) of the tree holds the entire sample space. Splitting means forking the root node into child nodes, each of which may in turn be split recursively until a leaf node (a node that cannot be split further) is reached. Every non-leaf node partitions the sample space according to a set of conditions on the input attribute values, while each leaf node assigns an output value to the inputs that satisfy the conditions along the path from the root to that leaf. The ultimate goal of this recursive subdivision is to reduce the mixing of different output values so that a single output value can be assigned to each sub-space. The splitting criteria at a node are an impurity measure (e.g., information gain in ID3, the Gini index in CART, or standard deviation reduction in M5) and the node size (the number of samples at the node).
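To make the splitting procedure concrete, the following minimal Python sketch grows a regression tree by greedily choosing the attribute and threshold that most reduce the weighted standard deviation of the child nodes, and stops when a node is pure or smaller than a node-size threshold. The function names, the `min_samples` parameter, and the constant-valued leaves are illustrative assumptions, not details taken from [12].

```python
# Minimal sketch of recursive splitting for a regression tree.
# Impurity is measured by standard deviation; `min_samples` is the
# node-size stopping rule. All names here are illustrative.
import statistics

def build_tree(X, y, min_samples=5):
    """Recursively split (X, y) until a node is pure or too small."""
    # Stopping rule: small node or zero impurity -> leaf node.
    if len(y) < min_samples or statistics.pstdev(y) == 0:
        return {"leaf": True, "value": statistics.mean(y)}  # constant leaf

    best = None  # (impurity, feature index, threshold)
    for j in range(len(X[0])):                   # candidate attribute
        for t in sorted({row[j] for row in X}):  # candidate threshold
            left = [y[i] for i, row in enumerate(X) if row[j] <= t]
            right = [y[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            # Weighted standard deviation of the children: lower is purer.
            impurity = (len(left) * statistics.pstdev(left)
                        + len(right) * statistics.pstdev(right)) / len(y)
            if best is None or impurity < best[0]:
                best = (impurity, j, t)

    if best is None:  # no valid split exists -> leaf node
        return {"leaf": True, "value": statistics.mean(y)}

    _, j, t = best
    L = [i for i, row in enumerate(X) if row[j] <= t]
    R = [i for i, row in enumerate(X) if row[j] > t]
    return {"leaf": False, "feature": j, "threshold": t,
            "left": build_tree([X[i] for i in L], [y[i] for i in L], min_samples),
            "right": build_tree([X[i] for i in R], [y[i] for i in R], min_samples)}

def predict(node, x):
    """Follow the split conditions from the root down to a leaf."""
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["value"]
```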
Widely used algorithms for building decision trees include CART [13], M5 [12], and M5-Prime [14]. They share the same tree-growing procedure but differ in three respects. First is the impurity measure: M5 uses standard deviation, whereas CART uses variance. Second is the pruning rule used to avoid over-fitting the model. Third is the value assigned at the leaves: M5 fits a linear model at each leaf instead of a constant value [12]. Moreover, M5 produces trees that are simpler, smoother, and more accurate than those of CART [15]. M5-Prime is a later version of M5 that additionally handles missing values and enumerated attributes [14].
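The leaf-assignment difference between CART and M5 can be illustrated with a short sketch: a CART-style leaf predicts a single constant (the mean of the samples reaching it), while an M5-style leaf fits a linear model to those samples. A plain least-squares fit stands in here for M5's leaf model; the pruning and smoothing steps of the full algorithm [12] are omitted, and all names and data are hypothetical.

```python
# Contrast of leaf assignments: a constant value (CART-style) versus a
# linear model fitted to the leaf's samples (M5-style). The least-squares
# fit below is a stand-in for M5's leaf model; pruning/smoothing omitted.
import numpy as np

def cart_leaf(y):
    """CART-style leaf: predict the mean output of the node's samples."""
    return lambda x: float(np.mean(y))

def m5_leaf(X, y):
    """M5-style leaf: fit y ~ w.x + b by least squares on the node's samples."""
    A = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)  # solve min ||A w - y||^2
    return lambda x: float(np.append(x, 1.0) @ w)

# Hypothetical samples reaching one leaf: y grows roughly linearly with x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 4.0, 6.2, 7.9])
print(cart_leaf(y)(np.array([1.0])))   # ~5.05: one constant for the whole leaf
print(m5_leaf(X, y)(np.array([1.0])))  # ~2.11: follows the local linear trend
```

Both predictors agree near the centre of the leaf's samples, but the linear leaf tracks the local trend across the leaf, which is one reason M5 models can be smoother than constant-leaf trees [15].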