5. Supervised Discretization Methods
Supervised discretization methods make use of the class label when partitioning the continuous features. Among the supervised discretization methods there are simple ones such as entropy-based discretization and interval merging and splitting using χ² analysis [10].
5.1. Entropy Based Discretization Method
One of the supervised discretization methods, introduced by Fayyad and Irani, is called entropy-based discretization. An entropy-based method uses the class information entropy of candidate partitions to select boundaries for discretization. Class information entropy is a measure of purity: it measures the amount of information that would be needed to specify the class to which an instance belongs. The method starts with one large interval containing all known values of a feature and recursively partitions this interval into smaller subintervals until some stopping criterion is met, for example the Minimum Description Length (MDL) principle or an optimal number of intervals, thus creating multiple intervals for the feature.

In information theory, the entropy of a given set S, i.e. the expected information needed to classify a data instance in S, is calculated as

Info(S) = - Σi pi log2(pi)

where pi is the probability of class i, estimated as Ci/|S|, Ci being the total number of data instances in S that belong to class i. A log function to the base 2 is used because the information is encoded in bits.

The entropy value is bounded from below by 0, reached when the model has no uncertainty at all, i.e. all data instances in S belong to a single class i, so that pi = 1 and pj = 0 for all j ≠ i. It is bounded from above by log2 m, where m is the number of classes in S, reached when the data instances are uniformly distributed across the m classes, i.e. pi = 1/m for all i.

Based on this entropy measure, J. Ross Quinlan developed a decision-tree induction algorithm called Iterative Dichotomiser 3 (ID3), which selects the best split point at each node. ID3 employs a greedy search to find potential split points within the existing range of continuous values using the following formula:
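As an aside, the entropy measure Info(S) defined above, together with its lower and upper bounds, can be checked with a short sketch (a minimal Python example; the function name and the toy label sets are illustrative, not part of the original method description):

```python
import math
from collections import Counter

def info(labels):
    """Class information entropy: Info(S) = -sum_i p_i * log2(p_i),
    where p_i is estimated as C_i / |S| from the class counts."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

# Lower bound: a pure set (all instances in one class) has entropy 0.
print(info(["a", "a", "a"]))        # 0.0

# Upper bound: m = 2 classes, uniformly distributed -> log2(2) = 1 bit.
print(info(["a", "a", "b", "b"]))   # 1.0
```

A mixed but unbalanced set falls strictly between the two bounds, e.g. `info(["a", "b", "b", "b"])` is about 0.811 bits.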