max_r is the maximum value among all q_ir values
(the maximum value within the rth column of the quanta
matrix), i = 1, 2, ..., S;
M_+r is the total number of continuous values of
attribute F that are within the interval (d_{r-1}, d_r].
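To make these two quantities concrete, the sketch below computes the column maximum and the column total for a small quanta matrix (rows are classes, columns are intervals) and combines them using the CAIM criterion as it is usually stated: the average over the n intervals of the squared column maximum divided by the column total. The matrix values and the function name caim_value are illustrative assumptions, not taken from the source.

```python
def caim_value(quanta):
    """Compute the CAIM criterion for a quanta matrix.

    quanta[i][r] holds q_ir, the number of values of class i that
    fall into interval r (hypothetical layout chosen for this sketch).
    """
    n = len(quanta[0])                 # number of intervals
    total = 0.0
    for r in range(n):
        column = [row[r] for row in quanta]
        max_r = max(column)            # largest q_ir in column r
        m_plus_r = sum(column)         # all values inside interval r
        if m_plus_r > 0:               # skip empty intervals
            total += max_r ** 2 / m_plus_r
    return total / n


# Made-up example: 3 classes, 3 intervals, 10 values per interval.
quanta = [
    [8, 1, 0],
    [2, 6, 1],
    [0, 3, 9],
]
print(caim_value(quanta))
```

The value grows when each interval is dominated by a single class, and dividing by n penalizes schemes that use many intervals.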
The algorithm starts with a single interval that covers
all possible values of a continuous attribute, and
divides it iteratively. Of all the candidate division
points that are tried, it chooses the boundary
that gives the highest value of the CAIM criterion.
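The greedy procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: candidate boundaries are taken to be midpoints between adjacent distinct values, and the stopping rule (accept a boundary while the criterion improves or while there are fewer intervals than classes) is an assumption about the published algorithm. The helper names quanta_matrix, caim and caim_discretize are hypothetical.

```python
from bisect import bisect_left


def quanta_matrix(values, labels, classes, bounds):
    """Count q_ir for intervals (d_{r-1}, d_r]; the first is closed on the left."""
    n = len(bounds) - 1
    q = [[0] * n for _ in classes]
    row = {c: i for i, c in enumerate(classes)}
    inner = bounds[1:-1]               # boundaries strictly inside the range
    for v, c in zip(values, labels):
        r = bisect_left(inner, v)      # number of inner boundaries below v
        q[row[c]][r] += 1
    return q


def caim(q):
    """Average over intervals of (column maximum squared / column total)."""
    n = len(q[0])
    total = 0.0
    for r in range(n):
        col = [line[r] for line in q]
        m = sum(col)
        if m:
            total += max(col) ** 2 / m
    return total / n


def caim_discretize(values, labels):
    classes = sorted(set(labels))
    distinct = sorted(set(values))
    # Assumption: candidates are midpoints between adjacent distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    bounds = [distinct[0], distinct[-1]]   # single interval covering all values
    global_caim = 0.0
    while candidates:
        # Try every remaining candidate and keep the best-scoring one.
        best_score, best_c = max(
            (caim(quanta_matrix(values, labels, classes, sorted(bounds + [c]))), c)
            for c in candidates
        )
        # Assumed stopping rule: keep splitting while the criterion improves
        # or while there are fewer intervals than classes.
        if best_score > global_caim or len(bounds) - 1 < len(classes):
            bounds = sorted(bounds + [best_c])
            candidates.remove(best_c)
            global_caim = best_score
        else:
            break
    return bounds


values = [1, 2, 3, 4, 10, 11, 12, 13]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(caim_discretize(values, labels))  # [1, 7.0, 13]
```

On this toy data the two classes are cleanly separated, so a single boundary at 7.0 is accepted and any further split lowers the criterion. The sketch rebuilds the quanta matrix for every candidate, which keeps it short but is not efficient for large datasets.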
When the algorithm was tested on several well-known datasets and compared with six other state-of-the-art discretization algorithms, it generated discretization schemes with, on average, the lowest number of intervals and the highest dependence between class labels and discrete intervals, thus outperforming the other discretization algorithms. The execution time of the CAIM algorithm is also much shorter than that of some other supervised discretization algorithms. The analysis of its performance shows that the small number of intervals the algorithm generates helps to reduce the size of the data and improves both the accuracy and the number of subsequently generated rules.