Data mining can be defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
Even though the modeling phase is the core of the process, the quality of the results relies heavily on data preparation, which usually takes around 80% of the total time. One useful approach to data preparation is to discretize the input variables. Discretization of continuous attributes plays an important role in knowledge discovery. Many data mining algorithms require training examples that contain only discrete values, and the rules generated by classification algorithms from discrete values are normally shorter and more understandable. Suitable discretization can improve the generalization and accuracy of the discovered knowledge.
Discretization is the process of dividing the range of a continuous attribute into intervals. Each interval is labeled with a discrete value, and the original data are then mapped to these discrete values. Discretization of continuous attributes is an important preprocessing step for data mining and machine learning algorithms. An effective discretization method not only reduces memory requirements and improves the efficiency of data mining and machine learning algorithms, but also makes the knowledge extracted from the discretized dataset more compact and easier to understand and use. Research has shown that finding the optimal set of split points is an NP-complete problem. The result of discretization depends not only on the discretization algorithm itself but also on the data distribution and the number of split points, so the same discretization algorithm applied to different datasets may yield different results. The effectiveness of a discretization method can only be assessed through the results of the subsequent processing; consequently, whether a discretization method is good also depends on the induction algorithm applied afterwards.
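To make the mapping from a continuous range to labeled intervals concrete, here is a minimal Python sketch. It is not the method of this paper. It uses two common unsupervised schemes, equal-width and equal-frequency binning, chosen here only as assumed examples, and the function names (`equal_width_cut_points`, `equal_frequency_cut_points`, `discretize`) are hypothetical. Running both schemes on the same skewed toy data shows how the outcome depends on the data distribution and the number of split points.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_width_cut_points(values: Sequence[float], k: int) -> List[float]:
    """Hypothetical helper: k-1 interior cut points splitting [min, max] into k equal-width intervals (k >= 1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cut_points(values: Sequence[float], k: int) -> List[float]:
    """Hypothetical helper: k-1 cut points so each interval holds roughly len(values)/k points (k >= 1)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def discretize(values: Sequence[float], cut_points: Sequence[float]) -> List[int]:
    """Map each value to the index (0 .. len(cut_points)) of its interval.

    A value equal to a cut point is assigned to the interval above it,
    i.e. intervals are half-open: [c_{i-1}, c_i).
    """
    return [bisect_right(cut_points, v) for v in values]


if __name__ == "__main__":
    # Toy data with one outlier: a skewed distribution.
    data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
    # Equal width: cut points [34.0, 67.0], so almost every value falls into interval 0.
    print(discretize(data, equal_width_cut_points(data, 3)))
    # → [0, 0, 0, 0, 0, 0, 0, 0, 2]
    # Equal frequency: cut points [4, 7], giving three labels per interval.
    print(discretize(data, equal_frequency_cut_points(data, 3)))
    # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With the same number of split points, the equal-width scheme leaves the middle interval empty on this data while the equal-frequency scheme spreads the values evenly. Which labeling works better can only be judged by the downstream induction algorithm.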