3.1.2 Data Mining and Genetic Algorithm
A genetic algorithm is used in [4] to reduce the size of the data
by selecting an optimal subset of attributes sufficient for heart
disease prediction. Classification is a supervised learning method
that extracts models describing important data classes or predicts
future trends. Three classifiers, namely Decision Tree, Naïve Bayes
and Classification via clustering, are used to diagnose the
presence of heart disease in patients. Classification via
clustering: Clustering is the process of grouping similar elements.
This technique may be used as a preprocessing step before feeding
the data to the classification model. The attribute values need to
be normalized before clustering so that attributes with large
values do not dominate those with small values. Classification is
then performed on the basis of the resulting clusters.
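
The normalization and cluster-then-classify steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the Weka implementation used in [4]: it assumes min-max scaling, a basic k-means clusterer, and majority-vote labelling of clusters. All function names and the toy (age, cholesterol) records are hypothetical.

```python
import random

def min_max_normalize(rows):
    """Rescale every attribute (column) to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def k_means(rows, k, iterations=20, seed=0):
    """Basic k-means with a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(row, centroids)].append(row)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def label_clusters(rows, labels, centroids):
    """Map each cluster to the majority class among its training members."""
    votes = [{} for _ in centroids]
    for row, label in zip(rows, labels):
        c = nearest(row, centroids)
        votes[c][label] = votes[c].get(label, 0) + 1
    default = max(sorted(set(labels)), key=labels.count)
    return [max(v, key=v.get) if v else default for v in votes]

# Hypothetical toy records: (age, cholesterol). Without scaling,
# cholesterol would dominate the distance because of its larger range.
rows = [[35, 180], [40, 190], [38, 175], [62, 290], [66, 310], [59, 300]]
labels = ["absent", "absent", "absent", "present", "present", "present"]

scaled = min_max_normalize(rows)
centroids = k_means(scaled, k=2)
cluster_class = label_clusters(scaled, labels, centroids)
predicted = [cluster_class[nearest(r, centroids)] for r in scaled]
print(predicted)
```

A new record would be classified by scaling it with the same minimum and maximum values and assigning it the class of its nearest centroid.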
Experiments were conducted with the Weka 3.6.0 tool on a data set
of 909 records with 13 attributes. All attributes were made
categorical and inconsistencies were resolved for simplicity. To
enhance the prediction of the classifiers, genetic search was
incorporated.
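
How a genetic search can select an attribute subset may be illustrated with the sketch below. It is not the Weka search used in [4]: each chromosome is assumed to be a bit mask over the attributes, and the fitness function (leave-one-out accuracy of a 1-nearest-neighbour rule) is a stand-in chosen only to keep the example short. The population size, rates, function names and toy records are all hypothetical.

```python
import random

def loo_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour rule (Hamming
    distance), using only the attributes switched on in `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot predict anything
    hits = 0
    for i, row in enumerate(rows):
        best = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum(row[c] != rows[j][c] for c in cols),
        )
        hits += labels[best] == labels[i]
    return hits / len(rows)

def genetic_search(rows, labels, pop_size=12, generations=15,
                   crossover_rate=0.6, mutation_rate=0.05, seed=1):
    """Return the fittest attribute mask found (assumed GA settings)."""
    rng = random.Random(seed)
    n = len(rows[0])
    fitness = lambda mask: loo_accuracy(rows, labels, mask)
    # Start from the full attribute set plus random subsets.
    population = [tuple([1] * n)] + [
        tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best masks survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            if rng.random() < crossover_rate:  # single-point crossover
                cut = rng.randrange(1, n)
                child = a[:cut] + b[cut:]
            else:
                child = a
            # Bit-flip mutation: each bit is inverted with a small probability.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Hypothetical categorical records; only the first attribute is
# informative for the class, the others are noise.
rows = [
    ("typical", "yes", "a", "x"), ("typical", "yes", "b", "y"),
    ("typical", "no", "a", "y"), ("typical", "no", "b", "x"),
    ("atypical", "yes", "a", "y"), ("atypical", "yes", "b", "x"),
    ("atypical", "no", "a", "x"), ("atypical", "no", "b", "y"),
]
labels = ["present"] * 4 + ["absent"] * 4

best_mask = genetic_search(rows, labels)
print(best_mask, loo_accuracy(rows, labels, best_mask))
```

Because the full attribute set is placed in the initial population and the best masks are carried over unchanged, the selected subset never scores worse under this fitness function than using all attributes.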
The observations show that the Decision Tree technique outperforms
the other two data mining techniques after feature subset selection
is incorporated, but with a high model construction time. Naïve
Bayes performs consistently before and after the reduction of
attributes, with the same model construction time. Classification
via clustering performs poorly compared to the other two methods.
Table 4 shows the accuracy of each algorithm obtained from the
experiment.
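
A before-and-after comparison of accuracy and model construction time, of the kind reported above, could be set up along the following lines. The sketch assumes a simple categorical Naïve Bayes with Laplace smoothing written from scratch; it does not reproduce the Weka classifiers or the figures of [4]. The helper names, the toy records and the choice of retained attribute are hypothetical.

```python
import math
import time
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class and per-attribute value frequencies (categorical data)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class with the highest log posterior for `row`."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())

    def log_posterior(label):
        score = math.log(class_counts[label] / total)
        for i, value in enumerate(row):
            # Laplace smoothing so unseen values do not zero out the product.
            num = value_counts[(i, label)][value] + 1
            den = class_counts[label] + len(domains[i])
            score += math.log(num / den)
        return score

    return max(class_counts, key=log_posterior)

def evaluate(train_rows, train_labels, test_rows, test_labels, keep):
    """Return (accuracy, build_seconds) using only the attribute indices in `keep`."""
    project = lambda rows: [tuple(r[i] for i in keep) for r in rows]
    start = time.perf_counter()
    model = train_naive_bayes(project(train_rows), train_labels)
    build_seconds = time.perf_counter() - start
    hits = sum(predict(model, r) == y
               for r, y in zip(project(test_rows), test_labels))
    return hits / len(test_labels), build_seconds

# Hypothetical categorical records (not the 909-record data set).
train_rows = [
    ("typical", "yes", "a", "x"), ("typical", "yes", "b", "y"),
    ("typical", "no", "a", "y"), ("typical", "no", "b", "x"),
    ("atypical", "yes", "a", "y"), ("atypical", "yes", "b", "x"),
    ("atypical", "no", "a", "x"), ("atypical", "no", "b", "y"),
]
train_labels = ["present"] * 4 + ["absent"] * 4
test_rows = [("typical", "no", "b", "y"), ("atypical", "yes", "a", "x")]
test_labels = ["present", "absent"]

acc_full, t_full = evaluate(train_rows, train_labels, test_rows, test_labels,
                            keep=[0, 1, 2, 3])
acc_reduced, t_reduced = evaluate(train_rows, train_labels, test_rows, test_labels,
                                  keep=[0])
print(f"all attributes: accuracy={acc_full:.2f}, build time={t_full:.6f}s")
print(f"reduced subset: accuracy={acc_reduced:.2f}, build time={t_reduced:.6f}s")
```

On data this small the timings are dominated by measurement noise; they only become meaningful on a data set of realistic size and when averaged over repeated runs.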