issues and requirements of the tools selected for
the preprocessing phase.
At this stage, after consulting with the domain
expert, a few transformations were applied to the
dataset to make the data more suitable for the
data mining algorithms.
Another data transformation, attribute selection,
was necessary to reduce the number of features a
classification algorithm has to examine and to
reduce errors caused by irrelevant features. I
used the best-first search method to select the
best attributes from the 15 attributes that were
available.
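To illustrate the idea of best-first attribute selection, the following is a minimal sketch in Python. It is not the Weka implementation used in this study. The function `best_first_select`, the stopping parameter `max_stale`, and the scoring function `toy_merit` (with its set `USEFUL` of informative attributes) are all hypothetical names introduced for illustration; in practice the merit of a subset would come from a subset evaluator applied to the real data.

```python
import heapq


def best_first_select(n_attributes, merit, max_stale=5):
    """Best-first forward search over attribute subsets.

    `merit` is a caller-supplied scoring function (an assumption here).
    The search stops after `max_stale` consecutive expansions that fail
    to improve on the best subset seen so far.
    """
    start = ()
    best_subset, best_score = start, merit(frozenset(start))
    # Max-heap via negated scores; the sorted tuple breaks ties deterministically.
    open_list = [(-best_score, start)]
    visited = {start}
    stale = 0
    while open_list and stale < max_stale:
        # Expand the most promising subset found so far.
        _, subset = heapq.heappop(open_list)
        improved = False
        for attr in range(n_attributes):
            if attr in subset:
                continue
            child = tuple(sorted(subset + (attr,)))
            if child in visited:
                continue
            visited.add(child)
            score = merit(frozenset(child))
            heapq.heappush(open_list, (-score, child))
            if score > best_score:
                best_subset, best_score = child, score
                improved = True
        stale = 0 if improved else stale + 1
    return list(best_subset), best_score


# Hypothetical merit: reward three "informative" attributes, penalise subset size.
USEFUL = frozenset({0, 2, 5})


def toy_merit(subset):
    return 10 * len(subset & USEFUL) - len(subset)


selected, score = best_first_select(15, toy_merit)
print(selected, score)  # prints [0, 2, 5] 27
```

Unlike a purely greedy forward search, best-first search keeps the unexpanded subsets in a priority queue, so it can back up to an earlier candidate when the current path stops improving.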
In the next step I selected appropriate data
mining techniques for developing a predictive
model. After thoroughly reviewing the algorithms
available in the Weka machine learning software,
the Decision Tree, Neural Network and Bayesian
Classifier algorithms were selected for this study.
To apply the selected classification algorithms,
four experiments were designed and conducted on
the full training dataset containing 7,339
instances. In all of the experiments two scenarios
were considered, one containing all 15 attributes
and the other only the 8 selected attributes.
10-fold cross-validation was adopted for randomly
partitioning the data into training and test sets.
The Weka 3.6.4 machine learning software was used
for these purposes.
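The partitioning performed by 10-fold cross-validation can be sketched as follows. This is a plain, unstratified illustration in Python, not the procedure Weka runs internally (tools such as Weka may additionally stratify the folds by class). The function name `k_fold_indices` and the `seed` parameter are assumptions made for this example; only the instance count of 7,339 comes from the study.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    # Shuffle once so that each fold is a random sample of the data.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(7339, k=10))
test_sizes = [len(test) for _, test in splits]
print(test_sizes)
```

Each instance appears in exactly one test fold and in the training set of the other nine folds, so every instance is used for testing exactly once.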
All the models built were evaluated to see how
well they fulfill the data mining goals. The
algorithms were evaluated on the basis of
classification accuracy, area under the ROC curve
and the confusion matrix.
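The three evaluation measures named above can be computed as in the following minimal Python sketch for a two-class problem. The labels and scores are made-up illustrative values, not results from this study, and the function names are hypothetical. The area under the ROC curve is computed here through its rank interpretation: the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as one half.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a two-class problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of instances that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(actual, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative (made-up) labels and predicted probabilities of the positive class.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

cm = confusion_matrix(actual, predicted)
acc = accuracy(*cm)
auc = roc_auc(actual, scores)
print(cm, acc, auc)  # prints (3, 1, 1, 3) 0.75 0.875
```

Accuracy depends on the chosen decision threshold (0.5 in this sketch), whereas the area under the ROC curve summarises ranking quality across all thresholds, which is why the two measures are reported side by side.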
Experimentation
Keeping in view the goal of this study, which is
to predict heart disease using classification
techniques, I used three different supervised
machine learning algorithms, namely Decision Tree
classification, Bayesian Classifier and Neural
Network.
Four experiments were conducted for this
study, and in all experiments two situations were
considered, one containing all 15 attributes and