In this step, the collected data were organized
into tables in a format suitable for the data
mining system used. The data were cleaned by
replacing inconsistent values with one standard
value used throughout the data set. Cleaning
also included filling in missing values with the
most frequent (majority) value of the attribute.
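
The two cleaning operations described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used in the study: the mapping STANDARD_VALUES, the sample column, and the treatment of blank cells and "?" as missing are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical mapping from inconsistent spellings to one standard value.
STANDARD_VALUES = {
    "m": "Male", "male": "Male",
    "f": "Female", "female": "Female",
}

def standardize(value, mapping=STANDARD_VALUES):
    """Return the standard form of a raw value, or None if it is missing."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text == "?":
        # Assumption: blank cells and '?' both mean "missing".
        return None
    # Unknown spellings are kept as they are (after trimming whitespace).
    return mapping.get(text.lower(), text)

def fill_with_mode(column):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing to learn the majority value from
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]

# Made-up sample column with inconsistent and missing entries.
raw = ["M", "male", " F ", "?", "Male", None]
cleaned = fill_with_mode([standardize(v) for v in raw])
print(cleaned)
```

Standardizing before imputing matters here: if the spellings were not unified first, the counts of "M", "male" and "Male" would be split and the majority value could be chosen incorrectly.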
Because the collected attributes may include
irrelevant ones that degrade the performance of
the classification model, a feature selection
approach was used to select the most appropriate
set of features. For this purpose, the WEKA
toolkit was used: the attributes were ranked, and
3 attributes were then eliminated by the feature
selection approach.
Finally, the list of the most significant
attributes, in descending order of rank, is as
follows:
HSGrade, Fund, TDept, TDegree, HKind, Study-Type,
T-Gender, St-Depart, St-Gender.
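
To make the rank-then-eliminate idea concrete, the following Python sketch ranks nominal attributes by information gain and drops the lowest-ranked ones. It is not WEKA code, and several points are assumptions: the text does not say which attribute evaluator was used, so information gain is chosen here only as a common ranking criterion; dropping the lowest-ranked attributes is likewise assumed; and the toy rows, the RandomTag column, and the Result class label are invented for the example.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in class entropy after splitting on a nominal attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, attributes, target, drop=0):
    """Rank attributes by information gain, then drop the `drop` lowest."""
    labels = [row[target] for row in rows]
    scores = {a: information_gain([row[a] for row in rows], labels)
              for a in attributes}
    ranked = sorted(attributes, key=lambda a: scores[a], reverse=True)
    kept = ranked[:len(ranked) - drop] if drop else ranked
    return kept, scores

# Toy data, invented for illustration only (not the study's data set).
rows = [
    {"HSGrade": "High", "Fund": "Yes", "RandomTag": "a", "Result": "Pass"},
    {"HSGrade": "High", "Fund": "Yes", "RandomTag": "b", "Result": "Pass"},
    {"HSGrade": "High", "Fund": "No",  "RandomTag": "b", "Result": "Pass"},
    {"HSGrade": "Low",  "Fund": "Yes", "RandomTag": "a", "Result": "Fail"},
    {"HSGrade": "Low",  "Fund": "No",  "RandomTag": "b", "Result": "Fail"},
    {"HSGrade": "Low",  "Fund": "No",  "RandomTag": "b", "Result": "Fail"},
]

kept, scores = rank_attributes(
    rows, ["HSGrade", "Fund", "RandomTag"], target="Result", drop=1)
print(kept)
```

In this toy example HSGrade separates the classes perfectly, Fund is weakly informative, and RandomTag carries no information, so it is the one eliminated. With the real data the scores, and therefore the ranking, would depend on the evaluator actually chosen in WEKA.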