Conclusion
The class imbalance problem has received growing attention among data miners. Many techniques have been proposed for handling this problem; however, traditional data mining techniques remain unsatisfactory. We presented an efficient technique, called Safe-Level-SMOTE, to handle the class imbalance problem.
The experiments show that the performance of Safe-Level-SMOTE, evaluated by precision and F-value, is better than that of SMOTE and Borderline-SMOTE when C4.5 decision trees are applied as classifiers. This comes from the fact that Safe-Level-SMOTE carefully over-samples a dataset: each synthetic instance is generated in a safe position by considering the safe-level ratio of instances. In contrast, SMOTE and Borderline-SMOTE may generate synthetic instances in unsuitable locations, such as overlapping regions and noise regions. We conclude that synthetic instances generated in safe positions can improve the prediction performance of classifiers on the minority class.
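To make this mechanism concrete, the following is a minimal Python sketch of how a safe level and the safe-level ratio can steer where a synthetic instance is placed on the line segment between a minority instance and its neighbour. The helper names (`safe_level`, `generate_synthetic`) and the use of scikit-learn's `NearestNeighbors` are illustrative choices for this sketch, not the paper's original implementation.

```python
import random

import numpy as np
from sklearn.neighbors import NearestNeighbors

def safe_level(x, data, labels, k=5):
    """Safe level of x: the number of minority (label 1) instances
    among its k nearest neighbours in the dataset."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    _, idx = nn.kneighbors(x.reshape(1, -1))
    neighbours = idx[0][1:]              # drop x itself
    return int(labels[neighbours].sum())

def generate_synthetic(p, n, sl_p, sl_n, rng=random):
    """Place one synthetic instance between minority instance p and its
    neighbour n, biased toward the safer endpoint. Returns None when
    both endpoints look like noise."""
    ratio = sl_p / sl_n if sl_n > 0 else float("inf")
    if ratio == float("inf") and sl_p == 0:
        return None                      # both unsafe: skip as noise
    if ratio == float("inf"):
        gap = 0.0                        # n unsafe: duplicate p
    elif ratio == 1:
        gap = rng.random()               # equally safe: anywhere on the segment
    elif ratio > 1:
        gap = rng.random() / ratio       # p safer: gap in [0, 1/ratio]
    else:
        gap = 1 - rng.random() * ratio   # n safer: gap in [1 - ratio, 1]
    return p + gap * (n - p)
```

With the gap chosen this way, synthetic instances cluster near whichever endpoint has more minority neighbours, which is what keeps them out of overlapping and noise regions.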
Although the experimental results provide evidence that Safe-Level-SMOTE can successfully handle numeric datasets in the class imbalance problem, several future works remain to be studied in this line of research. First, different definitions for assigning the safe level would be valuable. Second, additional methods for handling datasets with nominal attributes would be useful. Third, automatic determination of the number of synthetic instances generated by Safe-Level-SMOTE should be addressed.