SMOTE is a very popular method for generating synthetic samples that can mitigate the class-imbalance problem. We applied SMOTE to high-dimensional class-imbalanced data (both simulated and real) and also used some theoretical results to explain its behavior. The main findings of our analysis are:
• in the low-dimensional setting, SMOTE is effective at reducing the class-imbalance problem for most classifiers;
• SMOTE has hardly any effect on most classifiers trained on high-dimensional data;
• when data are high-dimensional, SMOTE is beneficial for k-NN classifiers if variable selection is performed before SMOTE;
• SMOTE is not beneficial for discriminant analysis classifiers even in the low-dimensional setting;
• undersampling or, for some classifiers, cut-off adjustment is preferable to SMOTE for high-dimensional class-prediction tasks (see the sketch at the end of this section).
Even though SMOTE performs well on low-dimensional data, it is not effective in the high-dimensional setting for the classifiers considered in this paper, especially in situations where the signal-to-noise ratio in the data is small.
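The sketch below illustrates two of the recommendations above for the high-dimensional setting: variable selection performed before SMOTE when a k-NN classifier is used, and random undersampling of the majority class as an alternative to SMOTE. It is a minimal illustration assuming Python with scikit-learn and imbalanced-learn (not the software used in our experiments); the simulated dataset, the number of selected variables, and the k-NN settings are arbitrary choices made only for the example.

```python
# Minimal sketch (assumptions noted above): variable selection BEFORE SMOTE,
# and random undersampling as an alternative, for a high-dimensional
# class-imbalanced k-NN classification task.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

# Illustrative high-dimensional, class-imbalanced data:
# 1000 variables, about 10% minority-class samples.
X, y = make_classification(n_samples=200, n_features=1000, n_informative=20,
                           weights=[0.9, 0.1], random_state=0)

# Variable selection is applied before SMOTE; imblearn's Pipeline ensures the
# sampler is fitted only on the training folds during cross-validation.
smote_pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("smote", SMOTE(k_neighbors=5, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

# Alternative for high-dimensional tasks: random undersampling of the majority class.
under_pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),
    ("under", RandomUnderSampler(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])

for name, pipe in [("selection + SMOTE + k-NN", smote_pipe),
                   ("selection + undersampling + k-NN", under_pipe)]:
    auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean cross-validated AUC = {auc.mean():.3f}")
```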