majority of software modules are represented with non-faulty labels and the rest
are marked with faulty labels during the modeling phase. These kinds of datasets
are called imbalanced (also unbalanced or skewed), and several performance metrics exist to evaluate fault prediction techniques built on them. Most of these metrics are calculated from a
confusion matrix, which will be explained in later sections. Furthermore, ROC
curves are very popular for performance evaluation. The ROC curve plots the
probability of a false alarm (PF) on the x-axis and the probability of detection
(PD) on the y-axis. The ROC curve was first used in signal detection theory to
evaluate how well a receiver distinguishes a signal from noise, and it is still used
in medical diagnostic tests [45].
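To make PD and PF concrete, the sketch below (a hypothetical illustration, not taken from any of the surveyed papers) computes both rates from the four cells of a binary confusion matrix; each (PF, PD) pair corresponds to one point on the ROC curve:

```python
# Sketch: probability of detection (PD) and probability of false
# alarm (PF) from a binary confusion matrix. The variable names
# (tp, fn, fp, tn) and example counts are illustrative assumptions.

def pd_pf(tp, fn, fp, tn):
    """Return (PD, PF) for a faulty/non-faulty classifier.

    PD = TP / (TP + FN)  -- fraction of faulty modules detected
    PF = FP / (FP + TN)  -- fraction of non-faulty modules flagged
    """
    pd = tp / (tp + fn)
    pf = fp / (fp + tn)
    return pd, pf

# Example: 40 faulty modules (30 detected), 960 non-faulty (48 flagged).
pd, pf = pd_pf(tp=30, fn=10, fp=48, tn=912)
print(pd, pf)  # 0.75 0.05
```

Sweeping the classifier's decision threshold and recomputing (PF, PD) at each setting traces out the full ROC curve.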
In this study, we investigate 85 software fault prediction papers with respect to the performance evaluation metrics they use; these metrics are briefly outlined and the current trend is summarized. We included papers in our review if they describe research on software fault prediction or software quality prediction.
We excluded position papers that do not include experimental results. Inclusion was based on how closely a study relates to our fault prediction research topic; neither the publication year of a paper nor the methods it uses affected exclusion. We categorized the metrics into two main groups: the first group is used to evaluate the performance of a prediction system that classifies each module as faulty or non-faulty; the second group is used to evaluate the performance of a system that predicts the number of faults in each module of the next release.
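For the second group, metrics compare predicted and actual fault counts per module. As a minimal sketch (the metric choice and data values here are assumptions for illustration, not claims about the surveyed papers), mean absolute error is one such measure:

```python
# Sketch: evaluating a fault-count predictor (second group of metrics)
# with mean absolute error (MAE). All data values are made up.

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted fault counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [0, 2, 5, 1, 0]      # faults observed per module in the next release
predicted = [1, 2, 3, 0, 0]   # faults predicted by the model
print(mean_absolute_error(actual, predicted))  # 0.8
```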
Therefore, researchers can choose a metric from one of these groups according to
their research objectives. The first group of metrics is calculated from a confusion matrix. These metrics were identified through our literature review, so the set may not be exhaustive. However, we hope that
this paper will cover the major metrics applied frequently in software fault
prediction studies. This paper is organized as follows: Section 2 describes the
software fault prediction research area. Section 3 explains the performance
metrics. Section 4 presents the conclusions and suggestions.