3 Performance Evaluation Metrics
Experimental studies indicate that the majority of software modules do not
cause faults in software systems; faulty modules make up at most about 20% of
all modules. If modules are divided into two classes, faulty and non-faulty, the
majority belong to the non-faulty class and only a minority belong to the
faulty class. Therefore, datasets used in software fault prediction
studies are imbalanced. Accuracy alone is not a suitable measure for evaluating
performance on imbalanced datasets. For example, a trivial algorithm that marks
every module as non-faulty achieves 90% accuracy if 10% of the modules are
faulty, even though it does not identify a single faulty module. Therefore,
researchers use different metrics for the validation of software fault
prediction models. In this section, the metrics identified during our
literature review are briefly outlined.
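
The accuracy example above can be checked with a few lines of Python. This is a minimal sketch, not part of the reviewed studies: the dataset of 100 modules with 10 faulty ones is hypothetical, and the helper names `accuracy` and `faulty_detection_rate` are introduced here only for illustration. The second helper computes the fraction of faulty modules that are flagged (commonly called recall), chosen simply to show what accuracy hides.

```python
def accuracy(actual, predicted):
    """Fraction of modules whose predicted label matches the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def faulty_detection_rate(actual, predicted):
    """Fraction of actually faulty modules (label 1) that were predicted faulty."""
    faulty_total = sum(a == 1 for a in actual)
    if faulty_total == 0:
        return 0.0
    detected = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    return detected / faulty_total


# Hypothetical imbalanced dataset: 10 faulty modules (1), 90 non-faulty (0).
actual = [1] * 10 + [0] * 90

# Trivial algorithm: mark every module as non-faulty.
predicted = [0] * len(actual)

trivial_accuracy = accuracy(actual, predicted)
trivial_detection = faulty_detection_rate(actual, predicted)

print(f"accuracy: {trivial_accuracy:.2f}")
print(f"faulty modules detected: {trivial_detection:.2f}")
```

Under these assumptions the trivial algorithm scores 0.90 on accuracy while detecting none of the faulty modules, which is the reason a single accuracy figure says little about a fault predictor on imbalanced data.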