where n is the number of testing samples, and the unprimed and primed
terms with index i denote the true N-WER and the predicted N-WER of
the i-th testing sample, respectively. The
MSE obtained for our system was 6.5%.
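As a minimal sketch of this measurement, the snippet below assumes the standard definition of mean squared error (the average of the squared differences between true and predicted N-WER). The function name `mean_squared_error` and the sample values are illustrative and are not taken from the paper.

```python
def mean_squared_error(true_nwer, pred_nwer):
    """Average squared difference between true and predicted N-WER values.

    Assumes the standard MSE definition; both inputs are equal-length
    sequences of N-WER values in [0, 1].
    """
    if len(true_nwer) != len(pred_nwer):
        raise ValueError("true and predicted sequences must have equal length")
    if not true_nwer:
        raise ValueError("at least one testing sample is required")
    n = len(true_nwer)
    return sum((t - p) ** 2 for t, p in zip(true_nwer, pred_nwer)) / n


# Illustrative values only (not data from the paper).
example_mse = mean_squared_error([0.1, 0.5], [0.2, 0.3])
```

Multiplying the result by 100 would express it as a percentage, which may be how the reported figure was obtained.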
To evaluate the classification capability of the proposed
system, we first divided the testing data set into two parts:
a high quality (positive) part and a low quality (negative) part. If
the actual N-WER of a testing sample was greater than a predefined
threshold T, the sample was labeled as a low quality (negative)
image. Otherwise, it was labeled as a high quality (positive)
image. During classification, a test image was classified
as positive or negative according to its predicted N-WER and
a threshold T', which was swept from 0 to 1. Fig. 4 shows the
ROC curves of the classification results for predefined
thresholds T of 0.1, 0.3, 0.5 and 0.7 on the testing set.
It can be observed that the proposed system achieved
better classification performance on the testing data set when
the threshold T was set to 0.3 or 0.5.
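The labeling and threshold sweep described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `roc_points`, the 101-step grid, and the rule that a sample is predicted positive when its predicted N-WER does not exceed the decision threshold (mirroring the ground-truth rule) are all assumed.

```python
def roc_points(true_nwer, pred_nwer, T, steps=101):
    """Return (threshold, FPR, TPR) triples as the decision threshold
    sweeps from 0 to 1 in `steps` evenly spaced values (steps >= 2)."""
    if len(true_nwer) != len(pred_nwer):
        raise ValueError("true and predicted sequences must have equal length")
    # Ground truth: positive (high quality) when the actual N-WER
    # does not exceed the predefined threshold T.
    actual_pos = [w <= T for w in true_nwer]
    n_pos = sum(actual_pos)
    n_neg = len(actual_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute rates")

    points = []
    for k in range(steps):
        threshold = k / (steps - 1)
        tp = fp = 0
        for is_pos, p in zip(actual_pos, pred_nwer):
            # Assumed rule: predicted positive when predicted N-WER <= threshold.
            if p <= threshold:
                if is_pos:
                    tp += 1
                else:
                    fp += 1
        points.append((threshold, fp / n_neg, tp / n_pos))
    return points


# Illustrative values only (not data from the paper).
pts = roc_points([0.05, 0.2, 0.4, 0.8], [0.1, 0.25, 0.35, 0.9], T=0.3)
```

Plotting the second element of each triple against the third gives one ROC curve; repeating the call for each predefined T would give one curve per setting.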
Table 3 lists the equal error rates (EER) and
the corresponding optimal threshold T', which is based on
the predicted N-WER, for the different classification tests. From
Table 3, we observe that the system performed best,
and the optimal threshold T' was consistent with the
predefined threshold, when T = 0.3. By analyzing the distribution
of the N-WER, we can see that the threshold T = 0.3
best describes the difference between good and bad OCR results
on the data sets, even though some document images
appear to have “good” visual quality but yield poor OCR and document
analysis results.
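One way to locate an equal error rate and its threshold is sketched below. It assumes the usual definition of EER (the operating point at which the false positive rate equals the false negative rate) and approximates it on a discrete grid; the paper does not state how its EER values were computed, and the name `equal_error_rate` and the grid size are hypothetical.

```python
def equal_error_rate(true_nwer, pred_nwer, T, steps=101):
    """Approximate the EER and the decision threshold at which it occurs.

    Returns (eer, threshold), where the threshold is the first grid value
    minimizing |FPR - FNR| and eer is the mean of FPR and FNR there.
    """
    if len(true_nwer) != len(pred_nwer):
        raise ValueError("true and predicted sequences must have equal length")
    # Ground truth: positive (high quality) when actual N-WER <= T.
    actual_pos = [w <= T for w in true_nwer]
    n_pos = sum(actual_pos)
    n_neg = len(actual_pos) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present to compute rates")

    best = None  # (gap, eer, threshold)
    for k in range(steps):
        threshold = k / (steps - 1)
        # False positive: low quality image accepted as high quality.
        fp = sum(1 for a, p in zip(actual_pos, pred_nwer)
                 if not a and p <= threshold)
        # False negative: high quality image rejected as low quality.
        fn = sum(1 for a, p in zip(actual_pos, pred_nwer)
                 if a and p > threshold)
        fpr = fp / n_neg
        fnr = fn / n_pos
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, threshold)
    return best[1], best[2]


# Illustrative values only (not data from the paper).
eer, threshold = equal_error_rate(
    [0.05, 0.2, 0.4, 0.8], [0.1, 0.35, 0.3, 0.9], T=0.3
)
```

Comparing the returned threshold with the predefined T for each test is the kind of consistency check the paragraph above describes.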