where r_i is the height-width ratio of the i-th bounding box and
n is the total number of bounding boxes in the document.
The principal idea behind using the height-width ratio
is that degraded text is smeared and tends to merge with
its horizontally neighboring text. The merged components
have wider bounding boxes and therefore a lower height-width ratio.
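The height-width ratio feature can be sketched as follows. This is a minimal sketch under two assumptions. First, each text component is already represented by an axis-aligned bounding box. Second, the document-level feature is the arithmetic mean of r_i over the n boxes. The defining equation is not reproduced in this excerpt. The names `Box`, `height_width_ratio`, `mean_height_width_ratio`, and `merge_horizontal` are hypothetical and serve only to illustrate the idea. `merge_horizontal` simulates two characters fusing after smearing.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a text component (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def height_width_ratio(box: Box) -> float:
    """r_i for one bounding box: its height divided by its width."""
    if box.width <= 0:
        raise ValueError("bounding box width must be positive")
    return box.height / box.width


def mean_height_width_ratio(boxes: Sequence[Box]) -> float:
    """Average r_i over all n boxes (assumed aggregation, not the paper's equation)."""
    if not boxes:
        raise ValueError("need at least one bounding box")
    return sum(height_width_ratio(b) for b in boxes) / len(boxes)


def merge_horizontal(a: Box, b: Box) -> Box:
    """Union of two boxes, mimicking smeared neighbors that fuse into one component."""
    x0 = min(a.x, b.x)
    y0 = min(a.y, b.y)
    x1 = max(a.x + a.width, b.x + b.width)
    y1 = max(a.y + a.height, b.y + b.height)
    return Box(x0, y0, x1 - x0, y1 - y0)


if __name__ == "__main__":
    # Two clean characters, each twice as tall as wide.
    clean = [Box(0, 0, 10, 20), Box(12, 0, 10, 20)]
    # After degradation, they merge into a single wide component.
    smeared = [merge_horizontal(clean[0], clean[1])]
    print(mean_height_width_ratio(clean))    # 2.0
    print(mean_height_width_ratio(smeared))  # 20/22, about 0.91
```

The toy example reproduces the effect described above. Merging two neighbors keeps the height but roughly doubles the width, so the ratio drops from 2.0 to below 1.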
Fig. 3 illustrates examples of the two types of features.
Fig. 3(a) shows the procedure for calculating the gradient
feature of an edge pixel, and Fig. 3(b) shows how the
height-width ratio changes from high-quality to low-quality
text.
3. REGRESSION AND CLASSIFICATION
To evaluate the quality of document images from their
features, we propose an OCR-based method that predicts
the N-WER (normalized word error rate, defined in
Equation 3) of each document image, where a higher N-WER
indicates lower image quality. Assume that there
are n high-quality document images and a total of
m degraded document images generated from high