Fig. 2. Pseudo-code for Gradient Searching
Another feature used in this paper is the average
height-width ratio of characters/words. The external contours
of the Canny edges are extracted, and the bounding box of each
contour is computed. Prior to contour extraction, edges whose
length is shorter than a threshold $t_l$ or longer than a
threshold $t_h$ are eliminated as noise or rule lines,
respectively. Thus, the resultant bounding boxes typically
represent characters or
words in the document. The average character/word height-width
ratio of a document image is computed as: