Although the Hough transform has long been used by the research community to extract standard shapes from an image, the present work exploits its potential for line and word segmentation. The computational complexity is reduced by choosing the direction of segmentation in the document image appropriately, simply by tuning the Hough parameters. When applied to a large variety of document images, the current work successfully segmented 88% of the text lines, over-segmented 10% and under-segmented 2% during line segmentation. Although the under-segmentation rate is low, there is a considerable amount of over-segmentation of lines, mainly because of the large and non-uniform separation between some of the words in handwritten document images.

During word segmentation the current method successfully segmented 85.7% of the words, over-segmented 12.1% and under-segmented 2.2%. Here also, the non-uniformity of the inter-character spacing within a word causes some words in handwritten document images to be over-segmented; in some handwritten documents the inter-character spacing is even greater than the inter-word spacing. The algorithm fails to segment properly when text lines are very closely spaced, so a post-processing step is needed to isolate touching text lines in the Hough image. Binarization may also play a crucial role in the segmentation process, and a good binarization technique may eliminate some of the document image segmentation problems.

In the case of BCR images, more emphasis is given to word segmentation than to line segmentation. The current technique successfully segmented 94.6% of the words, over-segmented 4.4% and under-segmented 1%. This result is very encouraging for automatic processing of business cards for database entry, as the technique provides high accuracy even under the huge variation of printed text found on business cards.

In the case of LPR images, under the pre-imposed condition that only the license plate of the vehicle is to be segmented from the surveillance camera image, the efficiency is measured in terms of finding the license plate as a text segment. The current technique localizes the true license plate alone in 85.5% of the cases; in 10% of the cases it localizes other text regions along with the license plate characters, and in 4.5% of the cases it localizes text other than the license plate characters. If the pre-imposed condition of finding only the license plate is removed and the objective is to find any text within the image, the efficiency of the proposed technique reaches 88%. As the proposed technique is mainly intended for segmenting text from images, this result appears quite satisfactory, since the general text segments appearing along with the license plate segment can easily be removed from further consideration by a post-processing module in the LPR system.
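The direction-tuning idea mentioned above can be sketched in a few lines of Python. The sketch below is a minimal, illustrative voting scheme in (rho, theta) space, not the implementation used in the reported experiments: restricting the theta range to angles near 90 degrees makes the accumulator respond to horizontal structures (text lines), while a range near 0 degrees would target vertical gaps for word segmentation. The pixel coordinates and angle ranges are assumptions chosen only for the example.

```python
import math
from collections import defaultdict

def hough_accumulate(points, thetas, rho_step=1.0):
    """Vote in (rho, theta) space for the given foreground pixels.

    `thetas` is the tunable Hough parameter: a narrow range near
    90 degrees targets horizontal text lines, while a range near
    0 degrees targets vertical word boundaries.
    """
    acc = defaultdict(int)
    for x, y in points:
        for t in thetas:
            rad = math.radians(t)
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = round((x * math.cos(rad) + y * math.sin(rad)) / rho_step)
            acc[(rho, t)] += 1
    return acc

def strongest_line(acc):
    """Return the (rho, theta) cell that received the most votes."""
    return max(acc.items(), key=lambda kv: kv[1])[0]

# Hypothetical example: foreground pixels of a horizontal text row at y = 5.
row = [(x, 5) for x in range(40)]
# Tune the parameter space to near-horizontal angles only (85..95 degrees).
cell = strongest_line(hough_accumulate(row, thetas=range(85, 96)))
# cell → (5, 90): a line at theta = 90 degrees with rho = 5.
```

Because the accumulator only ever visits the chosen theta range, narrowing that range by tuning is exactly what cuts the computational cost relative to a full 0-180 degree sweep.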
All the aforementioned results show that the proposed technique can be efficiently utilized across various domains of image segmentation, and its output can subsequently be fed to various domain-specific OCR systems.