3.2 Video OCR for Lecture Videos
Text in lecture slides is closely related to the lecture
content and can thus provide important information for the
retrieval task. In our framework, we developed a novel
video OCR system for gathering video text.
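
The detection scheme described in this section relies on block-level cues such as edge density and image entropy. As a rough illustration only, the following minimal sketch computes both cues for a small grayscale block given as a list of rows. The function names, the gradient threshold, and the decision thresholds are assumptions chosen for the example and are not taken from our system.

```python
import math
from collections import Counter


def edge_density(block, threshold=32):
    """Fraction of pixels whose horizontal or vertical intensity
    difference to the previous neighbor exceeds `threshold` (assumed value)."""
    rows, cols = len(block), len(block[0])
    edges = 0
    for y in range(rows):
        for x in range(cols):
            gx = abs(block[y][x] - block[y][x - 1]) if x > 0 else 0
            gy = abs(block[y][x] - block[y - 1][x]) if y > 0 else 0
            if max(gx, gy) > threshold:
                edges += 1
    return edges / (rows * cols)


def shannon_entropy(block):
    """Shannon entropy (in bits) of the gray-level histogram of the block."""
    counts = Counter(value for row in block for value in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_text_candidate(block, min_density=0.2, min_entropy=0.5):
    """Hypothetical decision rule: keep blocks with enough edges and
    enough gray-level variation. Both thresholds are illustrative."""
    return (edge_density(block) >= min_density
            and shannon_entropy(block) >= min_entropy)


# A uniform background block and a block with strong vertical strokes.
flat = [[128] * 8 for _ in range(8)]
striped = [[0, 255] * 4 for _ in range(8)]

print(is_text_candidate(flat))     # uniform block is rejected
print(is_text_candidate(striped))  # high-contrast block is kept
```

A uniform block has no edges and zero entropy, so it is rejected, whereas a block with dense high-contrast strokes passes both checks. A practical detector would apply such cues at several scales and tune the thresholds on real slide frames.
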
For text detection, we developed a new localization-verification
scheme. In the detection stage, an edge-based
multi-scale text detector is used to quickly localize candidate
text regions with a low rejection rate. For the subsequent
text area verification, an image entropy-based
adaptive refinement algorithm not only serves to reject false
positives that exhibit low edge density, but also further
splits most text and non-text regions into separate
blocks. Then Stroke Width Transform (SWT) [22]-based verification
procedures are applied to remove the non-text
blocks. Since the SWT verifier cannot correctly identify
special non-text patterns such as spheres, window blocks,
and garden fences, we adopted an additional SVM classifier
to sort out these non-text patterns and further
improve the detection accuracy. For text segmentation and
recognition, we developed a novel binarization approach,
in which we utilize image skeleton and edge maps to