Tesseract contains relatively little linguistic
analysis. Whenever the word recognition module is
considering a new segmentation, the linguistic module
(misnamed the permuter) chooses the best available
word string in each of the following categories: Top
frequent word, Top dictionary word, Top numeric
word, Top UPPER case word, Top lower case word
(with optional initial upper), Top classifier choice