Thai language text presents unique challenges for
integration into large-scale multi-language statistical
machine translation (SMT) systems, largely
stemming from the nominal lack of punctuation and
inter-word spacing. We review our independent solutions
for Thai character sequence normalization, tokenization,
typed-entity identification, sentence breaking,
and text re-spacing. We describe a general
maximum entropy-based classifier for sentence
breaking, whose algorithm can be easily extended
to other languages such as Arabic. After integration
of all components, we obtain a final translation
BLEU score of 0.19 for English-to-Thai and 0.21
for Thai-to-English.