Thai Romanization
Copyright (c) 2004
Version 1.3 - June 2009: add option for all-capitalized output
Version 1.26 - Mar 2009: fix some bugs with the repetition mark ๆ; update dictionary
Version 1.25 - May 2007: combine normal and unicode versions; can romanize a text file and output both Thai texts and romanized words.
Version 1.24 - Apr 2007: fix some mis-romanized words
Version 1.23 - Oct 2006: set ๆ to be the same as the previous word
Version 1.22 - July 2006: can minimize to the task bar
Version 1.21 - July 2006: fix bugs from the version 1.20
Version 1.20 - June 2006: fix bugs when Thai texts are mixed with English, update dictionary
Version 1.10 - Sep 2005: correct misspelling in training dictionary; add option to copy both Thai & Romanized texts
Version 1.09 - July 2005: add more training data
Version 1.08 - Nov 2004: fix bugs in smoothing module
Version 1.07 - July 2004: fix errors caused by symbols ; % @ #
Version 1.06 - June 2004: fix errors caused by symbols ฯ ๆ
Version 1.05 - May 2004: add option to specify Thai font; romanize all texts in a file; release unicode version for non-Thai Windows systems
Version 1.04 - Apr 2004: add an option to mark input as one word
Version 1.03 - Mar 2004: hyphenation and word separation are improved.
Version 1.02 - Feb 2004: show pronunciation in Thai; correct some mis-romanization
Version 1.01 - Jan 2004: first release of Thai romanization program
Introduction
==========
Version 1.01 : This program transcribes Thai words in accordance with the Royal Institute's guideline for the "transliteration of Thai characters into Latin characters" (1999). However, transcription of some special characters such as ฯ ๆ ฯลฯ, abbreviations, and numbers is not yet implemented in this version.
- The program's default option is set for transcribing proper names: the first character of each word is capitalized. To transcribe ordinary text, deselect this option from the menu bar.
- For words that do not follow the romanization rules, users can add those words to the file "roman.except".
- The output of romanization depends on the recognition of word boundaries. In some cases, the output could be wrong because the program does not know the word. For example, "พุทธภาษิต" could be romanized as "phut phasit" if the program does not know that it is one word. To solve this problem, users can add the new word to the file "user.dict". Once the program recognizes it as one word, it will be romanized as "phutthaphasit".
- The program may take 15-30 seconds or more to start up, depending on the speed of the machine. Please be patient when starting the program.
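The dependence on word boundaries described above can be illustrated with a toy segmenter. This is a minimal sketch, not the program's actual method: this README does not say how words are segmented, so the sketch assumes a greedy longest-match dictionary lookup. The lexicons `BASE_LEXICON` and `USER_DICT` (which map a Thai word directly to its romanization) and the functions `segment` and `romanize` are hypothetical; the entries reuse the "พุทธภาษิต" example.

```python
# Hypothetical lexicons: Thai word -> romanization. The real program derives
# romanization from a pronunciation analysis; this direct mapping is a simplification.
BASE_LEXICON = {
    "พุทธ": "phut",
    "ภาษิต": "phasit",
}
# Stands in for an entry a user might add to "user.dict".
USER_DICT = {
    "พุทธภาษิต": "phutthaphasit",
}


def segment(text: str, lexicon: dict[str, str]) -> list[str]:
    """Split text into words by greedy longest match against the lexicon."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a known word is found.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No known word starts here: emit one character and move on.
            words.append(text[i])
            i += 1
    return words


def romanize(text: str, lexicon: dict[str, str]) -> str:
    """Romanize each segmented word; unknown tokens pass through unchanged."""
    return " ".join(lexicon.get(w, w) for w in segment(text, lexicon))


print(romanize("พุทธภาษิต", BASE_LEXICON))                   # phut phasit
print(romanize("พุทธภาษิต", {**BASE_LEXICON, **USER_DICT}))  # phutthaphasit
```

Merging the user entries into the lexicon lets the longer word win the longest-match search, which is why one added entry turns two romanized words into one.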
In Version 1.02 : We have added an option "Show Pronunciation", which shows the pronunciation of each syllable. This feature was already built into the romanization program, because romanization is done on the basis of a phonetic transcription; the option simply converts that transcription into basic Thai written form. However, to write the pronunciation in the form most familiar to Thai users, we have to ignore the difference between short and long vowels in some syllables. For example, แม่น is pronounced with a short vowel while แมน is pronounced with a long vowel. To show this difference, the first one should be written as แม็่น (Maitaikhu indicates a shortened vowel), but this is not the normal way of writing in Thai. Thus, we show the pronunciation as แม่น. In any case, vowel length (and also tone) does not affect the output of romanization, because both are disregarded in the Royal Institute's guideline.
- Users can use the option "Show Pronunciation" to verify whether the romanization is correct. If the pronunciation is not what it should be, then the romanization might be incorrect. Please report the mis-romanization to the author.
In Version 1.03 : We have adjusted the hyphenation and word separation to comply with the Royal Institute's guideline as much as possible. Most items should now be romanized correctly. For proper names, the program tries to determine the components inside the name and inserts a space between components. For example, the name "ทวีวัฒนา" is composed of two words, "ทวี" and "วัฒนา", so it is romanized as "Thawi Watthana". However, word separation could be wrong in some cases. For example, the name "คันนายาว" is romanized as "Khanna Yao" instead of "Khan Na Yao" as listed in the Royal Institute's documents, because the program recognizes "คันนา" as one word. To avoid this problem, if you know that the word should be separated, please separate it yourself when entering the input word.
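Once a name has been split into romanized components, the remaining step is capitalization. The sketch below assumes the components are already romanized in lowercase and shows the three casing behaviours this README mentions: the default proper-name mode (first character of each word capitalized), ordinary text with that option deselected, and the all-capitalized option added in version 1.3. `CaseMode` and `format_words` are hypothetical names, not part of the program.

```python
from enum import Enum


class CaseMode(Enum):
    PROPER_NAME = "proper_name"  # first character of each word capitalized (default)
    NORMAL = "normal"            # proper-name option deselected: left as romanized
    ALL_CAPS = "all_caps"        # every letter capitalized


def format_words(words: list[str], mode: CaseMode = CaseMode.PROPER_NAME) -> str:
    """Join romanized components with spaces, applying the chosen casing."""
    if mode is CaseMode.PROPER_NAME:
        cased = [w[:1].upper() + w[1:] for w in words]
    elif mode is CaseMode.ALL_CAPS:
        cased = [w.upper() for w in words]
    else:
        cased = list(words)
    return " ".join(cased)


print(format_words(["thawi", "watthana"]))                     # Thawi Watthana
print(format_words(["khan", "na", "yao"]))                     # Khan Na Yao
print(format_words(["khanna", "yao"]))                         # Khanna Yao
print(format_words(["thawi", "watthana"], CaseMode.ALL_CAPS))  # THAWI WATTHANA
```

The two "คันนายาว" outputs differ only in how the name was segmented before casing, which is why separating the components yourself in the input changes the result.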
In Version 1.04 : More training data on proper names are added to the system. We also add an option to specify whether the input should be treated as one word. This should reduce mis-romanization of proper names. If this option is checked, a word like "อรรถสาระ" will be treated as one word and romanized as "atthasara". Otherwise, the system will analyze the input as composed of two known words, "อรรถ" and "สาระ", and romanize it as "at sara".
In Version 1.05 : An option to romanize all Thai texts in a file is added. To do this, select "Roman-Text File" from the menu. The output will be saved under the same name as the input file, but with the extension ".rom". This process takes time; please wait until a message box showing the output file name is displayed on the screen.
- An option to specify the Thai font is also added in this version. If you are using the unicode-supported version, you can change the font to a Thai unicode one. However, this requires the Microsoft Forms 2.0 Object Library (FM20.DLL), which is installed as part of Microsoft Office.
In Version 1.06 : We have fixed errors caused by the occurrences of ๆ and ฯ in Thai texts. ฯ will simply be ignored, while ๆ is substituted with the preceding romanized word. A few errors caused by typos in the pronunciation dictionary are also fixed.
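The handling of these two marks can be sketched as a post-processing pass over already-romanized tokens. This is a minimal sketch, assuming the text has been segmented and romanized so that ฯ and ๆ each appear as standalone tokens; `expand_marks` is hypothetical, and dropping a ๆ that has no preceding word is an assumption this README does not address. The abbreviation ฯลฯ is not covered.

```python
PAIYANNOI = "ฯ"  # abbreviation mark: ignored
MAI_YAMOK = "ๆ"  # repetition mark: repeats the preceding word


def expand_marks(tokens: list[str]) -> list[str]:
    """Drop ฯ and replace each ๆ with the preceding romanized word."""
    out: list[str] = []
    for tok in tokens:
        if tok == PAIYANNOI:
            continue
        if tok == MAI_YAMOK:
            if out:  # assumption: a leading ๆ with nothing to repeat is dropped
                out.append(out[-1])
            continue
        out.append(tok)
    return out


# e.g. เด็กๆ, with เด็ก romanized as "dek"
print(expand_marks(["dek", "ๆ"]))  # ['dek', 'dek']
```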
In Version 1.07 : In previous versions, the program would not romanize Thai texts containing certain symbols such as ; # @. This problem is now fixed. More training data are also added as usual.
In Version 1.10 : An option to copy both Thai and romanized texts is added. Some misspellings in the training dictionary and corpus are corrected.
In Version 1.20 : Bugs that occurred when Thai and English texts are mixed are fixed. English texts, including html/xml tags, are now left unchanged. The Thai dictionary is also updated.
In Version 1.21 : Version 1.20 was a major change in which the program was reorganized. As a result, some options that worked in earlier versions did not work in version 1.20, and some words were romanized incorrectly due to program errors. These errors are now fixed.
In Version 1.22 : The program can now be minimized to the task bar.
In Version 1.25 : The program can support both normal Thai character code (Windows-874) and Thai Unicode. When romanizing a text file, if "Option-Copy both Thai and Roman" is checked, the result file will show both Thai texts and romanized words.
NOTES
=======
Since version 1.07, the program has been quite reliable, so we no longer notify users of updated versions by email. If you encounter any problems, please check for the latest version on the website (http://www.arts.chula.ac.th/~tts/ or http://pioneer.chula.ac.th/~awirote/). If installing the latest version does not solve your problem, please report it to awirote@chula.ac.th
Bernd Nebel reported a problem, and its solution, when using the unicode version. In short, if the program cannot romanize Thai texts on a Windows XP system, go to "Control Panel", open "Regional and Language Options", and choose "Thai" as the language for programs that do _not_ support unicode.
Installation
=========
This program requires 7 MB of disk space. If you are installing over an older version and have modified your "user.dict", please back it up first.
Thai Romanization License Agreement:
==================================
Copyright (C) 2004 Wirote Aroonmanakun. All rights reserved.
This program is free software. It is provided "as is", without any warranty. It may contain bugs. Use of this tool is at your own risk. We take no responsibility for any damage that may unintentionally be caused through its use.
Permission is granted to anyone to use this software for educational and personal use, provided that all redistributions retain all occurrences of the above copyright notice.
If you want to embed the Thai romanization module in your application, or set up a romanization service of your own, please contact the Chulalongkorn University Intellectual Property Institute to obtain a license. (http://www.ipi.chula.ac.th/)
Reporting Problems
=================
- Since the purpose of this program is to promote the standard of Thai romanization proposed by the Royal Institute, please report any mis-romanization to us so that the program can be improved.
If you encounter problems, please visit http://www.arts.chula.ac.th/~ling/tts/ and download the latest version to see if the issue has been resolved.
If not, please send a bug report to:
awirote@chula.ac.th
Acknowledgements
=================
Project Leader :
Asst. Prof. Wirote Aroonmanakun
Dept. of Linguistics, Faculty of Arts
Chulalongkorn University
Research Assistants :
Ms. Amornthip Kawinpanithan
Ms. Wanwara Chairoek
Mr. Taneth Ruangrajitpakorn
Mentors :
Professor Theraphan Luangthongkum, Dept. of Linguistics, Faculty of Arts, Chulalongkorn University.
Assoc. Prof. Wanchai Rivepiboon, Dept. of Computer Engineering, Faculty of Engineering, Chulalongkorn University.
The project is supported by a grant from the Thailand Research Fund and the Commission on Higher Education (2003-2004).
MRG4680160