1.1 Rule-Based and Example-Based Approaches
Rule-based translation mostly consists of (1) a process of analyzing input sentences of a source
language morphologically, syntactically and/or semantically and (2) a process of generating output
sentences of a target language based on an internal structure or interlingua. Each process is controlled by
the dictionary and the rules. Meanwhile, the basic idea of the example-based method is to translate a
sentence by using translation examples of similar sentences [1]. The primary steps of the example-based
method are 1) collect examples in a database, 2) given an input, retrieve similar examples from the
database, and 3) adapt the results of the similar examples to the current input and obtain the output.
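The three steps above can be illustrated with a minimal sketch. It is not the method of any cited system: the example database, the placeholder target tokens (upper-case words), the word lexicon, and the function names are all hypothetical, and similarity is approximated here by word-level matching from Python's standard difflib module.

```python
from difflib import SequenceMatcher

# Step 1: collect examples in a database.
# Toy data only; the target side uses placeholder tokens, not a real language.
EXAMPLES = [
    ("he buys a book", "HE BUY BOOK"),
    ("she reads a letter", "SHE READ LETTER"),
]
# Hypothetical word lexicon, used only to adapt a retrieved example.
LEXICON = {"book": "BOOK", "pen": "PEN", "letter": "LETTER"}


def similarity(a, b):
    """Word-level similarity ratio in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve(sentence):
    """Step 2: return the (source, target) example most similar to the input."""
    return max(EXAMPLES, key=lambda ex: similarity(sentence, ex[0]))


def adapt(sentence, example):
    """Step 3: swap in target words where the input differs from the example."""
    source, target = example
    src_words, in_words = source.split(), sentence.split()
    out = target.split()
    matcher = SequenceMatcher(None, src_words, in_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only single-word substitutions are handled in this sketch.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            old = LEXICON.get(src_words[i1])
            new = LEXICON.get(in_words[j1])
            if old in out and new is not None:
                out[out.index(old)] = new
    return " ".join(out)


def translate(sentence):
    """Retrieve the closest example and adapt it to the input."""
    return adapt(sentence, retrieve(sentence))


print(translate("he buys a pen"))  # prints HE BUY PEN
```

A real system would replace the exhaustive scan in retrieve with an indexed search, which is the concern addressed by the retrieval methods discussed next.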
Utsuro et al. [2] propose an example retrieval method for avoiding full retrieval of examples. The
proposed method generates retrieval queries from similarities, retrieving examples through the tree
structure of a thesaurus and then using binary search along subsumption ordering of retrieval queries.
Cranias et al. [1] introduce a matching method that measures similarity according to both surface structure
and content. Another contribution involves the use of a clustering procedure to find the best matching
example in the database. This method relies on the segmentation of sentences into coherent segments
and their alignment at the sub-sentential level.
1.2 The Hybrid Translation Method
Many researchers combine the rule-based and example-based methods into hybrid methods of their own.
Shirai et al. [3] propose a hybrid translation method that combines a rule-based method with an
example-based method. An outline of the hybrid algorithm is: 1) find candidate sentences that are similar
to the input sentence, 2) select the template: (a) rank the candidates by similarity to the input sentence,
(b) cluster the translations of the candidate sentences, (c) select the highest-ranked pair of the best
cluster, 3) translate the input sentence by analogy to the selected template, and 4) output the adjusted
sentence. Each difference between the input sentence and the template is identified and translated using
the rule-based modules.
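The outline above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of Shirai et al. [3]: the toy corpus, the similarity thresholds, the greedy clustering, the reading of "best cluster" as the largest one, and the dictionary lookup standing in for the rule-based modules are all hypothetical.

```python
from difflib import SequenceMatcher

# Toy aligned corpus; target sides are placeholder tokens, not a real language.
CORPUS = [
    ("he buys a book", "HE BUY BOOK"),
    ("he buys a car", "HE BUY CAR"),
    ("he buys a house", "HE PURCHASE-HOUSE"),
    ("she sings", "SHE SING"),
]
# Hypothetical stand-in for the rule-based modules: a word-level lookup.
RULE_LEXICON = {"book": "BOOK", "car": "CAR", "pen": "PEN"}


def sim(a, b):
    """Word-level similarity ratio in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def find_candidates(sentence, corpus, threshold=0.5):
    """Step 1: keep pairs whose source side is similar to the input."""
    return [pair for pair in corpus if sim(sentence, pair[0]) >= threshold]


def cluster_translations(ranked, threshold=0.6):
    """Step 2b: greedily group candidates whose translations are similar."""
    clusters = []
    for pair in ranked:
        for cluster in clusters:
            if sim(pair[1], cluster[0][1]) >= threshold:
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


def select_template(sentence, candidates):
    """Step 2: rank, cluster, then take the top-ranked pair of the best cluster."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda p: sim(sentence, p[0]), reverse=True)
    clusters = cluster_translations(ranked)
    best = max(clusters, key=len)  # assumption: the largest cluster is "best"
    return best[0]  # clusters keep rank order, so index 0 is the top-ranked pair


def translate_by_analogy(sentence, template, rule_based):
    """Steps 3-4: translate each differing word with the rule-based stand-in."""
    source, target = template
    src, inp, out = source.split(), sentence.split(), target.split()
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, src, inp).get_opcodes():
        # Only single-word differences are handled in this sketch.
        if tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1:
            old, new = rule_based(src[i1]), rule_based(inp[j1])
            if old in out and new is not None:
                out[out.index(old)] = new
    return " ".join(out)


def hybrid_translate(sentence, corpus=CORPUS, rule_based=RULE_LEXICON.get):
    """Run the four outlined steps; return None when no template is found."""
    template = select_template(sentence, find_candidates(sentence, corpus))
    if template is None:
        return None
    return translate_by_analogy(sentence, template, rule_based)


print(hybrid_translate("he buys a pen"))  # prints HE BUY PEN
```

Clustering on the target side means that a candidate with an unusual translation, such as the third pair in the toy corpus, is not chosen as the template even though its source side is equally similar to the input.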
They point out that this hybrid system is a method that selects the strongest features of the rule-based
and example-based methods while avoiding their weaknesses. The strength of the rule-based method is that
the information can be obtained through introspection and analysis, while that of the example-based method
is that correspondences can be found from raw data. The weakness of the rule-based method is that the
accuracy of the entire process is the product of the accuracy of each sub-stage. The weakness of the
example-based method is the difficulty in finding appropriate examples.
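The effect of multiplying sub-stage accuracies can be seen in a short calculation. The figures below are hypothetical and are not taken from any cited system; the sketch also assumes that the errors of the sub-stages are independent.

```python
import math


def pipeline_accuracy(stage_accuracies):
    """Overall accuracy as the product of the per-stage accuracies."""
    return math.prod(stage_accuracies)


# Hypothetical figures: three sub-stages, each 90% accurate on its own.
overall = pipeline_accuracy([0.9, 0.9, 0.9])
print(f"{overall:.3f}")  # prints 0.729
```

Even when every sub-stage performs well in isolation, the accuracy of the whole pipeline falls below that of its weakest stage.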
They also conclude that a useful example-based system should be able to accept loosely aligned corpora,
rather than requiring corpora aligned at low levels. Their prototype Japanese-English system, tested by
translating with a corpus of 5,000 sentences, can use loosely aligned texts. It allows users to take
advantage of any aligned text they have by adding it to the set of sentences searched by the system.
Although these combined methods work successfully to a certain extent, they cannot be applied directly to
English-to-Thai sentence translation. Japanese and English both have sentence markers, and each word in a
sentence is separated by a pause or space. Therefore, the task of sentence alignment can be performed
efficiently. The translation process based on the principle of analogy, with a large corpus of parallel
texts as a database, is quite successful and the resulting translations are reliable. In contrast, one
lingering linguistic problem of Thai is word segmentation, due to its run-on sentences that have no
boundary marker. This makes sentence alignment difficult. As a result, we do not have a large enough
aligned sentence corpus to serve as raw data for example-based