IV. SEMANTIC SIMILARITY BETWEEN SENTENCES
Sentences are made up of words, so it is reasonable to represent a sentence by the words it contains. Unlike classical methods that use a precompiled word list containing hundreds of thousands of words, our method forms the semantic vectors dynamically, based solely on the two sentences being compared. Recent research achievements in semantic analysis are also adapted to derive an efficient semantic vector for a sentence. Given two sentences, T1 and T2, a joint word set is formed:
T = T1 ∪ T2
  = {W1, W2, …, Wn}
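As a minimal sketch of how such a joint word set might be built, the Python below takes the union of the words of two sentences while keeping each word in the form in which it appears (no stemming or lemmatization). The function names `tokenize` and `joint_word_set`, the whitespace-and-punctuation tokenizer, and the choice to keep first-occurrence order and to leave letter case unchanged are all assumptions made for illustration; the text does not specify them.

```python
import string


def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation.

    Assumption: the text does not prescribe a tokenizer, so this
    simple rule is only a stand-in.
    """
    stripped = (tok.strip(string.punctuation) for tok in sentence.split())
    return [tok for tok in stripped if tok]


def joint_word_set(t1, t2):
    """Return the distinct words of t1 and t2 in first-occurrence order.

    Word forms are kept exactly as they appear, so "boy" and "boys"
    remain two separate entries.
    """
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(tokenize(t1) + tokenize(t2)))


if __name__ == "__main__":
    t1 = "a boy met the boys."
    t2 = "the boys left"
    print(joint_word_set(t1, t2))
```

A list is returned instead of a Python `set` so that each word has a stable index, which is convenient if the joint word set is later used to index the entries of a vector.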
The joint word set T contains all the distinct words from T1 and T2. Since inflectional morphology may cause a word to appear in a sentence in different forms, each conveying a specific meaning in a specific context, we use the word form as it appears in the sentence. For example, boy and boys, woman