3.1.2.1.1. Semantic inference. Semantic inference uses the
stemming mechanism and an ontology to interpret and represent
the query sentences, as shown in Fig. 2.
Preprocess:
Preprocessing converts the query sentences into a vector space
model. In the training stage, the query sentences are based on
the selected questions and, like any set of documents, can be
represented by a word-by-document matrix P, as shown in Eq. (1).
Each query sentence corresponds to one selected question (i.e.,
class). Let W be the set of all the words in the user questions,
and Q be the set of all the collected query sentences.
$$P = \{\, p_{wq} \mid w \in W,\ q \in Q \,\}, \tag{1}$$
where $p_{wq}$ is the frequency of the word $w$ in the query sentence $q$.
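As an illustration of Eq. (1), the following minimal sketch builds a word-by-query count matrix from a list of query sentences. The function name `build_term_matrix`, the whitespace tokenization, the lower-casing, and the sample sentences are assumptions made for this example; stemming is omitted for brevity.

```python
from collections import Counter


def build_term_matrix(queries):
    """Return (vocabulary, P), where P[i][j] counts word i in query j."""
    # Assumption: lower-casing and whitespace splitting stand in for the
    # real tokenization and stemming steps.
    tokenized = [q.lower().split() for q in queries]
    # W: the set of all words, sorted so that row order is deterministic.
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    # One Counter per query sentence (one column of P each).
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[w] for c in counts] for w in vocabulary]
    return vocabulary, matrix


# Hypothetical query sentences, used only to exercise the function.
queries = ["where is the tea house", "is the tea the best tea"]
vocabulary, P = build_term_matrix(queries)
```

Each row of `P` corresponds to a word w in W and each column to a query sentence q in Q, so `P[i][j]` plays the role of p_wq.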
Ontology:
This study considers the synonyms and homophones of the words in
the word set W and extends the word set to generate the relevant
term set T based on the ontology shown in Fig. 3.
For synonym establishment, the WordNet library (Lo et al.,
2011) is used to determine the synonyms of the words and to add
them to the relevant term set T (shown in Path (1) in Fig. 3).
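The synonym path can be sketched as a set expansion. In the sketch below, the dictionary `SYNONYMS` is a hypothetical lookup table standing in for a WordNet query. Its entries are illustrative and are not taken from WordNet, and the function name `expand_with_synonyms` is likewise an assumption.

```python
# Hypothetical lookup table standing in for a WordNet query;
# the entries are illustrative only.
SYNONYMS = {
    "tea": {"chai"},
    "house": {"home", "dwelling"},
}


def expand_with_synonyms(words, synonyms):
    """Return the input words together with every synonym listed for them."""
    relevant_terms = set(words)
    for word in words:
        # Words without an entry contribute no additional terms.
        relevant_terms |= synonyms.get(word, set())
    return relevant_terms


# W is a small sample word set; T is the resulting relevant term set.
W = {"tea", "house", "where"}
T = expand_with_synonyms(W, SYNONYMS)
```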
For homophone establishment, several users speak the query
sentences in a training session, and the recognized text is used
to generate the homophones of each word. For example, the phonetic
transcription of “擂茶” by Google Translator is “léi chá.”
When someone says “léi chá,” homophones (e.g., “雷茶”, “類茶”,
and “累茶”) may appear in the Google speech recognition output.
Therefore, the homophones generated in the training stage are
considered for inclusion in the relevant term set T (as shown
in Path (2) in Fig. 3).
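The homophone path can be sketched in the same way. The example below assumes that the training session yields, for each intended word, a list of strings returned by the speech recognizer. The function name `collect_homophones` and the sample list `outputs` are hypothetical; only the word forms themselves come from the example above.

```python
def collect_homophones(intended, recognized_outputs):
    """Return the recognized forms that differ from the intended word."""
    return {text for text in recognized_outputs if text != intended}


# Hypothetical recognition results for several speakers saying "léi chá".
outputs = ["擂茶", "雷茶", "類茶", "雷茶", "累茶"]
homophones = collect_homophones("擂茶", outputs)

# The intended word and its homophones are added to the relevant term set T.
T = {"擂茶"} | homophones
```

Collecting the results in a set removes repeated misrecognitions, so each homophone enters T once regardless of how many speakers produced it.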
The matrix O is generated in accordance with the relevant term
set T, as shown in Eq. (2). This matrix is used for answer inference.