The system can interpret written ideas by
analyzing them with different algorithms. An
information extraction algorithm (IE), based upon the
bag-of-words model¹, representing the idea in a predefined
number of valued words, was integrated to
represent the NLP functionality an AI should possess
(see Figure 2, S2). This IE starts with a word count
algorithm, splitting the text into single words and
counting how often each word appears in the text. To keep
only words with discriminative power, a stop word
removal algorithm removes all stop words based upon
the “Full-Text Stopwords” list developed by MySQL
for the English language [24]. As the word count
algorithm cannot recognize the similarity between different
inflected forms of a word, e.g. plural and tense, a stemming
algorithm was implemented.
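
The three steps described above (word count, stop word removal, stemming) can be illustrated with a minimal Python sketch. It is not the implementation used in the system: the stop word set is a small placeholder rather than the MySQL “Full-Text Stopwords” list, the function toy_stem is a crude suffix stripper standing in for a real stemming algorithm, and the names extract_terms and top_n (the predefined number of words to keep) are hypothetical.

```python
import re
from collections import Counter

# Placeholder stop words for illustration only; this is NOT the
# MySQL "Full-Text Stopwords" list referenced in the text.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on",
              "the", "to", "was", "were"}


def tokenize(text):
    """Split the text into lowercase single words."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word):
    """Strip a few common English suffixes (assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(text, top_n=5):
    """Return up to top_n (stem, count) pairs, most frequent first."""
    # Step 1: word count over the single words of the text.
    counts = Counter(tokenize(text))
    # Step 2: stop word removal.
    kept = {w: c for w, c in counts.items() if w not in STOP_WORDS}
    # Step 3: stemming, merging the counts of words that share a stem.
    stemmed = Counter()
    for word, count in kept.items():
        stemmed[toy_stem(word)] += count
    return stemmed.most_common(top_n)


if __name__ == "__main__":
    idea = "The sensors are tested and the sensor was tested on a testing rig."
    print(extract_terms(idea))
```

In this sketch the words "sensors" and "sensor", and "tested" and "testing", are merged under a shared stem, which is the similarity that the plain word count cannot capture on its own.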