Pre-processing
Text pre-processing is a step in natural language processing that converts raw text into a form whose structure and content a computer can analyze, making the text easier to use in later processing stages. It typically involves operations such as stopword removal, stemming, lemmatization and part-of-speech (POS) tagging. In this work, stopwords are first removed from each question to simplify it for later processing. Each word is then tagged with its part of speech using the NLTK tagger (Bird et al., 2009). For example, tagging the question “Outline how class ArrayList could be implemented using an array.” produces the following output:

Outline/VB how/WRB class/NN ArrayList/NN could/MD be/VB implemented/VBN using/VBG an/DT array/NN ./.

The tags help identify the important nouns and verbs, which may indicate the question's category. The sentence pattern may also help to identify the category correctly. After tagging, rules based on the question's structure are applied.
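The following is a minimal sketch of the two pre-processing steps described above: stopword removal, and extraction of nouns, verbs and the tag sequence from the tagged question. It is an illustration under stated assumptions, not the implementation used in this work. The stopword list is hypothetical and deliberately small. The regular-expression tokenizer is a stand-in for a proper tokenizer. The functions `tokenize`, `remove_stopwords`, `parse_tagged`, `nouns_and_verbs` and `tag_pattern` are hypothetical names introduced here. In NLTK itself, the tagged pairs would come from `nltk.pos_tag(nltk.word_tokenize(question))`, which needs NLTK's downloadable models. To run with the standard library alone, the sketch instead parses the tagged output reported above.

```python
import re
from typing import List, Tuple

# Hypothetical stopword list for illustration only; the list used in this
# work is not specified. A real system might use NLTK's stopwords corpus.
STOPWORDS = {"a", "an", "the", "how", "could", "be", "is", "of", "to", "in", "and"}


def tokenize(sentence: str) -> List[str]:
    """Split into word tokens and single punctuation marks
    (a simple stand-in for a tokenizer such as nltk.word_tokenize)."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop tokens whose lower-cased form is in STOPWORDS."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Parse whitespace-separated 'word/TAG' items into (word, tag) tuples.
    rsplit on the last '/' keeps items such as './.' intact."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


def nouns_and_verbs(tagged: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Collect words with Penn Treebank noun tags (NN, NNS, NNP, NNPS)
    and verb tags (VB, VBD, VBG, VBN, VBP, VBZ) by matching tag prefixes."""
    nouns = [w for w, t in tagged if t.startswith("NN")]
    verbs = [w for w, t in tagged if t.startswith("VB")]
    return nouns, verbs


def tag_pattern(tagged: List[Tuple[str, str]]) -> str:
    """Return the sequence of tags as a space-separated string."""
    return " ".join(t for _, t in tagged)


question = "Outline how class ArrayList could be implemented using an array."
print(remove_stopwords(tokenize(question)))
# -> ['Outline', 'class', 'ArrayList', 'implemented', 'using', 'array', '.']

tagged = parse_tagged(
    "Outline/VB how/WRB class/NN ArrayList/NN could/MD be/VB "
    "implemented/VBN using/VBG an/DT array/NN ./."
)
nouns, verbs = nouns_and_verbs(tagged)
print(nouns)                # -> ['class', 'ArrayList', 'array']
print(verbs)                # -> ['Outline', 'be', 'implemented', 'using']
print(tag_pattern(tagged))  # -> VB WRB NN NN MD VB VBN VBG DT NN .
```

The sketch matches tag prefixes (`NN*`, `VB*`) rather than exact tags, so plural and proper nouns and all verb forms are captured. The rules applied after tagging are not specified in this section, so the sketch stops at extracting candidate words and the tag pattern.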