Off-line Handwritten Thai Name Recognition for
Student Identification in an Automated Assessment
System
Hemmaphan Suwanwiwat, Vu Nguyen, and
Michael Blumenstein
School of Information and Communication Technology,
Griffith University, Australia
Umapada Pal
Computer Vision and Pattern Recognition Unit
Indian Statistical Institute
India
Abstract—In the field of pattern recognition, off-line
handwriting recognition is one of the most intensive areas of
study. This paper proposes an automatic off-line Thai language
student name identification system which was built as a part of a
complete off-line automated assessment system. There is limited
work undertaken in developing off-line automatic assessment
systems using handwriting recognition. To the authors’
knowledge, none of this work has addressed the Thai language.
In addition, the proposed system recognises each Thai name
using a whole word recognition approach, which differs from
most work in the literature, where character-based recognition
is performed. In this
proposed system, the Gaussian Grid Feature (GGF) and the
Modified Direction Feature (MDF) extraction techniques are
investigated on upper contours, lower contours, and loops taken
from the full word contour images of each name sample, and
artificial neural networks and support vector machines are used
as classifiers. Encouraging recognition rates were achieved for
both feature extraction techniques when applied to loop, upper
and lower contour images (a 99.27% accuracy rate was achieved
using MDF with artificial neural networks and 99.27% using GGF
with a support vector machine classifier).
Keywords—off-line handwriting recognition; automated
assessment system; student identification system; modified
direction feature; Gaussian grid feature
I. INTRODUCTION
Handwriting recognition is divided into two types: off-line
and on-line handwriting recognition. Off-line handwriting
recognition is one of the most difficult and challenging tasks
in pattern recognition. It operates on written documents that
have been digitised, typically with a scanner. The hard copy
document is commonly transformed into a binary pattern [1],
which allows the recognition system to process the binarised
handwritten image.
Off-line handwriting recognition is considered more
difficult than its on-line counterpart because it cannot capture
information while the writing is being produced, as on-line
handwriting recognition can [2]. Nevertheless, many
applications benefit from off-line handwriting recognition
techniques, for example, postal address interpretation,
signature verification, and bank cheque verification. However,
there is only a small amount of research focusing on off-line
assessment systems [3], [4], [5], [6]. To the best of the authors’
knowledge, there is no off-line Thai language automated
assessment system proposed in the literature.
Manual assessment of handwritten examinations is a
complex task: it demands attentiveness and accuracy from the
marker, and it is time consuming. An important part of the
examination paper, besides the exam questions and answers
themselves, is the student identification information: the
student name components (the name and last name) and the
student number. Commonly in manual assessment, when the marking
of examination papers is concluded, the marker has to
transcribe each student’s mark onto a marking report sheet.
This transcription is error prone, as the assessor may
mistakenly record an examination mark against the wrong
student name.
This paper proposes a sub-system of an Off-line Automatic
Assessment System (OFLAAS) called the Student Identification
System (SIS), similar to the authors’ earlier work [6].
However, in the present work, the experiments were performed on
the Thai language to recognise student name components. To the
authors’ knowledge, little work has been done on Thai whole
word recognition, and even less on writer identification of the
kind proposed here. This is due to the nature of the Thai
language (please refer to sub-section II-B). The features used
in the proposed system also differ from the previous work: in
the present work, the features were extracted from upper, lower,
and loop images rather than from full contour images.
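As a hedged illustration of what upper and lower contour inputs could look like, the sketch below records, for each column of a binarised word image, the first and last row that contains ink. This is an assumed, simplified definition for illustration only; the function name contour_profiles and the toy image are hypothetical and are not taken from the proposed system.

```python
def contour_profiles(binary):
    """Return (upper, lower) row indices of ink for every image column.

    binary: list of rows, where 1 marks ink and 0 marks background.
    A column with no ink is reported as None in both profiles.
    NOTE: assumed simplification, not the authors' exact procedure.
    """
    height = len(binary)
    width = len(binary[0])
    upper, lower = [], []
    for col in range(width):
        ink_rows = [row for row in range(height) if binary[row][col] == 1]
        upper.append(ink_rows[0] if ink_rows else None)
        lower.append(ink_rows[-1] if ink_rows else None)
    return upper, lower


# Hypothetical 3x3 word fragment (1 = ink).
toy_image = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
upper_profile, lower_profile = contour_profiles(toy_image)
```

Features would then be computed separately on each profile, instead of on the full contour image.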
Also, in the proposed system, Artificial Neural Networks
(ANNs) and Support Vector Machines (SVMs) were both used as
classifiers so that their recognition rates could be compared,
whereas only ANNs were used in the previous work. It must be
noted that the system proposed here is not intended to verify
student identities, only to identify them. An SIS with the
ability to verify students may be proposed in future work.
It should be noted that the student number has not been
used in this research, as a student identification and
verification system could be developed on student name
components in the future. That is, the future system would be
able to verify whether the person who sat an examination is the
same person who owns the recognised name, using the student’s
handwritten name and last name. Also, as students do not
normally sign examination papers, this research proposes the
use of name components, rather than the student signature or
number, in developing the SIS.
For the proposed system, once the marking process (main
process of OFLAAS) is completed, the report on each student’s
mark is automatically produced. Having such a system would
reduce the chance of a mismatch between the students’ names
and their marks. Another advantage of having the proposed
system is that the list of the students who are absent from the
examination can be produced automatically.
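The reporting step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name build_report, the class roll, and the marks are all hypothetical, and the identified names are assumed to arrive from the SIS as a simple name-to-mark mapping.

```python
def build_report(enrolled, identified_marks):
    """Return (report_rows, absent) for one examination.

    enrolled: names on the class roll (hypothetical input).
    identified_marks: maps each name recognised by the SIS to the mark
    produced by the marking module (hypothetical input).
    """
    # Students whose papers were identified appear in the report.
    report = [(name, identified_marks[name])
              for name in enrolled if name in identified_marks]
    # Students on the roll with no identified paper are listed as absent.
    absent = [name for name in enrolled if name not in identified_marks]
    return report, absent


enrolled = ["Somchai Jaidee", "Malee Srisuk", "Anan Thongdee"]
identified_marks = {"Somchai Jaidee": 17, "Anan Thongdee": 12}
report, absent = build_report(enrolled, identified_marks)
```

Because each mark is attached to the recognised name directly, no manual transcription step remains in which a mark could be written against the wrong student.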
As stated above, the proposed system is intended to identify
students from their handwritten names and last names, not to
verify whether the written names have been forged. This
research investigates and compares the performance of the
Modified Direction Feature (MDF) and Gaussian Grid Feature
(GGF) extraction techniques. Since the proposed system is used
to recognise Thai words, which have different characteristics
compared to English, input images other than boundary images
alone were employed with ANNs and SVMs to obtain the highest
recognition rates possible.
As there is no suitable database of Thai name components
available, a new database was created to be used for
experimentation with the proposed system. The database used in
the present research consists of 2,060 handwritten name
components from 103 writers.
The remainder of this paper is organised into three sections.
Section II describes the methodology employed in this
research. Section III details the results obtained and puts
forward a discussion and analysis. Finally, conclusions are
drawn in Section IV and future work is also described.
II. METHODOLOGY
This section discusses the methodology and techniques
used in conducting the research. The topics in this section
include the proposed system (block diagrams), data collection,
the nature of the Thai language, the proposed methodology, and
the experimental setup.
Fig. 1a. Block diagram illustrating a complete Off-Line Automatic Assessment
System (OFLAAS). Fig. 1b. Block diagram illustrating the proposed
Student Identification System (SIS).
Significant research has been undertaken in the area of
off-line character and handwriting recognition. Nevertheless,
to the authors’ knowledge, there has only been a limited amount
of work in the literature reporting the development and
investigation of off-line automatic assessment systems. Also,
there has not been any research undertaken on student
identification using Thai handwritten name components written
on examination papers.
The proposed methodology includes data collection, image
processing, the effect of different input images on each
feature extraction technique, and the investigation of the MDF
and GGF techniques in conjunction with classifiers employing
different parameters in order to achieve the optimum results.
The classifiers used in the proposed SIS are ANNs and SVMs.
Fig. 1a illustrates a block diagram of a complete OFLAAS, which
consists of a main component, including a short answer question
automatic marking module, and the SIS, which is a sub-system of
OFLAAS. The OFLAAS is used to mark each student's examination
paper, and is also used to identify students from their name
components. Once both processes are completed, a full report
is produced containing a list of the students who attended the
examination along with the marks they achieved.
The SIS process begins with the collection of the
students' name components. A scanning process is used to
transform the raw data into digitised patterns. Binarisation
and preprocessing, including line and word segmentation, noise
removal, filling and skew correction, are then applied to the
images.
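The binarisation step can be illustrated with a small sketch. The source does not state which thresholding method was used, so Otsu's method is assumed here purely for illustration; the function names otsu_threshold and binarise and the toy grey-level image are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of the grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarise(image):
    """Map a grey-level image (list of rows) to 1 = ink, 0 = background.

    Assumes dark ink on a light page, as in a scanned examination paper.
    """
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


# Hypothetical 2x3 scan: low values are ink, high values are paper.
scan = [[250, 30, 245],
        [20, 240, 25]]
binary = binarise(scan)
```

Segmentation, noise removal, filling and skew correction would follow on the binary image and are not shown.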
The feature extraction techniques selected for the proposed
system are the MDF and GGF. These techniques were chosen owing
to their ability to extract important features from images,
which has enabled high recognition rates to be attained in a
number of applications [6], [7], [8]. After the feature vectors
are generated by each technique, they are passed to the ANNs
and SVMs for training and testing in the
recognition/identification process. The SIS recognition
accuracy rate was evaluated once the results were obtained. A
block diagram of the proposed SIS can be found in Fig. 1b.
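The train, test and evaluate flow described above can be sketched in a few lines. To keep the example self-contained, a 1-nearest-neighbour classifier stands in for the ANN and SVM classifiers, and short hand-made vectors stand in for MDF or GGF feature vectors; all names and values are hypothetical. Each whole name is treated as one class, in line with the whole word approach.

```python
import math


def nearest_label(train, x):
    """Return the label of the training vector closest to x (Euclidean)."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def recognition_rate(train, test):
    """Percentage of test samples whose predicted name matches the true name."""
    correct = sum(1 for x, label in test if nearest_label(train, x) == label)
    return 100.0 * correct / len(test)


# Hypothetical feature vectors; one class per whole name.
train = [([0.1, 0.9], "name_A"), ([0.8, 0.2], "name_B")]
test = [([0.2, 0.8], "name_A"),
        ([0.9, 0.1], "name_B"),
        ([0.6, 0.4], "name_A")]  # lies closer to name_B, so it is misrecognised
rate = recognition_rate(train, test)
```

In this toy case two of the three test samples are recognised correctly. The accuracy rates reported in the paper are computed in the same way, as the share of correctly identified test samples.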
A. Data Collection
There is no publicly available dataset of Thai language
handwritten name components from examination papers; as a
result, a data collection process was performed to create a
custom dataset. The dataset collected for the proposed system
is the first database of its type in the Thai language.
In the research proposed here, the recognition of words was
based on one writer per name components. Although in some
cases, the