Database management systems (DBMSs) have been evolving
constantly. Today, programming languages are being integrated
into database systems to help professional programmers develop
software quickly to meet deadlines. Therefore, the design of a
database must cater to both the needs of customers and the
efficiency of database processes.
In this paper, we study a database application, novelty detection,
which identifies new documents for readers who do not want to
reread redundant ones. This application needs a database to
store both historical and current documents. The objective of this
research is to optimize the database tables for up to 10 million
records. The experiments are conducted at both the sentence level
and the document level. At both levels, we investigate data
optimization and the use of proper indexing. In MySQL, the
B-Tree index is used to speed up data selection. In addition,
EXPLAIN enables us to index the correct data columns and to
avoid redundant indexing. Data type optimization is also
investigated to ensure that MySQL does no extra work when
selecting data. A technique known as
batching is also introduced to speed up the insertion of results
after novelty detection has completed. Overall, the combined
optimizations improved speed by up to 90%. Therefore, we
have successfully optimized the database for novelty detection,
and the techniques have been integrated into a real-time novelty
detection application.
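
The techniques named above (batched insertion, indexing the
queried column, and checking the query plan) can be illustrated
with a minimal sketch. It is not the implementation used in this
work: it assumes Python's built-in sqlite3 module as a stand-in
for MySQL, and the table, column, and index names are
hypothetical. In MySQL the plan check would use EXPLAIN SELECT
rather than SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3


def build_store(rows, batch_size=1000):
    """Create a hypothetical results table, insert rows in batches, index it."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for MySQL
    # Compact, fixed-purpose column types keep each row small.
    conn.execute(
        "CREATE TABLE sentence ("
        " id INTEGER PRIMARY KEY,"
        " doc_id INTEGER NOT NULL,"
        " novelty_score REAL NOT NULL)"
    )
    # Batching: many rows per call and one commit per batch,
    # instead of one statement and one commit per row.
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with conn:  # commits when the block exits without error
            conn.executemany(
                "INSERT INTO sentence (doc_id, novelty_score) VALUES (?, ?)",
                batch,
            )
    # Index only the column used in the WHERE clause of the lookup below.
    conn.execute("CREATE INDEX idx_sentence_doc ON sentence (doc_id)")
    return conn


def query_plan(conn, doc_id):
    """Return the planner's description of the lookup as a single string."""
    plan_rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT novelty_score FROM sentence WHERE doc_id = ?",
        (doc_id,),
    ).fetchall()
    # The last field of each plan row is the human-readable detail text.
    return " ".join(str(row[-1]) for row in plan_rows)


# Hypothetical data: 5000 sentence results spread over 100 documents.
rows = [(i % 100, 0.5) for i in range(5000)]
conn = build_store(rows)
plan = query_plan(conn, 42)
count = conn.execute(
    "SELECT COUNT(*) FROM sentence WHERE doc_id = 42"
).fetchone()[0]
```

If the plan text mentions idx_sentence_doc, the lookup is served
by the index rather than a full table scan. An index that never
appears in the plan of any query is a candidate for removal,
which is the redundancy check described above.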