The data preprocessing phase was performed on a reduced log file, which was “cleaned” by removing all useless, irregular, and missing data from the original LMS common log file. After this initial preprocessing, a session filter was applied to the reduced log file for feature extraction. The purpose of the filter was to aggregate all user requests within a session into a single set of variables. A session typically started when a student logged in to the LMS and ended when the student pressed the exit button. However, a session could also end when a student accidentally closed the web browser, or a student might leave the course website open while remaining idle. In the latter case, the LMS terminated the session automatically after 20 minutes of inactivity. The raw data from such abnormally terminated sessions were removed from the database so that, for the purposes of this study, only the students' normal learning events were retained.

Feature extraction produced the following primary variables: user identifier, session identifier, session start date and time, session end date and time, the user's hit count, and session duration in minutes (Mor et al., 2006). Derived variables, such as the duration and frequency of each student's LMS activity, were obtained by calculating or accumulating the primary variable data on a daily and weekly basis. These variables were transformed into fields, assigned appropriate data attributes, and stored in the database management system. The fields were organized into tables that together formed a relational database. Table 1 shows a partial list of the primary and derived variables extracted from the server logs.
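The session filter and primary-variable extraction described above can be illustrated with a minimal Python sketch. It assumes a hypothetical cleaned-log schema (`LogRecord` with `user_id`, `session_id`, `timestamp`, and an `action` of "login", "hit", or "logout"), assumes session identifiers are unique, and treats a session as normal only if it ends with an explicit logout; sessions ended by closing the browser or by the 20-minute timeout are assumed to leave no logout event. All names here are illustrative, not the study's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby

@dataclass
class LogRecord:
    """One cleaned log entry (hypothetical schema)."""
    user_id: str
    session_id: str
    timestamp: datetime
    action: str  # hypothetical values: "login", "hit", "logout"

@dataclass
class SessionSummary:
    """Primary variables for one session."""
    user_id: str
    session_id: str
    start: datetime
    end: datetime
    hit_count: int
    duration_min: float

def summarize_sessions(records):
    """Aggregate the requests of each normally ended session into one SessionSummary."""
    ordered = sorted(records, key=lambda r: (r.session_id, r.timestamp))
    summaries = []
    for session_id, group in groupby(ordered, key=lambda r: r.session_id):
        events = list(group)
        # Keep only sessions that start with a login and end with an explicit logout
        # (the exit button). Sessions ended by closing the browser or by the 20-minute
        # inactivity timeout are assumed to have no "logout" event and are discarded.
        if events[0].action != "login" or events[-1].action != "logout":
            continue
        start, end = events[0].timestamp, events[-1].timestamp
        summaries.append(SessionSummary(
            user_id=events[0].user_id,
            session_id=session_id,
            start=start,
            end=end,
            hit_count=sum(1 for e in events if e.action == "hit"),
            duration_min=(end - start).total_seconds() / 60,
        ))
    return summaries

# Usage: session "A" ends normally; session "B" ends when the browser is closed.
log = [
    LogRecord("s01", "A", datetime(2024, 3, 4, 9, 0), "login"),
    LogRecord("s01", "A", datetime(2024, 3, 4, 9, 5), "hit"),
    LogRecord("s01", "A", datetime(2024, 3, 4, 9, 12), "hit"),
    LogRecord("s01", "A", datetime(2024, 3, 4, 9, 30), "logout"),
    LogRecord("s02", "B", datetime(2024, 3, 4, 10, 0), "login"),
    LogRecord("s02", "B", datetime(2024, 3, 4, 10, 3), "hit"),
]
sessions = summarize_sessions(log)
print(sessions)
```

Only session "A" survives the filter, yielding one summary with two hits and a 30-minute duration. A real LMS log may mark timeouts differently, in which case the keep/discard condition would need to change.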
Table 1. An Example of Primary and Derived Variables
Variable Name   Description
ID              User ID
LoginFre        Total frequency of LMS logins
LastLog         Date and time of the most recent LMS login
ClassFre        Total frequency of accessing course materials
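One way the derived variables in Table 1 might be accumulated from per-session data is sketched below. The input rows `(user_id, session_start, duration_minutes, course_material_hits)` and the function name `derive_variables` are hypothetical; in particular, the per-session count of course-material accesses is an assumption, since the text does not say how `ClassFre` was computed. Weekly totals use ISO calendar weeks, which is also an assumption.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-session rows derived from the primary variables:
# (user_id, session_start, duration_minutes, course_material_hits)
session_rows = [
    ("s01", datetime(2024, 3, 4, 9, 0), 30.0, 5),
    ("s01", datetime(2024, 3, 6, 14, 0), 45.0, 3),
    ("s02", datetime(2024, 3, 5, 10, 0), 20.0, 2),
    ("s01", datetime(2024, 3, 11, 8, 0), 15.0, 1),
]

def derive_variables(rows):
    """Accumulate per-student totals (Table 1 fields) and daily/weekly minutes."""
    per_user = {}
    daily = defaultdict(float)   # (user_id, date) -> total minutes online
    weekly = defaultdict(float)  # (user_id, iso_year, iso_week) -> total minutes online
    for user_id, start, minutes, material_hits in rows:
        u = per_user.setdefault(
            user_id, {"ID": user_id, "LoginFre": 0, "LastLog": start, "ClassFre": 0}
        )
        u["LoginFre"] += 1                        # one login per session
        u["LastLog"] = max(u["LastLog"], start)   # most recent login time
        u["ClassFre"] += material_hits            # accesses to course materials
        daily[(user_id, start.date())] += minutes
        iso_year, iso_week, _ = start.isocalendar()
        weekly[(user_id, iso_year, iso_week)] += minutes
    return per_user, dict(daily), dict(weekly)

per_user, daily, weekly = derive_variables(session_rows)
print(per_user["s01"])
```

For student "s01" this gives three logins, a last login on 11 March 2024, and 75 minutes in ISO week 10 of 2024. Storing each dictionary as a table keyed by user (and period) mirrors the relational organization described in the text.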
A user is defined as a single individual who accesses files on web servers through a browser. A web log records users' activities sequentially, in the order in which they occurred. To study actual user behavior, the individual users in the log must be distinguished. Figure 1 shows a sample website in which nodes are pages, edges are hyperlinks, and the node “/index.php” is the site's entry page. The edges are bidirectional because users can easily return to the previous page with the browser's back button.
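The site model above, with pages as nodes and bidirectional hyperlink edges, can be used to separate users who share a request stream. The sketch below shows one topology-based heuristic common in web usage mining, not necessarily the method used here: a request is attributed to an existing user if its page is the same as, or linked to, a page that user has already visited (the back button makes any earlier page reachable); otherwise it starts a new user. Apart from “/index.php”, the page names and the functions `build_site_graph` and `split_users` are hypothetical.

```python
from collections import defaultdict

def build_site_graph(hyperlinks):
    """Return an undirected adjacency map: a hyperlink A -> B also permits B -> A
    because the browser's back button returns to the previous page."""
    graph = defaultdict(set)
    for src, dst in hyperlinks:
        graph[src].add(dst)
        graph[dst].add(src)
    return dict(graph)

def split_users(requests, graph):
    """Split one request stream (e.g., from a single IP address) into users.

    A request continues an existing user's path if the page is the same as, or linked
    to, a page that user has already visited; otherwise it starts a new user."""
    users = []  # each user is the ordered list of pages that user requested
    for page in requests:
        for path in users:
            if any(page == seen or page in graph.get(seen, ()) for seen in path):
                path.append(page)
                break
        else:
            users.append([page])
    return users

# Hypothetical site modeled on the description of Figure 1; only "/index.php" is from the text.
links = [
    ("/index.php", "/courses.php"),
    ("/courses.php", "/course1.php"),
    ("/index.php", "/forum.php"),
    ("/admin/login.php", "/admin/panel.php"),
]
site = build_site_graph(links)
users = split_users(
    ["/index.php", "/courses.php", "/admin/panel.php", "/course1.php", "/forum.php"], site
)
print(users)
```

Here "/admin/panel.php" is not linked to any page the first visitor has seen, so it is attributed to a second user; the remaining requests form one navigation path starting at the entry page.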