D. Outlier Detection
Outlier detection discovers data points that differ significantly from the rest of the data [8]. In educational data mining, outlier analysis can be used to detect students with learning problems [9]. In this paper, we used outlier analysis to detect outliers in the student data. Two methods were used: a distance-based approach and a density-based approach.
1) Distance-based Approach
It identifies outliers in the given data set based on the distance to their k nearest neighbors. The result of applying this method is a true/false flag on each record indicating whether or not it is an outlier [6].
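The idea can be illustrated with a minimal Python sketch. It is not the tool used in this study; it assumes one common variant in which each record is scored by the distance to its k-th nearest neighbor and the n highest-scoring records are flagged. The function name, the parameters k and n_outliers, and the sample (matriculation GPA, college GPA) pairs are hypothetical.

```python
import math

def knn_distance_outliers(points, k=2, n_outliers=1):
    """Return one True/False flag per point.

    Assumed variant: score each point by the Euclidean distance to its
    k-th nearest neighbor, then flag the n_outliers largest scores.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    # Indices ordered from most to least isolated.
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_outliers])
    return [i in flagged for i in range(len(points))]

# Hypothetical (matriculation GPA, college GPA) pairs, for illustration only.
students = [(2.9, 2.8), (3.0, 3.1), (3.1, 2.9), (2.8, 3.0), (3.0, 2.9), (4.0, 4.0)]
flags = knn_distance_outliers(students, k=2, n_outliers=1)
```

In this toy data only the last record, which is excellent on both measures, lies far from its neighbors and is flagged.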
Figure 8 shows the distribution of the outliers, marked in red, after applying Singular Value Decomposition (SVD), which reduces the number of attributes to two so that the outliers can be plotted easily. The system detected 10 outliers in our data. After examining some of these 10 points, we found that they are not errors but rather represent rare events. For example, some of the outlier points are students with an excellent matriculation GPA and also an excellent grade in college; such students are few and differ from the rest of the students in the graduate students data set.
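The reduction to two attributes for plotting can be sketched as follows. This is only an illustration using NumPy, not the procedure run in this study; centering the data before the decomposition and the four-attribute sample records are assumptions, and project_to_2d is a hypothetical helper name.

```python
import numpy as np

def project_to_2d(records):
    """Project records onto the two leading right singular vectors."""
    X = np.asarray(records, dtype=float)
    # Assumption: center each attribute so the projection reflects spread.
    Xc = X - X.mean(axis=0)
    # Thin SVD; the rows of Vt are directions ordered by singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Hypothetical records with four numeric attributes each.
records = [
    [2.9, 2.8, 60, 1],
    [3.0, 3.1, 65, 0],
    [3.1, 2.9, 62, 1],
    [2.8, 3.0, 61, 0],
    [3.0, 2.9, 63, 1],
    [4.0, 4.0, 95, 1],
]
coords = project_to_2d(records)  # one (x, y) pair per record, ready to plot
```

Each row of coords can then be drawn as a point, with the flagged records colored differently from the rest.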
2) Density-Based Approach
It computes the local densities of particular regions and declares instances in low-density regions as potential outliers. The method used is the Local Outlier Factor (LOF). The basic idea of LOF is to compare the local density of a point with the densities of its neighbors. The result of applying this method is an outlier score for each record; a larger score means a higher likelihood of being an outlier [6].
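The comparison of local densities can be made concrete with a small Python sketch of the standard LOF computation. It is a simplified illustration rather than the implementation used in this study: it takes exactly k neighbors per point (ignoring ties at the k-th distance) and assumes there are no duplicate records. The function name, the value of k, and the sample data are hypothetical.

```python
import math

def lof_scores(points, k=3):
    """Return a Local Outlier Factor score per point (about 1 means inlier)."""
    n = len(points)
    # For each point, its k nearest neighbors as (distance, index) pairs.
    neighbors = []
    for i in range(n):
        d = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append(d[:k])
    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(i, j) = max(k_dist[j], dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist) for dist, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical (matriculation GPA, college GPA) pairs, for illustration only.
students = [(2.9, 2.8), (3.0, 3.1), (3.1, 2.9), (2.8, 3.0), (3.0, 2.9), (4.0, 4.0)]
scores = lof_scores(students, k=3)
```

Records inside the dense cluster receive scores close to 1, while the isolated record receives a score well above 1, which matches the reading that a larger score indicates a more likely outlier.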
Figure 9 shows the outliers after applying Singular Value Decomposition.
Figure 9: Outliers (LOF) distribution plot with SVD applied.
For each case, the college management can examine the outlying behavior of the student and analyze it to understand why the irregularity occurred.