However, our dataset has two problems that typically
appear in this type of educational data. On the one hand,
our dataset has high dimensionality; that is, the number of
attributes or features is very large. Further, given a large
number of attributes, some will usually not be meaningful for
classification and it is likely that some attributes are correlated.
On the other hand, the data are imbalanced; that is, the majority
of students (610) passed and only a minority (60) failed.
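
To make the degree of imbalance concrete, the short sketch below works through the arithmetic implied by the two class counts. It assumes that the 610 passing and 60 failing students are the only two classes in the dataset, so that the total is their sum; the variable names are illustrative and do not come from the original study.

```python
# Class counts as reported in the text.
n_passed = 610
n_failed = 60

# Assumption: these two classes make up the whole dataset.
total = n_passed + n_failed

# Ratio of majority to minority examples.
imbalance_ratio = n_passed / n_failed

# Fraction of all students who belong to the minority (failed) class.
minority_share = n_failed / total

# Accuracy of a trivial classifier that always predicts "passed".
baseline_accuracy = n_passed / total

print(f"total students:    {total}")
print(f"imbalance ratio:   {imbalance_ratio:.2f} : 1")
print(f"minority share:    {minority_share:.1%}")
print(f"baseline accuracy: {baseline_accuracy:.1%}")
```

Under that assumption, a classifier that labels every student as passing would be right about 91% of the time while identifying none of the failing students, which is why overall accuracy alone can be a misleading measure on data like these.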