explaining the response variable in terms of the predictor variables. Sometimes the response variable is called the dependent variable and the predictor variables are called independent variables. The goal is to explain the dependent variable in terms of the independent variables. For example, we would like to predict the final result of a student in terms of the student’s course grades.
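To make the split between predictor and response variables concrete, the following minimal sketch separates each instance of a table into its predictor values and its response value. It assumes each instance is a Python dictionary. The function name `split_instances`, the column names, and the grade values are hypothetical illustrations, not data from Table 3.2.

```python
from typing import Any


def split_instances(
    rows: list[dict[str, Any]], response: str
) -> tuple[list[dict[str, Any]], list[Any]]:
    """Separate each instance into its predictor variables and its response variable."""
    # Predictors (independent variables): every column except the response.
    predictors = [{k: v for k, v in row.items() if k != response} for row in rows]
    # Response (dependent variable): the column we want to explain.
    labels = [row[response] for row in rows]
    return predictors, labels


# Hypothetical student records (illustrative values only).
students = [
    {"linear_algebra": 9, "logic": 8, "programming": 9, "result": "cum laude"},
    {"linear_algebra": 6, "logic": 7, "programming": 6, "result": "passed"},
]
X, y = split_instances(students, response="result")
print(X[0])  # {'linear_algebra': 9, 'logic': 8, 'programming': 9}
print(y)     # ['cum laude', 'passed']
```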
Techniques for supervised learning can be further subdivided into classification and regression depending on the type of response variable (categorical or numerical).
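The choice between classification and regression depends only on the type of the response variable. The sketch below expresses that rule as a simple type check. It is a heuristic that assumes categorical values are stored as strings or booleans and numerical values as Python numbers. A categorical variable encoded as integer codes (e.g., 0/1 class labels) would be misreported as numerical, so in practice the variable's type should come from the data's metadata. The name `task_type` is hypothetical.

```python
from numbers import Number


def task_type(response_values: list) -> str:
    """Return 'regression' for a numerical response, 'classification' for a categorical one."""
    # bool is a subclass of int in Python, so exclude it explicitly and
    # treat True/False labels as categorical.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in response_values):
        return "regression"
    return "classification"


print(task_type(["smoker", "nonsmoker", "smoker"]))  # classification
print(task_type([True, False]))                     # classification
print(task_type([7.5, 8.0, 6.2]))                   # regression
```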
Classification techniques assume a categorical response variable, and the goal is to classify instances based on the predictor variables. Consider, for example, Table 3.1. We would like to classify people into the class of smokers and the class of nonsmokers. Therefore, we select the categorical response variable smoker. Through classification, we want to learn what the key differences between smokers and nonsmokers are. For instance, we could find that most smokers drink and die young. By applying classification to the second data set (Table 3.2) with column result as the response variable, we could find the obvious fact that cum laude students have high grades. In Sect. 3.2, we will show how to construct a so-called decision tree using classification.
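Decision trees are the subject of Sect. 3.2. As a much simpler stand-in, the sketch below uses the "one rule" (OneR) classifier to show what learning "key differences" can mean. For each predictor, it maps every predictor value to the majority class among the instances with that value. It then keeps the predictor whose rule misclassifies the fewest instances. The records are hypothetical toy data loosely inspired by the smoker example and are not the contents of Table 3.1. The function name `one_rule` is likewise an illustrative assumption.

```python
from collections import Counter, defaultdict


def one_rule(rows, response):
    """OneR: per predictor, map each value to its majority class; keep the predictor with fewest errors."""
    best = None
    for attr in rows[0]:
        if attr == response:
            continue
        # Count how often each class occurs for each value of this predictor.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[response]] += 1
        # Rule: predict the majority class for each predictor value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: instances whose class differs from the majority class of their value.
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical toy data (not Table 3.1).
people = [
    {"drinker": "yes", "died_young": "yes", "smoker": "yes"},
    {"drinker": "yes", "died_young": "yes", "smoker": "yes"},
    {"drinker": "yes", "died_young": "no", "smoker": "yes"},
    {"drinker": "no", "died_young": "no", "smoker": "no"},
    {"drinker": "no", "died_young": "yes", "smoker": "no"},
    {"drinker": "yes", "died_young": "no", "smoker": "no"},
    {"drinker": "no", "died_young": "no", "smoker": "no"},
]
attr, rule, errors = one_rule(people, "smoker")
print(attr, rule, errors)  # drinker {'yes': 'yes', 'no': 'no'} 1

# Classify a new (hypothetical) person using the learned rule.
print(rule[{"drinker": "no", "died_young": "yes"}[attr]])  # no
```

Unlike a decision tree, OneR tests a single predictor only. It is used here solely because it fits in a few lines.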