2.1. Sample
The sample contains 150021 policies observed during the period 2007-2009. We use 9
exogenous variables for every policy, as well as the total frequency of claims at fault that were
reported within the yearly period. Apart from the explained variable, the frequency of
claims, all other variables are therefore risk factors that the insurer knows a priori. As in
similar empirical studies, we group the risk factors into three categories: policyholder
characteristics (age, occupation); vehicle features (value, type, category, use, GPS); and
insurance policy characteristics (policy duration, bonus-malus coefficient). Table 1
summarizes the information available about each policyholder.
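
The grouping described above can be written out as a small data layout. This is only an illustrative sketch: the column names (`age`, `bonus_malus`, `claim_frequency`, and so on) and the helper `exogenous_columns` are hypothetical labels chosen for this example, not identifiers taken from the data set.

```python
# Illustrative layout of the variables described in this section.
# All names below are hypothetical labels, not the data set's own identifiers.
RISK_FACTORS = {
    "policyholder": ["age", "occupation"],
    "vehicle": ["value", "type", "category", "use", "gps"],
    "policy": ["duration", "bonus_malus"],
}

# Explained variable: frequency of at-fault claims per yearly period.
RESPONSE = "claim_frequency"


def exogenous_columns(groups):
    """Flatten the grouped risk factors into one list of column names."""
    return [name for names in groups.values() for name in names]


columns = exogenous_columns(RISK_FACTORS)
print(len(columns), "exogenous variables:", columns)
```

Flattening the three groups yields the 9 exogenous variables mentioned in the text, with the claim frequency kept separate as the response.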