6.2 Results
The mean number of claims declared per insured per year is about 5.16, and the variance is 102.29. The mean is thus very small compared with the variance (a variance-to-mean ratio of roughly 20), which indicates overdispersion in the data. In such a case, a Poisson regression model is generally not appropriate. Our objective is to fit both a Poisson regression and a zero-inflated Poisson (ZIP) regression to our data. Note that in the ZIP regression model, we use the same covariate matrix for estimating k and h. The model specifications are given in Table 2.

First, we fit the Poisson regression to the number of claims. Table 3 reports the parameter estimates. All explanatory variables are significant at the 5 % level, since the p value associated with each factor is below 0.05. The goodness-of-fit test shows that the residual deviance for the model without covariates is very high (around 978967); it falls to 862317 when the factor size_family is added. The minimum is obtained when all variables are included (see Table 4).

The factors size_family, Industrial_city, Services_activity, status_married and Industry_activity have a positive sign, so an increase in any of these factors induces an increase in the number of claims. The percentage change for status_married is 27 %, which means that married persons file 27 % more claims than the others, whereas the percentage change for status_single is around -43 %. Hence, single persons are more profitable for the insurer and should be of prime interest in underwriting strategies. For industrial cities, the percentage change is around 42 %. Thus, persons in large industrial cities are more exposed to sickness than those in small cities, where industrial activity is less prevalent. The percentage changes for Industry_activity and Services_activity are 72 and 110 %, respectively.
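As an illustration of how such figures are usually obtained, the sketch below assumes the standard log link of Poisson regression, under which a coefficient beta translates into a percentage change of 100(exp(beta) - 1) in the expected count. Whether the percentages above were computed exactly this way is an assumption, and the function names are hypothetical. The implied coefficients printed at the end are back-calculated from the reported percentages and are not taken from Table 3.

```python
import math


def dispersion_index(mean: float, variance: float) -> float:
    """Variance-to-mean ratio; it equals 1 for an equidispersed (Poisson) count."""
    return variance / mean


def percent_change(beta: float) -> float:
    """Percentage change in the expected count for a one-unit increase
    in a covariate, assuming a log link: 100 * (exp(beta) - 1)."""
    return 100.0 * math.expm1(beta)


def implied_coefficient(pct: float) -> float:
    """Inverse of percent_change: the log-link coefficient that would
    produce a given percentage change."""
    return math.log1p(pct / 100.0)


# Summary statistics reported in the text.
ratio = dispersion_index(mean=5.16, variance=102.29)
print(f"variance-to-mean ratio: {ratio:.2f}")

# Percentage changes reported in the text; the coefficients are only
# implied by the log-link assumption, not read from Table 3.
reported = {
    "status_married": 27.0,
    "status_single": -43.0,
    "Industrial_city": 42.0,
    "Industry_activity": 72.0,
    "Services_activity": 110.0,
}
for name, pct in reported.items():
    print(f"{name}: implied coefficient {implied_coefficient(pct):+.3f}")
```

A ratio far above 1, as here, is the informal signal of overdispersion that the formal tests then examine.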
Persons in industrial and services activities are therefore highly exposed to sickness and are an unattractive target for underwriters.

Second, we compute the test introduced by [4] for detecting overdispersion in the data (Table 5). The test statistic for the parameter a is z = 83.510, with p value < 2.2e-16. This indicates that the dependent count variable is overdispersed. A second diagnostic for dispersion is the probability integral transform (PIT). The histogram of the PIT is bump shaped (Fig. 1), which also indicates overdispersion of the response variable. Since the histogram in Fig. 2 is highly peaked at zero, we can state that the overdispersion is due to an excess of zeros.
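To make the logic of a regression-based dispersion test concrete, the following sketch assumes that the test of [4] is of the auxiliary-regression type, in which the alternative is Var(y) = mu + a g(mu) and a is estimated by regressing ((y - mu)^2 - y)/mu on g(mu)/mu without an intercept. This identification of [4] cannot be confirmed from this section. The function name is hypothetical, and the sample is made up for illustration and is not the insurance portfolio. The fitted means come from an intercept-only Poisson model, whose maximum likelihood estimate is the sample mean.

```python
import math
from typing import Callable, Sequence, Tuple


def overdispersion_test(
    y: Sequence[float],
    mu: Sequence[float],
    g: Callable[[float], float] = lambda m: m,
) -> Tuple[float, float, float]:
    """Auxiliary-regression test of H0: Var(y) = mu against
    H1: Var(y) = mu + a * g(mu), with a > 0.

    Returns the estimate of a, its t-type statistic and the one-sided
    p value from the standard normal approximation.
    """
    n = len(y)
    # Response and regressor of the auxiliary OLS regression (no intercept).
    resp = [((yi - mi) ** 2 - yi) / mi for yi, mi in zip(y, mu)]
    x = [g(mi) / mi for mi in mu]

    sxx = sum(xi * xi for xi in x)
    a_hat = sum(xi * ri for xi, ri in zip(x, resp)) / sxx

    # Residual variance with one estimated parameter.
    rss = sum((ri - a_hat * xi) ** 2 for xi, ri in zip(x, resp))
    se = math.sqrt(rss / (n - 1) / sxx)

    stat = a_hat / se
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # upper normal tail
    return a_hat, stat, p_value


# Made-up, zero-heavy sample (illustration only).
counts = [0] * 60 + [1, 2, 3, 5, 8, 10, 15, 20, 25, 30] * 4

# Intercept-only Poisson fit: every fitted mean equals the sample mean.
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

a_hat, stat, p_value = overdispersion_test(counts, fitted)
print(f"a = {a_hat:.2f}, z = {stat:.2f}, p value = {p_value:.2e}")
```

A large positive statistic with a small p value, as reported in Table 5, leads to rejecting equidispersion in favour of overdispersion. With covariates, the fitted means would instead come from the estimated Poisson regression.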