3.5.1 Testing Model Goodness of Fit for Contingency Tables
For nonsparse contingency tables, it is possible to conduct a goodness-of-fit test of the null hypothesis that the model holds against the alternative hypothesis that it does not. The alternative is equivalent to the saturated model, which fits the data perfectly. The test statistics compare the observed counts in the cells of the contingency table to expected frequency estimates based on the model fit.
At a particular setting $i$ of the explanatory variables, for which the observed multinomial sample has $n_i$ observations, let $\{y_{ij},\ j = 1, \ldots, c\}$ denote the observed cell counts for the $c$ response categories. Under the null hypothesis that the model holds, the corresponding expected frequency estimates based on the model estimates $\{\hat{\pi}_{ij}\}$ of the response probabilities equal $\{\hat{\mu}_{ij} = n_i \hat{\pi}_{ij}\}$.
The Pearson statistic for testing goodness of fit is
$$X^2 = \sum_i \sum_j \frac{(y_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}.$$
The corresponding likelihood-ratio (deviance) statistic is
$$G^2 = 2 \sum_i \sum_j y_{ij} \log\!\left(\frac{y_{ij}}{\hat{\mu}_{ij}}\right).$$
Under the null hypothesis that the model holds, $X^2$ and $G^2$ have large-sample chi-squared distributions. Their degrees of freedom equal the number of cumulative logits modeled minus the number of model parameters. The number of cumulative logits modeled equals the number of multinomial parameters in the saturated model: namely, $c - 1$ times the number of settings of the explanatory variables.
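The following is a minimal numerical sketch of these computations, assuming the observed counts and the model-fitted response probabilities are already available as $r \times c$ arrays; the function and argument names (`multinomial_gof`, `y`, `pi_hat`, `n_params`) are hypothetical, not from any particular package.

```python
import numpy as np
from scipy.stats import chi2

def multinomial_gof(y, pi_hat, n_params):
    """Pearson X^2 and deviance G^2 for a fitted multinomial model.

    y        : (r, c) array of observed counts, one row per setting of the
               explanatory variables.
    pi_hat   : (r, c) array of model-fitted response probabilities
               (each row sums to 1).
    n_params : number of parameters in the fitted model.
    """
    y = np.asarray(y, dtype=float)
    n_i = y.sum(axis=1, keepdims=True)        # multinomial sample sizes n_i
    mu_hat = n_i * np.asarray(pi_hat)         # expected frequency estimates

    X2 = np.sum((y - mu_hat) ** 2 / mu_hat)   # Pearson statistic

    # Deviance: cells with y_ij = 0 contribute 0 (0 * log 0 taken as 0).
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(y > 0, y * np.log(y / mu_hat), 0.0)
    G2 = 2.0 * np.sum(terms)

    r, c = y.shape
    df = r * (c - 1) - n_params               # residual degrees of freedom
    return {"X2": X2, "G2": G2, "df": df,
            "p_X2": chi2.sf(X2, df), "p_G2": chi2.sf(G2, df)}
```

For the cumulative logit model with a single quantitative predictor discussed below, `n_params` would be $c$ (one association parameter plus $c - 1$ intercepts).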
For example, an $r \times c$ contingency table has $c - 1$ multinomial parameters in each row, for a total of $r(c - 1)$ parameters. This is the number of parameters in the saturated model, for which the expected frequency estimates are merely the cell counts. The model (3.9) that treats the explanatory variable as quantitative,
$$\operatorname{logit}[P(Y \le j)] = \alpha_j + \beta x_i, \qquad j = 1, \ldots, c - 1,$$
has a single association parameter $\beta$ and $c - 1$ intercept parameters $\{\alpha_j\}$ for the logits, a total of $c$ parameters. So the residual df for testing goodness of fit are $r(c - 1) - c = (r - 1)(c - 1) - 1$. This is one less than the $(r - 1)(c - 1)$ for the independence model, which is the special case of this model with $\beta = 0$. Model (3.14), which treats the explanatory variable as qualitative with row effects,
$$\operatorname{logit}[P(Y \le j)] = \alpha_j + \tau_i, \qquad j = 1, \ldots, c - 1,$$
has $(c - 1) + (r - 1) = r + c - 2$ parameters. Its residual $df = r(c - 1) - (r + c - 2) = (r - 1)(c - 2)$, as noted by Simon (1974).
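As a concrete illustration of this bookkeeping (the table dimensions here are chosen purely for illustration), take $r = 4$ rows and $c = 3$ response categories:
\begin{align*}
\text{saturated model:} \quad & r(c-1) = 4 \times 2 = 8 \text{ multinomial parameters},\\
\text{quantitative-predictor model (3.9):} \quad & c = 3 \text{ parameters}, \quad \mathrm{df} = 8 - 3 = 5,\\
\text{independence model } (\beta = 0): \quad & c - 1 = 2 \text{ parameters}, \quad \mathrm{df} = 8 - 2 = 6 = (r-1)(c-1),\\
\text{row-effects model (3.14):} \quad & r + c - 2 = 5 \text{ parameters}, \quad \mathrm{df} = 8 - 5 = 3 = (r-1)(c-2).
\end{align*}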
When the data are sparse or the model contains at least one continuous predictor, these global goodness-of-fit tests are not valid. Lipsitz et al. (1996) proposed an alternative goodness-of-fit test for such cases. It generalizes the Hosmer-Lemeshow test for binary logistic regression, which constructs a Pearson-type statistic comparing observed and fitted counts for a partition of the observations into groups according to their estimated probabilities of "success," using the original ungrouped data. This method does not seem to be available in current software. Pulkstenis and Robinson (2004) suggested an alternative approach. This is an area that still deserves serious research attention, to evaluate the proposed methods and possibly to develop others, such as normal approximations for chi-squared statistics when the data are very sparse.
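To illustrate the grouping idea in the binary case that the Lipsitz et al. test generalizes, here is a minimal sketch of a Hosmer-Lemeshow-type computation. It illustrates only the grouped Pearson-type statistic, not the Lipsitz et al. (1996) procedure itself; the function name, the quantile grouping of fitted probabilities, and the conventional $g - 2$ reference df are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow_type(y, p_hat, n_groups=10):
    """Grouped Pearson-type fit statistic for ungrouped binary data.

    y        : (n,) array of 0/1 outcomes.
    p_hat    : (n,) array of fitted "success" probabilities from the model.
    n_groups : number of groups formed by ordering observations on p_hat.
    """
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)

    # Partition observations into roughly equal-sized groups ordered by p_hat.
    order = np.argsort(p_hat)
    groups = np.array_split(order, n_groups)

    stat = 0.0
    for g in groups:
        obs_succ = y[g].sum()            # observed successes in the group
        exp_succ = p_hat[g].sum()        # fitted (expected) successes
        obs_fail = len(g) - obs_succ     # observed failures
        exp_fail = len(g) - exp_succ     # fitted failures
        stat += (obs_succ - exp_succ) ** 2 / exp_succ
        stat += (obs_fail - exp_fail) ** 2 / exp_fail

    df = n_groups - 2                    # conventional reference distribution df
    return stat, df, chi2.sf(stat, df)
```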