where DEV is deviance and K is the number of parameters in the model. As more parameters (structure) are added to the model, the fit will improve and the deviance will decrease. If model selection were based on deviance alone, one would always end up selecting the model with the most parameters, which usually results in overfitting, especially with complex data sets. The second component, K, serves as a “penalty” that increases as the number of parameters increases. AIC thus
strikes a balance between overfitting and underfitting. Many software packages now compute AIC. In very general terms, the model with the lowest AIC value among the candidate models is the “best” model, although other approaches such as model averaging can be applied.
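
The trade-off described above can be illustrated with a minimal sketch. It assumes the usual form AIC = DEV + 2K, which is not reproduced in this passage, and the model names, deviances, and parameter counts are hypothetical values chosen only for illustration. The Akaike weights computed at the end are one common starting point for model averaging; they are included here as an assumption about how that approach might be applied, not as a prescribed method.

```python
import math


def aic(deviance, k):
    """AIC under the assumed form AIC = DEV + 2K."""
    return deviance + 2 * k


def rank_models(candidates):
    """Return (name, AIC, delta AIC, Akaike weight) tuples, lowest AIC first.

    `candidates` maps a model name to a (deviance, K) pair.
    """
    scores = {name: aic(dev, k) for name, (dev, k) in candidates.items()}
    best = min(scores.values())
    # Delta AIC: difference between each model's AIC and the lowest AIC.
    deltas = {name: score - best for name, score in scores.items()}
    # Akaike weights: relative support for each model, normalized to sum to 1.
    raw = {name: math.exp(-0.5 * delta) for name, delta in deltas.items()}
    total = sum(raw.values())
    rows = [(name, scores[name], deltas[name], raw[name] / total) for name in scores]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical candidate set: deviance keeps falling as K grows,
# but the penalty term eventually outweighs the gain in fit.
candidates = {
    "constant": (120.0, 1),
    "time": (112.0, 4),
    "time+group": (110.5, 8),
}

ranked = rank_models(candidates)
for name, score, delta, weight in ranked:
    print(f"{name:12s} AIC={score:7.2f} delta={delta:5.2f} weight={weight:.3f}")
```

In this made-up example the most heavily parameterized model has the lowest deviance, yet the intermediate model has the lowest AIC, because the extra four parameters buy only a small reduction in deviance.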