Model assessment
We labeled samples (i.e., subjects in a specific wave) as fallers if they reported at least one fall at the follow-up after the baseline assessment, and as multiple fallers if they reported more than one fall. All samples and their associated predictions (history of falls, gait speed, SPPB, FRAT-up, Lasso) were used to evaluate the discriminative ability of the different tools (Fig 1).
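The labeling rule amounts to two thresholds on the number of falls reported at follow-up. This is a minimal sketch that assumes one integer fall count per sample. The function name `label_fallers` and its list-based input are hypothetical.

```python
from typing import List, Tuple


def label_fallers(fall_counts: List[int]) -> Tuple[List[int], List[int]]:
    """Return binary (faller, multiple faller) labels for each sample.

    Each entry is the number of falls a sample (a subject in a given wave)
    reported at the follow-up after the baseline assessment.
    """
    fallers = [int(c >= 1) for c in fall_counts]   # at least one fall
    multiple = [int(c > 1) for c in fall_counts]   # more than one fall
    return fallers, multiple


print(label_fallers([0, 1, 3, 0, 2]))  # ([0, 1, 1, 0, 1], [0, 0, 1, 0, 1])
```

Every multiple faller is also a faller, so the multiple-faller labels define a stricter, nested outcome.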
We calculated Receiver Operating Characteristic (ROC) curves of the risk scores for fallers and multiple fallers. For the model fitted with Lasso, the ROC curves were derived using the means μ of the predictive distributions as risk scores. Discriminative ability was measured as the area under the ROC curve (AUC). The 95% confidence intervals of the AUCs were calculated via the DeLong method [35], and the AUCs were compared with DeLong tests for paired ROC curves [35].
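The DeLong quantities can be sketched in pure Python. The sketch assumes binary labels (1 = faller or multiple faller, 0 = otherwise) and scores where a higher value means a higher risk. For the Lasso model, those scores would be the predictive means μ. It uses the straightforward O(mn) form of DeLong's structural components, written for illustration. It is not necessarily the implementation used in the study, and all function names are hypothetical. Each class needs at least two samples.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the positive sample scores higher, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, List[float], List[float]]:
    """Return the AUC and DeLong's structural components V10 (positives) and V01 (negatives)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with denominator len - 1."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_auc_ci(scores, labels, level=0.95):
    """AUC, its DeLong variance, and a normal-approximation CI clipped to [0, 1]."""
    auc, v10, v01 = _components(scores, labels)
    var = _cov(v10, v10) / len(v10) + _cov(v01, v01) / len(v01)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(var)
    return auc, var, (max(0.0, auc - half), min(1.0, auc + half))


def delong_paired_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs computed on the same samples."""
    auc_a, v10_a, v01_a = _components(scores_a, labels)
    auc_b, v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Var(AUC_a - AUC_b) = Var_a + Var_b - 2 Cov_ab, each split into V10 and V01 parts
    var_diff = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return auc_a - auc_b, z, p


labels = [1, 1, 1, 0, 0, 0]                 # 1 = faller (toy data)
score_a = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
score_b = [0.7, 0.4, 0.5, 0.6, 0.3, 0.2]
auc, var, ci = delong_auc_ci(score_a, labels)             # auc = 8/9, var = 2/81
diff, z, p = delong_paired_test(score_a, score_b, labels)  # diff = 1/9, z = 1/sqrt(2)
```

Because both tools are scored on the same samples, the paired test subtracts the covariance term. This makes it more powerful than comparing two independent confidence intervals.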
The Lasso model was also evaluated for calibration (i.e., the agreement between its predictions and the observed number of falls) by means of a reliability diagram, a marginal calibration plot, and the probability integral transform (PIT). Reliability diagrams (also known as calibration plots or attribute diagrams) are generally used for dichotomous outcomes [36]. Here, the reliability diagram was adapted to count data and used to plot the observed fall rate against the predicted fall rate. The marginal calibration plot shows the observed and predicted numbers of samples for each possible outcome (i.e., each number of falls) [37]. The PIT is a diagnostic of probabilistic calibration. It detects whether the variance of the probabilistic predictions agrees with the dispersion of the observations (neutral dispersion), or whether the predictions express too little or too much uncertainty (under-dispersion or over-dispersion, respectively) [30]. The PIT was calculated according to the non-randomized procedure for count data described in [37].
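The three calibration checks can be sketched for count data as follows. This sketch ASSUMES Poisson predictive distributions with mean μ for each sample. The section does not state the predictive family, so `poisson_cdf` can be swapped for any count CDF.

The non-randomized PIT follows the usual construction. For an observed count x with predictive CDF P, F(u | x) is 0 below P(x − 1) and 1 above P(x), and it rises linearly in between. The histogram heights are then the increments of the mean of F over equal-width bins. The equal-size binning in `reliability_points` is a hypothetical choice, and all function names are illustrative.

```python
import math
from typing import Callable


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), 0 for k < 0. ASSUMED predictive family."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def pit_F(u: float, x: int, cdf: Callable[[int], float]) -> float:
    """Non-randomized PIT function F(u | x) for one observed count x."""
    lo, hi = cdf(x - 1), cdf(x)
    if u <= lo:
        return 0.0
    if u >= hi:
        return 1.0
    return (u - lo) / (hi - lo)


def pit_histogram(counts, mus, n_bins=10):
    """PIT histogram heights: increments of the mean PIT function over n_bins bins."""
    cdfs = [lambda k, m=mu: poisson_cdf(k, m) for mu in mus]

    def mean_F(u):
        return sum(pit_F(u, x, c) for x, c in zip(counts, cdfs)) / len(counts)

    fbar = [mean_F(j / n_bins) for j in range(n_bins + 1)]
    return [fbar[j + 1] - fbar[j] for j in range(n_bins)]


def marginal_calibration(counts, mus, k_max):
    """Observed vs predicted number of samples with exactly k falls, k = 0..k_max."""
    observed = [sum(1 for x in counts if x == k) for k in range(k_max + 1)]
    predicted = [
        sum(poisson_cdf(k, mu) - poisson_cdf(k - 1, mu) for mu in mus)
        for k in range(k_max + 1)
    ]
    return observed, predicted


def reliability_points(counts, mus, n_bins=5):
    """HYPOTHETICAL count-data reliability diagram: sort samples by predicted mean,
    split into n_bins near-equal groups, return (mean predicted, mean observed) falls."""
    order = sorted(range(len(mus)), key=lambda i: mus[i])
    points = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if idx:
            points.append((sum(mus[i] for i in idx) / len(idx),
                           sum(counts[i] for i in idx) / len(idx)))
    return points


counts = [0, 1, 3, 0, 2]          # observed falls (toy data)
mus = [0.2, 0.5, 2.0, 0.1, 1.5]   # predictive means (toy data)
hist = pit_histogram(counts, mus)
observed, predicted = marginal_calibration(counts, mus, k_max=3)
points = reliability_points(counts, mus)
```

With real data, a roughly flat PIT histogram indicates neutral dispersion. A U shape points to under-dispersed predictions (too little uncertainty), and a hump shape points to over-dispersed ones (too much uncertainty).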