Overall, the performance of the statistical models based on queries submitted to the Swedish Vårdguiden web site exceeded our expectations during the pandemic, especially given that the models were trained on seasonal influenza. The curve produced by the web query-based sentinel model closely resembled the curve obtained from the traditional sentinel surveillance that the model is designed to mimic.
We have shown that an independently developed and controlled system such as ours can be comparable in reliability to Google Flu Trends, a model trained on much larger data volumes. One downside is that our model has a higher variance. This manifests as numerous small fluctuations in the model estimates in Figure 2: trend shifts that are not reflected in the reported sentinel data. Such false signals are a cause for concern if the model is to be used to guide public health action. In practice, they mean that an observed trend shift cannot be trusted unless it is sustained for two weeks or more.
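The two-week rule of thumb above can be made explicit as a simple persistence filter. The following is a minimal sketch, assuming the model output is a list of weekly estimates and that a "trend shift" means a change in the sign of the week-to-week difference; the function name `confirmed_trend_shifts` and the `min_weeks` parameter are hypothetical and not part of the system described here.

```python
def confirmed_trend_shifts(estimates, min_weeks=2):
    """Return indices (into `estimates`) of weeks where a new trend direction is confirmed.

    Hypothetical persistence rule: a direction (up or down) is only accepted once
    the week-to-week change has kept the same sign for `min_weeks` consecutive weeks.
    Single-week fluctuations are therefore ignored.
    """
    # Week-to-week changes and their signs (+1 rising, -1 falling, 0 flat).
    diffs = [b - a for a, b in zip(estimates, estimates[1:])]
    signs = [(d > 0) - (d < 0) for d in diffs]

    confirmed = []
    current = None          # direction of the last confirmed trend
    run_sign, run_len = 0, 0
    for i, s in enumerate(signs):
        if s != 0 and s == run_sign:
            run_len += 1
        else:
            # A new direction (or a flat week) restarts the run.
            run_sign, run_len = s, (1 if s != 0 else 0)
        if run_len == min_weeks and run_sign != current:
            current = run_sign
            confirmed.append(i + 1)  # week whose estimate confirms the shift
    return confirmed


# The one-week dips and rises in the middle of this series are not reported.
print(confirmed_trend_shifts([1, 2, 3, 2, 3, 2, 1, 0]))
```

A stricter or looser rule is obtained by changing `min_weeks`; the trade-off is between suppressing false signals and delaying the detection of genuine turning points.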
While others have suggested that Google Flu Trends' underestimation of the influenza peak in Sweden could be due to a limitation in the Swedish sentinel system [23], the fact that our model (as well as other surveillance methods) shows the same pattern as the sentinel reports [17] instead indicates that the shortcoming lies in Google Flu Trends' quantitative estimates.
The quantitative evaluation statistics also indicate good reliability. It is debatable, however, whether they are suitable for evaluating surveillance systems for communicable diseases. Such measures tend to assess performance in estimating absolute levels of activity, and give equal weight to the entire study period, including periods of low activity. In future work, it may be more informative to examine how well a surveillance system captures the dynamics of the disease, such as rapid increases in activity levels or the timing of peaks.
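Dynamics-oriented measures of the kind suggested above could, for instance, compare peak timing and the agreement of week-to-week changes rather than absolute levels. The following is a minimal sketch, assuming two weekly series (model estimates and sentinel reports) aligned on the same weeks; the function names and the choice of these two measures are illustrative assumptions, not the evaluation used in this study.

```python
import math


def first_differences(series):
    """Week-to-week changes, emphasising increases and decreases over absolute level."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def peak_week_lag(model, reference):
    """Weeks by which the model peak precedes (negative) or follows (positive) the reference peak."""
    return model.index(max(model)) - reference.index(max(reference))


def change_correlation(model, reference):
    """Correlation of week-to-week changes between model and reference series."""
    return pearson(first_differences(model), first_differences(reference))


# Hypothetical example series, not data from this study.
model = [1, 3, 6, 9, 7, 4, 2]
sentinel = [1, 2, 5, 8, 10, 6, 3]
print(peak_week_lag(model, sentinel), round(change_correlation(model, sentinel), 2))
```

Working on first differences means a constant offset between the two series does not affect the score, whereas a missed turning point does.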
We have also described the results of a qualitative evaluation in which we interviewed four colleagues who received the output from the statistical models. In summary, those working with surveillance found it valuable to have an additional source of information, as it increased their confidence in their estimates and predictions of the spread and impact of the influenza A(H1N1)2009 virus.
One unknown factor is the impact of media coverage on search behaviour. The interviewees explicitly asked for media activity to be incorporated in the statistical model, and intuitively such a model should perform better than one without this information. We have performed some early experiments on including media activity in our web query-based statistical models. However, we have not yet found a satisfactory model to correct for the assumed impact of media reporting on people's search behaviour.