These problems have motivated new surveillance techniques
based upon internet data sources such as search queries and social
media posts. Essentially, these methods use large-scale data mining
techniques to identify health-related activity traces within the data
streams, extract them, and transform them into a useful
metric. The basic approach is to train a statistical estimation
model against ground truth data, such as ministry of health disease
incidence records, and then apply the model to generate estimates
when the true data are not available, e.g., when forecasting or
when the true data have not yet been published. This approach has proven
effective and has spawned operational systems such as Google Flu
Trends (http://www.google.org/flutrends/). However, four key
challenges remain before internet-based disease surveillance
models can be reliably integrated into a decision-making toolkit