For prediction, we build an autoregressive model with exogenous input (ARX), in which
the ILI rates of previous weeks from the CDC form the autoregressive portion of the model, and
the OSN data serve as exogenous input. Our results show that while previous ILI data
from the CDC offer a realistic (but delayed) measure of a flu epidemic, OSN data provide a
real-time assessment of the current epidemic condition and can be used to compensate for
the lack of current ILI data. The model using a combination of Twitter and Facebook data
achieves better prediction accuracy than models using Twitter or Facebook data alone. We observe that the
OSN data are in fact highly correlated with the ILI data across the different regions within
the United States.
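
To make the ARX structure described above concrete, the following is a minimal sketch, not the model as actually fitted in this work. It assumes the common linear form y[t] = c + a_1 y[t-1] + ... + a_p y[t-p] + b_0 u[t] + ... + b_(q-1) u[t-q+1], where y is the weekly ILI rate and u is a weekly OSN signal, and it estimates the coefficients by ordinary least squares. The function names, the orders p and q, and the choice of estimator are illustrative assumptions.

```python
import numpy as np


def fit_arx(y, u, p=2, q=1):
    """Least-squares fit of an ARX(p, q) model (illustrative sketch).

    y: past ILI rates, one value per week (autoregressive part).
    u: OSN signal for the same weeks (exogenous part).
    Returns [c, a_1..a_p, b_0..b_(q-1)].
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(p, q - 1)  # first week with all required lags available
    rows = []
    for t in range(start, len(y)):
        ar_terms = [y[t - i] for i in range(1, p + 1)]  # y[t-1] .. y[t-p]
        ex_terms = [u[t - j] for j in range(q)]         # u[t] .. u[t-q+1]
        rows.append([1.0] + ar_terms + ex_terms)
    X = np.array(rows)
    coef, *_ = np.linalg.lstsq(X, y[start:], rcond=None)
    return coef


def predict_arx(coef, y_past, u_now, p=2, q=1):
    """Estimate the current week's ILI rate.

    y_past: the last p reported ILI rates, most recent last.
    u_now:  the last q OSN values including the current week, most recent last.
    """
    ar_terms = [y_past[-i] for i in range(1, p + 1)]
    ex_terms = [u_now[-1 - j] for j in range(q)]
    x = np.array([1.0] + ar_terms + ex_terms)
    return float(x @ coef)
```

In this sketch the current-week OSN value enters through the b_0 term, which is how the exogenous input can stand in for ILI data that have not yet been reported.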