Approach to solving the problem: Twitter provides an API (Application
Programming Interface) to researchers and web developers, which allows web
platforms to access and share information with one another.
My approach uses these Twitter APIs to collect the individual posts returned by
a Twitter search query on the keywords ‘recently diagnosed’ and ‘diabetes’,
together with the IDs of the users who posted them. These users are classed as
cases and compared with a randomly selected sample of people without the
attribute (the controls) [44]. The entire
Twitter history of these individuals was extracted, and a model was developed to
count the number of times they mentioned keywords for symptoms related to
diabetes (such as sleep, water, eye, rash, and tired) [45]. Because these are
early symptoms of diabetes, their trends match the curve obtained by searching
for the keyword ‘diabetes’ in Google Trends. The more of these keywords a person
mentions in their posts over a period of time, the greater the probability that
the person is diabetic.
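
The counting step described above can be illustrated with a minimal sketch. It
is not the model used in this work: the keyword list contains only the five
examples named in the text, the function names (count_symptom_mentions,
mentions_per_post) and the sample posts are hypothetical, and matching is
simplified to exact whole words, so a plural such as "eyes" is not counted.

```python
import re
from collections import Counter

# Only the example keywords named in the text; the full list is an assumption.
SYMPTOM_KEYWORDS = ("sleep", "water", "eye", "rash", "tired")

# Lowercase word tokens; apostrophes are kept so "can't" stays one token.
_WORD = re.compile(r"[a-z']+")


def count_symptom_mentions(posts, keywords=SYMPTOM_KEYWORDS):
    """Count exact whole-word occurrences of each keyword across a list of posts."""
    wanted = set(keywords)
    counts = Counter()
    for text in posts:
        for token in _WORD.findall(text.lower()):
            if token in wanted:
                counts[token] += 1
    return counts


def mentions_per_post(posts, keywords=SYMPTOM_KEYWORDS):
    """Average number of keyword mentions per post (0.0 for an empty history)."""
    if not posts:
        return 0.0
    return sum(count_symptom_mentions(posts, keywords).values()) / len(posts)


# Hypothetical histories, invented purely for illustration.
case_posts = [
    "So tired again, could not sleep",
    "Drinking water all day and my eye is blurry",
]
control_posts = [
    "Great game last night",
    "Need more sleep before the exam",
]

case_rate = mentions_per_post(case_posts)
control_rate = mentions_per_post(control_posts)
print("case:", case_rate, "control:", control_rate)
```

Dividing by the number of posts keeps a very active user from scoring high
simply by posting more often. Comparing this rate between cases and controls is
one simple way to express the idea that more keyword mentions over time suggest
a higher probability of diabetes.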