Innovation in technology enables people to communicate, share information, and
find what they need from their own rooms with just a few clicks. While social
media has played a very important role in connecting people worldwide, its potential has
stretched beyond the original idea of connecting people through their social networks.
Although many thought there was no meeting point between the healthcare sector and social
media, research and innovation have shown, somewhat surprisingly, that social media
could play a very significant role in the healthcare sector.
Prior research has developed models that use social media as a data source for
tracking diseases. Most of these analyses are based on models that
prioritize strong correlations with seasonal and pandemic diseases over the
health conditions of a specific individual user.
The aim of this research is to develop a diabetes detection tool at the individual
level using a sample of Twitter IDs collected from Twitter search
with the query terms ‘recently diagnosed’ and ‘diabetes’. Based on text analysis of social
media posts using Fisher’s exact test, outside any medical setting, this thesis
investigates the feasibility of diagnosing and classifying diabetes via machine learning
techniques, namely Naive Bayes and Random Forest classifiers. It was found that more than half
(20/30 ≈ 67%) of the users in the sample mentioned testing positive for diabetes;
about 27% (8/30) mentioned symptoms and took part in diabetes-related
discussions but did not mention testing positive; and the remaining 2 of 30 users
(about 7%) made no mention of symptoms or diabetes.
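
As an illustration of how Fisher’s exact test can be applied to text data of this kind, the following is a minimal sketch, not the procedure used in this thesis. It assumes a 2x2 table counting how many users in each of two groups used a given term; the function name `fisher_exact_two_sided`, the term, and all counts are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (NOT data from this study):
# rows    = users who mentioned a diagnosis / users who did not
# columns = users whose posts contain the term / users whose posts do not
p = fisher_exact_two_sided(12, 8, 1, 9)
print(f"two-sided p-value: {p:.4f}")
```

A small p-value would suggest that use of the term differs between the two groups, which is one way a term could be shortlisted as a feature for a classifier such as Naive Bayes or Random Forest. The exact test is a natural choice here because samples of this size are too small for the chi-squared approximation to be reliable.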