Conclusions
Social media messages are rich in content, capturing and reflecting many aspects of individual lives, experiences, behaviors, and reactions to a specific topic or event. These messages can therefore be used to monitor and track geopolitical and disaster events, support emergency response and coordination, and serve as a measure of public interest or concern about events. This work presents a coding schema for separating social media messages into different themes within different disaster stages. Several standard text mining techniques were tested experimentally for classifying tweets collected during a disaster, Hurricane Sandy in 2012. A logistic regression classifier was selected and trained to automatically categorize the messages into our predefined categories. The classifier achieves an overall precision of 0.647 on average. As introduced in Section 3.3, a few themes whose sample sizes are too small (fewer than 20 tweets) to train the classifier are discarded (preparedness.plans), and some categories covering similar topics are combined. In the future, a more sophisticated classification model that can handle unbalanced data may be developed to increase classification accuracy. Different combinations of similar themes may also be tested to obtain better accuracy. Additionally, actionable information should be extracted for every disaster phase rather than only the response phase. For example, we could extract which stores are open for stocking up on disaster essentials before a disaster and for restoring daily supplies afterward, a type of information that has received less attention in previous studies.
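
As an illustration of two steps summarized above, the following minimal Python sketch drops categories with fewer than 20 labeled tweets and computes an averaged precision. It is not the code used in this work. The function names are hypothetical, and the sketch assumes that "on average" means an unweighted (macro) average of per-category precision, which the text does not specify.

```python
from collections import Counter

MIN_TWEETS = 20  # threshold quoted in the text; categories below it are discarded


def drop_small_categories(samples, min_count=MIN_TWEETS):
    """Keep only (text, label) pairs whose label has at least min_count samples."""
    counts = Counter(label for _, label in samples)
    return [(text, label) for text, label in samples if counts[label] >= min_count]


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-category precision (an assumed averaging scheme).

    Only categories that were predicted at least once contribute, since
    precision is undefined for a category that is never predicted.
    """
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    per_label = [correct[label] / n for label, n in predicted.items()]
    return sum(per_label) / len(per_label) if per_label else 0.0
```

Because a macro average weights every category equally, it is sensitive to poorly classified small categories, which is one reason the handling of unbalanced data matters for the reported score.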