ISPRS Int. J. Geo-Inf. 2015, 4 1551
The goal of this article is to investigate the nature of the tweet content generated during a disaster, and to define a list of content categories that takes into account the information involved in the disaster phases of preparedness, emergency response, and recovery. In our work, tweets are separated into 47 themes, which, to the best of our knowledge, is the most detailed and complete coding schema for categorizing social media content into themes. The coding schema could be useful for extracting social media content effectively during different disaster stages and for gaining a better picture of the complex environment in a time of crisis. We also identify keywords and topics related to disaster impact, which are often useful for emergency response and recovery. Additionally, a list of keywords associated with the messages of each category is manually extracted and presented as a basis for similar research in the future. These keywords can serve as a reference for other scholars who apply text pattern matching methods to mine tweets that belong to a specific category. This paper also introduces a framework that can process and mine social media data for disaster analysis at different stages. Using this framework, relevant tweets for each category can be extracted from the raw data.

The next section of this paper reviews research on the use of social media in disasters to provide the broader context for our empirical study. The third section describes the methodology for preparing and mining tweets for disaster analysis. Section four demonstrates how to apply the classification results to disaster analysis. The paper concludes with a discussion of the issues, challenges, and future research directions of using social media data for disaster analysis and study.
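
As an illustration of the keyword-based text pattern matching mentioned above, the following is a minimal sketch of how per-category keyword lists might be used to assign tweets to categories. The category names, the keywords, and the function names are hypothetical placeholders chosen for this example; they are not the 47 themes or the keyword lists developed in this study, and the matching rule (case-insensitive, whole-word) is an assumption.

```python
import re

# Hypothetical categories and keywords, for illustration only; they are not
# the themes or keyword lists extracted in this study.
CATEGORY_KEYWORDS = {
    "power": ["power outage", "blackout", "no electricity"],
    "transportation": ["subway", "road closed", "flight cancelled"],
}


def compile_patterns(category_keywords):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, keywords in category_keywords.items():
        # Escape each keyword so it is matched literally, then join as alternatives.
        alternatives = "|".join(re.escape(k) for k in keywords)
        patterns[category] = re.compile(
            r"\b(?:" + alternatives + r")\b", re.IGNORECASE
        )
    return patterns


def categorize(tweet, patterns):
    """Return the sorted list of categories whose keywords occur in the tweet."""
    return sorted(c for c, p in patterns.items() if p.search(tweet))


PATTERNS = compile_patterns(CATEGORY_KEYWORDS)
```

Under these assumptions, `categorize("Blackout in our area, subway is down too", PATTERNS)` would place the tweet in both hypothetical categories, while a tweet containing none of the keywords would receive an empty list. A tweet may therefore fall into several categories or none.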