In this paper, we found that auto enthusiast discussion forums contain substantial content on the existence and criticality of motor vehicle defects. We also found that conventional sentiment analysis, which successfully identifies complaints in other industries, fails to distinguish defects from non-defects and safety defects from performance defects.
We compiled an alternative set of automotive smoke words that have higher relative prevalence in defects vs. non-defects, and in safety issues vs. other postings. These smoke words, discovered from Honda and Toyota postings, generalize well to a third brand, Chevrolet, which was used for validation. We implemented our findings in a novel
Vehicle Defect Discovery System (VDDS) that provides robust and generalizable defect discovery and classification. This paper has shown that vehicle quality management can be supported by appropriate analysis of social media postings.
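To make the notion of "higher relative prevalence" concrete, the following is a minimal sketch of how unigram smoke words might be ranked. It is an illustration under stated assumptions, not the scoring procedure used in the VDDS: the function names (`tokenize`, `relative_prevalence`, `top_smoke_words`), the add-one smoothing, the simple regular-expression tokenizer, and the toy postings are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a posting and split it into unigrams (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_prevalence(target_posts, other_posts, smoothing=1.0):
    """Return, for each term, the ratio of its smoothed relative frequency
    in target_posts (e.g. defects) to that in other_posts (e.g. non-defects).

    A ratio above 1 means the term is relatively more prevalent in the
    target class. Add-one smoothing is an assumption made here so that
    terms absent from one class do not cause division by zero.
    """
    target_counts = Counter(t for p in target_posts for t in tokenize(p))
    other_counts = Counter(t for p in other_posts for t in tokenize(p))
    target_total = sum(target_counts.values())
    other_total = sum(other_counts.values())
    vocab = set(target_counts) | set(other_counts)
    vocab_size = len(vocab)

    scores = {}
    for term in vocab:
        p_target = (target_counts[term] + smoothing) / (
            target_total + smoothing * vocab_size
        )
        p_other = (other_counts[term] + smoothing) / (
            other_total + smoothing * vocab_size
        )
        scores[term] = p_target / p_other
    return scores


def top_smoke_words(target_posts, other_posts, k=5):
    """Return the k terms with the highest relative prevalence.

    Ties are broken alphabetically so the result is deterministic.
    """
    scores = relative_prevalence(target_posts, other_posts)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy postings, invented purely for illustration.
defect_posts = ["brake failed on highway", "brake pedal failed again"]
other_posts = ["love the new paint color", "great paint and great mileage"]

scores = relative_prevalence(defect_posts, other_posts)
candidates = top_smoke_words(defect_posts, other_posts, k=2)
```

The same function could be applied a second time with safety postings as the target class and all other postings as the comparison class. A ratio is used here only because it is easy to read; any measure of relative term prevalence could be substituted.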
In future work, we intend to extend the VDDS. We plan to expand upon the current unigram (single-term) analysis of postings and to determine whether rule induction methods, neural networks, or other text mining techniques can be developed to enhance the defect detection and sorting process. We also intend to explore alternative social media (Twitter, Facebook, ...), additional linguistic features of the text, and a greater selection of vehicle brands. With the volume of social media postings expanding rapidly, we expect that the need for automated business intelligence tools for the exploration of this vast and valuable data set will continue to grow.