The Boolean search logic eventually grew very complicated, yet the data set still contained about 35 percent noise: in November 2011 we retrieved 179 tweets, of which 63 were irrelevant to college students. Moreover, the small size of the data set suggests that we had ruled out many relevant tweets along with the spam and irrelevant ones.