The students also use regular expressions to locate the words in the email body and convert
them into a word vector, that is, a vector of counts recording how often each word in the
corpus occurs in the message.
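
As a rough illustration of this step, the sketch below tokenizes a message body with a regular expression and counts each vocabulary word. The token pattern, the function names, and the two sample messages are hypothetical choices made for this example and are not taken from the course materials.

```python
import re
from collections import Counter

# Hypothetical token pattern: runs of letters, optionally with one
# internal apostrophe (for example "don't"). Real assignments may differ.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(body):
    """Return the lowercase words found in an email body."""
    return WORD_RE.findall(body.lower())


def build_vocabulary(bodies):
    """Collect every distinct word in the corpus, in sorted order."""
    return sorted({word for body in bodies for word in tokenize(body)})


def word_vector(body, vocabulary):
    """Count how often each vocabulary word occurs in one message."""
    counts = Counter(tokenize(body))
    return [counts[word] for word in vocabulary]


# Made-up two-message corpus, used only to exercise the functions.
corpus = ["Win money now! Win big.", "Meeting moved to noon."]
vocab = build_vocabulary(corpus)
vec = word_vector(corpus[0], vocab)
```

Here `vec` has one entry per word in `vocab`, so every message in the corpus maps to a vector of the same length.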
After converting the email into information that can be used for analysis, the students use naïve
Bayes to calculate the likelihood a message is spam given its word vector.
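
The text does not say which naïve Bayes variant the students implement, so the following is only a minimal sketch under stated assumptions: a multinomial model over word counts with add-one (Laplace) smoothing, reporting the posterior probability of spam. The function names and the tiny training set are hypothetical.

```python
import math


def train(vectors, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on word-count vectors.

    labels holds True for spam and False for ham. Assumes both classes
    appear at least once. alpha is the additive smoothing constant.
    """
    n_words = len(vectors[0])
    model = {}
    for cls in (True, False):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        # Total count of each word across all messages in this class.
        totals = [sum(column) for column in zip(*rows)]
        denom = sum(totals) + alpha * n_words
        model[cls] = {
            "log_prior": math.log(len(rows) / len(vectors)),
            "log_word": [math.log((t + alpha) / denom) for t in totals],
        }
    return model


def spam_probability(model, vector):
    """Return P(spam | word vector) under the fitted model."""
    scores = {
        cls: params["log_prior"]
        + sum(count * lw for count, lw in zip(vector, params["log_word"]))
        for cls, params in model.items()
    }
    # Subtract the largest log score before exponentiating, for stability.
    top = max(scores.values())
    weights = {cls: math.exp(score - top) for cls, score in scores.items()}
    return weights[True] / (weights[True] + weights[False])


# Made-up counts over the hypothetical vocabulary ["meeting", "money", "win"].
train_vectors = [[0, 2, 1], [0, 1, 2], [2, 0, 0], [1, 1, 0]]
train_labels = [True, True, False, False]
model = train(train_vectors, train_labels)
p_spam = spam_probability(model, [0, 1, 1])
p_ham_like = spam_probability(model, [2, 0, 0])
```

Working with log probabilities avoids underflow when a message contains many words, and the smoothing constant keeps a word that never appeared in one class from forcing that class's probability to zero.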