In principle, Bayes' theorem enables optimal prediction
of the class label for a new instance given a vector
of attribute values. Unfortunately, straightforward application
of Bayes' theorem to machine learning is impractical
because inevitably there is insufficient training
data to obtain an accurate estimate of the full joint
probability distribution. Some independence assumptions
must therefore be made to render inference feasible. The
naive Bayes approach takes this to the extreme by assuming
that the attributes are statistically independent
given the value of the class attribute. Although
this assumption never holds in practice, naive Bayes
performs surprisingly well in many classification problems.
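Concretely, the independence assumption lets the joint likelihood factor as P(a_1, ..., a_n | c) = P(a_1 | c) × ... × P(a_n | c), so prediction reduces to argmax over c of P(c) multiplied by the per-attribute conditionals. The following is a minimal sketch of this idea for categorical attributes, with Laplace smoothing to avoid zero probabilities; all function and variable names are illustrative, not from the source.

```python
from collections import Counter, defaultdict
import math

def train(instances, labels):
    """Estimate P(c) and P(a_i = v | c) from simple counts."""
    class_counts = Counter(labels)
    value_counts = Counter()        # (attr_index, value, class) -> count
    values = defaultdict(set)       # distinct values seen per attribute
    for x, c in zip(instances, labels):
        for i, v in enumerate(x):
            value_counts[(i, v, c)] += 1
            values[i].add(v)
    return class_counts, value_counts, values

def predict(x, class_counts, value_counts, values):
    """argmax_c of log P(c) + sum_i log P(a_i | c), per the independence assumption."""
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, cc in class_counts.items():
        score = math.log(cc / n)    # log prior
        for i, v in enumerate(x):
            # Laplace smoothing: unseen attribute values do not zero the product.
            num = value_counts[(i, v, c)] + 1
            den = cc + len(values[i]) + 1   # extra slot for unseen values
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage on a made-up dataset: attributes are (outlook, temperature).
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
model = train(X, y)
print(predict(("rain", "mild"), *model))
```

Working in log space, as above, sidesteps the floating-point underflow that multiplying many small probabilities would otherwise cause; the sums involve only counts gathered in a single pass over the training data, which is the source of naive Bayes' computational efficiency.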
Furthermore, it is computationally efficient