We approach the task of identifying forums with rich product discussion as an information retrieval problem, ranking forums with respect to a category-brand query. We take a probabilistic language modeling approach to scoring the online forums against the query. In our approach, we aggregate information from the lowest level of the forum hierarchy, the message text, up to the level we are interested in scoring, the forum. We rank forums by their conditional likelihood given the query. The estimation of this probability is shown below; it first applies Bayes' theorem and then marginalizes over the message threads t in the collection.
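The aggregation from messages to threads to forums can be sketched in a few lines of Python. This is a minimal illustration under assumptions the text does not state: a thread's language model pools the words of all its messages, P(q|t) is a Dirichlet-smoothed unigram query likelihood with a hypothetical smoothing parameter `mu`, P(t|f) is uniform over a forum's threads, and the forum prior P(f) is uniform so it drops out of the ranking. The function names and toy data are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: forum name -> list of threads; each thread is a list of message texts.
example_forums = {
    "camera-talk": [
        ["The Canon lens is sharp", "I love my Canon camera"],
        ["battery life on the camera"],
    ],
    "cooking": [
        ["best pasta recipe"],
        ["knife sharpening tips"],
    ],
}


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def thread_model(thread):
    """Pool the term counts of every message in a thread (assumed aggregation)."""
    counts = Counter()
    for message in thread:
        counts.update(tokenize(message))
    return counts


def log_query_likelihood(query_terms, counts, collection, collection_size, vocab_size, mu):
    """log P(q|t) under a Dirichlet-smoothed unigram model (assumed smoothing)."""
    length = sum(counts.values())
    total = 0.0
    for term in query_terms:
        # Add-one background estimate keeps unseen query terms at nonzero probability.
        p_background = (collection[term] + 1) / (collection_size + vocab_size)
        total += math.log((counts[term] + mu * p_background) / (length + mu))
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def rank_forums(forums, query, mu=100.0):
    """Rank forums by log sum_t P(q|t) P(t|f), assuming uniform P(t|f) and P(f)."""
    query_terms = tokenize(query)
    models = {f: [thread_model(t) for t in threads] for f, threads in forums.items()}

    # Collection-wide counts serve as the background model.
    collection = Counter()
    for thread_models in models.values():
        for counts in thread_models:
            collection.update(counts)
    collection_size = sum(collection.values())
    vocab_size = len(collection)

    scores = {}
    for forum, thread_models in models.items():
        if not thread_models:
            continue  # a forum without threads cannot be scored under this model
        log_p_t_given_f = -math.log(len(thread_models))  # uniform P(t|f)
        scores[forum] = logsumexp([
            log_p_t_given_f
            + log_query_likelihood(query_terms, counts, collection,
                                   collection_size, vocab_size, mu)
            for counts in thread_models
        ])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for forum, score in rank_forums(example_forums, "canon camera"):
        print(f"{forum}: {score:.3f}")
```

Summing the per-thread terms in log space with log-sum-exp avoids underflow when a forum contains many threads that each contribute a very small likelihood.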
Letting f be the forum and q be the user's query: