The dataset used in the following study consists of a richly annotated document collection, together with a product catalog and a product category ontology. The documents are a sample of webpages from online forums, drawn from the first tier of a commercial search engine index and excluding documents identified as pornography or spam. The forum structure is extracted as described above in Section 4.2. The document text is annotated with product mentions, and those mentions are mapped into the product category ontology as described in Section 4.1. For the purposes of this study, we focus only on consumer electronics products. The final dataset contains over 3.5 million online forums, with almost 400 million messages organized into over 40 million message threads and contributed by over 45 million authors. Almost 40% of the message threads contain at least one product mention, and there are over 350 million total mentions in the collection, corresponding to 95 million unique category-brand pairs.