10.1. Describe how social media tags are similar to anchor text. How are they
different?
10.2. Implement two algorithms for measuring the similarity between two tags.
The first algorithm should use a standard retrieval model, such as language modeling.
The second algorithm should use the Web or another resource to expand
the tag representation. Evaluate the effectiveness of the two algorithms on a set of
10–25 tags. Describe the algorithms, evaluation metrics, tag set, and results.
10.3. Compute five iterations of HITS (see Algorithm 3) and PageRank (see Figure
4.11) on the graph in Figure 10.3. Discuss how the PageRank scores compare
to the hub and authority scores produced by HITS.
10.4. Describe two examples of online communities that were not already discussed
in this chapter. How can the community-finding algorithms presented in
this chapter be used to detect each?
10.5. Find a community-based question answering site on the Web and ask two
questions, one that is low-quality and one that is high-quality. Describe the quality
of the answers that each question receives.
10.6. Find two examples of document filtering systems on the Web. How do they
build a profile for your information need? Is each system static or adaptive?
10.7. List the basic operations an indexer must support to handle the following
tasks: 1) static filtering, 2) adaptive filtering, and 3) collaborative filtering.
10.8. Implement the nearest neighbor–based collaborative filtering algorithm.
Using a publicly available collaborative filtering data set, compare the effectiveness,
in terms of mean squared error, of Euclidean distance and correlation as
similarity measures.
10.9. Both the clustering and nearest neighbor–based collaborative filtering algorithms
described in this chapter make predictions based on user/user similarity.
Formulate both algorithms in terms of item/item similarity. How can the distance
between two items be measured?
10.10. Form a group of 2–5 people and use a publicly available collaborative
search system. Describe your experience, including the pros and cons of using such
a system.
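
As a possible starting point for Exercise 10.3, the following is a minimal sketch of iterating HITS and PageRank side by side. It is not the chapter's pseudocode: the function names, the sum-to-one normalization in HITS, the random-jump probability `lam = 0.15`, the uniform handling of nodes with no outgoing links, and the three-node `TOY_GRAPH` are all assumptions made for illustration. The graph from Figure 10.3 is not reproduced here and would need to be substituted.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node -> list of nodes it links to

# Hypothetical toy graph for illustration only (NOT the graph in Figure 10.3).
TOY_GRAPH: Graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def all_nodes(graph: Graph) -> List[str]:
    """Collect every node that appears as a link source or a link target."""
    return sorted(set(graph) | {dst for targets in graph.values() for dst in targets})


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale scores to sum to 1 (assumed normalization; other choices exist)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {node: value / total for node, value in scores.items()}


def hits(graph: Graph, iterations: int = 5) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores after the given number of iterations."""
    nodes = all_nodes(graph)
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        new_hub = {n: 0.0 for n in nodes}
        for src in nodes:
            for dst in graph.get(src, []):
                new_auth[dst] += hub[src]   # authority: sum of hubs linking in
                new_hub[src] += auth[dst]   # hub: sum of authorities linked to
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


def pagerank(graph: Graph, iterations: int = 5, lam: float = 0.15) -> Dict[str, float]:
    """Return PageRank scores; lam is the assumed random-jump probability."""
    nodes = all_nodes(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: lam / n for v in nodes}
        for src in nodes:
            targets = graph.get(src, [])
            # Assumption: a node with no outgoing links spreads its score uniformly.
            receivers = targets if targets else nodes
            share = (1.0 - lam) * rank[src] / len(receivers)
            for dst in receivers:
                new_rank[dst] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    hub_scores, auth_scores = hits(TOY_GRAPH)
    print("hub:", hub_scores)
    print("authority:", auth_scores)
    print("pagerank:", pagerank(TOY_GRAPH))
```

Both HITS updates in this sketch read the scores from the previous iteration. Some formulations instead update hubs from the freshly computed authorities, which changes the intermediate values, so it is worth stating which variant is used when reporting results.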