In this chapter, we use Reuters-RCV1 as our model collection. It contains
roughly 1 GB of text and consists of about 800,000
documents that were sent over the Reuters newswire during a 1-year period
between August 20, 1996, and August 19, 1997. A typical document is
shown in Figure 4.1, but note that we ignore multimedia information like
images in this book and are only concerned with text. Reuters-RCV1 covers
a wide range of international topics, including politics, business, sports, and
(as in this example) science. Some key statistics of the collection are shown
in Table 4.2.
