In this chapter, we use Reuters-RCV1 as our model collection. It contains
roughly 1 GB of text and consists of about 800,000
documents that were sent over the Reuters newswire during a 1-year period
between August 20, 1996, and August 19, 1997. A typical document is
shown in Figure 4.1, but note that we ignore multimedia information like
images in this book and are only concerned with text. Reuters-RCV1 covers
a wide range of international topics, including politics, business, sports, and
(as in this example) science. Some key statistics of the collection are shown
in Table 4.2.
