concatenated together. When ReducePostingsToLists (Figure 5.14) is called, the
emitted postings have been shuffled so that all postings for the same word are
together. The Reducer calls WriteWord to start writing an inverted list and then
uses EncodePosting to write each posting.
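The reduce step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from Figure 5.14: the posting layout (word, doc_id, position) and the in-memory dictionary standing in for the output file are hypothetical, and the comments mark where WriteWord and EncodePosting would be called.

```python
from itertools import groupby
from operator import itemgetter


def reduce_postings_to_lists(postings):
    """Group shuffled (word, doc_id, position) postings into inverted lists.

    `postings` must already be ordered so that all postings for the same
    word are adjacent, which is what the shuffle phase provides.
    The tuple layout is an assumption made for this sketch.
    """
    index = {}
    for word, group in groupby(postings, key=itemgetter(0)):
        # Stand-in for WriteWord: start a new inverted list for this word.
        inverted_list = index.setdefault(word, [])
        for _, doc_id, position in group:
            # Stand-in for EncodePosting: append one posting to the list.
            inverted_list.append((doc_id, position))
    return index


if __name__ == "__main__":
    shuffled = [("cat", 1, 0), ("cat", 2, 5), ("dog", 1, 3)]
    print(reduce_postings_to_lists(shuffled))
```

A real Reducer would stream each list to disk in compressed form rather than keep it in a dictionary; the dictionary only keeps the sketch short.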
5.6.4 Update
So far, we have assumed that indexing is a batch process. This means that a set of
documents is given to the indexer as input, the indexer builds the index, and then
the system allows users to run queries. In practice, most interesting document
collections are constantly changing. At the very least, collections tend to get bigger
over time; every day there is more news and more email. In other cases, such as
web search or file system search, the contents of documents can change over time
as well. A useful search engine needs to be able to respond to dynamic collections.
We can solve the problem of update with two techniques: index merging and
result merging. If the index is stored in memory, there are many options for quick
index update. However, even if the search engine is evaluating queries in memory,
typically the index is stored on a disk. Inserting data in the middle of a file
is not supported by any common file system, so direct disk-based update is not
straightforward. We do know how to merge indexes together, though, as we saw
in section 5.6.2. This gives us a simple approach for adding data to the index: make
a new, smaller index (I_2) and merge it with the old index (I_1) to make a new
index containing all of the data (I). Postings in I_1 for any deleted documents can
be ignored during the merge phase so they do not appear in I.
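As a rough illustration of this merge-based update, the sketch below combines an old index and a new, smaller index while skipping postings for deleted documents. It rests on several assumptions that are not from the text: both indexes are in-memory dictionaries mapping a word to a list of (doc_id, position) postings sorted by doc_id, and the names merge_indexes and deleted_docs are hypothetical. A disk-based merge would stream the inverted lists sequentially instead.

```python
import heapq


def merge_indexes(old_index, new_index, deleted_docs=frozenset()):
    """Merge the old index (I_1) and the new index (I_2) into a new index (I).

    Each index maps a word to a list of (doc_id, position) postings sorted
    by doc_id. Postings in the old index that belong to a document in
    `deleted_docs` are ignored, so they do not appear in the result.
    """
    merged = {}
    for word in sorted(old_index.keys() | new_index.keys()):
        # Drop postings for deleted documents from the old index only.
        kept = [p for p in old_index.get(word, []) if p[0] not in deleted_docs]
        # Both inputs are sorted, so a linear merge preserves the order.
        postings = list(heapq.merge(kept, new_index.get(word, [])))
        if postings:
            merged[word] = postings
    return merged


if __name__ == "__main__":
    i1 = {"cat": [(1, 0), (2, 4)], "dog": [(2, 1)]}
    i2 = {"cat": [(3, 2)], "emu": [(3, 0)]}
    print(merge_indexes(i1, i2, deleted_docs={2}))
```

Because the deletion filter applies only to the old index, a changed document can be handled by marking its old version as deleted and including its new version in the smaller index.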
Index merging is a reasonable update strategy when index updates come in
large batches, perhaps many thousands of documents at a time. For single-document
updates, it isn’t a very good strategy, since it is time-consuming to write the
entire index to disk. For these small updates, it is better to just build a small index
