4.2 Determination of document relevance in VSM
Once the documents are indexed, a search system can rank the documents by their calculated similarity to a query. The query is represented in the same way as the documents, as a term vector holding a weight for each stored term, except that normalizing the query vector is not essential.
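As a minimal sketch of this shared representation, assume sparse Python dictionaries that map each term to its weight, lower-cased whitespace tokenization, and raw term counts as weights. The section does not state the weighting scheme, so `term_vector`, `normalize` and the sample texts are illustrative only. Documents are normalized at indexing time, while the query is left as it is:

```python
import math
from collections import Counter

def term_vector(text: str) -> dict[str, float]:
    """Sparse term vector: term -> weight (here, the raw term count)."""
    # Assumption: lower-cased whitespace tokenization; real systems may
    # use stemming, stop-word removal and tf-idf weights instead.
    return {term: float(n) for term, n in Counter(text.lower().split()).items()}

def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a term vector to unit Euclidean length."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / length for t, w in vec.items()} if length else dict(vec)

# Indexing: each document vector is normalized once and stored.
index = {
    "d1": normalize(term_vector("vector space model ranks documents")),
    "d2": normalize(term_vector("documents are indexed before search")),
}

# Query: same representation; normalization is optional.
query = term_vector("Ranks documents")
print(query)  # {'ranks': 1.0, 'documents': 1.0}
```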
The similarity between a single document and the query is calculated as the cosine similarity of their two vectors. If both vectors are plotted in an N-dimensional Cartesian coordinate system (where N is the total number of distinct terms in the two vectors and each axis represents the weight of one term), the cosine similarity equals the cosine of the angle between the two vectors.
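Written as a formula (the notation is ours, not the source's): if w_{i,1} and w_{i,2} are the weights of term i in V1 and V2, with zero used when the term is absent, the cosine of the angle θ between the vectors is their dot product divided by the product of their lengths:

```latex
\cos\theta
  = \frac{\vec{V_1}\cdot\vec{V_2}}{\lVert\vec{V_1}\rVert\,\lVert\vec{V_2}\rVert}
  = \frac{\sum_{i=1}^{N} w_{i,1}\,w_{i,2}}
         {\sqrt{\sum_{i=1}^{N} w_{i,1}^{2}}\;\sqrt{\sum_{i=1}^{N} w_{i,2}^{2}}}
```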
To calculate the cosine similarity, the weight of each term in one vector is multiplied by the weight of the same term in the other vector (a weight of zero is assumed if the term does not occur there), and all of these products are summed. Finally, the sum is divided by the length of the first vector and by the length of the second vector.
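A minimal Python sketch of the procedure just described, assuming sparse dictionaries (term to weight) as the vector representation. The function name and the toy vectors are hypothetical, and returning 0 for an all-zero vector is our own convention, since the cosine is undefined there:

```python
import math

def cosine_similarity(v1: dict[str, float], v2: dict[str, float]) -> float:
    """Cosine similarity of two sparse term vectors (term -> weight)."""
    # Sum the products of the weights of shared terms; a term missing from
    # one vector has weight 0 there and contributes nothing to the sum.
    dot = sum(w * v2[t] for t, w in v1.items() if t in v2)
    len1 = math.sqrt(sum(w * w for w in v1.values()))
    len2 = math.sqrt(sum(w * w for w in v2.values()))
    if len1 == 0 or len2 == 0:
        return 0.0  # assumption: similarity with an all-zero vector is 0
    # Divide by the length of each vector.
    return dot / (len1 * len2)

doc = {"vector": 1.0, "space": 2.0, "model": 2.0}
query = {"space": 1.0, "model": 1.0}
print(round(cosine_similarity(doc, query), 4))  # 0.9428
```

Here the dot product is 4, the document length is 3 and the query length is √2, so the result is 4 / (3√2) ≈ 0.9428.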
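Because document term vectors are normalized during indexing, their length is 1 for every document and the division by it can be omitted. The same applies to the query term vector: it can be normalized once, before it is compared with the documents. Even if it is not normalized, dividing by the query length scales every document's score by the same constant, so the ranking is unchanged; this is why normalizing the query is not essential.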
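The sketch below, again a hypothetical toy example using sparse dictionaries, ranks documents whose vectors were normalized at indexing time by a plain dot product with the query. It also checks the point above: normalizing the query changes the scores but not the order.

```python
import math

def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse term vector to unit length (done once per vector)."""
    length = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / length for t, w in vec.items()} if length else dict(vec)

def dot(v1: dict[str, float], v2: dict[str, float]) -> float:
    """Sum of weight products over the terms the two vectors share."""
    return sum(w * v2[t] for t, w in v1.items() if t in v2)

def rank(index: dict[str, dict[str, float]],
         query: dict[str, float]) -> list[tuple[str, float]]:
    """Order documents by dot product with the query, highest first.

    With unit-length document vectors, each score is the cosine similarity
    multiplied by the query's length, a constant for all documents.
    """
    scores = [(doc_id, dot(vec, query)) for doc_id, vec in index.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Hypothetical toy index: document vectors normalized at indexing time.
index = {
    "d1": normalize({"vector": 1.0, "space": 2.0, "model": 2.0}),
    "d2": normalize({"space": 1.0, "search": 3.0}),
    "d3": normalize({"model": 1.0, "index": 1.0}),
}
query = {"space": 1.0, "model": 1.0}

raw = rank(index, query)               # query left unnormalized
unit = rank(index, normalize(query))   # query normalized once
print([d for d, _ in raw])   # ['d1', 'd3', 'd2']
print([d for d, _ in unit])  # ['d1', 'd3', 'd2']
```

With the normalized query the scores are the actual cosine similarities; without it, every score is larger by the same factor, the query length (√2 here).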
Figure 1.3 shows an example of two normalized vectors, V1 and V2; the cosine similarity between them is calculated below.