In this example, the text in the blue circle contains a hyperlink. Traditional DOM tree methods always use the text-link ratio (the ratio of the length of all text in a node to the length of the hyperlink text in that node) to judge whether a node is a text node. Because of its hyperlink, the text in the blue circle will always be treated as a meaningless pure link, so it will also be wrongly thrown away.
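To make this failure mode concrete, here is a minimal Python sketch of a text-link-ratio filter. The `Node` fields, the function names and the threshold of 2.0 are hypothetical choices for illustration; real implementations compute these lengths from the DOM and tune the threshold. A node whose text is entirely link text gets a ratio of 1.0 and is discarded, even when that text is a meaningful sentence.

```python
from dataclasses import dataclass


@dataclass
class Node:
    text: str       # all visible text under the DOM node
    link_text: str  # the part of that text that sits inside <a> elements


def text_link_ratio(node: Node) -> float:
    """Length of all text in the node divided by the length of its hyperlink text."""
    if not node.link_text:
        return float("inf")  # no hyperlink text at all
    return len(node.text) / len(node.link_text)


def is_text_node(node: Node, threshold: float = 2.0) -> bool:
    # Hypothetical threshold: nodes whose text is not clearly longer than
    # their link text are treated as pure links and dropped.
    return text_link_ratio(node) >= threshold


# A meaningful sentence that happens to be a single hyperlink.
linked_sentence = Node(
    text="Read the full report on the new policy",
    link_text="Read the full report on the new policy",
)
# An ordinary paragraph with one short link inside it.
paragraph = Node(
    text="The committee published its findings yesterday. See the report.",
    link_text="report",
)

print(is_text_node(linked_sentence))  # False: ratio is 1.0, so it is thrown away
print(is_text_node(paragraph))        # True
```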
In our method, we use the VIPS algorithm to overcome this problem and improve the performance of webpage content extraction. Because VIPS divides the webpage into semantic blocks, it provides a whole view of the webpage together with the position information of each block. To recall the sentences that were wrongly thrown away, we keep the DOM tree node tags when extracting the content with the traditional method. The steps are as follows.
1. Use VIPS to divide the webpage into several blocks, keeping the coordinate information of each block and the node tags in each block.
2. Use the traditional method to extract the content of the webpage, keeping the HTML tag information of each content node.
3. Use the coordinate information of each block to determine which blocks are content blocks.
4. Map the extracted content node tag sequence to the content blocks according to the node tags and the content itself. If some node tags in a content block do not appear in the extracted content node tag sequence, recall those nodes and the text they contain.
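The four steps can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the exact procedure: VIPS itself is not implemented, and its output is assumed to be a list of blocks, each with a bounding box and the (tag path, text) pairs it contains; the tag paths are hypothetical; and the rule in `is_content_block` (block centre in the middle of the page, below the top 10%) is a hypothetical stand-in for step 3, since the text does not say which coordinate test is used.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One VIPS block: its bounding box and the DOM nodes it contains."""
    x: float
    y: float
    width: float
    height: float
    nodes: list  # (tag path, text) pairs in document order


def is_content_block(block: Block, page_width: float, page_height: float) -> bool:
    # Step 3, hypothetical rule: the block's centre lies in the middle 60% of
    # the page horizontally and below the top 10% (where headers usually sit).
    cx = block.x + block.width / 2
    cy = block.y + block.height / 2
    return 0.2 * page_width <= cx <= 0.8 * page_width and cy >= 0.1 * page_height


def recall_nodes(blocks, extracted, page_width, page_height):
    """Step 4: return content-block nodes missing from the extracted sequence.

    `extracted` is the (tag path, text) sequence kept by the traditional
    method in step 2; a node counts as mapped if both its tag and its text match.
    """
    kept = set(extracted)
    recalled = []
    for block in blocks:
        if not is_content_block(block, page_width, page_height):
            continue
        for tag, text in block.nodes:
            if (tag, text) not in kept:
                recalled.append((tag, text))
    return recalled


# Step 1 output (assumed): a navigation bar and a main article block.
blocks = [
    Block(0, 0, 1000, 80, [("body/div[1]/ul/li[1]", "Home")]),
    Block(200, 100, 600, 800, [
        ("body/div[2]/p[1]", "The committee published its findings."),
        ("body/div[2]/p[2]", "Read the full report on the new policy"),
    ]),
]
# Step 2 output (assumed): the link-only paragraph was dropped.
extracted = [("body/div[2]/p[1]", "The committee published its findings.")]

print(recall_nodes(blocks, extracted, page_width=1000, page_height=1000))
# [('body/div[2]/p[2]', 'Read the full report on the new policy')]
```

Matching on both the tag path and the text, rather than on the tag alone, follows the wording of step 4. The recalled nodes would still need to be merged back into the extracted content in document order.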