Mittrapiyanurak and Sornlertlamvanich (2000) presented an algorithm that extracts
sentences from a paragraph by detecting sentence-break spaces.
The algorithm considered two consecutive strings separated by a space.
The strings were first segmented into word sequences, and each word was tagged with its part of speech (POS).
A POS n-gram model was then used
to decide whether the space was a sentence break.
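The decision step can be sketched as follows. This is a minimal illustration under assumptions not taken from the source: the words are already segmented and tagged, a bigram model with add-one smoothing stands in for the POS n-gram model, a hypothetical `<BRK>` pseudo-tag marks sentence breaks in the training data, and the tag names and toy corpus are invented. A space is labelled a break when inserting `<BRK>` makes the tag sequence more probable.

```python
import math
from collections import Counter

BREAK = "<BRK>"  # hypothetical pseudo-tag marking a sentence-break space


class PosBigramModel:
    """Add-one-smoothed POS bigram model (a stand-in for the POS n-gram model)."""

    def __init__(self, tagged_sequences):
        self.contexts = Counter()  # how often each tag starts a bigram
        self.bigrams = Counter()
        tags = set()
        for seq in tagged_sequences:
            tags.update(seq)
            self.contexts.update(seq[:-1])
            self.bigrams.update(zip(seq, seq[1:]))
        self.num_tags = len(tags)

    def log_prob(self, tags):
        """Sum of smoothed log P(tag_i | tag_{i-1}) over the sequence."""
        total = 0.0
        for prev, cur in zip(tags, tags[1:]):
            count = self.bigrams[(prev, cur)] + 1
            total += math.log(count / (self.contexts[prev] + self.num_tags))
        return total

    def is_sentence_break(self, left_tags, right_tags):
        """True if the tags score higher with a break at the space than without one."""
        with_break = self.log_prob(left_tags + [BREAK] + right_tags)
        without_break = self.log_prob(left_tags + right_tags)
        return with_break > without_break


# Invented toy training data: tag sequences with the break positions marked.
TOY_CORPUS = [
    ["NOUN", "VERB", "NOUN", BREAK, "PRON", "VERB"],
    ["PRON", "VERB", "NOUN", BREAK, "PRON", "VERB", "NOUN"],
    ["NOUN", "VERB", "NOUN", "NOUN"],
]

model = PosBigramModel(TOY_CORPUS)
print(model.is_sentence_break(["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]))  # True
print(model.is_sentence_break(["PRON", "VERB", "NOUN"], ["VERB", "NOUN"]))  # False
```

Only the bigrams that touch the space differ between the two hypotheses, so with a bigram model the decision reduces to comparing P(`<BRK>` | left tag) · P(right tag | `<BRK>`) against P(right tag | left tag).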
The system was trained and tested on subsets of the ORCHID corpus
(Charoenporn et al., 1997), achieving a break-detection rate of 80%
and a false-break rate of 9%.
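The two reported figures can be computed from gold and predicted break positions. The definitions below are assumptions: the break-detection rate is taken as the share of true breaks that are found, and the false-break rate as the share of predicted breaks that are wrong (another common reading divides by the number of non-break spaces instead). The cited papers may define them differently, and the example positions are invented.

```python
def break_rates(gold_breaks, predicted_breaks):
    """Return (break_detection_rate, false_break_rate) for sets of space positions.

    Assumed definitions:
      break-detection rate = correctly predicted breaks / true breaks
      false-break rate     = wrongly predicted breaks / predicted breaks
    """
    correct = len(gold_breaks & predicted_breaks)
    detection = correct / len(gold_breaks) if gold_breaks else 0.0
    false_break = (
        (len(predicted_breaks) - correct) / len(predicted_breaks) if predicted_breaks else 0.0
    )
    return detection, false_break


# Invented example: indices of the spaces that are (or are predicted to be) sentence breaks.
gold = {2, 5, 9, 14, 20}
predicted = {2, 5, 9, 14, 17}
print(break_rates(gold, predicted))  # (0.8, 0.2)
```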
Charoenpornsawat and Sornlertlamvanich (2001) proposed an extension of the algorithm.
In addition to the POS tags of the surrounding words,
collocations of the surrounding words and
the lengths of the surrounding tokens were used as features
for determining whether a space character marks a sentence
boundary.
These features proved useful.
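A feature extractor of this kind can be sketched as below. It assumes the text is already segmented and POS-tagged, uses a window of two words on each side, and approximates "collocations" as the word pair and POS pair spanning the space; the feature names, window size, and example sentence are hypothetical, not the feature templates of the cited work.

```python
def space_features(tokens, tags, left_index, window=2):
    """Binary (string) features for the space right after tokens[left_index].

    tokens and tags are parallel lists of segmented words and their POS tags.
    """
    features = set()
    for k in range(window):
        # k-th word to the left (L1 is adjacent) and to the right (R1 is adjacent)
        for side, i in (("L", left_index - k), ("R", left_index + 1 + k)):
            if 0 <= i < len(tokens):
                features.add(f"pos_{side}{k + 1}={tags[i]}")
                features.add(f"len_{side}{k + 1}={len(tokens[i])}")
    # Assumed collocation-style features: the word pair and POS pair across the space.
    if left_index + 1 < len(tokens):
        features.add(f"colloc={tokens[left_index]}_{tokens[left_index + 1]}")
        features.add(f"pos_pair={tags[left_index]}_{tags[left_index + 1]}")
    return features


# Hypothetical English stand-in for a segmented, tagged Thai paragraph.
tokens = ["he", "eats", "rice", "she", "goes", "home"]
tags = ["PRON", "VERB", "NOUN", "PRON", "VERB", "NOUN"]
print(sorted(space_features(tokens, tags, left_index=2)))
```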
The features were extracted automatically with the Winnow machine-learning algorithm,
which was also used to classify each space as a sentence break or not.
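Winnow is a linear classifier over sparse binary features that updates its weights multiplicatively, and only when it makes a mistake. The sketch below is a generic Winnow learner; the threshold, the promotion and demotion factors, and the toy feature sets are illustrative choices, not the settings used by Charoenpornsawat and Sornlertlamvanich (2001).

```python
class Winnow:
    """Winnow classifier over sets of active (binary) features.

    Every feature starts with weight 1.0; weights change only when a prediction is wrong.
    """

    def __init__(self, threshold=3.0, promotion=2.0, demotion=0.5):
        self.threshold = threshold
        self.promotion = promotion  # factor applied to active features on a missed break
        self.demotion = demotion  # factor applied to active features on a false break
        self.weights = {}

    def score(self, features):
        return sum(self.weights.get(f, 1.0) for f in features)

    def predict(self, features):
        """True means 'this space is a sentence break'."""
        return self.score(features) >= self.threshold

    def update(self, features, is_break):
        """One online step; returns True if the example was misclassified."""
        if self.predict(features) == is_break:
            return False
        factor = self.promotion if is_break else self.demotion
        for f in features:
            self.weights[f] = self.weights.get(f, 1.0) * factor
        return True

    def train(self, examples, max_epochs=10):
        """Cycle through (features, is_break) pairs until an epoch has no mistakes."""
        for _ in range(max_epochs):
            mistakes = sum(self.update(features, label) for features, label in examples)
            if mistakes == 0:
                break


# Toy training data with hypothetical feature names.
examples = [
    ({"pos_L1=NOUN", "pos_R1=PRON", "len_R1=3"}, True),
    ({"pos_L1=NOUN", "pos_R1=VERB", "len_R1=4"}, False),
    ({"pos_L1=VERB", "pos_R1=PRON", "len_R1=3"}, True),
    ({"pos_L1=VERB", "pos_R1=NOUN", "len_R1=4"}, False),
]

clf = Winnow()
clf.train(examples)
print([clf.predict(features) for features, _ in examples])  # [True, False, True, False]
```

Because updates touch only the features active in a misclassified example, features that never co-occur with errors keep their initial weight, while informative ones such as `pos_R1=PRON` here are promoted.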
Compared with the POS n-gram model,
this approach improved the break-detection rate by 1.7%
and reduced the false-break rate by 79%.
Although these gains are substantial,
both algorithms depend strongly on word segmentation and POS tagging,
and a larger POS-tagged corpus is needed to improve all of these components.
