Abstract
Background: Terminologies that account for variation in language use by linking synonyms and abbreviations to their corresponding concept are important enablers of high-quality information extraction from medical texts. Due to the use of specialized sub-languages in the medical domain, manual construction of semantic resources that accurately reflect language use is both costly and challenging, often resulting in low coverage. Although models of distributional semantics applied to large corpora provide a potential means of supporting development of such resources, their ability to isolate synonymy from other semantic relations is limited. Their application in the clinical domain has also only recently begun to be explored. Combining distributional models and applying them to different types of corpora may lead to enhanced performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs.
Results: A combination of two distributional models – Random Indexing and Random Permutation – employed in conjunction with a single corpus outperforms using either of the models in isolation. Furthermore, combining semantic spaces induced from different types of corpora – a corpus of clinical text and a corpus of medical journal articles – further improves results, outperforming a combination of semantic spaces induced from a single source, as well as a single semantic space induced from the conjoint corpus. A combination strategy that simply sums the cosine similarity scores of candidate terms is generally the most profitable out of the ones explored. Finally, applying simple post-processing filtering rules yields substantial performance gains on the tasks of extracting abbreviation-expansion pairs, but not synonyms. The best results, measured as recall in a list of ten candidate terms, for the three tasks are: 0.39 for abbreviations to long forms, 0.33 for long forms to abbreviations, and 0.47 for synonyms.
Conclusions: This study demonstrates that ensembles of semantic spaces can yield improved performance on the tasks of automatically extracting synonyms and abbreviation-expansion pairs. This notion, which merits further exploration, allows different distributional models – with different model parameters – and different types of corpora to be combined, potentially allowing enhanced performance to be obtained on a wide range of natural language processing tasks.
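The summation strategy mentioned in the Results can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names (`cosine_scores`, `combine_by_sum`, `recall_at_k`), the toy two-dimensional vectors, and the exact recall definition (fraction of reference terms found in the candidate list) are all assumptions made for illustration. Each semantic space is modelled as a plain dictionary from term to vector, standing in for a space induced by Random Indexing or Random Permutation.

```python
import numpy as np


def cosine_scores(space, query):
    """Cosine similarity between the query term and every other term in one space."""
    q = space[query]
    q_norm = np.linalg.norm(q)
    scores = {}
    for term, vec in space.items():
        if term == query:
            continue
        denom = q_norm * np.linalg.norm(vec)
        scores[term] = float(np.dot(q, vec) / denom) if denom > 0 else 0.0
    return scores


def combine_by_sum(spaces, query, k=10):
    """Sum each candidate's cosine score across all spaces and return the top-k terms."""
    totals = {}
    for space in spaces:
        if query not in space:
            continue  # a space that lacks the query term contributes nothing
        for term, score in cosine_scores(space, query).items():
            totals[term] = totals.get(term, 0.0) + score
    # Highest summed score first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def recall_at_k(candidates, reference):
    """Assumed metric: share of reference terms that appear in the candidate list."""
    if not reference:
        return 0.0
    return len(set(candidates) & set(reference)) / len(reference)


# Toy (hypothetical) spaces: one from clinical text, one from journal articles.
clinical = {
    "mi": np.array([1.0, 0.0]),
    "myocardial_infarction": np.array([0.9, 0.1]),
    "aspirin": np.array([0.0, 1.0]),
}
journal = {
    "mi": np.array([1.0, 0.2]),
    "myocardial_infarction": np.array([1.0, 0.3]),
    "aspirin": np.array([0.1, 1.0]),
}

top = combine_by_sum([clinical, journal], "mi", k=10)
print(top, recall_at_k(top, ["myocardial_infarction"]))
```

Summing raw cosine scores lets a candidate that is ranked highly in both spaces outrank one that is strongly supported by only a single space, which is one plausible reading of why combining corpora helps.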
