IV. INFORMATION EXTRACTION AND RELATED TECHNOLOGIES

Information extraction is among the most important modules of text mining. Many articles treat the two as the same concept, but they are not equivalent. The purpose of information extraction is to scan text and extract the facts that are needed. Information extraction uses a dictionary to discover the facts and the relationships between them, and there are many different techniques for obtaining a dictionary for a particular field. Reference [13] provides an overview of information extraction technology and lists three basic stages of information extraction.

A. Fact Extraction

At this stage, the concern is how to find independent facts in the text, so domain knowledge is very important. A pattern recognition system can be built from the patterns that facts are likely to take in the text. The main fact extraction techniques are pattern matching, lexical analysis, and the analysis of syntactic and semantic structures.
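Pattern matching, the first of the fact extraction techniques listed above, can be illustrated with a minimal sketch. The patterns, the `Fact` type, the `extract_facts` function, and the sample sentence are all hypothetical and are not taken from reference [13]. The sketch assumes that domain knowledge has already been reduced to a small set of regular expressions, one per kind of fact.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    kind: str     # label of the pattern that matched
    subject: str  # entity the fact is about
    value: str    # extracted attribute value


# Hypothetical domain patterns: each encodes one form that a fact is
# assumed to take in the text. A real system would derive these from
# domain knowledge or a domain dictionary.
PATTERNS = {
    "founded": re.compile(r"(?P<subject>[A-Z]\w+) was founded in (?P<value>\d{4})"),
    "located": re.compile(r"(?P<subject>[A-Z]\w+) is located in (?P<value>[A-Z]\w+)"),
}


def extract_facts(text: str) -> List[Fact]:
    """Return every independent fact matched by any pattern."""
    facts = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            facts.append(Fact(kind, match.group("subject"), match.group("value")))
    return facts


sample = "Acme was founded in 1999. Acme is located in Berlin."
facts = extract_facts(sample)
```

Each fact is extracted independently of the others. Deciding whether two facts describe the same entity or event is left to the integration stage.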
B. Facts Integration

The main problem that fact integration must solve is preventing facts from redundantly explaining one another. Each fact is first examined independently, and the facts are then checked to see whether, taken together, they form a meaningful expression. Resolving repeated explanations within a single sentence is the relatively low-level stage. At a relatively high level, the facts should be integrated, for which the concept of event fusion can be used. Such processing is very difficult to implement in practice, however, and involves many fuzzy recognition problems.

C. Knowledge Representation

This is the third stage of information extraction and also an important part of information extraction technology. The author's article discusses only how to fill the template stored in the database, which can itself be a problem.

V. COMPARISON OF OPEN SOURCE TEXT MINING TOOLS

Four representative open source text mining tools are analyzed in detail in terms of three aspects: data format, functional modules, and user experience. Weka offers a comprehensive set of algorithms and is favored by many data mining practitioners; LingPipe is a toolkit developed specifically for natural language processing; LIBSVM is an SVM toolkit for pattern recognition and regression; and ROSTCM is very widely used in major colleges and universities and offers the best support for Chinese.

A. Data Format

Like commercial tools, open source tools usually provide good support for a variety of data formats, but they impose certain format restrictions, and some even require their own proprietary data format. When selecting a tool, first consider whether the data meets the tool's requirements, either directly or after conversion. If the results of the analysis are to be processed further, also consider whether the tool's output format is a common one or can be converted to a common format, so as to support later work.

Weka
