Design and implementation of an ontology-based information retrieval system
Lachtar Nadia
Preparatory School for sciences and techniques
Annaba, Algeria nadia _ ishak2002@yahoo.fr

Abstract-The volume of resources available on the web is growing significantly. The result is a large body of information whose content is hard to master. In this immense data warehouse, current information retrieval systems do not return results that exactly meet users' needs. This is due in large part to the indexing techniques they rely on (keywords, thesauri). As a consequence, web users waste much of their time examining a large number of web pages in search of what they need, because the web offers no service to assist them. The Semantic Web offers a solution: its vision is to make web resources understandable not only by humans but also by machines. To improve the relevance of information retrieval, we propose in this paper an approach that uses a domain ontology to index a collection of documents and uses semantic links between the documents in the collection to infer all relevant documents. The work involves implementing a system that uses OWL ontologies to search pedagogical documents. In this approach, descriptors are chosen not directly from the documents but from the ontology, so documents are indexed by concepts that reflect their meaning rather than by words, which are often ambiguous. To support meaning-based search, documents and their descriptors are stored in OWL ontologies that describe the documentary features of each document. The objective is to design two OWL ontologies: a document ontology, reserved for storing all pedagogical documents, and a domain ontology, reserved for structuring the documents stored in the document ontology. Each document is indexed by its keywords and their synonyms.
Keywords-pedagogical document; information retrieval; ontology; semantic web; indexing
I. INTRODUCTION
Information retrieval (IR) is a long-established discipline that dates back to the 1950s. Its central problem can be seen as satisfying a user's information need, expressed as a query over a collection of documents called the corpus [14, 12]. Information retrieval systems (IRS) automate the task of IR. Evaluating such systems is a necessity, and this evaluation is based on the concept of relevance. To improve the relevance of IR in IRS, studies have been carried out at various levels, and several IR models have been proposed:
The Boolean model: queries are composed of words and Boolean operators (AND, OR, NOT).
Documentalists have more control with this type of query, which is often difficult for uninitiated users to formulate. It is the most widely used type of query for accessing specialized databases (Pascal), and it is also available through the advanced search interfaces of many web search engines, such as Google and Yahoo.
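As an illustration of Boolean querying, the sketch below evaluates AND, OR, and NOT as set operations over a small inverted index. It is a minimal example under stated assumptions, not the system described in this paper: the three sample documents, the whitespace tokenization, and the name build_index are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

# Hypothetical toy corpus, used only for illustration.
docs = {
    1: "ontology based information retrieval",
    2: "boolean retrieval model",
    3: "semantic web ontology",
}
index = build_index(docs)

# ontology AND retrieval: documents containing both words.
and_result = index["ontology"] & index["retrieval"]
# ontology OR boolean: documents containing at least one of the words.
or_result = index["ontology"] | index["boolean"]
# retrieval AND NOT boolean: documents containing the first word but not the second.
not_result = index["retrieval"] - index["boolean"]
```

A Boolean query only partitions the corpus into matching and non-matching documents; it produces no ranking among the matches.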
The vector model [11]: documents and queries are represented as vectors in the space of index terms. Documents are then ranked by their similarity to the query. Several measures (scalar product, Dice coefficient, Jaccard measure, ...) are used to compute this similarity, which corresponds to the distance between the two vectors.
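The ranking step of the vector model can be sketched as follows. This is a minimal example that assumes raw term-frequency weights and whitespace tokenization; the Dice and Jaccard functions use one common formulation for weighted vectors, and the sample documents and all function names are hypothetical rather than taken from the paper.

```python
from collections import Counter

def tf_vector(text):
    """Represent a text as a sparse vector of raw term frequencies."""
    return Counter(text.lower().split())

def inner(q, d):
    """Scalar (inner) product of two sparse term vectors."""
    return sum(w * d[t] for t, w in q.items())

def dice(q, d):
    """Dice coefficient for weighted vectors: 2 q.d / (|q|^2 + |d|^2)."""
    denom = inner(q, q) + inner(d, d)
    return 2 * inner(q, d) / denom if denom else 0.0

def jaccard(q, d):
    """Jaccard measure for weighted vectors: q.d / (|q|^2 + |d|^2 - q.d)."""
    dot = inner(q, d)
    denom = inner(q, q) + inner(d, d) - dot
    return dot / denom if denom else 0.0

def rank(query, docs, measure):
    """Return document ids by decreasing similarity; ties are broken by id."""
    q = tf_vector(query)
    scored = [(measure(q, tf_vector(text)), doc_id) for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

# Hypothetical toy corpus, used only for illustration.
docs = {
    1: "ontology based information retrieval",
    2: "boolean retrieval model",
    3: "semantic web ontology",
}
ranking = rank("ontology retrieval", docs, dice)
```

Unlike the scalar product, the Dice and Jaccard measures are normalized by vector length, so a long document is not favored merely because it contains more terms.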

