5. Other research topics in CLIR
5.1. Pivot language approach
So many languages are spoken in the world that it is not always possible to obtain the bilingual resources needed for a particular pair of languages. A promising technique for circumventing this limited availability of linguistic resources is the pivot language approach, in which an intermediate language acts as a mediator between two languages for which no bilingual resource is available. Suppose
that a CLIR task between Japanese and Dutch is requested by a user. In this case, machine-readable
resources of Japanese–Dutch pairs may be unavailable, and it would be easier to find Japanese–English and
Dutch–English resources since English is such a widely used language. Thus CLIR between Japanese
and Dutch can be performed via English (as an intermediary) without direct bilingual resources of Japanese
and Dutch.
The pivot language approach may also alleviate the problem of the explosive number of language combinations: if we have to perform CLIR between every pair of n languages, O(n²) resources are needed, whereas the pivot language approach allows us to handle the same task with only O(n) resources (Gey, 2001).
A basic way of using the pivot language approach would be a transitive translation of a query using two
bilingual dictionaries (Ballesteros, 2000). In the case of search from Japanese to Dutch via English, if Japanese–
English and English–Dutch dictionaries are available, CLIR can be performed by replacing Japanese
query terms with the corresponding English equivalents and successively substituting the English equivalents
with the Dutch equivalents. Of course, if Japanese–English and English–Dutch MT systems can be
used, a similar transitive translation is also feasible.
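To make the procedure concrete, the following is a minimal Python sketch of dictionary-based transitive query translation (Japanese to Dutch via English); the toy dictionaries, query terms, and function name are invented for illustration and are not taken from any actual lexical resource.

```python
# A minimal sketch of dictionary-based transitive query translation
# (Japanese -> English -> Dutch). All dictionary entries are toy placeholders.

ja_en = {"情報": ["information", "intelligence"], "検索": ["retrieval", "search"]}
en_nl = {"information": ["informatie"], "intelligence": ["inlichtingen", "intelligentie"],
         "retrieval": ["terugvinden"], "search": ["zoeken", "zoektocht"]}

def transitive_translate(query_terms, src_to_pivot, pivot_to_tgt):
    """Replace each source term with its pivot equivalents, then each pivot
    equivalent with its target-language equivalents."""
    target_terms = []
    for term in query_terms:
        for pivot_term in src_to_pivot.get(term, []):
            target_terms.extend(pivot_to_tgt.get(pivot_term, []))
    return target_terms

print(transitive_translate(["情報", "検索"], ja_en, en_nl))
# Every pivot equivalent fans out again at the second stage, so the number of
# target terms can grow multiplicatively -- the ambiguity problem discussed below.
```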
In the case of dictionary-based transitive translation, translation ambiguity can become an even more serious problem. The resulting translations may be doubly ambiguous, since each replacement stage introduces ambiguity: (1) from the source language to the intermediate language and (2) from the intermediate language to the target language. Suppose, for example, that a Japanese source query consists of four words, and every word has four English equivalents. If, in addition, every English equivalent has four Dutch equivalents, simple replacement produces 64 (= 4³) search terms in total from only 4 source terms, which will inevitably include some irrelevant translations. To address this problem, Ballesteros (2000) applied the disambiguation methods mentioned above (the co-occurrence-based method, query expansion, etc.) to transitive translation and attained a substantial improvement in search performance.
Gollins and Sanderson (2001) also proposed a technique called "lexical triangulation" to alleviate the translation ambiguity problem: two pivot languages are used independently, and erroneous translations are removed by keeping only the translations that the two transitive translation routes have in common.
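As a rough illustration of this idea, the following sketch intersects the candidate sets obtained through two independent pivot routes; the function names and the shape of the dictionaries are assumptions made for illustration, not details from the cited study.

```python
# A minimal sketch of the intuition behind lexical triangulation: translate a
# source term through two pivot languages independently and keep only the
# target-language translations produced by both routes.

def pivot_translate(term, src_to_pivot, pivot_to_tgt):
    """Set of target-language candidates obtained via a single pivot language."""
    candidates = set()
    for pivot_term in src_to_pivot.get(term, []):
        candidates.update(pivot_to_tgt.get(pivot_term, []))
    return candidates

def triangulate(term, route_a, route_b):
    """Intersect the candidate sets produced by two independent pivot routes,
    e.g. route_a = (source-to-English, English-to-target) dictionaries and
    route_b = (source-to-French, French-to-target) dictionaries."""
    return pivot_translate(term, *route_a) & pivot_translate(term, *route_b)
```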
The pivot language approach has been utilized in TREC, NTCIR, and CLEF when bilingual resources for a particular pair of languages were unavailable. For example, the following transitive combinations of languages have been explored:
• English → French → German (Franz et al., 1999)
• French → English → German, etc. (Gey, Jiang, Chen, & Larson, 1999)
• German → English → Italian (Hiemstra & Kraaij, 1999)
• Japanese → English → Chinese (Lin & Chen, 2003)
• Chinese → English → Japanese (Chen & Gey, 2003)
In particular, Franz et al. (1999) proposed some interesting techniques for searching German documents
with English queries:
(1) Convolution of translation probability: Estimating the translation probability from an English term $e$ to a German term $g$ through French terms $f$ such that
$$P(g \mid e) = \sum_{f} P(g \mid f)\, P(f \mid e)$$
(see the sketch after this list).
(2) Automatic query generation from the intermediate-language corpus: Generating a French query automatically by simply merging all non-stopwords in the top-ranked French documents retrieved by the English–French CLIR system, and then feeding that French query into the French–German CLIR system.
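The convolution in technique (1) can be sketched in a few lines; the probability tables below are invented toy values, whereas Franz et al. (1999) estimated such probabilities from actual bilingual resources.

```python
# A minimal sketch of convolving translation probabilities through a pivot
# language: P(g|e) = sum over f of P(g|f) * P(f|e). All values are toy data.

p_f_given_e = {"cat": {"chat": 0.9, "félin": 0.1}}             # P(f|e): English -> French
p_g_given_f = {"chat": {"Katze": 0.8, "Chat": 0.2},             # P(g|f): French -> German
               "félin": {"Katze": 0.6, "Raubkatze": 0.4}}

def convolve(e):
    """Estimate P(g|e) by summing over all pivot-language terms f."""
    p_g_given_e = {}
    for f, p_fe in p_f_given_e.get(e, {}).items():
        for g, p_gf in p_g_given_f.get(f, {}).items():
            p_g_given_e[g] = p_g_given_e.get(g, 0.0) + p_gf * p_fe
    return p_g_given_e

print(convolve("cat"))
# approximately {'Katze': 0.78, 'Chat': 0.18, 'Raubkatze': 0.04}
```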
5.2. Merging strategy for multilingual information retrieval
Suppose that we have a multilingual document collection in which two or more languages are mixed (not a parallel corpus), and a user wishes to search the collection with a query expressed in a single language. This
task is more complicated than simple bilingual CLIR. In CLEF and NTCIR, multilingual CLIR has been
adopted as a research task, and many research groups have worked on the issue.
Basically, there are two approaches to multilingual IR (Lin & Chen, 2003):
• Distributed architecture in which the document collection is separated by language, and each part is
indexed and retrieved independently.
• Centralized architecture in which the document collection in various languages is viewed as a single document
collection and is indexed in one huge index file.
In the distributed architecture, a standard bilingual search is performed separately on each language sub-collection, and each run generates its own ranked document list. The problem then becomes how to merge the results of these runs into a single ranked list in which the relevant documents in every language are ranked highly. Merging is essentially a general research issue in IR when searching distributed resources (i.e., distributed IR), where the ranked lists obtained from different resources inevitably have to be merged. In CLIR, the following merging strategies have been investigated:
• Raw score: straightforwardly using document scores estimated in each run.
• Round robin: interleaving each document list in a round robin fashion by assuming that distribution of
relevant documents is identical among the lists.
• Normalized score: normalizing the document scores within each run in order to remove the effect of collection-dependent statistics on the scores.
• Rank-based score: mathematically converting ranks in each run into scores by assuming a relationship
between the rank and probability of relevance.
• Modified score: modifying raw scores in each run so as to reduce effects of collection-size dependency,
translation ambiguity, etc.
If the retrieval model employed in each run can estimate the relevance probability of each document correctly, it would be reasonable to re-rank all documents together according to these probabilities (i.e., the raw scores). For example, Chen and Gey (2003) simply merged the results from the Chinese, Japanese and English collections according to the probabilities of relevance estimated by a logistic regression model. In most cases, however, it is difficult to regard each document score as a pure probability of relevance even if a probabilistic retrieval model is actually used. In this case, if we can assume that relevant documents are distributed in the same way across the separate language sub-collections, a simple strategy is round-robin merging, in which only the rank of each document is taken into account.
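A round-robin merge can be sketched in a few lines; the per-language result lists below are placeholders for the ranked lists produced by each run.

```python
# A minimal sketch of round-robin merging: take the next document from each
# per-language ranked list in turn until every list is exhausted.
from itertools import zip_longest

def round_robin_merge(*ranked_lists):
    """Interleave several ranked lists into one, preserving within-list order."""
    merged = []
    for tier in zip_longest(*ranked_lists):        # one document from each list per round
        merged.extend(doc for doc in tier if doc is not None)
    return merged

print(round_robin_merge(["de1", "de2", "de3"], ["fr1", "fr2"], ["it1", "it2", "it3"]))
# ['de1', 'fr1', 'it1', 'de2', 'fr2', 'it2', 'de3', 'it3']
```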
Otherwise, an alternative method is to use normalized document scores such that
$$v' = \frac{v - v_{\min}}{v_{\max} - v_{\min}},$$
where $v$ is a raw score, and $v_{\min}$ and $v_{\max}$ are the minimum and maximum scores in each run respectively (Powell, French, Callan, Connell, & Viles, 2000). Savoy (2002) has empirically compared search performance
among the four strategies of round robin, raw score, normalized score and the CORI approach (see Callan et al., 1995 for details) using the CLEF test collection, and reported that the normalized score performed best among them. Similarly, Moulinier and Molina-Salgado (2002) compared round robin, raw score, CORI, normalized score and a collection-weighted normalized score (a variation of the normalized score), and reported that the collection-weighted normalized score yielded higher mean average precision.
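The min-max normalization above can be turned into a merging procedure as sketched below; run names, document identifiers and raw scores are purely illustrative.

```python
# A minimal sketch of normalized-score merging: rescale each run's raw scores to
# [0, 1] using (v - v_min) / (v_max - v_min), pool all runs, and re-sort.

def normalize_and_merge(runs):
    """runs: {run_name: [(doc_id, raw_score), ...]} -> one list sorted by normalized score."""
    pooled = []
    for results in runs.values():
        scores = [score for _, score in results]
        v_min, v_max = min(scores), max(scores)
        span = (v_max - v_min) or 1.0              # guard against identical scores
        pooled.extend((doc, (score - v_min) / span) for doc, score in results)
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)

runs = {"german": [("de1", 12.4), ("de2", 7.1)], "french": [("fr1", 0.92), ("fr2", 0.35)]}
print(normalize_and_merge(runs))   # [('de1', 1.0), ('fr1', 1.0), ('de2', 0.0), ('fr2', 0.0)]
```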
Other techniques for estimating optimal scores for merging ranked lists have been proposed. Franz et al. (2000) empirically found a linear relationship between the log of the rank and the precision at that rank, and used scores converted according to this relationship to merge the results of each run. Similarly, the strategy of rank-based scoring was investigated by Kraaij et al. (2000). Hiemstra et al. (2001) also examined the effectiveness of modifying raw scores so as to remove the effect of collection-size dependency in the process of estimating them. Meanwhile, Lin and Chen (2003) proposed a method of modifying raw scores based on the degree of ambiguity arising when each source query is translated, under the assumption that a better translation is likely to retrieve more relevant documents. Savoy (2003a) tested a logistic regression formula for predicting the relevance probability of a document from its rank and score.
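The rank-based strategy can be illustrated with a conversion of the kind used by Franz et al. (2000), in which the score decreases linearly with the log of the rank; the intercept and slope below are invented placeholders, whereas in practice such coefficients would be fitted on training data.

```python
# A minimal sketch of rank-based scoring: map rank r (1-based) to a
# precision-like score a - b * log(r) and merge all runs by that score.
import math

def rank_based_scores(ranked_docs, intercept=1.0, slope=0.2):
    """Convert a ranked list into (doc, score) pairs using a - b * log(rank)."""
    return [(doc, intercept - slope * math.log(rank))
            for rank, doc in enumerate(ranked_docs, start=1)]

merged = sorted(rank_based_scores(["de1", "de2"]) + rank_based_scores(["fr1", "fr2", "fr3"]),
                key=lambda pair: pair[1], reverse=True)
print([doc for doc, _ in merged])   # ['de1', 'fr1', 'de2', 'fr2', 'fr3']
```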
On the other hand, for the centralized architecture, the set of multilingual documents is not divided into
sub-collections for each language. In order to search such a heterogeneous collection, we need either
(1) to translate the source query into all languages included in the document collection and to merge all
translations into a single query, or
(2) to translate the documents into the single language used in the query.
Gey et al. (1999), Chen (2002) and Nie and Jin (2003) employed the first method for searching the CLEF test collection. With this method, it may be necessary to adjust the idf factors, because documents written in a language that contributes fewer documents to the collection may gain an advantage from document-frequency-based weighting (Lin & Chen, 2003).
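A minimal sketch of option (1), building a single multilingual query from translations into every document language, is given below; the translation tables and language names are toy placeholders.

```python
# A minimal sketch of the first centralized-architecture option: translate the
# source query into each document language and merge the translations into one
# multilingual query posed against a single index. All tables are toy data.

translations = {
    "french":  {"economy": ["économie"], "policy": ["politique"]},
    "german":  {"economy": ["Wirtschaft"], "policy": ["Politik"]},
    "italian": {"economy": ["economia"], "policy": ["politica"]},
}

def build_multilingual_query(source_terms):
    """Union of all target-language equivalents of the source query terms."""
    query = []
    for table in translations.values():
        for term in source_terms:
            query.extend(table.get(term, []))
    return query

print(build_multilingual_query(["economy", "policy"]))
# ['économie', 'politique', 'Wirtschaft', 'Politik', 'economia', 'politica']
```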
5.3. Combination of some language resources
Needless to say, the quality and coverage of the language resources used for translation significantly affect the search performance of CLIR. Specifically, in the case of searches between two unrelated la