The clean-up

Alexis Stamatakis, a bioinformatician at the Heidelberg Institute for Theoretical Studies in Germany, is used to complaints from his biologist colleagues about mislabeled sequences. A few years ago, he decided to do something about the issue. He and his group members have developed an algorithm to root out mislabeled sequences. “Right now the method is not fully automatic,” he said. “We have a half-automatic method to facilitate the curation process that will then provide a list of putative mislabeled sequences to the curator.” It is the user’s job to decide whether the sequence does in fact belong to a different organism.

The developers have not yet published their algorithm, but Pelin Yilmaz, a postdoc at the Max Planck Institute for Marine Microbiology in Bremen, Germany, has taken it for a test drive. She is part of the team behind SILVA, a curated database of ribosomal RNA sequence data. Every month she gets a handful of questions from users asking about potentially mislabeled sequences. She applied Stamatakis’s software to a dataset consisting only of cyanobacteria. Using taxonomy from GenBank, “out of 1,000 [sequences] I found 150 mislabeled, which is not that bad,” she said. Two other taxonomies, Greengenes and the Ribosomal Database Project, each turned up 90 potentially mislabeled sequences, while the SILVA taxonomy had 30.

“It would have been really hard to find mislabels like this,” Yilmaz said. “If I had to do it manually I suppose I would have to build phylogenetic trees over and over again. This is much better.”

The algorithm’s performance starts to break down at the species level, but at the genus level it is quite accurate, identifying mislabeled sequences with up to 98 percent precision, said Alexey Kozlov, a graduate student in Stamatakis’s lab. At present, the program can handle about 10,000 sequences, so it’s best applied to smaller datasets. Kozlov said scaling up to larger numbers of sequences is a future goal.

Meanwhile, NCBI is making some efforts of its own to clean up misidentified sequences in GenBank. The agency has been working internally and with outside groups to develop curated sets of 16S sequences linked to type strains and of fungal internal transcribed spacer (ITS) sequences, another widely used marker. “Those are particularly important sequences to curate and get cleaned-up sets because they’re used by so many to classify their organisms,” said Lipman.

Lipman said he’s pleased to learn of developers like Stamatakis who are working to automate the process of scrubbing genetic databases. He’d like to see such tools applied across GenBank, particularly at the point of submission. “So largely it means rather than the database looking at each record as it comes in at the back end, then having to get back to the submitter, if we get these consensus models ahead of time . . . ultimately, you can see how this would save us a lot of time.”

It’s especially important for GenBank to prioritize such efforts given how researchers now use the database, he added. “It has to do with this transition that sequencing is now done for comparative purposes, therefore, we should be doing a good job to clean it up and so we can very rapidly give a much more informative response to a user.”

