Research article
Identification of conserved regulatory elements by comparative genome analysis
Boris Lenhard*†, Albin Sandelin*†, Luis Mendoza*‡, Pär Engström*,
Niclas Jareborg*§ and Wyeth W Wasserman*¶
Addresses: *Center for Genomics and Bioinformatics, Karolinska Institutet, 171 77 Stockholm, Sweden. ‡Current address: Serono Research and Development, CH-1121 Geneva 20, Switzerland. §Current address: AstraZeneca Research and Development, S-151 85 Södertälje, Sweden.
¶Current address: Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC V5Z 4H4, Canada.
†These authors contributed equally to this work.
Correspondence: Wyeth W Wasserman. E-mail: wyeth@cmmt.ubc.ca
Journal of Biology 2003, 2:13
Received: 12 December 2002
Revised: 21 March 2003
Accepted: 8 April 2003
Published: 22 May 2003
The electronic version of this article is the complete one and can be found online at http://jbiol.com/content/2/2/13
© 2003 Lenhard et al., licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
Abstract

Background: For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for the detection of regulatory regions. One approach of growing popularity is based on the preferential conservation of functional sequences over the course of evolution by selective pressure, termed ‘phylogenetic footprinting’. Mutations are more likely to be disruptive if they appear in functional sites, resulting in a measurable difference in evolution rates between functional and non-functional genomic segments.
Results: We have devised a flexible suite of methods for the identification and visualization of conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach results in a significant improvement in the detection of transcription-factor-binding sites because of an increased signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is at the disposal of the scientific community at http://www.phylofoot.org/.
Conclusions: Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.

Introduction
The information in genes generally flows from static DNA sequences to active proteins via an RNA intermediary. Depending upon the cellular context of physiological, developmental and environmental inputs, genes are selectively activated via regulatory sequences in the DNA. At their foundation, transcriptional regulatory regions in the human genome are characterized by the presence of target binding sites for transcription factors (TFs). Knowledge of the identity of a mediating TF can give important insights into the function of a gene via inference of the processes or conditions that lead to expression. Research in bioinformatics has developed reliable methods to model the DNA binding specificity of individual TFs. As most eukaryotic TFs tolerate considerable sequence variation in their target sites, simple consensus sequences fail to represent the specificity of binding factors. This realization led to the development of the quantitative representation of binding specificity with position weight matrices [1]. Such matrices can be highly accurate in identifying in vitro target sequences [2], but are insufficiently specific in the identification of sites with in vivo function to provide meaningful predictions [3]. The in vivo binding specificity of a TF depends upon additional properties not modeled by a weight matrix, such as protein-protein interactions, chromatin superstructures and TF concentrations.
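To make the idea of a position weight matrix concrete, the following is a minimal sketch rather than the scoring scheme of any particular tool. It assumes a uniform background of 0.25 per base, a pseudocount of 1, and log2-odds weights; the function names (build_pwm, score_window, scan) and any example sites passed to them are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Turn aligned, equal-length binding sites into log2-odds weights.

    Assumptions (not from the article): uniform background frequency,
    a fixed pseudocount per base, and sites containing only A, C, G, T.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        weights = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            weights[base] = math.log2(freq / background)
        pwm.append(weights)
    return pwm

def score_window(pwm, window):
    """Sum the per-position weights for one candidate site."""
    return sum(column[base] for column, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = score_window(pwm, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Because each position contributes independently to the score, a matrix of this kind tolerates variation at weakly constrained positions while penalizing mismatches at strongly constrained ones, which a single consensus string cannot do. The sketch scans one strand only.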
Comparison of orthologous gene sequences has emerged as a powerful tool in genome analysis. ‘Phylogenetic footprinting’ [4] provides complementary data to computational predictions, as sequence conservation over evolution highlights segments in genes likely to mediate biological function. The utility of phylogenetic footprinting extends to a broad array of annotation challenges, but it is particularly suited to the identification of sequences with a functional role in the regulation of gene transcription [5,6]. Despite specific successes [7] in studies of gene regulation, the central algorithms for phylogenetic footprinting remain to be optimized and are thus the focus of continuing research. In particular, new algorithms based on phylogenetic footprinting have been presented for the alignment of genomic sequences, data visualization and the identification of exons [8,9]. Algorithms for the analysis of regulatory sequences have addressed the detection of over-represented patterns in the promoters of co-regulated genes [10], and the improved discrimination of regulatory modules [11], as well as comparative studies of orthologous promoters across collections of microbial genomes [12,13].
Here, we introduce a highly specific algorithm, ConSite, for the detection of transcription-factor-binding sites (TFBSs) that is based on phylogenetic footprinting. Three central components underlie the advance: first, a non-redundant set of transcription-factor binding models; second, a suitable alignment algorithm for orthologous non-coding genomic sequences; and third, modular software for the integration of binding-site predictions with analysis of sequence similarity. We show that our approach results in an increased specificity of predicted TFBSs as a result of a significant reduction of noise. The ConSite algorithm is thus particularly suited to the analysis of pairs of orthologous genomic sequences with limited or no experimental annotation of regulatory elements.
Results
A non-redundant set of high-quality transcription-factor binding models
Potential TFBSs can be identified within a genomic sequence by well-studied computational approaches based on quantitative profiles describing the binding site characteristics for TFs. The quality of matrix models is dependent upon the number of biochemically determined target sites. While the binding specificities of few eukaryotic TFs are described richly in the literature by multiple in vivo functional sites, a significant number of TF binding profiles have been produced through the application of in vitro target-site detection assays [14]. We collected available data of both types from the biological literature to construct 108 non-redundant high-quality profiles [15]. The profiles are derived from the super-classes vertebrates, insects or plants, but the majority (65%) of matrices model the binding of human or rodent factors. As the majority of the profiles originate from site-selection assays, the average number of TFBSs contributing to each profile is a robust 31.2 sites per model. Information content, in terms of bits of information, is commonly used within bioinformatics to describe the overall specificity of a profile. The models in the collection range in information content from 5.6 to 26.2 bits, with an average of 12.1 bits. All models are hyperlinked to corresponding sequence accession numbers and the PubMed abstract for the article describing the binding study.
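As an illustration of how the information content of a profile can be expressed in bits, the sketch below uses the common definition of 2 bits minus the Shannon entropy of each column, summed over columns. It assumes a uniform base background and applies no small-sample correction, so it is not necessarily the exact calculation behind the figures quoted above; the function name and input layout are hypothetical.

```python
import math

def information_content(count_matrix):
    """Total information content (bits) of a profile.

    count_matrix is a list of columns; each column maps a base to the
    number of sites carrying that base at that position. Assumes a
    uniform background and no small-sample correction.
    """
    total_bits = 0.0
    for column in count_matrix:
        n = sum(column.values())
        entropy = 0.0
        for count in column.values():
            if count > 0:
                p = count / n
                entropy -= p * math.log2(p)
        # A DNA column can carry at most log2(4) = 2 bits.
        total_bits += 2.0 - entropy
    return total_bits
```

Under this definition a fully conserved position contributes 2 bits and a position with equal base frequencies contributes none, so longer and more constrained profiles score higher.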
Integrating binding-site prediction with analysis of sequence conservation in orthologous genomic sequences
Phylogenetic footprinting provides data complementary to binding-site predictions, for the analysis of gene regulation. The simple hypothesis that motivates phylogenetic footprinting is that important functional sequences will be under selective pressure to be retained over moderate periods of evolution. The classification of sequences as conserved or freely evolving (as proposed by Kimura [16]) is not yet a quantitative process. It should be noted that evolutionary rates vary dramatically between genes and the choice of species is an important consideration in phylogenetic footprinting studies. Too great an evolutionary distance can result in regulatory alterations or difficulty in aligning short patches of similarity between long sequences. Inadequate evolutionary distance does not significantly improve the overall specificity of predictions. We have developed the ConSite method to integrate phylogenetic footprinting with profile-based predictions of TFBSs, in order to achieve specific predictions of functional regulatory elements in genes. As an example of the influence of species selection on the qualitative performance of the system, the human globin promoter was compared to a diverse range of orthologs (Figure 1).
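The combination of the two criteria described here, a conserved region and a matching site at the equivalent alignment position in the ortholog, can be sketched as follows. This is an illustrative outline only and not the ConSite implementation: the window size, the identity cutoff, the function names and the input layout (gapped alignment strings plus per-sequence hit lists) are all assumptions.

```python
def column_index(aligned):
    """Map each ungapped sequence position to its alignment column."""
    return [col for col, ch in enumerate(aligned) if ch != "-"]

def window_identity(aln_a, aln_b, center, window):
    """Fraction of identical, non-gap columns in a window around center."""
    half = window // 2
    lo = max(0, center - half)
    hi = min(len(aln_a), center + half + 1)
    matches = sum(
        1 for a, b in zip(aln_a[lo:hi], aln_b[lo:hi]) if a == b and a != "-"
    )
    return matches / (hi - lo)

def conserved_pairs(aln_a, aln_b, hits_a, hits_b, window=50, cutoff=0.7):
    """Keep hits in sequence A that are paired and conserved.

    hits_a and hits_b hold (tf_name, start) tuples in ungapped
    coordinates. A hit is kept only if the same factor is predicted at
    the same alignment column in sequence B and the local identity
    reaches the cutoff (window and cutoff are hypothetical defaults).
    """
    cols_a = column_index(aln_a)
    cols_b = column_index(aln_b)
    paired = {(tf, cols_b[start]) for tf, start in hits_b}
    kept = []
    for tf, start in hits_a:
        col = cols_a[start]
        if (tf, col) in paired and window_identity(aln_a, aln_b, col, window) >= cutoff:
            kept.append((tf, start))
    return kept
```

Requiring both conditions discards predictions that occur in only one species or that fall in poorly conserved sequence, which is the source of the noise reduction discussed in the text.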
In this report, we focus on human-rodent comparisons, as several studies have suggested that only a small portion (17-20%) of non-coding regions are conserved (on average) at this evolutionary distance [10,17]. Furthermore, similarity is punctuated, with distinguishable segments of high similarity flanked by regions of apparently random sequence (roughly 33% nucleotide identity is observed between random genomic sequences, with wide variations dependent upon the applied alignment algorithm, settings, and sequence characteristics [18]). This compartmentalized pattern


0/5000

จาก: -

เป็น: -

ผลลัพธ์ (ไทย) 1: [สำเนา]

คัดลอก!

บทความวิจัยรหัสขององค์ประกอบบังคับนำโดยวิเคราะห์เปรียบเทียบจีโนมบอริ Lenhard * †, Albin Sandelin * †, เมนโดซา Luis * ‡, Pär Engström *Niclas Jareborg * วา W แท้และไวเอต * ถัดที่อยู่: * ศูนย์ Genomics และ Bioinformatics, Karolinska Institutet, 171 77 สต็อกโฮล์ม สวีเดน ‡Current ที่อยู่: โซวิจัยและพัฒนา เจนีวาป.ณ. 1121 CH 20 สวิตเซอร์แลนด์ §Current ที่อยู่: AstraZeneca วิจัยและพัฒนา อย่างไร Södertälje 85 S-151 สวีเดน¶Current ที่อยู่: ศูนย์การแพทย์ระดับโมเลกุลและ Therapeutics มหาวิทยาลัยของรัฐบริติชโคลัมเบีย แวนคูเวอร์ BC V5Z 4H 4 แคนาดา ผู้เขียน †These ส่วนเท่า ๆ กันเพื่องานนี้ติดต่อ: ไวเอต W วา อีเมล์: wyeth@cmmt.ubc.caเผยแพร่: 22 2003 พฤษภาคมรับ: 12 2002 ธันวาคมแก้ไข: 21 2003 มีนาคมสมุดรายวันของชีววิทยา 2003, 2:13ยอมรับ: 8 2003 เมษายนรุ่นอิเล็กทรอนิกส์ของบทความนี้ได้เสร็จสมบูรณ์ และสามารถพบออนไลน์ที่ http://jbiol.com/content/2/2/13© 2003 Lenhard et al. ผู้รับใบอนุญาต BioMed เซ็นทรัล จำกัด นี่คือบทความเข้าเปิด: คัดลอกและซอร์สของบทความนี้ทุกตัวอักษรจะได้รับอนุญาตในทุกสื่อเพื่อวัตถุประสงค์ใด ๆ โดยประกาศนี้จะถูกรักษาไว้พร้อมกับ URL ของบทความต้นฉบับบทคัดย่อ พื้นหลัง: สำหรับยีนที่มี delineated อย่างเรียบร้อยภายในลำดับมมนุษย์แล้ว สุดทางลำดับยังคงเป็น elucidated การอธิบายและตีความต้องมีแหล่งข้อมูลเพิ่มเติมและปรับปรุงที่สำคัญในวิธีการคำนวณตรวจกำกับดูแลภูมิภาค วิธีการหนึ่งของการเติบโตความนิยมจะขึ้นอยู่กับการอนุรักษ์ต้องลำดับการทำงานในช่วงวิวัฒนาการ โดยใช้ความดัน เรียกว่า 'phylogenetic footprinting' กลายพันธุ์มีแนวโน้มที่จะปรากฏในไซต์งาน ผลต่างวัดราคาวิวัฒนาการระหว่างทำงาน และไม่ใช่หน้าที่ส่วน genomic ขวัญ ผลลัพธ์: เรามีกำหนดชุดยืดหยุ่นวิธีสำหรับการระบุและแสดงภาพประกอบเพลงของอเมริกา transcription ผูกปัจจัยนำ ระบบรายงานเหล่านั้นผูก transcriptionfactor ไซต์ putative ที่ทั้งแห่งภูมิภาคนำ และอยู่เป็นคู่ในตำแหน่งเทียบเท่าในจัดแนวระหว่างสอง orthologous ลำดับ ชุดเป็นต้นแบบของ metazoan transcription ปัจจัยรวมค่าถูกรวบรวมเพื่อให้ง่ายต่อการศึกษา วิธีการนี้ผลในการปรับปรุงที่สำคัญในการตรวจพบเว็บไซต์ transcriptionfactor รวมเนื่องจากอัตราการเพิ่มสัญญาณเสียง ดังกับโปรโมเตอร์ลำดับสองชุด 
วิธีการคือนำมาใช้เป็นโปรแกรมประยุกต์กราฟิกเว็บ ConSite ซึ่งเป็นที่ทิ้งของชุมชนวิทยาศาสตร์ที่ http://www.phylofoot.org/บทสรุป: Phylogenetic footprinting อย่างมากปรับปรุงวิธีการมอบของแนวการวิเคราะห์ของโปรโมเตอร์ลำดับผลเชิงวิวัฒนาการ ConSite ให้ระบุประสิทธิภาพการนวนิยายคุณภาพรวมรุ่น metazoan transcription ปัจจัย มีอินเทอร์เฟซแบบไดนามิก เครื่องมือนี้ bioinformatics ให้กว้างถึงวิเคราะห์โปรโมเตอร์ด้วย phylogenetic footprinting แนะนำข้อมูลในยีนโดยทั่วไปจะไหลจากลำดับดีเอ็นเอคงไปโปรตีนผ่านอาร์เอ็นเอเป็นตัวกลางที่ใช้งานอยู่ ขึ้นอยู่กับบริบทของอินพุตสรีรวิทยา พัฒนา และสิ่งแวดล้อมโทรศัพท์มือถือ ยีนเลือกเรียกใช้งานผ่านลำดับระเบียบในดีเอ็นเอ ที่ของมูลนิธิ ภูมิภาค transcriptional กำกับดูแลในกลุ่มมนุษย์มีลักษณะ โดยสถานะของไซต์รวมเป้าหมายปัจจัย transcription (TFs) ความรู้ในตัวตนของ mediating TF จะสำคัญเจาะลึกการทำงานของยีนผ่านข้อของกระบวนการหรือเงื่อนไขที่ทำให้นิพจน์ วิจัยใน bioinformatics ได้พัฒนาวิธีการเชื่อถือแบบ specificity การรวมดีเอ็นเอของ TFs ละ เป็น TFs eukaryotic สุดทนการเปลี่ยนแปลงลำดับจำนวนมากในเว็บไซต์ของพวกเขาเป้าหมาย มติเรื่องลำดับไม่ถึง specificity ของผูกปัจจัย สำนึกนี้นำไปสู่การพัฒนาของการแสดงเชิงปริมาณของผูก specificity กับเมทริกซ์น้ำหนักตำแหน่ง [1] เมทริกซ์ดังกล่าวได้อย่างแม่นยำสูงในการระบุเป้าหมายในลำดับ [2], แต่เฉพาะ insufficiently ในรหัสของไซต์ด้วยฟังก์ชันในสัตว์ทดลองให้คาดคะเนความหมาย [3] Specificity ผูกในสัตว์ทดลองของรหัสขึ้นไม่จำลอง โดยเมทริกซ์น้ำหนัก โปรตีนโปรตีนโต้ โครมาติน superstructures และความเข้มข้นรหัสคุณสมบัติเพิ่มเติม เปรียบเทียบลำดับยีน orthologous ได้ผงาดขึ้นเป็นเครื่องมือที่มีประสิทธิภาพในการวิเคราะห์จีโนม 'Phylogenetic footprinting' [4] ให้ข้อมูลเพิ่มเติมเพื่อการคาดคะเนคำนวณ เป็นเซ็กเมนต์ในยีนอาจบรรเทาฟังก์ชันชีวภาพเน้นอนุรักษ์ลำดับผ่านวิวัฒนาการ ของ phylogenetic footprinting ขยายไปท้าทายคำอธิบายกว้าง แต่ก็เหมาะอย่างยิ่งกับการระบุลำดับกับบทบาทหน้าที่ในการควบคุมของยีนราชบัณฑิตยสถาน [5,6] แม้ มีเฉพาะความสำเร็จ [7] ในการศึกษาการควบคุมยีน อัลกอริทึมกลางสำหรับ phylogenetic footprinting ยังสามารถปรับให้เหมาะสม และจุดเน้นของการวิจัยอย่างต่อเนื่อง 
โดยเฉพาะ อัลกอริทึมใหม่ที่ยึดตาม phylogenetic footprinting ได้ถูกแสดงในตำแหน่งลำดับ genomic นำเสนอภาพข้อมูล และรหัสของ exons [8,9] อัลกอริทึมสำหรับการวิเคราะห์ลำดับระเบียบได้ส่งตรวจรูปแบบเกิน represented ในก่อการร่วมควบคุมยีน [10], และเลือกปฏิบัติปรับปรุงโมดูกำกับดูแล [11], รวมทั้งศึกษาเปรียบเทียบการก่อ orthologous ในคอลเลกชันของจุลินทรีย์ genomes [12,13]ที่นี่ เราแนะนำเป็นการอัลกอริทึม ConSite ตรวจ transcription-ปัจจัยรวมเว็บไซต์ (TFBSs) ที่อยู่ phylogenetic footprinting ส่วนประกอบกลางสามรวบล่วงหน้า: ครั้งแรก transcription ปัจจัยรวมรุ่น ชุดที่ไม่ซ้ำซ้อน สอง อัลกอริทึมการจัดตำแหน่งที่เหมาะสมสำหรับ orthologous ไม่ใช่รหัส genomic ลำดับเรีย และซอฟต์แวร์สาม โมดุลสำหรับการรวมของการคาดคะเนรวมไซต์กับวิเคราะห์ความคล้ายคลึงกันของลำดับ เราดูที่ของเราวิธีการผลลัพธ์ใน specificity การเพิ่มขึ้นของ TFBSs คาดการณ์เป็นผลมาจากการลดลงอย่างมีนัยสำคัญของเสียง อัลกอริทึม ConSite จึงเหมาะสมโดยเฉพาะอย่างยิ่งการวิเคราะห์ของคู่ลำดับ genomic orthologous มีจำกัดหรือไม่คำอธิบายทดลองขององค์ประกอบทางผลลัพธ์ชุดที่ไม่ซ้ำซ้อนรุ่นผูก transcriptionfactor คุณภาพสูงสามารถระบุในลำดับที่ genomic TFBSs อาจเกิดขึ้น โดยเชิญ studied วิธีคำนวณค่าเชิงปริมาณที่อธิบายลักษณะไซต์ผูกสำหรับ TFs แบบ คุณภาพของแบบจำลองเมตริกซ์มีจำนวนไซต์ที่ biochemically กำหนดเป้าหมายขึ้น ในขณะที่ specificities รวมของ TFs eukaryotic น้อยอธิบายมั่งคั่งในวรรณคดีโดยหลายไซต์ทำงานในสัตว์ทดลอง จำนวนมากของโพรไฟล์การผูกรหัสได้ถูกผลิตผ่านแอพลิเคชันของไซต์ในเป้าหมายตรวจ assays [14] เราเก็บรวบรวมข้อมูลของทั้งสองชนิดจากเอกสารข้อมูลทางชีวภาพเพื่อสร้าง 108 nonredundant คุณภาพสูงโพรไฟล์ [15] โพรไฟล์มาจากซุปเปอร์คลา vertebrates แมลง หรือพืช แต่ส่วนใหญ่ (65%) ของเมทริกซ์รุ่นผูกปัจจัยมนุษย์ หรือ rodent ส่วนใหญ่ของโพรไฟล์เริ่มต้นจากเลือกไซต์ assays จำนวนเฉลี่ยของ TFBSs ที่สนับสนุนแต่ละโพรไฟล์เป็นอเมริกาเท่ากับ 31.2 แข็งแกร่งต่อรุ่น โดยทั่วไปข้อมูลเนื้อหา ในรูปแบบของบิตของข้อมูล ใช้ภายใน bioinformatics อธิบาย specificity โดยรวมของโพรไฟล์ รูปแบบในช่วงเก็บในข้อมูลเนื้อหาจาก 5.6 26.2 บิต บิตที่ 12.1 โดยเฉลี่ย 
แบบจำลองทั้งหมดเชื่อมโยงหลายมิติหมายเลขทะเบียนลำดับที่สอดคล้องกันและบทคัดย่อ PubMed สำหรับบทความที่อธิบายการศึกษารวมรวมรวมเว็บไซต์ทำนายกับวิเคราะห์การอนุรักษ์ลำดับในลำดับ genomic orthologousPhylogenetic footprinting ให้ข้อมูลเพิ่มเติมเพื่อการคาดคะเนรวมเว็บไซต์ การวิเคราะห์ยีนควบคุม สมมติฐานง่าย ๆ ที่ phylogenetic footprinting แรงบันดาลใจคือ จะเป็นลำดับสำคัญทำงานภายใต้ความดันเลือกที่จะคงผ่านวิวัฒนาการระยะปานกลาง การจัดประเภทของลำดับเป็นอยู่ หรือเกิดขึ้นได้อย่างอิสระ (ตามที่เสนอโดยคิมุระโย [16]) ยังไม่เป็นกระบวนการเชิงปริมาณ มันควรจะตั้งข้อสังเกตว่า ราคาวิวัฒนาการที่แตกต่างกันอย่างมากระหว่างยีน และเลือกพันธุ์เป็นการพิจารณาที่สำคัญในการศึกษา phylogenetic footprinting ระยะทางวิวัฒนาการที่มากเกินไปอาจส่งผลในการเปลี่ยนแปลงข้อบังคับหรือความยากลำบากในการจัดตำแหน่งโปรแกรมสั้น ๆ ของความคล้ายคลึงกันระหว่างลำดับยาว ระยะทางวิวัฒนาการที่ไม่เพียงพอไม่พัฒนา specificity โดยรวมของการคาดคะเนอย่างมีนัยสำคัญ เราได้พัฒนาวิธีการ ConSite เพื่อรวม phylogenetic footprinting กับโพรไฟล์ตามคาดคะเนของ TFBSs เพื่อคาดคะเนเฉพาะขององค์ประกอบที่กำกับดูแลงานในยีน เป็นตัวอย่างของอิทธิพลของชนิดตัวเลือกประสิทธิภาพเชิงคุณภาพของระบบ โปรโมเตอร์ globin มนุษย์ถูกเปรียบเทียบกับหลากหลายของ orthologs (1 รูป) ในรายงานนี้ เราเน้นเปรียบเทียบบุคคลหนู เป็นหลายการศึกษาได้แนะนำที่ เดียวขนาดเล็กส่วน (1720%) ของ ภูมิภาคไม่ใช่รหัสมีอยู่ (โดยเฉลี่ย) ในระยะนี้วิวัฒนาการ [10,17] นอกจากนี้ เป็น punctuated ความคล้ายคลึงกัน มีส่วนแตกต่างของสูงคล้ายนักภูมิภาคของลำดับตัวอย่างเห็นได้ชัด (ประมาณ 33% นิวคลีโอไทด์ตัวจะสังเกตระหว่างแบบสุ่มลำดับ genomic ด้วยรูปแบบที่กว้างขึ้นตำแหน่งใช้อัลกอริทึม การตั้งค่า และลักษณะลำดับ [18]) รูปแบบนี้ compartmentalized

การแปล กรุณารอสักครู่..

ผลลัพธ์ (ไทย) 2:[สำเนา]

คัดลอก!

บทความวิจัย
ประจำตัวขององค์ประกอบการกำกับดูแลอนุรักษ์โดยการวิเคราะห์จีโนมเปรียบเทียบ
บอริส Lenhard * †, บิน Sandelin * †หลุยส์เมนโดซา * ‡พาร์Engström *
* * * * Niclas Jareborg §และไวเอท W Wasserman * ¶
ที่อยู่: * ศูนย์ฟังก์ชั่นและชีวสารสนเทศศาสตร์, Karolinska Institutet 171 77 สตอกโฮล์มสวีเดน ที่อยู่ปัจจุบัน‡: Serono การวิจัยและพัฒนา, CH-1121 เจนีวา 20, วิตเซอร์แลนด์ ที่อยู่§Current: แอสตร้าวิจัยและพัฒนา, S-151 85 Södertälje, สวีเดน.
ที่อยู่¶Current. ศูนย์การแพทย์ระดับโมเลกุลและ Therapeutics, มหาวิทยาลัยบริติชโคลัมเบีย, Vancouver, BC V5Z 4H4, แคนาดา
†ผู้เขียนเหล่านี้มีส่วนอย่างเท่าเทียมกันในการทำงานนี้
จดหมาย: ไวเอท W Wasserman E-mail: wyeth@cmmt.ubc.ca
เผยแพร่: 22 พฤษภาคม 2003 ที่ได้รับ: 12 ธันวาคม 2002
แก้ไข: 21 มีนาคม 2003
วารสารชีววิทยา 2003 02:13
รับการยอมรับ: 8 เมษายน 2003
รุ่นอิเล็กทรอนิกส์ของบทความนี้คือหนึ่งที่สมบูรณ์และ สามารถพบได้ทั่วไปที่ http://jbiol.com/content/2/2/13
. © 2003 Lenhard และคณะผู้รับใบอนุญาต BioMed เซ็นทรัล จำกัด นี้เป็นบทความการเข้าถึงเปิดบริการ: การคัดลอกคำต่อคำและการกระจายของบทความนี้จะได้รับอนุญาตในทุก สื่อเพื่อวัตถุประสงค์ใด ๆ ให้แจ้งให้ทราบล่วงหน้านี้จะถูกเก็บรักษาไว้พร้อมกับ URL ที่เป็นต้นฉบับของบทความ.
บทคัดย่อพื้นหลัง: สำหรับยีนที่ได้รับการประสบความสำเร็จในคดีที่อยู่ในลำดับจีโนมมนุษย์ลำดับกฎระเบียบส่วนใหญ่ยังคงที่จะอธิบาย ขั้นตอนการบันทึกย่อและการตีความต้องใช้ทรัพยากรข้อมูลเพิ่มเติมและการปรับปรุงที่สำคัญในวิธีการคำนวณการตรวจหาภูมิภาคกฎระเบียบ วิธีการหนึ่งของความนิยมที่เพิ่มขึ้นจะขึ้นอยู่กับการอนุรักษ์พิเศษของลำดับการทำงานในช่วงเวลาของการวิวัฒนาการโดยความดันเลือกที่เรียกว่า 'footprinting สายวิวัฒนาการ' การกลายพันธุ์ที่มีแนวโน้มที่จะมีความยุ่งยากหากพวกเขาปรากฏในเว็บไซต์การทำงานที่เกิดขึ้นในความแตกต่างที่วัดได้ในอัตราการวิวัฒนาการระหว่างกลุ่มจีโนมการทำงานและไม่ทำงาน. ผล: เราได้คิดค้นชุดที่มีความยืดหยุ่นของวิธีการสำหรับการระบุและแสดงภาพของ transcription- อนุรักษ์ เว็บไซต์ปัจจัยที่มีผลผูกพัน ระบบรายงานเว็บไซต์เหล่านั้นสมมุติผูกพัน transcriptionfactor ว่ามีทั้งที่ตั้งอยู่ในภูมิภาคอนุรักษ์และตั้งอยู่เป็นคู่ของเว็บไซต์ในตำแหน่งเทียบเท่าในการจัดแนวระหว่างสองลำดับ orthologous คอลเลกชันพื้นฐานของเกี่ยวกับ METAZOA โปรไฟล์ถอดความปัจจัยที่มีผลผูกพันประกอบเพื่อความสะดวกในการศึกษา นี้ส่งผลให้วิธีการในการปรับปรุงที่สำคัญในการตรวจสอบของเว็บไซต์ transcriptionfactor ผูกพันเนื่องจากการเพิ่มขึ้นของสัญญาณต่อเสียงรบกวนอัตราส่วนที่แสดงให้เห็นกับสองชุดของลำดับโปรโมเตอร์ วิธีการจะดำเนินการเป็นโปรแกรมเว็บกราฟิก ConSite ซึ่งอยู่ที่การกำจัดของชุมชนวิทยาศาสตร์ที่ http://www.phylofoot.org/. สรุป: Phylogenetic footprinting อย่างรวดเร็วช่วยเพิ่มการคาดการณ์การเลือกของวิธีการทางชีววิทยาการวิเคราะห์ของผู้ก่อการ ลำดับ ConSite มอบประสิทธิภาพที่เหนือชั้นโดยใช้ฐานข้อมูลนวนิยายที่มีคุณภาพสูงรุ่นที่มีผลผูกพันสำหรับการถอดความปัจจัยเกี่ยวกับ METAZOA กับอินเตอร์เฟซแบบไดนามิกเครื่องมือรสนี้ให้การเข้าถึงในวงกว้างการวิเคราะห์ก่อการกับ footprinting phylogenetic. 
บทนำข้อมูลในยีนทั่วไปเงินสดจากลำดับดีเอ็นเอคงโปรตีนที่ใช้งานผ่านตัวกลางอาร์เอ็นเอ ทั้งนี้ขึ้นอยู่กับบริบทของเซลล์สรีรวิทยาปัจจัยการผลิตการพัฒนาและสิ่งแวดล้อม, ยีนจะเปิดใช้งานผ่านการคัดเลือกลำดับกำกับดูแลในดีเอ็นเอ รากฐานของพวกเขาภูมิภาคกฎระเบียบในการถอดรหัสจีโนมมนุษย์มีความโดดเด่นด้วยการปรากฏตัวของเป้าหมายเว็บไซต์ผูกพันสำหรับการถอดความปัจจัย (TFS) ความรู้เกี่ยวกับตัวตนของ TF mediating สามารถให้ข้อมูลเชิงลึกที่สำคัญในการทำงานของยีนที่ผ่านการอนุมานของกระบวนการหรือเงื่อนไขที่นำไปสู่การแสดงออก การวิจัยในชีวสารสนเทศได้มีการพัฒนาวิธีการที่เชื่อถือได้ในการจำลองความจำเพาะผูกพันดีเอ็นเอของบุคคล TFS ในฐานะที่เป็น TFS eukaryotic ที่สุดทนต่อการเปลี่ยนแปลงลำดับมากในสถานที่เป้าหมายของพวกเขาลำดับฉันทามติง่ายล้มเหลวที่จะเป็นตัวแทนของความจำเพาะของปัจจัยที่มีผลผูกพัน สำนึกนี้นำไปสู่การพัฒนาของการเป็นตัวแทนเชิงปริมาณของความจำเพาะผูกพันกับเมทริกซ์น้ำหนักตำแหน่ง [1] การฝึกอบรมดังกล่าวจะมีความแม่นยำสูงในการระบุเป้าหมายในลำดับหลอดทดลอง [2] แต่มีความเฉพาะเจาะจงไม่เพียงพอในตัวของเว็บไซต์ที่มีฟังก์ชั่นในร่างกายเพื่อให้การคาดการณ์ที่มีความหมาย [3] ในร่างกายมีผลผูกพันจำเพาะของ TF ขึ้นอยู่กับคุณสมบัติเพิ่มเติมที่ไม่มีรูปแบบโดยเมทริกซ์น้ำหนักเช่นปฏิกริยาระหว่างโปรตีน, superstructures โครมาติและความเข้มข้น TF. 
เปรียบเทียบลำดับยีน orthologous ได้กลายเป็นเครื่องมือที่มีประสิทธิภาพในการวิเคราะห์จีโนม 'วิวัฒนาการ footprinting' [4] ให้ข้อมูลประกอบกับการคาดการณ์การคำนวณ, การอนุรักษ์ลำดับวิวัฒนาการมากกว่าส่วนไฮไลท์ในยีนแนวโน้มที่จะเป็นสื่อกลางในการทำงานทางชีวภาพ ยูทิลิตี้ของ footprinting phylogenetic ขยายไปยังหลากหลายของความท้าทายคำอธิบายประกอบ แต่มันเป็นโดยเฉพาะอย่างยิ่งเหมาะกับบัตรประจำตัวของลำดับที่มีบทบาทในการควบคุมการทำงานของยีนถอดความ [5,6] แม้จะมีความสำเร็จที่เฉพาะเจาะจง [7] ในการศึกษายีนควบคุมกลไกกลางสำหรับ footprinting phylogenetic ยังคงที่จะเพิ่มประสิทธิภาพและจึงมุ่งเน้นการวิจัยอย่างต่อเนื่อง โดยเฉพาะอย่างยิ่งขั้นตอนวิธีการใหม่บนพื้นฐานของ footprinting สายวิวัฒนาการได้รับการนำเสนอสำหรับการจัดตำแหน่งของลำดับจีโนมการแสดงข้อมูลและการจำแนก exons [8,9] อัลกอริทึมสำหรับการวิเคราะห์ของลำดับการกำกับดูแลที่มีการตรวจสอบรูปแบบมากกว่าที่แสดงในโปรโมเตอร์ของยีนที่ควบคุมร่วม [10] และการเลือกปฏิบัติที่ดีขึ้นของโมดูลการกำกับดูแล [11] เช่นเดียวกับการศึกษาเปรียบเทียบของโปรโมเตอร์ orthologous ทั่วคอลเลกชันของ จีโนมของจุลินทรีย์ [12,13]. ที่นี่เราแนะนำวิธีเฉพาะสูง ConSite สำหรับการตรวจสอบของเว็บไซต์ถอดความปัจจัยที่มีผลผูกพัน (TFBSs) ที่อยู่บนพื้นฐาน footprinting สายวิวัฒนาการ สามองค์ประกอบของกลางรองรับล่วงหน้าแรกชุดที่ไม่ซ้ำซ้อนของการถอดความปัจจัยที่มีผลผูกพันรุ่น; สองขั้นตอนวิธีการจัดตำแหน่งที่เหมาะสมสำหรับการที่ไม่ได้เข้ารหัส orthologous ลำดับจีโนม; และสามซอฟต์แวร์แบบแยกส่วนสำหรับการรวมของการคาดการณ์ผลผูกพันสถานที่ที่มีการวิเคราะห์ความคล้ายคลึงกันตามลำดับ เราแสดงให้เห็นว่าวิธีการของเราส่งผลให้ความจำเพาะที่เพิ่มขึ้นของการคาดการณ์ TFBSs เป็นผลมาจากการลดความสำคัญของเสียง อัลกอริทึม ConSite จึงเหมาะอย่างยิ่งกับการวิเคราะห์ของคู่ของลำดับจีโนม orthologous มีอยู่อย่าง จำกัด หรือไม่มีคำอธิบายประกอบการทดลองขององค์ประกอบการกำกับดูแล. 
Results
A non-redundant set of high-quality transcription-factor-binding models
Potential TFBSs can be identified within genomic sequences by well-studied computational methods that rely on quantitative models describing the binding-site characteristics of TFs. The quality of a matrix model depends on the number of biochemically determined target sites. While the binding specificities of a few eukaryotic TFs are elegantly described in the literature by multiple sites that function in vivo, a significant number of TF binding profiles have been produced through the application of in vitro site-selection assays [14]. We collected the available data of both types from the biological literature to create 108 non-redundant, high-quality profiles [15]. The profiles are derived from the vertebrate, insect or plant superclasses, but the majority (65%) model the binding of human or rodent factors. As most of the profiles are derived from site-selection assays, the average number of TFBSs contributing to each profile is a robust 31.2 sites per model. Information content, expressed in bits, is commonly used in bioinformatics to describe the overall specificity of a profile. Models in the collection range in information content from 5.6 to 26.2 bits, with a mean of 12.1 bits. All models are hyperlinked to the corresponding sequence accession numbers and to the PubMed abstracts of the articles describing the binding studies.
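As a rough illustration of how information content in bits can be obtained from a profile, the sketch below sums, over motif positions, the difference between the 2-bit maximum for four equiprobable bases and the observed column entropy. It assumes a uniform background and applies no small-sample correction; the section does not state which exact formula was used for the reported values, and the function name `information_content` is hypothetical.

```python
import math


def information_content(counts):
    """Total information content (bits) of a count matrix.

    counts is a list of dicts, one per motif position, mapping each of
    A, C, G, T to its observed count. A uniform background is assumed
    and no small-sample correction is applied.
    """
    total_bits = 0.0
    for column in counts:
        n = sum(column.values())
        entropy = 0.0
        for base in "ACGT":
            p = column[base] / n
            if p > 0:
                entropy -= p * math.log2(p)
        # 2 bits is the maximum for a four-letter alphabet.
        total_bits += 2.0 - entropy
    return total_bits
```

Under these assumptions a fully conserved position contributes 2 bits and a position with four equally frequent bases contributes 0 bits, so longer and more constrained motifs score higher.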
Integrating binding-site prediction with sequence-conservation analysis in orthologous genomic sequences
Phylogenetic footprinting provides information complementary to binding-site predictions for the analysis of gene regulation. The simple hypothesis that motivates phylogenetic footprinting is that functionally important sequences will be under selective pressure to be preserved, at least to a moderate degree, over the course of evolution. The classification of sequences as conserved or freely evolving (as proposed by Kimura [16]) is not yet a quantitative process. It should be noted that evolutionary rates vary greatly between genes, and the choice of species is an important consideration in phylogenetic footprinting studies. Too great an evolutionary distance results in regulatory changes or in difficulty aligning short patches of similarity between long sequences. An insufficient evolutionary distance does not significantly increase the overall specificity of predictions. We have developed the ConSite method to integrate phylogenetic footprinting with profile-based predictions of TFBSs, in order to achieve specific predictions of functional regulatory elements in genes. As an example of the influence of species selection on the qualitative performance of the system, the human globin promoter was compared with a variety of orthologs (Figure 1). In this report we focus on human-mouse comparisons, as several studies have indicated that only a small portion (17-20%) of non-coding regions is conserved (on average) at this evolutionary distance [10,17]. In addition, the similarity is punctuated: distinct segments of high similarity are flanked by regions of apparently random sequence (approximately 33% nucleotide identity is observed between random genomic sequences, with wide variation depending on the alignment algorithm used, its settings and the sequence characteristics [18]). This compartmentalized pattern
