Introduction
Protein sequences are far from a random arrangement of amino acids. Function and history have a great influence on the composition of peptide sequences. It seems intuitive that proteins involved in similar functions would share similar features in their amino acid sequence. Likewise, orthologous sequences show sequence conservation. Even much broader groups, such as developmental proteins, can be shown to possess patterns characteristic of that group (Karlin and Burge 1996; Huntley and Golding 2004).
One of these unusual sequence features is the presence of excess simple sequence in proteins (Wootton and Federhen 1993). Simple sequences can range from highly biased homopolymer tracts and regions enriched primarily for one amino acid, to larger, more complex repetitive structures. Simple repetitive protein sequences are found in all domains of life; however, they are particularly abundant within eukaryotic proteins (Karlin and Burge 1996; Marcotte et al. 1999; Huntley and Golding 2000; Sim and Creamer 2002).
Within the eukaryotes, homopolymer tracts have been investigated most thoroughly owing to their relative ease of detection. Karlin and Burge (1996) found that both short and long homopeptides are more frequent in developmental proteins than in other classes of proteins. They also found that many proteins containing multiple long homopeptide sequences were involved in nervous system disease and development. Indeed, Huntington disease (Duyao et al. 1993; Snell et al. 1993; Kieburtz et al. 1994), Kennedy disease (also known as spinal and bulbar muscular atrophy; La Spada et al. 1991), dentatorubral pallidoluysian atrophy (Li et al. 1993; Burke et al. 1994; Koide et al. 1994; Nagafuchi et al. 1994), and several spinocerebellar ataxias (Banfi et al. 1994; Kawaguchi et al. 1994; Pulst et al. 1996; David et al. 1997; Zhuchenko et al. 1997; Nakamura et al. 2001; Silveira et al. 2002) are associated with CAG repeats, which encode polyglutamine tracts.
To investigate this association with repetitive sequence, we previously conducted a survey of neurological and developmental proteins from Homo sapiens and Drosophila melanogaster (Huntley and Golding 2004). Our results confirmed that developmental proteins are indeed enriched for simple sequences but that sequences with neurological function are not. However, many of those sequences considered to be neurological proteins may not be specific to the brain and nervous system. Further, many of the proteins involved in the neurodegenerative disorders may have a normal, nonpathogenic function that remains elusive.
Therefore, to study sequences specific to the brain and nervous system, we used expressed sequence tag (EST) expression data. In this study, we examine ESTs from the brain and nervous system, which may not have a known function, to determine whether sequences expressed specifically in these tissues are enriched for simple sequences.
