shows the five most abundant transcripts for each of the three libraries. The SSH procedure conducted on phenanthrene-exposed animals appeared to be efficient. Of the top five phenanthrene clusters, three show high similarity to monooxygenases of the cytochrome P450 enzyme family, which are known to be involved in phase I biotransformation of lipophilic substances such as phenanthrene [36]. The other two clusters show homology to other monooxygenases and might be involved in phase I metabolism as well. The results for the cadmium library are less straightforward. Two of the five most abundant clusters remain unannotated, and two clusters resemble accessions that are not of animal origin. Note that one of the latter two clusters (cluster Fcc00170) occurred in all three libraries (Table 3). As with the 'bacterial clusters', these clusters are currently not discarded from the database and are submitted to GenBank. Supplementary experiments will be conducted to determine the exact origin of these clusters and whether or not they represent contaminants.
The absence of highly expressed housekeeping genes among the five most abundant transcripts in the normalized library suggests that the normalization procedure was successful. Without normalization, highly abundant transcripts such as tubulins, ribosomal proteins and actins would have been sequenced more often (e.g. [37]). Although these sequences are present in the dataset, they are not among the most abundantly sequenced transcripts. For example, more than 40 ribosomal protein sequences were obtained (e.g. cluster Fcc02740), but most of these were represented by only one or two ESTs.
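The check described above, ranking clusters by EST count and looking for housekeeping genes among the top ranks, can be illustrated with a minimal Python sketch. It is not part of the pipeline used in this study. The function names, the input mappings, the keyword list and the toy cluster identifiers are all hypothetical, and the keyword match is a crude substring test on the annotation text.

```python
from collections import Counter

# Hypothetical keyword list, taken from the housekeeping examples named in the text.
HOUSEKEEPING_KEYWORDS = ("tubulin", "ribosomal protein", "actin")


def top_clusters(est_to_cluster, n=5):
    """Return the n clusters with the most ESTs as (cluster_id, est_count) pairs."""
    counts = Counter(est_to_cluster.values())
    # Sort by descending EST count, then by cluster id so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def housekeeping_in_top(top, annotations, keywords=HOUSEKEEPING_KEYWORDS):
    """List the ranked clusters whose annotation text contains a housekeeping keyword.

    This is a case-insensitive substring match, so it is only a rough screen.
    """
    flagged = []
    for cluster_id, _count in top:
        description = annotations.get(cluster_id, "").lower()
        if any(keyword in description for keyword in keywords):
            flagged.append(cluster_id)
    return flagged


# Toy data with made-up identifiers; these are not results from the study.
toy_ests = {"est1": "A", "est2": "A", "est3": "A", "est4": "B", "est5": "B", "est6": "C"}
toy_annotations = {"A": "cytochrome P450", "B": "hypothetical protein", "C": "ribosomal protein L3"}

top_two = top_clusters(toy_ests, n=2)
flagged_in_top_two = housekeeping_in_top(top_two, toy_annotations)
```

In this toy example the ribosomal protein cluster holds a single EST and falls outside the top two, which mirrors the pattern expected from a successfully normalized library.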
The prot4EST [38] script was applied to infer protein sequences (excluding the DECODER program). Putative open reading frames in the total dataset ranged from 23 to 440 amino acids, with an average length of 115 amino acids. The amino acid sequences were annotated with Gene Ontology terms (GO; http://geneontology.org) using the PartiGene [27] annot8r_blast2GO script (Schmid and Blaxter, personal comm.; [39]). An overview of the results of these analyses is given in Table 5. Of the 6212 contigs, 1126 (~18%) were assigned at least one GO term (expect-value < 10⁻²⁵; 1025 contigs when the 140 clusters originating from yeast and human mRNA are excluded from the analysis). The PartiGene [27] Perl scripts were used to store all the information in a web-accessible relational database [17]. All processed ESTs, excluding those marked as human and yeast contamination, were submitted to dbEST (accession numbers: – ).
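As a rough illustration of how an expect-value cutoff translates into the reported share of GO-annotated contigs, consider the following Python sketch. It does not reproduce the annot8r_blast2GO script. The function name and the input mapping are hypothetical, and the cutoff of 1e-25 assumes that the threshold is ten to the power of minus 25.

```python
def annotated_fraction(best_evalue_by_contig, total_contigs, cutoff=1e-25):
    """Count contigs whose best GO-bearing hit falls below the E-value cutoff.

    best_evalue_by_contig: hypothetical mapping of contig id -> smallest
    E-value among that contig's hits that carry at least one GO term.
    Returns (number of annotated contigs, fraction of all contigs).
    """
    if total_contigs <= 0:
        raise ValueError("total_contigs must be positive")
    annotated = [c for c, evalue in best_evalue_by_contig.items() if evalue < cutoff]
    return len(annotated), len(annotated) / total_contigs


# Toy input with made-up contig names; only two hits pass the cutoff.
toy_hits = {"contig1": 1e-40, "contig2": 1e-10, "contig3": 3e-26}
toy_count, toy_fraction = annotated_fraction(toy_hits, total_contigs=4)

# Arithmetic check of the figures quoted in the text: 1126 of 6212 contigs.
reported_percentage = round(1126 / 6212 * 100, 1)
```

With the counts given in the text, 1126 of 6212 contigs corresponds to about 18.1%, consistent with the reported ~18%.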