Abundance of phages across samples

Coverage calculations were limited to the window encompassing the region defined by the spacer hits (proto-spacers) and phage-associated genes. This ensured that, when a phage was integrated in a bacterial genome, only the abundance of the phage was measured. Metagenomic reads from each sample were mapped to phage contigs using BLASTN, requiring at least 80% of the read to align against the contig with at least 85% identity. To declare that a phage contig exists in a particular sample, the phage region of the contig had to be covered by at least one read per base pair and at least 70% of the phage-region bases had to be covered (if the phage region was <2 kb, it was extended by 2 kb upstream and downstream). See the Supplemental Text for discussion on the parameters chosen for the sharing analysis. Coverage is reported in reads per kilobase per million reads (RPKM). Visual data on phage abundance across samples are found at http://www.weizmann.ac.il/molgen/Sorek/microbiome_phages/.

Similarity of phage existence profiles

Phage existence profiles for pairs of samples were compared using binary asymmetric distance: the proportion of phages that exist in only one sample out of all phages that exist in at least one of the samples in the pair.

Rate-of-discovery analysis

MetaHIT samples were added one at a time. After each sample was added, a tally was made of all unique phage contigs in the accumulated samples that were detectable using the spacers found in those accumulated samples. The analysis was repeated 10 times with random sample order (Fisher-Yates shuffling). Fourteen of the samples could not contribute spacers due to shorter sequencing reads.

Distribution of phage contigs in other data sets

Reads sequenced in all individuals and time points in each study (Kurokawa et al. 2007; Reyes et al. 2010; Minot et al. 2011) were used to form a study pool.
Each pool was mapped to MetaHIT phage contigs with BLASTN using a cutoff of at least 85% identity over at least 85% of the read length. A phage contig was determined to exist in a data set if the phage region (as previously defined) was covered by at least one read per base pair, and at least 70% of bases in the region were covered. For the VLP data sets, contigs that failed the above test were subjected to a second test with the same set of parameters across the entire length of the contig since the location of read mapping in this case did not need to be strictly controlled.

Abundance of HMP bacteria

HMP genomes deposited in IMG in March 2011 were queried for COG functions corresponding to 31 universal single-copy genes (Ciccarelli et al. 2006). BLASTN with -F F flag and an e-value threshold of 0.00005 was performed using all reads from each sample against the universal single-copy genes from organisms for which at least 25 of the 31 gene sequences were available. Only the best BLAST hit was taken for each read and only if at least 80% of the read aligned against a bacterial gene with at least 85% identity. Coverage was measured in RPKM.

Assembly of CRISPR arrays

All reads that matched the set of known CRISPR repeats using BLASTN with an e-value of 0.01 were gathered from each of the 110 individuals in the MetaHIT data that had 75-bp read sequences. The QSRA short-read assembler (Bryant et al. 2009) was then used to assemble these reads, in each individual separately. To facilitate assembly of repetitive CRISPR loci, reads were considered for extension of an assembled array only if they matched at least the last 60 bases of the growing contig (option -u), or, if not enough of these existed, at least the last 50 bases (option -l).

Evidence for prophage integration

BLASTN with default parameters was used to align phage contigs onto the set of HMP genomes.
Prophage integration was determined if at least 1000 bases of the phage contig were aligned to the bacterial genome with at least 95% identity, and the alignment was localized to one or both ends of the phage contig.

Acknowledgments

We thank Alejandro Reyes and Samuel Minot for their gracious help in providing supplementary VLP sequencing data. We also thank Debbie Lindell, Eyal Weinstock, Dvir Dahary, Eytan Ruppin, Hila Sberro-Livnat, Asaf Levy, Omri Wurtzel, Gil Amitai, and Shany Doron for comments on the manuscript. R.S. was supported, in part, by the ERC-StG program (grant 260432), the Leona M. and
Harry B. Helmsley Charitable Trust, and by a DIP grant from the
Deutsche Forschungsgemeinschaft. A.S. was the beneficiary of
a postdoctoral grant from the AXA research fund. E.M. was supported,
in part, by a fellowship from the Edmond J. Safra Bioinformatics
Program at Tel Aviv University. I.T. was supported by
the Clore Center at the Weizmann Institute of Science.
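
As an illustrative sketch only (not the code used in this study), the RPKM normalization, the phage existence criterion, and the binary asymmetric distance described in the Methods could be expressed as follows. All function and parameter names here are hypothetical. The sketch assumes that per-base read depths for the phage region are already available, and it reads "at least one read per base pair" as a mean depth of at least 1, which is one possible interpretation of the text.

```python
from typing import Dict, Sequence, Set


def rpkm(mapped_reads: int, region_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of region per million reads in the sample."""
    return mapped_reads * 1e9 / (region_length_bp * total_reads)


def phage_exists(depth: Sequence[int],
                 min_mean_depth: float = 1.0,
                 min_breadth: float = 0.70) -> bool:
    """Apply the two coverage thresholds to per-base depths of a phage region.

    Assumption: "one read per base pair" is taken to mean mean depth >= 1.
    """
    if not depth:
        return False
    mean_depth = sum(depth) / len(depth)
    # Breadth: fraction of positions covered by at least one read.
    breadth = sum(1 for d in depth if d > 0) / len(depth)
    return mean_depth >= min_mean_depth and breadth >= min_breadth


def existence_profile(depths_by_contig: Dict[str, Sequence[int]]) -> Set[str]:
    """Set of phage contigs that pass the existence criterion in one sample."""
    return {name for name, depth in depths_by_contig.items()
            if phage_exists(depth)}


def binary_asymmetric_distance(a: Set[str], b: Set[str]) -> float:
    """Phages present in exactly one sample / phages present in at least one."""
    union = a | b
    if not union:
        # Assumption: two empty profiles are treated as identical.
        return 0.0
    return len(a ^ b) / len(union)
```

Under these assumptions, two samples whose existence profiles are {"phage1"} and {"phage1", "phage2"} have a distance of 0.5, since one of the two phages seen in the pair exists in only one sample.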