Sequence Analyses and Community Comparisons. Sequences were processed and analyzed following the procedures described previously (8, 11). Sequences were removed from the analysis if they were less than 200 or more than 300 bp in length, had a quality score less than 25, contained ambiguous characters, contained an uncorrectable barcode, or did not contain the primer sequence. Remaining sequences were assigned to samples by examining the 12-nt barcode. Similar sequences were clustered into operational taxonomic units (OTUs) using cd-hit (17) with a minimum coverage of 97% and a minimum identity of 97%. A representative sequence was chosen from each OTU by selecting the longest sequence that had the largest number of hits to other sequences in the OTU. Representative sequences were aligned using NAST (18) and the Greengenes database (19) with a minimum alignment length of 150 and a minimum identity of 75%. The PH Lane mask was used to screen out hypervariable regions after alignment. A phylogenetic tree was inferred using Clearcut (20) with Kimura's two-parameter model. Taxonomy was assigned using the RDP classifier with a minimum support threshold of 60% and the RDP taxonomic nomenclature (21).
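The quality-filtering and sample-assignment criteria above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used (which follows refs. 8 and 11). The read layout (barcode, then primer, then amplicon), the use of a single mean quality score per read, measuring length on the amplicon alone, and the names `Read` and `assign_sample` are all assumptions made for the example. The sketch also accepts only exact barcode matches, whereas the original procedure evidently attempted barcode error correction before discarding a read.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else here is hypothetical.
MIN_LEN, MAX_LEN = 200, 300   # allowed sequence length (bp)
MIN_QUALITY = 25              # minimum quality score
BARCODE_LEN = 12              # 12-nt sample barcode


@dataclass
class Read:
    seq: str             # assumed layout: barcode + primer + amplicon
    mean_quality: float  # assumed: one summary quality score per read


def assign_sample(read: Read, primer: str, barcodes: dict) -> Optional[str]:
    """Return the sample name for a read that passes every filter, else None."""
    # Exact barcode lookup only (no error correction in this sketch).
    sample = barcodes.get(read.seq[:BARCODE_LEN])
    if sample is None:
        return None
    rest = read.seq[BARCODE_LEN:]
    if not rest.startswith(primer):          # primer sequence must be present
        return None
    amplicon = rest[len(primer):]
    if not (MIN_LEN <= len(amplicon) <= MAX_LEN):
        return None
    if read.mean_quality < MIN_QUALITY:
        return None
    if set(amplicon) - set("ACGT"):          # reject ambiguous characters
        return None
    return sample
```

Reads for which `assign_sample` returns `None` would be dropped before OTU clustering; the remaining reads are grouped by the returned sample name.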
For each sample included in the three studies described above (including those in the database of 270 palm surfaces used to estimate the accuracy of the computer mouse assignments), we obtained at least 800 quality sequences (range 800–1,500 sequences per sample), with sequences averaging 240 bp in length.
To determine the amount of dissimilarity (distance) between any pair of bacterial communities, we used the UniFrac metric (10, 22, 23). UniFrac distances are based on the fraction of branch length shared between two communities within a phylogenetic tree constructed from the 16S rRNA gene sequences from all communities being compared. A relatively small UniFrac distance implies that two communities are compositionally similar, harboring lineages sharing a common evolutionary history. In unweighted UniFrac, only the presence or absence of lineages is considered. In weighted UniFrac, branch lengths are weighted based on the relative abundances of lineages within communities. We used the analysis of similarities (ANOSIM) (24) function in the program PRIMER (25) to test for differences in community composition among groups of samples.
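To make the unweighted UniFrac calculation concrete, the following Python sketch computes the distance between two communities on a toy tree. It is an illustration only, not the software used in this study. The tree encoding (a list of branches, each with a length and the set of OTUs descending from it), the function name `unweighted_unifrac`, and the four-OTU example tree are assumptions made for the example. The distance is the branch length leading to members of only one community, divided by the branch length leading to members of either community.

```python
def unweighted_unifrac(branches, community_a, community_b):
    """Unweighted UniFrac distance between two sets of OTUs.

    branches: iterable of (branch_length, set_of_descendant_otus).
    community_a, community_b: sets of OTUs present in each community.
    """
    unique = 0.0   # branch length leading to only one community
    total = 0.0    # branch length leading to either community
    for length, tips in branches:
        in_a = bool(tips & community_a)
        in_b = bool(tips & community_b)
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    return unique / total if total else 0.0


# Hypothetical tree: ((otu1:1, otu2:1):2, (otu3:1, otu4:1):2)
example_branches = [
    (1.0, {"otu1"}),
    (1.0, {"otu2"}),
    (2.0, {"otu1", "otu2"}),
    (1.0, {"otu3"}),
    (1.0, {"otu4"}),
    (2.0, {"otu3", "otu4"}),
]

# Communities sharing otu2: 4 of 7 branch-length units are unique.
distance = unweighted_unifrac(example_branches, {"otu1", "otu2"}, {"otu2", "otu3"})
```

In this example the distance is 4/7 (about 0.57); two identical communities give 0, and two communities with no shared branches give 1. Weighted UniFrac would additionally scale each branch by the difference in relative abundance of its descendants in the two communities, which this sketch does not attempt.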