Quantifying OEU reproducibility across OEU-calling methodologies. To compare
the effects of different OEU-calling algorithms, OEU calling was performed as
described for the main 2013 data set, except that the Euclidean distance
(L2 norm) was replaced with the L1 distance (Bray–Curtis).
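To make the difference between the two metrics concrete, the following minimal sketch computes both for a pair of OTU abundance profiles. The profiles `otu_a` and `otu_b` are invented for illustration, and the sketch assumes each profile is non-negative and has been normalized to sum to 1 across samples; under that assumption the Bray–Curtis dissimilarity equals half the L1 distance. It is not the OEU-calling code itself.

```python
import math

def euclidean(u, v):
    """L2 distance between two abundance profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: summed absolute differences divided by
    total abundance. Assumes non-negative inputs."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

# Hypothetical profiles of two OTUs across four samples (each sums to 1).
otu_a = [0.5, 0.3, 0.2, 0.0]
otu_b = [0.1, 0.1, 0.3, 0.5]

d_l2 = euclidean(otu_a, otu_b)
d_bc = bray_curtis(otu_a, otu_b)
```

Because the Euclidean distance squares each difference, it weights a few large per-sample differences more heavily than the L1-based measure does, which is one reason the two metrics can group OTUs differently.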
Statistical methodology for quantifying OEU reproducibility. To quantify the
reproducibility of the OEUs between time points or between sample preparation
methods, we counted the pairs of OTUs that met both of the following criteria:
• Both OTUs are present in both data sets (for example, in both the 2008 and 2013
data sets).
• Both OTUs are in the same OEU in both data sets (for example, OTUs A and B
are both in OEU X in the 2008 data set and both in OEU Y in the 2013 data set).
We compared this count against the number of pairs satisfying the same criteria
that would arise at random, that is, after randomly shuffling the abundances of
OTUs within each sample before computing the Euclidean distance between OTUs.
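The pair count and the shuffling step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layouts and the toy OEU assignments are all hypothetical, and each mapping is assumed to contain only OTUs that were assigned to an OEU in that data set.

```python
from itertools import combinations
import random

def shared_cooccurring_pairs(oeu_a, oeu_b):
    """Count OTU pairs present in both data sets that share an OEU in each.

    oeu_a, oeu_b: dicts mapping OTU id -> OEU label (hypothetical layout).
    OEU labels need not match between data sets; only co-membership matters.
    """
    shared = sorted(set(oeu_a) & set(oeu_b))
    return sum(
        1
        for x, y in combinations(shared, 2)
        if oeu_a[x] == oeu_a[y] and oeu_b[x] == oeu_b[y]
    )

def shuffle_within_samples(table, rng):
    """Randomly reassign abundances among OTUs, independently per sample.

    table: dict mapping sample id -> dict of OTU id -> abundance
    (hypothetical layout). Returns a new table; the input is unchanged.
    """
    shuffled = {}
    for sample, abundances in table.items():
        otus = list(abundances)
        values = [abundances[otu] for otu in otus]
        rng.shuffle(values)
        shuffled[sample] = dict(zip(otus, values))
    return shuffled

# Toy assignments: A and B co-occur in both years; C and D do not.
oeu_2008 = {"A": "X", "B": "X", "C": "Z", "D": "Z"}
oeu_2013 = {"A": "Y", "B": "Y", "C": "Y", "E": "W"}
observed = shared_cooccurring_pairs(oeu_2008, oeu_2013)  # 1 for this toy input

# Toy abundance table used only to demonstrate the shuffling step.
table = {"s1": {"A": 5, "B": 3, "C": 0}, "s2": {"A": 1, "B": 7, "C": 2}}
null_table = shuffle_within_samples(table, random.Random(0))
```

To obtain a null expectation, one would rerun OEU calling on each shuffled table and recount the pairs; that calling step is described elsewhere in the methods and is not reproduced here.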
Reference OTU selection. Reference OTUs were selected by matching the Illumina
OTUs to Sanger clone sequences. Only exact matches between the 77 bp Illumina
OTUs and Sanger clones were considered. Three Illumina OTUs each matched multiple
Sanger clones whose pairwise nucleotide distances exceeded 0.1, which resulted
in OTU distributions that were the product of two distinct organismal signals. These
were corrected by aligning Sanger clone sequences to identify discriminating bases 5′
of the Illumina OTU sequence end point. One or two differentiating bases were
identified for each of the three cases and the length of sequence required to
differentiate between the two sequences was determined. Once a unique sequence