Given the enormous amount of data generated in
these studies, sophisticated bioinformatics tools and
dedicated statisticians are essential.
In genomics and transcriptomics, microarray data
analysis can prove difficult. The large number of
variables (one per gene) in microarray experiments
complicates the statistics and increases the
likelihood of false positives. Changes detected by
microarray should be validated using real-time PCR. In
proteomics, the properties of many thousands of
ions are recorded in a single experiment and
complex algorithms are used to match these data to
a theoretical database to enable protein
identification and/or quantification. In
metabolomics, raw data require transformation to
a suitable format prior to processing. The methods
available for analysis comprise various statistical
techniques, including univariate and multivariate
analysis, supervised and unsupervised learning
tools, and systems-based analyses. The aim of these
strategies is to find patterns in the data that provide
useful biological information, which can then be used to
generate further hypotheses for testing.
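As a rough illustration of why a large number of variables inflates false positives, the following Python sketch computes the expected number of false positives, and the chance of at least one, when every variable is tested at the same significance threshold. It assumes independent tests with no true effects. The function names and the figure of 10,000 genes are hypothetical and chosen only for illustration.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives if no variable has a true effect."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Hypothetical experiment sizes: a single test, 20 tests, 10,000 genes.
for n_tests in (1, 20, 10_000):
    print(
        n_tests,
        expected_false_positives(n_tests),
        round(prob_at_least_one_false_positive(n_tests), 3),
    )
```

Under these assumptions, testing 10,000 genes at a threshold of 0.05 would be expected to produce about 500 false positives even if no gene were truly changed, which illustrates why some form of correction or validation is needed.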
Omic strategies generate huge amounts of data and
multiple testing increases the likelihood of false
positives. Data validation is essential to ensure that
findings are not simply due to chance. P-values can
be corrected for multiple testing (for example, by
controlling the false discovery rate). Other methods of
model validation include the use of a ‘hold-out’ or
‘test’ set.18 The set used in producing the model is
called the training set.
Models built using the training data can then be
independently validated using the hold-out set. An
alternative method of independent model validation
is to use permutation testing.19 More robust methods
include confirming the observations with a
complementary technique and replicating the
experiment in a different sample set.20
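The statistical validation steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the Benjamini-Hochberg procedure is used as one common way of controlling the false discovery rate, the 'model' is a toy single-variable threshold classifier, and the data are simulated. All function names, sample sizes and parameter values are hypothetical.

```python
import random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so that adjusted
    # values never increase as the raw P-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def fit_threshold(values, labels):
    """Toy one-variable model: cut halfway between the two class means."""
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    mean0 = sum(class0) / len(class0)
    mean1 = sum(class1) / len(class1)
    return (mean0 + mean1) / 2.0, mean1 >= mean0


def accuracy(model, values, labels):
    """Fraction of samples whose predicted class matches the label."""
    threshold, high_is_class1 = model
    correct = 0
    for v, y in zip(values, labels):
        predicted = int((v > threshold) == high_is_class1)
        correct += int(predicted == y)
    return correct / len(labels)


def holdout_accuracy(values, labels, n_train):
    """Fit on the first n_train samples, score on the remaining hold-out set."""
    model = fit_threshold(values[:n_train], labels[:n_train])
    return accuracy(model, values[n_train:], labels[n_train:])


def permutation_p_value(values, labels, n_train, n_perm=500, seed=0):
    """Compare hold-out accuracy with accuracies obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = holdout_accuracy(values, labels, n_train)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if holdout_accuracy(values, shuffled, n_train) >= observed:
            at_least_as_good += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Simulated data (an assumption): 80 samples with alternating class labels,
# and class 1 shifted upwards by 3 units.
data_rng = random.Random(1)
labels = [i % 2 for i in range(80)]
values = [data_rng.gauss(3.0 * y, 1.0) for y in labels]

q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
observed, p_value = permutation_p_value(values, labels, n_train=60)
print("adjusted P-values:", q_values)
print("hold-out accuracy:", observed, "permutation P-value:", p_value)
```

In this sketch the labels are shuffled across the whole data set before the split, so each permuted model is both fitted and scored against the same random labelling. This is only one of several possible permutation schemes. Neither step replaces confirmation by a complementary technique or replication in a different sample set.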
There are many publications, across all the
biological sciences, pointing out the potential folly
of using profiling techniques such as metabolomics,
proteomics, transcriptomics and genomics in order
to discover clinically significant biomarkers.21–25
These areas of experimental design, sample
preparation, analytical techniques and data analysis
are covered in greater detail in a number of review
articles.