The potential for using whole genome sequencing (WGS) data in microbiological risk assessment (MRA) has
been discussed on several occasions since the beginning of this century. Still, the proposed heuristic approaches
have never been applied in a practical framework. This is due to the non-trivial problem of mapping microbial
information consisting of thousands of loci onto a probabilistic scale for risk. The paradigm change for MRA
involves translating multidimensional microbial genotypic information into much reduced (integrated) phenotypic
information and onwards into a single measure of human risk (i.e. the probability of illness).
This paper describes a first methodological approach to applying WGS data in MRA, supported by a practical
example. In this example, genetic data (single nucleotide polymorphisms; SNPs) for Shiga toxin-producing
Escherichia coli (STEC) O157 are combined with phenotypic data (in vitro adherence to epithelial cells as a
proxy for virulence) in a genome-wide association study (GWAS) for the purpose of hazard identification.
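
The association step described above can be illustrated with a minimal sketch. It assumes binary SNP calls (1 = variant allele present) and a continuous adherence score per isolate, and it uses a permutation test on the difference in mean adherence with a Bonferroni threshold. This is an illustrative stand-in, not the method used in the study. All isolate values, SNP names and function names are hypothetical, and the sketch deliberately omits the correction for population structure that a real bacterial GWAS requires.

```python
import random
from statistics import mean


def mean_difference(genotypes, phenotypes):
    """Mean phenotype of variant carriers minus mean phenotype of non-carriers."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    if not carriers or not others:
        return 0.0  # monomorphic SNP: no contrast possible
    return mean(carriers) - mean(others)


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean_difference(genotypes, phenotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if abs(mean_difference(genotypes, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def scan_snps(snp_matrix, phenotypes, alpha=0.05):
    """Test every SNP; flag those below a Bonferroni-corrected threshold."""
    threshold = alpha / len(snp_matrix)
    results = {}
    for name, genotypes in snp_matrix.items():
        effect = mean_difference(genotypes, phenotypes)
        p = permutation_p_value(genotypes, phenotypes)
        results[name] = (effect, p, p < threshold)
    return results


# Hypothetical toy data: adherence scores for 12 isolates and two SNPs.
adherence = [9.1, 8.7, 9.4, 8.9, 9.0, 9.3, 3.2, 2.9, 3.5, 3.1, 2.8, 3.3]
snps = {
    "snp_a": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # tracks the phenotype
    "snp_b": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to it
}
results = scan_snps(snps, adherence)
for name, (effect, p, flagged) in results.items():
    print(f"{name}: effect={effect:.2f}, p={p:.4f}, flagged={flagged}")
```

In this toy setting only the SNP that tracks the adherence phenotype would be flagged as a candidate hazard marker. With thousands of loci and clonal population structure, both the multiple-testing burden and the risk of spurious associations grow considerably.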
This application revealed practical implications of using SNP data for MRA. These can be summarized under
the following main issues: the optimum sample size for valid inference at the population level, correction
for population structure, quantification and calibration of results, reproducibility of the analysis, links with
epidemiological data, and the anchoring and integration of results into a systems biology approach for translating
molecular studies into human health risk.
Future developments in genetic data analysis for MRA should aim at resolving the mapping problem of processing
genetic sequences to arrive at a quantitative description of risk. The development of a clustering scheme focusing
on biologically relevant information about the microbe involved would be a useful approach to molecular data reduction
for risk assessment.