The potential for using whole genome sequencing (WGS) data in microbiological risk assessment (MRA) has
been discussed on several occasions since the beginning of this century. Still, the proposed heuristic approaches
have never been applied in a practical framework. This is due to the non-trivial problem of mapping microbial
information consisting of thousands of loci onto a probabilistic scale for risks. The paradigm change for MRA
involves translation of multidimensional microbial genotypic information to much reduced (integrated) phenotypic
information and onwards to a single measure of human risk (i.e. probability of illness).
In this paper a first approach in methodology development is described for the application of WGS data in MRA;
this is supported by a practical example. That is, combining genetic data (single nucleotide polymorphisms;
SNPs) for Shiga toxin-producing Escherichia coli (STEC) O157 with phenotypic data (in vitro adherence to epithelial
cells as a proxy for virulence) leads to hazard identification in a Genome Wide Association Study (GWAS).
This application revealed practical implications when using SNP data for MRA. These can be summarized by
considering the following main issues: optimum sample size for valid inference on population level, correction
for population structure, quantification and calibration of results, reproducibility of the analysis, links with
epidemiological data, anchoring and integration of results into a systems biology approach for the translation of
molecular studies to human health risk.
Future developments in genetic data analysis for MRA should aim at resolving the mapping problem of processing
genetic sequences to come to a quantitative description of risk. The development of a clustering scheme focusing
on biologically relevant information of the microbe involved would be a useful approach in molecular data reduction
for risk assessment.