The philosophy underlying databases
The food environment can be very complex, and it may be difficult to quantify or even to categorise some of its features and their potential effects on microbial population dynamics or on the ability to recover a target organism from a food sample. An example is the effect of food structure, reviewed by Brocklehurst (2003), which may affect the environmental limits for growth (Koutsoumanis et al., 2004).
An additional difficulty is that, given the background information on the environment and the currently available techniques for measuring microbial responses, both variability and uncertainty may be large (Ratkowsky, 2004). Variability in microbial characteristics such as growth rate or lag phase duration is well characterised and increases markedly with increasing response times. Its effect is seen in the widening confidence limits of response time estimates (Ratkowsky et al., 1996; Ratkowsky, 2004) and may even lead to an inability to recover a target organism under conditions where growth is possible (Graham and Lund, 1993).
Variability and uncertainty in microbial responses were also discussed by Bridson and Gould (2000) in their treatise on classical versus quantal microbiology. Uncertainty, of course, also arises when information is missing or conflicting, events that regularly cause consternation in the conduct of quantitative risk assessments (Nauta, 2002). In such situations, the accumulation of MANY pieces of information is essential. An analogy can be conceived as follows: if one can take a picture of only a small segment of the sky, then it is impossible to see the trajectory of the Milky Way from a single picture. However, when many such pictures are put together, a pattern may emerge showing the now well-known spiral of the Milky Way. We will call this pattern, showing the potential of databases to put pieces of information together, the “Milky-way effect”.
It is important to consider the above capitalisation of the word MANY. This requirement means that the information must be put in a well-defined, systematic format, following a strict database protocol; otherwise no computer program can be developed to retrieve the information. A database is not the same as a “data-dump”! The fields of a database are created for a certain purpose (in the case of predictive models, to represent the Environment/Response mapping), and entering data in a field already requires categorisation and simplification, i.e. food microbiology and modelling expertise. When, for example, the main environmental factors affecting microbial responses are identified, the same simplification is carried out as when a process is characterised by some mathematical variables. In other words, there is a parallel between mathematical abstraction and the creation of the database structure. The fields of the database correspond to mathematical variables; the relations between those fields correspond to mathematical equations and inequalities.
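The parallel between database fields and mathematical variables can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the class `Record`, its field names, the function `select` and the chosen constraints are hypothetical and do not reproduce the actual ComBase structure. The point is merely that fields behave like variables, that constraints between fields behave like inequalities, and that only such a strict format allows a program to retrieve the records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """One hypothetical Environment/Response record (not the ComBase schema)."""
    organism: str
    temperature_c: float      # environment variable
    ph: float                 # environment variable
    water_activity: float     # environment variable
    # Response: (time in hours, log10 count) observations.
    curve: Tuple[Tuple[float, float], ...] = ()

    def __post_init__(self) -> None:
        # Relations between fields play the role of inequalities.
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError("pH must lie in [0, 14]")
        if not 0.0 < self.water_activity <= 1.0:
            raise ValueError("water activity must lie in (0, 1]")
        times = [t for t, _ in self.curve]
        if any(t < 0 for t in times) or times != sorted(times):
            raise ValueError("times must be non-negative and non-decreasing")


def select(records: Iterable[Record], organism: str,
           temp_range: Tuple[float, float],
           ph_range: Tuple[float, float]) -> List[Record]:
    """Return the records of one organism inside a temperature/pH window."""
    t_lo, t_hi = temp_range
    p_lo, p_hi = ph_range
    return [r for r in records
            if r.organism == organism
            and t_lo <= r.temperature_c <= t_hi
            and p_lo <= r.ph <= p_hi]
```

A record that violates one of the constraints is rejected at entry, which is the sense in which a database differs from a “data-dump”; a query such as `select(records, "Listeria monocytogenes", (5.0, 15.0), (6.0, 7.0))` is possible only because every record obeys the same format.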
To introduce some philosophy, we will call the need for simplification, which is necessary to create a database, the “Platonian effect”, since it was Plato who first emphasised that scientific thinking needs these simplifications (i.e. idealisations). We will also speak about the “Gutenberg effect” of databases: in the IT age, Internet databases have an effect analogous to that caused by the invention of printing. Before the 15th century, certain information, even if available, was not necessarily accessible, because hand-written books were too few to be available to everybody. Now, the amount of information (even if in printed form) has become too large, and a new technique, the FAST, ACCESSIBLE, RELATED database, has given new impetus to information processing. As the early creators of the first linked databases remarked, “Gutenberg could not make his books speak to each other”.
Such a database, called ComBase (Combined, or Common, i.e. joint, dataBase of microbial responses to food environments), was launched at the 4th International Conference on Predictive Modelling in Foods, Quimper, France, June 2003. Its technical details can be found in Baranyi and Tamplin (2004) and on the website, www.combase.cc. The ComBase idea came from two independent, but similar, initiatives on both sides of the Atlantic. In 1988, the Ministry of Agriculture, Fisheries and Food in the United Kingdom initiated a coordinated programme to collect data on the growth and death of bacterial pathogens. Those data served as the base on which the first validated, commercialised predictive package, Food MicroModel, was built. The task of supporting these developments was taken over by the UK Food Standards Agency (FSA) once it was established. Parallel to these events in the UK, the US equivalent of Food MicroModel, called PMP (Pathogen Modeling Program: www.arserrc.gov/mfs/pathogen.htm), was developed at the Eastern Regional Research Center of the USDA Agricultural Research Service.
In the meantime, a database was being developed at the Institute of Food Research, Norwich, UK, to pool available predictive microbiology data. Soon the leaders of FSA and USDA-ARS agreed that incorporating all their data in this common database, named ComBase, would be mutually beneficial. The European Commission also embraced the idea, and the original Food MicroModel and PMP datasets have now been supplemented with additional data submitted by supporting institutes, universities and companies, mainly from Europe. In addition, data have been compiled from the scientific literature.
ComBase has its “Milky-way”, “Platonian” and “Gutenberg” effects on predictive microbiology. Its “Milky-way effect” is obvious: the amount of data can compensate for the inaccuracy of the data. Le Marc et al. (2005) gave an example of how to make use of the large amount of information provided by ComBase to define, at least approximately, the total growth region of Listeria monocytogenes in the space of the main environmental factors.
Numerically and computationally minded scientists are convinced that the “Platonian effect” of ComBase is very useful in bringing more exact (mathematical and quantitative) elements into microbiology research. However, this is not necessarily popular among traditional microbiologists. Many of them have a certain degree of aversion to the required simplifications, saying that the information in the database does not reflect the environment and/or the microbial response with sufficient accuracy. The fact is, however, that this is not the function of the database. Its function is, rather, to make the most important aspects of the data available FAST, even if at the expense of omitting some details. It should be admitted that this carries some danger of distortion and subjectivity, but this has always gone hand in hand with the Platonian idea of idealisation and simplification. In fact, different disciplines omit different details from the same phenomena. The contributors to ComBase omit those details that, according to the food microbiology knowledge accumulated so far, do not significantly influence the Environment/Response mapping. This also shows that the “Platonian effect” can play a role in the development of a scientific discipline only after many observations and empirical descriptions have become accepted knowledge about the system to be characterised.
The “Gutenberg effect” of ComBase is probably the most popular at the moment. Thousands of researchers, risk assessors, legislative officers, food manufacturers and their laboratory managers can access published and unpublished data fast and at no expense. Publicly available databases like ComBase and, in fact, the whole Internet, are virtual forums of democracy. Besides, they can be major tools in the assessment of predictive microbiology results. Users can compare observations with independent predictions obtained from other software packages, which contributes to the correct evaluation of the potential and limitations of the discipline. ComBase is a repository of predictive microbiology data that can be used by risk assessors in different countries; therefore, if ComBase is accepted internationally as the benchmark, the number of sources generating different views on risk can be decreased.