Collecting and downloading the data took a week, but processing it into a single dataset took two years of the three-year project, Tagkopoulos said. The team built models for four layers, starting with gene expression and working up to activity at the whole-cell level, and then integrated the layers. They used machine learning techniques to train the models to predict the behavior of each layer, and ultimately of the cell itself, under different conditions.
The model was built on computer clusters at UC Davis and on supercomputers available through a national network. The researchers received a National Science Foundation grant of computing time on "Blue Waters," one of the world's most powerful supercomputers, at the National Center for Supercomputing Applications.
Although E. coli is a well-known organism, we are far from knowing everything about its biochemistry and metabolism, Tagkopoulos said.
"We are exploring a vast space here," he said. "Our aim is to create a crystal ball for the bacteria, which can help us decide what is the next experiment we should do to explore this space better."
With collaborators at Mars Inc., Tagkopoulos hopes to begin building similar databases and models for bacteria involved in foodborne illness, such as Salmonella enterica and Bacillus subtilis. He expects other researchers to draw on the Ecomics database, and hopes to make the MOMA model interface more accessible for biologists to use.
"We're living in an amazing era at the intersection of computer science, engineering and biology," he said. "It's a very interesting time."
Co-authors on the paper are Minseung Kim of the UC Davis Department of Computer Science and Genome Center, and Navneet Rai and Violeta Zorraquino of the UC Davis Genome Center. The work was supported by the U.S. Army Research Office and the National Science Foundation.