The basic interaction unit in STRING is the functional
association, i.e. a specific and productive functional relationship
between two proteins, likely contributing to a common
biological purpose. Interactions are derived from multiple
sources: (i) known experimental interactions are imported
from primary databases, (ii) pathway knowledge is
parsed from manually curated databases, (iii) automated
text-mining is applied to uncover statistical and/or semantic
links between proteins, based on Medline abstracts and
a large collection of full-text articles, (iv) interactions are
predicted de novo by a number of algorithms using genomic
information (23–25) as well as by co-expression
analysis, and (v) interactions that are observed in one organism
are systematically transferred to other organisms,
via pre-computed orthology relations. STRING centers
on protein-coding gene loci: alternative splice isoforms or
post-translationally modified forms are not resolved, but
are instead collapsed at the level of the gene locus. All
sources of interaction evidence are benchmarked and calibrated
against previous knowledge, using the high-level
functional groupings provided by the manually curated Kyoto
Encyclopedia of Genes and Genomes (KEGG) pathway
maps (5).
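
The benchmarking idea can be illustrated with a minimal sketch. It is not the STRING pipeline itself. It assumes a toy setup in which raw evidence scores are binned and each bin is assigned the fraction of protein pairs that share at least one pathway map. The function names (`shares_pathway`, `calibrate`), the bin edges, and all protein and pathway identifiers are hypothetical and chosen only for illustration.

```python
def shares_pathway(a, b, pathways):
    """Return True if proteins a and b co-occur in at least one pathway map."""
    return any(a in members and b in members for members in pathways.values())


def calibrate(scored_pairs, pathways, bin_edges):
    """For each raw-score bin, return the fraction of benchmarkable pairs
    whose two proteins share a pathway (None for empty bins).

    Pairs in which either protein lacks any pathway annotation are skipped,
    because the reference set cannot say anything about them.
    """
    annotated = set().union(*pathways.values())
    counts = [[0, 0] for _ in range(len(bin_edges) - 1)]  # [hits, total] per bin
    for a, b, score in scored_pairs:
        if a not in annotated or b not in annotated:
            continue
        for i in range(len(bin_edges) - 1):
            if bin_edges[i] <= score < bin_edges[i + 1]:
                counts[i][1] += 1
                if shares_pathway(a, b, pathways):
                    counts[i][0] += 1
                break
    return [hits / total if total else None for hits, total in counts]


# Toy reference set (hypothetical pathway maps and protein identifiers).
pathways = {
    "map_A": {"P1", "P2", "P3"},
    "map_B": {"P4", "P5"},
}

# Toy predictions: (protein, protein, raw score from one evidence channel).
scored_pairs = [
    ("P1", "P2", 0.9),
    ("P1", "P3", 0.8),
    ("P4", "P5", 0.7),
    ("P1", "P4", 0.6),
    ("P3", "P4", 0.3),
    ("P2", "P5", 0.2),
    ("P1", "P5", 0.1),
    ("P9", "P1", 0.95),  # P9 has no pathway annotation, so this pair is skipped
]

calibrated = calibrate(scored_pairs, pathways, bin_edges=[0.0, 0.5, 1.01])
print(calibrated)
```

In this toy example, high-scoring pairs fall in the same pathway map far more often than low-scoring ones, so the per-bin fractions can be read as a rough calibrated confidence for that evidence channel.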
As of the current update to version 10.0, the number of
organisms covered by STRING has increased to 2031, almost
doubling over the previous release. The update also involved importing all
sources again, re-running all prediction algorithms and re-executing
the entire text-mining pipeline with new dictionaries
and extended text collections. Many of the features
and interfaces of STRING have been described previously
(26–28). Below, we give a short overview of
the resource and describe recent additions and modifications.