The human genome contains roughly 20 000 protein-
coding genes [11] and 25 000 noncoding genes [12]. Some
genes are crucial for life, some are crucial for health, and
some can be deleted in their entirety without apparent harm.
One of the most important information structures
within a typical gene is its division into alternating regions
called introns and exons. The boundaries between these
regions are determined by patterns in the nucleotide
sequence, and many disease-causing mutations act by
disrupting these patterns. Spinal muscular atrophy (SMA),
which is the leading genetic cause of infant mortality in
North America [13], results if a baby’s genome is missing
the SMN1 gene, or contains a damaged version of it,
resulting in deficient production of the survival motor
neuron (SMN) protein. Another version of the gene, called
SMN2, could in principle compensate for the lost production of the SMN
protein. Fig. 1 shows the nucleotide sequence from the
seventh exon of the protein-coding gene SMN2. Due to
differences in nucleotides at the four positions shown, the
cell’s machinery fails to recognize the exon, resulting in a
protein that does not function properly and is therefore unable to
compensate for the lost production of the SMN protein.
Researchers are evaluating therapies that restore the function of
exon 7 in SMN2 [14], [15]. SMA is well studied and can be
diagnosed by outward symptoms, but genetic testing is crucial
for confirmation and therapeutic development. In other
genetic diseases, the causal mechanisms are more complex.
Cancer is a prime example of a heterogeneous disease, i.e., a
disease with multiple causal pathways all leading to similar
symptoms but requiring different treatments [16]. For cancer,
genomic data are becoming essential for providing more
detailed diagnoses and targeted treatments [17].
The concept of precision medicine is not entirely new;
doctors have been using blood type to tailor blood
transfusions for over a century [18]. What is different