Random mutations of the nucleotide sequence within a gene may change
the amino acid sequence of the corresponding protein. Some of these mutations
do not drastically alter the protein’s structure, but others do and impair
the protein’s ability to function. While the former mutations usually do not
affect the fitness of the organism, the latter often do. Therefore some amino
acid substitutions are commonly found throughout the process of molecular
evolution and others are rare: Asn, Asp, Glu, and Ser are the most
“mutable” amino acids while Cys and Trp are the least mutable. For example,
the probability that Ser mutates into Phe is roughly three times greater
than the probability that Trp mutates into Phe. Knowledge of the types
of changes that are most and least common in molecular evolution allows
biologists to construct amino acid scoring matrices and to produce biologically
adequate sequence alignments. As a result, in contrast to nucleotide
sequence comparison, the optimal alignments of amino acid sequences may
have very few matches (if any) but still represent biologically adequate alignments.
The (i, j) entry of an amino acid scoring matrix usually reflects how
often amino acid i substitutes for amino acid j in alignments of related
protein sequences. If one is provided with a large set of alignments of
related sequences, then computing the (i, j) entry simply amounts to counting
how many times amino acid i is aligned with amino acid j. A “minor” complication
is that to build this set of biologically adequate alignments one needs
to know the scoring matrix! Fortunately, in many cases the alignment of
very similar sequences is so obvious that it can be constructed even without
a scoring matrix, thus resolving this predicament. For example, if proteins
are 90% identical, even a naive scoring matrix (e.g., a matrix that gives a premium
of +1 for matches and a penalty of −1 for mismatches and indels) would do
the job. After these “obvious” alignments are constructed they can be used
to compute a scoring matrix that can be used iteratively to construct less
obvious alignments.
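
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names (naive_score, count_aligned_pairs), the "-" gap symbol, and the toy alignments are all assumptions made for the example and do not come from the text. The sketch produces raw pair counts only; turning those counts into actual scores requires a further normalization step that is not shown here.

```python
from collections import Counter


def naive_score(a, b, gap="-"):
    """Naive scoring: +1 for a match, -1 for a mismatch or an indel.

    The gap symbol "-" is an assumption of this sketch.
    """
    if a == gap or b == gap:
        return -1
    return 1 if a == b else -1


def count_aligned_pairs(alignments, gap="-"):
    """Count how often amino acid i is aligned with amino acid j.

    `alignments` is an iterable of (top, bottom) strings of equal length,
    each already aligned (gaps written as `gap`). Columns containing a gap
    are skipped. Pairs are treated as unordered, so (S, T) and (T, S)
    share one counter, stored under the alphabetically sorted key.
    """
    counts = Counter()
    for top, bottom in alignments:
        if len(top) != len(bottom):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(top, bottom):
            if a == gap or b == gap:
                continue
            key = (a, b) if a <= b else (b, a)
            counts[key] += 1
    return counts


# Hypothetical toy alignments, standing in for the "obvious" alignments
# of very similar proteins.
toy_alignments = [
    ("MKSE", "MKTE"),
    ("MK-E", "MKSE"),
]
pair_counts = count_aligned_pairs(toy_alignments)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

In the iterative scheme, counts like these would be gathered from the obvious alignments, converted into a scoring matrix, and that matrix would then replace naive_score when aligning less similar sequences.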