The vast majority of living organisms encode their genes in long strands of DNA (deoxyribonucleic acid). DNA consists of a chain made from four types of nucleotide subunits, each composed of: a five-carbon sugar (2'-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and thymine.[2]:2.1
Two chains of DNA twist around each other to form a DNA double helix with the phosphate-sugar backbone spiralling around the outside, and the bases pointing inwards with adenine base pairing to thymine and guanine to cytosine. The specificity of base pairing occurs because adenine and thymine align to form two hydrogen bonds, whereas cytosine and guanine form three hydrogen bonds. The two strands in a double helix must therefore be complementary, with their sequence of bases matching such that the adenines of one strand are paired with the thymines of the other strand, and so on.[2]:4.1
Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality. One end of a DNA polymer contains an exposed hydroxyl group on the deoxyribose; this is known as the 3' end of the molecule. The other end contains an exposed phosphate group; this is the 5' end. The two strands of a double-helix run in opposite directions. Nucleic acid synthesis, including DNA replication and transcription occurs in the 5'→3' direction, because new nucleotides are added via a dehydration reaction that uses the exposed 3' hydroxyl as a nucleophile.[25]:27.2
The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose. RNA also contains the base uracil in place of thymine. RNA molecules are less stable than DNA and are typically single-stranded. Genes that encode proteins are composed of a series of three-nucleotide sequences called codons, which serve as the "words" in the genetic "language". The genetic code specifies the correspondence during protein translation between codons and amino acids. The genetic code is nearly the same for all known organisms.