The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions.
Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the
125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication,
followed by gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene
transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000
families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular
eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets
of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first
complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes
in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify
genes for crop improvement.