The majority of food authentication studies have relied on the DNA database GenBank as a source of sequence information. GenBank is an expansive collection of all publicly available DNA sequences for genes in a multitude of species. This database is produced by the National Center for Biotechnology Information (NCBI) and can be accessed online at the NCBI website (http://www.ncbi.nlm.nih.gov). However, while GenBank is freely accessible and provides sequence information for many species, it has been criticized for containing entries with misidentified species or populations, missing information, and inconsistent terminology.
In an attempt to catalogue all life forms in DNA terms, the Consortium for the Barcoding of Life (CBOL; http://www.barcoding.si.edu/) was established. This initiative is focused on sequencing the mt COI gene in all biological species. The sector of the project focused on fish species identification is FISH-BOL (http://www.fishbol.org/), which has established barcodes for a growing number of marine and freshwater species (currently over 4500). Although data from this project may prove useful in species detection for prevention of commercial fraud, there is currently less information on COI than on the molecular marker mt cyt b, which is supported by more sequence data from a greater number of species (Dawnay and others 2007). Moreover, a literature search for species identification studies using the combined databases Academic Search Premier and Agricola resulted in 288 hits with the search terms “species identification and cytochrome b or cyt b gene” and only 142 hits using the search terms “species identification and cytochrome c oxidase subunit I or COI gene.” Standardizing the identification approach to be limited to COI could potentially be a major source of controversy, as it has become in the field of taxonomy (DeSalle and others 2005). On the other hand, the compilation of sequence information for a specific gene in all species could greatly improve genetic identification techniques and provide a focused effort for fraud prevention. To this effect, USFDA researchers have recently been investigating the possibility of incorporating DNA COI barcodes in the Regulatory Fish Encyclopedia (RFE) (Yancy and others 2008).
The RFE was developed by CFSAN to assist government officials and purchasers of seafood in correctly identifying species and detecting species substitution and economic fraud. This database can be found online at http://www.cfsan.fda.gov/~frf/rfe0.html, and it currently includes detailed information on 94 commercially important fish species in the United States (Tenge and others 1997). Specific characteristics of each fish species are readily available, including high-resolution images of the whole and filleted fish; geographic, taxonomic, and nomenclature information; and expected IEF protein patterns and analysis toolkits. In addition to protein patterns, the organization is currently working to post the species-specific DNA patterns and sequence information for these fish. Yancy and others (2008) recently reported the development of DNA COI barcodes for 72 species of fish that may be used as an additional identification resource available in the RFE. The accuracy of this method was also tested for use with commercial samples. A blind study was carried out with 60 unknown fish species that were all identified correctly using the online identification engine of the Barcode of Life Data Systems (BOLD). The supplementation of the RFE with results from the Barcode of Life project might help to provide a focused, nationwide effort for the development of species differentiation methods. Additionally, the availability of DNA barcodes in a publicly accessible format could greatly facilitate efforts to enforce regulatory labeling laws for fish and seafood species.
A recently published study reported the use of DNA barcoding to identify species in a variety of smoked fish products (Smith and others 2008). An approximately 600-bp fragment of the COI gene was amplified from each sample, sequenced, and then matched against reference COI sequences from BOLD and GenBank. This method allowed for species identification in products representing fish species spanning 10 families and 4 orders, and it was predicted to become a standard tool for identification of fish species in food products.
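The matching step in such barcoding studies can be illustrated with a minimal Python sketch. It is not the algorithm used by BOLD or GenBank; it only assumes that the query and reference sequences have already been aligned to the same length, and it scores each reference by simple percent identity. The species names, the 20-bp toy sequences, the function names, and the 98% acceptance threshold are all hypothetical and chosen for illustration only.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    if not query:
        raise ValueError("sequences must not be empty")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def best_match(query: str, library: dict, threshold: float = 98.0):
    """Return (species, identity) for the closest reference.

    The species is None when the best identity falls below the threshold
    (an assumed cutoff, not one taken from the studies cited above).
    """
    scored = [(percent_identity(query, seq), name) for name, seq in library.items()]
    identity, name = max(scored)
    return (name if identity >= threshold else None, identity)


# Hypothetical 20-bp toy "barcodes"; real COI barcode fragments are roughly 600 bp.
REFERENCE_LIBRARY = {
    "Species A": "ATGGCACTAAGCCTTCTTAT",
    "Species B": "ATGGCCCTTAGTCTACTCAT",
    "Species C": "ATGGCTATCAGCCTCCTGAT",
}

if __name__ == "__main__":
    unknown_sample = "ATGGCACTAAGCCTTCTTAT"
    print(best_match(unknown_sample, REFERENCE_LIBRARY))
```

In practice, sequences differ in length and contain gaps, so a real workflow would rely on an alignment-based search such as BLAST or the BOLD identification engine rather than position-by-position comparison.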
Another project focused on sequence information for specific genes is the FishTrace Consortium (http://www.fishtrace.org), which comprises 53 members from several European institutions (Sevilla and others 2007). The FishTrace Database provides detailed information on a number of fish species common to Europe, along with DNA barcoding data for the mt cyt b gene and the nuclear rhodopsin gene. The sequence data have been obtained from referenced FishTrace specimens, and the database provides online tools that can be used to predict restriction enzyme cutting sites, carry out BLAST searches, and construct phylogenetic trees. The barcoding information used by FishTrace includes a longer DNA sequence than that used in COI studies, and it has been argued that longer DNA barcodes will allow for more efficient identification (Sevilla and others 2007). Also, the combination of 2 genes that differ in genomic location and rate of evolution, such as mt cyt b and rhodopsin, was reported to be valuable for the efficiency of DNA barcoding.
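To make the idea of predicting restriction enzyme cutting sites concrete, the following Python sketch locates recognition sites in a linear DNA sequence and reports the fragment lengths that a digest would produce. It does not reproduce the FishTrace tools, whose implementation is not described here. The recognition sequences and cut positions for EcoRI, BamHI, and HindIII are standard, but the 35-bp sequence and the function names are hypothetical.

```python
# Recognition site and cut offset within the site (top strand) for each enzyme.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(sequence: str, enzyme: str) -> list:
    """Return 0-based positions in a linear sequence where the enzyme cuts."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # continue searching past this hit
    return positions


def fragment_lengths(sequence: str, enzyme: str) -> list:
    """Return the fragment lengths expected from a complete digest."""
    bounds = [0] + cut_positions(sequence, enzyme) + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 35-bp sequence containing two EcoRI sites and one HindIII site.
AMPLICON = "ATGCGAATTCTTAGGCAAGCTTACGGAATTCCGTA"

if __name__ == "__main__":
    for name in ENZYMES:
        print(name, cut_positions(AMPLICON, name), fragment_lengths(AMPLICON, name))
```

Because these recognition sites are palindromic, searching a single strand is sufficient. Species whose sequences differ at a recognition site yield different fragment patterns, which is the basis for using predicted cutting sites in species identification.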