The Public Data Portal of BOLD (Ratnasingham, 2007) and Core Nucleotide
database of GenBank were searched for COI (Cytochrome C
Oxidase 1) barcode sequences of Indian freshwater fishes. The data
were retrieved using the Boolean operator ‘AND’ to combine two terms from
different contexts (taxonomic: Order and geographic: India), thereby
extracting only records that matched both terms. Sequences from
both databases were compiled and duplicate records
were removed, yielding a set of 1413 barcode sequences for 179
species. Sequences of length >600 bp, with no missing nucleotides or
gaps, were included, thereby reducing the possibility of NUMTs (nuclear
DNA originating from mitochondrial DNA sequences) (Zhang and
Hewitt, 1996), and aligned using Clustal Omega (Sievers et al., 2011).
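
The compilation and inclusion criteria described so far (duplicates removed, length >600 bp, no missing nucleotides or gaps) can be illustrated with a minimal Python sketch. It is not the procedure actually used for this dataset: the record layout, the field names `accession` and `sequence`, and the use of the accession as the duplicate key are assumptions made for illustration, and the sketch applies the quality filter while merging rather than as a separate step.

```python
from typing import Dict, Iterable, List

VALID_BASES = set("ACGT")   # anything else (N, '-', IUPAC codes) is rejected
MIN_LENGTH = 600            # sequences must be strictly longer than this (bp)


def is_clean(seq: str) -> bool:
    """Return True if seq is >600 bp and contains only A, C, G and T."""
    seq = seq.upper()
    return len(seq) > MIN_LENGTH and set(seq) <= VALID_BASES


def compile_records(*sources: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge records from several databases, drop duplicates, keep clean ones.

    Duplicates are detected by the hypothetical 'accession' field; the first
    occurrence of each accession is the one that is evaluated.
    """
    seen = set()
    kept = []
    for source in sources:
        for rec in source:
            key = rec["accession"]
            if key in seen:
                continue  # same record already seen in an earlier source
            seen.add(key)
            if is_clean(rec["sequence"]):
                kept.append(rec)
    return kept
```
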
Suspected erroneous sequences, identified on a Neighbor-Joining tree by
highly unlikely positions (species clustering with a different family or
order) or extreme branch lengths, were omitted. The COI coding
DNA sequences were translated using MEGA 5.1 and aligned with the
available COI amino acid sequences to ensure the presence of an open
reading frame (Tamura et al., 2011). The sequences were trimmed at both
ends to exclude any gaps, and a final set of 1383 consensus barcode
sequences, each 503 bp long, for 175 species was used for analysis.
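
The reading-frame check and the end trimming can be sketched in a similar way. The following Python fragment is only an illustration and does not reproduce the MEGA 5.1 workflow: it assumes the vertebrate mitochondrial genetic code (stop codons TAA, TAG, AGA and AGG), treats a sequence as having an open reading frame when no in-frame stop codon occurs, and trims an alignment to the span between the first and last gap-free columns. The function names are hypothetical.

```python
from typing import List

# Stop codons of the vertebrate mitochondrial genetic code (assumed here).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def has_open_reading_frame(seq: str, frame: int = 0) -> bool:
    """Return True if no stop codon occurs in the given reading frame."""
    seq = seq.upper().replace("-", "")
    # Only complete codons are examined; a trailing partial codon is ignored.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return all(codon not in VERT_MITO_STOPS for codon in codons)


def trim_alignment_ends(aligned: List[str]) -> List[str]:
    """Trim both ends of an alignment back to the outermost gap-free columns."""
    length = len(aligned[0])
    gap_free = [i for i in range(length)
                if all(s[i] != "-" for s in aligned)]
    if not gap_free:
        return ["" for _ in aligned]
    start, end = gap_free[0], gap_free[-1] + 1
    return [s[start:end] for s in aligned]
```
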
Among them, 172 sequences for North-East Indian freshwater fishes
were generated following the protocol described below.