The necessary first step in the analysis of nsSNPs is to establish whether a given SNP is indeed non-synonymous. For this purpose, we map SNPs onto known proteins on the basis of the DNA sequences flanking each SNP.
Flanking genomic sequences of SNPs from HGVbase, each 25 bp long, were translated in all six possible reading frames and searched against the human protein subset of the SWALL database. Protein sequences and genomic fragments were pre-processed with the SEG, XNU, RepeatMasker and Dust programs, which are used to filter out regions of low compositional complexity, regions containing internal repeats of short periodicity, and known human genomic repeat sequences. Alu subfamily proteins were also excluded from the set. We required that at least one translated flanking sequence have an exact match with a database protein sequence. If such a match was detected, we further required that the second flanking sequence either have an exact match with the protein sequence or match the protein sequence at all positions up to the end of the protein or up to a conventional exon/intron border.
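The first matching requirement (six-frame translation of a flanking sequence followed by an exact search in a protein) can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the function names, the example flank and the example protein are hypothetical, the standard genetic code is assumed, and the masking step and the second-flank rule (match up to the protein end or an exon/intron border) are not modelled.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (assumption: no
# alternative codon tables are needed for this illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the peptide translated in that reading frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def exact_matches(flank, protein):
    """List (strand, offset, position) for frames found verbatim in the protein."""
    hits = []
    for (strand, offset), peptide in six_frame_translations(flank).items():
        position = protein.find(peptide) if peptide else -1
        if position >= 0:
            hits.append((strand, offset, position))
    return hits


# Hypothetical 25 bp flank and toy protein, for illustration only.
flank = "ATGGCCAAGCTGACCGGTGAATGGA"
protein = "SSMAKLTGEWPP"
print(exact_matches(flank, protein))
```

Because a translated frame containing a stop codon (`*`) can never occur inside a protein sequence, such frames are rejected by the exact search without a separate check.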