The distance distribution of this dataset is skewed. DNA protein symbol sequences of length sixteen. The sequences are compared by a weighted edit distance according to the Needleman–Wunsch algorithm (Needleman & Wunsch, 1970). This distance function has a very limited set of possible values: it returns only integers between 0 and 100. Observe that none of these datasets can be efficiently indexed and searched b