A new access method, called M-tree, is proposed
to organize and search large data sets
from a generic “metric space”, i.e. where object
proximity is only defined by a distance
function satisfying the positivity, symmetry,
and triangle inequality postulates. We detail
algorithms for insertion of objects and split
management, which keep the M-tree always
balanced - several heuristic split alternatives
are considered and experimentally evaluated.
Algorithms for similarity (range and k-nearest
neighbors) queries are also described. Results
from extensive experimentation with a
prototype system are reported, considering as
the performance criteria the number of page
I/O’s and the number of distance computations.
The results demonstrate that the Mtree
indeed extends the domain of applicability
beyond the traditional vector spaces,
performs reasonably well in high-dimensional
data spaces, and scales well in case of growing
files.