In this article we have studied the problem of ordering leaves of the tree
representing hierarchical clustering of the gene expression profiles. We
have presented an efficient Θ(n
3
) algorithm solving the problem, which
improved on an existing Θ(n
4
) algorithm for the same problem. Our algorithm
is suitable for different distance measures and various optimization
criteria, and this approach could be applied to other tree-ordering problems
unrelated to gene expression arrays. For example, one could use it to
reorder the subtrees of a phylogeny tree under some similarity measure.
In the second part of the article we have presented results of a small
experimental study. The results from all of our experiments point to one
conclusion: different orderings may create slightly different results, but no
one ordering seems to be superior. Quantitatively, sharp differences can
be seen, with TSP and optimal ordering outperforming Eisen’s heuristic
and random ordering over several distance metrics. However, these advantages,
particularly in the case of TSP, may have to be examined in
light of qualitative results. Qualitatively, differences are subtle, and while
we can identify individual variations on individual data sets, it is diffi-
cult to make general determinations. We have observed an example where
the optimal ordering produces biologically more meaningful results than
Eisen’s heuristic. Several more data sets with significant qualitative improvement
have been observed in [2]. Of course, qualitative improvement
cannot be expected in all cases.