1. INTRODUCTION
The consensus tree problem has been rst described in [1]
alongside with a solution method. It frequently appears in
the domain of phylogenetics [13], though it has applications
in other clustering domains as well.
A phylogenetic tree is composed of nodes and branches
(arcs) and models the evolutionary relationship between a
set L of related objects called taxa. These taxa are the labeled
leaf nodes of the tree whereas unlabeled inner nodes
represent probably extinct ancestors. In this work we will
only consider rooted unweighted binary trees, i.e. there exists
a single distinguished root node being the common ancestor
of all taxa, the relations represented by the tree are
not weighted by any means, and each inner node always has
exactly two direct descendants.
The phylogeny problem is to infer the intermediate ancestors
and branches, thus to derive the evolutionary relationships,
from given species data like DNA or RNA sequences.
For this purpose a multitude of conceptually dierent inference
methods exists, each having individual advantages and
drawbacks [15].
Unfortunately, it is likely that biologists end up with several
dierent trees for one and the same taxa set L due to
(i) having multiple data sets available, (ii) inferring with
dierent methods, or (iii) repeated runs of the same nondeterministic
method.
The question is then how to take advantage of these probably
high-quality but dierent and partly contradictory trees
beside manually comparing and merging them. A possible
solution is to look for a single tree over L est" representing
the collection T . This non-trivial task is known as the
consensus tree problem (CTP).
On the one hand the meaning of est" depends on the desired
information to retain in the consensus tree and on the
other hand the possible consensus tree is literally restricted
by the degree of strictness of the applied method as well as
the granularity of the tree metric used, see also [3]. Figure 1
shows a schematic representation of this circumstance. Generally,
a strict method and a coarse-grained metric rather
lead to poorly resolved trees with few inner nodes having
high degrees, and a substantial portion of the information
contained in the input trees is lost. In contrast, we aim at
deriving fully resolved (thus, binary) high-quality consensus
trees inheriting as much information as possible. Our
approach is therefore based on maximizing a more specic
ne-grained measure, the so-called TreeRank score.
In Section 2 we report on previous as well as related work
and dene the TreeRank measure. Meaningful tree neigh-