Modeling the Lifespan of Discourse Entities with Application to Coreference Resolution
A discourse typically involves numerous entities, but few are mentioned more than once. Distinguishing
those that die out after just one mention (singletons) from those that lead longer lives
(coreferent entities) would dramatically simplify the hypothesis space for coreference resolution models,
leading to increased performance. To realize these gains, we build a classifier for predicting the
singleton/coreferent distinction. The model’s feature representations synthesize linguistic insights
about the factors affecting discourse entity lifespans (especially negation, modality, and attitude
predication) with existing results about the benefits of “surface” (part-of-speech and n-gram-based)
features for coreference resolution. The model is effective in its own right, and the feature representations
help to identify the anchor phrases in bridging anaphora as well. Furthermore, incorporating
the model into two very different state-of-the-art coreference resolution systems, one rule-based and
the other learning-based, yields significant performance improvements.
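
As a rough illustration of the idea, the sketch below scores each mention with a logistic model over binary features and drops likely singletons before coreference resolution. It is a minimal sketch under stated assumptions, not the model described here: the feature names, the weights, the bias, the threshold, and the filtering strategy are all hypothetical and chosen only to make the example run.

```python
import math

# Hypothetical feature weights, for illustration only; they are not
# learned from data and are not taken from the paper.
WEIGHTS = {
    "under_negation": -1.0,   # mention inside the scope of negation
    "under_modal": -0.6,      # mention inside the scope of a modal
    "is_indefinite": -0.8,    # indefinite noun phrase
    "is_pronoun": 2.0,        # pronominal mention
}
BIAS = 0.2  # hypothetical intercept


def coreferent_probability(features):
    """Return the logistic score that a mention's entity is coreferent."""
    score = BIAS + sum(WEIGHTS.get(name, 0.0) for name in features)
    return 1.0 / (1.0 + math.exp(-score))


def filter_mentions(mentions, threshold=0.3):
    """Keep mentions whose coreferent probability is at least `threshold`."""
    return [
        m for m in mentions
        if coreferent_probability(m["features"]) >= threshold
    ]


# Toy input: each mention carries a set of active binary features.
mentions = [
    {"text": "she", "features": {"is_pronoun"}},
    {"text": "a car", "features": {"is_indefinite", "under_negation"}},
]
kept = filter_mentions(mentions)
print([m["text"] for m in kept])
```

A downstream coreference system would then consider only the mentions in `kept`, which is one simple way a singleton predictor could shrink the hypothesis space; the threshold trades off pruning against the risk of discarding genuinely coreferent mentions.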