Towards Fine(r)-Grained Identification of Coreference Resolution Types

Loic De Langhe, Orphée De Clercq and Veronique Hoste

In recent years, attention in coreference resolution studies has gradually shifted from traditional entity-based coreference resolution to event-based coreference resolution (ECR), where the aim is to determine whether two events or happenings described in text in fact refer to the same (real-world) occurrence [Lu and Ng, 2018]. An important aspect of entity coreference studies has been the distinction between different types of coreference relations. Traditionally, three distinct categories of entity coreference have been identified: the identity relation (where the two (named) entities refer to exactly the same person or object in the real world), the part-whole relation (where one entity is only a part of the other, overarching entity, e.g. “The president” – “The government”) and the type/token relation (in which the anaphor does not refer to the same entity as its antecedent but to one of a similar description) [Hoste, 2005]. In contrast, current studies in event coreference resolution have focused exclusively on the identity relation between events, even though a solid case can be made that other relationships exist between textual events. For instance, one can argue that, given the proper context, an event such as “The opening speech” is a part of “The Oscars”, a nuance that is currently overlooked in virtually all ECR research. In this preliminary study, we propose to investigate different coreferential relations between events in Dutch. We believe that defining more fine-grained coreference types can be a first step towards integrating ECR algorithms into practical applications such as content-based news recommendation systems and fake news detection. In addition, introducing specific relations between event mentions can be of great benefit to language understanding in general, especially at the discourse level. We present a recently developed event dataset that distinguishes between the identity and part-whole relationships at the event level, providing us with an excellent opportunity to investigate the possibility of fine-grained event coreference resolution.
Concretely, we aim to explore coreferential event types both in theory and in practice. In our theoretical section we explain the distinction between identity and part-whole relationships for events and discuss possible advantages of adding more fine-grained relations for coreferential relationships between events. For this theoretical discussion, we draw primarily on advancements in entity coreference resolution [Ng, 2017] and on fundamental literature on events themselves [Quine, 1985]. We then perform a classification task in which the coreference type (identity/part-whole) is predicted for coreferential events. Our initial experiments with Dutch and multilingual transformer-based models show promise (macro F1: 0.81). However, since coreferential relations are mainly situated at the discourse level of the text, we aim to expand upon the current transformer models by enriching them with discourse and meta-linguistic features.
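
To make the reported metric concrete, the following is a minimal sketch of how a macro-averaged F1 score can be computed for the two-way identity/part-whole classification. It is an illustration only: the function name `macro_f1`, the label strings and the toy gold and predicted labels are assumptions introduced here, not taken from our dataset or experimental code.

```python
# Illustrative sketch only: label names and toy data are invented for this example.
LABELS = ("identity", "part-whole")


def macro_f1(gold, pred, labels=LABELS):
    """Return the unweighted mean of the per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted relation types for five coreferential event pairs.
gold = ["identity", "identity", "part-whole", "part-whole", "identity"]
pred = ["identity", "part-whole", "part-whole", "part-whole", "identity"]
print(round(macro_f1(gold, pred), 3))
```

Because macro averaging weights both relation types equally, it does not let a frequent class mask poor performance on a rarer one, which matters if part-whole pairs are less common than identity pairs.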