A Hybrid Knowledge and Transformer-Based Model for Event Detection with Automatic Self-Attention Threshold and Layer Selection

Thierry Desot, Orphée De Clercq and Véronique Hoste

A core task in information extraction is event extraction, which identifies event triggers in sentences and classifies them into event types. In this work, an event is considered the unit for measuring diversity and similarity in news articles in the framework of a news recommendation system. Current typology-based event extraction approaches fail to handle the variety of events expressed in real-world situations. To overcome this, we aim to perform event salience classification of new information into less and more general event prominence classes. Event detection and argument role detection are frequently conceived as separate tasks: a multi-word event is first reduced to its verb as a single-word event, after which its argument roles (subject, direct and indirect object(s)) and semantic roles (such as time and location) are extracted. These are typically trained in a multi-task setup for event extraction, i.e. the combination of event span detection and classification. In this study, we go beyond single-word events and tackle multi-word event extraction. On top of that, we conceive event span detection and argument extraction as one and the same task in a hybrid knowledge and transformer-based event detection method.
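A minimal sketch of this kind of decomposition, assuming a dependency parser such as spaCy with its Dutch model nl_core_news_sm (an illustrative choice, not necessarily the toolkit used in this work), could reduce a sentence to its verbal trigger plus subject and object(s) as follows:

```python
# Illustrative sketch (not this paper's implementation): reduce a multi-word
# event to its verbal trigger and core argument roles via a dependency parse.
# Assumes spaCy with the Dutch model "nl_core_news_sm" purely as an example.
import spacy

nlp = spacy.load("nl_core_news_sm")

def extract_backbone(sentence: str) -> dict:
    """Return the verb (single-word event) and its core arguments."""
    doc = nlp(sentence)
    backbone = {"verb": None, "subject": None, "objects": []}
    for token in doc:
        if token.pos_ == "VERB" and backbone["verb"] is None:
            backbone["verb"] = token.text
        if token.dep_ == "nsubj":
            backbone["subject"] = token.text
        if token.dep_ in ("obj", "iobj", "obl"):  # direct/indirect objects, obliques
            backbone["objects"].append(token.text)
    return backbone

print(extract_backbone("De minister kondigde gisteren nieuwe maatregelen aan."))
```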

Deep learning approaches, which in recent years were combined with Word2Vec, GloVe and fastText word embeddings, have led to the rise of the transformer architecture. Its contextual language models have been successfully integrated into a range of NLP tasks using pre-trained contextual BERT (Bidirectional Encoder Representations from Transformers) word embeddings. The main component of our hybrid event extraction method is based on automatic keyword extraction (AKE) using the self-attention mechanism of a Dutch BERT model, BERTje. Since a bottleneck for AKE is defining the threshold on the attention values to take into account, we propose a novel method for automatic self-attention threshold selection that exploits the interaction between self-attention-based AKE and rule-based event detection. The main function of the rule-based syntactic parser is to provide the necessary information for the automatic attention threshold mechanism, targeting only minimal event information, i.e. the backbone of the event: the verb and its arguments (subject and object). This allows the transformer main component to complement it with other semantic roles and semantically salient information. However, the latter type of information is in some cases essential to constitute the core meaning of the event.
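To illustrate how attention-based AKE and threshold selection could interact, the following sketch extracts per-token attention scores from the publicly available BERTje checkpoint (GroNLP/bert-base-dutch-cased via HuggingFace Transformers) and sets the threshold to the lowest score among the tokens of the rule-based backbone. The aggregation over heads and the threshold criterion are assumptions for illustration, not the exact implementation of our method:

```python
# Minimal sketch of attention-based keyword extraction (AKE) with BERTje and
# an automatic threshold chosen so that the rule-based backbone (subject,
# verb, object) remains covered. Model id, head/token aggregation and the
# threshold rule are illustrative assumptions; subword/word alignment is
# omitted for brevity.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased",
                                  output_attentions=True)

def attention_scores(sentence: str, layer: int = 6):
    """Per-token attention scores from one self-attention layer (averaged over heads)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions: tuple of (batch, heads, seq, seq) tensors, one per layer
    att = outputs.attentions[layer][0].mean(dim=0)  # (seq, seq), averaged over heads
    scores = att.mean(dim=0)                        # attention received by each token
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return list(zip(tokens, scores.tolist()))

def select_threshold(sentence: str, backbone_tokens: set, layer: int = 6) -> float:
    """Pick the highest threshold that still keeps every backbone token above it."""
    scored = attention_scores(sentence, layer)
    backbone_scores = [s for tok, s in scored if tok in backbone_tokens]
    return min(backbone_scores) if backbone_scores else 0.0

def extract_keywords(sentence: str, backbone_tokens: set, layer: int = 6):
    """Keep every token whose attention score reaches the automatic threshold."""
    threshold = select_threshold(sentence, backbone_tokens, layer)
    return [tok for tok, s in attention_scores(sentence, layer)
            if s >= threshold and tok not in tokenizer.all_special_tokens]
```

In this reading, the rule-based backbone anchors the threshold, while tokens that score above it (e.g. temporal or locative expressions) are kept as the additional semantically salient information mentioned above.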

Certain transformer self-attention layers and heads capture linguistic notions such as syntax and co-reference. According to several studies on the (English) BERT transformer, attention follows syntactic dependencies and subject-verb-object agreement most strongly in the middle layers of the model. Hence, we verify whether this also holds for the Dutch BERTje model. Based on these findings, we propose an automatic self-attention layer selection mechanism, analyzing which layers in the structure of the BERT transformer contribute most to hybrid event detection and which linguistic tasks they represent. On top of that, this approach was integrated into a pipeline event extraction approach and outperformed three state-of-the-art multi-task event extraction methods on event detection and classification.
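A possible sketch of such a layer selection mechanism, reusing the extract_keywords helper from the previous sketch and scoring each layer by token-level F1 against a small development set (an illustrative proxy, not necessarily the criterion used in this work), is given below:

```python
# Hedged sketch of automatic self-attention layer selection: score every layer
# by how well its attention-derived keywords recover known event tokens on a
# small development set, then keep the best-scoring layer. The token-level F1
# proxy and the development-set protocol are assumptions for illustration.
def f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall) if tp else 0.0

def select_layer(dev_set, backbone_fn, num_layers: int = 12) -> int:
    """dev_set: iterable of (sentence, gold_event_tokens) pairs."""
    best_layer, best_score = 0, -1.0
    for layer in range(num_layers):
        scores = []
        for sentence, gold_tokens in dev_set:
            backbone = set(backbone_fn(sentence))  # rule-based SVO backbone tokens
            predicted = set(extract_keywords(sentence, backbone, layer=layer))
            scores.append(f1(predicted, set(gold_tokens)))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_layer, best_score = layer, mean_score
    return best_layer
```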