A tool for exploring corpora from a semantic perspective 

Katrien Beuls and Paul Van Eecke

Existing tools that support corpus-based linguistic research, such as GrETEL (Augustinus et al. 2012), provide an excellent means to search through text corpora using either morpho-syntactic patterns that can be partially instantiated or example utterances that instantiate a specific morpho-syntactic structure. In essence, these tools enable users to find corpus observations that instantiate the morpho-syntactic patterns they have in mind. While this is certainly useful for many purposes, the fact that users need to specify beforehand the morpho-syntactic patterns they are looking for inherently limits the applicability of such tools. In particular, these tools cannot be used to find all observed morpho-syntactic realisations of a particular semantic structure, as they search for corpus observations based on their form rather than on their meaning.

Complementary to search tools that focus on finding instantiations of morpho-syntactic patterns, we present a novel tool that enables users to search for corpus observations that instantiate a particular semantic structure. For example, a user might want to search for corpus observations in which the transfer sense of the verb ‘give’ is expressed together with an agent (i.e. a giver), a theme (i.e. a thing being given) and a beneficiary (i.e. a receiver). The tool then retrieves corpus examples that instantiate this semantic structure through any morpho-syntactic realisation, e.g. ‘scholars will give you a detailed analysis’, ‘he gives priority to diplomacy or internal affairs’ or ‘the Spirit gives to one person the power to do miracles’.

The tool provides a user-friendly interface through which the user can define a semantic structure of interest based on the PropBank rolesets (https://propbank.github.io). Optionally, form-related constraints can be included, in particular constraints on the order in which the semantic roles are realised, the morpho-syntactic means through which one or more of the semantic roles are expressed, or the exact strings that appear in the instantiations of the semantic roles.
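The abstract does not describe the tool's internal query format. As a rough sketch only, the query options listed above (a PropBank roleset with required roles, plus optional constraints on role order, morpho-syntactic means and exact strings) could be modelled as follows, assuming that utterances have already been annotated with rolesets. In the actual tool that annotation comes from its construction grammar-based analysis, which is not reproduced here. All class and function names (`SemanticQuery`, `FrameObservation`, `matches`, `search`), the phrase-type labels and the token offsets are hypothetical, and the three annotated examples are hand-made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoleFiller:
    """One semantic role as realised in an utterance (hypothetical annotation format)."""
    role: str          # PropBank role label, e.g. "ARG0"
    text: str          # exact string realising the role
    start: int         # index of the first token of the realisation
    phrase_type: str   # illustrative morpho-syntactic label, e.g. "NP" or "PP"


@dataclass
class FrameObservation:
    """A corpus utterance annotated with one PropBank roleset instance."""
    utterance: str
    roleset: str                 # e.g. "give.01"
    fillers: list[RoleFiller]


@dataclass
class SemanticQuery:
    """A roleset with required roles, plus optional form-related constraints."""
    roleset: str
    roles: set[str]
    role_order: Optional[list[str]] = None                         # required surface order
    phrase_types: dict[str, str] = field(default_factory=dict)     # role -> phrase type
    exact_strings: dict[str, str] = field(default_factory=dict)    # role -> exact string


def matches(query: SemanticQuery, obs: FrameObservation) -> bool:
    if obs.roleset != query.roleset:
        return False
    by_role = {f.role: f for f in obs.fillers}
    # Semantic constraint: every requested role must be expressed, in any form.
    if not query.roles.issubset(by_role):
        return False
    # Optional form constraint 1: the listed roles appear in this surface order.
    if query.role_order is not None:
        if any(r not in by_role for r in query.role_order):
            return False
        starts = [by_role[r].start for r in query.role_order]
        if any(a >= b for a, b in zip(starts, starts[1:])):
            return False
    # Optional form constraint 2: morpho-syntactic means per role.
    for role, ptype in query.phrase_types.items():
        if role not in by_role or by_role[role].phrase_type != ptype:
            return False
    # Optional form constraint 3: exact strings per role.
    for role, text in query.exact_strings.items():
        if role not in by_role or by_role[role].text != text:
            return False
    return True


def search(query: SemanticQuery, corpus: list[FrameObservation]) -> list[str]:
    return [obs.utterance for obs in corpus if matches(query, obs)]


# The three 'give' examples from the abstract, hand-annotated for illustration only.
CORPUS = [
    FrameObservation("scholars will give you a detailed analysis", "give.01", [
        RoleFiller("ARG0", "scholars", 0, "NP"),
        RoleFiller("ARG2", "you", 3, "NP"),
        RoleFiller("ARG1", "a detailed analysis", 4, "NP")]),
    FrameObservation("he gives priority to diplomacy or internal affairs", "give.01", [
        RoleFiller("ARG0", "he", 0, "NP"),
        RoleFiller("ARG1", "priority", 2, "NP"),
        RoleFiller("ARG2", "to diplomacy or internal affairs", 3, "PP")]),
    FrameObservation("the Spirit gives to one person the power to do miracles", "give.01", [
        RoleFiller("ARG0", "the Spirit", 0, "NP"),
        RoleFiller("ARG2", "to one person", 3, "PP"),
        RoleFiller("ARG1", "the power to do miracles", 6, "NP")]),
]

if __name__ == "__main__":
    q = SemanticQuery("give.01", {"ARG0", "ARG1", "ARG2"},
                      role_order=["ARG0", "ARG2", "ARG1"], phrase_types={"ARG2": "PP"})
    print(search(q, CORPUS))  # ['the Spirit gives to one person the power to do miracles']
```

A query with only the roleset and the three roles returns all three utterances, regardless of form; each optional constraint then narrows the result set. Keeping the semantic constraint separate from the form constraints mirrors the idea that form is optional and layered on top of meaning.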

At CLIN, we will first discuss the design of the tool and its underlying construction grammar-based semantic analysis technology. Then, we will present a case study, in which we show that the tool makes it possible to tackle research questions that were previously difficult to operationalise. In particular, we will focus on finding corpus observations of unexpected morpho-syntactic realisations of English argument structure patterns.

A beta version of the tool is available at https://ehai.ai.vub.ac.be/ccxg-explorer.

References

Augustinus, L., Vandeghinste, V., and Van Eynde, F. (2012). “Example-Based Treebank Querying”. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC-2012). Istanbul, Turkey, pp. 3161-3167.