Cedric Waterschoot, Antal van den Bosch and Ernst van den Hemel
Moderating online comments has proven to be an extremely challenging task. Moderators face a fast-growing quantity of incoming posts while simultaneously aiming to create a constructive environment (Delgado 2019). Deliberation has been a cornerstone of the ‘constructive’ framework, and heterogeneous environments have a role to play in this deliberative goal (Suiter 2016; Wright 2007). To create such heterogeneous environments, moderators must simultaneously recognize different, often nuanced arguments and balance the discussion to give a clear overview of all argumentative positions. This task becomes even more demanding in polarized debates, such as the one concerning climate change.
With this goal in mind, we have (1) trained classifiers that aim to correctly identify all arguments within the climate change discussion and (2) analyzed the distinct vocabulary of each argument. As mentioned earlier, exposure to conflicting viewpoints can boost deliberation (Suiter 2016).
We specifically focused on minority arguments in order to facilitate heterogeneous discussion. To illustrate the approach, we conducted a case study on the topic of climate change with posts from NUjij, the comment platform of the Dutch online newspaper NU.nl.
We created a dataset (n=3000) of comments, including deleted ones, derived from articles carrying the [klimaat] (climate) tag. We annotated each comment for the specific argument it contains. The annotation scheme was constructed on the basis of the extensive literature on the climate change debate and comprises seven arguments, alongside a ‘non-argumentative/off-topic’ class. Due to the scarcity of the minority arguments, initial models underperformed. We therefore constructed an active learning approach with the goal of obtaining unlabeled data that includes relatively more minority viewpoints (Zhao 2006). The approach, based on the ‘query-by-committee’ method, assigns an uncertainty value to each unlabeled comment and is trained on the BERT embeddings from the initial RobBERT model (Delobelle 2020). In total, we selected two waves of 1,000 comments each from 20,000 unseen climate comments. These uncertain posts were subsequently labelled by an annotator and added to the training data. This procedure not only significantly boosted the presence of minority climate change arguments in our dataset, but also improved the macro F1 scores of our classifiers on the validation data.
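A minimal sketch of such a query-by-committee selection step is given below, assuming RobBERT embeddings as input features, a bootstrap-trained committee of logistic regression classifiers, and vote entropy as the uncertainty measure. The model identifier, committee size, and helper names (embed, vote_entropy_ranking) are illustrative assumptions rather than the exact configuration used in the study.

```python
# Sketch: query-by-committee uncertainty sampling over RobBERT embeddings.
# Model id, committee size, and batch size are illustrative assumptions.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

MODEL_ID = "pdelobelle/robbert-v2-dutch-base"  # assumed RobBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(texts, batch_size=32):
    """Mean-pool RobBERT token embeddings into one vector per comment."""
    vecs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            enc = tokenizer(texts[i:i + batch_size], padding=True,
                            truncation=True, max_length=512,
                            return_tensors="pt")
            out = encoder(**enc).last_hidden_state
            mask = enc["attention_mask"].unsqueeze(-1)
            vecs.append(((out * mask).sum(1) / mask.sum(1)).numpy())
    return np.vstack(vecs)

def vote_entropy_ranking(X_lab, y_lab, X_unlab, committee_size=5, seed=0):
    """Train a bootstrap committee and rank unlabeled items by vote entropy."""
    classes = np.unique(y_lab)
    votes = np.zeros((len(X_unlab), len(classes)))
    for m in range(committee_size):
        Xb, yb = resample(X_lab, y_lab, random_state=seed + m)
        clf = LogisticRegression(max_iter=1000).fit(Xb, yb)
        preds = clf.predict(X_unlab)
        for c_idx, c in enumerate(classes):
            votes[:, c_idx] += (preds == c)
    p = votes / committee_size
    with np.errstate(divide="ignore", invalid="ignore"):
        entropy = -np.sum(np.where(p > 0, p * np.log(p), 0.0), axis=1)
    return np.argsort(-entropy)  # most uncertain comments first

# Usage: select one wave of 1,000 uncertain comments for annotation.
# X_lab, X_unlab = embed(labeled_texts), embed(unlabeled_texts)
# wave = [unlabeled_texts[i]
#         for i in vote_entropy_ranking(X_lab, labels, X_unlab)[:1000]]
```

Ranking by vote entropy surfaces the comments on which the committee disagrees most, which is where rare, minority arguments are most likely to appear.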
Our second goal was to analyze what sets each argument apart from the others. We compared each argumentative subcorpus to the remaining arguments by extracting all patterns and differentiating them using log-likelihood (Rayson 2010). We find that each argument can be characterized by repeated patterns that are underrepresented in the other subcorpora. These vocabularies can serve as a summary of, and entry point into, the nuanced minority arguments that make up the climate change debate on online platforms.
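The comparison can be sketched as follows. This is a simplified illustration assuming unigram counts and a Rayson-style log-likelihood score, with one argument subcorpus compared against the pooled remaining comments; the comments_by_argument structure and the argument label in the usage example are hypothetical.

```python
# Sketch: log-likelihood keyness comparison between one argument subcorpus
# and the pooled remaining comments. Unigram tokenisation is an assumption;
# the study extracted patterns more generally.
import math
from collections import Counter

def log_likelihood(a, b, n1, n2):
    """Log-likelihood for one term with observed counts a, b in corpora of size n1, n2."""
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keyness(target_docs, reference_docs, top_n=25):
    """Rank terms that are over-represented in the target subcorpus."""
    tgt = Counter(tok for doc in target_docs for tok in doc.lower().split())
    ref = Counter(tok for doc in reference_docs for tok in doc.lower().split())
    n_tgt, n_ref = sum(tgt.values()), sum(ref.values())
    scored = []
    for term, a in tgt.items():
        b = ref.get(term, 0)
        # keep only terms relatively more frequent in the target subcorpus
        if n_ref == 0 or a / n_tgt > b / n_ref:
            scored.append((term, log_likelihood(a, b, n_tgt, n_ref)))
    return sorted(scored, key=lambda x: -x[1])[:top_n]

# Usage: characteristic vocabulary of one (hypothetical) argument vs. the rest.
# vocab = keyness(comments_by_argument["economic"],
#                 [c for arg, cs in comments_by_argument.items()
#                  if arg != "economic" for c in cs])
```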
By tagging argumentative messages and presenting the distinctive vocabulary of each argument, we aim to boost mutual understanding in the climate change debate. Moderators can present all arguments within a discussion and encourage users with differing opinions to interact and discuss their viewpoints. Furthermore, moderators can connect those who reject mainstream viewpoints with accessible, well-argued content.