Legal-RobBERT: A Dutch BERT model specific to the legal domain

Thomas Boer, Jesse Tijsterman and Lan Chu

Dutch pre-trained BERT language models such as BERTje and RobBERT have achieved impressive performance on several natural language processing tasks. However, domain-specific Dutch language models have received limited attention, and Dutch BERT models tailored to the legal domain have not been investigated at all. In this study, we therefore focus on the legal domain and introduce a Dutch language model called Legal-RobBERT. Two variants of Legal-RobBERT were created. Legal-RobBERT(FP) was created by further pretraining RobBERT on 1 GB of Dutch law text. Legal-RobBERT(SC) was created by training the RobBERT architecture from scratch, without the pretrained weights, on the same corpus. The performance of these models was compared on a custom downstream semantic role labelling (SRL) task. Furthermore, we compared the performance of two Dutch BERT models (RobBERT and BERTje) on the same SRL task. Lastly, an extensive investigation of optimal hyperparameters was conducted for RobBERT on the SRL task.
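
To illustrate the further-pretraining setup behind Legal-RobBERT(FP), the sketch below shows how RobBERT could be further pretrained on a Dutch legal corpus with the masked language modelling objective, using the Hugging Face Transformers and Datasets libraries. This is a minimal illustration, not the authors' actual pipeline: the corpus path and the hyperparameters are placeholders, and the publicly released RobBERT checkpoint name is an assumption.

```python
# Minimal sketch of further pretraining RobBERT on Dutch legal text with
# masked language modelling (MLM). Paths and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load the publicly released RobBERT checkpoint and its tokenizer
# (assumed checkpoint name on the Hugging Face Hub).
checkpoint = "pdelobelle/robbert-v2-dutch-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Plain-text corpus of Dutch law text (placeholder file path).
dataset = load_dataset("text", data_files={"train": "dutch_law_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking for the MLM objective (standard 15% masking probability).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="legal-robbert-fp",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
```

The from-scratch variant, Legal-RobBERT(SC), would differ only in initialization: instead of loading pretrained weights, the model would be built from a randomly initialized configuration (e.g. `RobertaForMaskedLM(RobertaConfig())`) and trained on the same corpus.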