The following system description presents our approach to the detection of fake news in texts. The given task has been framed as a multi-class classification problem. In a multi-class classification problem, each input chunk is assigned one of several class labels.
To dissect content patterns in the training data, we made use of topic modeling. Topic modeling techniques such as Latent Dirichlet Allocation (LDA) are unsupervised algorithms that pick up on patterns and provide an estimate of what the messages convey.
In order to assign class labels to the given documents, we opted for RoBERTa (A Robustly Optimized BERT Pretraining Approach) and Longformer as neural network architectures for sequence classification. Starting off with a pre-trained model for language representation, we fine-tuned this model on the given classification task with the provided annotated data in supervised training steps. In a hierarchical approach, the training of a classifier took place at topic level.
«The following system description presents our approach to the detection of fake news in texts. The given task has been framed as a multi-class classification problem. In a multi-class classification problem, each input chunk is assigned one of several class labels.
To dissect content patterns in the training data, we made use of topic modeling. Topic modeling techniques such as Latent Dirichlet Allocation (LDA) are unsupervised algorithms that pick up on patterns and provide an estimate of wha...
»