Althabiti, S, Alsalka, M and Atwell, E orcid.org/0000-0001-9395-3764 (2021) SCUoL at CheckThat! 2021: An AraBERT model for check-worthiness of Arabic tweets. In: CEUR Workshop Proceedings. CLEF 2021 - Conference and Labs of the Evaluation Forum, 21-24 Sep 2021, Bucharest, Romania. CEUR Workshop Proceedings, pp. 430-434.
Abstract
Many people now turn to social media for news and for information about events and activities. However, a large volume of misleading and false information spreads every day, for many purposes, and it has a dramatic impact on societies. It is therefore vitally important to identify false information on social media, both to help individuals distinguish the truth and to protect communities from its harmful effects. Deciding which information should be prioritised for scrutiny is a significant preliminary step that several studies have considered. In this paper, we address Subtask 1A (Arabic) of the CLEF 2021 CheckThat! Lab in two steps. The first step involved pre-processing the provided dataset with text segmentation and tokenization. In the second step, we applied different models to the Arabic tweets to classify each one, as a binary decision, as worth considering for fact-checking or not. We mainly compared two versions of the pre-trained AraBERT model with some traditional word encoding methods, including the Linear SVC model with TF-IDF. The results indicate that the AraBERTv2 version outperforms the other models. Consequently, we used it for our final submission, and we were ranked third among eight other participating teams.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Althabiti, S; Alsalka, M; Atwell, E (orcid.org/0000-0001-9395-3764) |
Copyright, Publisher and Additional Information: | © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
Keywords: | AraBERTv2, AraBERTv0.2, Check-worthiness, Fact-check, CheckThat Lab |
Dates: |
|
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 29 Nov 2021 12:42 |
Last Modified: | 29 Nov 2021 12:42 |
Status: | Published |
Publisher: | CEUR Workshop Proceedings |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:180924 |