Singh, I., Scarton, C. orcid.org/0000-0002-0103-4072, Song, X. orcid.org/0000-0002-4188-6974 et al. (1 more author) (Submitted: 2023) Finding already debunked narratives via multistage retrieval: enabling cross-lingual, cross-dataset and zero-shot learning. [Preprint - arXiv] (Submitted)
Abstract
The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual effort of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual task, i.e., the retrieval of fact-checking articles in a language different from the language of the online post being checked. This paper fills this gap by (i) creating a novel dataset to enable research on cross-lingual retrieval of already debunked narratives, using tweets as queries to a database of fact-checking articles; (ii) presenting an extensive experiment to benchmark fine-tuned and off-the-shelf multilingual pre-trained Transformer models for this task; and (iii) proposing a novel multistage framework that divides this cross-lingual debunk retrieval task into refinement and re-ranking stages. Results show that the task of cross-lingual retrieval of already debunked narratives is challenging and that off-the-shelf Transformer models fail to outperform a strong lexical-based baseline (BM25). Nevertheless, our multistage retrieval framework is robust, outperforming BM25 in most scenarios and enabling cross-domain and zero-shot learning, without significantly harming the model's performance.
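The refinement and re-ranking split mentioned in the abstract can be illustrated with a minimal, monolingual sketch. This is not the authors' implementation: the record does not describe their models, so the function names (`refine`, `rerank`, `jaccard`), the BM25 parameters, and the toy data are all assumptions, and the simple token-overlap scorer only stands in for the multilingual Transformer re-ranker that the paper would use.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would need language-aware tokenization.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 score of every document for the query (k1 and b are assumed defaults).
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def refine(query, docs, top_k):
    # Stage 1 (refinement): keep the indices of the top_k documents by BM25.
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def rerank(query, docs, candidates, scorer):
    # Stage 2 (re-ranking): reorder only the shortlisted candidates with a second scorer.
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)


def jaccard(query, doc):
    # Hypothetical stand-in scorer; a neural cross-lingual model would replace this.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


# Toy data, invented for illustration only.
articles = [
    "vaccine causes illness claim is false",
    "5g towers do not spread the virus",
    "drinking bleach does not cure the virus and is dangerous",
    "old photo of flood shared as recent",
]
tweet = "5g towers spread the virus"

candidates = refine(tweet, articles, top_k=2)
ranking = rerank(tweet, articles, candidates, jaccard)
```

The point of the split is cost: the cheap lexical stage narrows the database to a short candidate list, so the more expensive scorer only has to run on a handful of articles per query.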
Metadata
Item Type: | Preprint |
---|---|
Copyright, Publisher and Additional Information: | © 2023 The Author(s). For reuse permissions, please contact the Author(s). |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Feb 2025 09:14 |
Last Modified: | 14 Feb 2025 10:17 |
Published Version: | https://arxiv.org/abs/2308.05680v1 |
Status: | Submitted |
Identification Number: | 10.48550/arXiv.2308.05680 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:223245 |