Leite, J.A. orcid.org/0000-0002-3587-853X, Razuvayevskaya, O. orcid.org/0000-0002-7922-7982, Bontcheva, K. orcid.org/0000-0001-6152-9600 et al. (1 more author) (2024) EUvsDisinfo: a dataset for multilingual detection of pro-Kremlin disinformation in news articles. In: CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 33rd ACM International Conference on Information and Knowledge Management, 21-25 Oct 2024, Boise, Idaho, USA. Association for Computing Machinery, pp. 5380-5384. ISBN 9798400704369
Abstract
This work introduces EUvsDisinfo, a multilingual dataset of disinformation articles originating from pro-Kremlin outlets, along with trustworthy articles from credible, less biased sources. It is sourced directly from the debunk articles written by the experts who lead the EUvsDisinfo project. Our dataset is the largest resource of its kind to date, both in the total number of articles and in the number of distinct languages. It also provides the broadest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns in which certain disinformation topics are targeted. We further analyse the evolution of the topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability for training models that effectively distinguish between disinformation and trustworthy content in multilingual settings.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2024 Owner/Author. This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License. https://creativecommons.org/licenses/by-nd/4.0/ |
Keywords: | classification; dataset; disinformation; news articles; pro-Kremlin |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: UK Research and Innovation; grant numbers: 101070093, 10039055 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 13 Nov 2024 11:00 |
Last Modified: | 13 Nov 2024 11:00 |
Status: | Published |
Publisher: | Association for Computing Machinery |
Refereed: | Yes |
Identification Number: | 10.1145/3627673.3679167 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:219537 |