Althabiti, S. orcid.org/0000-0002-4646-0577, Alsalka, M.A. orcid.org/0000-0003-3335-1918 and Atwell, E. orcid.org/0000-0001-9395-3764 (2023) Generative AI for Explainable Automated Fact Checking on the FactEx: A New Benchmark Dataset. In: Ceolin, D., Caselli, T. and Tulin, M. (eds.) Disinformation in Open Online Media. 5th Multidisciplinary International Symposium, MISDOOM 2023, 21-22 Nov 2023, Amsterdam, The Netherlands. Lecture Notes in Computer Science, 14397. Springer, pp. 1-13. ISBN 9783031478956
Abstract
The immense volume of online information has made verifying the credibility of claims more complex, increasing interest in automatic fact-checking models that classify evidence into binary or multi-class verdicts. However, few studies address predicting textual verdicts that explain a claim's credibility, i.e., generating a textual verdict for a given claim based on a given news article. This paper presents our three-fold contribution to this field. First, we collected FactEx, an English dataset of facts with explanations drawn from various fact-checking websites across different topics. Second, we employed seq2seq models and LLMs (namely T5, BERT2BERT, and BLOOM) to develop an automated fact-checking system. Third, we used ChatGPT to generate verdicts and compared its performance against the other models. In addition, we explored the impact of dataset size on model performance through a series of experiments on seven different dataset sizes. The findings indicate that our fine-tuned T5-based model outperformed the other generative LLMs and seq2seq models with a ROUGE-1 score of about 26.75, making it the selected baseline for this task. Our study recommends examining the semantic similarity of the generated verdicts for automatic fact-checking applications, while also highlighting the importance of evaluating such models with additional techniques, such as crowd-based tools, to ensure the accuracy and reliability of the generated verdicts.
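The abstract reports model quality as a ROUGE-1 score, which measures unigram overlap between a generated verdict and a reference explanation. As a minimal illustration of what that metric computes (not the authors' actual evaluation pipeline, which would typically use a standard ROUGE package; the example strings are hypothetical):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between reference and candidate."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum((ref_counts & cand_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical reference verdict vs. model-generated verdict
ref = "the claim is false and unsupported by the article"
gen = "the claim is mostly false according to the article"
print(round(rouge1_f1(ref, gen), 3))  # → 0.667
```

The reported 26.75 corresponds to a score of about 0.2675 on this 0-to-1 scale, which motivates the paper's recommendation to complement lexical-overlap metrics with semantic-similarity and human (crowd-based) evaluation.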
Metadata
| Item Type: | Proceedings Paper |
| --- | --- |
| Authors/Creators: | Althabiti, S.; Alsalka, M.A.; Atwell, E. |
| Editors: | Ceolin, D.; Caselli, T.; Tulin, M. |
| Copyright, Publisher and Additional Information: | © 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG. This version of the conference paper has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature's AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-47896-3_1. |
| Keywords: | FactEx Dataset, Automatic Fact-check, ChatGPT, Generative LLMs, NLP, Artificial Intelligence, Computer Science, Disinformation |
| Dates: | Published: 2023 |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 07 Dec 2023 13:15 |
| Last Modified: | 14 Nov 2024 01:13 |
| Status: | Published |
| Publisher: | Springer |
| Series Name: | Lecture Notes in Computer Science |
| Identification Number (DOI): | 10.1007/978-3-031-47896-3_1 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:206287 |