Han, S., Gao, J. orcid.org/0000-0002-3610-8748 and Ciravegna, F. orcid.org/0000-0001-5817-4810 (2019) Data augmentation for rumor detection using context-sensitive neural language model with large-scale credibility corpus. In: Learning from Limited Labeled Data: ICLR 2019 Workshop. Seventh International Conference on Learning Representations, 06-09 May 2019, New Orleans, Louisiana, United States. OpenReview
Abstract
In this paper, we address the challenges of limited labeled data and class imbalance in machine-learning-based rumor detection on social media. We present an offline data augmentation method based on semantic relatedness, which exploits unlabeled social media data to augment limited labeled data. A context-aware neural language model and a large credibility-focused Twitter corpus are employed to learn effective representations of rumor tweets for our semantic relatedness measurement method. A language model fine-tuned on the large domain-specific corpus shows a dramatic improvement in training data augmentation for rumor detection over pre-trained language models. We conduct experiments on 6 different real-world events based on 5 publicly available datasets and 1 augmented dataset. Our experiments show that the proposed method allows us to generate a large amount of training data of reasonable quality via weak supervision. We present preliminary results achieved by a state-of-the-art neural network model for rumor detection with the augmented data.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Han, S.; Gao, J.; Ciravegna, F. |
Copyright, Publisher and Additional Information: | © 2019 The Authors. |
Keywords: | Rumor Detection; Data Augmentation; Social Media; Neural Language Models; Weak Supervision |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 10 May 2019 11:14 |
Last Modified: | 15 Jul 2020 12:35 |
Published Version: | https://openreview.net/forum?id=SyxCysRNdV |
Status: | Published |
Publisher: | OpenReview |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:145668 |