Delbari, Z., Moosavi, N.S. orcid.org/0000-0002-8332-307X and Pilehvar, M.T. (2024) Spanning the spectrum of hatred detection: a Persian multi-label hate speech dataset with annotator rationales. In: Proceedings of the AAAI Conference on Artificial Intelligence. The 38th Annual AAAI Conference on Artificial Intelligence, 20-27 Feb 2024, Vancouver, Canada. Association for the Advancement of Artificial Intelligence (AAAI), pp. 17889-17897. ISBN 978-1-57735-887-9
Abstract
With the alarming rise of hate speech in online communities, the demand for effective NLP models to identify instances of offensive language has reached a critical point. However, the development of such models relies heavily on the availability of annotated datasets, which are scarce, particularly for less-studied languages. To bridge this gap for the Persian language, we present a novel dataset specifically tailored to multi-label hate speech detection. Our dataset, called Phate, consists of over seven thousand manually annotated Persian tweets, offering a rich resource for training and evaluating hate speech detection models for this language. Notably, each annotation in our dataset specifies the targeted group of hate speech and includes a span of the tweet that elucidates the rationale behind the assigned label. Incorporating this information expands the potential applications of our dataset, facilitating the detection of targeted online harm and allowing the benchmark to serve research on the interpretability of hate speech detection models. The dataset, annotation guideline, and all associated code are accessible at https://github.com/Zahra-D/Phate.
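As an illustration of the annotation structure the abstract describes (multiple labels per tweet, a targeted group, and a rationale span), the following is a minimal Python sketch. It is not the Phate schema: the class `AnnotatedTweet`, its field names, the helper `to_multi_hot`, the label names, and the placeholder English text are all assumptions made for this example. The released files at the repository above define the actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class AnnotatedTweet:
    """Hypothetical record layout; the real Phate schema may differ."""
    text: str
    labels: Set[str] = field(default_factory=set)   # multi-label: zero or more classes
    target: Optional[str] = None                     # targeted group, if any
    rationale: Optional[Tuple[int, int]] = None      # [start, end) character span

    def rationale_text(self) -> Optional[str]:
        """Return the substring marked as the rationale, or None if absent."""
        if self.rationale is None:
            return None
        start, end = self.rationale
        return self.text[start:end]


def to_multi_hot(labels: Set[str], classes: List[str]) -> List[int]:
    """Encode a label set as a multi-hot vector in the order given by `classes`."""
    return [1 if c in labels else 0 for c in classes]


# Placeholder English text stands in for a Persian tweet (assumed data).
text = "placeholder tweet with an offending phrase inside"
phrase = "offending phrase"
start = text.index(phrase)

example = AnnotatedTweet(
    text=text,
    labels={"hate"},        # assumed label name
    target="group_a",       # assumed target-group identifier
    rationale=(start, start + len(phrase)),
)

print(example.rationale_text())
print(to_multi_hot(example.labels, ["hate", "violence", "vulgar"]))
```

Storing the rationale as character offsets rather than as a copied string keeps it aligned with the tweet text, which is convenient when comparing annotator rationales against model attributions.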
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Delbari, Z.; Moosavi, N.S. (orcid.org/0000-0002-8332-307X); Pilehvar, M.T. |
Copyright, Publisher and Additional Information: | © 2024 The Authors. Except as otherwise noted, this author-accepted version of a paper published in Proceedings of the AAAI Conference on Artificial Intelligence is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | NLP: Other; NLP: Applications |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 09 Aug 2024 13:22 |
Last Modified: | 12 Aug 2024 13:41 |
Published Version: | http://dx.doi.org/10.1609/aaai.v38i16.29743 |
Status: | Published |
Publisher: | Association for the Advancement of Artificial Intelligence (AAAI) |
Refereed: | Yes |
Identification Number: | 10.1609/aaai.v38i16.29743 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:215377 |