Robinson, D., Zhang, Z. orcid.org/0000-0002-8587-8618 and Tepper, J. (2018) Hate speech detection on Twitter: feature engineering v.s. feature selection. In: Gangemi, A., Gentile, A.L., Nuzzolese, A.G., Rudolph, S., Maleshkova, M., Paulheim, H., Pan, J.Z. and Alam, M. (eds.) The Semantic Web: ESWC 2018 Satellite Events. ESWC: European Semantic Web Conference, 03-07 Jun 2018, Crete, Greece. Springer, pp. 46-49. ISBN 9783319981918
Abstract
The increasing presence of hate speech on social media has prompted significant investment from governments and companies, as well as a growing body of empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear whether these features contribute equally to the performance of such methods. We conduct a feature selection analysis of this task, using Twitter as a case study, and report findings that challenge the conventional perception of the importance of manual feature engineering: automatic feature selection can reduce the set of carefully engineered features by over 90% and selects predominantly generic features that are commonly used in many other language-related tasks; nevertheless, the resulting models perform better with the automatically selected features than with carefully crafted task-specific features.
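The abstract does not name the selection algorithm used in the paper. As a rough illustration only, the sketch below assumes a chi-squared filter, a common feature selection method for text classification, applied to binary feature vectors with binary labels (1 = hateful, 0 = not hateful). The names `chi2_score` and `select_features` and the `keep_fraction` parameter are hypothetical; setting `keep_fraction=0.1` would correspond to the "over 90%" reduction described above.

```python
from typing import List, Sequence


def chi2_score(column: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-squared statistic for one binary feature against binary labels.

    Assumption: features and labels are 0/1; this is an illustrative filter,
    not necessarily the selection method used by the authors.
    """
    a = b = c = d = 0
    for x, y in zip(column, labels):
        if x and y:
            a += 1  # feature present, hateful
        elif x:
            b += 1  # feature present, not hateful
        elif y:
            c += 1  # feature absent, hateful
        else:
            d += 1  # feature absent, not hateful
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # constant feature or label: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def select_features(rows: List[List[int]], labels: List[int],
                    keep_fraction: float) -> List[int]:
    """Return indices of the highest-scoring features (at least one kept)."""
    n_features = len(rows[0])
    scores = [chi2_score([r[j] for r in rows], labels) for j in range(n_features)]
    k = max(1, int(round(n_features * keep_fraction)))
    # Rank by score, highest first; ties keep their original column order.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: 4 tweets x 4 hypothetical binary features.
    rows = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
    labels = [1, 1, 0, 0]
    print(select_features(rows, labels, keep_fraction=0.5))  # [0, 3]
```

In the toy data, feature 0 occurs only in hateful examples and feature 3 only in non-hateful ones, so both score highest; chi-squared rewards strong association in either direction, while features 1 and 2 are independent of the label and score zero.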
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2018 Springer Nature. This is an author-produced version of a paper subsequently published in ESWC 2018 Proceedings. Uploaded in accordance with the publisher's self-archiving policy. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 27 Nov 2019 15:33 |
Last Modified: | 27 Nov 2019 22:41 |
Status: | Published |
Publisher: | Springer |
Refereed: | Yes |
Identification Number: | 10.1007/978-3-319-98192-5_9 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:153929 |