Khare, P., Burel, G., Maynard, D. orcid.org/0000-0002-1773-7020 et al. (1 more author) (2018) Cross-lingual classification of crisis data. In: Vrandecic, D., Bontcheva, K., Suárez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L.-A. and Simperl, E., (eds.) The Semantic Web – ISWC 2018. International Semantic Web Conference (ISWC 2018), 08-12 Oct 2018, Monterey, CA, USA. Lecture Notes in Computer Science, 11136. Springer Verlag, pp. 617-633. ISBN 978-3-030-00670-9
Abstract
Many citizens nowadays flock to social media during crises to share or acquire the latest information about the event. Due to the sheer volume of data typically circulated during such events, it is necessary to be able to efficiently filter out irrelevant posts, thus focusing attention on the posts that are truly relevant to the crisis. Current methods for classifying the relevance of posts to a crisis or set of crises typically struggle to deal with posts in different languages, and it is not viable during rapidly evolving crisis situations to train new models for each language. In this paper we test statistical and semantic classification approaches on cross-lingual datasets from 30 crisis events, consisting of posts written mainly in English, Spanish, and Italian. We experiment with scenarios where the model is trained on one language and tested on another, and where the data is translated to a single language. We show that the addition of semantic features extracted from external knowledge bases improves accuracy over a purely statistical model.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2018 Springer Nature Switzerland AG. This is an author-produced version of a paper subsequently published in Vrandečić D. et al. (eds) The Semantic Web – ISWC 2018. ISWC 2018. Lecture Notes in Computer Science. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Semantics; Cross-lingual; Multilingual; Crisis informatics; Tweet classification |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 04 Jul 2019 15:44 |
Last Modified: | 19 Dec 2022 13:50 |
Status: | Published |
Publisher: | Springer Verlag |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Identification Number: | 10.1007/978-3-030-00671-6_36 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:146652 |