Peng, X. orcid.org/0000-0001-5787-9982, Zheng, Y., Lin, C. orcid.org/0000-0003-3454-2468 et al. (1 more author) (2021) Summarising historical text in modern languages. In: Merlo, P., Tiedemann, J. and Tsarfaty, R., (eds.) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. The 16th Conference of the European Chapter of the Association for Computational Linguistics, 19-23 Apr 2021, Virtual conference. Association for Computational Linguistics (ACL), pp. 3123-3142. ISBN 9781954085022
Abstract
We introduce the task of historical text summarisation, where documents in historical forms of a language are summarised in the corresponding modern language. This is a fundamentally important routine for historians and digital humanities researchers but has never been automated. We compile a high-quality gold-standard text summarisation dataset, which consists of historical German and Chinese news from hundreds of years ago summarised in modern German or Chinese. Based on cross-lingual transfer learning techniques, we propose a summarisation model that can be trained even with no cross-lingual (historical-to-modern) parallel data, and further benchmark it against state-of-the-art algorithms. We report automatic and human evaluations that distinguish the historical-to-modern language summarisation task from standard cross-lingual summarisation (i.e., modern-to-modern language), highlight the distinctness and value of our dataset, and demonstrate that our transfer learning approach outperforms standard cross-lingual benchmarks on this task.
Metadata
Field | Value |
---|---|
Item Type: | Proceedings Paper |
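Authors/Creators: | Peng, X. (orcid.org/0000-0001-5787-9982); Zheng, Y.; Lin, C. (orcid.org/0000-0003-3454-2468); and 1 more author |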
Editors: | Merlo, P.; Tiedemann, J.; Tsarfaty, R. |
Copyright, Publisher and Additional Information: | © 2021 Association for Computational Linguistics. Licensed on a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 12 Aug 2021 10:46 |
Last Modified: | 12 Aug 2021 10:46 |
Published Version: | https://aclanthology.org/2021.eacl-main.273/ |
Status: | Published |
Publisher: | Association for Computational Linguistics (ACL) |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:177024 |