He, W. orcid.org/0009-0000-8005-062X, Farrahi, K., Chen, B. et al. (2 more authors) (2024) Representation transfer and data cleaning in multi-views for text simplification. Pattern Recognition Letters, 177. pp. 40-46. ISSN: 0167-8655
Abstract
Representation transfer is a widely used technique in natural language processing. We propose methods of cleaning the dominant dataset of text simplification (TS) WikiLarge in multi-views to remove errors that impact model training and fine-tuning. The results show that our method can effectively refine the dataset. We propose to take the pre-trained text representations from a similar task (e.g., text summarization) to text simplification to conduct a continue-fine-tuning strategy to improve the performance of pre-trained models on TS. This approach will speed up the training and make the model convergence easier. Besides, we also propose a new decoding strategy for simple text generation. It is able to generate simpler and more comprehensible text with controllable lexical simplicity. The experimental results show that our method can achieve good performance on many evaluation metrics.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Text simplification; Sentence representation; Pre-trained language model; Data cleaning; Decoding |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number ENGINEERING AND PHYSICAL SCIENCE RESEARCH COUNCIL EP/T02450X/1 ROYAL SOCIETY NAF\R2\202209 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 19 Sep 2025 15:51 |
Last Modified: | 19 Sep 2025 15:51 |
Status: | Published |
Publisher: | Elsevier BV |
Refereed: | Yes |
Identification Number: | 10.1016/j.patrec.2023.11.011 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:231955 |