Don’t waste a single annotation: improving single-label classifiers through soft labels

Wu, B., Li, Y., Mu, Y. et al. (3 more authors) (2023) Don’t waste a single annotation: improving single-label classifiers through soft labels. In: Findings of the Association for Computational Linguistics: EMNLP 2023. 2023 Conference on Empirical Methods in Natural Language Processing, 06-10 Dec 2023, Singapore. Association for Computational Linguistics, pp. 5347-5355. ISBN 979-8-89176-061-5

Abstract

This paper addresses the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, in such tasks annotators are only asked to provide a single label for each sample and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to the ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of them for training. Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.
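
The abstract does not say how the soft labels are computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each annotation carries a primary label, an optional secondary label, and a self-reported confidence in (0, 1]. The function names, the dictionary keys, the example class names, and the `secondary_weight` parameter are all hypothetical. Annotator disagreement enters simply by summing the votes of all annotators before normalising.

```python
import math

# Hypothetical class inventory, used only for illustration.
CLASSES = ["support", "deny", "neutral"]


def soft_label(annotations, classes, secondary_weight=0.5):
    """Turn several annotations of one sample into a probability vector.

    Assumed scheme (not taken from the paper): a primary label receives
    the annotator's confidence as mass, and a secondary label receives
    that confidence scaled by `secondary_weight`.
    """
    mass = {c: 0.0 for c in classes}
    for ann in annotations:
        conf = ann.get("confidence", 1.0)
        mass[ann["primary"]] += conf
        secondary = ann.get("secondary")
        if secondary is not None:
            mass[secondary] += secondary_weight * conf
    total = sum(mass.values())
    if total == 0.0:
        # No usable annotation: fall back to a uniform distribution.
        return [1.0 / len(classes)] * len(classes)
    return [mass[c] / total for c in classes]


def soft_cross_entropy(target, predicted):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0.0)


# Three annotators disagree on one sample; the soft label keeps that signal
# instead of collapsing it to the majority label "support".
example_annotations = [
    {"primary": "support", "confidence": 1.0},
    {"primary": "support", "secondary": "neutral", "confidence": 0.5},
    {"primary": "neutral", "confidence": 0.5},
]

if __name__ == "__main__":
    target = soft_label(example_annotations, CLASSES)
    print(dict(zip(CLASSES, target)))
    print(soft_cross_entropy(target, [0.7, 0.1, 0.2]))
```

In a training loop, the vector returned by `soft_label` would replace the one-hot target in the loss, while evaluation would still use the hard labels of the test set, as the abstract describes.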

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Wu, B.; Li, Y.; Mu, Y.; Scarton, C. (https://orcid.org/0000-0002-0103-4072); Bontcheva, K. (https://orcid.org/0000-0001-6152-9600); Song, X. (https://orcid.org/0000-0002-4188-6974)
Copyright, Publisher and Additional Information:	© 2023 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Dates:	Published (online): December 2023; Published: December 2023
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	13 Feb 2025 16:06
Last Modified:	13 Feb 2025 16:06
Status:	Published
Publisher:	Association for Computational Linguistics
Refereed:	Yes
Identification Number:	10.18653/v1/2023.findings-emnlp.355
Related URLs:	Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:223247

Download

Published Version

Filename: 2023.findings-emnlp.355.pdf

Licence: CC-BY 4.0

