Evidence extraction for automated medical coding: preliminary evaluation

Jiang, X. orcid.org/0000-0003-4255-5445, Khan, K. orcid.org/0009-0008-0588-1974, Vasantha, S.T. orcid.org/0009-0001-1935-5552 et al. (1 more author) (2025) Evidence extraction for automated medical coding: preliminary evaluation. In: NLPIR '24: Proceedings of the 2024 8th International Conference on Natural Language Processing and Information Retrieval. NLPIR 2024: 2024 8th International Conference on Natural Language Processing and Information Retrieval, 13-15 Dec 2024, Okayama University, Japan. Association for Computing Machinery (ACM), pp. 18-23. ISBN 9798400717383

Abstract

Coding clinical texts in a standard language such as ICD is an important but tedious and error-prone process. Automated medical coding algorithms struggle with the combined challenges of handling the significant length of clinical text, the complexity of the huge code hierarchy and the lack of interpretability needed to ensure user trust. Recent studies have shown that large language models (LLMs) also struggle with this task. Recent efforts have been made to annotate an evidence-supported medical coding dataset. The current study makes the first empirical investigation into how well (small) fine-tuned pretrained language models (PLMs) and LLMs can identify the sentences containing medical evidence supporting the assigned codes. Hierarchical sequential sentence classification and GPT-3.5 in the zero-shot setting were tested for evidence sentence extraction. Additional evaluation was performed to investigate how evidence extraction impacts clinical coding and what implications it has for future generations of automated medical coding algorithms.

Metadata

Item Type:	Proceedings Paper
Authors/Creators:	Jiang, X. https://orcid.org/0000-0003-4255-5445 Khan, K. https://orcid.org/0009-0008-0588-1974 Vasantha, S.T. https://orcid.org/0009-0001-1935-5552 Haider, S. https://orcid.org/0000-0002-4458-1594
Copyright, Publisher and Additional Information:	© 2024 The Authors. Except as otherwise noted, this author-accepted version of a journal article published in NLPIR '24: Proceedings of the 2024 8th International Conference on Natural Language Processing and Information Retrieval is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Keywords:	Information and Computing Sciences; Language, Communication and Culture; Linguistics; Clinical Research
Dates:	Published (online): 13 April 2025 Published: 13 April 2025
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User:	Symplectic Sheffield
Date Deposited:	17 Jun 2025 13:40
Last Modified:	18 Jun 2025 02:58
Status:	Published
Publisher:	Association for Computing Machinery (ACM)
Refereed:	Yes
Identification Number:	10.1145/3711542.3711580
Related URLs:	Author Conference
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:227917
