Danso, S, Atwell, E orcid.org/0000-0001-9395-3764 and Johnson, O (2013) Linguistic and statistically derived features for cause of death prediction from verbal autopsy text. In: Gurevych, I, Biemann, C and Zesch, T, (eds.) Language Processing and Knowledge in the Web. 25th International Conference, GSCL, 25-27 Sep 2013, Darmstadt, Germany. Lecture Notes in Artificial Intelligence (8105). Springer, pp. 47-60. ISBN 978-3-642-40721-5
Abstract
Automatic Text Classification (ATC) is an emerging technology with economic importance given the unprecedented growth of text data. This paper reports on work in progress to develop methods for predicting Cause of Death from Verbal Autopsy (VA) documents recommended for use in low-income countries by the World Health Organisation. VA documents contain both coded data and open narrative. The task is formulated as a Text Classification problem, and we explore various combinations of linguistic and statistical approaches to determine how these may improve on the standard bag-of-words approach, using a dataset of over 6400 VA documents that were manually annotated with cause of death. We demonstrate that a significant improvement in prediction accuracy can be obtained through a novel combination of statistical and linguistic features derived from the VA text. The paper explores the methods by which ATC may lead to improved accuracy in Cause of Death prediction.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Danso, S; Atwell, E (orcid.org/0000-0001-9395-3764); Johnson, O |
Editors: | Gurevych, I; Biemann, C; Zesch, T |
Copyright, Publisher and Additional Information: | © 2013, Springer-Verlag Berlin Heidelberg. This is an author produced version of a paper published in Language Processing and Knowledge in the Web. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-40722-2_5. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | Verbal Autopsy; Cause of Death Prediction; Features; Text Classification |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 11 Aug 2016 13:12 |
Last Modified: | 19 Jan 2018 13:04 |
Published Version: | http://dx.doi.org/10.1007/978-3-642-40722-2 |
Status: | Published |
Publisher: | Springer |
Series Name: | Lecture Notes in Artificial Intelligence |
Identification Number: | 10.1007/978-3-642-40722-2_5 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:89830 |