Building a semantically annotated corpus of clinical texts

Roberts, A., Gaizauskas, R., Hepple, M., Demetriou, G., Guo, Y. and Roberts, I. (2009) Building a semantically annotated corpus of clinical texts. Journal of Biomedical Informatics, 42 (5). pp. 950-966. ISSN 1532-0464


In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

Item Type: Article
Copyright, Publisher and Additional Information: © 2009 Elsevier. This is an author-produced version of a paper subsequently published in Journal of Biomedical Informatics. Uploaded in accordance with the publisher's self-archiving policy.
Keywords: Corpora; Semantic annotation; Clinical text; Natural language processing; Gold standards; Evaluation; Information extraction; Text mining; Temporal annotation; Annotation guidelines
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User: Miss Anthea Tucker
Date Deposited: 23 Nov 2009 14:09
Last Modified: 08 Feb 2013 16:59
Published Version: http://dx.doi.org/10.1016/j.jbi.2008.12.013
Status: Published
Publisher: Elsevier
Identification Number: 10.1016/j.jbi.2008.12.013
URI: http://eprints.whiterose.ac.uk/id/eprint/10186
