Clough, P., Gaizauskas, R.J. and Piao, S.S. (2002) Building and annotating a corpus for the study of journalistic text reuse. In: LREC 2002. The Third International Conference on Language Resources and Evaluation, 29-31 May 2002, Las Palmas, Spain. European Language Resources Association, pp. 1678-1685.
Abstract
In this paper we present the METER Corpus, a novel resource for the study and analysis of journalistic text reuse. The corpus consists of a set of news stories written by the Press Association (PA), the major UK news agency, and a set of stories about the same news events, as published in various British newspapers. In some cases the newspaper stories are rewritten from the PA source; in other cases they have been independently written by the newspapers' own journalists. We discuss the motivation for creating the corpus, its contents, the annotation of certain attributes for analysis of text reuse and finally the encoding of those annotations into a standardised corpus format: the Text Encoding Initiative (TEI).
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Clough, P., Gaizauskas, R.J. and Piao, S.S. |
Copyright, Publisher and Additional Information: | © 2002 The Author(s). Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | Journalistic text reuse; TEI markup; Corpus annotation; Corpus; Paraphrase. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield); The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 24 Apr 2014 09:46 |
Last Modified: | 19 Dec 2022 13:26 |
Published Version: | http://www.lrec-conf.org/proceedings/lrec2002/ |
Status: | Published |
Publisher: | European Language Resources Association |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:78663 |