Asheghi, NR, Sharoff, S and Markert, K (2016) Crowdsourcing for web genre annotation. Language Resources and Evaluation, 50 (3). pp. 603-641. ISSN 1574-020X
Abstract
Recently, genre collection and automatic genre identification for the web have attracted much attention. However, there is currently no genre-annotated corpus of web pages for which inter-annotator reliability has been established, i.e. the corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, leading to concerns about the scalability of these annotation efforts and the transferability of the schemes to annotators outside these small expert groups. In this paper, we tackle these problems by using crowdsourcing for genre annotation, leading to the Leeds Web Genre Corpus, the first web corpus which is demonstrably reliably annotated for genre and which can be easily and cost-effectively expanded using naive annotators. We also show that the corpus is diverse in both source and topic.
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Asheghi, NR; Sharoff, S; Markert, K |
Copyright, Publisher and Additional Information: | © The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
Keywords: | Genres on the web; Reliability testing; Annotation guidelines; Crowdsourcing |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Arts, Humanities and Cultures (Leeds) > School of Languages Cultures & Societies (Leeds) > Translation Studies (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 31 May 2020 15:28 |
Last Modified: | 31 May 2020 15:29 |
Status: | Published |
Publisher: | Springer Verlag |
Identification Number: | 10.1007/s10579-015-9331-6 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:160780 |
Download
Filename: Asheghi2016_Article_CrowdsourcingForWebGenreAnnota.pdf
Licence: CC-BY 4.0