Brierley, C and Atwell, E (2007) Prosodic phrase break prediction: problems in the evaluation of models against a gold standard. TAL Journal: Traitement Automatique des Langues, 48 (1). 187-206. ISSN 1248-9433
Abstract
The goal of automatic phrase break prediction is to identify prosodic-syntactic boundaries in text which correspond to the way a native speaker might process or chunk that same text as speech. This is treated as a classification task in machine learning, and output predictions from language models are evaluated against a ‘gold standard’: human-labelled prosodic phrase break annotations in transcriptions of recorded speech (the speech corpus). Despite the introduction of rigorous metrics such as precision and recall, the evaluation of phrase break models is still problematic because prosody is inherently variable; the morphosyntactic analysis and prosodic annotations for a given text are not representative of the range of parsing and phrasing strategies available to, and exhibited by, native speakers. This article recommends creating automatically generated, POS-tagged and prosodically annotated variants of a text to enrich the gold standard and enable more robust ‘noise-tolerant’ evaluation of language models.
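To make the evaluation setting concrete, the following is a minimal sketch of scoring predicted phrase breaks against a gold standard with precision and recall. It is an illustration only, not the authors' implementation. It assumes each annotation is a sequence of booleans, one per word juncture, where `True` marks a break. The function names `precision_recall_f1` and `best_against_variants` are hypothetical, and scoring against the best-matching variant is just one possible reading of 'noise-tolerant' evaluation, not a method stated in the abstract.

```python
from typing import Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def precision_recall_f1(predicted: Sequence[bool], gold: Sequence[bool]) -> Scores:
    """Score predicted breaks against one annotation (True = break at that juncture)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same junctures")
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)      # correct breaks
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)  # spurious breaks
    fn = sum(1 for p, g in zip(predicted, gold) if g and not p)  # missed breaks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def best_against_variants(predicted: Sequence[bool],
                          variants: Sequence[Sequence[bool]]) -> Scores:
    """Assumed scheme: report the scores for the variant with the highest F1."""
    return max((precision_recall_f1(predicted, v) for v in variants),
               key=lambda scores: scores[2])


# Hypothetical data: five word junctures in one sentence.
gold = [False, True, False, False, True]
variant = [False, True, False, True, True]   # an alternative, equally valid phrasing
predicted = [False, True, False, True, True]

single = precision_recall_f1(predicted, gold)
enriched = best_against_variants(predicted, [gold, variant])
```

In this toy case the prediction is penalised for a spurious break when compared with the single gold annotation, but matches the alternative phrasing exactly, which is the kind of discrepancy that an enriched gold standard is meant to absorb.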
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Brierley, C and Atwell, E |
Copyright, Publisher and Additional Information: | (c) 2007, Association pour le Traitement Automatique des Langues. This article has been published in the journal ‘Traitement Automatique des Langues’ Volume 48, Issue 1: 187-206, 2007, @ATALA. The original manuscript is available on the web site www.atala.org |
Keywords: | Evaluation; prosody; supervised learning; statistical method |
Dates: | 2007 |
Institution: | The University of Leeds |
Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
Depositing User: | Symplectic Publications |
Date Deposited: | 01 Dec 2014 11:05 |
Last Modified: | 16 Jan 2018 17:34 |
Published Version: | http://www.atala.org/IMG/pdf/TAL-2007-48-1-08-Brie... |
Status: | Published |
Publisher: | Association pour le Traitement Automatique des Langues |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:81657 |