Brierley, C and Atwell, E (2007) Prosodic phrase break prediction: problems in the evaluation of models against a gold standard. TAL Journal: Traitement Automatique des Langues, 48 (1). 187-206. ISSN 1248-9433
Abstract
The goal of automatic phrase break prediction is to identify prosodic-syntactic boundaries in text which correspond to the way a native speaker might process or chunk that same text as speech. This is treated as a classification task in machine learning and output predictions from language models are evaluated against a ‘gold standard’: human-labelled prosodic phrase break annotations in transcriptions of recorded speech - the speech corpus. Despite the introduction of rigorous metrics such as precision and recall, the evaluation of phrase break models is still problematic because prosody is inherently variable; morphosyntactic analysis and prosodic annotations for a given text are not representative of the range of parsing and phrasing strategies available to, and exhibited by, native speakers. This article recommends creating automatically-generated POS tagged and prosodically annotated variants of a text to enrich the gold standard and enable more robust ‘noise-tolerant’ evaluation of language models.
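The evaluation setup described in the abstract can be illustrated with a minimal sketch. It assumes each word juncture carries a binary label (1 = break, 0 = no break) and scores predictions with precision and recall for the break class. The function names, the toy data, and the "best-matching variant" rule are hypothetical illustrations of scoring against an enriched gold standard; they are not taken from the article and should not be read as the authors' method.

```python
from typing import Dict, Sequence


def break_scores(gold: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for the 'break' class (1 = break, 0 = no break)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must label the same word junctures")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # breaks correctly predicted
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # spurious predicted breaks
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # gold breaks that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def best_variant_scores(
    variants: Sequence[Sequence[int]], predicted: Sequence[int]
) -> Dict[str, float]:
    """Assumed scoring rule (not from the article): report the scores against
    whichever reference phrasing the prediction matches best, by F1."""
    return max((break_scores(v, predicted) for v in variants), key=lambda s: s["f1"])


# Toy data: two plausible phrasings of the same seven word junctures.
gold = [0, 1, 0, 0, 1, 0, 1]       # the single human-labelled annotation
variant = [0, 0, 1, 0, 1, 0, 1]    # a hypothetical alternative phrasing
predicted = [0, 0, 1, 0, 1, 0, 1]  # hypothetical model output

single = break_scores(gold, predicted)
enriched = best_variant_scores([gold, variant], predicted)
print(single)
print(enriched)
```

In this toy case the prediction is penalised against the single annotation even though it matches the alternative phrasing exactly, which is the kind of variability the abstract argues a single gold standard cannot represent.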
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Brierley, C; Atwell, E |
| Copyright, Publisher and Additional Information: | (c) 2007, Association pour le Traitement Automatique des Langues. This article has been published in the journal ‘Traitement Automatique des Langues’, Volume 48, Issue 1: 187-206, 2007, © ATALA. The original manuscript is available on the website www.atala.org |
| Keywords: | Evaluation; prosody; supervised learning; statistical method |
| Dates: | 2007 (published) |
| Institution: | The University of Leeds |
| Academic Units: | The University of Leeds > Faculty of Engineering & Physical Sciences (Leeds) > School of Computing (Leeds) > Artificial Intelligence & Biological Systems (Leeds) |
| Depositing User: | Symplectic Publications |
| Date Deposited: | 01 Dec 2014 11:05 |
| Last Modified: | 16 Jan 2018 17:34 |
| Published Version: | http://www.atala.org/IMG/pdf/TAL-2007-48-1-08-Brie... |
| Status: | Published |
| Publisher: | Association pour le Traitement Automatique des Langues |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:81657 |
CORE (COnnecting REpositories)