Dabike, G.R. and Barker, J. orcid.org/0000-0002-1684-5660 (2019) Automatic lyric transcription from karaoke vocal tracks: resources and a baseline system. In: Interspeech 2019 Proceedings. Interspeech 2019, 15-19 Sep 2019, Graz, Austria. International Speech Communication Association, pp. 579-583.
Abstract
Automatic sung speech recognition is a relatively understudied topic that has been held back by a lack of large, freely available datasets. This has recently changed thanks to the release of the DAMP Sing! dataset, a 1,100-hour karaoke dataset originating from the social music-making company Smule. This paper presents work undertaken to define an easily replicable automatic speech recognition benchmark for this data. In particular, we describe how transcripts and alignments have been recovered from karaoke prompts and timings; how suitable training, development and test sets have been defined with varying degrees of accent variability; and how language models have been developed using lyric data from the LyricWikia website. Initial recognition experiments have been performed using factored-layer TDNN acoustic models with lattice-free MMI training in Kaldi. The best WER is 19.60%, a new state of the art for this type of data. The paper concludes with a discussion of the many challenging problems that remain to be solved. Dataset definitions and Kaldi scripts have been made available so that the benchmark is easily replicable.
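The abstract reports results as word error rate (WER). As a reminder of what that figure measures, the sketch below computes the standard WER, (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-cost word-level alignment and N is the number of reference words. This is an illustrative assumption, not the paper's Kaldi scoring pipeline (which also applies its own text normalisation); the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / len(reference words),
    using a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,             # deletion
                d[i][j - 1] + 1,             # insertion
                d[i - 1][j - 1] + sub_cost,  # substitution or exact match
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat today"
    hyp = "the cat sat on mat today"  # one deleted word out of seven
    print(f"WER = {word_error_rate(ref, hyp):.2%}")  # prints "WER = 14.29%"
```

Because insertions are counted against the reference length, WER is not bounded by 100%: a short reference decoded into a longer, wrong hypothesis can score above 1.0.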
Metadata
| Field | Value |
|---|---|
| Item Type | Proceedings Paper |
| Copyright, Publisher and Additional Information | © 2019 ISCA. Reproduced in accordance with the publisher's self-archiving policy. |
| Keywords | Lyrics; Singing; Speech Recognition; Lyrics Transcription; DAMP |
| Institution | The University of Sheffield |
| Academic Units | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Depositing User | Symplectic Sheffield |
| Date Deposited | 24 Jan 2025 09:53 |
| Last Modified | 24 Jan 2025 09:53 |
| Status | Published |
| Publisher | International Speech Communication Association |
| Refereed | Yes |
| Identification Number | 10.21437/interspeech.2019-2378 |
| Open Archives Initiative ID (OAI ID) | oai:eprints.whiterose.ac.uk:221010 |