Milner, R. and Hain, T. orcid.org/0000-0003-0939-3464 (2016) DNN-based speaker clustering for speaker diarisation. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Interspeech 2016, 08-12 Sep 2016, San Francisco, USA, pp. 2185-2189.
Abstract
Speaker diarisation, the task of answering "who spoke when?", is often considered to consist of three independent stages: speech activity detection, speaker segmentation and speaker clustering. These stages respectively separate speech from non-speech, split the speech into speaker-homogeneous segments, and group together the segments that belong to the same speaker. This paper is concerned with speaker clustering, which is typically performed by bottom-up clustering using the Bayesian information criterion (BIC). We present a novel semi-supervised method of speaker clustering based on a deep neural network (DNN) model. A speaker separation DNN trained on independent data is used to iteratively relabel the test data set. This is achieved by reconfiguring the output layer and fine-tuning the network in each iteration. A stopping criterion that uses posteriors as confidence scores is investigated. Results are shown on a meeting task (RT07) for single distant microphones and compared with standard diarisation approaches. The new method achieves a diarisation error rate (DER) of 14.8%, compared to a baseline of 19.9%.
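The iterative relabelling idea in the abstract can be illustrated with a toy loop. This is only a sketch under strong assumptions and not the authors' method: the DNN is replaced by a stand-in nearest-mean classifier with a softmax over negative squared distances, segments are reduced to single numbers, "reconfiguring the output layer" is assumed to mean dropping speakers that no longer own any segment, and the stopping rule (labels unchanged, or mean posterior confidence no longer improving) is hypothetical. The names `posteriors`, `relabel`, `max_iters` and `min_gain` are invented for this illustration.

```python
import math


def posteriors(x, means):
    """Stand-in for DNN outputs: softmax over negative squared distance to each speaker mean."""
    scores = [-(x - m) ** 2 for m in means]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relabel(features, labels, max_iters=20, min_gain=1e-4):
    """Iteratively relabel segments; returns (final labels, mean confidence)."""
    prev_conf = 0.0
    for _ in range(max_iters):
        # Assumed "output layer reconfiguration": keep only speakers that still own segments.
        speakers = sorted(set(labels))
        # Assumed "fine-tuning": re-estimate each speaker model from its current segments.
        means = []
        for s in speakers:
            own = [x for x, l in zip(features, labels) if l == s]
            means.append(sum(own) / len(own))
        # Relabel each segment with its most probable speaker and record the posterior.
        new_labels, confs = [], []
        for x in features:
            post = posteriors(x, means)
            best = max(range(len(post)), key=post.__getitem__)
            new_labels.append(speakers[best])
            confs.append(post[best])
        conf = sum(confs) / len(confs)
        # Hypothetical stopping criterion: labels are stable or confidence has stopped improving.
        if new_labels == labels or conf - prev_conf < min_gain:
            labels = new_labels
            break
        labels, prev_conf = new_labels, conf
    return labels, conf


# Six toy segments from two well-separated speakers, initially split into three clusters.
features = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9]
initial = [0, 0, 1, 1, 2, 2]
final_labels, confidence = relabel(features, initial)
print(final_labels, round(confidence, 3))
```

In this toy run the middle cluster loses all of its segments after the first pass, so the second pass works with two speakers only and stops because the labels no longer change.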
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Milner, R.; Hain, T. (orcid.org/0000-0003-0939-3464) |
Copyright, Publisher and Additional Information: | © 2016 ISCA. This is an author produced version of a paper subsequently published in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Uploaded in accordance with the publisher's self-archiving policy. |
Keywords: | speaker diarisation; speaker separation; deep neural network |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Dec 2016 15:47 |
Last Modified: | 19 Dec 2022 13:35 |
Published Version: | http://doi.org/10.21437/Interspeech.2016-126 |
Status: | Published |
Refereed: | Yes |
Identification Number: | 10.21437/Interspeech.2016-126 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:109281 |