Speech and crosstalk detection in multichannel audio

Abstract

The analysis of scenarios in which several microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, each channel receives speech both from the microphone's wearer (local speech) and from other participants (crosstalk). The recorded audio can be broadly classified into four classes: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground truth segmentation.
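As a rough illustration of two of the feature types named in the abstract, the Python sketch below computes the kurtosis of a single audio frame and a normalized cross-correlation peak between frames from two channels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of NumPy, and the normalization by frame energy are all choices made here for illustration, and the paper's exact feature definitions are not given in this record.

```python
import numpy as np


def kurtosis(frame):
    """Fourth standardized moment of a frame (3.0 for Gaussian data).

    Hypothetical helper; the paper's exact kurtosis definition is not
    stated in this record.
    """
    x = np.asarray(frame, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    if variance == 0.0:
        # Constant (e.g. silent) frame: kurtosis is undefined, so return
        # 0.0 by convention (an assumption of this sketch).
        return 0.0
    return np.mean(centered ** 4) / variance ** 2


def max_normalized_xcorr(a, b):
    """Peak absolute cross-correlation of two frames over all lags,
    divided by the geometric mean of their energies (range 0 to 1).

    Hypothetical helper; this normalization is an assumption.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    if denom == 0.0:
        return 0.0
    xcorr = np.correlate(a, b, mode="full")
    return np.max(np.abs(xcorr)) / denom
```

A frame that appears in a second channel as a delayed copy gives a peak close to 1, whereas unrelated signals give a lower value. Per-frame values of this kind could then be stacked into feature vectors for a classifier such as a GMM.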

Metadata

Item Type:	Article
Authors/Creators:	Wrigley, S.N., Brown, G.J., Wan, V. and Renals, S.
Copyright, Publisher and Additional Information:	Copyright © 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Keywords:	feature extraction, crosstalk, cochannel interference, Gaussian mixture model, hidden Markov models (HMM), speech recognition
Dates:	Published: January 2005
Institution:	The University of Sheffield
Academic Units:	The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield)
Depositing User:	Sherpa Assistant
Date Deposited:	12 Dec 2005
Last Modified:	06 Jun 2014 12:58
Published Version:	http://dx.doi.org/10.1109/TSA.2004.838531
Status:	Published
Refereed:	Yes
Identification Number:	10.1109/TSA.2004.838531
Open Archives Initiative ID (OAI ID):	oai:eprints.whiterose.ac.uk:812
