Xiong, F., Goetze, S. orcid.org/0000-0003-1044-7343, Kollmeier, B. et al. (1 more author) (2018) Exploring auditory-inspired acoustic features for room acoustic parameter estimation from monaural speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (10). pp. 1809-1820. ISSN 2329-9290
Abstract
Room acoustic parameters that characterize acoustic environments can help to improve signal enhancement algorithms, such as dereverberation, or automatic speech recognition by adapting models to the current parameter set. The reverberation time (RT) and the early-to-late reverberation ratio (ELR) are two key parameters. In this paper, we propose a blind ROom Parameter Estimator (ROPE) based on an artificial neural network that learns the mapping from single-microphone speech signals to discrete ranges of the RT and the ELR. Auditory-inspired acoustic features are used as neural network input; they are generated by a temporal modulation filter bank applied to the speech time-frequency representation. ROPE performance is analyzed in various reverberant environments, in both clean and noisy conditions, for both fullband and subband RT and ELR estimation. The importance of specific temporal modulation frequencies is analyzed by evaluating the contribution of individual filters to ROPE performance. Experimental results show that ROPE is robust against variations caused by room impulse responses (measured versus simulated), mismatched noise levels, and speech variability reflected through different corpora. Compared to state-of-the-art algorithms that were tested in the acoustic characterisation of environments (ACE) challenge, the ROPE model is the only one that is among the best for all individual tasks (RT and ELR estimation from fullband and subband signals). ROPE even improves its fullband estimates by integrating speech-related frequency subbands. Furthermore, the model requires the least computational resources, with a real-time factor that is at least two times lower than that of competing algorithms. Results are achieved with an average observation window of 3 s, which is important for real-time applications.
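To make the feature-extraction idea concrete, the sketch below illustrates (under assumed, illustrative parameters — frame sizes, modulation bands, and function names are not the paper's exact configuration) how a temporal modulation filter bank can be applied to a speech time-frequency representation: each frequency channel's temporal envelope is band-pass filtered in a set of modulation-frequency bands, and the per-band energies form the feature vector that would feed the neural network.

```python
import numpy as np

def stft_log_energy(x, frame_len=400, hop=160, n_fft=512):
    """Log power spectrogram: the time-frequency representation."""
    win = np.hanning(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = x[start:start + frame_len] * win
        spec = np.abs(np.fft.rfft(seg, n_fft)) ** 2
        frames.append(np.log(spec + 1e-10))
    return np.array(frames)  # shape: (n_frames, n_bins)

def modulation_features(logspec, mod_bands=((0.5, 4), (4, 8), (8, 16)), fps=100.0):
    """Temporal modulation filter bank (illustrative): band-pass each
    frequency channel's envelope along the time axis via an FFT mask,
    then summarize each modulation band by its log energy."""
    n_frames, _ = logspec.shape
    env = logspec - logspec.mean(axis=0, keepdims=True)  # remove DC per channel
    spec = np.fft.rfft(env, axis=0)                      # FFT along time axis
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / fps)       # modulation freqs in Hz
    feats = []
    for lo, hi in mod_bands:
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spec * mask[:, None], n=n_frames, axis=0)
        feats.append(np.log(np.mean(band ** 2, axis=0) + 1e-10))
    return np.concatenate(feats)  # one feature vector per utterance

# Stand-in input: 3 s at 16 kHz (the paper's average observation window is 3 s).
rng = np.random.default_rng(0)
x = rng.standard_normal(48000)
feat = modulation_features(stft_log_energy(x))
print(feat.shape)  # 3 modulation bands x 257 FFT bins -> (771,)
```

In the paper these features are mapped by a neural network to discrete RT/ELR ranges; any small classifier over such a fixed-length vector would fit that role in this sketch.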
Metadata
| Item Type: | Article |
| --- | --- |
| Authors/Creators: | |
| Copyright, Publisher and Additional Information: | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
| Keywords: | Reverberation time; early-to-late reverberation ratio; blind estimation; auditory-inspired acoustic features; machine learning |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 06 Apr 2020 10:28 |
| Last Modified: | 18 May 2020 07:38 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Identification Number: | 10.1109/taslp.2018.2843537 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:159136 |