Ravenscroft, W., Goetze, S. orcid.org/0000-0003-1044-7343 and Hain, T. (2023) Deformable temporal convolutional networks for monaural noisy reverberant speech separation. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-10 Jun 2023, Rhodes Island, Greece. Institute of Electrical and Electronics Engineers (IEEE) ISBN 9781728163284
Abstract
Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models known as temporal convolutional networks (TCNs) has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work deformable convolution is proposed as a solution to allow TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models are capable of achieving an 11.1 dB average scale-invariant signal-to-distortion ratio (SISDR) improvement over the input signal on the WHAMR benchmark. A relatively small deformable TCN model of 1.3M parameters is proposed which gives comparable separation performance to larger and more computationally complex models.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2023, The Authors. Except as otherwise noted, this author-accepted version of a paper published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | speech separation; deformable convolution; dynamic neural networks |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number Engineering and Physical Sciences Research Council EP/S023062/1 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 17 May 2023 14:02 |
Last Modified: | 04 Sep 2023 13:17 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/icassp49357.2023.10095230 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:199264 |