Ravenscroft, J.W. orcid.org/0000-0002-0780-3303, Goetze, S. and Hain, T. (2024) On time domain conformer models for monaural speech separation in noisy reverberant acoustic environments. In: 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE ASRU 2023), 16-20 Dec 2023, Taipei, Taiwan. Institute of Electrical and Electronics Engineers (IEEE) ISBN 979-8-3503-0690-3
Abstract
Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks which sequentially process local and global information. Time domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially but have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SISDR improvement on the WHAMR and WSJ0-2Mix benchmarks, respectively.
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: |
|
Copyright, Publisher and Additional Information: | © 2023 The Authors. Except as otherwise noted, this author-accepted version of a paper published in 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) proceedings is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
Keywords: | speech separation; conformer; speech enhance-ment; time domain; single channel |
Dates: |
|
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder Grant number Engineering and Physical Sciences Research Council 2268977 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 25 Oct 2023 10:59 |
Last Modified: | 09 Feb 2024 15:54 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/ASRU57964.2023.10389669 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:204584 |