This is the latest version of this eprint.
Ravenscroft, J. orcid.org/0000-0002-0780-3303, Goetze, S. and Hain, T. (2023) On data sampling strategies for training neural network speech separation models. In: 2023 31st European Signal Processing Conference (EUSIPCO). 31st European Signal Processing Conference (EUSIPCO 2023), 04-08 Sep 2023, Helsinki, Finland. Institute of Electrical and Electronics Engineers (IEEE). ISBN 978-9-4645-9360-0
Abstract
Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues, but the impact of this on model performance is not yet well understood. In this work, the impact of applying these training signal length (TSL) limits is analysed for two speech separation models: SepFormer, a transformer model, and Conv-TasNet, a convolutional model. The WSJ0-2Mix, WHAMR and Libri2Mix datasets are analysed in terms of their signal length distributions and the impact of these distributions on training efficiency. It is demonstrated that, for specific length distributions, applying a suitable TSL limit improves performance. This gain is shown to be mainly due to randomly sampling the start index of the waveforms, which yields more unique examples for training. A SepFormer model trained using a TSL limit of 4.42 s and dynamic mixing (DM) is shown to match the best-performing SepFormer model trained with DM and unlimited signal lengths. Furthermore, the 4.42 s TSL limit results in a 44% reduction in training time with WHAMR.
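The random-start cropping described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the function name `crop_to_tsl`, the 8 kHz sample rate (common for WSJ0-2Mix and WHAMR) and the 4.42 s default are assumptions made for the sketch. The key detail is that the mixture and its reference sources are cropped with the same start index so they stay time-aligned.

```python
import numpy as np

def crop_to_tsl(mixture, sources, sample_rate=8000, tsl_limit_s=4.42, rng=None):
    """Apply a training signal length (TSL) limit by cropping a random
    segment of each example. Drawing the start index uniformly at random
    means successive epochs see different segments of the same recording,
    yielding more unique training examples."""
    rng = rng or np.random.default_rng()
    limit = int(round(tsl_limit_s * sample_rate))
    n = mixture.shape[-1]
    if n <= limit:
        # Examples already at or below the limit are used in full.
        return mixture, sources
    start = int(rng.integers(0, n - limit + 1))
    seg = slice(start, start + limit)
    # Mixture and reference sources share the start index to stay aligned.
    return mixture[..., seg], [s[..., seg] for s in sources]

# Usage on synthetic 8 s signals:
rng = np.random.default_rng(0)
mix = rng.standard_normal(8 * 8000)
srcs = [rng.standard_normal(8 * 8000) for _ in range(2)]
mix_c, srcs_c = crop_to_tsl(mix, srcs, rng=rng)
print(mix_c.shape)  # (35360,), i.e. round(4.42 * 8000) samples
```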
Metadata
| Item Type: | Proceedings Paper |
|---|---|
| Authors/Creators: | Ravenscroft, J. (orcid.org/0000-0002-0780-3303); Goetze, S.; Hain, T. |
| Copyright, Publisher and Additional Information: | © 2023 The Authors. Except as otherwise noted, this author-accepted version of a proceedings paper published in 2023 31st European Signal Processing Conference (EUSIPCO) is made available via the University of Sheffield Research Publications and Copyright Policy under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ |
| Keywords: | speech separation; context modelling; data sampling; speech enhancement; transformer |
| Dates: | |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Funding Information: | Engineering and Physical Sciences Research Council (grant number 2268977) |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 21 Jun 2023 12:08 |
| Last Modified: | 07 Dec 2023 12:34 |
| Status: | Published |
| Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
| Refereed: | Yes |
| Identification Number (DOI): | 10.23919/EUSIPCO58844.2023.10289800 |
| Related URLs: | |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:200684 |
Available Versions of this Item
- On data sampling strategies for training neural network speech separation models. (deposited 21 Jun 2023 12:02)
  - On data sampling strategies for training neural network speech separation models. (deposited 21 Jun 2023 12:08) [Currently Displayed]