Hu, Q., Ma, N. and Brown, G.J. orcid.org/0000-0001-8565-5476 (2023) Robust binaural sound localisation with temporal attention. In: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) Proceedings. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-10 Jun 2023, Rhodes Island, Greece. Institute of Electrical and Electronics Engineers (IEEE) ISBN 9781728163284
Abstract
Despite there being clear evidence for attentional effects in biological spatial hearing, relatively few machine hearing systems exploit attention in binaural sound localisation. This paper addresses this issue by proposing a novel binaural machine hearing system with temporal attention for robust localisation of sound sources in noisy and reverberant conditions. A convolutional neural network is employed to extract noise-robust localisation features, which are similar to interaural phase difference, directly from phase spectra of the left and right ears for each frame. A temporal attention layer operates on top of these frame-level features by incorporating outputs of a temporal mask estimation module that indicate target dominance within each frame. The combined features are then exploited by fully connected layers, which map them to the corresponding source azimuth. Both the temporal mask estimation module and the sound localisation module are trained jointly in a multi-task learning manner. Our evaluation shows that the proposed system is able to accurately estimate the azimuth of a sound source in various reverberant and noisy conditions.
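The attention and multi-task steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the mask outputs are combined with the frame-level features, so the sketch assumes the per-frame mask is normalised into attention weights and used for a weighted sum over time. The function names, the single linear output layer, the cross-entropy losses, and the `mask_weight` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def temporal_attention_pool(frame_features, frame_mask, eps=1e-8):
    """Pool frame-level features over time, weighting each frame by its mask value.

    frame_features: (T, D) array, assumed to come from a feature extractor.
    frame_mask:     (T,) array in [0, 1], assumed target dominance per frame.
    Assumption: attention weights are the mask normalised to sum to one.
    """
    frame_features = np.asarray(frame_features, dtype=float)
    frame_mask = np.asarray(frame_mask, dtype=float)
    weights = frame_mask / (frame_mask.sum() + eps)
    pooled = weights @ frame_features  # (D,) weighted sum over frames
    return pooled, weights


def azimuth_posterior(pooled, W, b):
    """Map pooled features to a distribution over azimuth classes (softmax)."""
    logits = pooled @ W + b
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def multitask_loss(azimuth_probs, azimuth_label, mask_pred, mask_target,
                   mask_weight=1.0, eps=1e-8):
    """Hypothetical joint objective: localisation loss plus weighted mask loss."""
    mask_pred = np.clip(np.asarray(mask_pred, dtype=float), eps, 1.0 - eps)
    mask_target = np.asarray(mask_target, dtype=float)
    # Cross-entropy on the azimuth class.
    loc_loss = -np.log(azimuth_probs[azimuth_label] + eps)
    # Binary cross-entropy on the per-frame mask, averaged over frames.
    mask_loss = -np.mean(mask_target * np.log(mask_pred)
                         + (1.0 - mask_target) * np.log(1.0 - mask_pred))
    return loc_loss + mask_weight * mask_loss
```

Under this assumption, a frame whose mask value is near zero (for example, one dominated by noise) contributes almost nothing to the pooled feature vector, which is one plausible way for frame-level target dominance to improve robustness.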
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Hu, Q., Ma, N. and Brown, G.J. |
Copyright, Publisher and Additional Information: | © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | temporal attention; sound source localisation; temporal mask estimation; multi-task learning; phase spectrum |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 28 Feb 2023 14:58 |
Last Modified: | 05 May 2024 00:13 |
Status: | Published |
Publisher: | Institute of Electrical and Electronics Engineers (IEEE) |
Refereed: | Yes |
Identification Number: | 10.1109/ICASSP49357.2023.10096640 |
Related URLs: | |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:196758 |