Chen, Jinhui, Takashima, Ryoichi, Guo, Xingchen et al. (4 more authors) (2021) Multimodal Fusion for Indoor Sound Source Localization. Pattern Recognition. 107906. ISSN 0031-3203
Abstract
Localizing an indoor sound source is a challenging problem for machine learning, especially when only a single microphone is available. To address this problem, this paper presents a novel solution based on fusing visual and acoustic models, built on two approaches. First, to estimate the orientation of the vocal object in a stable manner, we use a visual estimation model, for which we develop a robust image feature representation method that applies Fourier analysis to extract polar descriptors efficiently. Second, the distance is estimated by calculating the signal difference between the transmitting and receiving ends. To implement this, we use phoneme-level hidden Markov models (HMMs) obtained from clean speech, which represent the speech signal as a network of phoneme HMMs, to estimate the acoustic transfer function (ATF). The separated frame sequences of the ATF then indicate the signal difference between two positions, which can be used to estimate the distance of the sound source. Experimental results show that the proposed method can simultaneously extract the direction and distance of the sound source, and thus improves performance on the sound source localization verification task.
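The abstract does not spell out how the ATF is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It relies on the standard fact that a convolutive channel becomes an additive offset in the log-spectral domain, so the ATF can be approximated as the mean difference between observed and clean log spectra. Here the clean frames are assumed to be known directly (the paper instead infers them with phoneme HMMs), the channel is a toy circular convolution, and all names (`estimate_atf`, `apply_channel`, `near_ir`, `far_ir`) are hypothetical.

```python
import numpy as np

def log_spectrum(frames):
    """Log-magnitude spectrum of each row (one row per frame)."""
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)

def estimate_atf(observed_frames, clean_frames):
    """Mean log-spectral difference between observed and clean frames.

    Assumption: the clean frames are available; the paper models them
    with phoneme-level HMMs instead.
    """
    diff = log_spectrum(observed_frames) - log_spectrum(clean_frames)
    return diff.mean(axis=0)

def apply_channel(clean_frames, impulse_response):
    """Toy channel: circular convolution of every frame with one impulse response."""
    n = clean_frames.shape[1]
    channel = np.fft.rfft(impulse_response, n=n)
    return np.fft.irfft(np.fft.rfft(clean_frames, axis=1) * channel, n=n, axis=1)

rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 256))   # 50 synthetic "clean speech" frames

# Hypothetical impulse responses standing in for two source positions.
near_ir = np.array([1.0, 0.3])
far_ir = np.array([0.5, 0.3, 0.2])

atf_near = estimate_atf(apply_channel(clean, near_ir), clean)
atf_far = estimate_atf(apply_channel(clean, far_ir), clean)

# A scalar measure of how much the two positions' ATFs differ.
atf_gap = np.linalg.norm(atf_near - atf_far)
```

In a setting like the one the abstract describes, a quantity such as `atf_gap`, or the ATF vectors themselves, would be passed to a classifier or regressor to estimate distance; that mapping is not shown here.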
Metadata
Item Type: | Article |
---|---|
Copyright, Publisher and Additional Information: | This is an author-produced version of the published paper. Uploaded in accordance with the publisher’s self-archiving policy. |
Keywords: | Sound source localization, acoustic transfer function, HMM, polar HOG, SVM |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Pure (York) |
Date Deposited: | 24 Feb 2021 11:50 |
Last Modified: | 02 Apr 2025 23:21 |
Published Version: | https://doi.org/10.1016/j.patcog.2021.107906 |
Status: | Published online |
Refereed: | Yes |
Identification Number: | 10.1016/j.patcog.2021.107906 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:171501 |