Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

Ma, N. orcid.org/0000-0002-4112-3109, Brown, G. orcid.org/0000-0001-8565-5476 and May, T. (Accepted: 2015) Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions. In: Interspeech. INTERSPEECH 2015, 06-10 Sep 2015, Dresden, Germany. International Speech Communication Association, pp. 160-164.

Abstract

This paper presents a novel machine-hearing system that exploits deep neural networks (DNNs) and head movements for binaural localisation of multiple speakers in reverberant conditions. DNNs are used to map binaural features, consisting of the complete cross-correlation function (CCF) and interaural level differences (ILDs), to the source azimuth. Our approach was evaluated using a localisation task in which sources were located in a full 360-degree azimuth range. As a result, front-back confusions often occurred due to the similarity of binaural features in the front and rear hemifields. To address this, a head movement strategy was incorporated in the DNN-based model to help reduce the front-back errors. Our experiments show that, compared to a system based on a Gaussian mixture model (GMM) classifier, the proposed DNN system substantially reduces localisation errors under challenging acoustic scenarios in which multiple speakers and room reverberation are present.
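To make the two binaural features named in the abstract concrete, the following is a minimal sketch of how a complete CCF and an ILD could be computed for a single pair of left-ear and right-ear frames. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the frame length, lag range, normalisation, or any frequency analysis, so the function names (`cross_correlation`, `ild_db`, `binaural_features`), the `max_lag` parameter, and the energy-ratio definition of the ILD are all hypothetical choices made here.

```python
import math


def cross_correlation(left, right, max_lag):
    """Normalised cross-correlation of two equal-length frames.

    Returns one value per lag in -max_lag..max_lag (assumed lag range).
    A peak at a positive lag means the right signal is delayed
    relative to the left signal by that many samples.
    """
    n = len(left)
    # Normalise by the geometric mean of the two frame energies.
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    ccf = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        ccf.append(acc / norm if norm > 0 else 0.0)
    return ccf


def ild_db(left, right, eps=1e-12):
    """Interaural level difference in dB (left energy over right energy).

    The energy-ratio definition and the eps guard are assumptions.
    """
    energy_left = sum(x * x for x in left)
    energy_right = sum(y * y for y in right)
    return 10.0 * math.log10((energy_left + eps) / (energy_right + eps))


def binaural_features(left, right, max_lag):
    """Concatenate the full CCF with the ILD into one feature vector."""
    return cross_correlation(left, right, max_lag) + [ild_db(left, right)]
```

Keeping the whole CCF, rather than only the lag of its peak, hands the classifier the full shape of the correlation function. A vector of this kind is what a DNN or GMM classifier would map to an azimuth class.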