GMM-based localisation under reverberant conditions

The Two!Ears Auditory Model comes with several knowledge sources that work together to estimate the perceived azimuth of a sound source, see Localisation knowledge sources for a summary. One stage of this process is the mapping of extracted features such as ITDs and ILDs to the perceived azimuth angle. This mapping is highly influenced by the environment; for example, in a room the ITD values look quite different from those measured in an anechoic chamber. That is why we provide different knowledge sources for this mapping: DnnLocationKS, GmmLocationKS, and ItdLocationKS. ItdLocationKS utilises a simple lookup table for the mapping and works well in the case of Prediction of localisation in spatial audio systems. In this example we have a look at GmmLocationKS, which was trained with a multi-conditional training approach to work under reverberant conditions [MaEtAl2015dnn]. Apart from that, GmmLocationKS works in the same way as DnnLocationKS and connects with LocalisationDecisionKS and HeadRotationKS to solve front-back confusions.
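
To illustrate the basic idea behind such a GMM-based mapping, consider the following toy sketch. This is not the actual GmmLocationKS implementation; it assumes the Statistics and Machine Learning Toolbox and uses invented feature values. One GMM per candidate azimuth is trained on binaural features, and the azimuth whose model best explains the observed features is selected:

azimuths = [-30 0 30];                     % toy candidate azimuths in degrees
rng(1);
% Toy training data: [ITD, ILD] pairs per azimuth (invented values)
train = {randn(200,2)*0.1 + [-0.3 -5], ... % -30 deg
         randn(200,2)*0.1 + [ 0.0  0], ... %   0 deg
         randn(200,2)*0.1 + [ 0.3  5]};    % +30 deg
% Fit one GMM with two components per candidate azimuth
gmms = cellfun(@(X) fitgmdist(X, 2), train, 'UniformOutput', false);

observed = randn(50,2)*0.1 + [0.3 5];      % frames from a +30 deg source
% Sum of per-frame log-likelihoods under each azimuth model
logLik = cellfun(@(gm) sum(log(pdf(gm, observed))), gmms);
[~, kBest] = max(logLik);
fprintf('Estimated azimuth: %i deg\n', azimuths(kBest));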

In this example we will have a look at localisation in a larger room, namely the BRIR data set measured at TU Berlin in room Auditorium 3, which provides six different loudspeaker positions as possible sound sources. All files can be found in the examples/localisation_GMMs folder, which contains the following files:

BlackboardNoHeadRotation.xml
Blackboard.xml
estimateAzimuth.m
localise.m
resetBinauralSimulator.m
setupBinauralSimulator.m

The setup is very similar to Localisation with and without head rotations, with a few exceptions. First, the setup of the Binaural simulator is different, as we use BRIRs instead of HRTFs and have one impulse response set for every sound source. The initial configuration of the Binaural simulator is provided by the setupBinauralSimulator function:

sim = simulator.SimulatorConvexRoom();
set(sim, ...
    'BlockSize',            4096, ...
    'SampleRate',           44100, ...
    'NumberOfThreads',      1, ...
    'LengthOfSimulation',   1, ...
    'Renderer',             @ssr_brs, ...
    'Verbose',              false, ...
    'Sources',              {simulator.source.Point()}, ...
    'Sinks',                simulator.AudioSink(2) ...
    );
set(sim.Sinks, ...
    'Name',                 'Head', ...
    'Position',             [ 0.00  0.00  0.00]' ...
    );
set(sim.Sources{1}, ...
    'AudioBuffer',          simulator.buffer.Ring(1) ...
    );
set(sim.Sources{1}.AudioBuffer, ...
    'File', 'sound_databases/grid_subset/s1/bbaf2n.wav' ...
    );

Here, we configure it to use the @ssr_brs renderer, which is needed for BRIRs, and define the speech signal to use, but we don't provide a BRIR yet, as this will be done on the fly later on.

We have two different configuration files for setting up the Blackboard system. As an example, we list the file Blackboard.xml:

<?xml version="1.0" encoding="utf-8"?>
<blackboardsystem>

    <dataConnection Type="AuditoryFrontEndKS">
       <Param Type="double">16000</Param>
    </dataConnection>

    <KS Name="loc" Type="GmmLocationKS">
        <!-- Use MCT-DIFFUSE for full 360 localisation -->
        <Param Type="char">MCT-DIFFUSE</Param>
    </KS>
    <KS Name="dec" Type="LocalisationDecisionKS">
        <!-- set to 1 to enable confusion solving (== head rotation) -->
        <Param Type="int">1</Param>
    </KS>
    <KS Name="rot" Type="HeadRotationKS">
        <Param Type="ref">robotConnect</Param>
    </KS>

    <Connection Mode="replaceOld" Event="AgendaEmpty">
        <source>scheduler</source>
        <sink>dataConnect</sink>
    </Connection>
    <Connection Mode="replaceOld">
        <source>dataConnect</source>
        <sink>loc</sink>
    </Connection>
    <Connection Mode="add">
        <source>loc</source>
        <sink>dec</sink>
    </Connection>
    <Connection Mode="replaceOld" Event="RotateHead">
        <source>dec</source>
        <sink>rot</sink>
    </Connection>

</blackboardsystem>

Here, we use different knowledge sources that work together in order to solve the localisation task: AuditoryFrontEndKS for extracting auditory cues from the ear signals, and GmmLocationKS, LocalisationDecisionKS, and HeadRotationKS for the actual localisation. The Param tags are parameters we can pass to the knowledge sources. After declaring which knowledge sources we will use, we connect them with the Connection tags. For more information on configuring the blackboard see Configuration.
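
Such a configuration file is brought to life by the Blackboard system. The estimateAzimuth function from this example presumably wraps a pattern like the following; this is a minimal sketch, assuming the BlackboardSystem API described in Configuration, and the 'perceivedAzimuths' data label and the final field access are assumptions about how results are stored on the blackboard:

function phi = estimateAzimuth(sim, blackboardConfig)
% Sketch: build a blackboard from an XML configuration file and run it
bbs = BlackboardSystem(0);           % 0 = verbosity off
bbs.setRobotConnect(sim);            % Binaural simulator acts as the robot
bbs.buildFromXml(blackboardConfig);  % instantiate KSs and their connections
bbs.run();                           % run until the signal is consumed
% Retrieve the final azimuth hypothesis (data label and field access
% are assumptions)
hyps = bbs.blackboard.getData('perceivedAzimuths');
phi = hyps(end).data.azimuth;
end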

In the other blackboard configuration file, BlackboardNoHeadRotation.xml, we set up a blackboard for the case of GmmLocationKS without confusion solving by head rotation.

Now, everything is prepared and we can start Matlab in order to perform the localisation. You can just start it and run the following command to see it in action; afterwards we will have a look at what happened:

>> localise

-------------------------------------------------------------------------
Source direction   GmmLocationKS w head rot.   GmmLocationKS wo head rot.
-------------------------------------------------------------------------
            0                 0                       -180
          -52               -55                       -100
         -131              -135                       -140
            0                 0                       -180
           30                30                         30
          -30               -30                        -30
-------------------------------------------------------------------------

As you can see, the model with head rotation returned better results than the model without head rotation; most notably, it resolves the front-back confusions that otherwise map the frontal sources at 0° to the rear (-180°).

Now, we have a look into the details of the localise() function. We will only discuss the parts that are responsible for the localisation task, not those that print the results to the screen. First, we define the sources we are going to synthesise and start the Binaural simulator:

brirs = { ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src1_xs+0.00_ys+3.97.sofa'; ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src2_xs+4.30_ys+3.42.sofa'; ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src3_xs+2.20_ys-1.94.sofa'; ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src4_xs+0.00_ys+1.50.sofa'; ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src5_xs-0.75_ys+1.30.sofa'; ...
    'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src6_xs+0.75_ys+1.30.sofa'; ...
    };
headOrientation = 90; % towards y-axis (facing src1)
sourceAngles = [90, 38.5, -41.4, 90, 120, 60] - headOrientation; % phi = atan2d(ys,xs)
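
The angles can be verified directly from the loudspeaker coordinates encoded in the BRIR file names, for example:

atan2d(3.42, 4.30)   % src2: ans = 38.5 (degrees, measured from the x-axis)
atan2d(1.30, -0.75)  % src5: ans = 120.0
% Subtracting headOrientation = 90 gives the angles relative to the
% listener, who faces the y-axis: -51.5 deg and 30 deg, respectively.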

After that we have a loop over the different sources in which we are loading the corresponding BRIR into the Binaural simulator and run the Blackboard system inside the estimateAzimuth function:

for ii = 1:length(sourceAngles)
    direction = sourceAngles(ii);
    sim.Sources{1}.IRDataset = simulator.DirectionalIR(brirs{ii});
    sim.rotateHead(headOrientation, 'absolute');
    sim.Init = true;
    % GmmLocationKS w head rot.
    phi1 = estimateAzimuth(sim, 'Blackboard.xml');
    resetBinauralSimulator(sim, headOrientation);
    % GmmLocationKS wo head rot.
    phi2 = estimateAzimuth(sim, 'BlackboardNoHeadRotation.xml');
    sim.ShutDown = true;
end

As we run two different blackboards one after the other, we have to reinitialise the Binaural simulator in between; this is done by the resetBinauralSimulator function.
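
The resetBinauralSimulator function could look as follows; this is a plausible sketch in which rotateHead is used as in the loop above, while the ReInit flag is an assumption about the simulator's interface:

function resetBinauralSimulator(sim, headOrientation)
% Sketch: restore the initial head orientation and rewind the simulator,
% so the second blackboard starts from the same state as the first
sim.rotateHead(headOrientation, 'absolute');
sim.ReInit = true;  % assumed re-initialisation flag of the simulator
end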