GMM-based localisation under reverberant conditionsΒΆ
The Two!Ears Auditory Model comes with several knowledge sources that work together to estimate the perceived azimuth of a sound source, see Localisation knowledge sources for a summary. One stage of this process is the mapping of the extracted features like ITDs and ILDs to the perceived azimuth angle. This mapping is highly influenced by the environment. For example, if you are in a room the ITD values will look quite different than in the case of an anechoic chamber. That is the reason why we have different knowledge sources that do this mapping: DnnLocationKS, GmmLocationsKS, and ItdLocationKS. ItdLocationKS utilises a simple lookup table for the mapping works well in the case of Prediction of localisation in spatial audio systems. GmmLocationsKS is at the moment trained only for anechoic condition. In this example we have a look at GmmLocationsKS which was trained with a multi-conditional training approach to work under reverberant conditions [MaEtAl2015dnn]. Beside this, GmmLocationsKS works in the same way as DnnLocationKS and connects with LocalisationDecisionKS, and HeadRotationKS to solve front-back confusions.
In this example we will have a look at localisation in a larger room, namely the
BRIR data set measured in TU Berlin, room Auditorium 3, which provides six
different loudspeaker positions as possible sound sources. All files can be
found in the examples/localisation_GMMs
folder which consists of the
following files:
BlackboardNoHeadRotation.xml
Blackboard.xml
estimateAzimuth.m
localise.m
resetBinauralSimulator.m
setupBinauralSimulator.m
The setup is very similar to Localisation with and without head rotations with a few
exceptions. First, the setup of the Binaural simulator is different as we use BRIRs
instead of HRTFs, and have one impulse response set for every sound source.
The initial configuration of the Binaural simulator is provided by the
setupBinauralSimulator
function:
sim = simulator.SimulatorConvexRoom();
set(sim, ...
'BlockSize', 4096, ...
'SampleRate', 44100, ...
'NumberOfThreads', 1, ...
'LengthOfSimulation', 1, ...
'Renderer', @ssr_brs, ...
'Verbose', false, ...
'Sources', {simulator.source.Point()}, ...
'Sinks', simulator.AudioSink(2) ...
);
set(sim.Sinks, ...
'Name', 'Head', ...
'Position', [ 0.00 0.00 0.00]' ...
);
set(sim.Sources{1}, ...
'AudioBuffer', simulator.buffer.Ring(1) ...
);
set(sim.Sources{1}.AudioBuffer, ...
'File', 'sound_databases/grid_subset/s1/bbaf2n.wav' ...
);
Here, we configure it to use the @ssr_brs
renderer which is needed for
BRIRs, define the speech signal to use, but don’t provide a BRIR yet as
this will be done on the fly later on.
We have four different configuration files for setting up the Blackboard system. As
an example, we list the file BlackboardGmm.xml`
:
<?xml version="1.0" encoding="utf-8"?>
<blackboardsystem>
<dataConnection Type="AuditoryFrontEndKS">
<Param Type="double">16000</Param>
</dataConnection>
<KS Name="loc" Type="GmmLocationKS">
<!-- Use MCT-DIFFUSE for full 360 localisation -->
<Param Type="char">MCT-DIFFUSE</Param>
</KS>
<KS Name="dec" Type="LocalisationDecisionKS">
<!-- set to 1 to enable confusion solving (== head rotation) -->
<Param Type="int">1</Param>
</KS>
<KS Name="rot" Type="HeadRotationKS">
<Param Type="ref">robotConnect</Param>
</KS>
<Connection Mode="replaceOld" Event="AgendaEmpty">
<source>scheduler</source>
<sink>dataConnect</sink>
</Connection>
<Connection Mode="replaceOld">
<source>dataConnect</source>
<sink>loc</sink>
</Connection>
<Connection Mode="add">
<source>loc</source>
<sink>dec</sink>
</Connection>
<Connection Mode="replaceOld" Event="RotateHead">
<source>dec</source>
<sink>rot</sink>
</Connection>
</blackboardsystem>
Here, we use different knowledge sources that work together in order to solve
the localisation task. We have AuditoryFrontEndKS for extract auditory cues
from the ear signals, GmmLocationsKS channels, LocalisationDecisionKS, and
HeadRotationKS for the actual localisation task. The Param
tags are
parameters we can pass to the knowledge sources. After setting up which
knowledge sources we will use, we connect them with the Connection
tags.
For more information on configuring the blackboard see
Configuration.
In the other blackboard configuration files we set up a blackboard for the case of GmmLocationsKS without confusion solving by head rotation.
Now, everything is prepared and we can start Matlab in order to perform the localisation. You can just start it and run the following command to see it in action, afterwards we will have a look at what happened:
>> localise
-------------------------------------------------------------------------
Source direction GmmLocationKS w head rot. GmmLocationKS wo head rot.
-------------------------------------------------------------------------
0 0 -180
-52 -55 -100
-131 -135 -140
0 0 -180
30 30 30
-30 -30 -30
------------------------------------------------------------------------
As you can see the model with head rotation returned better results than the model without head rotation enabled.
Now, we have a look into the details of the localise()
function. We will
only talk about the parts that are responsible for the task, not for printing
out the results onto the screen. First, we define the sources we are going to
synthesise and start the Binaural simulator:
brirs = { ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src1_xs+0.00_ys+3.97.sofa'; ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src2_xs+4.30_ys+3.42.sofa'; ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src3_xs+2.20_ys-1.94.sofa'; ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src4_xs+0.00_ys+1.50.sofa'; ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src5_xs-0.75_ys+1.30.sofa'; ...
'impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src6_xs+0.75_ys+1.30.sofa'; ...
};
headOrientation = 90; % towards y-axis (facing src1)
sourceAngles = [90, 38.5, -41.4, 90, 120, 60] - headOrientation; % phi = atan2d(ys,xs)
After that we have a loop over the different sources in which we are loading the
corresponding BRIR into the Binaural simulator and run the Blackboard system inside the
estimateAzimuth
function:
for ii = 1:length(sourceAngles)
direction = sourceAngles(ii);
sim.Sources{1}.IRDataset = simulator.DirectionalIR(brirs{ii});
sim.rotateHead(headOrientation, 'absolute');
sim.Init = true;
% GmmLocationKS w head rot.
phi1 = estimateAzimuth(sim, 'BlackboardGmm.xml');
resetBinauralSimulator(sim, headOrientation);
% GmmLocationKS wo head rot.
phi2 = estimateAzimuth(sim, 'BlackboardGmmNoHeadRotation.xml');
sim.ShutDown = true;
end
As we run four different blackboards after each other, we have to reinitialise the Binaural simulator in between.