Localisation with and without head rotations

The Two!Ears Auditory Model comes with several knowledge sources that work together to estimate the perceived azimuth of a sound source; see Localisation knowledge sources for a summary. The main work is done by the GmmLocationKS knowledge source, which takes the ITD and ILD cues provided by the Auditory front-end and compares them with learned cue-to-azimuth maps. As output it provides a probability distribution over possible source directions. This is passed on to the LocalisationDecisionKS knowledge source, which inspects the probabilities and decides whether a clear direction can be extracted from them. If not, it asks HeadRotationKS to rotate the head of the listener (in the simulation or on a robot) and the localisation process starts again.
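
Conceptually, the decision step operates on the azimuth probability distribution delivered by GmmLocationKS: if one direction clearly dominates, it is reported, otherwise a head rotation is requested. The following Matlab fragment only illustrates that logic and is not the actual knowledge source code; the distribution and the confidence threshold are made up:

% Hypothetical azimuth grid and probability distribution from the localisation stage
azimuths = -180:179;                                 % candidate directions in degree
probabilities = exp(-(azimuths - 30).^2 / (2*5^2));  % made-up example distribution
probabilities = probabilities / sum(probabilities);  % normalise to a probability distribution

[maxProb, idx] = max(probabilities);
if maxProb > 0.05                                    % made-up confidence threshold
    fprintf('Perceived azimuth: %i deg\n', azimuths(idx));
else
    % No clear winner: request a head rotation and localise again
    fprintf('Ambiguous distribution, requesting head rotation ...\n');
end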

In this example we will see how to set up the model to perform a localisation task, and how to switch the model's ability to rotate its head on or off. The example can be found in the examples/localisation_w_and_wo_head_movements folder, which consists of the following files:

BlackboardNoHeadRotation.xml
Blackboard.xml
estimateAzimuth.m
localise.m
SceneDescription.xml

The first file we look at is SceneDescription.xml. It defines the actual acoustic scene in which our virtual head and the sound source are placed in order to simulate binaural signals. It looks like this:

<?xml version="1.0" encoding="utf-8"?>
<scene
  BlockSize="4096"
  SampleRate="44100"
  MaximumDelay="0.0"
  NumberOfThreads="1"
  LengthOfSimulation = "5"
  HRIRs="impulse_responses/scut_kemar_anechoic/SCUT_KEMAR_anechoic_1m.sofa">
  <source Radius="1.0"
          Mute="false"
          Type="point"
          Name="SoundSource">
    <buffer ChannelMapping="1"
        Type="noise"/>
  </source>
  <sink Name="Head"
        Position="0 0 0"
        UnitX="1 0 0"
        UnitZ="0 0 1"/>
</scene>

Here, we define basic things like the sampling rate, the length of the stimulus, the HRTF data set to use, the source material, the listener position, and the distance between listener and source. For more documentation on specifying an acoustic scene, see Configuration using XML Scene Description.

Note

We don’t specify the exact source azimuth here, as we will choose different azimuth values later on and set them on the fly from within Matlab.
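
The azimuth is set directly on the running simulator object from within Matlab, using the same calls that appear in the localise.m loop shown further below:

% Place the source at 33 deg and re-initialise the simulation
sim.Sources{1}.set('Azimuth', 33);
sim.ReInit = true;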

The next thing we have to do is to specify what components the model should consist of and what it should actually do. This is done by selecting appropriate modules for the Blackboard system stage of the model, which is again configured in an XML file. First we look at the configuration for localisation including head movements (Blackboard.xml):

<?xml version="1.0" encoding="utf-8"?>
<blackboardsystem>

    <dataConnection Type="AuditoryFrontEndKS"/>

    <KS Name="loc" Type="GmmLocationKS"/>
    <KS Name="locDec" Type="LocalisationDecisionKS">
        <Param Type="int">1</Param><!-- enable head rotation -->
    </KS>
    <KS Name="rot" Type="HeadRotationKS">
        <Param Type="ref">robotConnect</Param>
    </KS>

    <Connection Mode="replaceOld" Event="AgendaEmpty">
        <source>scheduler</source>
        <sink>dataConnect</sink>
    </Connection>
    <Connection Mode="replaceOld">
        <source>dataConnect</source>
        <sink>loc</sink>
    </Connection>
    <Connection Mode="add">
        <source>loc</source>
        <sink>locDec</sink>
    </Connection>
    <Connection Mode="replaceOld" Event="RotateHead">
        <source>locDec</source>
        <sink>rot</sink>
    </Connection>

</blackboardsystem>

Here, we use different knowledge sources that work together in order to solve the localisation task. We have AuditoryFrontEndKS for extracting auditory cues from the ear signals, and GmmLocationKS, LocalisationDecisionKS, and HeadRotationKS for the actual localisation task. The Param tags pass parameters to the knowledge sources. After setting up which knowledge sources we will use, we connect them with the Connection tags. For more information on configuring the blackboard see Configuration.
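
The same blackboard can also be assembled programmatically instead of via XML. The following sketch shows roughly what the equivalent Matlab calls could look like; it assumes the BlackboardSystem methods setDataConnect, createKS and blackboardMonitor.bind described in the Configuration documentation, so treat it as an illustration rather than a drop-in replacement:

% Sketch: building the same blackboard from Matlab code instead of XML
bbs = BlackboardSystem(0);
bbs.setRobotConnect(sim);                 % sim is the binaural simulator object
bbs.setDataConnect('AuditoryFrontEndKS');

locKS    = bbs.createKS('GmmLocationKS');
locDecKS = bbs.createKS('LocalisationDecisionKS', {1});        % 1 = enable head rotation
rotKS    = bbs.createKS('HeadRotationKS', {bbs.robotConnect});

bbs.blackboardMonitor.bind({bbs.scheduler}, {bbs.dataConnect}, 'replaceOld', 'AgendaEmpty');
bbs.blackboardMonitor.bind({bbs.dataConnect}, {locKS}, 'replaceOld');
bbs.blackboardMonitor.bind({locKS}, {locDecKS}, 'add');
bbs.blackboardMonitor.bind({locDecKS}, {rotKS}, 'replaceOld', 'RotateHead');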

In a second configuration file we set up the same blackboard, but now disable its ability to turn the head (BlackboardNoHeadRotation.xml):

<?xml version="1.0" encoding="utf-8"?>
<blackboardsystem>

    <dataConnection Type="AuditoryFrontEndKS"/>

    <KS Name="loc" Type="GmmLocationKS"/>
    <KS Name="locDec" Type="LocalisationDecisionKS">
        <Param Type="int">0</Param><!-- disable head rotation -->
    </KS>
    <KS Name="rot" Type="HeadRotationKS">
        <Param Type="ref">robotConnect</Param>
    </KS>

    <Connection Mode="replaceOld" Event="AgendaEmpty">
        <source>scheduler</source>
        <sink>dataConnect</sink>
    </Connection>
    <Connection Mode="replaceOld">
        <source>dataConnect</source>
        <sink>loc</sink>
    </Connection>
    <Connection Mode="add">
        <source>loc</source>
        <sink>locDec</sink>
    </Connection>
    <Connection Mode="replaceOld" Event="RotateHead">
        <source>locDec</source>
        <sink>rot</sink>
    </Connection>

</blackboardsystem>

Now, everything is prepared and we can start Matlab in order to perform the localisation. Just run the following command to see it in action; afterwards we will have a look at what happened:

>> localise

------------------------------------------------------------------
Source direction        Model w head rot.       Model wo head rot.
------------------------------------------------------------------
   0                            0                    -180
  33                           35                      35
  76                           70                      70
-121                          -55                     -55
------------------------------------------------------------------

As you can see, the model with head rotation returns better results than the model without head rotation enabled. The reason why we have problems without head rotation is that the model was trained with a different HRTF data set (QU KEMAR) than the one used for the creation of the acoustic scene (SCUT KEMAR).
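
The 0° entry in the table is a classical front-back confusion: a source directly in front and one directly behind produce nearly identical interaural cues, so without a head movement the model cannot tell them apart. Rotating the head breaks this symmetry, as the following back-of-the-envelope Matlab sketch illustrates (the rotation angle is arbitrary and only serves the illustration):

% Two source hypotheses that produce (nearly) the same interaural cues
hypotheses = [0 -180];      % source in front vs. behind, in degree

% Rotate the head by 30 deg; the azimuths of the two hypotheses relative to the
% new head orientation now differ clearly, so the confusion can be resolved
headRotation = 30;
relativeAzimuths = mod(hypotheses - headRotation + 180, 360) - 180
% => [-30  150]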

Now, let us have a look into the details of the localise() function. We will only discuss the parts that are responsible for the task itself, not for printing the results to the screen. First, we define the source angles we are going to synthesise and start the Binaural simulator:

% Different angles the sound source is placed at
% (239 deg corresponds to the -121 deg entry in the result table above)
sourceAngles = [0 33 76 239];

% === Initialise binaural simulator
sim = simulator.SimulatorConvexRoom('SceneDescription.xml');
sim.Verbose = false;
sim.Init = true;

After that, we loop over the different source angles. In each iteration we set the source position in the Binaural simulator and run two blackboards one after the other, one with head rotation and one without:

for direction = sourceAngles

    sim.Sources{1}.set('Azimuth', direction);
    sim.rotateHead(0, 'absolute');
    sim.ReInit = true;

    % GmmLocationKS with head rotation for confusion solving
    bbs = BlackboardSystem(0);
    bbs.setRobotConnect(sim);
    bbs.buildFromXml('Blackboard.xml');
    bbs.run();

    % Reset binaural simulation
    sim.rotateHead(0, 'absolute');
    sim.ReInit = true;

    % GmmLocationKS without head rotation and confusion solving
    bbs = BlackboardSystem(0);
    bbs.setRobotConnect(sim);
    bbs.buildFromXml('BlackboardNoHeadRotation.xml');
    bbs.run();

end
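
The readout of the estimated azimuth from the blackboard is done by the estimateAzimuth.m helper listed at the beginning of this example. The following is only a rough sketch of how such a readout could look after bbs.run() has finished, assuming the decision stage stores its hypotheses on the blackboard under the label 'perceivedAzimuths' and that each hypothesis carries an azimuth field; see estimateAzimuth.m for the readout actually used here:

% Sketch: read the most recent localisation hypothesis from the blackboard
perceivedAzimuths = bbs.blackboard.getData('perceivedAzimuths');
if ~isempty(perceivedAzimuths)
    lastHypothesis = perceivedAzimuths(end).data;   % field name 'azimuth' is assumed below
    fprintf('Estimated azimuth: %.1f deg\n', lastHypothesis.azimuth);
end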