Set up an auditory model

From other auditory modeling frameworks like the Auditory Modeling Toolbox you are most probably used to having one function per auditory model. In addition, you normally first create your complete binaural signal, which is then fed into that function. For example, in the Auditory Modeling Toolbox the following would return a localisation probability distribution for the binaural input signal:

>> out = may2011(input,fs);
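
A typical workflow there could look like the following sketch, in which the complete binaural signal is created first (here simply read from a hypothetical recording binaural_recording.wav) and only then handed to the model function:

% Conventional, static workflow: build the complete binaural signal first
% (binaural_recording.wav is a hypothetical two-channel file used for illustration)
[input, fs] = audioread('binaural_recording.wav');
% ... and then pass it to the model function in one go
out = may2011(input, fs);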

In the Two!Ears Auditory Model things work differently, as the model has the ability to interact with the scene. This means that we have no fixed binaural signal, but only a description of the acoustic scene; for an introduction see Set up an acoustic scene. The binaural signal is then created on the fly in a block-based manner while the auditory model is processing, and it can change if the model decides to interact with the scene, for example by rotating its head. Another difference is that the Two!Ears Auditory Model does not come with separate functions for different tasks like localisation or speaker identification. It always uses its central brain, the Blackboard system, in which Knowledge sources are available for the different tasks. Eventually the Blackboard system is planned to work out by itself which knowledge sources are needed to solve a given task; at the moment the user has to define which knowledge sources should be active.

Note

If you don’t need the brain unit of the model, but are only interested in extracting auditory cues from your input signals or a simulated acoustic scene, you can use the Auditory front-end as a standalone part that extracts those cues for you. This is further explained in its Overview section.
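
As a rough sketch, assuming you already have a two-channel ear signal earSignals at sampling rate fsHz, a standalone request to the Auditory front-end could look like this (see its documentation for the exact interface and available cues):

% Wrap the binaural signal in a data object (earSignals: N-by-2 matrix, assumed to exist)
dataObj = dataObject(earSignals, fsHz);
% Create a processing manager and request a cue, here interaural time differences
managerObj = manager(dataObj);
managerObj.addProcessor('itd');
% Process the complete signal in one go
managerObj.processSignal();
% The extracted ITDs are now stored inside dataObj; see the Auditory
% front-end documentation for how to access and plot them.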

In the following we illustrate this with a simple example. Let’s assume we have the acoustic scene definition from the Binaural renderer tutorial, stored in the file binaural_renderer.xml. Now we want to configure the Two!Ears Auditory Model to localise the sound source in that scene. As described above, we have to configure the Blackboard system to do so. The configuration is also done via an XML file, which looks like this for the localisation task. The file is called blackboard.xml and can be found in the examples/first_steps/setting_up_an_auditory_model folder:

 1  <?xml version="1.0" encoding="utf-8"?>
 2  <blackboardsystem>
 3
 4      <dataConnection Type="AuditoryFrontEndKS"/>
 5
 6      <KS Name="loc" Type="GmmLocationKS"/>
 7      <KS Name="dec" Type="LocalisationDecisionKS">
 8              <Param Type="int">1</Param><!-- enable head rotation -->
 9      </KS>
10      <KS Name="rot" Type="HeadRotationKS">
11              <Param Type="ref">robotConnect</Param>
12      </KS>
13
14      <Connection Mode="replaceOld" Event="AgendaEmpty">
15              <source>scheduler</source>
16              <sink>dataConnect</sink>
17      </Connection>
18      <Connection Mode="replaceOld">
19              <source>dataConnect</source>
20              <sink>loc</sink>
21      </Connection>
22      <Connection Mode="add">
23              <source>loc</source>
24              <sink>dec</sink>
25      </Connection>
26      <Connection Mode="add" Event="RotateHead">
27              <source>dec</source>
28              <sink>rot</sink>
29      </Connection>
30
31  </blackboardsystem>

This looks a little complicated, so let us go through it step by step. First the Blackboard system needs some data it can work on. This data comes in the form of auditory cues extracted by the Auditory front-end, and a special knowledge source gets it into the blackboard; it is added in line 4. After that, lines 6-12 add the three further knowledge sources needed for our actual localisation task. Some of them take parameters, like robotConnect for the HeadRotationKS, which defines the connection to the system where the head can actually be rotated. Processing starts with the GmmLocationKS; the LocalisationDecisionKS then interprets the resulting probability distribution and decides whether a head rotation is needed to get better results. The <Connection> tags define the communication between the knowledge sources. For a detailed explanation have a look at Configuration.
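
Instead of the XML file, the same system can also be put together programmatically. The following is only a sketch based on the Configuration documentation; the constructor parameters of the individual knowledge sources (the integer flag of LocalisationDecisionKS and the robotConnect reference of HeadRotationKS) are omitted here and would have to be supplied as in the XML file:

% Sketch of a programmatic blackboard configuration (method names taken from
% the Configuration documentation; check there for the exact signatures)
bbs = BlackboardSystem(0);
bbs.setRobotConnect(sim);                   % sim as in the run example below
bbs.setDataConnect('AuditoryFrontEndKS');   % data connection, cf. line 4
locKs = bbs.createKS('GmmLocationKS');      % knowledge sources, cf. lines 6-12
decKs = bbs.createKS('LocalisationDecisionKS');
rotKs = bbs.createKS('HeadRotationKS');
% Wire the knowledge sources together, cf. the <Connection> tags
bbs.blackboardMonitor.bind({bbs.scheduler}, {bbs.dataConnect}, 'replaceOld', 'AgendaEmpty');
bbs.blackboardMonitor.bind({bbs.dataConnect}, {locKs}, 'replaceOld');
bbs.blackboardMonitor.bind({locKs}, {decKs}, 'add');
bbs.blackboardMonitor.bind({decKs}, {rotKs}, 'add', 'RotateHead');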

Now, after having configured the acoustic scene in binaural_renderer.xml and the blackboard in blackboard.xml, we can finally run the model:

% Initialise binaural simulator
sim = simulator.SimulatorConvexRoom('binaural_renderer.xml');
sim.set('Init', true);
% Initialise blackboard and connect to binaural simulation
bbs = BlackboardSystem(0);
bbs.setRobotConnect(sim);
bbs.buildFromXml('blackboard.xml');
% Run model
bbs.run();
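
When the run has finished you may want to release the binaural simulator again; a minimal sketch, assuming the ShutDown flag described in the Binaural simulator documentation:

% Shut down the binaural simulator after the blackboard run (assumed flag name)
sim.set('ShutDown', true);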

Now the model run is finished, but we are of course interested in seeing the results. You could do the following:

>> sourceAzimuth = 63; % real physical position of the source
>> perceivedAzimuths = bbs.blackboard.getData('locationHypothesis');
>> displayLocalisationResults(perceivedAzimuths, sourceAzimuth);

------------------------------------------------------------------------------------
Reference target angle:  63 degrees
------------------------------------------------------------------------------------
Localised source angle:
BlockTime   PerceivedAzimuth        (head orient., relative azimuth)        Probability
------------------------------------------------------------------------------------
  0.56               65 degrees             (  0 degrees,     65 degrees)   0.98
  1.02               65 degrees             (  0 degrees,     65 degrees)   0.98
  1.58               65 degrees             (  0 degrees,     65 degrees)   0.97
  2.04               65 degrees             (  0 degrees,     65 degrees)   0.97
  2.51               65 degrees             (  0 degrees,     65 degrees)   0.92
  3.07               65 degrees             (  0 degrees,     65 degrees)   0.93
------------------------------------------------------------------------------------
Mean localisation error: 2
------------------------------------------------------------------------------------

Here you see, for several points in time, the predicted source location together with the actual head orientation of the listener and the probability of that prediction. As the probability was always quite high, no head rotation was triggered.
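
If you want to process the localisation results yourself rather than just printing them, you can work directly on the perceivedAzimuths structure returned by getData. The following sketch assumes fields named sndTmIdx and data.azimuth, mirroring the columns of the table above; the actual field names may differ in your model version:

% Collect block times and perceived azimuths from the hypotheses (field names assumed)
blockTimes = [perceivedAzimuths.sndTmIdx];
azimuths = arrayfun(@(h) h.data.azimuth, perceivedAzimuths);
% Mean absolute localisation error, with differences wrapped to -180..180 degrees
errors = mod(azimuths - sourceAzimuth + 180, 360) - 180;
meanError = mean(abs(errors));
fprintf('Mean localisation error: %.1f degrees\n', meanError);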

For details on how to view more results, like the extracted binaural cues, have a look at Localisation - looking at the results in detail. And for more details on localisation and on how to switch off the head rotations of the model, see Localisation with and without head rotations.