Setting up an acoustic scene
Contemporary auditory models generally work in a feed-forward manner: pre-recorded or pre-generated signals are simply fed into the model, which computes and returns its results.
The Two!Ears Auditory Model behaves quite differently in this respect: our framework allows you to actively explore the provided acoustic scene. The model's feedback mechanisms could, for instance, initiate a rotation of the audio capture unit (dummy head) in order to disambiguate complex input and arrive at a more precise scenario analysis. Such active exploration can be achieved, on the one hand, in the real world by using a dummy head to record binaural ear signals on the fly. Such a dummy head should have a movable head and be mounted on a robotic platform to allow for translatory movements; for an introduction, see Use a robotic platform. On the other hand, the task can be addressed virtually by setting up a simulated acoustic scene in the Binaural simulator. Below, we focus on the latter method.
There are two ways in which the Binaural simulator can create the ear signals. One is to specify the entire acoustic scene using meta-data, pass appropriate HRTFs to the simulator, and render the scenario with the simulator's Binaural renderer component. The other is to use a mixture of meta-data and pre-recorded scene details, represented by BRIR or BRS files [Horbach1999]. The Binaural room scanning renderer can use this information (BRIRs) to represent sources placed in a room. Going a step further, BRS files enable the simulation of complex scenarios with loudspeaker arrays driven by spatial audio reproduction techniques. With this method, the complete synthesised sound field of those reproduction methods can be investigated via binaural synthesis, without having to set up all the required loudspeakers.
These files can also be used in perceptual experiments with dynamic binaural synthesis (see for example [Wierstorf2014]); here, our model could help to predict the experimental outcome.
Binaural renderer

In the first example, a KEMAR HRTF data set is used together with a sound stimulus (cello) as a static source to the left of the listener. Such an anechoic scene can be defined via the following XML file. The file is called binaural_renderer.xml and can be found in the
<?xml version="1.0" encoding="utf-8"?>
<scene Renderer="ssr_binaural"
       BlockSize="4096"
       SampleRate="44100"
       HRIRs="impulse_responses/qu_kemar_anechoic/QU_KEMAR_anechoic_3m.sofa">
  <source Name="Cello"
          Type="point"
          Position="1 2 1.75">
    <buffer ChannelMapping="1"
            Type="fifo"
            File="stimuli/anechoic/instruments/anechoic_cello.wav"/>
  </source>
  <sink Name="Head"
        Position="0 0 1.75"
        UnitX="1 0 0"
        UnitZ="0 0 1"/>
</scene>
The file specifies the renderer type to be used, the simulator's processing block size (the Two!Ears Auditory Model works in a completely block-based manner), the HRTF data set to be loaded from the Database, the audio material provided, and the positions of the sources (here: cello) and sinks (here: the KEMAR dummy head).
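As a quick sanity check of the scene geometry, the source azimuth relative to the head can be computed directly from the positions given in the scene file; this is plain Matlab and independent of the framework:

```matlab
% Positions as given in binaural_renderer.xml
src  = [1 2 1.75];   % cello
head = [0 0 1.75];   % KEMAR dummy head, looking along UnitX = [1 0 0]

d  = src - head;            % direction from head to source
az = atan2d(d(2), d(1))     % azimuth relative to the look direction (+x)
% az is approx. 63.4 degrees, i.e. the cello sits to the front-left
```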
With that, the scene is defined. To test the correctness of the scene description, without having to invoke the full Two!Ears model framework, the binaural simulation can be run in a standalone version using the following statements:
>> sim = simulator.SimulatorConvexRoom('binaural_renderer.xml');
>> sim.set('Init', true);
>> signal = sim.getSignal();
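Since the framework is block-based, the signal can also be pulled from the simulator chunk by chunk rather than in one go. The following is only a sketch: the isFinished method and the block-length argument to getSignal are assumptions here, so check the Binaural simulator documentation for the exact signatures:

```matlab
>> % Sketch of block-wise processing (method names are assumptions)
>> while ~sim.isFinished()
>>     block = sim.getSignal(4096);  % fetch the next block of samples
>>     % ... pass the block on to further processing stages ...
>> end
```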
This yields a variable named signal which stores the simulated binaural audio data. Matlab can replay this signal via:

>> sound(signal, sim.SampleRate);

The data can also be stored in a file, for example via Matlab's audiowrite function:

>> audiowrite('binaural_renderer.wav', signal, sim.SampleRate);

To finish the simulation and clean up temporary files, type:

>> sim.set('ShutDown', true);
Binaural room scanning renderer
In the second example, the BRS renderer uses BRIR recordings to simulate a source placed within a (reverberant) room. Again, the scene is defined via an XML file, this time named brs_renderer.xml:
<?xml version="1.0" encoding="utf-8"?>
<scene Renderer="ssr_brs"
       BlockSize="4096"
       SampleRate="44100"
       LengthOfSimulation="5.0">
  <source Type="point"
          Name="SoundSource"
          IRs="impulse_responses/qu_kemar_rooms/auditorium3/QU_KEMAR_Auditorium3_src2_xs+4.30_ys+3.42.sofa">
    <buffer ChannelMapping="1"
            Type="noise"/>
  </source>
  <sink Name="Head"
        Position="0 0 0"
        UnitX="0 1 0"
        UnitZ="0 0 1"/>
</scene>
The renderer type is set to ssr_brs and the BRIR is specified within the <source> section. A source position is no longer required, as it is inherently given by the corresponding BRIR measurement.
Note that the KEMAR dummy head was looking towards the y-axis during the BRIR measurement. In the scene description, the UnitX vector defines the looking direction of the virtual head; thus it is set to [0 1 0].
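With the head at the origin looking along the positive y-axis, the source coordinates encoded in the BRIR file name (xs = +4.30, ys = +3.42) can be used to double-check the expected direction; again, this is plain Matlab:

```matlab
% Head at [0 0 0] with UnitX = [0 1 0], i.e. forward = +y, right = +x
src = [4.30 3.42];            % xs/ys from the BRIR file name
az  = atan2d(src(1), src(2))  % angle to the right of the look direction
% az is approx. 51.5 degrees, so the source lies to the front-right
```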
The audio material used is synthesised by the simulator's built-in white noise generator. Since this generates an infinitely long signal, the LengthOfSimulation property has to be specified.
The following commands let you listen to the final simulation; the perceptual impression should be that of a noise source placed in a larger room to the front-right of the listener:
>> sim = simulator.SimulatorConvexRoom('brs_renderer.xml',1);
>> signal = sim.getSignal();
>> sound(signal, sim.SampleRate);
Note that the sim.set('Init', true) line has been omitted here; the 1 passed as the second argument during simulator initialisation serves the same purpose.
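To make the equivalence explicit, the shorthand above corresponds to constructing the simulator first and initialising it in a separate step, as in the first example:

```matlab
>> sim = simulator.SimulatorConvexRoom('brs_renderer.xml');
>> sim.set('Init', true);
>> signal = sim.getSignal();
>> sound(signal, sim.SampleRate);
```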
[Horbach1999] Horbach, U., Karamustafaoglu, A., Pellegrini, R., Mackensen, P., Theile, G. (1999), "Design and Applications of a Data-based Auralization System for Surround Sound," 106th AES Convention, Paper 4976

[Wierstorf2014] Wierstorf, H. (2014), "Perceptual Assessment of Sound Field Synthesis," PhD thesis, TU Berlin