Estimating the Number of Sound Sources

Part of the Two!Ears Auditory Model is the number-of-sources knowledge source NumberOfSourcesKS, which can be instantiated to estimate the number of active sound sources in an auditory scene. Each NumberOfSourcesKS needs a number-of-sources model – this example shows one possibility for training such a model.

Step-by-step training of a number-of-sources model

Start Matlab and navigate into the example directory.
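If you have cloned the Two!Ears examples repository, that could look as follows (the path is illustrative and depends on where your copy lives):

% illustrative path – adjust to your local copy of the examples repository
cd( fullfile( 'twoears-examples', 'train_nSrcs_model' ) );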

Caching dir

The code starts with a snippet that checks for the existence of a cache directory and creates it if it does not already exist. The pipeline uses this cache directory to save and re-use intermediate results.

startTwoEars();

% create the cache directory (next to the example directory) if it is missing
cacheSystemDir = fullfile(getMFilePath(), '..', 'idPipeCache');
if ~exist(cacheSystemDir, 'dir')
    mkdir(cacheSystemDir);
    if ~exist(cacheSystemDir, 'dir')
        error(['The cache directory required by the pipeline for saving intermediate data ', ...
            'does not exist and could not be created. Please create it and run the script again.']);
    end
end

Feature and model creators

The next code block creates the basic pipeline object of type TwoEarsIdTrainPipe and then sets its basic components: the block creator, the knowledge-source wrapper, the feature creator, the label creator, and the model creator.

pipe = TwoEarsIdTrainPipe('cacheSystemDir', cacheSystemDir);
pipe.blockCreator = BlockCreators.MeanStandardBlockCreator( 0.5, 0.5/3 );
pipe.ksWrapper = DataProcs.DnnLocKsWrapper();
pipe.featureCreator = FeatureCreators.FeatureSetNSrcDetectionPlusModelOutputs();
pipe.labelCreator = LabelCreators.NumberOfSourcesLabeler( 'srcMinEnergy', -22 );
pipe.modelCreator = ModelTrainers.GlmNetLambdaSelectTrainer( ...
    'performanceMeasure', @PerformanceMeasures.MultinomialBAC, ...
    'family', 'multinomial', ...
    'cvFolds', 4, ...
    'alpha', 0.99, ...
    'maxDataSize', floor( 2e9/(8*2194) ) );

In this case, an L1-regularized sparse logistic regression model will be trained through the GlmNetLambdaSelectTrainer, a wrapper for GLMNET. The model consumes a large set of auditory features, processed and compiled by the FeatureSetNSrcDetectionPlusModelOutputs feature creator. The block creator is responsible for cutting these auditory features into blocks, each corresponding to a time window. The label creator uses information from the scene configuration to assign to each block a label that the model can target. The maxDataSize argument caps the number of training samples so that the data matrix stays below roughly 2 GB (presumably 8 bytes per double value times the dimensionality of the feature vectors, here 2194). Have a look into the respective sections to learn more!
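As an illustration of the blocking scheme: assuming the two MeanStandardBlockCreator arguments are the block length and the block shift in seconds (an assumption about this constructor), the onsets of the overlapping blocks for a hypothetical two-second signal can be computed like this:

blockLen  = 0.5;     % assumed block length in seconds (first constructor argument)
blockHop  = 0.5/3;   % assumed shift between consecutive blocks (second argument)
sceneLen  = 2.0;     % hypothetical signal length in seconds, for illustration only
blockOnsets = 0 : blockHop : (sceneLen - blockLen);  % one onset per block
fprintf( '%d overlapping blocks, onsets (s): %s\n', ...
         numel( blockOnsets ), mat2str( blockOnsets, 3 ) );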

Training and testing sets

The model will be trained using a particular set of sounds, specified in the trainset flist. For this example, the IEEE AASP single event sounds serve as training material.

pipe.trainset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TrainSet_1.flist';
pipe.setupData();
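An flist is a plain text file listing one sound file per line, with paths relative to the sound database root. A hypothetical excerpt in the style of the IEEE AASP corpus (these particular entries are illustrative):

alert/alert11.wav
clearthroat/clearthroat03.wav
doorslam/doorslam07.wav
...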

Scene configuration

A “clean” scene configuration is used to train this model.

sc = SceneConfig.SceneConfiguration();
% target source at 0° azimuth (the default), playing the pipeline input files
sc(1).addSource( SceneConfig.PointSource( ...
        'data', SceneConfig.FileListValGen( 'pipeInput' ) ) );
% distractor source at +90°, looping randomly through general sounds at 10 dB SNR
sc(1).addSource( SceneConfig.PointSource( ...
        'azimuth', SceneConfig.ValGen( 'manual', 90 ), ...
        'data', SceneConfig.FileListValGen( ...
               pipe.pipeline.trainSet('fileLabel',{{'type',{'general'}}},'fileName') ),...
        'offset', SceneConfig.ValGen( 'manual', 0 ) ),...
        'snr', SceneConfig.ValGen( 'manual', 10 ),...
        'loop', 'randomSeq' );
% distractor source at -90°, configured analogously
sc(1).addSource( SceneConfig.PointSource( ...
        'azimuth', SceneConfig.ValGen( 'manual', -90 ), ...
        'data', SceneConfig.FileListValGen( ...
               pipe.pipeline.trainSet('fileLabel',{{'type',{'general'}}},'fileName') ),...
        'offset', SceneConfig.ValGen( 'manual', 0 ) ),...
        'snr', SceneConfig.ValGen( 'manual', 10 ),...
        'loop', 'randomSeq' );

The sound sources are positioned at 0°, 90°, and -90° azimuth relative to the listener's head. There is no reverberation (free-field conditions). Have a look into the respective training pipeline documentation to get to know the many possibilities for configuring the acoustic training scene.
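With up to three simultaneously active sources in this scene, the NumberOfSourcesLabeler configured above has to decide per block how many sources count as active. A minimal sketch of that idea, assuming 'srcMinEnergy' is an energy threshold in dB below which a source is considered inactive (the per-source block energies here are made up):

srcMinEnergy = -22;                    % dB threshold passed to NumberOfSourcesLabeler
srcBlockEnergies = [ -5, -30, -12 ];   % hypothetical per-source energies in one block
nSrcsLabel = sum( srcBlockEnergies > srcMinEnergy );  % two sources exceed the threshold
fprintf( 'block label: %d active sources\n', nSrcsLabel );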

Running the pipeline

After everything is set up, the pipeline has to be initialised and can then be run.

pipe.init( sc, 'fs', 16000 );
modelPath = pipe.pipeline.run( 'modelName', 'nSrcs', ...
                               'modelPath', 'nSrcs_training' );

Initialisation can take some time depending on the number of files used for training and testing, and on whether they are available through a local copy of the Two!Ears database, through the download cache of the remote Two!Ears database, or whether they first have to be downloaded. The time needed for actually running the pipeline can vary substantially, depending on

  • the total accumulated length of sound files used
  • the scene configuration – using reverberation or noise interference makes the binaural simulation take longer
  • the features having to be extracted by the Auditory front-end
  • the type of model (training) – there are big differences here, as the computational effort can be much higher for some models than for others (GLMNET, the one used here, is pretty fast)
  • and whether the files have been processed in this configuration before. The pipeline saves intermediate files after each processing stage (binaural simulation, auditory front-end, feature creation) for each sound file and each configuration, and finds those files again if a file is later processed in the same (or partly the same) configuration. This way, a lot of time-consuming preprocessing can be saved. You can try it – interrupt the preprocessing at any moment by hitting ctrl+c and restart the script; you will see that already processed files/stages are not run again.

The arguments 'fs', 16000 indicate that the pipeline will operate on signals sampled at 16 kHz. The pipeline takes care of resampling sounds that are not already sampled at this rate.
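Conceptually, that resampling corresponds to something like the following sketch, using MATLAB's resample function (the file name is illustrative; the pipeline's internal implementation may differ):

[y, fsIn] = audioread( 'alert11.wav' );  % illustrative input file
fsTarget = 16000;
if fsIn ~= fsTarget
    y = resample( y, fsTarget, fsIn );   % rational-factor resampling to 16 kHz
end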

After successful training, you should see something like

Running: MultiConfigurationsEarSignalProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: MultiConfigurationsAFEmodule
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: MultiConfigurationsFeatureProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: GatherFeaturesProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

===================================
##   Training model "nSrcs"
===================================


==  Training model on trainSet...


Run on full trainSet...
GlmNet training with alpha=0.990000
   size(x) = 5040x1


Run cv to determine best lambda...
Starting run 1 of CV... GlmNet training with alpha=0.990000
   size(x) = 4111x1

...

Calculate Performance for all lambdas...................................................Done

-- Model is saved at C:\projekte\twoEars\twoears-examples\train_nSrcs_model\nSrcs_training --

The path returned after running the training process contains the location of the model on your drive.

Model testing

A new model has been trained. To test it, we repeat the above code, except that this time we configure the pipeline to use test data instead of training data.

pipe.trainset = [];
pipe.testset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TestSet_1.flist';

The testset specifies the files used for testing the trained model. This is not necessary for the model creation; it only serves as an immediate way of getting feedback about the model performance after training. Of course, the testset should only contain files that have not been used for training, in order to test the model’s generalisation performance.

We will also replace the call for running the pipeline with:

modelPath = pipe.pipeline.run( 'modelName', 'nSrcs', ...
                               'modelPath', 'nSrcs_testing' );

This provides a new location for saving the test results.

After successful testing, you should see something like

==  Testing model on testSet...



===================================
##   "nSrcs" Performance: 0.848237
===================================

 -- Model is saved at C:\projekte\twoEars\twoears-examples\train_nSrcs_model\nSrcs_testing --
>>

The stated performance is on the test set, and the path afterwards indicates the location of the model on your drive.
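The number is produced by the PerformanceMeasures.MultinomialBAC measure chosen for training. Assuming it follows the standard definition of balanced accuracy for multi-class problems (an assumption about this class), the computation can be sketched with a hypothetical confusion matrix:

% hypothetical 3-class confusion matrix: rows = true class, columns = predicted
C = [ 50  5  0 ; ...
       8 40  2 ; ...
       1  4 30 ];
recallPerClass = diag( C ) ./ sum( C, 2 );  % per-class sensitivity (recall)
bac = mean( recallPerClass )                % balanced accuracy: mean class recall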