Train sound type identification models

Part of the Two!Ears Auditory Model is the Identity knowledge source, IdentityKS, which can be instantiated (multiple times) to identify the type of auditory objects, like “speech”, “fire”, “knock”, etc. Each IdentityKS needs a source type model – this example shows one way to train such a model. Have a look at the Identification of sound types to see how these models are used in the Blackboard system.

The base folder for this example is examples/train_identification_model, and the example script file is trainAndTestCleanModel.m. During the model training process, the training pipeline creates new directories with names like Training.2015.08.03.14.57.21.786, holding log files of the training, file lists of the training and testing data used, and of course the trained models. These are the models that will later be used in the IdentityKS. To see if everything is working, just run

>> trainAndTestCleanModel;

Example step-through

To dive into the example, load up Matlab, navigate into the example directory, and open trainAndTestCleanModel.m, which contains a function (also usable as a script). Let’s have a look before firing it up!

Start-up

The first thing happening in there is the

startTwoEars();

command. This simply starts the Two!Ears Auditory Model and adds all necessary paths to your Matlab path.

Feature and model creators

The next code block first creates the basic pipeline object of type TwoEarsIdTrainPipe, and then sets two defining options: the feature creator and the model creator.

pipe = TwoEarsIdTrainPipe();
pipe.featureCreator = featureCreators.FeatureSet1Blockmean();
pipe.modelCreator = modelTrainers.GlmNetLambdaSelectTrainer( ...
    'performanceMeasure', @performanceMeasures.BAC2, ...
    'cvFolds', 7, ...
    'alpha', 0.99 );

In this case, an L1-regularised sparse logistic regression model will be trained through the GlmNetLambdaSelectTrainer, which is a wrapper around GLMNET. A large set of auditory features will be used in this model, processed and compiled by the FeatureSet1Blockmean feature creator. Have a look into the respective sections to learn more!
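The trainer options can be varied by passing different values to the same constructor. A small sketch – the numbers below are purely illustrative and not recommendations from this example:

pipe.modelCreator = modelTrainers.GlmNetLambdaSelectTrainer( ...
    'performanceMeasure', @performanceMeasures.BAC2, ...
    'cvFolds', 4, ...   % fewer cross-validation folds: faster, but noisier lambda selection
    'alpha', 1.0 );     % alpha = 1 is pure L1 (lasso); smaller values mix in L2 regularisation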

Training and testing sets

The models will be trained using a particular set of sounds, specified in the trainset flist. For this example, the IEEE AASP single event sounds serve as training material. There are sounds for several classes like “laughter”, “keys”, “speech”, etc. If you don’t call the trainAndTestCleanModel function with a different class name, a model for the “speech” class will be trained (this default is specified in the third line). Regardless of the class the model is trained for, all sounds listed in the flist (have a look) will be used for training – but only those belonging to the model class will serve as “positive” examples.

pipe.trainset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TrainSet_1.flist';
pipe.testset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TestSet_1.flist';

The testset specifies the files used for testing the trained model. This is not necessary for model creation; it only serves as an immediate way of getting feedback on the model’s performance after training. Of course, the testset must only contain files that have not been used for training, in order to test the generalisation of the model.
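To train a model for one of the other classes instead, simply pass its name to the example function; the name must match one of the classes occurring in the flist. For example:

>> trainAndTestCleanModel( 'keys' );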

Scene configuration

A “clean” scene configuration is used to train this model. That means: the sound sources are positioned at 0° azimuth relative to the head, there is no interfering noise, and no reverberation (free-field conditions). Have a look at the respective part of the training pipeline documentation to get to know the many possibilities of configuring the acoustic training scene.

sc = dataProcs.SceneConfiguration(); % clean
pipe.setSceneConfig( [sc] );

Running the pipeline

After everything is set up, the pipeline has to be initialised and can then be run.

pipe.init();
modelPath = pipe.pipeline.run( {classname}, 0 );

Initialisation can take some time, depending on the files used for training and testing and on whether they are available through a local copy of the Two!Ears database, through the download cache of the remote Two!Ears database, or whether they first have to be downloaded from there. The time needed for actually running the pipeline can vary substantially, depending on

  • the total accumulated length of sound files used
  • the scene configuration – using reverberation or noise interference makes the binaural simulation take longer
  • the features to be extracted by the Auditory front-end
  • the type of model (training) – there are big differences here, as the computational effort can be much higher for some models than for others (GLMNET, the one used here, is pretty fast)
  • and whether the files have already been processed in this configuration. The pipeline saves intermediate files after each processing stage (binaural simulation, auditory front-end, feature creation) for each sound file and each configuration, and it finds those files later if a file is to be processed in the same (or partly the same) configuration. This way, a lot of time-consuming preprocessing can be saved. You can try it – interrupt the preprocessing at any moment by hitting ctrl+c, then restart the script. You will see that all already-processed files/stages are not computed again.

After successful training and testing, you should see something like

Running: MultiConfigurationsEarSignalProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: MultiConfigurationsAFEmodule
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: MultiConfigurationsFeatureProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

Running: GatherFeaturesProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...

===================================
##   Training model "speech"
===================================


==  Training model on trainSet...


Run on full trainSet...
GlmNet training with alpha=0.990000
   size(x) = 5040x846


Run cv to determine best lambda...
Starting run 1 of CV... GlmNet training with alpha=0.990000
   size(x) = 4111x846

Applying model to test set...
Done. Performance = 0.842686

...

Calculate Performance for all lambdas...................................................Done

==  Testing model on testSet...



===================================
##   "speech" Performance: 0.942548
===================================

 -- Model is saved at C:\projekte\twoEars\twoears-examples\train_identification_model\Training.2015.08.06.15.44.52.582 --
>>

The stated performance is measured on the test set, and the path printed after it indicates where the model has been saved on your drive.
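This Training.* directory is what an IdentityKS would be pointed at. As a rough, hypothetical sketch – the exact BlackboardSystem and IdentityKS constructor arguments are assumptions here, see the Identification of sound types example for the actual usage:

bbs = BlackboardSystem( 0 );                          % assumed: blackboard instance, no visualisation
modelDir = 'Training.2015.08.06.15.44.52.582';        % directory produced by this training run
bbs.createKS( 'IdentityKS', {'speech', modelDir} );   % assumed arguments: class name, model directory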