Train sound type identification models
Part of the Two!Ears Auditory Model is the identity knowledge source, IdentityKS, which can be instantiated multiple times to identify the type of auditory objects, such as "speech", "fire" or "knock". Each IdentityKS needs a source type model; this example shows one way to train such a model. Have a look at Identification of sound types to see how these models are used in the Blackboard system.
The base folder for this example is examples/train_identification_model, and the example script file is trainAndTestCleanModel.m. During model training, the training pipeline creates new directories with names like Training.2015.08.03.14.57.21.786, holding log files of the training, file lists of the training and testing data used, and of course the trained models. These are the models to be used in the IdentityKS. To see if everything is working, just run
>> trainAndTestCleanModel;
Example step-through
To dive into the example, load up Matlab, navigate into the example directory, and open trainAndTestCleanModel.m, which contains a function (also usable as a script). Let's have a look before firing it up!
Start-up
The first thing happening in there is this command:
startTwoEars();
It starts the Two!Ears Auditory Model and adds all necessary paths to your Matlab path.
Feature and model creators
The next block of code first creates the basic pipeline object of type TwoEarsIdTrainPipe and then sets its two defining options: the feature creator and the model creator.
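pipe = TwoEarsIdTrainPipe();                                     % basic training pipeline
pipe.featureCreator = featureCreators.FeatureSet1Blockmean();    % which features to extract
pipe.modelCreator = modelTrainers.GlmNetLambdaSelectTrainer( ... % which model to train
    'performanceMeasure', @performanceMeasures.BAC2, ...         % measure used to rate the model
    'cvFolds', 7, ...                                            % number of cross-validation folds
    'alpha', 0.99 );                                             % GLMNET elastic-net mixing parameter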
In this case, a sparse logistic regression model will be trained by the GlmNetLambdaSelectTrainer, a wrapper for GLMNET. With alpha set to 0.99, the regularization is elastic-net in form but almost purely L1, and the regularization strength lambda is chosen by cross-validation over the 7 folds set above. A large set of auditory features will be used in this model, processed and compiled by the FeatureSet1Blockmean feature creator. Have a look into the respective sections to learn more!
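For reference, the GLMNET documentation gives the penalized objective it minimizes for two-class logistic regression as below, with labels $y_i \in \{0,1\}$, feature vectors $x_i$, intercept $\beta_0$, weights $\beta$ and $N$ training samples. These symbols are standard GLMNET notation, not names from the Two!Ears code. The `alpha` option corresponds to $\alpha$: at $\alpha = 0.99$ the penalty is dominated by the L1 term, which drives many feature weights to exactly zero and thus yields a sparse model, while $\lambda$ is the strength selected by cross-validation.

$$
\min_{\beta_0,\beta}\; -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i\big(\beta_0 + x_i^\top\beta\big) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big)\Big] + \lambda\Big[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\Big]
$$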
Training and testing sets
The models will be trained using a particular set of sounds, specified in the trainset flist. For this example, the IEEE AASP single event sounds serve as training material. There are sounds for several classes like "laughter", "keys", "speech", etc. If you don't call the trainAndTestCleanModel function with a different class name, a model for the "speech" class will be trained (this is specified in the third line of the function). Regardless of the class the model is trained for, all sounds listed in the flist (have a look) will be used for training, but only those belonging to the model class serve as "positive" examples.
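The labelling rule described above can be sketched in a few lines of Matlab. This is a hypothetical illustration, not the pipeline's own code: the file names are made up, and it assumes, as the paths in the log output further down suggest (e.g. IEEE_AASP\alert\alert11.wav), that each sound's class is the name of its parent directory. Encoding positives as +1 and negatives as -1 is also an assumption of this sketch.

```matlab
% Hypothetical sketch: derive binary labels from file paths, ASSUMING the
% class name is the parent directory of each sound file.
classname = 'speech';
files = { ...
    'sound_databases/IEEE_AASP/speech/speech01.wav', ...   % made-up names
    'sound_databases/IEEE_AASP/alert/alert11.wav', ...
    'sound_databases/IEEE_AASP/keys/keys03.wav' };

labels = -ones( numel( files ), 1 );        % -1: negative example by default
for k = 1:numel( files )
    [folder, ~, ~] = fileparts( files{k} ); % strip file name and extension
    [~, parentDir] = fileparts( folder );   % last folder component = class
    if strcmp( parentDir, classname )
        labels(k) = 1;                      % +1: positive example
    end
end
disp( labels' );
```

Every file stays in the training data; only its label depends on the class the model is trained for.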
pipe.trainset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TrainSet_1.flist';
pipe.testset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TestSet_1.flist';
The testset specifies the files used for testing the trained model. It is not necessary for creating the model; it only serves as an immediate way of providing feedback about the model's performance after training. Of course, the testset must only contain files that have not been used for training, in order to test the generalisation of the model.
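One quick way to check this condition is to intersect the two file lists. The sketch below is hypothetical: the file names are made up and given inline, and the `readFlist` helper, shown only in a comment, assumes that an .flist file holds one path per line.

```matlab
% Hypothetical sketch: confirm that no file is in both training and test lists.
% ASSUMPTION: an .flist file contains one path per line.
readFlist = @(name) regexp( strtrim( fileread( name ) ), '\r?\n', 'split' );
% trainFiles = readFlist( 'path/to/TrainSet.flist' );   % usage with real files

trainFiles = { 'IEEE_AASP/speech/speech01.wav', 'IEEE_AASP/alert/alert11.wav' };
testFiles  = { 'IEEE_AASP/speech/speech05.wav', 'IEEE_AASP/keys/keys03.wav' };

overlap = intersect( trainFiles, testFiles );   % files present in both lists
if isempty( overlap )
    disp( 'Train and test lists are disjoint.' );
else
    fprintf( 'Shared file: %s\n', overlap{:} );
end
```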
Scene configuration
This model is trained with a "clean" scene configuration: the sound sources are positioned at 0° azimuth relative to the head, and there is neither interfering noise nor reverberation (free-field conditions). Have a look into the respective part of the training pipeline documentation to learn about the many possibilities for configuring the acoustic training scene.
sc = dataProcs.SceneConfiguration(); % clean
pipe.setSceneConfig( [sc] );
Running the pipeline
After everything is set up, the pipeline has to be initialised and can then be run.
pipe.init();
modelPath = pipe.pipeline.run( {classname}, 0 );
Initialisation can take some time, depending on the training and testing files and on whether they are available through a local copy of the Two!Ears database, through the download cache of the remote Two!Ears database, or have to be downloaded from there first. The time needed to actually run the pipeline can vary substantially, depending on
- the total accumulated length of the sound files used
- the scene configuration: using reverberation or noise interference makes the binaural simulation take longer
- the features that have to be extracted by the Auditory front-end
- the type of model (training): there are big differences here, as the computational effort can be much higher for some models than for others (GLMNET, the one used here, is pretty fast)
- and whether the files have been processed in this configuration before. The pipeline saves intermediate files after each processing stage (binaural simulation, auditory front-end, feature creation) for each sound file and each configuration, and finds them later if a file is to be processed in the same (or partly the same) configuration. This way, a lot of time-consuming preprocessing can be saved. You can try it: interrupt the preprocessing at any moment by hitting ctrl+c and restart the script. You will see that files and stages already processed are not processed again.
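The reuse of intermediate results described in the last item is a form of caching keyed on the sound file and the processing configuration. The hypothetical sketch below illustrates only that idea: the real pipeline stores its intermediate results as files on disk, whereas here a `containers.Map` stands in for that store and a string stands in for the expensive processing. The file and configuration names are made up.

```matlab
% Hypothetical sketch of per-(file, configuration) caching.
% The real pipeline caches on disk; a containers.Map stands in for that here.
cache = containers.Map();
nComputed = 0;

requests = { {'alert11.wav', 'clean'}, ...
             {'speech01.wav', 'clean'}, ...
             {'alert11.wav', 'clean'}, ...     % same as first: cache hit
             {'alert11.wav', 'reverb'} };      % new configuration: cache miss

for k = 1:numel( requests )
    key = [requests{k}{1} '|' requests{k}{2}];  % cache key = file + config
    if ~isKey( cache, key )
        cache(key) = sprintf( 'features of %s', key );  % stand-in for real work
        nComputed = nComputed + 1;
    end
end
fprintf( '%d requests, %d computed\n', numel( requests ), nComputed );
```

Only a change in the file or in the configuration triggers new work, which is why restarting an interrupted run skips everything already done.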
After successful training and testing, you should see something like
Running: MultiConfigurationsEarSignalProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: MultiConfigurationsAFEmodule
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: MultiConfigurationsFeatureProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: GatherFeaturesProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
===================================
## Training model "speech"
===================================
== Training model on trainSet...
Run on full trainSet...
GlmNet training with alpha=0.990000
size(x) = 5040x846
Run cv to determine best lambda...
Starting run 1 of CV... GlmNet training with alpha=0.990000
size(x) = 4111x846
Applying model to test set...
Done. Performance = 0.842686
...
Calculate Performance for all lambdas...................................................Done
== Testing model on testSet...
===================================
## "speech" Performance: 0.942548
===================================
-- Model is saved at C:\projekte\twoEars\twoears-examples\train_identification_model\Training.2015.08.06.15.44.52.582 --
>>
The stated performance refers to the test set, and the path printed after it indicates where the model has been saved on your drive.
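The log lines "Run cv to determine best lambda" and "Calculate Performance for all lambdas" refer to choosing the regularization strength by cross-validation. A common rule, sketched below with made-up numbers, is to pick the lambda with the best mean performance across folds. This illustrates the general technique only; GlmNetLambdaSelectTrainer may apply a different selection rule.

```matlab
% Hypothetical sketch of lambda selection by cross-validation.
% Rows: CV folds; columns: candidate lambdas; entries: performance of the
% model trained on the remaining folds. All numbers are illustrative.
lambdas = [0.1 0.05 0.01 0.005];
cvPerf = [0.80 0.84 0.86 0.85; ...
          0.78 0.83 0.87 0.84; ...
          0.81 0.85 0.85 0.86];

meanPerf = mean( cvPerf, 1 );           % average over folds, per lambda
[bestPerf, bestIdx] = max( meanPerf );  % highest mean performance wins
bestLambda = lambdas(bestIdx);
fprintf( 'best lambda = %g (mean CV performance %.3f)\n', bestLambda, bestPerf );
```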