Train sound type identification models¶
Part of the Two!Ears Auditory Model is the Identity knowledge source: IdentityKS which can be instantiated (multiple times) to identify the type of auditory objects, like “speech”, “fire”, “knock” etc. Each IdentityKS needs a source type model – this example shows one possibility for trainining such a model. Have a look at the Identification of sound types to see how these models are being used in the Blackboard system.
The base folder for this example is examples/train_identification_model
,
with the example script file being trainAndTestCleanModel.m
. Later in
the model training process, new directories with names like
test_1vsAll_training
will be created by the training pipeline,
containing log files of the training, file lists of the used training and testing
data, and of course the trained models. These are the models that can be used with the
IdentityKS. To see if everything is working, just run
>> trainAndTestCleanModel;
Example step-through¶
To dive into the example, start Matlab, navigate into the example directory,
and open trainAndTestCleanModel.m
, which contains a function (also
usable as a script). Let’s have a look before firing it up!
Caching dir¶
The code starts with a snippet that checks for the existence of a cache directory and create it if it does not already exist. The pipeline uses this cache directory for saving and re-using intermediate results.
Feature and model creators¶
The next code paragraph creates the basic pipeline object of type TwoEarsIdTrainPipe, and then sets the basic components: The block creator, the feature creator, the label creator and the model creator.
pipe = TwoEarsIdTrainPipe('cacheSystemDir', cacheSystemDir);
pipe.blockCreator = BlockCreators.MeanStandardBlockCreator( 0.5, 0.5/3 );
pipe.featureCreator = FeatureCreators.FeatureSet5Blockmean();
oneVsRestLabeler = LabelCreators.MultiEventTypeLabeler( ...
'types', {{classname}}, 'negOut', 'rest' );
pipe.labelCreator = oneVsRestLabeler;
pipe.modelCreator = ModelTrainers.GlmNetLambdaSelectTrainer( ...
'performanceMeasure', @PerformanceMeasures.BAC2, ...
'cvFolds', 4, ...
'alpha', 0.99 ); % giving higher numerical stability (instead of 1.0)
In this case, an L1-regularized sparse logistic regression model will be trained
through the use of the GlmNetLambdaSelectTrainer, which is a wrapper
for GLMNET. A pile of auditory features will be used in this model, processed
and compiled by the sec-FeatureSet5Blockmean
feature creator.
The block creator is responsible of cutting these auditory features into blocks for each time-window.
The label creator uses the ground truth files for assigning a label to each block that the model can target.
Have a look into the respective sections to learn more!
Training and testing sets¶
The models will be trained using a particular set of sounds, specified in the
trainset flist. For this example, the IEEE AASP
single event sounds serve as training material. There are
sounds for several classes like “laughter”, “keys”, “speech”, etc. If you don’t
call the trainAndTestCleanModel
function with a different class name, a model
for the “speech” class will be trained (this is specified in the third line).
Regardless of the class the model is trained for, all sounds listed in the
flist (have a look)
will be used for training – but only the ones belonging to the model class will
serve as “positive” examples.
pipe.trainset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TrainSet_1.flist';
Scene configuration¶
A “clean” scene configuration is used to train this model.
sc = SceneConfig.SceneConfiguration();
sc.addSource( SceneConfig.PointSource( ...
'data', SceneConfig.FileListValGen( 'pipeInput' ) ) );
The sound sources are positioned at 0° azimuth relative to the head, there is no interfering noise, and no reverberation (free-field conditions). Have a look into the respective training pipeline documentation part to get to know the many possibilities to configure the acoustic training scene.
Running the pipeline¶
After everything is set up, the pipeline has to be initialised and can then be run.
pipe.init( sc, 'fs', 16000 );
modelPath = pipe.pipeline.run( 'modelName', classname, ...
'modelPath', 'test_1vsAll_training' );
Initialisation can take some time depending on the files for training and testing, and whether they are available through a local copy of the Two!Ears database, through the download cache of the remote Two!Ears database, or whether they have to be downloaded from there first. The time needed for actually running the pipeline can vary substantially, depending on
- the total accumulated length of sound files used
- the scene configuration – using reverberation or noise interference makes the binaural simulation take longer
- the features having to be extracted by the Auditory front-end
- the type of model (training) – there are big differences here, as the computational effort can be much higher for some models than for others (GLMNET, the one used here, is pretty fast)
- and whether the files have been processed in this configuration before or not.
The pipeline saves intermediate files after each processing stage (binaural
simulation, auditory front-end, feature creation) for each sound file and each
configuration, and it finds those files later, if a file is to be processed in
the same (or partly the same) configuration. This way, a lot of time-consuming
preprocessing can be saved. You can try it – interrupt the preprocessing at
any moment by hitting
ctrl+c
, and restart the script. You will see that all processed files/stages won’t be done again.
The arguments 'fs', 16000
indicate that the pipeline will operate with 16 KHz signals.
The pipeline takes care of resampling sounds if they are not already sampled at this rate accordingly.
After successful training, you should see something like
Running: MultiConfigurationsEarSignalProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: MultiConfigurationsAFEmodule
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: MultiConfigurationsFeatureProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
Running: GatherFeaturesProc
==========================================
.C:\projekte\twoEars\wp1git\tmp\sound_databases\IEEE_AASP\alert\alert11.wav
...
===================================
## Training model "speech"
===================================
== Training model on trainSet...
Run on full trainSet...
GlmNet training with alpha=0.990000
size(x) = 5040x846
Run cv to determine best lambda...
Starting run 1 of CV... GlmNet training with alpha=0.990000
size(x) = 4111x846
...
Calculate Performance for all lambdas...................................................Done
-- Model is saved at C:\projekte\twoEars\twoears-examples\train_identification_model\test_1vsAll_training --
The path returned indicates after running the training process contains the location of the model on your drive.
Model testing¶
A new model has been trained. To test it we repeat the above code, except that this time, we configure the pipeline to use test data instead of training data.
pipe.trainset = [];
pipe.testset = 'learned_models/IdentityKS/trainTestSets/IEEE_AASP_80pTrain_TestSet_1.flist';
The testset
specifies files used for testing the trained model. This is not
necessary for the model creation, it only serves as an immediate way of
providing feedback about the model performance after training. Of course, the
testset
should only contain files that have not been used for training, in order to test
for the model’s generalisation performance.
After successful testing, you should see something like
== Testing model on testSet...
===================================
## "speech" Performance: 0.942548
===================================
-- Model is saved at C:\projekte\twoEars\twoears-examples\train_identification_model\test_1vsAll_testing --
>>
The stated performance is on the test set, and the path afterwards indicates the location of the model on your drive.