Localisation knowledge sources

Four kinds of knowledge source work together to generate hypotheses about sound source azimuths: the Location knowledge source (available in DNN-based and GMM-based implementations), the Confusion Detection knowledge source, the Confusion Solving knowledge source, and the Head Rotation knowledge source.

Location knowledge source: DnnLocationKS

Class DnnLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations using deep neural networks (DNNs). Currently the DNNs are trained on binaural cues from the Auditory front-end, including cross-correlation function (CCF) and interaural level difference (ILD) cues, as described in more detail in [MaEtAl2015dnn].

This knowledge source requires signals from the Auditory front-end and thus inherits from AuditoryFrontEndDepKS (Section Auditory signal dependent knowledge source superclass: AuditoryFrontEndDepKS); it needs to be bound to the AuditoryFrontEndKS’s KsFiredEvent. The canExecute precondition checks the energy level of the current signal block, so that localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution over azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified; a sketch of how such a distribution can be formed follows the interface summary below.

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category sourcesAzimuthsDistributionHypotheses
triggers event KsFiredEvent
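
The following MATLAB sketch illustrates one way the per-frame, per-channel DNN posteriors could be integrated into a single azimuth distribution, in the spirit of [MaEtAl2015dnn]. The function name and the assumed array layout are illustrative, not the shipped implementation:

    function p = integrateAzimuthPosteriors(posteriors)
    % posteriors: [nFrames x nChannels x nAzimuths] DNN outputs (assumed layout)
    p = squeeze(mean(mean(posteriors, 1), 2));  % average over time and frequency
    p = p / sum(p);                             % renormalise to a distribution
    end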

Location knowledge source: GmmLocationKS

Class GmmLocationKS implements knowledge about the statistical relationship between spatial cues and azimuth locations. Currently the relationship is modelled with Gaussian mixture models (GMMs), which are trained on binaural cues from the Auditory front-end, including interaural time difference (ITD) and interaural level difference (ILD) cues.

Like DnnLocationKS, this knowledge source requires signals from the Auditory front-end and thus inherits from AuditoryFrontEndDepKS (Section Auditory signal dependent knowledge source superclass: AuditoryFrontEndDepKS); it needs to be bound to the AuditoryFrontEndKS’s KsFiredEvent. The canExecute precondition checks the energy level of the current signal block, so that localisation takes place only if there is an actual auditory event. After execution, a SourcesAzimuthsDistributionHypothesis containing a probability distribution over azimuth locations is placed on the blackboard (category sourcesAzimuthsDistributionHypotheses) and the event KsFiredEvent is notified; a sketch of how GMMs can score candidate azimuths follows the interface summary below.

binds to AuditoryFrontEndKS.KsFiredEvent
writes data category sourcesAzimuthsDistributionHypotheses
triggers event KsFiredEvent
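
As an illustration only, the following sketch scores a block of [ITD, ILD] observations against one diagonal-covariance GMM per candidate azimuth and normalises the result into an azimuth distribution. The function name and the gmms structure are assumptions, not the actual GmmLocationKS code:

    function p = gmmAzimuthDistribution(features, gmms)
    % features: [nFrames x 2] ITD/ILD observations (assumed layout)
    % gmms: struct array, one entry per azimuth, with fields w, mu, sigma2
    logL = zeros(numel(gmms), 1);
    for a = 1:numel(gmms)
        g = gmms(a);
        like = zeros(size(features, 1), 1);
        for k = 1:numel(g.w)   % sum over mixture components
            d = (features - g.mu(k, :)).^2 ./ g.sigma2(k, :);
            like = like + g.w(k) * exp(-0.5 * sum(d, 2)) ...
                   / sqrt(prod(2 * pi * g.sigma2(k, :)));
        end
        logL(a) = sum(log(like + eps));   % frames treated as independent
    end
    p = exp(logL - max(logL));            % numerically stable normalisation
    p = p / sum(p);
    end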

Confusion detection knowledge source: ConfusionKS

The ConfusionKS checks new location hypotheses and decides whether there is a confusion. A confusion emerges when there are more valid locations in a hypothesis than assumed auditory sources in the scene. In case of a confusion, a ConfusedLocations event is notified and the responsible location hypothesis is placed on the blackboard in the confusionHypotheses category. Otherwise, a PerceivedAzimuth object is added to the blackboard perceivedAzimuths data category, and the standard KsFiredEvent is triggered. A sketch of the underlying decision follows the interface summary below.

binds to {Gmm|Dnn}LocationKS.KsFiredEvent
reads data category sourcesAzimuthsDistributionHypotheses
writes data category confusionHypotheses or perceivedAzimuths
triggers event ConfusedLocations or KsFiredEvent
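
A minimal sketch of the confusion test, under the assumption that valid locations are local peaks of the azimuth distribution above a probability threshold; the function and its parameters are illustrative:

    function confused = isConfused(p, nAssumedSources, threshold)
    % p: column vector with the probability of each candidate azimuth
    isPeak = p > threshold ...
             & p >= [p(1); p(1:end-1)] ...   % not smaller than left neighbour
             & p >= [p(2:end); p(end)];      % not smaller than right neighbour
    confused = nnz(isPeak) > nAssumedSources;
    end

If confused is true, the hypothesis would go to the confusionHypotheses category and ConfusedLocations would be notified; otherwise the strongest peaks can be turned into PerceivedAzimuth objects.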

Confusion solving knowledge source: ConfusionSolvingKS

The ConfusionSolvingKS solves localisation confusions by predicting the location probability distribution after head rotation and comparing it with new location hypotheses received once the rotation is completed. The canExecute precondition waits for a new location hypothesis; when one arrives, it additionally checks whether the head has actually been turned, and the knowledge source does not execute otherwise. The confusion is then solved by comparing the old and the new location hypotheses, and a PerceivedAzimuth object is placed on the blackboard; a sketch of the comparison follows the interface summary below.

binds to ConfusionKS.ConfusedLocations
reads data category confusionHypotheses, headOrientation and sourcesAzimuthsDistributionHypotheses
writes data category perceivedAzimuths
triggers event KsFiredEvent
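
The comparison can be pictured as follows: after the head has turned by a known angle, a true source should reappear shifted by that angle, while a phantom location (e.g. a front-back mirror image) should not. The sketch below, with assumed names, a simple sign convention, and a nearest-bin match, keeps the candidate that is best supported by the new hypothesis:

    function azimuth = solveConfusion(candidates, rotation, newAzimuths, newP)
    % candidates:  confused source azimuths before rotation (deg, head frame)
    % rotation:    executed head rotation (deg)
    % newAzimuths: candidate azimuths of the new hypothesis (deg)
    % newP:        matching probabilities of the new hypothesis
    predicted = mod(candidates - rotation + 180, 360) - 180;  % expected positions
    score = zeros(size(predicted));
    for i = 1:numel(predicted)
        d = mod(newAzimuths - predicted(i) + 180, 360) - 180; % wrapped difference
        [~, idx] = min(abs(d));                               % nearest azimuth bin
        score(i) = newP(idx);        % support for this candidate after rotation
    end
    [~, best] = max(score);
    azimuth = candidates(best);
    end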

Head rotation knowledge source: RotationKS

The RotationKS has knowledge about how to move the robotic head in order to solve confusions in source localisation. If no other head rotation is already scheduled, the knowledge source uses the robot interface to turn the head; a sketch of this decision follows the interface summary below.

binds to ConfusionKS.ConfusedLocations
reads data category confusionHypotheses, headOrientation
writes data category headOrientation
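
A minimal sketch of this behaviour is given below; robot.rotateHead and the 'relative' reference flag mirror the kind of call a robot interface offers, but the exact signature and the default angle are assumptions:

    function scheduleRotation(robot, rotationPending, angle)
    % Turn the head only if no other rotation is already scheduled.
    if nargin < 3
        angle = 20;   % assumed default exploratory rotation (deg)
    end
    if ~rotationPending
        robot.rotateHead(angle, 'relative');   % hypothetical interface call
    end
    end
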
[MaEtAl2015dnn] Ma, N., Brown, G. J., and May, T. (2015) Robust localisation of multiple speakers exploiting deep neural networks and head movements. Proceedings of Interspeech'15, pp. 3302-3306, Dresden, Germany.