Onset strength (onsetProc.m)

According to [Bregman1990], common onsets and offsets across frequency are important grouping cues that are utilised by the human auditory system to organise and integrate sounds originating from the same source. The onset processor is based on the rate-map representation, and therefore, the choice of the rate-map parameters, as listed in Table 24, will influence the output of the onset processor. The temporal resolution is controlled by the window size rm_wSizeSec and the step size rm_hSizeSec, respectively. The amount of temporal smoothing can be adjusted by the leaky integrator time constant rm_decaySec, which reduces the amount of temporal fluctuations in the rate-map. Onset are detected by measuring the frame-based increase in energy of the rate-map representation. This detection is performed based on the logarithmically-scaled energy, as suggested by [Klapuri1999]. It is possible to limit the strength of individual onsets to an upper limit, which is by default set to ons_maxOnsetdB = 30. A list of all parameters is presented in Table 26.

Table 26 List of parameters related to 'onset_strength'
Parameter Default Description
ons_maxOnsetdB 30 Upper limit for onset strength in dB

The resulting onset strength expressed in decibel, which is a function of time frame and frequency channel, is shown in Fig. 28. The two figures can be replicated by running the script DEMO_OnsetStrength.m. When considering speech as an input signal, it can be seen that onsets appear simultaneously across a broad frequency range and typically mark the beginning of an auditory event.

../../../_images/OnsetStrength.png

Fig. 28 Rate-map representation (left panel) of speech and the corresponding onset strength in decibel (right panel).

[Klapuri1999]Klapuri, A. (1999), “Sound onset detection by applying psychoacoustic knowledge,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3089–3092.