Rate-map (ratemapProc.m)

The rate-map represents a map of auditory nerve firing rates [Brown1994] and is frequently employed as a spectral feature in CASA systems [Wang2006], ASR [Cooke2001] and speaker identification systems [May2012]. The rate-map is computed for individual frequency channels by smoothing the IHC signal representation with a leaky integrator, whose time constant rm_decaySec is typically 8 ms. The smoothed IHC signal is then averaged across all samples within a time frame, so the rate-map can be interpreted as an auditory spectrogram. Depending on whether the rate-map scaling rm_scaling is set to 'magnitude' or 'power', either the magnitude or the squared samples are averaged within each time frame. The temporal resolution is controlled by the window size rm_wSizeSec and the step size rm_hSizeSec. In addition, the shape of the window function rm_wname, which weights the individual samples within a frame prior to averaging, can be adjusted. The default rate-map parameters are listed in Table 24.

Table 24 List of parameters related to 'ratemap'.
Parameter       Default   Description
'rm_wname'      'hann'    Window type
'rm_wSizeSec'   0.02      Window duration in s
'rm_hSizeSec'   0.01      Window step size in s
'rm_scaling'    'power'   Rate-map scaling ('magnitude' or 'power')
'rm_decaySec'   0.008     Leaky integrator time constant in s
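
The processing described above can be summarised by the following MATLAB sketch, which computes the rate-map of a single frequency channel from an IHC signal. The input variables ihc and fsHz, as well as the particular first-order realisation of the leaky integrator, are illustrative assumptions and do not necessarily match the internals of ratemapProc.m:

    % Parameters (cf. Table 24)
    decaySec = 0.008;                   % rm_decaySec, integrator time constant in s
    wSizeSec = 0.02;                    % rm_wSizeSec, window duration in s
    hSizeSec = 0.01;                    % rm_hSizeSec, window step size in s
    scaling  = 'power';                 % rm_scaling, 'magnitude' or 'power'

    % Leaky integrator: first-order IIR low-pass smoothing the IHC signal
    ihcCol   = ihc(:);                  % assumed single-channel IHC signal (column)
    alpha    = exp(-1/(fsHz*decaySec));
    smoothed = filter(1-alpha, [1 -alpha], ihcCol);

    % Frame-based averaging with a Hann window (rm_wname)
    wSize = 2*round(wSizeSec*fsHz/2);   % window length in samples (even)
    hSize = round(hSizeSec*fsHz);       % step size in samples
    win   = 0.5 - 0.5*cos(2*pi*(0:wSize-1).'/(wSize-1));  % Hann window

    nFrames = floor((numel(smoothed)-wSize)/hSize) + 1;
    ratemap = zeros(nFrames, 1);
    for ii = 1:nFrames
        frame = smoothed((ii-1)*hSize + (1:wSize).') .* win;  % windowed frame
        if strcmp(scaling, 'power')
            ratemap(ii) = mean(frame.^2);    % average squared samples
        else
            ratemap(ii) = mean(abs(frame));  % average magnitude
        end
    end

Repeating this for every channel yields a time-frequency matrix that can be displayed as an auditory spectrogram.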

The rate-map is demonstrated by the script DEMO_Ratemap, and the corresponding plots are presented in Fig. 27. The IHC representation of a speech signal is shown in the left panel, using a bank of 64 gammatone filters spaced between 80 Hz and 8000 Hz. The corresponding rate-map representation scaled in dB is presented in the right panel.
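
As a rough sketch of how such a representation can be requested within the framework (analogous to DEMO_Ratemap), the following lines assume the generic interface with dataObject, manager and genParStruct, an ear signal earSignal sampled at fsHz, and gammatone filterbank parameters (fb_*) as documented for the filterbank processor; the exact calls should be checked against the demo script:

    % Create data and manager objects for the ear signal
    dObj = dataObject(earSignal, fsHz);
    mObj = manager(dObj);

    % Request a 'ratemap' with 64 gammatone channels between 80 and 8000 Hz
    par  = genParStruct('fb_lowFreqHz', 80, 'fb_highFreqHz', 8000, ...
                        'fb_nChannels', 64, 'rm_scaling', 'power');
    sOut = mObj.addProcessor('ratemap', par);

    % Process the signal and plot the resulting rate-map
    mObj.processSignal();
    sOut{1}.plot;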

../../../_images/Ratemap.png

Fig. 27 IHC representation of a speech signal using 64 auditory filters (left panel) and the corresponding rate-map representation (right panel).

[Brown1994] Brown, G. J. and Cooke, M. P. (1994), “Computational auditory scene analysis,” Computer Speech and Language 8(4), pp. 297–336.
[Cooke2001] Cooke, M., Green, P., Josifovski, L., and Vizinho, A. (2001), “Robust automatic speech recognition with missing and unreliable acoustic data,” Speech Communication 34(3), pp. 267–285.
[May2012] May, T., van de Par, S., and Kohlrausch, A. (2012), “Noise-robust speaker recognition combining missing data techniques and universal background modeling,” IEEE Transactions on Audio, Speech, and Language Processing 20(1), pp. 108–121.
[Wang2006] Wang, D. L. and Brown, G. J. (Eds.) (2006), Computational Auditory Scene Analysis: Principles, Algorithms and Applications, Wiley / IEEE Press.