Sound databases

This is a collection of sound databases. A sound database here is any collection of related sound files, such as a speech corpus.

Speech databases

GRID corpus

GRID is a large multi-talker audiovisual sentence corpus to support joint computational-behavioural studies in speech perception. In brief, the corpus consists of high-quality audio and video (facial) recordings of 1000 sentences spoken by each of 34 talkers (18 male, 16 female). Sentences are of the form:

put red at G9 now

The database provided here is a subset of the original database containing 360 randomly selected sentences for each speaker. More details about GRID can be found on the GRID website or in the corresponding GRID paper.

License

The GRID corpus, together with transcriptions, is freely available for research use.

Usage

Each speaker has a separate folder containing 360 sentences:

sound_databases/grid_subset/s1/bbaf2n.wav
...
sound_databases/grid_subset/s1/swwv9a.wav
sound_databases/grid_subset/s2/bbaf1n.wav
...
sound_databases/grid_subset/s34/swws3n.wav

All available files are listed in:

sound_databases/grid_subset/flist.txt
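The file list can be used to group the recordings by speaker. The following is a minimal sketch, assuming that flist.txt contains one file path per line and that the speaker folder is the direct parent of each WAV file, as in the paths shown above. The function names `group_by_speaker` and `read_file_list` are hypothetical helpers, not part of the database.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_speaker(paths):
    """Map speaker folder name (e.g. 's1') to the WAV paths listed for it."""
    by_speaker = defaultdict(list)
    for line in paths:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the speaker folder is the direct parent of the WAV file.
        speaker = PurePosixPath(line).parent.name
        by_speaker[speaker].append(line)
    return dict(by_speaker)


def read_file_list(flist_path):
    """Read the file list, assuming one file path per line (not verified)."""
    with open(flist_path, encoding="utf-8") as handle:
        return group_by_speaker(handle)
```

If these assumptions hold, `read_file_list("sound_databases/grid_subset/flist.txt")` should return 34 speaker entries with 360 paths each, which gives a quick consistency check of a local copy.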

Acoustic scenes and events

IEEE AASP Challenge on Detection and Classification

This data set includes stereo recordings of acoustic environmental scenes as well as isolated events recorded in an office environment. It can be used for classification of acoustic scenes and events.

We have arranged the isolated sounds in a folder structure that the Sound identification training pipeline can process, removed the printer class because it was not well suited for training, and added a “void” class containing different kinds of noise to be used as negative examples during training.

License

Creative Commons Attribution 2.0 UK: England & Wales

Usage

To use the database for acoustic event classification, the stereo WAV files in the following folders are of interest:

alert/
clearthroat/
cough/
doorslam/
drawer/
keyboard/
keys/
knock/
laughter/
mouse/
pageturn/
pendrop/
phone/
speech/
switch/
void/

Each folder contains several recordings of the corresponding class, along with an annotation file giving onset and offset times.
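The class folders listed above map directly to training labels. The following is a minimal sketch of collecting (file, label) pairs, assuming the class folders sit directly under a common root directory and the recordings use the `.wav` extension. The root path and the helper name `list_event_files` are hypothetical. The annotation files are not parsed here because their format is not described in this section.

```python
from pathlib import Path

# Class folders as listed above; "void" holds the negative examples.
EVENT_CLASSES = [
    "alert", "clearthroat", "cough", "doorslam", "drawer", "keyboard",
    "keys", "knock", "laughter", "mouse", "pageturn", "pendrop",
    "phone", "speech", "switch", "void",
]


def list_event_files(root, classes=EVENT_CLASSES):
    """Return (wav_path, class_label) pairs, one per WAV file found."""
    root = Path(root)
    pairs = []
    for label in classes:
        class_dir = root / label
        if not class_dir.is_dir():
            continue  # tolerate a missing class folder
        for wav_path in sorted(class_dir.glob("*.wav")):
            pairs.append((wav_path, label))
    return pairs
```

Calling `list_event_files("path/to/database")` with the actual location of the data then yields a flat list that can be split into training and test sets.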

The scenes/ folder contains stereo recordings of the following acoustic scenes:

bus

busystreet (busy street with heavy traffic)

office

openairmarket (open-air or semi-open market)

park

quietstreet (quiet street with mild traffic)

restaurant

supermarket

tube (train in the Transport for London, Underground and Overground, train networks)

tubestation (tube station in the Transport for London, Underground and Overground, train networks, either subterranean or supraterranean)

Each class contains ten recordings, each 30 s long. Files are named after the class, i.e. classXX.wav, where XX is a two-digit number; the numbers within a class are not consecutive.
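The naming scheme makes it possible to recover the scene label from the file name alone. The following is a minimal sketch, assuming lowercase class names followed by exactly two digits; the example name `bus01.wav` is illustrative and may not exist in the data. The helper `wav_summary` uses Python's standard `wave` module, which only reads PCM-encoded WAV files, to check the channel count and duration of a recording. Both function names are hypothetical.

```python
import re
import wave

# Scene recordings: lowercase class name plus a two-digit number.
SCENE_NAME = re.compile(r"^([a-z]+)(\d{2})\.wav$")


def parse_scene_filename(filename):
    """Split e.g. 'bus01.wav' into ('bus', 1); return None if no match."""
    match = SCENE_NAME.match(filename)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def wav_summary(path):
    """Return (channel count, duration in seconds) of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnchannels(), wav.getnframes() / wav.getframerate()
```

For a scene recording as described above, `wav_summary` would be expected to report two channels and a duration of about 30 seconds.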