BASS, an audio streaming server component

BASS is a GenoM3 component in charge of acquiring binaural audio data from a hardware sound interface, and making it available to other components, henceforth termed as its clients. It relies on the ALSA library to communicate with the hardware interface, hence working with any ALSA-compliant interface.

The component offers services to start and stop the acquisition of audio data, and streams the captured data on an output port. A sliding window of the most recent data is kept on the port, the size of which can be set at runtime (for instance, the port can be configured so as to keep the last 2 seconds of acquired signals).

The folder containing source files of the component is named bass-genom3 and is located in the RoboticPlatform folder of the software repository. All files that are referred to in this section are in the bass-genom3 folder.

BASS terminology

This section defines the notions and the vocabulary that BASS uses.

Interface and device
They are synonyms for the hardware board in charge of converting analog sound signals into digital streams.
Acquisition and capture
They are synonyms for retrieving audio data from microphones through an audio interface.
Binaural audio and channels
Binaural audio signals consist of two channels (like stereo audio), corresponding to left and right ears.
Samples and frames
A sample is a digital value encoding the signal on one channel at one point in time. A frame is a vector of samples, one from each channel, at one point in time (thus for binaural audio, a frame is two samples).
Chunks
In capturing state, the sound device regularly delivers blocks of new data to the BASS component. These blocks are called chunks. The size of these chunks (commonly given in number of frames) can be selected at the start of a new acquisition, and is fixed until its end.

Note

The above definitions can differ from other applications where the word frame may refer to data blocks of a few milliseconds. Here, these blocks are rather called chunks, a frame being a single point in time.

Services

The services offered by BASS are defined and documented in the description file bass.gen. This section lists them and provides additional details.

  • The ListDevices service can be called to display on standard output stream (stdout) the available sound devices that can be selected for the acquisition.

  • The Acquire service starts the acquisition of audio data and updates the output port with the captured data (see details about the port in section Output port). This service expects 4 input parameters, shown in Table 1.

    Table 1 Input parameters of the Acquire service of BASS
    Name Data type Default value Documentation
    device string "hw:1,0" ALSA name of the sound device
    sampleRate unsigned long 44100 Sample rate in Hz
    nFramesPerChunk unsigned long 2205 Chunk size in frames
    nChunksOnPort unsigned long 20 Port size in chunks

    The device parameter is the identifier of a sound device to use. The value for one connected device can be retrieved with the aforementioned ListDevices service. The nFramesPerChunk parameter is important, as smaller chunks will lead to shorter latency but also higher communication needs between the component and the device. Last, the nChunksOnPort parameter sets the number of chunks kept on the port. With the default values given above, 20 chunks of 2205 frames is a total of 44100 frames kept on the port, i.e. 1 second of audio data at the default sample rate.

    The Acquire service can return an exception if the configuration of the interface fails (e.g. the requested sample rate is not supported), if a problem occurs during the acquisition (e.g. the interface gets unplugged), etc. If an exception occurs, the user can get more information by reading the error message flushed on standard error stream (stderr).

  • The Stop service stops the acquisition of audio data. Note that the Acquire service also interrupts itself, so a new acquisition with different parameters can directly be started from a running one without having to call Stop.

Output port

The data captured by the Acquire service are streamed on an output port named Audio (defined in file bassInterface.gen). They are gathered in two arrays, one for each channel, updated with a FIFO design: every time a new chunk is retrieved from the hardware interface, the content of the arrays is shifted, deleting the oldest chunk of data and making room to the newest one, as detailed below.

At the beginning of the acquisition, the arrays are filled with zeros and the first captured chunks are progressively added. For instance, the state of one array before and after adding the 4th chunk is illustrated here (assuming that the port is longer than 4 chunks):

+-------+-------+-------+-------+-------+-------+-------+-------+
|                 zeros                 |   1   |   2   |   3   |
|                                       |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
                                         /       /       /
                                        /       /       /
                                       /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|             zeros             |   1   |   2   |   3   |   4   |
|                               |       |       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+

The length of the port is a round number of chunks, set with parameter nChunksOnPort of the Acquire service (noted \(nCOP\) below). The size of one chunk is also set when calling Acquire, with parameter nFramesPerChunk (noted \(nFPC\) below). Thus, the left and right arrays contain \(nFPC*nCOP\) samples each. Once the port is entirely filled with data (all beginning zeros have been erased), the oldest chunk is deleted as a new chunk arrives:

+-------+-------+-------+-------+-------+-------+-------+-------+
|   1   |   2   |   3   |          ...          | nCOP-1|  nCOP |
| (old) |       |       |                       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
         /       /                                       /
        /       /                                       /
       /       /                                       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   2   |   3   |   4   |          ...          |  nCOP | nCOP+1|
| (old) |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
         /       /                                       /
        /       /                                       /
       /       /                                       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   3   |   4   |   5   |          ...          | nCOP+1| nCOP+2|
|       |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+

In order to let the clients keep track of the data and detect any loss, the port also publishes an index that indicates the number of frames that have been streamed since the beginning of the acquisition. In other words, it is the index of the last streamed frame, noted lastFrameIndex. The data structure of the Audio output port (defined in file bassStruct.idl) is summarised in Table 2.

Table 2 Data structure of the Audio output port of BASS
Name Data type Comment
sampleRate unsigned long sample rate in Hz
nChunksOnPort unsigned long number of chunks on the port (\(nCOP\))
nFramesPerChunk unsigned long number of frames per chunk (\(nFPC\))
lastFrameIndex unsigned long long index for tracking data
left sequence<long> audio data from left channel
right sequence<long> audio data from right channel
  • The fields sampleRate, nChunksOnPort and nFramesPerChunk are set as input parameters of the Acquire service.
  • The fields left and right are dynamic arrays (sequence<long> type) of \(nFPC*nCOP\) samples. Samples are signed integers coded on 32 bits (long type).
  • The index for tracking data is stored in the field lastFrameIndex. As this index is incremented by \(nFPC\) frames every time a new chunk is published on the port, it is important to check that it will not overflow. The index is therefore coded as an unsigned integer on 64 bits (unsigned long long type. With a sample rate of 192kHz for instance, the order of magnitude of the index overflow is a million years).

Example of use

The tutorial Stream binaural signals from BASS to Matlab is an example of use of BASS, using the matlab-genomix bridge. It shows how to invoke its services and how to retrieve the streamed audio data in Matlab.