BASS, an audio streaming server component¶
BASS is a GenoM3 component in charge of acquiring binaural audio data from a hardware sound interface, and making it available to other components, henceforth termed as its clients. It relies on the ALSA library to communicate with the hardware interface, hence working with any ALSA-compliant interface.
The component offers services to start and stop the acquisition of audio data, and streams the captured data on an output port. A sliding window of the most recent data is kept on the port, the size of which can be set at runtime (for instance, the port can be configured so as to keep the last 2 seconds of acquired signals).
The folder containing source files of the component is named bass-genom3 and
is located in the RoboticPlatform folder of the software repository. All
files that are referred to in this section are in the bass-genom3 folder.
BASS terminology¶
This section defines the notions and the vocabulary that BASS uses.
- Interface and device
- They are synonyms for the hardware board in charge of converting analog sound signals into digital streams.
- Acquisition and capture
- They are synonyms for retrieving audio data from microphones through an audio interface.
- Binaural audio and channels
- Binaural audio signals consist of two channels (like stereo audio), corresponding to left and right ears.
- Samples and frames
- A sample is a digital value encoding the signal on one channel at one point in time. A frame is a vector of samples, one from each channel, at one point in time (thus for binaural audio, a frame is two samples).
- Chunks
- In capturing state, the sound device regularly delivers blocks of new data to the BASS component. These blocks are called chunks. The size of these chunks (commonly given in number of frames) can be selected at the start of a new acquisition, and is fixed until its end.
Note
The above definitions can differ from other applications where the word frame may refer to data blocks of a few milliseconds. Here, these blocks are rather called chunks, a frame being a single point in time.
Services¶
The services offered by BASS are defined and documented in the description
file bass.gen. This section lists them and provides additional details.
The
ListDevicesservice can be called to display on standard output stream (stdout) the available sound devices that can be selected for the acquisition.The
Acquireservice starts the acquisition of audio data and updates the output port with the captured data (see details about the port in section Output port). This service expects 4 input parameters, shown in Table 1.Table 1 Input parameters of the Acquireservice of BASS¶Name Data type Default value Documentation devicestring"hw:1,0"ALSA name of the sound device sampleRateunsigned long44100Sample rate in Hz nFramesPerChunkunsigned long2205Chunk size in frames nChunksOnPortunsigned long20Port size in chunks The
deviceparameter is the identifier of a sound device to use. The value for one connected device can be retrieved with the aforementionedListDevicesservice. ThenFramesPerChunkparameter is important, as smaller chunks will lead to shorter latency but also higher communication needs between the component and the device. Last, thenChunksOnPortparameter sets the number of chunks kept on the port. With the default values given above, 20 chunks of 2205 frames is a total of 44100 frames kept on the port, i.e. 1 second of audio data at the default sample rate.The
Acquireservice can return an exception if the configuration of the interface fails (e.g. the requested sample rate is not supported), if a problem occurs during the acquisition (e.g. the interface gets unplugged), etc. If an exception occurs, the user can get more information by reading the error message flushed on standard error stream (stderr).The
Stopservice stops the acquisition of audio data. Note that theAcquireservice also interrupts itself, so a new acquisition with different parameters can directly be started from a running one without having to callStop.
Output port¶
The data captured by the Acquire service are streamed on an output port
named Audio (defined in file bassInterface.gen). They are gathered
in two arrays, one for each channel, updated with a FIFO design: every time a
new chunk is retrieved from the hardware interface, the content of the arrays is
shifted, deleting the oldest chunk of data and making room to the newest one, as
detailed below.
At the beginning of the acquisition, the arrays are filled with zeros and the first captured chunks are progressively added. For instance, the state of one array before and after adding the 4th chunk is illustrated here (assuming that the port is longer than 4 chunks):
+-------+-------+-------+-------+-------+-------+-------+-------+
| zeros | 1 | 2 | 3 |
| | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| zeros | 1 | 2 | 3 | 4 |
| | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
The length of the port is a round number of chunks, set with parameter
nChunksOnPort of the Acquire service (noted \(nCOP\) below). The
size of one chunk is also set when calling Acquire, with parameter
nFramesPerChunk (noted \(nFPC\) below). Thus, the left and right arrays
contain \(nFPC*nCOP\) samples each. Once the port is entirely filled with
data (all beginning zeros have been erased), the oldest chunk is deleted as a
new chunk arrives:
+-------+-------+-------+-------+-------+-------+-------+-------+
| 1 | 2 | 3 | ... | nCOP-1| nCOP |
| (old) | | | | | |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| 2 | 3 | 4 | ... | nCOP | nCOP+1|
| (old) | | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
/ / /
/ / /
/ / /
+-------+-------+-------+-------+-------+-------+-------+-------+
| 3 | 4 | 5 | ... | nCOP+1| nCOP+2|
| | | | | | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
In order to let the clients keep track of the data and detect any loss, the port
also publishes an index that indicates the number of frames that have been
streamed since the beginning of the acquisition. In other words, it is the index
of the last streamed frame, noted lastFrameIndex. The data structure of the
Audio output port (defined in file bassStruct.idl) is summarised in
Table 2.
| Name | Data type | Comment |
|---|---|---|
sampleRate |
unsigned long |
sample rate in Hz |
nChunksOnPort |
unsigned long |
number of chunks on the port (\(nCOP\)) |
nFramesPerChunk |
unsigned long |
number of frames per chunk (\(nFPC\)) |
lastFrameIndex |
unsigned long long |
index for tracking data |
left |
sequence<long> |
audio data from left channel |
right |
sequence<long> |
audio data from right channel |
- The fields
sampleRate,nChunksOnPortandnFramesPerChunkare set as input parameters of theAcquireservice. - The fields
leftandrightare dynamic arrays (sequence<long>type) of \(nFPC*nCOP\) samples. Samples are signed integers coded on 32 bits (longtype). - The index for tracking data is stored in the field
lastFrameIndex. As this index is incremented by \(nFPC\) frames every time a new chunk is published on the port, it is important to check that it will not overflow. The index is therefore coded as an unsigned integer on 64 bits (unsigned long longtype. With a sample rate of 192kHz for instance, the order of magnitude of the index overflow is a million years).
Example of use¶
The tutorial Stream binaural signals from BASS to Matlab is an example of use of BASS, using the matlab-genomix bridge. It shows how to invoke its services and how to retrieve the streamed audio data in Matlab.