BASS, an audio streaming server component
BASS is a GenoM3 component in charge of acquiring binaural audio data from a hardware sound interface and making it available to other components, henceforth called its clients. It relies on the ALSA library to communicate with the hardware interface, and therefore works with any ALSA-compliant interface.
The component offers services to start and stop the acquisition of audio data, and streams the captured data on an output port. A sliding window of the most recent data is kept on the port, the size of which can be set at runtime (for instance, the port can be configured so as to keep the last 2 seconds of acquired signals).
The folder containing the source files of the component is named bass-genom3 and is located in the RoboticPlatform folder of the software repository. All files that are referred to in this section are in the bass-genom3 folder.
BASS terminology
This section defines the notions and the vocabulary that BASS uses.
- Interface and device
- They are synonyms for the hardware board in charge of converting analog sound signals into digital streams.
- Acquisition and capture
- They are synonyms for retrieving audio data from microphones through an audio interface.
- Binaural audio and channels
- Binaural audio signals consist of two channels (like stereo audio), corresponding to left and right ears.
- Samples and frames
- A sample is a digital value encoding the signal on one channel at one point in time. A frame is a vector of samples, one from each channel, at one point in time (thus for binaural audio, a frame is two samples).
- Chunks
- In capturing state, the sound device regularly delivers blocks of new data to the BASS component. These blocks are called chunks. The size of these chunks (commonly given in number of frames) can be selected at the start of a new acquisition, and is fixed until its end.
Note
The above definitions can differ from those of other applications, where the word frame may refer to a data block of a few milliseconds. Here, such blocks are called chunks, and a frame is a single point in time.
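To make this vocabulary concrete, the following minimal sketch models a chunk as a list of frames, each frame being a (left, right) pair of samples. The names `Frame` and `split_channels` and the sample values are hypothetical and chosen for illustration only; they are not part of BASS.

```python
from typing import List, Tuple

# A frame holds one sample per channel at a single point in time.
# For binaural audio that is a (left, right) pair. Hypothetical type alias.
Frame = Tuple[int, int]


def split_channels(chunk: List[Frame]) -> Tuple[List[int], List[int]]:
    """Separate a chunk (a list of frames) into left and right sample lists."""
    left = [frame[0] for frame in chunk]
    right = [frame[1] for frame in chunk]
    return left, right


# A made-up chunk of 3 frames, i.e. 6 samples in total.
chunk = [(10, -10), (20, -20), (30, -30)]
left, right = split_channels(chunk)
n_frames = len(chunk)
n_samples = len(left) + len(right)
```

In this toy example the chunk size is 3 frames, whereas a real acquisition would typically use chunks of hundreds or thousands of frames.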
Services
The services offered by BASS are defined and documented in the description file bass.gen. This section lists them and provides additional details.
The ListDevices service prints on the standard output stream (stdout) the available sound devices that can be selected for the acquisition.

The Acquire service starts the acquisition of audio data and updates the output port with the captured data (see details about the port in section Output port). This service expects 4 input parameters, shown in Table 1.

Table 1
Name | Data type | Default value | Documentation |
---|---|---|---|
device | string | "hw:1,0" | ALSA name of the sound device |
sampleRate | unsigned long | 44100 | Sample rate in Hz |
nFramesPerChunk | unsigned long | 2205 | Chunk size in frames |
nChunksOnPort | unsigned long | 20 | Port size in chunks |

The device parameter is the identifier of the sound device to use. The value for a connected device can be retrieved with the aforementioned ListDevices service. The nFramesPerChunk parameter is important, as smaller chunks lead to shorter latency but also to higher communication needs between the component and the device. Last, the nChunksOnPort parameter sets the number of chunks kept on the port. With the default values given above, 20 chunks of 2205 frames amount to a total of 44100 frames kept on the port, i.e. 1 second of audio data at the default sample rate.

The Acquire service can return an exception if the configuration of the interface fails (e.g. the requested sample rate is not supported), if a problem occurs during the acquisition (e.g. the interface gets unplugged), etc. If an exception occurs, the user can get more information by reading the error message written to the standard error stream (stderr).

The Stop service stops the acquisition of audio data. Note that the Acquire service also interrupts itself, so a new acquisition with different parameters can be started directly from a running one without having to call Stop.
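The relation between the Acquire parameters and the resulting durations can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and reading the chunk duration as the period between two port updates assumes that the port is updated once per received chunk.

```python
def port_duration_s(sample_rate: int, n_frames_per_chunk: int,
                    n_chunks_on_port: int) -> float:
    """Duration of the audio kept on the port, in seconds."""
    return n_frames_per_chunk * n_chunks_on_port / sample_rate


def chunk_duration_s(sample_rate: int, n_frames_per_chunk: int) -> float:
    """Duration of one chunk, in seconds."""
    return n_frames_per_chunk / sample_rate


def chunks_for_duration(sample_rate: int, n_frames_per_chunk: int,
                        duration_s: float) -> int:
    """Smallest number of chunks that covers at least duration_s seconds."""
    frames_needed = int(round(duration_s * sample_rate))
    # Ceiling division, so that the port is never shorter than requested.
    return -(-frames_needed // n_frames_per_chunk)


# Default values of the Acquire service, taken from Table 1.
SAMPLE_RATE = 44100
N_FRAMES_PER_CHUNK = 2205
N_CHUNKS_ON_PORT = 20

port_s = port_duration_s(SAMPLE_RATE, N_FRAMES_PER_CHUNK, N_CHUNKS_ON_PORT)
chunk_s = chunk_duration_s(SAMPLE_RATE, N_FRAMES_PER_CHUNK)
chunks_for_2s = chunks_for_duration(SAMPLE_RATE, N_FRAMES_PER_CHUNK, 2.0)
```

With the default values, the port holds 1 second of audio and each chunk lasts 50 ms. Keeping the last 2 seconds of signal with the same chunk size would require nChunksOnPort to be 40.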
Output port
The data captured by the Acquire service are streamed on an output port named Audio (defined in file bassInterface.gen). They are gathered in two arrays, one for each channel, updated with a FIFO design: every time a new chunk is retrieved from the hardware interface, the content of the arrays is shifted, deleting the oldest chunk of data and making room for the newest one, as detailed below.
At the beginning of the acquisition, the arrays are filled with zeros and the first captured chunks are progressively added. For instance, the state of one array before and after adding the 4th chunk is illustrated here (assuming that the port is longer than 4 chunks):
+-------+-------+-------+-------+-------+-------+-------+-------+
|                 zeros                 |   1   |   2   |   3   |
|                                       |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
                                      /       /       /
                                    /       /       /
                                  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|             zeros             |   1   |   2   |   3   |   4   |
|                               |       |       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
The length of the port is a whole number of chunks, set with parameter nChunksOnPort of the Acquire service (denoted \(nCOP\) below). The size of one chunk is also set when calling Acquire, with parameter nFramesPerChunk (denoted \(nFPC\) below). Thus, the left and right arrays contain \(nFPC*nCOP\) samples each. Once the port is entirely filled with data (all initial zeros have been shifted out), the oldest chunk is deleted each time a new chunk arrives:
+-------+-------+-------+-------+-------+-------+-------+-------+
|   1   |   2   |   3   |          ...          | nCOP-1| nCOP  |
| (old) |       |       |                       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
      /       /       /
    /       /       /
  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   2   |   3   |   4   |          ...          | nCOP  | nCOP+1|
| (old) |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
      /       /       /
    /       /       /
  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   3   |   4   |   5   |          ...          | nCOP+1| nCOP+2|
|       |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
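The sliding-window behaviour pictured above can be mimicked with a short simulation. This is a minimal sketch of the FIFO principle for one channel, not the actual BASS implementation: the class `ChannelWindow` and its method `push_chunk` are hypothetical names, and the tiny sizes are chosen only to keep the example readable.

```python
from typing import List


class ChannelWindow:
    """Illustrative model of one channel array of the port (not BASS code)."""

    def __init__(self, n_frames_per_chunk: int, n_chunks_on_port: int) -> None:
        self.n_fpc = n_frames_per_chunk
        # The array has a fixed length of nFPC * nCOP and starts filled with zeros.
        self.samples: List[int] = [0] * (n_frames_per_chunk * n_chunks_on_port)

    def push_chunk(self, chunk: List[int]) -> None:
        """Shift the array by one chunk and store the new chunk at the end."""
        if len(chunk) != self.n_fpc:
            raise ValueError("a chunk must contain exactly nFPC samples")
        # Drop the oldest nFPC samples, then append the newest chunk.
        self.samples = self.samples[self.n_fpc:] + list(chunk)


# Toy configuration: chunks of 2 frames, 3 chunks kept on the port.
window = ChannelWindow(n_frames_per_chunk=2, n_chunks_on_port=3)

window.push_chunk([1, 1])
after_first = list(window.samples)   # zeros are still present at the start

window.push_chunk([2, 2])
window.push_chunk([3, 3])
when_full = list(window.samples)     # all initial zeros have been shifted out

window.push_chunk([4, 4])
after_fourth = list(window.samples)  # chunk 1 has been deleted
```

The total length of the array never changes; only its content moves towards the start as new chunks arrive.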
In order to let the clients keep track of the data and detect any loss, the port also publishes an index that indicates the number of frames that have been streamed since the beginning of the acquisition. In other words, it is the index of the last streamed frame, denoted lastFrameIndex. The data structure of the Audio output port (defined in file bassStruct.idl) is summarised in Table 2.
Name | Data type | Comment |
---|---|---|
sampleRate | unsigned long | sample rate in Hz |
nChunksOnPort | unsigned long | number of chunks on the port (\(nCOP\)) |
nFramesPerChunk | unsigned long | number of frames per chunk (\(nFPC\)) |
lastFrameIndex | unsigned long long | index for tracking data |
left | sequence<long> | audio data from left channel |
right | sequence<long> | audio data from right channel |
- The fields sampleRate, nChunksOnPort and nFramesPerChunk are set as input parameters of the Acquire service.
- The fields left and right are dynamic arrays (sequence<long> type) of \(nFPC*nCOP\) samples. Samples are signed integers coded on 32 bits (long type).
- The index for tracking data is stored in the field lastFrameIndex. As this index is incremented by \(nFPC\) every time a new chunk is published on the port, it must not overflow during an acquisition. The index is therefore coded as an unsigned integer on 64 bits (unsigned long long type). With a sample rate of 192 kHz for instance, the index would take on the order of a million years to overflow.
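A client can use lastFrameIndex to work out how many new frames are available since its previous read and whether some frames were overwritten before it could read them. The sketch below shows one possible way to do this, assuming the client stores the index value of its previous read. All function names are hypothetical, and treating a decreasing index as a restarted acquisition is an assumption, not documented BASS behaviour. The last function reproduces the overflow estimate given above.

```python
from typing import List, Tuple


def new_and_lost_frames(prev_index: int, last_frame_index: int,
                        n_frames_per_chunk: int,
                        n_chunks_on_port: int) -> Tuple[int, int]:
    """Return (readable, lost) frame counts since the previous read."""
    capacity = n_frames_per_chunk * n_chunks_on_port  # frames kept on the port
    elapsed = last_frame_index - prev_index
    if elapsed < 0:
        # Assumption: the index only decreases if a new acquisition was started.
        raise ValueError("index went backwards, acquisition probably restarted")
    readable = min(elapsed, capacity)
    lost = elapsed - readable  # frames shifted out before the client read them
    return readable, lost


def latest_frames(channel: List[int], readable: int) -> List[int]:
    """Return the newest `readable` samples, stored at the end of the array."""
    return channel[len(channel) - readable:]


def years_until_overflow(sample_rate: int, bits: int = 64) -> float:
    """Time needed for a frame counter of `bits` bits to wrap around, in years."""
    seconds = 2 ** bits / sample_rate
    return seconds / (365.25 * 24 * 3600)


# One chunk arrived since the previous read: nothing is lost.
on_time = new_and_lost_frames(44100, 46305, 2205, 20)
# The client waited too long: more frames elapsed than the port can hold.
too_late = new_and_lost_frames(0, 50000, 2205, 20)
overflow_years = years_until_overflow(192000)
```

A client that polls the port more often than once per port duration (1 second with the default parameters) should never observe a non-zero number of lost frames with this scheme.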
Example of use
The tutorial Stream binaural signals from BASS to Matlab is an example of use of BASS, using the matlab-genomix bridge. It shows how to invoke its services and how to retrieve the streamed audio data in Matlab.