Audio streaming¶
BASS, an audio streaming server component¶
BASS is a GenoM3 component in charge of acquiring binaural audio data from a hardware sound interface and making it available to other components, henceforth called its clients. It relies on the ALSA library to communicate with the hardware interface, and therefore works with any ALSA-compliant interface.
The component offers services to start and stop the acquisition of audio data, and streams the captured data on an output port. A sliding window of the most recent data is kept on the port, the size of which can be set at runtime (for instance, the port can be configured so as to keep the last 2 seconds of acquired signals).
The folder containing the source files of the component is named bass-genom3 and is located in the RoboticPlatform folder of the software repository. All files referred to in this section are in the bass-genom3 folder.
BASS terminology¶
This section defines the notions and the vocabulary that BASS uses.
- Interface and device
- They are synonyms for the hardware board in charge of converting analog sound signals into digital streams.
- Acquisition and capture
- They are synonyms for retrieving audio data from microphones through an audio interface.
- Binaural audio and channels
- Binaural audio signals consist of two channels (like stereo audio), corresponding to left and right ears.
- Samples and frames
- A sample is a digital value encoding the signal on one channel at one point in time. A frame is a vector of samples, one from each channel, at one point in time (thus for binaural audio, a frame is two samples).
- Chunks
- In capturing state, the sound device regularly delivers blocks of new data to the BASS component. These blocks are called chunks. The size of these chunks (commonly given in number of frames) can be selected at the start of a new acquisition, and is fixed until its end.
Note
The above definitions can differ from other applications where the word frame may refer to data blocks of a few milliseconds. Here, these blocks are rather called chunks, a frame being a single point in time.
Services¶
The services offered by BASS are defined and documented in the description file bass.gen. This section lists them and provides additional details.
The ListDevices service can be called to display, on the standard output stream (stdout), the available sound devices that can be selected for the acquisition.

The Acquire service starts the acquisition of audio data and updates the output port with the captured data (see details about the port in section Output port). This service expects 4 input parameters, shown in Table 1.

Name | Data type | Default value | Documentation |
---|---|---|---|
device | string | "hw:1,0" | ALSA name of the sound device |
sampleRate | unsigned long | 44100 | Sample rate in Hz |
nFramesPerChunk | unsigned long | 2205 | Chunk size in frames |
nChunksOnPort | unsigned long | 20 | Port size in chunks |

The device parameter is the identifier of the sound device to use. The value for a connected device can be retrieved with the aforementioned ListDevices service. The nFramesPerChunk parameter is important, as smaller chunks lead to shorter latency but also to higher communication needs between the component and the device. Last, the nChunksOnPort parameter sets the number of chunks kept on the port. With the default values given above, 20 chunks of 2205 frames make a total of 44100 frames kept on the port, i.e. 1 second of audio data at the default sample rate.

The Acquire service can return an exception if the configuration of the interface fails (e.g. the requested sample rate is not supported), if a problem occurs during the acquisition (e.g. the interface gets unplugged), etc. If an exception occurs, the user can get more information by reading the error message written to the standard error stream (stderr).

The Stop service stops the acquisition of audio data. Note that the Acquire service also interrupts itself, so a new acquisition with different parameters can be started directly from a running one without having to call Stop.
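The sizing arithmetic behind the Acquire parameters can be sketched in C as follows. This is only an illustration: the helper names port_frames and frames_to_ms are hypothetical and are not part of BASS.

```c
/* Hypothetical helper (not part of BASS): total number of frames kept on
   the port, i.e. nFramesPerChunk * nChunksOnPort. */
unsigned long port_frames(unsigned long n_frames_per_chunk,
                          unsigned long n_chunks_on_port)
{
    return n_frames_per_chunk * n_chunks_on_port;
}

/* Hypothetical helper (not part of BASS): duration in milliseconds of a
   given number of frames at a given sample rate (in Hz). */
double frames_to_ms(unsigned long n_frames, unsigned long sample_rate)
{
    return 1000.0 * (double)n_frames / (double)sample_rate;
}
```

With the default values, port_frames(2205, 20) gives 44100 frames, and frames_to_ms gives 50 ms for one chunk and 1000 ms for the whole port. The chunk duration is a lower bound on the delay before a captured frame can appear on the port.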
Output port¶
The data captured by the Acquire service are streamed on an output port named Audio (defined in file bassInterface.gen). They are gathered in two arrays, one for each channel, updated with a FIFO design: every time a new chunk is retrieved from the hardware interface, the content of the arrays is shifted, deleting the oldest chunk of data and making room for the newest one, as detailed below.
At the beginning of the acquisition, the arrays are filled with zeros and the first captured chunks are progressively added. For instance, the state of one array before and after adding the 4th chunk is illustrated here (assuming that the port is longer than 4 chunks):
+-------+-------+-------+-------+-------+-------+-------+-------+
|                 zeros                 |   1   |   2   |   3   |
|                                       |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
                                      /       /       /
                                    /       /       /
                                  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|             zeros             |   1   |   2   |   3   |   4   |
|                               |       |       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
The length of the port is a whole number of chunks, set with the nChunksOnPort parameter of the Acquire service (denoted \(nCOP\) below). The size of one chunk is also set when calling Acquire, with the nFramesPerChunk parameter (denoted \(nFPC\) below). Thus, the left and right arrays contain \(nFPC*nCOP\) samples each. Once the port is entirely filled with data (all the initial zeros have been pushed out), the oldest chunk is deleted each time a new chunk arrives:
+-------+-------+-------+-------+-------+-------+-------+-------+
|   1   |   2   |   3   |          ...          | nCOP-1|  nCOP |
| (old) |       |       |                       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
      /       /       /
    /       /       /
  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   2   |   3   |   4   |          ...          |  nCOP | nCOP+1|
| (old) |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
      /       /       /
    /       /       /
  /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   3   |   4   |   5   |          ...          | nCOP+1| nCOP+2|
|       |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
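The FIFO update illustrated above can be sketched in C for one channel. This is a minimal illustration, assuming a flat array of 32-bit samples and a chunk no longer than the port. The function name fifo_push_chunk is hypothetical, and the actual BASS codels may organise the update differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the BASS source): push one chunk of n_fpc samples
   into a port array of n_fop samples. Assumes n_fpc <= n_fop. */
void fifo_push_chunk(int32_t *port, size_t n_fop,
                     const int32_t *chunk, size_t n_fpc)
{
    /* Shift the content towards the start of the array, which drops the
       oldest chunk. memmove is required because the regions overlap. */
    memmove(port, port + n_fpc, (n_fop - n_fpc) * sizeof *port);

    /* Copy the newest chunk into the room freed at the end of the array. */
    memcpy(port + (n_fop - n_fpc), chunk, n_fpc * sizeof *port);
}
```

Starting from an array filled with zeros, successive calls reproduce the diagrams: the zeros are progressively pushed out, then the oldest chunk is dropped at each call.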
In order to let the clients keep track of the data and detect any loss, the port also publishes an index that indicates the number of frames that have been streamed since the beginning of the acquisition. In other words, it is the index of the last streamed frame, denoted lastFrameIndex. The data structure of the Audio output port (defined in file bassStruct.idl) is summarised in Table 2.
Name | Data type | Comment |
---|---|---|
sampleRate | unsigned long | sample rate in Hz |
nChunksOnPort | unsigned long | number of chunks on the port (\(nCOP\)) |
nFramesPerChunk | unsigned long | number of frames per chunk (\(nFPC\)) |
lastFrameIndex | unsigned long long | index for tracking data |
left | sequence<long> | audio data from left channel |
right | sequence<long> | audio data from right channel |
- The fields sampleRate, nChunksOnPort and nFramesPerChunk are set as input parameters of the Acquire service.
- The fields left and right are dynamic arrays (sequence<long> type) of \(nFPC*nCOP\) samples. Samples are signed integers coded on 32 bits (long type).
- The index for tracking data is stored in the field lastFrameIndex. As this index is incremented by \(nFPC\) frames every time a new chunk is published on the port, it is important to check that it will not overflow. The index is therefore coded as an unsigned integer on 64 bits (unsigned long long type). With a sample rate of 192 kHz, for instance, the index would take on the order of a million years to overflow.
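The order of magnitude quoted for the overflow can be checked with a short calculation. At 192 kHz the index grows by 192000 per second, so a 64-bit index overflows after

\[
\frac{2^{64}}{192000\ \text{s}^{-1}} \approx 9.6 \times 10^{13}\ \text{s} \approx 3 \times 10^{6}\ \text{years}.
\]

For comparison, a 32-bit index at the same sample rate would overflow after

\[
\frac{2^{32}}{192000\ \text{s}^{-1}} \approx 2.2 \times 10^{4}\ \text{s} \approx 6.2\ \text{hours},
\]

which would be reached in an ordinary day-long acquisition.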
Example of use¶
The tutorial Stream binaural signals from BASS to Matlab is an example of use of BASS, using the matlab-genomix bridge. It shows how to invoke its services and how to retrieve the streamed audio data in Matlab.
Writing a client of BASS¶
This section provides information about designing clients of the BASS component. It first defines a formal algorithm that clients could use, then shows a sample implementation in a GenoM3 component called BASC.
An algorithm for clients of BASS¶
A client of BASS can face many different situations:
- It may need just a single block of data (e.g. we could think of a client requesting 2 seconds of audio data to learn noise properties), and the block may be longer than the total size of the port.
- It may indefinitely request new blocks of data, with the requirement that they follow each other without frame loss between two consecutive blocks.
- It may request data faster than the port update rate. Or conversely, it may not read the port often enough, leading to data loss. And in case of data loss, it must know how many frames were lost.
Let us define \(nFOP\) as the total number of frames on the port. The data structure used on the Audio port, of type portStruct, is recalled in Table 3.
Field | Data type | Comment |
---|---|---|
sampleRate | unsigned integer | sample rate in Hz |
nChunksOnPort | unsigned integer | number of chunks on the port (\(nCOP\)) |
nFramesPerChunk | unsigned integer | number of frames per chunk (\(nFPC\)) |
lastFrameIndex | unsigned integer | index for tracking data |
left[nFOP] | array of integers | array of \(nFOP\) samples, with \(nFOP = nFPC * nCOP\) |
right[nFOP] | array of integers | array of \(nFOP\) samples, with \(nFOP = nFPC * nCOP\) |
The left and right fields are arrays of \(nFOP\) samples each, updated as FIFOs (cf. the related section in the BASS documentation). What the client needs is to copy blocks of a given size \(N\) from these arrays. Let us define a function getAudioData, taking \(N\) as input and returning one copied block.
In order not to miss any frame between blocks retrieved with two consecutive calls, the getAudioData function also takes, as both input and output, the index of the next frame to be read, denoted \(nfr\). For instance, the client gets a first block of \(N\) frames starting from a given index \(nfr\). The function must return, along with the block of data, the new value of the index, corresponding to the frame right after the first retrieved block. Then, the client can call the function again with this new value for \(nfr\), so as to get the second block starting from this point.
For the very first block, the client can choose \(nfr\) according to the current value of lastFrameIndex.
- If it wants to pick data from the existing frames on the port, \(nfr\) is chosen to be less than lastFrameIndex.
- If it wants to get fresher data (frames that are not yet on the port but will be published shortly on it), \(nfr\) is chosen to be greater than lastFrameIndex.
With a call to getAudioData, the client requests a block of given size \(N\), but the function may not be able to return a full block of \(N\) frames. Indeed, the ending point for the desired block may be a frame that is not yet published by the server. So, the function returns the number \(n\) of frames it can get (\(n \le N\)). The client can then call the function again and ask for the remaining frames in a loop until the requested block is complete. Here are examples of when this can occur:
- The client requests data more often than they are captured by the microphones (which should be the regular case, because a slower client will end up losing data).
- The client requests a block which is longer than the total number of frames on the port (\(N > nFOP\)).
- At the first call, if the client sets \(nfr\) to a greater value than lastFrameIndex, then getAudioData will not return any available frame (\(n = 0\)). But the client can keep calling the function in a loop until it gets the requested frames.
Finally, in case of data loss, getAudioData also returns the number of frames that were lost. The retrieved block then starts at the first frame that is still available on the port (the oldest one).
Below is the getAudioData function written with formal syntax:
function: getAudioData
|
|  inputs:  integer    N      (number of frames the client wants to get)
|           portStruct Audio  (data from the output port of bass)
|
|  outputs: integer    n      (number of frames the function was able to get)
|           integer    loss   (number of lost frames, 0 if no loss)
|           array(int) l[n]   (the retrieved data block from left channel)
|           array(int) r[n]   (the retrieved data block from right channel)
|
|  in&out:  integer    nfr    (index of the Next Frame to Read)
|
|  local:   integer    nFOP   (total number of Frames On the Port)
|           integer    lfi    (Index of the Last Frame on the port)
|           integer    ofi    (Index of the Oldest Frame on the port)
|           integer    pos    (current position in the left and right arrays)
|
|  algorithm
|  |
|  |  nFOP <- Audio.nFramesPerChunk * Audio.nChunksOnPort
|  |  lfi  <- Audio.lastFrameIndex
|  |  ofi  <- max(0, lfi - nFOP + 1)  //if the acquisition just started and the
|  |                                  //port is not full yet, ofi equals 0
|  |
|  |  /* Detect a data loss */
|  |  loss <- 0
|  |  if (nfr < ofi)
|  |  |  loss <- ofi - nfr
|  |  |  nfr  <- ofi
|  |  end if
|  |
|  |  /* Compute the starting position in the left and right input arrays */
|  |  pos <- nFOP - (lfi - nfr + 1)
|  |
|  |  /* Fill the output arrays l and r */
|  |  n <- 0
|  |  while (n < N AND pos < nFOP)
|  |  |  l[n] <- Audio.left[pos]
|  |  |  r[n] <- Audio.right[pos]
|  |  |  n    <- n + 1
|  |  |  pos  <- pos + 1
|  |  |  nfr  <- nfr + 1
|  |  end while
|  |
|  |  return (l[], r[], n, nfr, loss)
|  |
|  end algorithm
|
end function
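As an illustration, the formal algorithm above translates almost line by line into C. The sketch below is not the code found in the BASC sources. The portStruct type is a simplified stand-in for the generated port structure, and the plain pointers for left and right, the signed 64-bit indices and the function signature are all assumptions made for the example.

```c
#include <stdint.h>

/* Simplified stand-in for the Audio port data (field names follow Table 3).
   Assumption: the structure generated from the IDL file differs in detail. */
typedef struct {
    uint32_t sampleRate;
    uint32_t nChunksOnPort;
    uint32_t nFramesPerChunk;
    uint64_t lastFrameIndex;
    const int32_t *left;   /* nFramesPerChunk * nChunksOnPort samples */
    const int32_t *right;  /* nFramesPerChunk * nChunksOnPort samples */
} portStruct;

/* Copy up to N frames, starting at frame index *nfr, into l and r (each must
   have room for N samples). Returns the number n of frames copied, advances
   *nfr by n, and stores the number of lost frames in *loss (0 if no loss).
   Signed 64-bit arithmetic avoids unsigned underflow in the subtractions. */
int64_t getAudioData(const portStruct *audio, int64_t N, int64_t *nfr,
                     int64_t *loss, int32_t *l, int32_t *r)
{
    const int64_t nFOP = (int64_t)audio->nFramesPerChunk * audio->nChunksOnPort;
    const int64_t lfi = (int64_t)audio->lastFrameIndex;
    int64_t ofi = lfi - nFOP + 1;   /* index of the oldest frame on the port */
    if (ofi < 0)
        ofi = 0;                    /* the port is not full yet */

    /* Detect a data loss: the requested frame has already left the port. */
    *loss = 0;
    if (*nfr < ofi) {
        *loss = ofi - *nfr;
        *nfr = ofi;
    }

    /* Starting position in the left and right input arrays. */
    int64_t pos = nFOP - (lfi - *nfr + 1);

    /* Fill the output arrays until N frames are copied or the port ends. */
    int64_t n = 0;
    while (n < N && pos < nFOP) {
        l[n] = audio->left[pos];
        r[n] = audio->right[pos];
        n++;
        pos++;
        (*nfr)++;
    }
    return n;
}
```

A client would typically call this function in a loop, each time asking for the frames still missing from the current block and writing them after the frames already collected, until the block is complete.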
Sample implementation in a GenoM3 component¶
BASC is a GenoM3 component that acts as a client of BASS. It can be connected to the output port of BASS, and implements the above generic algorithm to retrieve blocks of audio signals. BASC does not perform any processing on the data. It only shows what a GenoM3 client of BASS could look like.
The folder containing the source files of the component is named basc-genom3 and is located in the RoboticPlatform folder of the software repository. All files referred to in this section are in the basc-genom3 folder.
Services¶
BASC runs the above algorithm in a service called GetBlocks, defined in the description file basc.gen. It expects the following input parameters:
Name | Data type | Default value | Documentation |
---|---|---|---|
nBlocks | unsigned long | 1 | Amount of blocks, 0 for unlimited |
nFramesPerBlock | unsigned long | 12000 | Block size in frames |
startOffs | long | -12000 | Starting offset (past < 0, future > 0) |
The first parameter nBlocks sets the number of blocks that the service must get, or can be set to 0 to run the service indefinitely. The second parameter nFramesPerBlock sets the block size (\(N\) in the previous section). Last, the startOffs parameter sets the index of the first frame to read, relative to the current value of lastFrameIndex on the port. For instance, if startOffs is -1000 and lastFrameIndex is 43000 when the service is called, then the first frame to read (\(nfr\) in the previous section) will have index 42000. With the given default values, GetBlocks will get 1 block of 12000 frames, taking the last 12000 frames on the port.
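The relation between startOffs and the first frame to read can be sketched in C as follows. The function name first_frame_index is hypothetical, and clamping a negative result to 0 is an assumption: the text above does not say how BASC handles an offset that points before the beginning of the acquisition.

```c
#include <stdint.h>

/* Hypothetical helper (not the BASC source): index of the first frame to
   read, from the current lastFrameIndex and a signed offset in frames
   (past < 0, future > 0).
   Assumption: an index before the start of the acquisition is clamped to 0. */
int64_t first_frame_index(uint64_t last_frame_index, int64_t start_offs)
{
    const int64_t nfr = (int64_t)last_frame_index + start_offs;
    return nfr < 0 ? 0 : nfr;
}
```

With the values of the example above, first_frame_index(43000, -1000) returns 42000.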
At any moment, the GetBlocks service can be interrupted by calling a service named Stop.
Execution example¶
Assume that the GetBlocks service runs at a period of, say, 250 ms (defined in the dotgen file). So every 250 ms, it calls the getAudioData function, either to request a new block or to complete the current block if the previous call could not return a full one. Depending on the sample rate, if the requested block size (parameter nFramesPerBlock) is so small that one block lasts less than 250 ms, then the client will eventually lose some frames. On the other hand, if a block lasts more than 250 ms, then the client will request data at a rate higher than their update frequency, which should be fine.
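The condition described above can be written as a small calculation. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the client reads at most one block per period.

```c
#include <stdint.h>

/* Number of frames captured during one period of the client task. */
uint64_t frames_per_period(uint32_t sample_rate_hz, uint32_t period_ms)
{
    return (uint64_t)sample_rate_hz * period_ms / 1000u;
}

/* Number of frames by which the client falls behind at each period when it
   reads at most block_size frames per period (0 if it keeps up).
   A non-zero result means frames will eventually be lost. */
uint64_t deficit_per_period(uint32_t sample_rate_hz, uint32_t period_ms,
                            uint32_t block_size)
{
    const uint64_t produced = frames_per_period(sample_rate_hz, period_ms);
    return produced > block_size ? produced - block_size : 0u;
}
```

At 44100 Hz with a 250 ms period, 11025 frames are captured per period. A block size of 10000 frames leaves a deficit of 1025 frames per period, while a block size of 12000 frames leaves none.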
This can be tested: assume that the sampling rate is 44100 Hz, so that 250 ms corresponds to 11025 frames. Calling GetBlocks with nFramesPerBlock < 11025 then leads to data loss. The following example uses the matlab-genomix client.
% The middleware, genomix, bass and basc should be running
% Connect to genomix
>> client = genomix.client
% Load the server BASS and the client BASC
>> bass = client.load('bass')
>> basc = client.load('basc')
% Connect the input port of BASC to the output port of BASS
>> basc.connect_port('Audio', 'bass/Audio')
% Start the acquisition (using default values)
>> rAcquire = bass.Acquire('-a', 'hw:1,0', 44100, 2205, 20)
%%% EXAMPLE 1: get one block of 2 seconds (88200 frames at 44100Hz)
>> basc.GetBlocks(1, 88200, 0)
% In the terminal where it runs, BASC prints:
%
% Requested 88200 frames, got 11025.
% Requested 77175 frames, got 11025.
% Requested 66150 frames, got 11025.
% Requested 55125 frames, got 11025.
% Requested 44100 frames, got 11025.
% Requested 33075 frames, got 11025.
% Requested 22050 frames, got 11025.
% Requested 11025 frames, got 11025.
% A new block is ready to be processed.
%
% Each line 'Requested N frames, got n.' indicates the number N of frames
% requested by BASC, and the number n it got in return. The component keeps
% requesting frames until it has formed a block of 88200 frames.
%%% EXAMPLE 2: get unlimited number of blocks, with nFramesPerBlock < 11025
>> rGetBlocks = basc.GetBlocks('-a', 0, 10000, 0)
% After a few seconds, BASC prints:
%
% Requested 10000 frames, got 10000.
% !!Lost 1025 frames!!
% A new block is ready to be processed.
% Requested 10000 frames, got 10000.
% !!Lost 1025 frames!!
% A new block is ready to be processed.
%
% At each attempt to get the following block, some frames are lost because
% BASC does not read the port of BASS often enough.
% Stop the running GetBlocks service
>> basc.Stop()
%%% EXAMPLE 3: get unlimited number of blocks, with nFramesPerBlock > 11025
>> rGetBlocks = basc.GetBlocks('-a', 0, 12000, 0)
% Here, as BASC reads the port slightly faster than its update rate, the
% retrieved block is sometimes incomplete, for instance:
%
% Requested 12000 frames, got 12000.
% A new block is ready to be processed.
% Requested 12000 frames, got 12000.
% A new block is ready to be processed.
% Requested 12000 frames, got 11550.
% Requested 450 frames, got 450.
% A new block is ready to be processed.
% Requested 12000 frames, got 12000.
% A new block is ready to be processed.
%
% The two consecutive lines 'Requested...' show that a first call only gets
% 11550 frames, so the component makes a second request to get the remaining
% part of 12000 - 11550 = 450 frames.
% Stop the running GetBlocks service
>> basc.Stop()
% Kill the components
>> bass.kill()
>> basc.kill()
% Remove the used objects in Matlab
>> delete(bass)
>> delete(basc)
>> delete(client)
% The remaining processes (the middleware and genomix) can be killed
The getAudioData function in charge of getting the requested block is written in C in file codels/basc_read_codels.c. The codels of the GetBlocks service are also written in this file, with comments to explain the overall process followed by BASC.