Audio streaming

BASS, an audio streaming server component

BASS is a GenoM3 component in charge of acquiring binaural audio data from a hardware sound interface and making it available to other components, henceforth called its clients. It relies on the ALSA library to communicate with the hardware, and therefore works with any ALSA-compliant interface.

The component offers services to start and stop the acquisition of audio data, and streams the captured data on an output port. The port holds a sliding window of the most recent data, whose size can be set at runtime (for instance, the port can be configured to keep the last 2 seconds of acquired signals).

The folder containing source files of the component is named bass-genom3 and is located in the RoboticPlatform folder of the software repository. All files that are referred to in this section are in the bass-genom3 folder.

BASS terminology

This section defines the notions and the vocabulary that BASS uses.

Interface and device
    They are synonyms for the hardware board in charge of converting analog sound signals into digital streams.
Acquisition and capture
    They are synonyms for retrieving audio data from microphones through an audio interface.
Binaural audio and channels
    Binaural audio signals consist of two channels (like stereo audio), corresponding to left and right ears.
Samples and frames
    A sample is a digital value encoding the signal on one channel at one point in time. A frame is a vector of samples, one from each channel, at one point in time (thus for binaural audio, a frame is two samples).
Chunks
    In capturing state, the sound device regularly delivers blocks of new data to the BASS component. These blocks are called chunks. The size of these chunks (commonly given in number of frames) can be selected at the start of a new acquisition, and is fixed until its end.

Note

These definitions may differ from those used in other applications, where the word frame can refer to a data block of a few milliseconds. Here, such blocks are called chunks, and a frame is a single point in time.

Services

The services offered by BASS are defined and documented in the description file bass.gen. This section lists them and provides additional details.

  • The ListDevices service prints to the standard output stream (stdout) the sound devices that can be selected for the acquisition.

  • The Acquire service starts the acquisition of audio data and updates the output port with the captured data (see details about the port in section Output port). This service expects 4 input parameters, shown in Table 1.

    Table 1 Input parameters of the Acquire service of BASS

    Name             Data type      Default value  Documentation
    ---------------  -------------  -------------  -----------------------------
    device           string         "hw:1,0"       ALSA name of the sound device
    sampleRate       unsigned long  44100          Sample rate in Hz
    nFramesPerChunk  unsigned long  2205           Chunk size in frames
    nChunksOnPort    unsigned long  20             Port size in chunks

    The device parameter identifies the sound device to use. The value for a connected device can be retrieved with the aforementioned ListDevices service. The nFramesPerChunk parameter sets a trade-off: smaller chunks give shorter latency, but require more frequent communication between the component and the device. Last, the nChunksOnPort parameter sets the number of chunks kept on the port. With the default values given above, 20 chunks of 2205 frames make a total of 44100 frames kept on the port, i.e. 1 second of audio data at the default sample rate.

    The Acquire service can raise an exception if the configuration of the interface fails (e.g. the requested sample rate is not supported) or if a problem occurs during the acquisition (e.g. the interface gets unplugged). If an exception occurs, the user can get more information by reading the error message written to the standard error stream (stderr).

  • The Stop service stops the acquisition of audio data. Note that calling Acquire also interrupts any acquisition that is already running, so a new acquisition with different parameters can be started directly from a running one without having to call Stop.
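The relation between the Acquire parameters and the resulting durations can be sketched as follows. This is an illustrative Python helper, not part of BASS; the name acquire_timing is hypothetical. It only restates the arithmetic given above: the chunk duration is nFramesPerChunk / sampleRate, and the port duration is nFramesPerChunk * nChunksOnPort / sampleRate.

```python
def acquire_timing(sample_rate, n_frames_per_chunk, n_chunks_on_port):
    """Return (chunk duration, port duration) in seconds.

    Hypothetical helper for illustration; the argument names mirror the
    input parameters of the Acquire service.
    """
    chunk_s = n_frames_per_chunk / sample_rate
    port_s = n_frames_per_chunk * n_chunks_on_port / sample_rate
    return chunk_s, port_s


# Default values of the Acquire service (Table 1).
chunk_s, port_s = acquire_timing(44100, 2205, 20)
print(f"chunk: {chunk_s * 1000:.0f} ms, port: {port_s:.1f} s")
```

With the default values, each chunk lasts 50 ms and the port holds 1 second of audio. Halving nFramesPerChunk would halve the chunk duration, at the cost of twice as many exchanges with the device for the same amount of audio.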

Output port

The data captured by the Acquire service are streamed on an output port named Audio (defined in file bassInterface.gen). They are gathered in two arrays, one for each channel, updated with a FIFO design: every time a new chunk is retrieved from the hardware interface, the content of the arrays is shifted, deleting the oldest chunk of data and making room for the newest one, as detailed below.

At the beginning of the acquisition, the arrays are filled with zeros and the first captured chunks are progressively added. For instance, the state of one array before and after adding the 4th chunk is illustrated here (assuming that the port is longer than 4 chunks):

+-------+-------+-------+-------+-------+-------+-------+-------+
|                 zeros                 |   1   |   2   |   3   |
|                                       |       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
                                         /       /       /
                                        /       /       /
                                       /       /       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|             zeros             |   1   |   2   |   3   |   4   |
|                               |       |       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+

The length of the port is a whole number of chunks, set with parameter nChunksOnPort of the Acquire service (denoted \(nCOP\) below). The size of one chunk is also set when calling Acquire, with parameter nFramesPerChunk (denoted \(nFPC\) below). Thus, the left and right arrays contain \(nFPC*nCOP\) samples each. Once the port is entirely filled with data (all initial zeros have been pushed out), the oldest chunk is deleted each time a new chunk arrives:

+-------+-------+-------+-------+-------+-------+-------+-------+
|   1   |   2   |   3   |          ...          | nCOP-1|  nCOP |
| (old) |       |       |                       |       |       |
+-------+-------+-------+-------+-------+-------+-------+-------+
         /       /                                       /
        /       /                                       /
       /       /                                       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   2   |   3   |   4   |          ...          |  nCOP | nCOP+1|
| (old) |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
         /       /                                       /
        /       /                                       /
       /       /                                       /
+-------+-------+-------+-------+-------+-------+-------+-------+
|   3   |   4   |   5   |          ...          | nCOP+1| nCOP+2|
|       |       |       |                       |       | (new) |
+-------+-------+-------+-------+-------+-------+-------+-------+
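The shifting behaviour illustrated above can be mimicked with a toy model. The Python sketch below is an assumption-laden illustration, not the BASS implementation: the class PortModel and its attribute names are hypothetical, it models a single channel, and it also counts the streamed frames in the way a frame index could be maintained.

```python
class PortModel:
    """Toy model of one channel of the Audio port (hypothetical, for illustration)."""

    def __init__(self, n_frames_per_chunk, n_chunks_on_port):
        self.n_fpc = n_frames_per_chunk
        # The array starts filled with zeros and always keeps nFPC * nCOP samples.
        self.data = [0] * (n_frames_per_chunk * n_chunks_on_port)
        self.last_frame_index = 0

    def push_chunk(self, chunk):
        """Drop the oldest chunk and append the newest one at the end."""
        if len(chunk) != self.n_fpc:
            raise ValueError("a chunk must contain exactly nFramesPerChunk samples")
        del self.data[:self.n_fpc]   # the oldest chunk leaves the port
        self.data.extend(chunk)      # the newest chunk enters at the end
        self.last_frame_index += self.n_fpc


# Port of 4 chunks of 2 frames; chunk k is filled with the value k.
port = PortModel(n_frames_per_chunk=2, n_chunks_on_port=4)
for k in range(1, 6):
    port.push_chunk([k, k])
print(port.data, port.last_frame_index)
```

After five chunks have been pushed on a port of four chunks, chunk 1 is no longer available: the array holds chunks 2 to 5, and ten frames have been streamed in total.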

In order to let clients keep track of the data and detect any loss, the port also publishes an index that gives the number of frames that have been streamed since the beginning of the acquisition. In other words, it is the index of the last streamed frame, denoted lastFrameIndex. The data structure of the Audio output port (defined in file bassStruct.idl) is summarised in Table 2.

Table 2 Data structure of the Audio output port of BASS

Name             Data type           Comment
---------------  ------------------  ---------------------------------------
sampleRate       unsigned long       sample rate in Hz
nChunksOnPort    unsigned long       number of chunks on the port (\(nCOP\))
nFramesPerChunk  unsigned long       number of frames per chunk (\(nFPC\))
lastFrameIndex   unsigned long long  index for tracking data
left             sequence<long>      audio data from left channel
right            sequence<long>      audio data from right channel

  • The fields sampleRate, nChunksOnPort and nFramesPerChunk are set as input parameters of the Acquire service.
  • The fields left and right are dynamic arrays (sequence<long> type) of \(nFPC*nCOP\) samples. Samples are signed integers coded on 32 bits (long type).
  • The index for tracking data is stored in the field lastFrameIndex. As this index is incremented by \(nFPC\) frames every time a new chunk is published on the port, it is important to check that it will not overflow. The index is therefore coded as an unsigned integer on 64 bits (unsigned long long type). With a sample rate of 192 kHz, for instance, the index would take on the order of a million years to overflow.
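The overflow estimate can be checked with a few lines of arithmetic. The Python sketch below is only a back-of-the-envelope calculation; the function name seconds_until_overflow is hypothetical, and the 32-bit case is included purely for comparison (it is not a configuration of BASS).

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_until_overflow(sample_rate_hz, index_bits):
    """Time needed for a frame counter of index_bits bits to wrap around."""
    return (2 ** index_bits) / sample_rate_hz


# 64-bit index, as used for lastFrameIndex, at 192 kHz.
years_64 = seconds_until_overflow(192000, 64) / SECONDS_PER_YEAR

# Hypothetical 32-bit index at the same rate, for comparison only.
hours_32 = seconds_until_overflow(192000, 32) / 3600

print(f"64 bits: {years_64:.2e} years, 32 bits: {hours_32:.1f} hours")
```

At 192 kHz, a 64-bit index lasts about three million years, whereas a 32-bit index would wrap after roughly six hours of continuous acquisition, which motivates the choice of a 64-bit type.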

Example of use

The tutorial Stream binaural signals from BASS to Matlab is an example of use of BASS, using the matlab-genomix bridge. It shows how to invoke its services and how to retrieve the streamed audio data in Matlab.

Writing a client of BASS

This section provides information about designing clients of the BASS component. It first defines a formal algorithm that clients could use, and then shows a sample implementation in a GenoM3 component called BASC.

An algorithm for clients of BASS

A client of BASS can face many different situations:

  • It may need just a single block of data (e.g. we could think of a client requesting 2 seconds of audio data to learn noise properties), and the block may be longer than the total size of the port.
  • It may indefinitely request new blocks of data, with the requirement that they follow each other without frame loss between two consecutive blocks.
  • It may request data faster than the port is updated or, conversely, it may not read the port often enough, which leads to data loss. In case of data loss, it must know how many frames were lost.

Let us define \(nFOP\) as the total number of frames on the port. The data structure used on the Audio port, of type portStruct, is recalled in Table 3.

Table 3 Formal definition of the data type portStruct used on the Audio port of BASS

Field            Data type          Comment
---------------  -----------------  ---------------------------------------
sampleRate       unsigned integer   sample rate in Hz
nChunksOnPort    unsigned integer   number of chunks on the port (\(nCOP\))
nFramesPerChunk  unsigned integer   number of frames per chunk (\(nFPC\))
lastFrameIndex   unsigned integer   index for tracking data
left[nFOP]       array of integers  array of \(nFOP\) samples, with \(nFOP = nFPC * nCOP\)
right[nFOP]      array of integers  array of \(nFOP\) samples

The left and right fields are arrays of \(nFOP\) samples each, updated as FIFOs (cf. section Output port of the BASS documentation). What the client needs is to copy blocks of a given size \(N\) from these arrays. Let us define a function getAudioData, taking \(N\) as input and returning one copied block.

In order not to miss any frame between blocks retrieved with two consecutive calls, the getAudioData function also takes, as both input and output, the index of the next frame to read, denoted \(nfr\). For instance, the client gets a first block of \(N\) frames starting from a given index \(nfr\). The function must return, along with the block of data, the new value of the index, corresponding to the frame right after the first retrieved block. Then, the client can call the function again with this new value of \(nfr\), so as to get the second block starting from this point.

For the very first block, the client can choose \(nfr\) according to the current value of the lastFrameIndex.

  • If it wants to pick data from the existing frames on the port, \(nfr\) is chosen to be less than lastFrameIndex.
  • If it wants to get fresher data (frames that are not yet on the port but will be published shortly on it), \(nfr\) is chosen to be greater than lastFrameIndex.

With a call to getAudioData, the client requests a block of given size \(N\), but the function may not be able to return a full block of \(N\) frames. Indeed, the ending point for the desired block may be a frame that is not yet published by the server. So, the function returns the number \(n\) of frames it can get (\(n \le N\)). The client can then call the function again and ask for the remaining frames in a loop until the requested block is complete. Here are examples of when this can occur:

  • The client requests data more often than they are captured by the microphones (which should be the regular case, because a slower client will end up losing data).
  • The client requests a block which is longer than the total number of frames on the port (\(N > nFOP\)).
  • At first call, if the client sets \(nfr\) to a greater value than lastFrameIndex, then getAudioData will not return any available frame (\(n = 0\)). But the client can keep calling the function in a loop until it gets the requested frames.

Finally, in case of data loss, getAudioData also returns the number of frames that were lost. The retrieved block then starts at the first frame that is still available on the port (the oldest one).

Below is the getAudioData function written with formal syntax:

function: getAudioData
|
| inputs:  integer     N     (number of frames the client wants to get)
|          portStruct  Audio (data from the output port of bass)
|
| outputs: integer     n     (number of frames the function was able to get)
|          integer     loss  (number of lost frames, 0 if no loss)
|          array(int)  l[n]  (the retrieved data block from left channel)
|          array(int)  r[n]  (the retrieved data block from right channel)
|
| in&out:  integer     nfr   (index of the Next Frame to Read)
|
| local:   integer     nFOP  (total number of Frames On the Port)
|          integer     lfi   (Index of the Last Frame on the port)
|          integer     ofi   (Index of the Oldest Frame on the port)
|          integer     pos   (current position in the left and right arrays)
|
| algorithm
|  |
|  | nFOP <- Audio.nFramesPerChunk * Audio.nChunksOnPort
|  | lfi <- Audio.lastFrameIndex
|  | ofi <- max(0, lfi - nFOP + 1) //if the acquisition just started and the
|  |                               //port is not full yet, ofi equals 0
|  | /* Detect a data loss */
|  | loss <- 0
|  | if (nfr < ofi)
|  |  | loss <- ofi - nfr
|  |  | nfr <- ofi
|  | end if
|  |
|  | /* Compute the starting position in the left and right input arrays */
|  | pos <- nFOP - (lfi - nfr + 1)
|  |
|  | /* Fill the output arrays l and r */
|  | n <- 0
|  | while (n < N AND pos < nFOP)
|  |  | l[n] <- Audio.left[pos]
|  |  | r[n] <- Audio.right[pos]
|  |  | n <- n + 1
|  |  | pos <- pos + 1
|  |  | nfr <- nfr + 1
|  | end while
|  |
|  | return (l[], r[], n, nfr, loss)
|  |
| end algorithm
|
end function
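For readers who prefer executable code, the formal algorithm above can be transcribed almost line by line. The Python sketch below is an illustration only and is not the C code of BASC: the dataclass PortStruct simply mirrors the fields of Table 3, the name get_audio_data is a hypothetical Python spelling of getAudioData, and the copy loop is replaced by equivalent list slices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PortStruct:
    """Snapshot of the Audio port, with the fields listed in Table 3."""
    sampleRate: int
    nChunksOnPort: int
    nFramesPerChunk: int
    lastFrameIndex: int
    left: List[int]
    right: List[int]


def get_audio_data(N: int, audio: PortStruct, nfr: int) -> Tuple[List[int], List[int], int, int, int]:
    """Return (l, r, n, nfr, loss), following the formal algorithm step by step."""
    nFOP = audio.nFramesPerChunk * audio.nChunksOnPort
    lfi = audio.lastFrameIndex
    ofi = max(0, lfi - nFOP + 1)  # 0 while the port is not full yet

    # Detect a data loss: the requested frame has already left the port.
    loss = 0
    if nfr < ofi:
        loss = ofi - nfr
        nfr = ofi

    # Starting position in the left and right arrays.
    pos = nFOP - (lfi - nfr + 1)

    # Copy the available frames, at most N of them (slices replace the loop).
    n = max(0, min(N, nFOP - pos))
    l = audio.left[pos:pos + n]
    r = audio.right[pos:pos + n]
    return l, r, n, nfr + n, loss


# Toy port: 4 chunks of 2 frames; each left sample equals its own frame index.
audio = PortStruct(44100, 4, 2, 10, list(range(3, 11)), [-v for v in range(3, 11)])

l, r, n, nfr, loss = get_audio_data(4, audio, 5)         # frames 5 to 8 are on the port
l2, r2, n2, nfr2, loss2 = get_audio_data(4, audio, nfr)  # only frames 9 and 10 remain
l3, r3, n3, nfr3, loss3 = get_audio_data(3, audio, 1)    # frames 1 and 2 are gone
```

In this toy run, the first call returns a full block, the second call returns only the two frames that are already published (the client would call again later for the remaining two), and the third call reports a loss of two frames and starts at the oldest frame still on the port.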

Sample implementation in a GenoM3 component

BASC is a GenoM3 component that acts as a client of BASS. It can be connected to the output port of BASS, and implements the above generic algorithm to retrieve blocks of audio signals. BASC does not perform any processing on the data. It only shows what a GenoM3 client of BASS could look like.

The folder containing source files of the component is named basc-genom3 and is located in the RoboticPlatform folder of the software repository. All files that are referred to in this section are in the basc-genom3 folder.

Services

BASC runs the above algorithm in a service called GetBlocks, defined in description file basc.gen. It expects the following input parameters:

Name             Data type      Default value  Documentation
---------------  -------------  -------------  --------------------------------------
nBlocks          unsigned long  1              Amount of blocks, 0 for unlimited
nFramesPerBlock  unsigned long  12000          Block size in frames
startOffs        long           -12000         Starting offset (past < 0, future > 0)

The first parameter nBlocks sets the number of blocks that the service must get, or can be set to 0 to run the service indefinitely. The second parameter nFramesPerBlock sets the block size (\(N\) in the previous section). Last, the startOffs parameter sets the index of the first frame to read, relative to the current value of lastFrameIndex on the port. For instance, if startOffs is -1000 and lastFrameIndex is 43000 when the service is called, then the first frame to read (\(nfr\) in the previous section) will have index 42000. With the given default values, GetBlocks will get 1 block of 12000 frames, taking the last 12000 frames on the port.
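The way startOffs selects the first frame can be written as a one-line computation. The Python sketch below is illustrative; the function name first_frame_to_read is hypothetical, and clamping the result at 0 is an assumption made here so that the index stays non-negative (the source does not state how BASC handles an offset that reaches before the start of the acquisition).

```python
def first_frame_to_read(last_frame_index, start_offs):
    """Initial value of nfr: lastFrameIndex shifted by the signed offset.

    Clamping at 0 is an assumption of this sketch, not documented behaviour.
    """
    return max(0, last_frame_index + start_offs)


print(first_frame_to_read(43000, -1000))   # offset in the past
print(first_frame_to_read(43000, 2000))    # offset in the future
```

A negative offset points to frames already on the port, while a positive offset points to frames that have not been published yet, in which case the first calls to getAudioData return no frame until the server catches up.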

At any moment, the GetBlocks service can be interrupted by calling a service named Stop.

Execution example

Assume that the GetBlocks service runs at a period of, say, 250 ms (defined in the dotgen file). So every 250 ms, it calls the getAudioData function, either to request a new block or to complete the current block if the previous call could not return a full one. Depending on the sample rate, if the requested block size (parameter nFramesPerBlock) is so small that one block lasts less than 250 ms, then the client will eventually lose some frames. On the other hand, if a block lasts more than 250 ms, then the client will request data at a rate higher than their update frequency, which should be fine.

This can be tested: assume that the sampling rate is 44100 Hz, so that 250 ms corresponds to 11025 frames. Calling GetBlocks with nFramesPerBlock < 11025 then leads to data loss. The following is an example with the matlab-genomix client.

% The middleware, genomix, bass and basc should be running

% Connect to genomix
>> client = genomix.client

% Load the server BASS and the client BASC
>> bass = client.load('bass')
>> basc = client.load('basc')

% Connect the input port of BASC to the output port of BASS
>> basc.connect_port('Audio', 'bass/Audio')

% Start the acquisition (using default values)
>> rAcquire = bass.Acquire('-a', 'hw:1,0', 44100, 2205, 20)

%%% EXAMPLE 1: get one block of 2 seconds (88200 frames at 44100Hz)
>> basc.GetBlocks(1, 88200, 0)

% In the terminal where it runs, BASC prints:
%
%    Requested  88200 frames, got  11025.
%    Requested  77175 frames, got  11025.
%    Requested  66150 frames, got  11025.
%    Requested  55125 frames, got  11025.
%    Requested  44100 frames, got  11025.
%    Requested  33075 frames, got  11025.
%    Requested  22050 frames, got  11025.
%    Requested  11025 frames, got  11025.
%    A new block is ready to be processed.
%
% Each line 'Requested N frames, got n.' indicates the number N of frames
% requested by BASC, and the number n it got in return. The component keeps
% requesting frames until it has formed a block of 88200 frames.

%%% EXAMPLE 2: get unlimited number of blocks, with nFramesPerBlock < 11025
>> rGetBlocks = basc.GetBlocks('-a', 0, 10000, 0)

% After a few seconds, BASC prints:
%
%    Requested  10000 frames, got  10000.
%    !!Lost 1025 frames!!
%    A new block is ready to be processed.
%    Requested  10000 frames, got  10000.
%    !!Lost 1025 frames!!
%    A new block is ready to be processed.
%
% At each attempt to get the following block, some frames are lost because
% BASC does not read the port of BASS often enough.

% Stop the running GetBlocks service
>> basc.Stop()

%%% EXAMPLE 3: get unlimited number of blocks, with nFramesPerBlock > 11025
>> rGetBlocks = basc.GetBlocks('-a', 0, 12000, 0)

% Here, as BASC reads the port slightly faster than its update rate, the
% retrieved block is sometimes incomplete, for instance:
%
%    Requested  12000 frames, got  12000.
%    A new block is ready to be processed.
%    Requested  12000 frames, got  12000.
%    A new block is ready to be processed.
%    Requested  12000 frames, got  11550.
%    Requested    450 frames, got    450.
%    A new block is ready to be processed.
%    Requested  12000 frames, got  12000.
%    A new block is ready to be processed.
%
% The two consecutive lines 'Requested...' show that a first call only gets
% 11550 frames, so the component makes a second request to get the remaining
% part of 12000 - 11550 = 450 frames.

% Stop the running GetBlocks service
>> basc.Stop()

% Kill the components
>> bass.kill()
>> basc.kill()

% Remove the used objects in Matlab
>> delete(bass)
>> delete(basc)
>> delete(client)

% The remaining processes (the middleware and genomix) can be killed

The getAudioData function in charge of getting the requested block is written in C in file codels/basc_read_codels.c. The codels of the GetBlocks service are also written in this file, with comments to explain the overall process followed by BASC.
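The frame counts reported in Examples 2 and 3 above follow from simple arithmetic on the service period. The Python sketch below is a simplified model, not BASC code: the function names are hypothetical, and it assumes one call to getAudioData per period with at most one block consumed per call, once the client has fallen behind by more than the port length.

```python
def frames_per_period(sample_rate, period_s):
    """Number of frames published by the server during one client period."""
    return round(sample_rate * period_s)


def frames_lost_per_period(sample_rate, period_s, n_frames_per_block):
    """Steady-state loss per period under the simplified model described above."""
    produced = frames_per_period(sample_rate, period_s)
    return max(0, produced - n_frames_per_block)


produced = frames_per_period(44100, 0.25)
lost_small_block = frames_lost_per_period(44100, 0.25, 10000)   # as in Example 2
lost_large_block = frames_lost_per_period(44100, 0.25, 12000)   # as in Example 3
print(produced, lost_small_block, lost_large_block)
```

With blocks of 10000 frames, the client consumes 1025 fewer frames per period than the server produces, which matches the loss printed in Example 2. With blocks of 12000 frames, the client asks for more than is produced per period, so no frame is lost and some requests are completed in two calls, as in Example 3.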