INTERNATIONAL
ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE
DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC 1/SC 29/WG 11 N7464
Poznań, PL – July 2005
Source: Audio
Title: Description of Spectral Band Replication
Status: Approved
Introduction
MPEG-4 SBR (Spectral Band Replication) is a bandwidth extension tool used in combination with, for example, the AAC general audio codec. When integrated into the MPEG AAC codec, it provides a significant performance improvement, which can be used either to lower the bitrate or to improve the audio quality. This is achieved by replicating the highband, i.e. the high frequency part of the spectrum. A small amount of data representing a parametric description of the highband is encoded and used in the decoding process. This data rate is far below the data rate required when using conventional AAC coding of the highband.
The SBR tool in combination with AAC-LC forms the High Efficiency AAC profile, and the SBR tool in combination with AAC-LC and the MPEG-4 Parametric Stereo tool forms the High Efficiency AAC v2 profile.
The SBR tool comes in two versions, SBR-HQ (High Quality) and SBR-LP (Low Power).
Motivation
The tool enables full bandwidth audio coding at arbitrary bitrates.
Overview of technology
In traditional perceptual audio coding, quantisation noise is added to the audio signal. Assuming a sufficiently high bitrate, the inserted quantisation noise will be kept under the masking threshold and therefore be inaudible (see Figure 1 a). At reduced bitrates, this masking threshold will be violated and coding artifacts become audible (see Figure 1 b). Thus, if the bitrate is restricted, the audio bandwidth is usually limited (see Figure 1 c). The result will sound duller, but cleaner.
Figure 1, The SBR principle
With the SBR tool, the following is carried out:
Thus, a significant bitrate reduction is achieved while maintaining good audio quality, or alternatively an improved audio quality is achieved while maintaining the bitrate.
Thus, the SBR principle stipulates that the missing high frequency region of a low pass filtered signal can be recovered based on the existing low pass signal and a small amount of control data. The required control data is estimated in the encoder given the original wide-band signal. The combination of SBR with a core coder (in this example AAC-LC as defined by the High Efficiency AAC profile) is a dual rate system, where the underlying AAC encoder/decoder is operated at half the sampling rate of the SBR encoder/decoder. The basic principle of this encoder is depicted in Figure 2.

Figure 2, The SBR encoder
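The dual rate relationship can be illustrated with a few lines of arithmetic. This is a minimal sketch, not part of the specification: the function name `dual_rate_config` and the 48 kHz example rate are assumptions chosen for illustration. It shows only that a core coder running at half the output sampling rate cannot represent frequencies above a quarter of the output sampling rate, so everything above that must come from SBR. The actual crossover frequency is an encoder choice and is not modelled here.

```python
def dual_rate_config(output_sample_rate_hz):
    """Return (core coder sampling rate, highest frequency the core can carry).

    Hypothetical helper: the core coder runs at half the SBR (output) rate,
    and its Nyquist frequency is an upper bound for the coded lowband.
    """
    core_rate_hz = output_sample_rate_hz // 2
    core_nyquist_hz = core_rate_hz / 2
    return core_rate_hz, core_nyquist_hz


# Example rate (an assumption, not taken from the text): 48 kHz output.
core_rate_hz, core_nyquist_hz = dual_rate_config(48000)
print(core_rate_hz, core_nyquist_hz)
```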
In the SBR encoder, where the wide band signal is available, control parameters are estimated in order to ensure that the high frequency reconstruction results in a reconstructed highband that is perceptually as similar as possible to the original highband. The majority of the control data is used for a spectral envelope representation. The spectral envelope information has varying time and frequency resolution so that the SBR process can be controlled as well as possible with as little bitrate overhead as possible. The other control data mainly strives to control the tonal-to-noise ratio of the highband.
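The idea of a spectral envelope with variable time and frequency resolution can be sketched as an average energy per time/frequency tile. The following is a simplified illustration under stated assumptions, not the encoder defined by the standard: the function `estimate_envelope`, the band edges, the time borders and the toy input are all hypothetical, and the quantisation and coding of the envelope values are left out.

```python
def estimate_envelope(subbands, band_edges, time_borders):
    """Mean energy per (time segment, frequency band) tile.

    subbands[k][t] is the sample of subband k at time slot t.
    band_edges and time_borders are hypothetical grids; finer grids give
    higher resolution at the cost of more envelope values to transmit.
    """
    envelope = []
    for t0, t1 in zip(time_borders[:-1], time_borders[1:]):
        row = []
        for k0, k1 in zip(band_edges[:-1], band_edges[1:]):
            total = sum(abs(subbands[k][t]) ** 2
                        for k in range(k0, k1)
                        for t in range(t0, t1))
            row.append(total / ((k1 - k0) * (t1 - t0)))
        envelope.append(row)
    return envelope


# Toy highband: 4 subbands x 4 time slots (values are made up).
toy = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 0, 0],
       [3, 3, 0, 0]]

fine = estimate_envelope(toy, band_edges=[0, 2, 4], time_borders=[0, 2, 4])
coarse = estimate_envelope(toy, band_edges=[0, 4], time_borders=[0, 4])
print(fine)    # 2 time segments x 2 bands
print(coarse)  # a single value for the whole tile
```

The fine grid tracks the change between the two halves of the frame, while the coarse grid needs only one value; trading these off is what the varying resolution is for.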
The SBR enhanced decoder can be roughly divided into the modules depicted in Figure 3.
Figure 3, The SBR decoder
All SBR processing is done in the QMF domain. Hence, the output from the underlying AAC decoder is first analyzed with a 32-channel QMF filterbank. Secondly, the HF generator module recreates the highband by patching QMF subbands from the existing lowband to the highband. Furthermore, inverse filtering is done on a per QMF subband basis, based on the control data obtained from the bitstream. The envelope adjuster modifies the spectral envelope of the regenerated highband and adds additional components such as noise and sinusoids, all according to the control data in the bitstream. Since all operations are done in the QMF domain, the final step of the decoder is a QMF synthesis to obtain a time-domain signal. Given that the QMF analysis is done on 32 QMF subbands for 1024 time-domain samples, and the high frequency reconstruction results in 64 QMF subbands upon which the synthesis is done, producing 2048 time-domain samples, an up-sampling by a factor of two is obtained.
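The decoder steps described above can be sketched on a toy subband matrix. This is a heavily simplified illustration, not the normative algorithm: the function names, the `patch_start` parameter, the cyclic copy rule and the toy values are assumptions, and inverse filtering, noise and sinusoid addition, and the QMF filterbanks themselves are omitted. The last helper only reproduces the sample-count arithmetic from the text, assuming each subband carries one sample per 32 input samples.

```python
import math


def patch_highband(low, crossover, num_out_bands, patch_start=1):
    """Fill subbands crossover..num_out_bands-1 with copies of lowband subbands.

    Hypothetical patching rule: source subbands patch_start..crossover-1
    are copied upward cyclically until all output subbands are filled.
    """
    source_width = crossover - patch_start
    out = [list(low[k]) for k in range(crossover)]
    for k in range(crossover, num_out_bands):
        src = patch_start + (k - crossover) % source_width
        out.append(list(low[src]))
    return out


def adjust_envelope(qmf, crossover, target_energy):
    """Scale each highband subband to the transmitted mean energy."""
    adjusted = [list(band) for band in qmf]
    for k in range(crossover, len(qmf)):
        current = sum(abs(x) ** 2 for x in qmf[k]) / len(qmf[k])
        target = target_energy[k - crossover]
        gain = math.sqrt(target / current) if current > 0 else 0.0
        adjusted[k] = [gain * x for x in qmf[k]]
    return adjusted


def frame_sizes(core_samples=1024, analysis_bands=32, synthesis_bands=64):
    """Time slots per frame and output samples after 64-band synthesis."""
    time_slots = core_samples // analysis_bands
    return time_slots, time_slots * synthesis_bands


# Toy lowband: 4 subbands x 2 time slots instead of 32 subbands (made-up values).
low = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
patched = patch_highband(low, crossover=4, num_out_bands=8)
full = adjust_envelope(patched, crossover=4,
                       target_energy=[1.0, 1.0, 0.25, 0.25])

time_slots, output_samples = frame_sizes()
print(len(full), time_slots, output_samples)
```

In this sketch the lowband passes through unchanged, and only the copied subbands are rescaled, which mirrors the division of labour between the core decoder and the SBR control data.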
Target applications
The technology is suited for any application where the full audio bandwidth cannot be sufficiently well coded by a waveform coder. This makes it an excellent tool for applications such as digital radio transmission (for example Digital Radio Mondiale), digital TV transmission (for example DVB), mobile music services such as streaming and music download, and internet streaming.
The following figure illustrates the performance of the SBR tool when combined with AAC-LC, i.e. the High Efficiency AAC profile. The figure is taken from the formal verification test report (WG11/N6009).

Figure 4, Test results
The test shows the performance of SBR in combination with AAC at 32 kbps and 48 kbps compared to MPEG-4 AAC at 48 kbps and 60 kbps. It is clear that SBR at 48 kbps is equal in quality to AAC at 60 kbps, and that SBR at 32 kbps is significantly better than AAC at 48 kbps.