
MPEG-4/CELP with MPE

MPEG-4/CELP with MPE is the most complete combination of the tools in the MPEG-4 Natural Speech Coding Tools. It provides all three new functionalities. It is therefore useful to explain MPEG-4/CELP with MPE in more detail to show how these functionalities are realized in the algorithm.
 

Figure 10: MPEG-4/CELP with MPE

 

A block diagram of the MPEG-4/CELP with MPE encoder is depicted in Fig. 10. It consists of three modules: a CELP core encoder, a bitrate scalable (BRS) tool, and a bandwidth extension (BWE) tool. The CELP core encoder provides the basic coding functions, which have been explained with Fig. 9. The BRS tool provides bitrate scalability.
 
The residual of the narrowband signal, mode information, LP coefficients, quantized LSP coefficients, and the multipulse excitation signal are transferred from the core encoder to the BRS tool as its input signals. The BWE tool provides bandwidth scalability. Quantized LSP coefficients and the pitch delay indexes, as well as the wideband speech to be encoded, are supplied from the core encoder to the BWE tool. In addition to these input signals, the BWE tool needs the narrowband multipulse excitation. This excitation is supplied either from the BRS tool, when bitrate scalability is implemented, or from the core encoder. When bandwidth scalability is provided, a downsampled narrowband signal is supplied to the core encoder. Because of this downsampling operation, an additional 5-ms look-ahead of the input signal is necessary for wideband signals.
 

CELP Core Encoder

Fig. 11 depicts a block diagram of the CELP core encoder. It performs LP analysis and pitch analysis on the input speech signal. The resulting LP coefficients, the pitch lag (phase or delay), the pitch and MPE gains, and the excitation signal are encoded together with mode information. The LP coefficients are encoded in the LSP domain, frame by frame, by predictive VQ. The pitch lag is encoded subframe by subframe by adaptive codebooks. The MPE is modeled by multiple pulses whose positions and polarities (+/-1) are encoded. The pitch and MPE gains are normalized by an average subframe power and then encoded by multimode encoding [13]. The average subframe power is scalar-quantized in each frame.
 

Figure 11: CELP Core Encoder

 

LSP Quantization

A two-stage PPM-VQ (Partial Prediction and Multistage Vector Quantization) [16] is employed for LSP quantization. This quantizer, shown in Fig. 12, operates either in the standard VQ mode or in the PPM-VQ mode, which uses interframe prediction; the mode is chosen according to the quantization errors. The standard VQ mode operates as a common two-stage VQ, in which the second stage quantizes the error of the first stage. In the PPM-VQ mode, by contrast, the difference Tn between the input LSP Cn and its predicted output is quantized as in:
 

Equation (1)

The second term of (1) is the predicted output, which is obtained from the quantized output V1n of the first stage and the quantized LSP Qn-1 of the previous frame. βp stands for the prediction coefficient and is set to 0.5 in MPEG-4/CELP.
 
PPM-VQ provides good coding quality in both stationary and nonstationary speech sections by appropriately selecting the predictive VQ or the standard VQ. Transmission-error propagation dies out quickly because prediction is employed only in the second stage. The number of LSPs is 10 for the narrowband case and 20 for the wideband case. Because the wideband mode has twice as many parameters, two narrowband quantizers connected in parallel are used: one for the first 10 parameters and the other for the remaining 10. The number of bits used for LSP quantization is 22 for the narrowband and 46 for the wideband (25 for the first ten coefficients and 21 for the rest). The codebook has 1120 words for the narrowband and 2560 words for the wideband.
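The mode selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`nearest`, `sq_err`, `ppm_vq_encode`), the toy codebooks, and the use of a separate second-stage codebook per mode are hypothetical. In particular, the predictor used here (a βp-weighted mix of the previous quantized LSP and the first-stage output) is an assumed form chosen for illustration, not a transcription of equation (1).

```python
def sq_err(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, target):
    """Return (index, codevector) of the codebook entry closest to target."""
    index = min(range(len(codebook)), key=lambda k: sq_err(codebook[k], target))
    return index, codebook[index]


def ppm_vq_encode(c_n, q_prev, cb1, cb2_std, cb2_pred, beta_p=0.5):
    """Two-stage VQ that picks the standard or the predictive second stage.

    c_n: input LSP vector, q_prev: quantized LSP vector of the previous frame.
    """
    # Stage 1: plain VQ of the input LSP vector.
    i1, v1 = nearest(cb1, c_n)

    # Standard mode: the second stage quantizes the first-stage error.
    i2_std, e_std = nearest(cb2_std, [c - v for c, v in zip(c_n, v1)])
    q_std = [v + e for v, e in zip(v1, e_std)]

    # Predictive mode. ASSUMPTION: the predicted output mixes the first-stage
    # output and the previous quantized LSP with weight beta_p.
    pred = [(1.0 - beta_p) * v + beta_p * q for v, q in zip(v1, q_prev)]
    i2_pred, e_pred = nearest(cb2_pred, [c - p for c, p in zip(c_n, pred)])
    q_pred = [p + e for p, e in zip(pred, e_pred)]

    # Keep whichever mode gives the smaller quantization error.
    if sq_err(c_n, q_pred) < sq_err(c_n, q_std):
        return {"mode": "predictive", "indices": (i1, i2_pred), "q": q_pred}
    return {"mode": "standard", "indices": (i1, i2_std), "q": q_std}
```

Because prediction enters only through the second stage, a decoder that loses a frame still recovers the first-stage vector of the next frame exactly, which is consistent with the quick decay of error propagation noted above.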
 

Figure 12: PPM-VQ (Partial Prediction and Multistage Vector Quantization)

 

Multipulse Excitation

The multipulse excitation µn has L pulses as in:

Equation (2)

where N stands for the subframe size, and mi and smi are the position and the magnitude of the i-th pulse, respectively. The pulse position is selected from Mi candidates which are defined by the Algebraic code [17][18] for each pulse. The pulse magnitude is represented only by its polarity for bit reduction. Such a simplified excitation model contributes to reduced computations compared with conventional CELP codebooks at a low bitrate with a small number of pulses. On the other hand, reduction of computations is necessary for a high bitrate with more available pulses. For example, MPE encoding by tree search [19] provides easy bitrate control by adjusting the number of pulses. Efficient coding techniques by combination search of the pulse position and polarity and by VQ of the pulse polarity [13] may also be applied. These additional techniques help us avoid reduced quality and heuristic parameter setting caused by well-known preselection techniques and focused search [18] for the pulse position.
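As a minimal sketch of this excitation model, the following Python builds a subframe from pulse positions and polarities. The function names are hypothetical, and `track_candidates` assumes an interleaved track layout (pulse i may sit at positions i, i + L, i + 2L, ...), which is one common algebraic-code arrangement and not necessarily the one specified in MPEG-4/CELP.

```python
def track_candidates(track, num_tracks, subframe_size):
    """Candidate positions for one pulse.

    ASSUMPTION: interleaved tracks, so the pulse on a given track may only
    occupy positions track, track + num_tracks, track + 2 * num_tracks, ...
    """
    return list(range(track, subframe_size, num_tracks))


def multipulse_excitation(positions, signs, subframe_size):
    """Build a subframe of N samples holding signed unit pulses.

    positions[i] is the position of the i-th pulse and signs[i] its polarity
    (+1 or -1); every other sample is zero.
    """
    if len(positions) != len(signs):
        raise ValueError("one sign is required per pulse")
    excitation = [0] * subframe_size
    for m, s in zip(positions, signs):
        if s not in (1, -1):
            raise ValueError("pulse polarity must be +1 or -1")
        if not 0 <= m < subframe_size:
            raise ValueError("pulse position outside the subframe")
        excitation[m] += s
    return excitation
```

For example, `multipulse_excitation([0, 5, 12], [1, -1, 1], 16)` yields a 16-sample subframe with pulses at positions 0, 5, and 12. Since only a position index and one polarity bit are sent per pulse, adding or removing pulses changes the bitrate in small, predictable steps.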
 

Bitrate Scalable (BRS) Tool

A block diagram of the BRS tool [20] is shown in Fig. 13. The signal actually encoded in the BRS tool is the residual supplied from the core encoder, defined as the difference between the input signal and the output of the LP synthesis filter (the locally decoded signal). The combination of the core encoder and the BRS tool can be regarded as multistage encoding of the MPE. However, there is no feedback path from the residual in the BRS tool to the MPE in the core encoder, so the excitation signal in the BRS tool has no influence on the adaptive codebook in the core encoder. This guarantees that the adaptive codebook in the core decoder at any site is identical to that in the encoder (in terms of the codewords), which minimizes quality degradation when the bitrate changes frame by frame. The BRS tool adaptively controls the pulse positions so that none of them coincides with a position used in the core encoder. This adaptive pulse position control contributes to more efficient multistage encoding.
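The adaptive pulse position control can be illustrated with a small Python sketch. The function names are hypothetical, and the greedy selection by residual magnitude is a deliberate simplification: a real encoder would search the pulses by analysis-by-synthesis against a perceptually weighted error. Only the exclusion of positions already used by the core encoder reflects the text above.

```python
def free_positions(subframe_size, core_positions):
    """Positions in the subframe that the core encoder has not used."""
    used = set(core_positions)
    return [n for n in range(subframe_size) if n not in used]


def pick_enhancement_pulses(residual, core_positions, num_pulses):
    """Choose enhancement-layer pulses on positions the core left free.

    SIMPLIFICATION: pulses are placed greedily at the largest-magnitude
    residual samples; no weighted analysis-by-synthesis search is done.
    Returns (position, polarity) pairs sorted by position.
    """
    candidates = free_positions(len(residual), core_positions)
    chosen = sorted(candidates, key=lambda n: abs(residual[n]), reverse=True)
    chosen = chosen[:num_pulses]
    return [(n, 1 if residual[n] >= 0 else -1) for n in sorted(chosen)]
```

Because the enhancement pulses never overlap the core pulses and never feed back into the core encoder, a decoder can discard them on any frame and still track the encoder's adaptive codebook.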
 

Figure 13: Bitrate Scalable (BRS) Tool

 

Bandwidth Extension (BWE) Tool

Fig. 14 shows a block diagram of the bandwidth extension (BWE) tool [21]. The BWE tool is also a CELP-based encoder. It encodes the frequency components that are not processed by the narrowband core encoder, as well as the fraction of the narrowband components that has not been encoded. Quantized LSP coefficients and excitation signals of the narrowband components are supplied from the core encoder, in addition to the pitch delay index.
 

Figure 14: Bandwidth Extension (BWE) Tool

 

LSP Quantization

A block diagram of LSP quantization in the BWE tool is shown in Fig. 15. Predicted wideband LSPs are subtracted from the input LSPs and the residuals are vector-quantized. The indexes of the selected codevectors are incorporated in the output bitstream. The vector-quantized residual is added to the predicted wideband LSPs to reconstruct the quantized LSPs, which are supplied to the LP synthesis filter. The predicted wideband LSPs, fwb(i) for i = 1,...,Nwb, are constructed by adding the estimated wideband LSPs, fest(i) for i = 1,...,Nwb, to a moving-average interframe prediction of the quantized residuals, as in:

Equation (3)

where ap(i) is the interframe prediction coefficient and P is the prediction order. cp(i) is the quantized prediction residual in the p-th previous frame. Estimated wideband LSPs are obtained by scaling the quantized narrowband LSPs to the wideband as shown in the following equation with a scaling factor b(i).

Equation (4)

fnb(i) represents the i-th quantized narrowband LSP. This algorithm provides better quantization precision for low-order LSPs (fwb(i), i = 1,...,Nnb) as well as for high-order LSPs (fwb(i), i = Nnb + 1,...,Nwb). This is because the residual LSPs to be vector-quantized contain narrowband LSP residuals which have not been taken care of in the narrowband core encoder.
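The prediction and quantization steps described above can be sketched as follows. The function names and the toy values are hypothetical. Two points are assumptions rather than statements from the text: the estimated wideband LSPs are taken to be zero for orders above Nnb, where no narrowband LSP exists to scale, and the moving-average prediction is taken to be a plain sum of ap(i)·cp(i) over the P previous frames.

```python
def estimate_wideband_lsps(f_nb, b, n_wb):
    """Scale quantized narrowband LSPs to the wideband domain.

    ASSUMPTION: orders above len(f_nb) get an estimate of 0.0, so the
    high-order LSPs rely entirely on prediction and the quantized residual.
    """
    est = [b[i] * f_nb[i] for i in range(len(f_nb))]
    return est + [0.0] * (n_wb - len(f_nb))


def predict_wideband_lsps(f_est, a, c_hist):
    """Add a moving-average interframe prediction to the estimate.

    a[p][i] is the prediction coefficient and c_hist[p][i] the quantized
    residual of the (p + 1)-th previous frame, for p = 0, ..., P - 1.
    """
    return [
        f_est[i] + sum(a[p][i] * c_hist[p][i] for p in range(len(a)))
        for i in range(len(f_est))
    ]


def quantize_residual(f_in, f_pred, codebook):
    """Vector-quantize the prediction residual and rebuild the LSPs."""
    residual = [x - y for x, y in zip(f_in, f_pred)]
    index = min(
        range(len(codebook)),
        key=lambda k: sum((c - r) ** 2 for c, r in zip(codebook[k], residual)),
    )
    f_q = [p + c for p, c in zip(f_pred, codebook[index])]
    return index, f_q
```

Note that the residual vector spans all Nwb orders, so the low-order entries carry whatever error the narrowband quantizer left behind, which is why the low-order LSPs also gain precision.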
 

Figure 15: LSP Quantization in the BWE Tool

 

Multipulse Excitation

The excitation signal in the bandwidth extension tool is represented by an adaptive codebook, two MPE signals, and their gains, as shown in Fig. 14. The pitch delay of the adaptive codebook is searched in the vicinity of an estimate derived from the narrowband pitch delay. One of the two MPE signals (MP1) is an upsampled version of the narrowband MPE signal; the other (MP2) is an MPE signal exclusive to the bandwidth extension tool. The gains for the adaptive codebook and for MP2 are vector-quantized, and the gain for MP1 is scalar-quantized. These quantizations are performed to minimize the perceptually weighted error.
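The way the three excitation components combine can be sketched in Python. The function names are hypothetical. Upsampling by zero insertion and the factor of 2 (narrowband to wideband) are assumptions made for illustration; the text states only that MP1 is an upsampled version of the narrowband MPE signal.

```python
def upsample_zero_insert(pulses, factor=2):
    """Upsample a pulse sequence by inserting zeros between samples.

    ASSUMPTION: zero insertion is used here for simplicity; the actual
    upsampling method is not specified in the text.
    """
    out = [0.0] * (len(pulses) * factor)
    for n, value in enumerate(pulses):
        out[n * factor] = value
    return out


def bwe_excitation(adaptive, mp1_narrowband, mp2, g_ac, g_mp1, g_mp2, factor=2):
    """Sum the gain-scaled adaptive codebook, MP1, and MP2 contributions."""
    mp1 = upsample_zero_insert(mp1_narrowband, factor)
    if not (len(adaptive) == len(mp1) == len(mp2)):
        raise ValueError("all components must cover the same subframe")
    return [
        g_ac * a + g_mp1 * p1 + g_mp2 * p2
        for a, p1, p2 in zip(adaptive, mp1, mp2)
    ]
```

Reusing the narrowband pulses as MP1 costs only one gain in the wideband layer, so the bits of the bandwidth extension tool can be spent mainly on MP2.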
 
