Natural Speech Coding Tools
Overview of the MPEG-4 Natural Speech Coding Tools
The MPEG-4 Natural Speech Coding Tool Set [10] provides a generic coding framework for a wide range of speech applications. Its bitrate coverage spans from as low as 2 kbit/s to 23.4 kbit/s. Two input-signal bandwidths are covered, namely 4 kHz and 7 kHz. The tool set contains two algorithms: HVXC (Harmonic Vector eXcitation Coding) and CELP (Code Excited Linear Predictive coding). HVXC is used at the low bitrates of 2 and 4 kbit/s. CELP covers 3.85 kbit/s and the bitrates above 4 kbit/s. The algorithmic delay of either algorithm is comparable to that of other standards for two-way communications, so the tool set is also applicable to such applications. Storage of speech data and broadcasting are further promising applications. The specifications of the tool set are summarized in Tab. I.
Table I: Specifications of MPEG-4 Natural Speech Coding Tools

| Item | HVXC | CELP (narrowband) | CELP (wideband) |
|---|---|---|---|
| Sampling Frequency | 8 kHz | 8 kHz | 16 kHz |
| Bandwidth | 300 - 3400 Hz | 300 - 3400 Hz | 50 - 7000 Hz |
| Bitrate [bit/s] | 2000 and 4000 | 3850 - 12200 (28 bitrates) | 10900 - 23800 (30 bitrates) |
| Frame Size | 20 ms | 10 - 40 ms | 10 - 20 ms |
| Delay | 33.5 - 56 ms | 15 - 45 ms | 15 - 26.75 ms |
| Features | Multibitrate Coding, Bitrate Scalability | Multibitrate Coding, Bitrate Scalability, Bandwidth Scalability | Multibitrate Coding, Bitrate Scalability, Bandwidth Scalability |

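The coverage described above can be captured in a small lookup. The following Python sketch is illustrative only: the function name `candidate_coders` and the data layout are assumptions, not part of the standard. It checks only the range limits from Table I, not the exact set of 28 or 30 permitted CELP bitrates, and it follows the text in assigning 4 kbit/s to HVXC rather than to CELP.

```python
# Limits transcribed from Table I (bit/s). All names here are hypothetical.
HVXC_BITRATES = {2000, 4000}
CELP_WIDEBAND_RANGE = (10900, 23800)   # 16 kHz sampling
CELP_NARROWBAND_MIN = 3850             # 8 kHz sampling
CELP_NARROWBAND_MAX = 12200


def candidate_coders(sampling_hz, bitrate):
    """Return the coders whose stated coverage includes the request."""
    coders = []
    if sampling_hz == 8000:
        if bitrate in HVXC_BITRATES:
            coders.append("HVXC")
        # Per the text: CELP covers 3.85 kbit/s and bitrates above 4 kbit/s.
        if bitrate == CELP_NARROWBAND_MIN or 4000 < bitrate <= CELP_NARROWBAND_MAX:
            coders.append("CELP")
    elif sampling_hz == 16000:
        low, high = CELP_WIDEBAND_RANGE
        if low <= bitrate <= high:
            coders.append("CELP")
    return coders
```

An empty list means that no coder in the tool set covers the requested combination, for example a 2 kbit/s request at 16 kHz sampling.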
MPEG-4 is based on tools that can be combined according to the user's needs. HVXC consists of the LSP (line spectral pair) VQ (vector quantization) tool and the harmonic VQ tool. CELP is formed by the RPE (regular pulse excitation) tool, the MPE (multipulse excitation) tool, and the LSP VQ tool. The RPE tool is allowed only in the wideband mode because it trades quality for simplicity. The LSP VQ tool is common to HVXC and CELP. The MPEG-4 Natural Speech Coding Tools are illustrated in Fig. 6.
Figure 6: MPEG-4 Natural Speech Coding Tool Set
Functionalities of MPEG-4 Natural Speech Coding Tools
MPEG-4 Natural Speech Coding Tools differ from existing speech coding standards such as ITU-T G.723.1 and G.729 in three new functionalities: multibitrate coding (an arbitrary bitrate may be selected in 200 bit/s steps simply by changing parameter values), bitrate scalable coding, and bandwidth scalable coding. These functionalities characterize the MPEG-4 Natural Speech Coding Tools. Note that bandwidth scalability is available only for CELP.
Multibitrate Coding
Multibitrate coding provides flexible bitrate selection within a single coding algorithm. Previously this was not available, and different codecs were needed for different bitrates. In multibitrate coding, a bitrate is selected from the available bitrates when a connection is established between the communicating parties. The bitrate for CELP may be selected in steps as small as 0.2 kbit/s. The frame length, the number of subframes per frame, and the choice of excitation codebook are changed to obtain different bitrates [11]. For HVXC, either 2 or 4 kbit/s can be selected.
In addition to multibitrate coding, CELP offers bitrate control with a finer step through fine-rate control (FRC), which provides bitrates not available by multibitrate coding alone. The bitrate may deviate frame by frame from the specified bitrate according to the input-signal characteristics. When the spectral envelope, approximated by the LP synthesis filter, varies little over time, transmission of the linear-prediction coefficients may be skipped once every two frames for a reduced average bitrate [12].
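The effect of skipping LP coefficients on the average bitrate is simple arithmetic. The Python sketch below is only an illustration: the function `average_bitrate`, its parameters, and the bit counts in the usage lines are hypothetical and are not taken from the standard. It reads "once every two frames" as an upper limit of one half on the fraction of frames that omit the coefficients, which is an assumption.

```python
def average_bitrate(frame_bits, lpc_bits, frame_ms, skip_fraction):
    """Average bit/s when LP coefficients are omitted in a fraction of frames.

    frame_bits    -- bits per frame when the LP coefficients are sent
    lpc_bits      -- bits per frame spent on the LP coefficients
    frame_ms      -- frame length in milliseconds
    skip_fraction -- fraction of frames that omit the LP coefficients
    """
    if not 0.0 <= skip_fraction <= 0.5:
        # Assumption: at most every second frame may skip its coefficients.
        raise ValueError("skip_fraction must lie between 0 and 0.5")
    mean_bits_per_frame = frame_bits - skip_fraction * lpc_bits
    return mean_bits_per_frame * 1000.0 / frame_ms


# Hypothetical numbers: 120-bit frames every 20 ms, 22 bits of LP data.
print(average_bitrate(120, 22, 20, 0.0))  # coefficients sent in every frame
print(average_bitrate(120, 22, 20, 0.5))  # coefficients skipped in every second frame
```

With these made-up figures the rate drops from 6000 bit/s to 5450 bit/s. The actual saving depends on the bit allocation of the chosen mode.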
Scalable Coding
Bitrate and bandwidth scalabilities are useful for multicast transmission. The bitrate and the bandwidth can be selected independently for each receiver simply by stripping off a part of the bitstream. With scalability, a single encoder suffices to transmit the same data to multiple points connected at different rates. Such a case arises in connections between a cellular network with mobile terminals and a digital network with fixed multimedia terminals, as well as in multipoint teleconferencing. Through scalable coding, the encoder generates a single common bitstream for all recipients instead of independent bitstreams at different bitrates.
The scalable bitstream has a layered structure consisting of a core bitstream and enhancement bitstreams. Bitrate control is performed by adjusting the combination of enhancement bitstreams according to the specified bitrate. The core bitstream guarantees at least a reconstruction of the original speech signal with a minimum speech quality. Additional enhancement bitstreams, which may be available depending on the network condition, increase the quality of the decoded signal. HVXC and CELP may be used to generate the core bitstream when the enhancement bitstreams are generated by TwinVQ or AAC. They can also generate both the core and the enhancement bitstreams. Scalabilities in MPEG-4/CELP are depicted in Fig. 7.
Figure 7: Scalabilities in MPEG-4/CELP
Scalabilities include bitrate scalability and bandwidth scalability. Adding enhancement bitstreams to the core bitstream either reduces signal distortion or improves speech quality by adding high-frequency components. The enhancement bitstreams carry finer detail of the input signal or components in higher frequency bands. For example, the output of Decoder A in Fig. 7 is the minimum-quality signal decoded from the 6 kbit/s core bitstream. The Decoder B output is a high-quality signal decoded from an 8 kbit/s bitstream. Decoder C provides a higher-quality signal decoded from a 12 kbit/s bitstream. The Decoder D output, on the other hand, has a wider bandwidth: this wideband signal is decoded from a 22 kbit/s bitstream, whose additional 10 kbit/s of high-frequency components make it sound more natural than the Decoder C output. Bandwidth scalability is provided only by the MPE tool. The unit bitrate of the enhancement bitstreams in bitrate scalability is 2 kbit/s for the narrowband mode and 4 kbit/s for the wideband mode. In bandwidth scalable coding, the unit bitrate of the enhancement bitstreams depends on the total bitrate and is summarized in Tab. II.
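The decoder example can be sketched as layer stripping. The Python code below is a minimal illustration, not the bitstream syntax of the standard: the `Layer` class, the layer names, and the rule that layers are kept as a contiguous prefix in the order listed are all assumptions made for the sketch. The layer sizes follow the Fig. 7 example (a 6 kbit/s core, three 2 kbit/s bitrate-enhancement layers, and a 10 kbit/s bandwidth-enhancement layer).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str      # hypothetical label
    bitrate: int   # bit/s
    kind: str      # "core", "bitrate" or "bandwidth"


# Layer sizes taken from the Fig. 7 example in the text.
STREAM = [
    Layer("core", 6000, "core"),
    Layer("enh1", 2000, "bitrate"),
    Layer("enh2", 2000, "bitrate"),
    Layer("enh3", 2000, "bitrate"),
    Layer("wideband", 10000, "bandwidth"),
]


def strip_to_budget(layers, budget):
    """Keep the longest prefix of layers whose total fits the budget (bit/s)."""
    kept, total = [], 0
    for layer in layers:
        if total + layer.bitrate > budget:
            break  # this layer and all later ones are stripped off
        kept.append(layer)
        total += layer.bitrate
    if not kept:
        raise ValueError("budget is below the core bitrate")
    return kept, total


# Budgets corresponding to Decoders A, B, C and D in the example.
for budget in (6000, 8000, 12000, 22000):
    kept, total = strip_to_budget(STREAM, budget)
    print(budget, [layer.name for layer in kept], total)
```

A receiver whose budget falls between two layer boundaries, such as 9 kbit/s, simply receives the largest prefix that fits, here the 8 kbit/s combination.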
Table II: Bandwidth Scalable Bitstreams

| Core Bitstream (bit/s) | Enhancement Bitstream (bit/s) |
|---|---|
| 3850 - 4650 | 9200, 10400, 11600, 12400 |
| 4900 - 5500 | 9467, 10667, 11867, 12667 |
| 5700 - 10700 | 10000, 11200, 12400, 13200 |
| 11000 - 12200 | 11600, 12800, 14000, 14800 |

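Table II amounts to a range lookup keyed by the core bitrate. The Python sketch below transcribes the table. The names `TABLE_II` and `enhancement_options` are hypothetical, and the function checks only the range limits, not whether the core bitrate is one of the permitted CELP bitrates.

```python
# (core_min, core_max, enhancement bitrates), all in bit/s, from Table II.
TABLE_II = [
    (3850, 4650, (9200, 10400, 11600, 12400)),
    (4900, 5500, (9467, 10667, 11867, 12667)),
    (5700, 10700, (10000, 11200, 12400, 13200)),
    (11000, 12200, (11600, 12800, 14000, 14800)),
]


def enhancement_options(core_bitrate):
    """Return the enhancement-bitstream bitrates listed for a core bitrate."""
    for core_min, core_max, options in TABLE_II:
        if core_min <= core_bitrate <= core_max:
            return options
    raise ValueError(f"no Table II entry covers {core_bitrate} bit/s")


print(enhancement_options(6000))
```

For a 6000 bit/s core, the lookup returns the third row of the table. Values that fall between rows, such as 4700 bit/s, raise an error because the table lists no entry for them.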