INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 N1512
MPEG 97

February 1997 / Sevilla

Source: Convenor of MPEG (ISO/IEC JTC1/SC29/WG11)
Status: Approved at 38th WG11 meeting
Subject: MPEG Public Release
Date: 21 February 1997

Highlights

The Moving Picture Experts Group (MPEG) met for the 38th time in Sevilla, Spain, from 17 to 21 February 1997. The main topic of discussion was MPEG-4, the object-based audiovisual coding standard, which is scheduled to reach 'Committee Draft' status in November 1997.

Among the 300 participants were representatives of authors' organizations and copyright management organizations, who came to discuss Intellectual Property Rights (IPR) issues related to the use of MPEG-4 encoded content. MPEG has recognized that dealing with content-related IPR issues at an early stage is an important factor in the success of MPEG-4.

MPEG has started to build an MPEG-4 Player: a program that plays back audio and visual objects coded according to the latest state of the MPEG-4 standard. The Player, which is to be made publicly available, will include several decoders for natural and synthetic audiovisual material, as well as one of the major innovations that MPEG-4 brings: the 'compositor'. The compositor arranges the decoded aural and visual material according to 'composition information' carried in the bitstream. This composition applies both to space (sounds with a virtual location, the placement of visual objects on the screen) and to time (synchronization). The first working Player is expected to be ready in the second half of 1997. A complementary 'Recorder' will also be built and is to be completed half a year later; nevertheless, some encoded material for use with the Player will start to become available as soon as the Player is ready. The Player and Recorder will be built using software donated by participating companies.

The ongoing work on MPEG-4 requirements shows increasing interest from broadcasters, who believe that the object paradigm of MPEG-4 can give them more functionality. They are interested in features such as local advertisement insertion; these advertisements could even be objects inserted into an existing program. Their interest goes beyond traditional TV broadcasting, as MPEG-4 makes it possible to offer information broadcasts in which users decide for themselves which audio and visual elements are presented. For example, a user can choose between the Spanish and the English newsreader, and whether or not stock exchange information runs along the bottom of the screen. The broadcasters also believe that the object-based tools will let them achieve similar quality at lower bitrates.

MPEG was pleased to receive the details of a proposal from MIT's Media Lab, registered at the preceding meeting, for new tools addressing synthetic audio. The tools generate synthetic speech and music from symbolic representations. They will be evaluated by the audio group, along with the tools for natural audio.

MPEG was also pleased with Microsoft's donation of its C++ implementation of MPEG-4 encoder software, with copyright released to ISO. Decoder software had already been made available by both the European ACTS MoMuSys project (in C) and Microsoft (in C++).

A framework for parametric scene description has been designed, extending the VRML specification, to allow 2D and 3D composition of streamed audiovisual information.

The DSM-CC Multimedia Integration Framework (DMIF) has aligned its work with the rest of MPEG-4. The DMIF work will now be carried out in two phases, the first phase (signaling for MPEG-4) to be completed in 1998, the second phase (extended functionalities) in 1999.

In parallel with the currently very intensive work on MPEG-4, the experts group devoted some meeting time to further defining the goals of a new MPEG standard, called MPEG-7. MPEG-7 will specify descriptors for multimedia content, to allow fast and efficient searching for that content. The work item is attracting new people to MPEG, and more experts are welcome. During the next MPEG meeting, at Bristol University, a seminar on MPEG-7 will be held on 9 April with invited experts from the field of multimedia content retrieval. The seminar is open to all interested people. A description of MPEG-7 can be found at: http://drogo.cselt.it/mpeg/mpeg_7.htm

The second edition of the MPEG-2 Audio Standard, 13818-3, was approved. The second edition incorporates additional beneficial features identified during the early stages of usage of the Standard and clarifies details that might previously have caused some confusion to the users.

Details

The remainder of this document is organized by subgroup and expands on the highlights above.

Video

At the Sevilla meeting, the first results of coding interlaced video with MPEG-4 video compression were presented. The goal is to allow the coding of arbitrarily shaped objects at high resolution that have been recorded with line interlace, the most common video format today.

Further compression gains can be expected for the coding of arbitrarily shaped objects. The corresponding techniques have not yet been accepted for the final standard, because the complexity of the different proposals still has to be investigated: keeping the complexity of the specification reasonable is a major concern in MPEG-4.

Robustness to transmission errors has been enhanced by placing temporal references directly after resynchronization markers. The compression of the transparency information of video objects ('alpha channels') has also been improved, especially for objects that are not fully opaque.

Before the meeting in Spain, a number of so-called "core experiments" were carried out. Between this meeting and the next one in April 1997, new core experiments on promising algorithms will be carried out. If at least two participants independently find an important improvement, the new technique will be considered for inclusion in the standard. This procedure is intended to ensure the highest performance of the final standardized specification.

The meeting released the second version of the Working Draft of MPEG-4 "visual", a draft of the final standard. The name "visual" was chosen to cover the standardized tools for coding both natural video and synthetically generated input. The specification is scheduled to reach the official status of "Committee Draft" in November 1997. For its common experiments, the video group continues to develop a reference encoder called the "Verification Model" (VM).

Systems

In the area of composition, a framework for parametric scene description has been completed. Its starting point is the VRML specification, which is extended to allow 2D and 3D composition of streamed audiovisual information. The framework takes into account the new streamed content formats designed by MPEG-4 experts, such as arbitrarily shaped natural images, synthetic objects such as facial and body animation, and natural and synthetic audio. It also includes the specification of a bit-efficient binary format for scene transmission (BIFS, the Binary Format for Scenes), which will be the format finally standardized.

MPEG-4 is designing its multiplex to operate in a world in which the number of different networks is growing. This means that heterogeneous digital infrastructures for transporting encoded audiovisual information must be supported. The Systems group will develop an intermediate layer, the FlexMux, designed to adapt MPEG-4 streaming data flexibly to the underlying infrastructure. Wherever possible, interfaces to existing transport layers will be defined.
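The idea of an intermediate multiplexing layer can be illustrated with a toy sketch in Python. It interleaves packets from several elementary streams into one byte stream and recovers them at the receiving end. The packet layout (a 1-byte stream index, a 1-byte length, then the payload) and the names `mux` and `demux` are assumptions chosen for illustration only; they are not the FlexMux syntax, which is still to be developed.

```python
from typing import Dict, List, Tuple

# Hypothetical packet layout (NOT the FlexMux syntax):
#   1 byte stream index | 1 byte payload length | payload bytes
MAX_PAYLOAD = 255  # a 1-byte length field limits each payload to 255 bytes


def mux(packets: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (stream_index, payload) pairs into one byte stream, in order."""
    out = bytearray()
    for index, payload in packets:
        if not 0 <= index <= 255:
            raise ValueError("stream index must fit in one byte")
        if len(payload) > MAX_PAYLOAD:
            raise ValueError("payload too long for a 1-byte length field")
        out.append(index)
        out.append(len(payload))
        out += payload
    return bytes(out)


def demux(data: bytes) -> Dict[int, List[bytes]]:
    """Split a multiplexed byte stream back into per-stream payload lists."""
    streams: Dict[int, List[bytes]] = {}
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated packet header")
        index, length = data[pos], data[pos + 1]
        pos += 2
        if pos + length > len(data):
            raise ValueError("truncated packet payload")
        streams.setdefault(index, []).append(data[pos:pos + length])
        pos += length
    return streams
```

For example, `demux(mux([(0, b"video-1"), (1, b"audio-1"), (0, b"video-2")]))` returns `{0: [b'video-1', b'video-2'], 1: [b'audio-1']}`. Keeping the per-packet header this small is one way such a layer could stay cheap to map onto different transport layers.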

A challenging activity has been set up to demonstrate, in July, the added value that standardizing a set of APIs could bring. Major industry players (Sun, Intel, ...) are actively working on this subject. Positive results may lead to the standardization of these APIs in the November 1997 Committee Draft.

Audio

MPEG-2 Audio

The second edition of the MPEG-2 Audio Standard, 13818-3, was approved. The edition incorporates additional beneficial features identified during the early stages of usage of the Standard and clarifies details that might previously have caused some confusion to the users.

A comprehensive review of the MPEG-2 Advanced Audio Coding Draft International Standard resulted in revisions, and it is fully expected that the Standard will be confirmed during the April 1997 meeting. A set of guidance notes explaining the purpose of the second edition has been prepared. These notes will be available on the MPEG web site, http://drogo.cselt.it/mpeg/

The Advanced Audio Coding (AAC) DIS was further developed during the meeting in line with early National Body comments that were available at the start of the Sevilla meeting. This work will be carried through to the Bristol meeting in April 1997, where formal amalgamation and approval will take place. Test plans have been prepared to establish the mono and stereo performance of AAC, as well as that of the scalable sampling rate profile.

The supporting AAC Technical Report (providing software modules) is progressing well and will soon be available to the community at large.

MPEG-4 Audio work continues apace, and new versions of the Working Draft and software Verification Model have been prepared.

A number of core experiments have been successfully completed, advancing the technical work on MPEG-4 Audio. Examples include improved quality for the CELP-based speech coder core and several improvements that add scalability.

A total of seven audio-related responses to the MPEG-4 Call for Proposals were received. Several of these have been identified as suitable for core experiment evaluations. The remaining proposals will undergo pre-screening tests and, if shown to be sufficiently good, will then be subjected to more rigorous tests.

Functionalities and tools for MPEG-4 Audio

New versions of the MPEG-4 Audio Working Draft and Verification Model were developed at this meeting.

MPEG-4 audio coding integrates the worlds of synthetic and natural audio coding. The synthetic coding part comprises tools for rendering symbolically defined music and speech, including MIDI and Text-to-Speech systems. In addition, tools for the 3-D localization of sound are included, allowing the creation of artificial sound environments from both artificial and natural sources.

Synthetic audio is described by first defining a set of 'instrument' modules that can create and process audio signals under the control of a script or score file. An instrument is a small network of signal processing primitives that can emulate the effects of a natural acoustic instrument. A script or score is a time-sequenced set of commands that invokes various instruments at specific times to contribute their output to an overall music performance. Other instruments, serving the function of effects processors (reverberators, spatializers, mixers), can be similarly invoked to receive and process the outputs of the performing instruments. These actions can not only realize a music composition but can also organize any other kind of audio, such as speech, sound effects and general ambience. Likewise, the audio sources can themselves be natural sounds, perhaps emanating from an audio channel decoder, thus enabling synthetic and natural sources to be merged with complete timing accuracy.
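The instrument and score model described above can be sketched in a few lines of Python. In this sketch an 'instrument' is a function that produces samples, a score is a list of timed commands, and `render` mixes each instrument's output into a shared buffer at its scheduled sample offset; `gain_effect` stands in for an effects processor applied to the mixed performance. The sample rate, the oscillator-plus-envelope instrument and all function names are assumptions for illustration; this is not the MIT proposal or any MPEG-4 syntax.

```python
import math
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE = 8000  # samples per second (an arbitrary choice for this sketch)

# An "instrument" maps (duration in seconds, parameters) to a list of samples.
Instrument = Callable[[float, Dict[str, float]], List[float]]

# A score event: (start time in s, instrument name, duration in s, parameters).
ScoreEvent = Tuple[float, str, float, Dict[str, float]]


def sine_instrument(duration: float, params: Dict[str, float]) -> List[float]:
    """A tiny signal-processing network: sine oscillator -> linear decay envelope."""
    n = int(duration * SAMPLE_RATE)
    freq = params.get("freq", 440.0)
    amp = params.get("amp", 0.5)
    return [amp * (1 - i / n) * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
            for i in range(n)]


def gain_effect(signal: List[float], gain: float) -> List[float]:
    """A minimal 'effects processor': scales the mixed performance."""
    return [gain * s for s in signal]


def render(score: List[ScoreEvent], instruments: Dict[str, Instrument],
           total_duration: float) -> List[float]:
    """Invoke each instrument at its scheduled time and mix at sample resolution."""
    out = [0.0] * int(total_duration * SAMPLE_RATE)
    for start, name, duration, params in score:
        offset = int(start * SAMPLE_RATE)
        for i, sample in enumerate(instruments[name](duration, params)):
            if offset + i < len(out):  # drop samples past the end of the buffer
                out[offset + i] += sample
    return out
```

For example, `gain_effect(render([(0.0, "sine", 0.5, {"freq": 440.0}), (0.25, "sine", 0.5, {"freq": 660.0})], {"sine": sine_instrument}, 1.0), 0.8)` yields 8000 samples in which the two notes overlap between 0.25 s and 0.5 s. A natural source could be modelled the same way, as an 'instrument' that simply returns samples delivered by an audio decoder, which is how sample-accurate merging of synthetic and natural sources falls out of the design.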

MPEG-4 standardizes natural audio coding at bitrates ranging from 2 kbit/s up to 64 kbit/s. The inclusion of the MPEG-2 AAC standard in the MPEG-4 tool set will provide compression of general audio at the highest quality. For bitrates from 2 kbit/s up to 64 kbit/s, the MPEG-4 standard specifies the bitstream syntax and decoding processes in terms of a set of tools. To achieve high audio quality across the full range of bitrates while also providing the extra functionalities, three types of coder have been defined. The lowest bitrate range, between about 2 and 6 kbit/s, mostly used for speech coding at an 8 kHz sampling frequency, is covered by parametric coding techniques. Coding at medium bitrates, between about 6 and 24 kbit/s, uses Code Excited Linear Predictive (CELP) coding techniques. In this region, two sampling rates, 8 and 16 kHz, are used to support a broader range of audio signals (other than speech). For bitrates typically starting at about 16 kbit/s, time-to-frequency coding techniques are applied. The audio signals in this region typically have bandwidths starting at 8 kHz.
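The bitrate ranges above can be summarized as a small lookup, sketched here in Python. The boundaries are the approximate ("about") values quoted in the text, treated as inclusive for simplicity; the function name and the choice to return every matching coder family are assumptions for illustration, not part of the standard.

```python
from typing import List

# Approximate ranges in kbit/s, taken from the text; boundaries are "about" values.
PARAMETRIC = (2.0, 6.0)   # parametric coding, mostly speech at 8 kHz sampling
CELP = (6.0, 24.0)        # CELP coding, 8 or 16 kHz sampling
TF_MIN = 16.0             # time-to-frequency coding from about 16 kbit/s upward
MPEG4_MIN, MPEG4_MAX = 2.0, 64.0  # natural audio range covered by MPEG-4


def candidate_coders(bitrate_kbps: float) -> List[str]:
    """Return the coder families whose approximate range covers the given bitrate."""
    if not MPEG4_MIN <= bitrate_kbps <= MPEG4_MAX:
        raise ValueError("outside the 2-64 kbit/s range described for MPEG-4")
    coders: List[str] = []
    if PARAMETRIC[0] <= bitrate_kbps <= PARAMETRIC[1]:
        coders.append("parametric")
    if CELP[0] <= bitrate_kbps <= CELP[1]:
        coders.append("CELP")
    if bitrate_kbps >= TF_MIN:
        coders.append("time/frequency")
    return coders
```

For example, `candidate_coders(20.0)` returns `['CELP', 'time/frequency']`, reflecting the overlap between about 16 and 24 kbit/s where either technique may be used.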

A number of functionalities are provided to support a wide variety of applications, ranging from intelligible speech to high-quality multichannel audio. Examples of these functionalities are speed control, pitch change, error resilience, and scalability in terms of bitrate, bandwidth, error robustness and complexity.

To allow smooth transitions between bitrates and to support bitrate and bandwidth scalability, a general framework has been defined. This is illustrated in Figure 1.

Figure 1: General framework for bitrate and bandwidth scalability in MPEG-4 Audio

By adding enhancements to a coder operating at a low bitrate, both the coding quality and the audio bandwidth can be improved. These enhancements can be realized within a single coder or, alternatively, by combining different techniques.
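The layered principle can be sketched in Python: a receiver decodes the base layer and then as many enhancement layers, in order, as the available bitrate permits, with each added layer raising quality and possibly bandwidth. The layer bitrates and bandwidths below are invented for illustration only; the text gives no per-layer values, and the function name is hypothetical.

```python
from typing import List, Tuple

# Hypothetical layers: (bitrate increment in kbit/s, audio bandwidth in kHz once decoded).
# These figures are invented for illustration; no layer values are specified.
LAYERS: List[Tuple[float, float]] = [
    (6.0, 3.5),    # base layer, e.g. a low-bitrate core coder
    (6.0, 4.0),    # enhancement: higher quality, slightly wider bandwidth
    (12.0, 7.0),   # enhancement: wider bandwidth
    (24.0, 12.0),  # enhancement: wider still
]


def select_layers(available_kbps: float) -> Tuple[int, float, float]:
    """Use the base layer plus as many enhancement layers as the channel allows.

    Layers are taken strictly in order, since an enhancement layer only refines
    the layers below it. Returns (layers used, total bitrate, resulting bandwidth).
    """
    used, total, bandwidth = 0, 0.0, 0.0
    for rate, bw in LAYERS:
        if total + rate > available_kbps:
            break
        used += 1
        total += rate
        bandwidth = bw
    if used == 0:
        raise ValueError("channel cannot carry even the base layer")
    return used, total, bandwidth
```

With these assumed figures, `select_layers(30.0)` returns `(3, 24.0, 7.0)`: the fourth layer would need 48 kbit/s in total, so it is dropped and decoding falls back smoothly to the lower layers.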


Background information

The next MPEG meeting will take place in Bristol, UK, from 7 to 11 April 1997.

For further information on this press release and MPEG work in general, please contact:

Dr. Leonardo Chiariglione (Convenor of MPEG)
CSELT
Via G. Reiss Romoli, 274
10148 Torino, ITALY
Tel.: +39 11 228 6120; Fax: +39 11 228 6299
Email: leonardo.chiariglione@cselt.it

or refer to the MPEG homepage:

http://drogo.cselt.it/mpeg