ISO/IEC JTC1/SC29/WG11 N1512
MPEG 97
February 1997 / Sevilla
| Source: | Convenor of MPEG (ISO/IEC JTC1/SC29/WG11) |
| Status: | Approved at 38th WG11 meeting |
| Subject: | MPEG Public Release |
| Date: | 21 February 1997 |
Highlights
The Moving Picture Experts Group (MPEG) met for the 38th
time, in Seville, Spain, from 17 to 21 February 1997. The major
topic of discussion was the object-based audiovisual coding
standard MPEG-4, which will reach the state of 'Committee Draft'
in November '97.
Among the 300 participants, representatives from authors'
organizations and copyright management organizations were present
to discuss Intellectual Property Rights issues related to using
MPEG-4 encoded content. MPEG has recognized that dealing with
'content-related IPR issues' at an early stage is an important
factor for the success of MPEG-4.
MPEG has started to build an MPEG-4 Player: a program
that plays back audio and visual objects, coded according to the
latest state of the MPEG-4 standard. The MPEG-4 player, which is
to be made publicly available, will include several decoders for
natural and synthetic audiovisual material, as well as one of the
major innovations that MPEG-4 brings: the 'compositor'. The
function of this compositor is to arrange, according to
'composition information' carried in the bitstream, the decoded
aural and visual material. This compositing applies to space
(sounds with a virtual location, the placing of visual objects on
the screen) and time (synchronization). The first working Player
is expected to be ready in the 2nd half of '97. A
complementary 'Recorder' will also be built, to be completed half
a year later. Some encoded material for use with the player will,
however, start to be available when the player is ready. The
Player and Recorder will be built using software donated by
participating companies.
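
The role of the compositor can be illustrated with a minimal Python sketch. The `SceneObject` fields and the `compose` function are hypothetical names invented for this illustration and are not MPEG-4 syntax; the sketch only shows decoded objects being selected in time and ordered in space according to composition information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """A decoded object plus hypothetical composition information."""
    name: str
    start: float  # presentation start time, in seconds
    end: float    # presentation end time, in seconds
    x: int        # horizontal position on the screen
    y: int        # vertical position on the screen
    layer: int    # larger values are drawn on top


def compose(objects, t):
    """Return the objects active at time t, ordered back to front."""
    active = [o for o in objects if o.start <= t < o.end]
    return sorted(active, key=lambda o: o.layer)


scene = [
    SceneObject("ticker", 5.0, 60.0, 0, 440, layer=2),
    SceneObject("background", 0.0, 60.0, 0, 0, layer=0),
    SceneObject("news_reader", 0.0, 60.0, 200, 100, layer=1),
]

print([o.name for o in compose(scene, 1.0)])   # the ticker is not yet active
print([o.name for o in compose(scene, 10.0)])  # all three objects are active
```

A real compositor would also handle audio placement and synchronization between streams; this sketch covers only the selection and ordering step.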
The ongoing work on requirements for MPEG-4 shows an
increasing interest from broadcasters in MPEG-4, who think that
the object-paradigm of MPEG-4 can give them more functionality.
They are interested in such features as local advertisement
insertion; these advertisements could even be objects inserted in
an existing program. Their interest goes beyond traditional TV
broadcast, as MPEG-4 makes possible information broadcasts in
which users decide for themselves which audio and visual
elements are presented. For example, a user can choose between
the Spanish and the English news reader, and whether or not to
have stock exchange information running along the bottom of the
screen. The broadcasters also believe that
they can achieve similar quality at lower bitrates through the
usage of the object-based tools.
MPEG was pleased to receive the details of a proposal from
MIT's Media Lab, registered at the preceding meeting, for new
tools addressing synthetic audio. The tools allow generating
synthetic speech and music, using symbolic representations. The
tools will be evaluated by the audio group, along with the tools
for natural audio.
MPEG was also pleased with Microsoft's donation of its C++
implementation of MPEG-4 encoder software, releasing copyright to
ISO. Decoder software had already been made available, both by
the European ACTS MoMuSys project (in C) and Microsoft (in C++).
A framework for parametric scene description has been
designed, extending the VRML specification, to allow 2D and 3D
composition of streamed audiovisual information.
The DSM-CC Multimedia Integration Framework (DMIF) has aligned
its work with the rest of MPEG-4. The DMIF work will now be
carried out in two phases, the first phase (signaling for MPEG-4)
to be completed in 1998, the second phase (extended
functionalities) in 1999.
In parallel with the work on MPEG-4, which is very intensive at
this stage, the experts group devoted some meeting time to further
defining the goals of a new MPEG standard, called MPEG-7. MPEG-7 will
specify multimedia content descriptors, to allow fast and
efficient search for that content. The work item is attracting
new people to MPEG, and more experts are welcome. During the
next meeting of MPEG at Bristol University, a seminar on MPEG-7
will be held, with invited experts from the field of multimedia
content retrieval. This seminar will take place on the 9th
of April. The seminar is open to all interested people. A
description of MPEG-7 can be found at: http://drogo.cselt.it/mpeg/mpeg_7.htm
The second edition of the MPEG-2 Audio Standard, 13818-3, was
approved. The second edition incorporates additional beneficial
features identified during the early stages of usage of the
Standard and clarifies details that might previously have caused
some confusion to the users.
Details
The remainder of this document gives information organized
according to the different subgroups. It expands on the
highlights mentioned above.
Video
At the Sevilla meeting, the first results of coding interlaced
video with MPEG-4 video compression were presented. The
goal is to allow coding of arbitrarily shaped objects at high
resolution, recorded with line interlace, the most common
video format today.
For the coding of objects with arbitrary shape, further
compression gains can be expected. The proposed techniques have
not yet been accepted for the final standard because the complexity
of the different proposals still has to be investigated. It is a
major concern in MPEG-4 that the specification keeps a reasonable
complexity.
The ability to cope with transmission errors has been
enhanced, by putting temporal references right after
resynchronisation markers. The compression of the transparency of
video objects ('alpha channels') has been improved, especially in
the case that these are not perfectly opaque.
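
The benefit of placing a temporal reference right after each resynchronisation marker can be sketched in Python as follows. The two-byte marker value and the packet layout (marker, one-byte temporal reference, one-byte length, payload) are assumptions made for this illustration and do not reproduce the MPEG-4 bitstream syntax; the sketch also assumes that the marker pattern never occurs inside a packet.

```python
MARKER = b"\xff\xfe"  # hypothetical marker, not the real MPEG-4 bit pattern


def make_packet(time_ref: int, payload: bytes) -> bytes:
    """Build: marker, 1-byte temporal reference, 1-byte length, payload."""
    return MARKER + bytes([time_ref, len(payload)]) + payload


def resync(stream: bytes, pos: int):
    """Scan forward from pos to the next marker.

    Returns (time_ref, payload, next_pos), or None if no complete
    packet header follows pos.
    """
    i = stream.find(MARKER, pos)
    if i < 0 or i + 4 > len(stream):
        return None
    time_ref, length = stream[i + 2], stream[i + 3]
    start = i + 4
    return time_ref, stream[start:start + length], start + length


stream = make_packet(7, b"abc") + make_packet(8, b"def")

# Suppose an error is detected at byte 3, inside the first packet.
# The decoder skips to the next marker and immediately knows the timing.
print(resync(stream, 3))
```

Because the temporal reference follows the marker directly, the decoder recovers its position in time as soon as it recovers its position in the bitstream.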
Before the meeting in Spain, a number of so-called "core
experiments" were carried out. Between this meeting and the
next meeting in April 1997, some new
core experiments on promising algorithms will be carried out. If
at least two participants find, independently from one another,
an important improvement, the new technique will be considered
for inclusion into the standard. In this way, the highest
performance of the final standardized specification is
guaranteed.
The meeting has released the second version of the Working
Draft of MPEG-4 "visual", a draft of the final
standard. The name "visual" has been chosen to cover
the standardized tools for coding of natural video and synthetically
generated input. The specification is to reach the official status
of "Committee Draft" in November 1997. For its common
experiments, the video group continues developing a reference
encoder, called "Verification Model" (VM).
Systems
In the area of composition, a framework for parametric scene
description has been completed. Its starting point is the VRML
specification, which is extended to allow 2D and 3D composition of
streamed audiovisual information. This framework takes into
account the new streamed content formats designed by MPEG-4
experts, such as arbitrarily shaped natural images, synthetic
objects such as facial and body animations, and natural and
synthetic audio. The framework also includes the specification of
a bit-efficient binary format for scene transmission (the BIFS),
which will be the final format standardized.
MPEG-4 is designing its multiplex to operate in a world in
which the number of different networks is growing. This means
that heterogeneous digital infrastructures to transport encoded
audiovisual information should be supported. The Systems group
will develop an intermediate layer (the FlexMux), designed to
adapt MPEG-4 streaming data in a flexible manner to the
infrastructure. Whenever possible, interfaces to the existing
transport layers will be defined.
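
The general idea of an intermediate multiplexing layer can be sketched in Python. Since the FlexMux is still to be developed, the framing used here (one-byte channel index, one-byte length, payload) is purely an assumption for illustration, and `mux` and `demux` are hypothetical names.

```python
def mux(chunks):
    """Interleave (channel, payload) pairs into one byte stream.

    Each chunk is framed as: 1-byte channel index, 1-byte length, payload.
    """
    out = bytearray()
    for channel, payload in chunks:
        if not 0 <= channel <= 255 or len(payload) > 255:
            raise ValueError("channel and payload length must fit in one byte")
        out += bytes([channel, len(payload)]) + payload
    return bytes(out)


def demux(stream):
    """Split a well-formed multiplexed stream into per-channel byte strings."""
    channels = {}
    pos = 0
    while pos < len(stream):
        channel, length = stream[pos], stream[pos + 1]
        payload = stream[pos + 2:pos + 2 + length]
        channels[channel] = channels.get(channel, b"") + payload
        pos += 2 + length
    return channels


# Two elementary streams (video on channel 1, audio on channel 2)
# share a single transport connection.
stream = mux([(1, b"vid0"), (2, b"aud0"), (1, b"vid1")])
print(demux(stream))
```

The underlying transport sees only one byte stream, which is what allows the same encoded data to be adapted to different networks.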
A challenging activity has been set up to demonstrate, in
July, the added value that standardizing a set of APIs could
bring. Major actors (Sun, Intel, ...)
are actively working on this subject. Positive results may lead
to the standardization of these APIs in the November '97
Committee Draft.
Audio
MPEG-2 Audio
The second edition of the MPEG-2 Audio Standard, 13818-3, was approved. The edition incorporates additional beneficial features identified during the early stages of usage of the Standard and clarifies details that might previously have caused some confusion to the users.
A comprehensive review of the MPEG-2 Advanced Audio Coding
Draft International Standard resulted in revisions, and it is
fully expected that the Standard will be confirmed during the
April 1997 meeting. A set of guidance notes explaining the
purpose of the second edition has been prepared. These notes will
be available on the MPEG web site,
http://drogo.cselt.it/mpeg/
The Advanced Audio Coding DIS was further developed during the meeting in line with early National Body comments that were available at the start of the Sevilla meeting. These are to be carried through to the Bristol meeting, April 1997, where formal amalgamation and approval will take place. Test plans have been prepared to establish the mono and stereo performance of AAC, as well as that of the scalable sampling rate profile.
The supporting AAC Technical Report (providing software modules) is progressing well and will soon be available to the community at large.
MPEG-4 Audio work continues apace, and new versions of the Working Draft and software Verification Model have been prepared.
A number of core experiments have been successfully completed, advancing the technical work on MPEG-4 audio. Examples of this technical improvement are the improved quality for the CELP-based speech coder core and several improvements adding scalability.
A total of seven audio-relevant responses were received to the
MPEG-4 Call for Proposals. A number of these have been identified
as being suitable for core experiment evaluations. The remaining
proposals will be subject to pre-screening tests and if shown to
be sufficiently good will then be subjected to more rigorous
tests.
Functionalities and tools for MPEG-4 Audio
New versions of the MPEG-4 Audio Working Draft and Verification Model were developed at this meeting.
MPEG-4 audio coding integrates the worlds of synthetic and natural coding of audio. The synthetic coding part is comprised of tools for the realization of symbolically defined music and speech. This includes MIDI and Text-to-Speech systems. Furthermore, tools for the 3-D localization of sound are included, allowing the creation of artificial sound environments using artificial and natural sources.
Synthetic audio is described by first defining a set of 'instrument' modules that can create and process audio signals under the control of a script or score file. An instrument is a small network of signal processing primitives that can emulate the effects of a natural acoustic instrument. A script or score is a time-sequenced set of commands that invokes various instruments at specific times to contribute their output to an overall music performance. Other instruments, serving the function of effects processors (reverberators, spatializers, mixers), can be similarly invoked to receive and process the outputs of the performing instruments. These actions can not only realize a music composition but can also organize any other kind of audio, such as speech, sound effects and general ambience. Likewise, the audio sources can themselves be natural sounds, perhaps emanating from an audio channel decoder, thus enabling synthetic and natural sources to be merged with complete timing accuracy.
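
The relationship between instruments and a score can be sketched in Python. This is an illustration only: the sine oscillator, the sample rate, and the tuple layout of the score are assumptions chosen for brevity, not the tools under evaluation.

```python
import math

SAMPLE_RATE = 8000  # Hz; arbitrary value chosen for this sketch


def sine_instrument(freq, duration):
    """A trivial 'instrument': a sine oscillator with a linear fade-out."""
    n = int(duration * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) * (1 - i / n)
            for i in range(n)]


def render(score, total_duration):
    """Mix every score event into one output buffer at its start time."""
    out = [0.0] * int(total_duration * SAMPLE_RATE)
    for start, instrument, args in sorted(score, key=lambda e: e[0]):
        offset = int(start * SAMPLE_RATE)
        for i, sample in enumerate(instrument(*args)):
            if offset + i < len(out):
                out[offset + i] += sample
    return out


# The score: time-sequenced commands, each invoking an instrument.
score = [
    (0.0, sine_instrument, (440.0, 0.5)),
    (0.25, sine_instrument, (660.0, 0.5)),
]
audio = render(score, 1.0)
print(len(audio))
```

An effects processor would fit the same pattern as a function that takes the mixed buffer as input, and a natural sound could be supplied by an instrument that returns decoded samples instead of synthesizing them.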
MPEG-4 standardizes natural audio coding at bitrates ranging from 2 kbit/s up to 64 kbit/s. The presence of the MPEG-2 AAC standard within the MPEG-4 tool set will provide for compression of general audio at the highest quality. For the bitrates from 2 kbit/s up to 64 kbit/s, the MPEG-4 standard specifies the bitstream syntax and decoding processes in terms of a set of tools. In order to achieve high audio quality within the full range of bitrates and at the same time provide the extra functionalities, three types of coder have been defined. The lowest bitrate range, between about 2 and 6 kbit/s and mostly used for speech coding at 8 kHz sampling frequency, is covered by parametric coding techniques. Coding at the medium bitrates, between about 6 and 24 kbit/s, uses Code Excited Linear Predictive (CELP) coding techniques. In this region, two sampling rates, 8 and 16 kHz, are used to support a broader range of audio signals (other than speech). For the bitrates typically starting at about 16 kbit/s, time-to-frequency coding techniques are applied. The audio signals in this region typically have bandwidths starting at 8 kHz.
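
The three bitrate regions in the paragraph above can be summarized in a short Python sketch. The boundaries are the approximate figures quoted in the text; treating them as inclusive limits, and capping time-to-frequency coding at 64 kbit/s (the top of the stated MPEG-4 range), are assumptions of this sketch. `candidate_coders` is a hypothetical helper name.

```python
# Approximate ranges in kbit/s, as quoted in the text.
CODER_RANGES = {
    "parametric": (2, 6),
    "CELP": (6, 24),
    "time-to-frequency": (16, 64),  # upper limit assumed: top of the range
}


def candidate_coders(bitrate_kbps):
    """Return the coder types whose approximate range covers the bitrate."""
    return [name for name, (low, high) in CODER_RANGES.items()
            if low <= bitrate_kbps <= high]


for rate in (4, 20, 48):
    print(rate, candidate_coders(rate))
```

Note that the CELP and time-to-frequency regions overlap between about 16 and 24 kbit/s, so more than one coder type may be a candidate at a given bitrate.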
A number of functionalities are provided to facilitate a wide variety of applications which could range from intelligible speech to high quality multichannel audio. Examples of the functionalities are speed control, pitch change, error resilience and scalability in terms of bitrate, bandwidth, error robustness, complexity, etc. as defined below.
To allow for smooth transitions between the bitrates and to allow for bitrate and bandwidth scalability, a general framework has been defined. This is illustrated in figure 1.
Figure 1
By adding enhancements to a coder operating at a low bitrate,
both the coding quality and the audio bandwidth can be improved.
These enhancements are realized within a single coder or
alternatively by combining different techniques.
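
Bitrate scalability of this kind can be illustrated with a small Python sketch in which a decoder takes a base layer plus as many enhancement layers as a given bitrate budget allows. The layer names and rates are hypothetical values chosen for the example, and `select_layers` is not part of any MPEG-4 tool.

```python
def select_layers(layers, budget_kbps):
    """Pick the base layer plus consecutive enhancement layers that fit.

    layers is an ordered list of (name, rate_kbps), base layer first.
    Returns (chosen_names, total_rate_kbps).
    """
    chosen, used = [], 0
    for name, rate in layers:
        if used + rate > budget_kbps:
            break  # later layers depend on this one, so stop here
        chosen.append(name)
        used += rate
    return chosen, used


# Hypothetical layer rates in kbit/s.
layers = [("base", 6), ("enhancement_1", 10), ("enhancement_2", 16)]

print(select_layers(layers, 20))
print(select_layers(layers, 32))
```

Each additional layer raises the total bitrate and, with it, the coding quality or audio bandwidth available to the decoder.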
Background information
The next MPEG meeting will take place in Bristol, UK, from 7 to 11 April 1997.
For further information on this press release and MPEG work in general, please contact:
Dr. Leonardo Chiariglione, (Convenor of MPEG)
CSELT
Via G. Reiss Romoli, 274
10148 Torino, ITALY
Tel.: +39 11 228 6120; Fax: +39 11 228 6299
Email: leonardo.chiariglione@cselt.it
or refer to the MPEG homepage: