INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N4668
March 2002
Source: WG11 (MPEG)
Status: Final
Title: MPEG-4 Overview - (V.21 Jeju Version)
Editor: Rob Koenen (rob.koenen@m4if.org)
All comments, corrections, suggestions and additions to this document are welcome, and should be sent to both the editor and the chairman of MPEG’s Requirements Group: Fernando Pereira, fp@lx.it.pt
Overview of the MPEG-4 Standard
MPEG-4 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed the Emmy Award winning standards known as MPEG-1 and MPEG-2. These standards made interactive video on CD-ROM, DVD and Digital Television possible. MPEG-4 is the result of another international effort involving hundreds of researchers and engineers from all over the world. MPEG-4, whose formal ISO/IEC designation is 'ISO/IEC 14496', was finalized in October 1998 and became an International Standard in the first months of 1999. The fully backward compatible extensions under the title of MPEG-4 Version 2 were frozen at the end of 1999 and acquired formal International Standard status early in 2000. Several extensions have been added since, and work on some specific work items is still in progress.
MPEG-4 builds on the proven success of three fields:
MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the three fields.
More information about MPEG-4 can be found at MPEG’s home page (case sensitive): http://mpeg.chiariglione.org This web page contains links to a wealth of information about MPEG, including much about MPEG-4, many publicly available documents, several lists of ‘Frequently Asked Questions’ and links to other MPEG-4 web pages.
The standard can be bought from ISO; send mail to sales@iso.ch. Notably, the complete software for MPEG-4 version 1 can be bought on a CD-ROM for 56 Swiss Francs. It can also be downloaded for free from ISO’s website: www.iso.ch/ittf - look under publicly available standards and then for “14496-5”. This software is free of copyright restrictions when used for implementing MPEG-4 compliant technology. (This does not mean that the software is free of patents.)
Much information is also available from the MPEG-4 Industry Forum, M4IF, http://www.m4if.org. See section 7, The MPEG-4 Industry Forum.
This document gives an overview of the MPEG-4 standard, explaining which pieces of technology it includes and what sort of applications are supported by this technology.
Scope and features of the MPEG-4 standard
  Coded representation of media objects
  Composition of media objects
  Description and synchronization of streaming data for media objects
  Delivery of streaming data
  Interaction with media objects
  Management and Identification of Intellectual Property
Versions in MPEG-4
Major Functionalities in MPEG-4
  Transport
  DMIF
  Systems
  Audio
  Visual
Extensions Underway
  IPMP Extensions
  The Animation Framework eXtension, AFX
  Multi User Worlds
  Advanced Video Coding
  Audio Extensions
Profiles in MPEG-4
  Visual Profiles
  Audio Profiles
  Graphics Profiles
  Scene Graph Profiles
  MPEG-J Profiles
  Object Descriptor Profile
Verification Testing: checking MPEG’s performance
  Video
  Audio
The MPEG-4 Industry Forum
Licensing of patents necessary to implement MPEG-4
  Roles in Licensing MPEG-4
  Licensing Situation
Deployment of MPEG-4
Detailed technical description of MPEG-4 DMIF and Systems
  Transport of MPEG-4
  DMIF
  Demultiplexing, synchronization and description of streaming data
  Advanced Synchronization (FlexTime) Model
  Syntax Description
  Binary Format for Scene description: BIFS
  User interaction
  Content-related IPR identification and protection
  MPEG-4 File Format
  MPEG-J
  Object Content Information
Detailed technical description of MPEG-4 Visual
  Natural Textures, Images and Video
  Structure of the tools for representing natural video
  The MPEG-4 Video Image Coding Scheme
  Coding of Textures and Still Images
  Synthetic Objects
Detailed technical description of MPEG-4 Audio
  Natural Sound
  Synthesized Sound
Detailed Description of current development
  IPMP Extensions
  The Animation Framework eXtension, AFX
  Multi User Worlds
  Advanced Video Coding
  Audio Extensions
Annexes
  The MPEG-4 development process
  Organization of work in MPEG
  Glossary and Acronyms
The MPEG-4 standard provides a set of technologies to satisfy the needs of authors, service providers and end users alike.
For all parties involved, MPEG seeks to avoid a multitude of proprietary, non-interworking formats and players.
MPEG-4 achieves these goals by providing standardized ways to:
The following sections illustrate the MPEG-4 functionalities described above, using the audiovisual scene depicted in Figure 1.
MPEG-4 audiovisual scenes are composed of several media objects, organized in a hierarchical fashion. At the leaves of the hierarchy, we find primitive media objects, such as:
MPEG-4 standardizes a number of such primitive media objects, capable of representing both natural and synthetic content types, which can be either 2- or 3-dimensional. In addition to the media objects mentioned above and shown in Figure 1, MPEG-4 defines the coded representation of objects such as:
A media object in its coded form consists of descriptive elements that allow handling the object in an audiovisual scene as well as of associated streaming data, if needed. It is important to note that in its coded form, each media object can be represented independent of its surroundings or background.
The coded representation of media objects is as efficient as possible while taking into account the desired functionalities. Examples of such functionalities are error robustness, easy extraction and editing of an object, or having an object available in a scalable form.
Figure 1 explains the way in which an audiovisual scene in MPEG-4 is described as composed of individual objects. The figure contains compound media objects that group primitive media objects together. Primitive media objects correspond to leaves in the descriptive tree while compound media objects encompass entire sub-trees. As an example: the visual object corresponding to the talking person and the corresponding voice are tied together to form a new compound media object, containing both the aural and visual components of that talking person.
Such grouping allows authors to construct complex scenes, and enables consumers to manipulate meaningful (sets of) objects.
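The hierarchy described above can be pictured as an ordinary tree. The following is a minimal sketch only: the class `MediaObject` and the object names are illustrative assumptions and do not reflect the actual MPEG-4 scene description syntax.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaObject:
    """A node in a scene tree; names are illustrative, not MPEG-4 syntax."""
    name: str
    children: List["MediaObject"] = field(default_factory=list)

    def is_primitive(self) -> bool:
        # Primitive media objects are the leaves of the hierarchy.
        return not self.children

    def leaves(self) -> List[str]:
        # Collect the names of all primitive objects below this node.
        if self.is_primitive():
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names


# Compound object: the talking person groups a visual and an aural component.
person = MediaObject("person", [MediaObject("sprite"), MediaObject("voice")])
scene = MediaObject("scene", [person, MediaObject("background")])

print(scene.leaves())
```

Moving or removing the `person` node affects both of its components at once, which is the practical benefit of grouping.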
More generally, MPEG-4 provides a standardized way to describe a scene, allowing for example to:
The scene description builds on several concepts from the Virtual Reality Modeling Language (VRML), in terms of both its structure and the functionality of object composition nodes, and extends it to fully enable the aforementioned features.
Figure 1 - an example of an MPEG-4 Scene
Media objects may need streaming data, which is conveyed in one or more elementary streams. An object descriptor identifies all streams associated to one media object. This allows handling hierarchically encoded data as well as the association of meta-information about the content (called ‘object content information’) and the intellectual property rights associated with it.
Each stream itself is characterized by a set of descriptors for configuration information, e.g., to determine the required decoder resources and the precision of encoded timing information. Furthermore, the descriptors may carry hints about the Quality of Service (QoS) the stream requests for transmission (e.g., maximum bit rate, bit error rate, priority).
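As a rough illustration of how an object descriptor ties streams and their descriptors together, consider the sketch below. All class and field names are hypothetical and chosen to mirror the examples in the text; the real descriptor syntax is defined in the Systems part of the standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QoSHints:
    # Hypothetical fields mirroring the examples given in the text.
    max_bit_rate: Optional[int] = None       # bits per second
    bit_error_rate: Optional[float] = None
    priority: Optional[int] = None


@dataclass
class StreamDescriptor:
    stream_id: int
    stream_type: str                          # e.g. "visual", "audio", "oci"
    qos: QoSHints = field(default_factory=QoSHints)


@dataclass
class ObjectDescriptor:
    object_id: int
    streams: List[StreamDescriptor] = field(default_factory=list)

    def streams_of_type(self, stream_type: str) -> List[int]:
        # All elementary streams of one kind that belong to this object.
        return [s.stream_id for s in self.streams if s.stream_type == stream_type]


# One media object with a two-layer (hierarchically encoded) visual stream
# and a stream carrying object content information.
od = ObjectDescriptor(1, [
    StreamDescriptor(10, "visual", QoSHints(max_bit_rate=64_000, priority=2)),
    StreamDescriptor(11, "visual", QoSHints(max_bit_rate=128_000, priority=1)),
    StreamDescriptor(12, "oci"),
])

print(od.streams_of_type("visual"))
```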
Synchronization of elementary streams is achieved through time stamping of individual access units within elementary streams. The synchronization layer manages the identification of such access units and the time stamping. Independent of the media type, this layer allows identification of the type of access unit (e.g., video or audio frames, scene description commands) in elementary streams, recovery of the media object’s or scene description’s time base, and it enables synchronization among them. The syntax of this layer is configurable in a large number of ways, allowing use in a broad spectrum of systems.
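The role of time stamps in synchronization can be sketched as follows. This is a simplified illustration, assuming that all streams share one time base and that each stream is already ordered by time stamp; the `AccessUnit` type and its fields are hypothetical and are not the sync layer packet syntax.

```python
import heapq
from typing import Iterable, List, NamedTuple


class AccessUnit(NamedTuple):
    timestamp: float   # seconds on a shared time base (an assumption)
    stream: str
    payload: str


def presentation_order(*streams: Iterable[AccessUnit]) -> List[AccessUnit]:
    # Each stream is assumed to be sorted by timestamp already;
    # heapq.merge interleaves them without re-sorting everything.
    return list(heapq.merge(*streams, key=lambda au: au.timestamp))


video = [AccessUnit(0.00, "video", "frame0"), AccessUnit(0.04, "video", "frame1")]
audio = [AccessUnit(0.00, "audio", "block0"), AccessUnit(0.02, "audio", "block1")]

for au in presentation_order(video, audio):
    print(au.timestamp, au.stream, au.payload)
```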
The synchronized delivery of streaming information from source to destination, exploiting different QoS as available from the network, is specified in terms of the synchronization layer and a delivery layer containing a two-layer multiplexer, as depicted in Figure 2.
The first multiplexing layer is managed according to the DMIF specification, part 6 of the MPEG-4 standard. (DMIF stands for Delivery Multimedia Integration Framework.) This multiplex may be embodied by the MPEG-defined FlexMux tool, which allows grouping of Elementary Streams (ESs) with a low multiplexing overhead. Multiplexing at this layer may be used, for example, to group ESs with similar QoS requirements, or to reduce the number of network connections or the end-to-end delay.
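The idea of grouping elementary streams with similar QoS requirements can be sketched in a few lines. The QoS class labels and stream ids are made up for illustration; this does not model the FlexMux packet format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (stream id, QoS class) pairs; the QoS class labels are hypothetical.
streams: List[Tuple[int, str]] = [
    (10, "low-delay"), (11, "low-loss"), (12, "low-delay"), (13, "low-loss"),
]


def group_by_qos(items: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    # Streams with the same QoS class share one multiplexed channel.
    groups: Dict[str, List[int]] = defaultdict(list)
    for stream_id, qos_class in items:
        groups[qos_class].append(stream_id)
    return dict(groups)


channels = group_by_qos(streams)
print(channels)
print(len(streams), "streams carried over", len(channels), "connections")
```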
The “TransMux” (Transport Multiplexing) layer in Figure 2 models the layer that offers transport services matching the requested QoS. Only the interface to this layer is specified by MPEG-4 while the concrete mapping of the data packets and control signaling must be done in collaboration with the bodies that have jurisdiction over the respective transport protocol. Any suitable existing transport protocol stack such as (RTP)/UDP/IP, (AAL5)/ATM, or MPEG-2’s Transport Stream over a suitable link layer may become a specific TransMux instance. The choice is left to the end user/service provider, and allows MPEG-4 to be used in a wide variety of operation environments.
Figure 2 - The MPEG-4 System Layer Model
Use of the FlexMux multiplexing tool is optional and, as shown in Figure 2, this layer may be empty if the underlying TransMux instance provides all the required functionality. The synchronization layer, however, is always present.
With regard to Figure 2, it is possible to:
Parts of the control functionalities are available only in conjunction with a transport control entity like the DMIF framework.
In general, the user observes a scene that is composed following the design of the scene’s author. Depending on the degree of freedom allowed by the author, however, the user has the possibility to interact with the scene. Operations a user may be allowed to perform include:
More complex kinds of behavior can also be triggered, e.g. a virtual phone rings, the user answers and a communication link is established.
It is important to have the possibility to identify intellectual property in MPEG-4 media objects. Therefore, MPEG has worked with representatives of different creative industries in the definition of syntax and tools to support this. A full elaboration of the requirements for the identification of intellectual property can be found in ‘Management and Protection of Intellectual Property in MPEG-4’, which is publicly available from the MPEG home page.
MPEG-4 incorporates the identification of intellectual property by storing unique identifiers, which are issued by international numbering systems (e.g. ISAN, ISRC, etc.). These numbers can be applied to identify a current rights holder of a media object. Since not all content is identified by such a number, MPEG-4 Version 1 offers the possibility to identify intellectual property by a key-value pair (e.g. »composer«/»John Smith«). Also, MPEG-4 offers a standardized interface, tightly integrated into the Systems layer, to people who want to use systems that control access to intellectual property. With this interface, proprietary control systems can be easily amalgamated with the standardized part of the decoder.
MPEG-4 Version 1 was approved by MPEG in December 1998; version 2 was frozen in December 1999. After these two major versions, more tools were added in subsequent amendments that could be qualified as versions, even though they are harder to recognize as such. Recognizing the versions is not too important, however; it is more important to distinguish Profiles. Existing tools and profiles from any version are never replaced in subsequent versions; technology is always added to MPEG-4 in the form of new profiles. Figure 3 below depicts the relationship between the versions. Version 2 is a backward compatible extension of Version 1, version 3 is a backward compatible extension of Version 2, and so on. The versions of all major parts of the MPEG-4 Standard (Systems, Audio, Video, DMIF) were synchronized; after that, the different parts took their own paths.
Figure 3 - relation between MPEG-4 Versions
The Systems layer of later versions is backward compatible with all earlier versions. In the areas of Systems, Audio and Visual, new versions add Profiles but do not change existing ones. In fact, it is very important to note that existing systems will always remain compliant, because Profiles will never be changed in retrospect, and neither will the Systems Syntax, at least not in a backward-incompatible way.
This section contains, in an itemized fashion, the major functionalities that the different parts of the MPEG-4 Standard offer in the finalized MPEG-4 Version 1. Descriptions of the functionalities can be found in the following sections.
In principle, MPEG-4 does not define transport layers. In a number of cases, adaptation to a specific existing transport layer has been defined:
DMIF, or Delivery Multimedia Integration Framework, is an interface between the application and the transport, that allows the MPEG-4 application developer to stop worrying about that transport. A single application can run on different transport layers when supported by the right DMIF instantiation.
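The principle of hiding the transport behind one interface can be illustrated with a small sketch. The classes below are hypothetical and are not the DMIF interface itself; they only show how application code can stay unchanged while the delivery mechanism is swapped.

```python
from abc import ABC, abstractmethod


class Delivery(ABC):
    """Hypothetical transport-agnostic interface; not the DMIF API itself."""

    @abstractmethod
    def open(self, url: str) -> str:
        ...


class FileDelivery(Delivery):
    def open(self, url: str) -> str:
        return f"reading {url} from local storage"


class NetworkDelivery(Delivery):
    def open(self, url: str) -> str:
        return f"requesting {url} from a remote peer"


def play(delivery: Delivery, url: str) -> str:
    # The application code is identical for every transport.
    return delivery.open(url)


print(play(FileDelivery(), "scene.mp4"))
print(play(NetworkDelivery(), "scene.mp4"))
```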
MPEG-4 DMIF supports the following functionalities:
As explained above, MPEG-4 defines a toolbox of advanced compression algorithms for audio and visual information. The data streams (Elementary Streams, ES) that result from the coding process can be transmitted or stored separately, and need to be composed so as to create the actual multimedia presentation at the receiver side.
The Systems part of MPEG-4 addresses the description of the relationship between the audio-visual components that constitute a scene. The relationship is described at two main levels.
Other issues addressed by MPEG-4 Systems:
MPEG-4 Audio facilitates a wide variety of applications which could range from intelligible speech to high quality multichannel audio, and from natural sounds to synthesized sounds. In particular, it supports the highly efficient representation of audio objects consisting of:
3.4.1 General Audio Signals
Support for coding general audio ranging from very low bitrates up to high quality is provided by transform coding techniques. With this functionality, a wide range of bitrates and bandwidths is covered. It starts at a bitrate of 6 kbit/s and a bandwidth below 4 kHz and extends to broadcast quality audio from mono up to multichannel. High quality can be achieved with low delays. Parametric Audio Coding allows sound manipulation at low speeds. Fine Granularity Scalability (FGS) is also provided, with a scalability resolution down to 1 kbit/s per channel.
3.4.2 Speech signals
Speech coding can be done using bitrates from 2 kbit/s up to 24 kbit/s using the speech coding tools. Lower bitrates, such as an average of 1.2 kbit/s, are also possible when variable rate coding is allowed. Low delay is possible for communications applications. When using the HVXC tools, speed and pitch can be modified under user control during playback. If the CELP tools are used, a change of the playback speed can be achieved by using an additional tool for effects processing.
3.4.3 Synthetic Audio
MPEG-4 Structured Audio is a language to describe 'instruments' (little programs that generate sound) and 'scores' (input that drives those instruments). These instruments are not necessarily musical instruments; they are in essence mathematical formulae that could generate the sound of a piano, that of falling water, or something 'unheard' in nature.
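The split between 'instruments' and 'scores' can be illustrated in ordinary Python. This sketch is not written in the Structured Audio language and makes no claim about its syntax; the sample rate, the `sine` instrument and the score layout are all assumptions chosen for brevity.

```python
import math
from typing import Callable, List, Tuple

SAMPLE_RATE = 8000  # Hz; arbitrary choice for the sketch

Instrument = Callable[[float, float], List[float]]


def sine(freq: float, duration: float) -> List[float]:
    # An 'instrument': a small program that turns parameters into samples.
    n = int(SAMPLE_RATE * duration)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


# A 'score': (start time in s, instrument, frequency in Hz, duration in s).
score: List[Tuple[float, Instrument, float, float]] = [
    (0.0, sine, 440.0, 0.5),
    (0.25, sine, 660.0, 0.5),
]


def render(events: List[Tuple[float, Instrument, float, float]]) -> List[float]:
    # Mix every scored event into one output buffer.
    end = max(start + dur for start, _, _, dur in events)
    out = [0.0] * int(SAMPLE_RATE * end)
    for start, instrument, freq, dur in events:
        offset = int(SAMPLE_RATE * start)
        for i, sample in enumerate(instrument(freq, dur)):
            out[offset + i] += sample
    return out


samples = render(score)
print(len(samples))
```

Only the score and the instrument definitions need to be transmitted; the samples are generated at the receiving end, which is what makes this representation compact.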
3.4.4 Synthesized Speech
Scalable TTS coders have bitrates ranging from 200 bit/s to 1.2 kbit/s. They take a text, or a text with prosodic parameters (pitch contour, phoneme duration, and so on), as input to generate intelligible synthetic speech.
The MPEG-4 Visual standard allows the hybrid coding of natural (pixel based) images and video together with synthetic (computer generated) scenes. This enables, for example, the virtual presence of videoconferencing participants. To this end, the Visual standard comprises tools and algorithms supporting the coding of natural (pixel based) still images and video sequences as well as tools to support the compression of synthetic 2-D and 3-D graphic geometry parameters (i.e. compression of wire grid parameters, synthetic text).
The subsections below give an itemized overview of the functionalities offered by the tools and algorithms of the MPEG-4 Visual standard.
3.5.1 Formats Supported
The following formats and bitrates are supported by MPEG-4 Visual:
3.5.2 Compression Efficiency
3.5.3 Content-Based Functionalities
3.5.4 Scalability of Textures, Images and Video
3.5.5 Shape and Alpha Channel Coding
3.5.6 Robustness in Error Prone Environments
Error resilience allows accessing image and video over a wide range of storage and transmission media. This includes the useful operation of image and video compression algorithms in error-prone environments at low bit-rates (i.e., less than 64 kbit/s). There are tools that address both the band-limited nature and error resiliency aspects of access over wireless networks.
3.5.7 Face and Body Animation
The ‘Face and Body Animation’ tools in the standard allow sending parameters that can define, calibrate and animate synthetic faces and bodies. These models themselves are not standardized by MPEG-4, only the parameters are, although there is a way to send, e.g., a well-defined face to a decoder.
The tools include:
3.5.8 Coding of 2-D Meshes with Implicit Structure
2D mesh coding includes:
3.5.9 Coding of 3-D Polygonal Meshes
MPEG-4 provides a suite of tools for coding 3-D polygonal meshes. Polygonal meshes are widely used as a generic representation of 3-D objects. The underlying technologies compress the connectivity, geometry, and properties such as shading normals, colors and texture coordinates of 3-D polygonal meshes.
The Animation Framework eXtension (AFX, see further down) will provide more elaborate tools for 2D and 3D synthetic objects.
MPEG is currently working on a number of extensions:
The Animation Framework extension (AFX – pronounced ‘effects’) provides an integrated toolbox for building attractive and powerful synthetic MPEG-4 environments. The framework defines a collection of interoperable tool categories that collaborate to produce a reusable architecture for interactive animated contents. In the context of AFX, a tool represents functionality such as a BIFS node, a synthetic stream, or an audio-visual stream.
AFX utilizes and enhances existing MPEG-4 tools, while keeping backward-compatibility, by offering:
Compression of animated paths and animated models is required for improving the transmission and storage efficiency of representations for dynamic and static tools.
Work is ongoing on MPEG-4 part 10, 'Advanced Video Coding'. This codec is being developed jointly with ITU-T, in the so-called Joint Video Team (JVT). The JVT unites the standards world's video coding experts in a single group. The work currently underway is based on earlier work in ITU-T on H.264 (formerly H.26L). H.264 and MPEG-4 part 10 will be the same. MPEG-4 AVC/H.264 is slated to be ready by the end of 2002.
There are two work items underway for improving audio coding efficiency even further.
a) Bandwidth extension
Bandwidth extension is a tool that gives a better quality perception over the existing audio signal, while keeping the existing signal backward compatible.
MPEG is investigating bandwidth extensions, and may standardize one or both of:
A single technology that addresses both of these signals is preferred. This technology shall be both forward and backward compatible with existing MPEG-4 technology. In other words, an MPEG-4 decoder can decode an enhanced stream and a new technology decoder can decode an MPEG-4 stream. There are two possible configurations for the enhanced stream: MPEG-4 AAC streams can carry the enhancement information in the DataStreamElement, while all MPEG-4 systems know the concept of elementary streams, which allows a second Elementary Stream for a given audio object, containing the enhancement information.
b) Parametric coding
The MPEG-4 standard already provides a parametric coding scheme for coding of general audio signals for low bit-rates (HILN, "Harmonic Individual Lines and Noise"). The extension investigates parametric coding of general audio signals for the higher quality range, to extend the capabilities currently provided by HILN. Whenever possible this technology will build upon the existing MPEG-4 HILN technology.
MPEG-4 provides a large and rich set of tools for the coding of audio-visual objects. In order to allow effective implementations of the standard, subsets of the MPEG-4 Systems, Visual, and Audio tool sets have been identified, that can be used for specific applications. These subsets, called ‘Profiles’, limit the tool set a decoder has to implement. For each of these Profiles, one or more Levels have been set, restricting the computational complexity. The approach is similar to MPEG-2, where the most well known Profile/Level combination is ‘Main Profile @ Main Level’. A Profile@Level combination allows:
Profiles exist for various types of media content (audio, visual, and graphics) and for scene descriptions. MPEG does not prescribe or advise combinations of these Profiles, but care has been taken that good matches exist between the different areas.
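The way a Profile limits the tool set and a Level bounds the complexity can be sketched as a simple conformance check. The tool names and complexity numbers are invented for illustration, and a single integer is only a stand-in for the several limits a real Level defines.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ProfileLevel:
    tools: FrozenSet[str]    # tool set the Profile requires a decoder to implement
    max_complexity: int      # abstract complexity bound set by the Level


def can_decode(decoder: ProfileLevel, stream_tools: FrozenSet[str],
               stream_complexity: int) -> bool:
    # Decodable if the stream uses only supported tools and stays
    # within the complexity bound.
    return (stream_tools <= decoder.tools
            and stream_complexity <= decoder.max_complexity)


# Tool names and numbers below are made up for illustration.
decoder = ProfileLevel(frozenset({"tool_a", "tool_b"}), max_complexity=100)

print(can_decode(decoder, frozenset({"tool_a"}), 80))
print(can_decode(decoder, frozenset({"tool_a", "tool_c"}), 80))
print(can_decode(decoder, frozenset({"tool_b"}), 150))
```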
The visual part of the standard provides profiles for the coding of natural, synthetic, and synthetic/natural hybrid visual content. There are five profiles for natural video content:
The profiles for synthetic and synthetic/natural hybrid visual content are:
Version 2 adds the following Profiles for natural video:
The Version 2 profiles for synthetic and synthetic/natural hybrid visual content are:
In subsequent Versions, the following Profiles were added:
Four Audio Profiles have been defined in MPEG-4 V.1:
Another four Profiles were added in MPEG-4 V.2:
Graphics Profiles define which graphical and textual elements can be used in a scene. These profiles are defined in the Systems part of the standard:
5.3.1 Profiles under Definition or Consideration
The following profiles were under development at the time of writing this Overview; their inclusion in the standard was highly likely, but not guaranteed.
Scene Graph Profiles (or Scene Description Profiles), defined in the Systems part of the standard, allow audiovisual scenes with audio-only, 2-dimensional, 3-dimensional or mixed 2-D/3-D content.
5.4.1 Profiles under definition
At the time of writing, the following profiles were likely to be defined:
Two MPEG-J Profiles exist: Personal and Main.
The personal profile addresses a range of constrained devices including mobile and portable devices. Examples of such devices are cell video phones, PDAs, personal gaming devices. This profile includes the following packages of MPEG-J APIs:
- Network
- Scene
- Resource
The Main profile addresses a range of consumer devices including entertainment devices. Examples of such devices are set top boxes, computer based multimedia systems etc. It is a superset of the Personal profile. Apart from the packages in the Personal profile, this profile includes the following packages of the MPEG-J APIs:
- Decoder
- Decoder Functionality
- Section Filter and Service Information
The Object Descriptor Profile includes the following tools:
- Object Descriptor (OD) tool
- Sync Layer (SL) tool
- Object Content Information (OCI) tool
- Intellectual Property Management and Protection (IPMP) tool
Currently, only one profile is defined that includes all these tools. The main reason for defining this profile is not subsetting the tools, but rather defining levels for them. This applies especially to the Sync Layer tool, as MPEG-4 allows multiple time bases to exist. In the context of Levels for this Profile, restrictions can be defined, e.g. to allow only a single time base.
MPEG carries out verification tests to check whether the standard delivers what it promises.
The test results can be found on MPEG's home page, http://www.cselt.it/mpeg/quality_tests.htm
The main results are described below; more verification tests are planned.
A number of MPEG-4's capabilities have been formally evaluated using subjective tests. Coding efficiency, although not the only MPEG-4 functionality, is an important selling point of MPEG-4, and one that has been tested more thoroughly. Error robustness has also been put to rigorous tests. Furthermore, scalability tests were done, and for one specific profile the temporal resolution stability was examined. Many of these tests address a specific profile.
In this Low and Medium Bitrates Test, frame-based sequences were examined, with MPEG-1 as a reference. (MPEG-2 would be identical for the progressive sequences used, except that MPEG-1 is a bit more efficient as it uses less overhead for header information.) The test uses typical test sequences for CIF and QCIF resolutions, encoded with the same rate control for both MPEG-1 and MPEG-4 to compare the coding algorithms without the impact of different rate control schemes. The test was performed for low bit rates starting at 40 kbps to medium bit rates up to 768 kbps.
The tests of the Coding Efficiency functionality show a clear superiority of MPEG-4 over MPEG-1 at both the low and medium bit rate coding conditions, whatever the criticality of the scene. The human subjects have consistently chosen MPEG-4 as statistically significantly superior, by a difference of one point on a full scale of five points.
The verification tests for Content Based Coding compare the visual quality of object-based versus frame-based coding. The major objective was to ensure that object-based coding can be supported without impacting the visual quality. Test content was chosen to cover a wide variety of simulation conditions, including video segments with various types of motions and encoding complexities. Additionally, test conditions were established to cover low bit rates ranging from 256 kb/s to 384 kb/s, as well as high bit rates ranging from 512 kb/s to 1.15 Mb/s.
The results of the tests clearly demonstrated that object-based functionality is provided by MPEG-4 with no overhead or loss in terms of visual quality, when compared to frame-based coding. There is no statistically significant difference among any object-based case and the relevant frame-based ones. Hence the conclusion: MPEG-4 is able to provide content-based functionality without introducing any loss in terms of visual quality.
The formal verification tests on the Advanced Coding Efficiency (ACE) Profile were performed to check whether three new Version 2 tools, as included in the MPEG-4 Visual Version 2 ACE Profile (Global Motion Compensation, Quarter Pel Motion Compensation and Shape-adaptive DCT), enhance the coding efficiency compared with MPEG-4 Visual Version 1. The tests explored the performance of the ACE Profile and the MPEG-4 Visual Version 1 Main Profile in the object-based low bit rate case, the frame-based low bit rate case and the frame-based high bit rate case. The results obtained show a clear superiority of the ACE Profile compared with the Main Profile; in more detail:
- For the object-based case, the quality provided by the ACE Profile at 256 kb/s is equal to the quality provided by the Main Profile at 384 kb/s.
- For the frame-based low bit rate case, the quality provided by the ACE Profile at 128 kb/s and 256 kb/s is equal to the quality provided by the Main Profile at 256 kb/s and 384 kb/s respectively.
- For the frame-based high bit rate case, the quality provided by the ACE Profile at 768 kb/s is equal to the quality provided by the Main Profile at 1024 kb/s.
When interpreting these results, it must be noted that the MPEG-4 Main Profile is already more efficient than MPEG-1 and MPEG-2.
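The equal-quality bitrate pairs reported for the ACE Profile can be turned into relative bitrate savings with simple arithmetic. The sketch below only restates the figures given in the text; the function name and the definition of "saving" (relative to the Main Profile bitrate) are choices made here, not part of the test report.

```python
# (case, ACE bitrate, Main bitrate) in kb/s, taken from the results in the text.
results = [
    ("object-based", 256, 384),
    ("frame-based low", 128, 256),
    ("frame-based low", 256, 384),
    ("frame-based high", 768, 1024),
]


def saving_percent(ace: int, main: int) -> float:
    # Share of the Main Profile bitrate that the ACE Profile does not need
    # to reach the same subjective quality.
    return 100.0 * (main - ace) / main


for case, ace, main in results:
    print(f"{case}: {ace} vs {main} kb/s -> {saving_percent(ace, main):.1f}% saved")
```

On these figures the saving lies between a quarter and a half of the Main Profile bitrate, depending on the operating point.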
The performance of error resilient video in the MPEG-4 Simple Profile was evaluated in subjective tests simulating MPEG-4 video carried in a realistic multiplex and over similarly realistic radio channels, at bitrates between 32 kbit/s and 384 kbit/s. The test used a simulation of the residual errors after channel coding at bit error rates up to 10^-3, and the average length of the burst errors was about 10 ms. The test methodology was based on a continuous quality evaluation over a period of three minutes. In such a test, subjects constantly score the degradation they experience.
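To get a feeling for these test conditions, the figures can be converted into bit counts. This is a back-of-the-envelope sketch under stated assumptions: 1 kbit is taken as 1000 bits, the bit error rate is read as 10^-3, and the calculation gives long-run averages only, ignoring how errors cluster in bursts.

```python
BER = 1e-3        # residual bit error rate after channel coding (upper bound in the test)
BURST_MS = 10     # average burst length in milliseconds


def expected_errors_per_second(bitrate_bps: int, ber: float = BER) -> float:
    # Long-run average number of errored bits per second.
    return bitrate_bps * ber


def bits_per_burst(bitrate_bps: int, burst_ms: int = BURST_MS) -> int:
    # Number of transmitted bits that one average-length burst spans.
    return bitrate_bps * burst_ms // 1000


for kbps in (32, 384):
    bps = kbps * 1000  # assumes 1 kbit = 1000 bits
    print(kbps, "kbit/s:",
          round(expected_errors_per_second(bps)), "errored bits/s on average,",
          bits_per_burst(bps), "bits span one average burst")
```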
The results show that the average video quality achieved on the mobile channel is high, that the impact of errors is effectively kept local by the tools in MPEG-4 video, and that the video quality recovers quickly at the end of periods of error. These excellent results were achieved with very low overheads, less than those typically associated with the GOP structure used in MPEG-1 and MPEG-2 video.
The performance of error resilient video in the MPEG-4 ARTS Profile was checked in subjective tests similar to those mentioned in the previous section, at bitrates between 32 kbit/s and 128 kbit/s. In this case, the residual error rate after channel coding was up to 10^-3, and the average length of the burst errors was about 10 ms (called “critical”) or 1 ms (called “very critical”; this case is more critical because the same number of errors is spread more widely over the bitstream than in the “critical” case).
The results show a clear superiority of the ARTS Profile over the Simple Profile for both error cases (“critical” and “very critical”). In more detail, the ARTS Profile outperforms the Simple Profile in the recovery time from transmission errors. Furthermore, the ARTS Profile in the “critical” error condition provides results that for most of the test time are close to complete transparency, while the Simple Profile is still severely affected by errors. These excellent results were achieved with very low overheads and the very fast error recovery provided by NEWPRED, and under low delay conditions.
This test explored the performance of a video codec using the Dynamic Resolution Conversion technique, which adapts the resolution to the video content and to circumstances in real time. Active scene content was coded at data rates of 64 kbit/s, 96 kbit/s and 128 kbit/s. The results show that at 64 kbit/s, it outperforms the already effective Simple Profile operating at 96 kbit/s, and at 96 kbit/s, the visual quality is equal to that of the Simple Profile at 128 kbit/s. (The Simple Profile already compares well to other, existing systems.)