INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11N
March 2000
MPEG-4 Video
1. What are the main differences with respect to MPEG-1/2 in terms of requirements and functionalities?
The most important goal of both MPEG-1 and MPEG-2 was to make the storage and transmission of digital AV material more efficient, by compressing the data. Therefore, they deal with frame-based video and audio. Interaction with the content is limited to the video frame level, with its associated audio.
The new MPEG-4 standard goes beyond these goals by specifying a description of digital AV scenes in the form of AV objects that have certain relations in space and time.
Starting from this structure, MPEG-4 will:
Moreover, MPEG-4 will allow universal access to multimedia information, by taking into account specificities of a wide variety of networks.
2. What are the formats that are supported?
MPEG-4 Video supports :
These supported formats are given by the Requirements Group, and may still change in the future.
3. What are the bitrates that are supported ?
MPEG-4 Video is optimized for :
It shall support both constant bitrate (CBR) and variable bitrate (VBR).
These supported bitrate modes are given by the Requirements Group, and may still change in the future.
4. How is it related with ITU-T studies (namely H.263+)?
Although the schedules and requirements for MPEG-4 Video and the ITU-T's video coding efforts differ, their requirements overlap in many ways. For example, both efforts seek at least to define a standard that can efficiently code natural video at bit rates in the range of 24 kbit/s to 64 kbit/s. Because of this overlap, many participants in MPEG-4 Video also participate in the ITU-T's LBC effort. Through these common members and through formal liaison statements, the MPEG-4 Video group and the ITU-T frequently share their results with each other, in the hope of improving their standards and achieving a high degree of interoperability between them.
5. Where do the needs come from? What are the targeted applications?
At the beginning of the work on MPEG-4, the objective of the new standard was to address very low bitrate coding issues. Its target was then considerably modified to take the changes in the audiovisual environment into account, by addressing the new demands that arise in a world in which more and more audiovisual material is exchanged in digital form. New issues to be covered by the standard were, for instance, interactivity with the content, and improved compression for storage and for transmission over channels of limited capacity.
Targeted applications are for instance :
6. What are exactly the functionalities that are supported by MPEG-4 Video?
MPEG-4 supports eight key functionalities, which can be grouped into three classes:
There are 5 different object types for representing natural video information.
The Simple object type uses a subset of the tools in Core, and Core in turn uses a subset of the tools in Main. The tools in the Simple Scalable object type are a superset of the tools in Simple, while the N-bit object type is a superset of Core (and hence also of Simple).
There is one special object type for representing still natural visual information:
The following object types use synthetic tools, some of them in combination with natural video texture:
The visual profiles determine which visual object types can be present in the scene. This is also the way they are defined: as a list of admissible object types. Quite a few of them correspond to the most complicated object that they support, and they also have similar names. Below we will list the profiles and mention some application areas. Note again that these are only suggestions and that profiles were not designed for specific applications. This is also why their names are generic and refer to tools rather than applications or services.
A partial hierarchy exists in the visual profiles, the same hierarchy that we described above for the corresponding object types. This means that Main is a superset of Core, which in turn is a superset of Simple. N-bit is a superset of Core. Simple Scalable is a superset of Simple, in such a way that a Simple profile decoder can decode the base layer of a Simple Scalable bitstream.
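The partial hierarchy described above can be sketched as a small directed graph. The Python below is purely illustrative: the names `DIRECT_SUPERSET_OF`, `contained_profiles` and `is_superset` are hypothetical, and only the relations stated in this FAQ are encoded.

```python
# Direct superset relations as stated in this FAQ
# (profile -> profiles it directly contains). Names are illustrative only.
DIRECT_SUPERSET_OF = {
    "Simple": set(),
    "Core": {"Simple"},
    "Main": {"Core"},
    "N-bit": {"Core"},
    "Simple Scalable": {"Simple"},
}


def contained_profiles(profile):
    """Return every profile that `profile` is a superset of, including itself."""
    seen = {profile}
    stack = [profile]
    while stack:
        current = stack.pop()
        for child in DIRECT_SUPERSET_OF[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_superset(a, b):
    """True if profile `a` is a superset of profile `b` under the stated hierarchy."""
    return b in contained_profiles(a)


if __name__ == "__main__":
    print(is_superset("Main", "Simple"))           # True, via Core
    print(is_superset("Simple Scalable", "Core"))  # False, no relation stated
```

Because the hierarchy is only partial, some pairs are not comparable: in this sketch neither Main nor N-bit is reported as a superset of the other.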
Work in MPEG is ongoing for amendments (additions) to the standard, beyond Version 2. MPEG has started research on MPEG-4 for Studio applications, notably in the visual area, which requires considerably higher bitrates than are currently supported. If this work is indeed continued (and there is every reason to believe that it will) then several new (visual) profiles are anticipated. Another research item probably leading to one or more new visual profiles is visual fine-grain scalability, a much desired feature already present in MPEG-4 audio.
9. How does it compare to existing standards in terms of compression efficiency?
Over a wide range of bitrates (see FAQ 3), MPEG-4 will provide compression efficiency equal to or better than that of existing standards.
10. Does a description of the algorithm exist? Is it available?
Yes: the MPEG-4 Video verification model (VM) is the description of the tools and algorithms currently forming MPEG-4 Video. After the first, competitive phase between different algorithms, the emphasis has now shifted to the second, collaborative phase, which aims to improve the performance of these tools and to make the Video VM converge to a more stable state.
The Video VM is refined with every MPEG meeting by (more or less) slight modifications, or inclusions of new techniques. New proposals of algorithms are cross-checked by a «core-experiment» process, where these algorithms are independently implemented and verified according to predefined test conditions by at least two independent institutions. If it is agreed by the whole group that the new algorithm offers new functionalities or better performance in terms of coding efficiency or implementation complexity, the algorithm is included in the MPEG-4 Video VM.
The textual form of the MPEG-4 Video VM is verified by independent software implementations (see FAQ 11) to make sure that the text is written clearly enough to be understood by experts in the field of video compression.
The VM description is available to every MPEG member.
11. Does an implementation of the video codec exist? Is it available? How can we get it: is there any registration or payment, and are there problems of copyrights and patents?
There are two official MPEG implementations of the Video codec: one in C provided by the European project ACTS-MOMUSYS, and one in C++ provided by Microsoft. Both programs implement what is described in the VM description document (see FAQ 10), and evolve from meeting to meeting by integrating the changes made to that description.
Any MPEG member can freely download the source code of these implementations, which is available on the MPEG ftp site. MPEG copyright then applies.
12. Does a hardware implementation of the video codec also exist?
Several companies are working on special-purpose hardware to support MPEG-4 video. Given the extra functionalities with respect to previous MPEG standards, it can be expected that MPEG-4 hardware implementations will have to provide more flexibility. This could, for example, be achieved by programmable processors or by embedded cores in application-specific VLSI. One reason for this need for flexibility is that MPEG-4 supports arbitrarily shaped objects, in contrast to previous block-based schemes. Rectangular objects are nevertheless supported as a special case in MPEG-4.
See FAQ 13 for more details on computational complexity.
13. What is the computational complexity of the codec?
The MPEG-4 Video VM consists of several tools with different demands on computational power. For several application fields, collections of these tools are specified as «profiles», which have different computational demands. Besides the profiles and tools used for (de)coding, the computational complexity also depends on image size and frame rate.
It is expected that, for small video formats (QCIF, 176x144 pixels), decoding is possible at an acceptable frame rate with an average PC at IS stage (<98), enabling solutions for cost-sensitive markets. Larger image formats might require special hardware accelerators or special processor extensions at the decoder as well as at the encoder. By proper selection of MPEG-4 tools, complexity similar to or lower than that of MPEG-1, MPEG-2 and ITU-T H.263 (at the same image size and frame rate) seems possible for specific applications.
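To give a rough feel for how image size and frame rate scale the decoding load, the following sketch counts 16x16 macroblocks per second. The QCIF size is taken from the text above; the CIF size (352x288) and both frame rates are assumptions chosen only for illustration, the function names are hypothetical, and macroblock throughput is a crude proxy for complexity that ignores which tools are in use.

```python
import math

MACROBLOCK_SIZE = 16  # edge length of a luminance macroblock, in pixels


def macroblocks_per_frame(width, height):
    """Number of 16x16 macroblocks needed to cover one frame."""
    return math.ceil(width / MACROBLOCK_SIZE) * math.ceil(height / MACROBLOCK_SIZE)


def macroblocks_per_second(width, height, frame_rate):
    """Macroblocks the decoder must process per second at the given frame rate."""
    return macroblocks_per_frame(width, height) * frame_rate


if __name__ == "__main__":
    # Frame rates below are illustrative assumptions, not values from the FAQ.
    qcif = macroblocks_per_second(176, 144, 15)  # 99 macroblocks * 15 frames/s
    cif = macroblocks_per_second(352, 288, 30)   # 396 macroblocks * 30 frames/s
    print(qcif, cif, cif / qcif)
```

Under these assumptions the larger format at the higher frame rate requires eight times the macroblock throughput of the QCIF case, which is consistent with the remark that larger formats may need hardware assistance.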
Some of the new MPEG-4 tools require more computational power, but in return provide higher compression efficiency or new functionalities. Techniques which are considered to add computational complexity compared to the previous MPEG video standards include: shape coding of arbitrarily shaped objects, sprite generation, macroblock padding for arbitrarily shaped objects, and the rendering system at the decoder. Experience with previous video standards has shown that fast HW and SW implementations of computationally intensive algorithms were found soon (compare e.g. DCT/IDCT or motion estimation).
Besides the computational complexity, the memory requirement is also essential for cost-sensitive markets. One of the factors determining the required memory is the number and size of the video objects supported. For profiles with low complexity demands, the maximum number of video objects is limited.
14. How is segmentation handled?
To employ some new audiovisual functionalities of MPEG-4, arbitrarily shaped video objects have to be provided by a segmentation process, which can be performed offline (non-realtime) or online (realtime), by automatic tools or semi-automatically. Though some segmentation techniques are under study within and outside MPEG, the segmentation process will not be part of the MPEG-4 standard.
A lot of content is already available in the form of video objects (e.g. graphic games). For cases where segmentation is necessary, the required features will depend on the application.
For example, for realtime communication applications it is considered useful to segment the scene into background and foreground object (i.e. the communication partner); this could be a rough segmentation performed in real time. For broadcast applications, techniques like «blue-boxing» (i.e. chroma keying) have been used for some time now in television studios and can be applied in MPEG-4 to obtain pre-segmented video material. Other applications of segmentation techniques include object tracking, enabling MPEG-4 interactivity.
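As an illustration of how chroma keying can yield a pre-segmented object, the following minimal sketch derives a binary shape mask from an RGB frame. It is not part of MPEG-4 (segmentation is outside the standard): the function name `chroma_key_mask`, the key colour, the tolerance and the simple colour-distance rule are all assumptions made for this example, and studio keyers are considerably more sophisticated.

```python
def chroma_key_mask(frame, key=(0, 0, 255), tolerance=60):
    """Return a binary mask: 1 = foreground object, 0 = key-coloured background.

    `frame` is a list of rows; each row is a list of (R, G, B) tuples.
    `key` and `tolerance` are illustrative defaults, not standardized values.
    """
    mask = []
    for row in frame:
        mask_row = []
        for r, g, b in row:
            # Sum of absolute channel differences to the key colour;
            # a small distance means the pixel belongs to the blue background.
            distance = abs(r - key[0]) + abs(g - key[1]) + abs(b - key[2])
            mask_row.append(0 if distance <= tolerance else 1)
        mask.append(mask_row)
    return mask


if __name__ == "__main__":
    blue = (0, 0, 255)
    skin = (200, 150, 120)
    frame = [
        [blue, blue, blue],
        [blue, skin, blue],
    ]
    print(chroma_key_mask(frame))  # [[0, 0, 0], [0, 1, 0]]
```

A mask of this kind describes the shape of an arbitrarily shaped video object, which is the kind of input that the object-based functionalities rely on.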
15. Where are you in the process of specifying the MPEG-4 standard?
The visual parts of MPEG-4 version 1 and version 2 have been approved as International Standards. However, much work is still required for version 3 (studio profile) and version 4 (streaming profile).
16. Who is the chairperson of the video group? How can I contact him?
MPEG is composed of several main groups, each in charge of a different key issue of the standard to come. Among them are: Requirements, Video, Audio, SNHC (Synthetic and Natural Hybrid Coding), Systems, Implementation, and Tests.
The Video group is chaired by Thomas Sikora (sikora@hhi.de).
17. Are there some e-mail reflectors? What are their scope and purpose? How to subscribe?
There is a general MPEG reflector, which serves all general MPEG matters. To subscribe to this reflector, each individual has to address a request to his MPEG head of national delegation (from the National Body of his country).
In addition, the different MPEG groups often set up their own reflectors, in order to coordinate the work of their members between meetings and to allow open discussion of arising problems or important issues.
Any MPEG member can subscribe to these reflectors by simply contacting the chairpersons of the corresponding groups.