INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

 

ISO/IEC JTC1/SC29/WG11N
March 2000

 

MPEG-4 Video 

Frequently Asked Questions

 

 

1. What are the main differences with respect to MPEG-1/2 in terms of requirements and functionalities?

The most important goal of both MPEG-1 and MPEG-2 was to make the storage and transmission of digital AV material more efficient, by compressing the data. Therefore, they deal with ‘frame-based video’ and audio. Interaction with the content is limited to the video frame level, with its associated audio.

The new MPEG-4 standard goes beyond these goals by specifying a description of digital AV scenes in the form of ‘AV objects’ that have certain relations in space and time.

Starting from this structure, MPEG-4 will:

Moreover, MPEG-4 will allow ‘universal access’ to multimedia information, by taking into account specificities of a wide variety of networks.

 2. What are the formats that are supported?

 MPEG-4 Video supports :

These supported formats are given by the Requirements Group, and may still change in the future.

 3. What are the bitrates that are supported ?

MPEG-4 Video is optimized for :

It shall support both constant bitrate (CBR) and variable bitrate (VBR).

These supported bitrate modes are given by the Requirements Group, and may still change in the future.

 4. How is it related with ITU-T studies (namely H.263+)?

Although the schedules and requirements for MPEG-4 Video and ITU-T’s video coding efforts are different, their requirements overlap in many ways. For example, both efforts seek at least to define a standard that can efficiently code natural video at bit rates in the range of 24 kbit/s to 64 kbit/s. Because of this overlap in requirements, many participants in MPEG-4 Video also participate in the ITU-T’s LBC effort. Through these common members and through formal liaison statements, MPEG-4 Video and ITU-T frequently share their results with each other in the hope of improving their standards and achieving a high degree of interoperability between them.

 5. Where do the needs come from? What are the targeted applications?

At the beginning of the work on MPEG-4, the objective of the new standard was to address very low bitrate coding issues. Its target was then considerably modified to take the changes in the audiovisual environment into account, by addressing the new demands that arise in a world in which more and more audiovisual material is exchanged in digital form. New issues to be covered by the standard were, for instance, interactivity with the content, and improved compression for storage and transmission with limited capacity.

Targeted applications are for instance :

 6. What are exactly the functionalities that are supported by MPEG-4 Video?

MPEG-4 supports eight key functionalities, which can be grouped into three classes:

 

 7. What are the different Visual Object types supported by MPEG-4?

There are 5 different object types for representing natural video information.

  1. The Simple object type is an error resilient, rectangular natural video object of arbitrary height/width ratio, developed for low bitrates. It uses relatively simple and inexpensive coding tools, based on I (Intra) and P (Predicted) VOPs (Video Object Planes, the MPEG-4 term for frames).
  2. The Simple Scalable object type is a scalable extension of Simple, which gives temporal and spatial scalability using Simple as the base layer. The enhancement layer is still rectangular.
  3. The Core object type uses a tool superset of Simple, giving better quality through the use of bi-directional interpolation (B-VOPs), and it has binary shape. It supports scalability based on sending extra P (predicted) VOPs. Note that binary shape can include a constant transparency but excludes the variable transparency offered by grey-scale shape coding.
  4. The Main object type is the video object that gives the highest quality. Compared to Core, it also supports grey-scale shape, sprites, and interlaced content in addition to progressive material.
  5. The N-bit object type is equal to the Core object type but it can vary the pixel depth from 4 to 12 bits for the luminance as well as the chrominance planes.

The Simple object type uses a subset of the tools in Core, and Core in turn uses a subset of the tools in Main. The tools in the Simple Scalable object type are a superset of the tools in Simple, while the N-bit object type is a superset of Core (and hence also of Simple).

There is one special object type for representing still natural visual information:

  1. The Still Scalable Texture object type gives an arbitrary shape still image that uses wavelet coding for scalability and incremental download and build-up.

The following object types use synthetic tools, some of them in combination with natural video texture:

  1. The Animated 2D Mesh object type combines a synthetic mesh (either rectangular or Delaunay topology) with natural video. The natural video coding uses the same tools as the Core object type. This video can be mapped onto the mesh and deformed by moving the points in the mesh. It gives interesting animation possibilities. Note that the object can be of arbitrary (binary) shape.
  2. The Basic Animated Texture object type allows mesh animation with arbitrary shape still images (the same images as used for the Still Scalable Texture object type, see above).
  3. The last object type is the Simple Face object type, which has the tools for facial animation. This object type does not define what the face looks like, and the animation can be applied to any local model of choice. Note that MPEG-4 does include tools to download a pre-defined face to the decoder, but these tools are not mandatory in the Simple Face object type.
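The tool hierarchy among the natural video object types described above is a partial order, which can be written down as a small lookup table. The following is only an illustrative sketch in Python: the names DIRECT_SUBSETS and tools_include are hypothetical and do not come from the standard, and the table encodes nothing beyond the superset relations stated in this answer.

```python
# Direct "tool superset" relations as stated in the FAQ text.
# Key: object type; value: object types whose tools it directly extends.
# (Hypothetical names, for illustration only.)
DIRECT_SUBSETS = {
    "Simple": set(),
    "Simple Scalable": {"Simple"},
    "Core": {"Simple"},
    "Main": {"Core"},
    "N-bit": {"Core"},
}


def tools_include(bigger: str, smaller: str) -> bool:
    """Return True if `bigger` uses a superset of the tools in `smaller`."""
    if bigger == smaller:
        return True
    # Follow the direct relations transitively (e.g. Main -> Core -> Simple).
    return any(tools_include(mid, smaller) for mid in DIRECT_SUBSETS[bigger])


print(tools_include("Main", "Simple"))           # True, via Core
print(tools_include("N-bit", "Simple"))          # True, via Core
print(tools_include("Simple Scalable", "Core"))  # False, not comparable
```

Note that Simple Scalable and Core are not comparable in this order, which is why the hierarchy is only partial.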

 

 8. What are the different profiles supported by MPEG-4?

The visual profiles determine which visual object types can be present in the scene. This is also the way they are defined: as a list of admissible object types. Quite a few of them correspond to the most complicated object that they support, and they also have similar names. Below we will list the profiles and mention some application areas. Note again that these are only suggestions and that profiles were not designed for specific applications. This is also why their names are generic and refer to tools rather than applications or services.

  1. The Simple Profile only accepts objects of type Simple, and was created with low complexity applications in mind. The first usage is mobile use of (audio)visual services, and the second is putting very low complexity video on the Internet. Small camera devices recording moving video to, e.g., disk or memory chips can also make good use of this profile. It supports up to four objects in the scene with, at the lowest level, a maximum total surface of a QCIF picture. There are 3 levels for the Simple Profile with bitrates from 64 to 384 kbit/s.
    The levels also define the maximum total surface for the objects and the amount of macroblocks per second that the decoder needs to be able to decode. Further, they define the size of various (hypothetical) buffers needed for decoding. While the maximum total object size is defined, the aspect ratio is not prescribed. This gives maximum creative freedom. It could be used for instance in a personal computer screen, where a very wide or a very tall object could be created, or several smaller objects in various places on the screen, not confined to a typical QCIF area.
    The same level philosophy is followed for restricting the complexity of the natural video objects in all the visual profiles.
  2. The Simple Scalable Profile can supply scalable coding in the same operational environments as foreseen for Simple, and has 2 levels defined.
  3. The Core Profile accepts Core and Simple object types. It is useful for higher quality interactive services, combining good quality with limited complexity and supporting arbitrary shape objects. Mobile broadcast services could also be supported by this profile. The maximum bitrate is 384 kbit/s at Level 1 and 2 Mbit/s at Level 2. While the levels do not prescribe the visual session size, they are created with a certain session size in mind, called the ‘typical visual session size’. For Simple this was QCIF; for Core it is QCIF and CIF for the two levels respectively. The number of macroblocks is chosen such that a scene using this typical session size can have overlapping objects and still be ‘filled’.
  4. The Main Profile was created with broadcast services in mind, addressing progressive as well as interlaced material. It combines the highest quality with the versatility of arbitrarily shaped objects using grey-scale coding. The highest level accepts up to 32 objects (of Simple, Core, or Main type) for a maximum total bitrate of 38 Mbit/s.
  5. The N-bit profile is useful for areas that use thermal imagers, such as surveillance applications. Also medical applications may want to use the enhanced pixel depth giving a larger dynamic range in colour and luminance. It accepts objects of type Simple, Core, and N-bit. Currently only one level is defined.
  6. The Scalable Texture Profile is meant for audiographic applications. It was requested by companies that want to build mobile devices, which combine sound with synchronously displayed pictures, and possibly BIFS-based graphics, in very simple terminals.
  7. The Simple Face Profile accepts only objects of type Simple Face. Depending on the level, either one or a maximum of 4 faces can appear in the scene, e.g., for a virtual meeting. Bitrates remain very low; even for the second level, 32 kbit/s is more than adequate for driving a maximum of four faces.
  8. The Hybrid Profile allows combining both natural and synthetic objects in the same scene while keeping complexity reasonable. On the natural side, it compares to the Core Profile, while on the synthetic side, it adds animated meshes, scalable textures, and animated faces, a rich set of tools for creating attractive hybrid natural and synthetic content. This profile can be used to place ‘real’ objects into a synthetic world and also to do the opposite, adding synthetic objects to a natural environment.
  9. The Basic Animated Texture Profile allows animation of still pictures and facial animation. Attractive content can be created at very low bitrates.

A partial hierarchy exists in the visual profiles, the same hierarchy that we described above for the corresponding object types. This means that Main is a superset of Core, which in turn is a superset of Simple. N-bit is a superset of Core. Simple Scalable is a superset of Simple, in such a way that a Simple Profile decoder can decode the base layer of a Simple Scalable bitstream.
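Since a visual profile is defined as a list of admissible object types, checking whether a scene fits a profile reduces to a set-membership test. The sketch below is illustrative only: it covers just those profiles whose admissible object types are stated explicitly in the list above, it ignores the level constraints (number of objects, bitrate, macroblocks per second, buffer sizes), and the names PROFILE_OBJECT_TYPES, scene_fits_profile and profiles_for_scene are hypothetical.

```python
# Admissible object types per visual profile, limited to the profiles for
# which the FAQ text states the list explicitly. Level limits are ignored.
PROFILE_OBJECT_TYPES = {
    "Simple": {"Simple"},
    "Core": {"Simple", "Core"},
    "Main": {"Simple", "Core", "Main"},
    "N-bit": {"Simple", "Core", "N-bit"},
    "Simple Face": {"Simple Face"},
}


def scene_fits_profile(profile, scene_object_types):
    """Return True if every object type in the scene is admissible."""
    allowed = PROFILE_OBJECT_TYPES[profile]
    return all(obj_type in allowed for obj_type in scene_object_types)


def profiles_for_scene(scene_object_types):
    """Return the (alphabetically sorted) profiles that admit the scene."""
    return sorted(
        profile
        for profile in PROFILE_OBJECT_TYPES
        if scene_fits_profile(profile, scene_object_types)
    )


print(profiles_for_scene(["Simple", "Core"]))  # ['Core', 'Main', 'N-bit']
print(scene_fits_profile("Simple", ["Core"]))  # False
```

A scene made only of Simple objects is admitted by Simple, Core, Main and N-bit alike, which reflects the partial hierarchy described above.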

 Work in MPEG is ongoing for amendments (additions) to the standard, beyond Version 2. MPEG has started research on MPEG-4 for Studio applications, notably in the visual area, which requires considerably higher bitrates than are currently supported. If this work is indeed continued (and there is every reason to believe that it will) then several new (visual) profiles are anticipated. Another research item probably leading to one or more new visual profiles is visual fine-grain scalability, a much desired feature already present in MPEG-4 audio.

 9. How does it compare to existing standards in terms of compression efficiency?

Across a wide range of bitrates (see FAQ 3), MPEG-4 will provide compression efficiency equal to or better than that of existing standards.

 10. Does a description of the algorithm exist? Is it available?

It exists: the MPEG-4 Video verification model (VM) is the description of the tools and algorithms currently forming MPEG-4 Video. After a first competitive phase between different algorithms, the emphasis has now shifted to a second, collaborative phase that aims to improve the performance of these tools and make the Video VM converge to a more stable state.

The Video VM is refined with every MPEG meeting by (more or less) slight modifications, or inclusions of new techniques. New proposals of algorithms are cross-checked by a «core-experiment» process, where these algorithms are independently implemented and verified according to predefined test conditions by at least two independent institutions. If it is agreed by the whole group that the new algorithm offers new functionalities or better performance in terms of coding efficiency or implementation complexity, the algorithm is included in the MPEG-4 Video VM.

The textual form of the MPEG Video VM is verified by independent software implementations (see FAQ 11) to make sure that the text is written clearly enough to be understood by experts in the field of video compression.

The VM description is available to every MPEG member.

 11. Does an implementation of the video codec exist? Is it available? How can we get it: any registration, payment, or problems of copyrights and patents?

Two official MPEG implementations of the Video codec exist: one in C provided by the European project ACTS-MOMUSYS, and one in C++ provided by Microsoft. Both programs implement what is described in the VM description document (see FAQ 10), and evolve from meeting to meeting by integrating the changes to the description.

Any MPEG member can freely download the source code of these implementations, which is available on the MPEG ftp site. MPEG copyright then applies.

 12. Does also a hardware implementation of the video codec exist?

Several companies are working on special purpose hardware to support MPEG-4 video. Given the extra functionalities with respect to previous MPEG standards, it can be expected that MPEG-4 hardware implementations will have to provide more flexibility. This could for example be achieved by programmable processors or embedded cores in application specific VLSI. One of the reasons is that MPEG-4 is supporting arbitrarily shaped objects in contrast to previous block based schemes. However, rectangular shaped objects are supported as a special case in MPEG-4.

See FAQ 13 for more details on computational complexity.

 13. What is the computational complexity of the codec?

The MPEG-4 Video VM consists of several tools with different demands on computational power. For several application fields, collections of these tools are specified as «profiles», which have different computational demands. Beyond the type of (de)coding profiles/tools used, the computational complexity also depends on image size and frame rate.

For small video formats (QCIF, 176x144 pixels), decoding is expected to be possible at an acceptable frame rate on an average PC at IS stage (<98), enabling solutions for cost-sensitive markets. Larger image formats might require special hardware accelerators or special processor extensions at the decoder, as well as at the encoder. By proper selection of MPEG-4 tools, complexity similar to or lower than that of MPEG-1, MPEG-2 and ITU-T H.263 (at the same image size and frame rate) seems possible for specific applications.
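The dependence on image size and frame rate can be made concrete by counting macroblocks, the unit in which the levels express decoder throughput. The following Python sketch is a rough illustration under stated assumptions: a macroblock is taken to cover 16x16 luminance samples, the CIF size of 352x288 and the frame rate of 15 frames per second are example values that do not appear in this FAQ, and the function names are hypothetical.

```python
MACROBLOCK_SIZE = 16  # assumed: 16x16 luminance samples per macroblock


def macroblocks_per_frame(width, height):
    """Number of macroblocks needed to cover a width x height picture."""
    # Ceiling division, so partially covered macroblocks are counted too.
    cols = -(-width // MACROBLOCK_SIZE)
    rows = -(-height // MACROBLOCK_SIZE)
    return cols * rows


def macroblocks_per_second(width, height, frame_rate):
    """Decoder throughput required for a single full-frame object."""
    return macroblocks_per_frame(width, height) * frame_rate


print(macroblocks_per_frame(176, 144))       # QCIF: 11 x 9 = 99
print(macroblocks_per_frame(352, 288))       # CIF: 22 x 18 = 396
print(macroblocks_per_second(176, 144, 15))  # 99 * 15 = 1485
```

Going from QCIF to CIF at the same frame rate therefore quadruples the number of macroblocks to process, before any shape coding or rendering cost is counted.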

Some of the new MPEG-4 tools require higher computational power, but provide higher compression efficiency or new functionalities. Techniques which are considered to add computational complexity compared to the previous MPEG video standards include: shape coding of arbitrarily shaped objects, sprite generation, macroblock padding for arbitrarily shaped objects, and the rendering system at the decoder. Experience with previous video standards has shown that fast hardware and software implementations of computationally intensive algorithms were soon found (compare e.g. DCT/IDCT or Motion Estimation).

Besides the computational complexity, the memory requirement is also essential for cost-sensitive markets. One of the factors determining the required memory is the number and size of video objects supported. For profiles with low complexity demands, the maximum number of video objects is limited.

 14. How is segmentation handled?

To employ some new audiovisual functionalities of MPEG-4, arbitrarily shaped video objects have to be provided by a segmentation process, which can be performed offline (non-realtime) or online (realtime), by automatic tools or semi-automatically. Though some segmentation techniques are under study within and outside MPEG, the segmentation process will not be part of the MPEG-4 standard.

A lot of content is already available in the form of video objects (e.g. graphic games). For cases where segmentation is necessary, the features required will depend on the application.

For example, for realtime communication applications it is considered useful to segment the scene into the background and a foreground object (i.e. the communication partner); this could be a rough segmentation performed in real time. For broadcast applications, techniques like «blue-boxing» (i.e. chroma keying) have been used for some time now at television studios and can be applied for MPEG-4 to obtain pre-segmented video material. Other applications of segmentation techniques include object tracking, enabling MPEG-4 interactivity.
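As a rough illustration of how chroma keying can yield a binary shape for a video object, the sketch below marks a pixel as background when it is strongly blue and as foreground otherwise. It is not part of MPEG-4 (segmentation lies outside the standard), and everything in it is an assumption made for the example: the function name chroma_key_mask, the RGB pixel representation, and the threshold values min_blue and margin.

```python
def chroma_key_mask(pixels, min_blue=150, margin=60):
    """Return a binary mask for a frame: 1 = foreground, 0 = blue backdrop.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples with
    components in 0..255. Both thresholds are illustrative guesses.
    """
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            # Backdrop: blue is bright and clearly dominates red and green.
            is_backdrop = b >= min_blue and b - max(r, g) >= margin
            mask_row.append(0 if is_backdrop else 1)
        mask.append(mask_row)
    return mask


# A tiny 2x2 frame: three blue backdrop pixels and one skin-toned pixel.
frame = [
    [(10, 20, 240), (200, 180, 160)],
    [(15, 25, 230), (12, 18, 235)],
]
print(chroma_key_mask(frame))  # [[0, 1], [0, 0]]
```

The resulting mask plays the role of a binary shape; a production keyer would additionally handle soft edges and colour spill, which corresponds more closely to grey-scale shape.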

 15. Where are you in the process of specifying the MPEG-4 standard?

The visual parts of MPEG-4 version 1 and version 2 have been approved as International Standards. However, much work is still required for version 3 (studio profile) and version 4 (streaming profile).

16. Who is the chairperson of the video group? How can I contact him?

MPEG is composed of main groups, each one being in charge of a different key issue of the standard to come. Among them, we have : Requirements, Video, Audio, SNHC (Synthetic and Natural Hybrid Coding), Systems, Implementation, and Tests.

The Video group is chaired by Thomas Sikora (sikora@hhi.de).

17. Are there some e-mail reflectors? What are their scope and purpose? How to subscribe?

There is a general MPEG reflector, which supports all general MPEG matters. To subscribe to this reflector, each individual has to address a request to the head of his MPEG national delegation (from the National Body of his country).

In addition, the different MPEG groups often set up their own reflectors, in order to coordinate the work of their members between meetings and to allow open discussion of arising problems or important issues.

Any MPEG member can subscribe to these reflectors by simply contacting the chairpersons of the corresponding groups.