The vision and the role of MPEG-4 in the future of multimedia
Leonardo Chiariglione, CSELT - Italy

The multimedia world of today did not come about the way it was expected to. Instead of receiving Gbit/s of multimedia data through optical fibres, people today receive digital television programs through satellite and cable, talk and exchange messages and files on mobile phones, watch movies from DVD, find all sorts of information and maintain all sorts of relationships on the web, listen to music compilations downloaded from the web, watch postage-stamp-size video from the web, play computer games on game consoles, and so on.

Convergence, that much-abused word, is nowhere to be seen. The world is populated by vertical systems in which proprietary technologies abound. This is all the more surprising if one considers that the basic information units, audio and video, are technologically all the same.

Started in July 1993, MPEG-4 has grown into a very comprehensive and industry-neutral set of tools capable of satisfying the needs of the multimedia world:

  1.  it is agnostic to delivery system and transport, so that users (both content providers and end users) can effectively abstract from the transport layer and the layers below it;

  2.  it provides a full set of compression tools for audio (speech and music) and video, from very low to very high bitrates, supporting a multiplicity of functionalities;

  3.  it provides tools to represent special types of synthetic audio-visual information, such as synthetic music, character strings annotated with other information, human faces and bodies;

  4.  it provides efficient tools for compressing time-varying 2D and 3D objects;

  5.  it enables bit-efficient composition in a 2D or 3D space of different objects;

  6.  it comprises a framework supporting Management and Protection of content that is being extended to provide interoperability at the level of protected content.

MPEG-4 is therefore capable of providing the technology platform on top of which the world of multimedia can flourish. This is already happening, but there is a long way to go.

Some examples:

  1. Fixed-line terminals connected by ISDN or ADSL can receive high-quality moving pictures and audio in streaming mode, but this should also be possible on mobile terminals, which can only use a few tens of kbit/s;

  2. Rights holders would like to exploit the benefits of music distribution over the web to deliver high-quality audio to mobile and portable devices in such a way that their rights are not compromised;

  3. The chimera of offering web services on TV sets has attracted many companies, which have invested resources with no results, because television cannot be extended with an alien paradigm; it can only be extended with a compatible multimedia paradigm.

Unfortunately, the fact that the premises are there is no a priori guarantee that things will happen, because industries, and companies within industries, tend to operate with remarkable shortsightedness. Some examples:

  1. VRML has largely failed because VRML files, in which information is encoded as characters, were too big to be carried by today’s Internet. Still, 3GPP, an initiative to develop specifications for 3rd generation mobile networks, is adopting SMIL, which again uses characters to describe media composition. The justification given is that bit efficiency is not important, because composition information is used only occasionally and is in any case small. That may be so on the devices evolving from today’s cellphones, but there is no reason why the two assumptions should hold for PDAs with larger screens, which are likely to appear at about the same time. We are therefore likely to find two incompatible types of devices, which will artificially segment the market and undermine the chances of success of an environment about which concerns are already being loudly raised.

  2. The desire of rights holders to retain control of their assets can only be shared, but the desire to create walled gardens, which users enter to consume protected content from only one source, cannot. MP3 has shown that consumers have plenty of technology that allows them to access and consume music that is free and indistinguishable from music that is purchased. The idea that consumers will leave free content based on a technology that offers total interoperability, and move to paid-for content based on technologies that create walled gardens, is so naïve as to border on insanity.
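The bit-efficiency argument in the first example can be made concrete with a toy comparison. The sketch below is only an illustration under stated assumptions: the scene, the tag and attribute names and the binary layout are all hypothetical, and are not actual VRML, SMIL or MPEG-4 syntax. It encodes the same 100 object placements once as character strings and once as fixed-size binary records, then compares the sizes.

```python
import struct

# Hypothetical scene: 100 objects, each with an id and a 3D position.
# Neither encoding below is real VRML, SMIL or MPEG-4 syntax.
objects = [(i, 1.5 * i, -0.25 * i, 10.0) for i in range(100)]


def encode_text(objs):
    """Encode each placement as one line of markup-like characters."""
    lines = [
        f'<object id="{oid}" x="{x:.2f}" y="{y:.2f}" z="{z:.2f}"/>'
        for oid, x, y, z in objs
    ]
    return "\n".join(lines).encode("utf-8")


def encode_binary(objs):
    """Encode each placement as a 16-bit id plus three 32-bit floats."""
    # ">Hfff": big-endian, no padding, 2 + 3 * 4 = 14 bytes per object.
    return b"".join(struct.pack(">Hfff", oid, x, y, z) for oid, x, y, z in objs)


text_bytes = encode_text(objects)
binary_bytes = encode_binary(objects)

print("text encoding:  ", len(text_bytes), "bytes")
print("binary encoding:", len(binary_bytes), "bytes")
print("ratio: %.1f" % (len(text_bytes) / len(binary_bytes)))
```

Even this naive fixed-width layout comes out several times smaller than the character form. The gap widens as composition information grows with larger screens and richer scenes, which is the case the example argues should not be ruled out.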

Being aware of the dangers is one way to avoid them. I am sure that, maybe not at the first try, industries and companies will eventually see the shortsightedness of creating islands of products, services and applications, and will fully embrace the MPEG-4 technologies. MPEG-4 users gain access to technological tools that have been designed to operate separately, to satisfy individual application needs, and that can still be combined to provide more sophisticated applications, because they have been designed to allow that. They also have the assurance that MPEG-4 versioning will keep the technology moving. Existing audio and video compression tools will be upgraded, as the new 3D object compression, the ongoing audio call for evidence and the video call for proposals show. New system-level functionalities will be added, as the character-to-binary XMT compiler encompassing SMIL and X3D and the Multiuser World call for proposals show.

In addition, MPEG-4 users will benefit from the soon-to-be-completed developments in the MPEG-7 area, which will give users the ability to innovate in the way content is accessed and consumed, and from those in the MPEG-21 area, which will create the foundations of new forms of content usage for a networked society in which the world is indeed a village.