Constantly creating new ways for digital audio and video
by Leonardo Chiariglione – Convenor of ISO/IEC JTC 1/SC 29,
Coding of audio, picture, multimedia and hypermedia information
WG 11, Coding of moving pictures and audio
The media industry is riddled with acronyms coined by the Moving Picture Experts Group (MPEG). MPEG-1 is used by Video CD and contains MP3, the compression scheme used to transfer audio files via the internet and store them in portable players and digital audio servers. MPEG-2 is used in digital television (DTV) set-top boxes and Digital Versatile Disc (DVD). MPEG-4 Visual is used in digital cameras, video recorders and cell phones. MPEG-4 also contains Advanced Audio Coding (AAC), which compresses audio files twice as well as MP3, and High Efficiency AAC (HE-AAC), which offers further compression. Advanced Video Coding (AVC), found in MPEG-4 part 10, is the best video compressor known.
This babel of acronyms notwithstanding, the typical image of MPEG in the industry is one of an indefatigable group constantly bent on finding new ways to produce audio and video streams – not to mention synthetic audio and visual streams – requiring fewer bits, while preserving the original quality of the signal.
In its 18 years of activity, MPEG has done a lot of the above and also produced a number of standards that were not immediately related to audio and visual compression, but were required for realizing the use of the compression technology. A notable example is the Digital Storage Media Command and Control (DSM-CC), essentially a collection of application protocols for use in digital television dating back to 1995.
A multimedia framework
In 1999, MPEG embarked on a project designed to provide all the technologies required to manage the life cycle of digital media in a value chain. The project is called MPEG-21 and carries the fitting number ISO/IEC 21000, Multimedia Framework.
The first part of the standard is a technical report that lays down the scope of the MPEG-21 project, but the technical foundations are found in part 2, Digital Item Declaration (DID). A Digital Item (DI) is a structure of information that contains the resources (i.e. the MPEG audio, video, etc.), the identifiers of resources, metadata, Digital Rights Management (DRM) information, licenses, etc. A DI is expressed using the W3C eXtensible Markup Language (XML), a recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data.
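To make the idea of a Digital Item more concrete, the following is a minimal sketch in Python of an XML structure that bundles an identifier, metadata, a resource reference and a license reference. The element and attribute names (`Item`, `Identifier`, `Metadata`, `Component`, `Resource`, `License`, `ref`, `mimeType`), the function `build_digital_item` and the example values are assumptions made for illustration; they are not taken from the DID schema and the output is not a conforming MPEG-21 document.

```python
import xml.etree.ElementTree as ET


def build_digital_item(identifier, title, resource_uri, mime_type, license_uri):
    """Return a simplified, DID-like XML element describing one Digital Item.

    All element and attribute names are illustrative, not the normative schema.
    """
    item = ET.Element("Item")

    # Identifier and descriptive metadata travel with the item.
    ET.SubElement(item, "Identifier").text = identifier
    metadata = ET.SubElement(item, "Metadata")
    ET.SubElement(metadata, "Title").text = title

    # The resource (e.g. an MPEG audio file) is referenced, not embedded.
    component = ET.SubElement(item, "Component")
    ET.SubElement(component, "Resource", {"ref": resource_uri, "mimeType": mime_type})

    # Rights information is attached as a reference to a license.
    ET.SubElement(item, "License", {"ref": license_uri})
    return item


# Hypothetical example values.
item = build_digital_item(
    identifier="urn:example:song:0001",
    title="Example Song",
    resource_uri="song0001.mp3",
    mime_type="audio/mpeg",
    license_uri="license0001.xml",
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

The point of the sketch is only that one XML structure can carry the resource together with everything needed to identify, describe and govern it.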
The practical advantage of identifiers is being recognized in the analogue space by the proliferation of numbering schemes such as the ISO 2108, International Standard Book Number (ISBN) used to identify printed books or the recently approved ISO 15706, International Standard Audiovisual Number (ISAN) used to identify audiovisual works. Identifiers are simply indispensable in digital space.
MPEG-21 part 3, Digital Item Identification (DII) specifies how to uniquely identify DIs and their component elements. DII, however, does not specify new identification systems for elements for which identification and description schemes already exist, such as those mentioned above.
In many practical cases, particularly when dealing with delivery of content to end users, moving content across the value chain requires the use of technologies to manage and protect the Intellectual Property (IP) embedded in media content. MPEG calls these technologies Intellectual Property Management and Protection (IPMP). MPEG-21 part 4, IPMP components, provides a broad range of technologies that can be assembled to achieve specific goals in this area.
MPEG and IP management
Another important tool for IP management is provided by MPEG-21 part 5, Rights Expression Language (REL). In the physical world, the buyer becomes the owner of a car on the basis of a purchase contract that describes the terms and conditions binding the seller and the buyer. In the virtual space of bits, a technology is likewise needed to describe the terms and conditions that appear in a “license” issued by the seller of a song to its buyer. In this way the seller could decide to license the right to listen to his song once for 10 cents, the right to listen 10 times for 50 cents, or the right to listen to it indefinitely for 1 euro.
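The three offers in this example can be sketched as machine-readable terms. The Python below is a minimal illustration under stated assumptions: the `License` class, the `OFFERS` table and the `issue_license` function are hypothetical names, and the representation is a plain data structure rather than the XML syntax that REL actually defines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class License:
    """A hypothetical play license: price in euro cents and a play limit."""
    price_cents: int
    max_plays: Optional[int]  # None means the song may be played indefinitely
    plays_used: int = 0

    def may_play(self) -> bool:
        return self.max_plays is None or self.plays_used < self.max_plays

    def play(self) -> bool:
        """Record one play if the license still allows it."""
        if not self.may_play():
            return False
        self.plays_used += 1
        return True


# The three offers from the example: (price in cents, number of plays).
OFFERS = {
    "once": (10, 1),
    "ten_times": (50, 10),
    "unlimited": (100, None),
}


def issue_license(offer: str) -> License:
    price_cents, max_plays = OFFERS[offer]
    return License(price_cents=price_cents, max_plays=max_plays)


single = issue_license("once")
print(single.play())  # True: the first play is allowed
print(single.play())  # False: the single play has been used up
```

A device holding such a license only needs to consult the stated terms before each play, which is the behaviour a rights expression is meant to make predictable.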
The above case is straightforward, but there are numerous more complicated cases. Assume that the terms and conditions the seller wants to express contain “you may copy this song 3 times.” The meaning of the word “copy” may be shared by humans (actually it is not), but for a machine this is certainly not the case. There is a need to provide a clear and unambiguous description of the meaning of the words that are used in a rights expression (the semantics) so that the person who issues the license to a content item can be confident that all devices conforming to the MPEG-21 standard when receiving the license will behave in a predictable fashion.
This is the purpose of MPEG-21 part 6, Rights Data Dictionary.
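The role of a shared dictionary can be sketched as follows. In this Python illustration, every verb used in a license must resolve to an agreed definition before a device acts on it. The definitions, the `DICTIONARY` table, `UndefinedTermError` and `resolve_terms` are invented for the example and are not taken from the Rights Data Dictionary itself.

```python
# Hypothetical term definitions, written for this example only.
DICTIONARY = {
    "play": "Render the resource for perception without keeping a new instance.",
    "copy": "Create a new, persistent instance of the resource on any storage.",
}


class UndefinedTermError(Exception):
    """Raised when a license uses a verb that has no agreed meaning."""


def resolve_terms(grants):
    """Attach the dictionary definition to each (verb, limit) grant.

    Refusing undefined verbs means a device never has to guess
    what the issuer of the license intended.
    """
    resolved = []
    for verb, limit in grants:
        if verb not in DICTIONARY:
            raise UndefinedTermError(verb)
        resolved.append((verb, limit, DICTIONARY[verb]))
    return resolved


# "You may copy this song 3 times" as a (verb, limit) pair.
for verb, limit, meaning in resolve_terms([("copy", 3)]):
    print(verb, limit, meaning)
```

Two devices that share the same dictionary will interpret "copy" identically, which is the predictability the issuer of a license is looking for.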
The MPEG-21 standard currently has 18 parts, with several amendments under way, and it would be too bold to assume that many readers would stay with this article through a description of all 18. Instead I would like to address another important recent area of MPEG standards, namely ISO/IEC 23000, MPEG-A, Multimedia Application Formats.
This new family of MPEG standards has been triggered by the consideration that while MPEG has developed an impressive portfolio of digital media related technologies that has led it to become the reference standards committee for the media industry, integration of different MPEG technologies has been left to individual implementers. In some cases this may lead to shortcomings, such as the long period of time it may take to go from an MPEG standard to a product or the incompatibility between different implementations.
Music, photos and more…
MPEG-A attempts to respond to these shortcomings. The first standard is called Music Player Application Format and is driven by the desire to enable users to achieve an augmented experience of their sound resources by providing an “extended” MP3 format. This is achieved by making a standard combination of certain MPEG technologies, foremost the MP3 Audio compression, but also the MPEG-4/MPEG-21 file formats and the ID3 (a widely used set of metadata for describing MP3 songs) subset of MPEG-7 metadata. Additionally, there is the related technology – JPEG still-picture compression. The MPEG music player has already achieved International Standard status and several extensions are being considered, including the use of technologies designed to enable the distribution of protected songs, a well-known problem faced by the music industry.
MPEG is in the process of developing another MPEG-A standard, the MPEG photo player. This is driven by the needs of people using digital photo cameras to navigate the multitude of photos they generate. In this case, too, MPEG has drawn from its toolkit and built an effective solution. The key technology is again the MPEG-4 file format, supplemented by a number of technologies drawn from the MPEG flagship metadata standard MPEG-7, especially from MPEG-7 Visual, that enable an effective way to describe the visual properties of an image. Added to these are JPEG and Exif (Exchangeable image file format), the latter available in many photo cameras.
Expect MPEG to continue grinding audio and video bits, and also expect more news from MPEG about making its audio and video standards more useful for the coming digital media era.
About the author
Leonardo Chiariglione obtained his PhD degree from the University of Tokyo in 1973. During his professional career he launched several initiatives, including MPEG in 1988 and, most recently, the Digital Media Project in 2003. He currently advises a number of companies in the area of digital media.