INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N2431
October 1998/Atlantic City
| Source | Audio Subgroup |
| Title | MPEG Audio FAQ Version 9 |
| Authors | D. Thom, H. Purnhagen, and the MPEG Audio Subgroup |
MPEG-7 is officially called "Multimedia Content Description Interface": a means of attaching meta-data to multimedia content.
What is the general relationship to previous MPEG efforts?
Where MPEG-1 and -2 concentrated almost entirely on compression, MPEG-4 moved to a higher level of abstraction by coding objects and applying content-specific coding techniques to them. MPEG-7 moves to an even higher level of abstraction: a kind of cognitive coding, some might say.
In principle, MPEG-1, -2, and -4 are designed to represent the information itself, while MPEG-7 is meant to represent information about the information (although MPEG-4 and MPEG-7 have some areas in common; see below). Another way of looking at it is that MPEG-1, -2, and -4 made content available, while MPEG-7 allows you to find the content you need.
What are potential connections between MPEG-7 tools and existing tools?
There are many possible connections between MPEG-4 tools and MPEG-7. Most of the content-specific tools in MPEG-4 have great potential for MPEG-7 because a model for the content is already specified: by choosing a method of coding, one selects the features that are important to the material. For example, if a sound is encoded as sinusoidal tracks, the MPEG-7 question becomes which of those tracks are most significant in distinguishing the sound. The task is to abstract the coding parameters up to the point where similarity can be measured.
The Structured Audio tools also have a strong relationship to MPEG-7, since they synthesize a sound from an already-existing description. The challenge in this case is to reach a suitable level of abstraction: many very different descriptions can be synthesized into perceptually indistinguishable sounds. It is clear that the models used directly within Structured Audio (e.g. of a musical instrument or an acoustic space) will not be sufficiently abstract or constrained. Therefore, how to identify, select, and build a Structured Audio model for MPEG-7 remains an open research question.
What are the general applications for MPEG-7?
It is often said that MPEG-7 will make the web as searchable for multimedia content as it is for text today. This would also apply to making large content archives accessible to the public (or to enabling people to identify content to buy). The same information used for content retrieval may also be used by agents for the selection and filtering of broadcast or "push" material. Additionally, the meta-data may be used for more advanced access to the underlying data, by enabling automatic or semi-automatic multimedia presentation or editing.
In the audio-only arena, we can envision indexing music, sound effects, and spoken-word content, and the list is still growing. MPEG-7 will enable query-by-example, such as query-by-humming. In addition, audio tools play a large role in typical audio-visual content, for example in indexing film soundtracks. If someone wants to manage a large amount of audio content, whether selling it, managing it internally, or making it openly available to the world, MPEG-7 is potentially the solution.
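Query-by-humming is mentioned here only as an application; this FAQ does not say how MPEG-7 would support it. As a rough illustration, the sketch below uses one well-known technique in place of any MPEG-7 method: each melody is reduced to its up/down/repeat contour (the Parsons code), and catalogue entries are ranked by edit distance to the hummed query. The function names and the catalogue format (a title mapped to a list of MIDI note numbers) are hypothetical assumptions.

```python
def parsons_code(pitches):
    """Reduce a pitch sequence (e.g. MIDI note numbers) to its melodic contour:
    'U' = up, 'D' = down, 'R' = repeat, each relative to the previous note."""
    code = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev:
            code.append("U")
        elif cur < prev:
            code.append("D")
        else:
            code.append("R")
    return "".join(code)


def edit_distance(a, b):
    """Levenshtein distance, so a sung query with small errors can still match."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def query_by_humming(hummed_pitches, catalogue):
    """Return catalogue titles ordered from best to worst contour match.
    `catalogue` is a hypothetical dict: title -> list of MIDI note numbers."""
    query = parsons_code(hummed_pitches)
    scored = [(edit_distance(query, parsons_code(notes)), title)
              for title, notes in catalogue.items()]
    return [title for _, title in sorted(scored)]
```

Because only the contour is compared, a query hummed in a different key (for example, starting on note 62 instead of 64) still matches the intended melody exactly; the edit distance then tolerates a few wrong or missing notes.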
For more details, please see the MPEG-7 Applications Document (N2462) for a non-exhaustive list of representative examples. We are actively seeking input in this area, so if you are considering a potential MPEG-7 application that is not in the document, please contact the editor.
What are the foreseen elements of MPEG-7?
MPEG-7 work is currently seen as being in three parts: Descriptors (D's), Description Schemes (DS's), and a Description Definition Language (DDL). Each is equally crucial to the entire MPEG-7 effort.
Descriptors are the representations of low-level features: the fundamental qualities of audiovisual content, which may range from statistical models of signal amplitude, to the fundamental frequency of a signal, to an estimate of the number of sources present in a signal, to spectral tilt, to emotional content, to an explicit sound-effect model, to any number of concrete or abstract features. This is where the most involvement from the signal processing community is foreseen. Note that not all Descriptors need to be extracted automatically; the essential part of the standard is to establish a normalized representation and interpretation of each Descriptor. We are actively seeking input on what additional Descriptors would be useful.
Description Schemes are structured combinations of Descriptors. This structure may be used to annotate a document, to directly express the structure of a document, or to create combinations of features that form a richer expression of a higher-level concept. For example, a radio segment DS may note the recording date, the broadcast date, the producer, and the talent, and include pointers to a transcript. A classical music DS may encode the musical structures of a sonata form (and allow for exceptions). Various spectral and temporal Descriptors may be combined to form a DS appropriate for describing timbre or short sound effects. Any suggestions on other applications of DS's to audio material are very welcome.
The Description Definition Language is to be the mechanism that allows a great degree of flexibility to be included in MPEG-7. Not all documents will fit into a prescribed structure. There are fields (e.g. biomedical imagery) that would find the MPEG-7 framework very useful but lie outside of MPEG's scope. A solution provider may also have a better method for combining MPEG-7 Descriptors than a normative Description Scheme. The DDL is to address all of these situations.
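To make the relationship between Descriptors and Description Schemes more concrete, here is a minimal sketch that computes a few spectral and temporal low-level Descriptors from one short audio frame and groups them into a DS-like record for a short sound effect. The choice of descriptors (RMS level, zero-crossing rate, spectral centroid), the `SoundEffectDS` structure, and all function names are illustrative assumptions, not normative MPEG-7 definitions.

```python
import cmath
import math
from dataclasses import dataclass


def rms(frame):
    """Amplitude descriptor: root-mean-square level of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Temporal descriptor: fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Spectral descriptor: magnitude-weighted mean frequency in Hz,
    computed with a naive O(n^2) DFT over the non-negative frequency bins."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: centroid undefined, report 0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


@dataclass
class SoundEffectDS:
    """Hypothetical description scheme grouping several low-level descriptors."""
    label: str
    rms_level: float
    zero_crossing_rate: float
    spectral_centroid_hz: float


def describe(label, frame, sample_rate):
    """Extract each descriptor and combine them into one description."""
    return SoundEffectDS(label, rms(frame), zero_crossing_rate(frame),
                         spectral_centroid(frame, sample_rate))
```

For example, `describe("beep", [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)], 8000)` yields an RMS level of about 0.707 and a spectral centroid of about 1000 Hz. Keeping the extraction functions separate from the record mirrors the FAQ's point that the standard fixes the representation and interpretation of a Descriptor, not necessarily how it is extracted.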
What is the current time frame for MPEG-7?
How do I join the MPEG-7 Audio Issues AHG?
Send a message with "subscribe" in the Subject line to: mpeg-7-aud-request@meta-labs.com
Where can I find out more about MPEG-7?
You may want to examine the official MPEG site (http://www.cselt.it/mpeg/), which has all of the latest public MPEG documents. GMD-Darmstadt has done an excellent job of organizing and presenting the relevant MPEG-7 documents (http://www.darmstadt.gmd.de/mobile/MPEG7/index.html), and there are plans to expand the amount of information there. A limited number of official MPEG-7 documents are also available in Portable Document Format (http://www.meta-labs.com/mpeg-7-aud/), namely the MPEG-7 Context & Objectives document (N2460), the Requirements document (N2461), the Applications document (N2462), and the Evaluation Procedures document (N2463).
Where do I go if I have more questions?
If you have any questions, please do not hesitate to join the MPEG-7 Audio Issues AHG and direct your questions to the reflector. Chances are, there are others who share your concerns, and the AHG exists specifically to identify those concerns. If you must reach an individual, Adam Lindsay <adam@riv.be> will be happy to answer any questions regarding MPEG-7.