Open Source in MPEG
Leonardo Chiariglione – CSELT, Italy
For centuries the ancestors of the author who lived in the lower parts
of the Alps near the city of Turin had applied a simple idea: it was more
comfortable for everybody if the paths criss-crossing
the mountains were cobble stoned instead of being left in the state in which
the steps of millions of passers-by had created them. It is not known whether
that work was undertaken by the free decision of those mountain dwellers or by
the local communal authority that imposed corvées on them during winter when
work in the fields was minimal. After all farmers are not known to be inclined
to share anything with anybody and those were years in which despotism,
enlightened or otherwise, ruled.
A few years ago computer people discovered that it was in (nearly)
everybody’s interest if the virtual equivalent of mountain paths – the raw
CPU – could be “cobble stoned” with an operating system that was the
result of a collective effort and that could be used by all.
Traditionally computer people have worked with data that were already
represented in, or could be easily converted to, a form that lent itself to
processing by automatic computers. Other types of data, those that reach human
ears and eyes, have a very different nature: they are intrinsically analogue. To
add difficulty they are also “broadband”, a sliding definition that depends
on the state of technology.
Processing and communication of audio and video data have been around for
a long time but invariably as ad-hoc solutions. As part of the movement
instigated by the Moving Picture Experts Group or MPEG,
audio and video have been reduced to a form that allows the necessary processing to
be carried out by integrated circuits, and the number of bits has been reduced to such a
level that transmission is possible over today’s communication channels.
In parallel to the development of the MPEG-1, MPEG-2 and MPEG-4
standards, MPEG has developed reference software using a process similar to that of
Open Source Software (OSS), even though the details may be frowned upon by the
purists of the OSS community. It must be realised, however, that this process
had to be adapted to the rules governing the International Organisation for
Standardisation (ISO), a traditional standards-setting organisation, under which MPEG operates.
The purpose of this paper is to recall how the digitisation of audio and video started, the motivations that led to the establishment of the Moving Picture Experts Group, the main elements of the MPEG standards in use today, the characteristics of the MPEG “Open Source Software” process and the work under way.
It took about 400 years after the invention of the movable type, the
first example of a technology for large-scale use of Information Processing not
requiring direct human intervention, to see the invention of a technology of a
similar impact. But starting from the 1830s there has been a long string of
audio-visual Information Processing and Communication technologies made
available to mankind: to mention the most important, photography, telegraphy,
facsimile, telephony, phonography, cinematography, radio and television.
One feature of these technologies is that each of them has in general
little to share with the others. Every time one of these types of information is
processed, a special device has to be used. How different from the computer
world where processing of information is made using the same basic technology!
The theoretical groundwork to achieve the goal of unifying all types of
audio-visual information started some 15 years before the first electronic
computer was built. It was discovered that a band-limited signal (of bandwidth
B) could be sampled with a frequency ≥ 2B and reconstructed without error.
The second step of the groundwork was achieved
some 20 years later with the definition of bounds to quantisation errors
depending on the number of bits used and the signal statistics.
Even though the Bell Laboratories, where the theoretical groundwork had
been done, made the first step of converting the theoretical groundwork into
something practical with the invention of the transistor, there was a long way
to go for practical applications. Even a “narrowband” signal like speech
that occupies the 0.3-3.4 kHz band on the telephone wire, if sampled at 8 kHz
with 8 bits/sample produced the staggering (for that time) value of 64 kbit/s.
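
The arithmetic behind that figure can be sketched in a few lines. The helper names `min_sampling_rate_hz` and `pcm_bitrate` are illustrative only and are not taken from any standard; the upper band edge of 3.4 kHz is used as the bandwidth B.

```python
def min_sampling_rate_hz(bandwidth_hz: float) -> float:
    """Lowest sampling frequency permitted by the sampling theorem (2B)."""
    return 2.0 * bandwidth_hz


def pcm_bitrate(sample_rate_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate in bit/s of an uncompressed, uniformly sampled signal."""
    return sample_rate_hz * bits_per_sample * channels


# Telephone speech: upper band edge 3.4 kHz, sampled at 8 kHz with 8 bits/sample.
speech_min_rate = min_sampling_rate_hz(3400)  # 6.8 kHz, so 8 kHz leaves a margin
speech_bitrate = pcm_bitrate(8000, 8)         # 64 kbit/s
print(speech_min_rate, speech_bitrate)
```

The same helper applies unchanged to the other uncompressed formats mentioned later in the paper, since each is just a sampling rate multiplied by a word length and a number of channels.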
After 15 years of experiments, bits were ready to play a role in speech
communication. In the 1960s the then CCITT (now ITU-T) adopted a recommendation
for the digital representation of speech (this actually defined two such
representations, called μ-law and A-law). Both had a
sampling frequency of 8 kHz, but the quantisation law was 7 bits/sample for μ-law
and 8 bits/sample for A-law, both non-linear to take into account the
logarithmic nature of human ear perception. One should not, however, attach too
much meaning to this digitisation of speech. The scope of application was the
trunk network where multiplexing of telephone channels was more conveniently
done in digital than in analogue. Nothing changed for the end users.
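
As a rough illustration of what a non-linear quantisation law buys, the sketch below applies the continuous textbook μ-law curve before a uniform quantiser. It is not the segmented approximation that the recommendation actually specifies; the value μ = 255 and all function names are assumptions made for this example.

```python
import math

MU = 255  # assumed companding parameter, not taken from the text


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] along a logarithmic curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantise(y: float, bits: int) -> float:
    """Uniform quantiser on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


def companded_roundtrip(x: float, bits: int = 8) -> float:
    """Compress, quantise uniformly, then expand again."""
    return mu_law_expand(quantise(mu_law_compress(x), bits))


quiet = 0.01  # a low-level sample, where the ear is most sensitive to error
error_companded = abs(companded_roundtrip(quiet) - quiet)
error_uniform = abs(quantise(quiet, 8) - quiet)
print(error_companded, error_uniform)
```

For low-level samples the companded path gives a noticeably smaller error than uniform quantisation with the same number of bits, which is the motivation for a logarithmic law.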
More interesting was Group 3 facsimile (Gr. 3 fax). An A4 page scanned by
the 1728-sensor CCD of Gr. 3 fax in fine resolution mode (same resolution
horizontally and vertically) holds about 4 Mbits. With the “high speed”
modems of that time (9.6 kbit/s) it would have taken about 7 minutes to
transmit a page, but a simple compression scheme (sending “run lengths”
encoded with variable-length code words instead of all blacks and whites and
some bidimensional extensions) brought down transmission time to 2 minutes.
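
The run-length idea can be sketched as follows. The sketch stops at the run lengths themselves and does not reproduce the variable-length code tables; the function names, the sample scan line and the convention of starting every line with a (possibly empty) white run are assumptions of this example. Dividing 4 Mbit by 9.6 kbit/s gives roughly 417 seconds, about seven minutes, for the uncompressed page.

```python
from itertools import groupby


def run_lengths(scan_line):
    """Lengths of alternating white/black runs, starting with white.

    scan_line is a sequence of 0 (white) and 1 (black) pels. A line that
    starts with black gets a leading white run of length 0.
    """
    runs = [len(list(group)) for _, group in groupby(scan_line)]
    if scan_line and scan_line[0] == 1:
        runs.insert(0, 0)
    return runs


def transmission_time_s(bits: float, modem_bit_per_s: float) -> float:
    """Time needed to send a given number of bits at a given modem speed."""
    return bits / modem_bit_per_s


# A hypothetical 1728-pel scan line: mostly white with two short black strokes.
line = [0] * 700 + [1] * 12 + [0] * 300 + [1] * 4 + [0] * 712
runs = run_lengths(line)  # five numbers instead of 1728 pels

raw_page_seconds = transmission_time_s(4_000_000, 9600)
print(runs, raw_page_seconds)
```

A typical text page consists largely of long white runs, which is why describing a line by a handful of run lengths is so much cheaper than sending every pel.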
Digitised speech was an effective transmission method for the trunk
network, but the local access remained hopelessly analogue. The advent of ISDN
in the 1980s prompted the development of standards for speech compression with
the bandwidth of 7 kHz, sampled at 16 kHz with a higher number of bits/sample
(e.g. 14) than μ-law
and A-law. Compression was needed because this kind of speech would generate in
excess of 200 kbit/s. Reduction to 64 kbit/s and below (compression ratio of
about 4) was possible while preserving high speech quality. The resulting codecs used DSPs but
never gave rise to a mass market.
Video presented a bigger challenge if one thinks that its bandwidth is 3
orders of magnitude more than speech’s and involves more than one signal.
Digital television is obtained by sampling the video luminance Y at 13.5 MHz and
the 2 chrominance differences R-Y and B-Y at 6.75 MHz with 8 bits/sample. The
total bitrate of 216 Mbit/s could be reduced to about 166 Mbit/s by removing the
non-visual samples. Such high bitrates were unsuitable for any practical
transmission medium and were used only for the digital tape (so-called D1) and
transmission in the studio.
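
The two bitrates can be checked with a short calculation. The sampling rates and word length come from the text; the active picture dimensions (720 luminance samples per line, 576 active lines, 25 frames per second) are assumptions corresponding to a 625-line system and are not stated in the paper.

```python
LUMA_RATE_HZ = 13_500_000    # Y sampling frequency
CHROMA_RATE_HZ = 6_750_000   # R-Y and B-Y sampling frequency
BITS_PER_SAMPLE = 8

# All samples, including those that fall in the blanking intervals.
total_bitrate = (LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ) * BITS_PER_SAMPLE

# Assumed active picture of a 625-line, 25 frame/s system.
ACTIVE_LUMA_PER_LINE = 720
ACTIVE_CHROMA_PER_LINE = 360  # per chrominance difference signal
ACTIVE_LINES = 576
FRAMES_PER_S = 25

samples_per_line = ACTIVE_LUMA_PER_LINE + 2 * ACTIVE_CHROMA_PER_LINE
active_bitrate = samples_per_line * ACTIVE_LINES * FRAMES_PER_S * BITS_PER_SAMPLE

print(total_bitrate / 1e6, active_bitrate / 1e6)  # in Mbit/s
```

Under these assumptions the active picture accounts for just under 166 Mbit/s of the 216 Mbit/s total.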
Applying bitrate reduction directly to bring this high bitrate down
to 1.5/2 Mbit/s, so as to fit in the American and European speech multiplexers of 24
and 32 digital speech channels, respectively, was (and still largely is)
considered too challenging. Therefore the input bitrate was first reduced by 2:1
subsampling the video signal in the horizontal and vertical (actually temporal,
as the video signal is interlaced) directions and by further subsampling the
chroma differences. Then two simple techniques called DPCM and Conditional
Replenishment were used. A second generation of codecs, using more sophisticated
algorithms (DCT and motion compensation), provided acceptable quality at 384
kbit/s and, by further 2:1 subsampling the video signal in the horizontal and
vertical directions, at 64/128 kbit/s, the bitrate of ISDN.
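
The two first-generation techniques can be illustrated on a one-dimensional row of pels. This is a minimal sketch under simplifying assumptions (previous-sample prediction, a fixed uniform quantiser step, a fixed change threshold, hypothetical function names); the codecs of that period worked on two-dimensional pictures and had to fit a fixed-rate channel.

```python
def dpcm_encode(samples, step=4):
    """Quantise the difference between each sample and its prediction.

    The prediction is the previously reconstructed sample, so encoder and
    decoder stay in step and quantisation errors do not accumulate.
    """
    codes = []
    prediction = 0
    for sample in samples:
        code = round((sample - prediction) / step)  # quantised prediction error
        codes.append(code)
        prediction += code * step                   # what the decoder will rebuild
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the dequantised differences."""
    samples = []
    prediction = 0
    for code in codes:
        prediction += code * step
        samples.append(prediction)
    return samples


def conditional_replenishment(previous, current, threshold=8):
    """Return (position, value) pairs only for pels that changed noticeably."""
    return [
        (i, cur)
        for i, (prev, cur) in enumerate(zip(previous, current))
        if abs(cur - prev) > threshold
    ]


row = [100, 102, 107, 120, 119, 90]
rebuilt = dpcm_decode(dpcm_encode(row))
updates = conditional_replenishment([10, 10, 10, 10], [10, 30, 12, 0])
print(rebuilt, updates)
```

After the first sample the DPCM codes are small numbers that suit short code words, while conditional replenishment sends nothing at all for the parts of the picture that have not changed since the previous frame.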
Going back to audio, in the early 1980s Philips and Sony developed the
Compact Disc, a read-only digital storage device that employed laser
technologies (a comparable system was developed at about the same time by RCA,
but was short-lived). This was designed having stereo music in mind: two audio
channels sampled at 44.1 kHz with 16 bits/sample for a total bitrate of 1.41 Mbit/s.
Lastly in the USA (through the Advanced Television initiative) and in Europe (through the development of an industrial company) steps were made towards the development of a market of digital high-definition television.
The author’s work experience has been in a telco research establishment. The telecommunication industry used to be characterised by considerable innovation in the network infrastructure, where investments were not spared, and by reluctance to invest in terminal equipment. This was in part because terminals were alien to its culture (even though the more enlightened individuals were aware that unless there were new digital terminals there would not be much need for network innovation) and in part because the terminal was technically and legally outside of its competence. The attitude was “Let the manufacturing industry do the job (of developing terminals)”. Unfortunately the (telecom) manufacturing industry, accustomed to being pampered by fat and risk-free orders from the telcos based on solid CCITT standards, had no desire to make investments in something that was based on the whim of end users it did not understand. The consumer electronics industry, which knew end users better and was accustomed to making business decisions based on its own judgement of the validity of the products, still considered telecommunications terminals outside its interest. This explains why, at the end of the 1980s, there was virtually no end-user equipment based on compression technologies, with the exception of facsimile. To make cheap and small terminals one would have needed ASICs capable of performing the sophisticated signal processing functions needed by compression algorithms.
The author saw the attempts that were being made by both Philips and RCA in those years to store digital video on CDs for interactive applications (called CD-i and DVI, respectively) as an opportunity to ride on a mass market of video compression chips that could be used for videocommunication devices. What was required was the replacement of the laborious and unpredictable “survival of the fittest” market approach of the Consumer Electronics world with a regular standardisation process.
So MPEG started in January 1988, with the addition to its mandate a few months later of audio compression and of the functions needed to multiplex and synchronise the two streams (called “systems”). In 4 years the first standard – MPEG-1 – was developed. Interestingly, neither of the two original target applications – interactive CD and digital audio broadcasting – is currently a large user of the standard (videocommunication has not become too popular either). On the other hand MPEG-1 is used by tens of millions of Video CDs and MP3 players.
For the purpose of this paper there is one remarkable feature of MPEG-1: it was the first audio-visual standard that made full use of simulation for its development. For the author, whose laboratory had taken part in the development of the 1.5/2 Mbit/s videoconference codec using three 12 U racks of electronics and minimal support from computer simulation, this was an incredible experience. Even more significant for its future implications was the fact that MPEG-1 – a standard in five parts – has a software implementation that appears as “part 5” of the standard (ISO/IEC 11172-5).
In July 1990 MPEG started its second project, MPEG-2. While MPEG-1 was a very focused standard for well-identified products, MPEG-2 addressed a problem everybody had an interest in: how to convert the 50-year-old analogue television system to a digital compressed form in such a way that the needs of all possible application domains were supported. This was achieved by developing two systems layers. One, called the MPEG-2 Transport Stream (TS), was designed for the error-prone environments (such as cable, satellite and terrestrial) targeted by the transmission application domains. The other, called the MPEG-2 Program Stream (PS), was designed to be software friendly and was used for DVD. The idea was that MPEG-2 would become the common infrastructure for digital television; this has indeed been successfully achieved, if one thinks that at any given moment there are more bits carried by MPEG-2 TS than by IP. The title of the standard, “Generic coding of moving pictures and associated audio”, formally conveyed this intention. By the time MPEG-2 was approved (November 1994) the first examples of real-time MPEG-1 decoding on popular programmable machines had been demonstrated. This was, if there had been a need for it, an incentive to continue the practice of providing reference software for the new standard (ISO/IEC 13818-5).
In July 1993 MPEG started its 3rd project, MPEG-4. The original goal is reflected in the original title of the project “Very low bitrate audiovisual coding”. Even though no specific mass-market applications were in sight, it was sensed that the digitisation of narrowband analogue channels, such as the telephone access network (Internet was not a mass phenomenon, yet), would provide interesting opportunities to carry video and audio at a bitrate definitely lower than 1 Mbit/s, roughly the lowest bitrate value supported by MPEG-1 and MPEG-2. For that bitrate range it was clear that a decoder could very well be implemented on a programmable device, unlike what had happened to the other MPEG standards. It could even happen that there would eventually be more software-based than hardware-based implementations of the standard. This was the reason why the reference software, part 5 of MPEG-4 (ISO/IEC 14496-5) has the same, normative, status as the traditional text-based descriptions of the other parts of MPEG-4.
In the event MPEG-4 became a very comprehensive standard as signalled by its current title “Coding of audio-visual objects”: the standard supports the coded representation of individual audio-visual objects whose composition in space and time is signalled to the receiver. The different objects making up a scene can even be of different origin: natural and synthetic.
This does not mean, however, that a particular implementation of the standard is necessarily “complex”. An application developer may choose among the many profiles – dedicated subsets of the full MPEG-4 tools – the one to use to develop his application.
For all these reasons it is expected that MPEG-4 will become the infrastructure on top of which the currently disjointed world of multimedia will flourish.
Readers may wonder why, if the coding algorithm is implemented in software, there was a need at all to develop a standard. Shouldn’t it suffice to download on your machine the code that allows you to decode the particular algorithm that was used to produce the bitstream of your interest?
In the early days of development of MPEG-4 this question used to be asked very often but today, with an ever-expanding use of MP3, it is easier to understand the benefits of a standard: a playback device is not necessarily connected to the network; it may be on a broadcast channel, or it may be just a stand-alone or portable device; the devices can use many different CPUs, for which it could be too costly to develop playback code; the hardware may use an ASIC for the audio-visual decoding that is not upgradeable, or it may have been designed to run with just the amount of RAM that the standard algorithm requires and no more. In other words it is simpler to have a common standard on which business opportunities can multiply, instead of having to struggle with incompatibilities all over the place.
Lastly it is to be borne in mind that compression coding is not a transparent operation. In general the lower the bitrate used, the more negatively the quality is affected. Transcoding from one algorithm to another may simply produce garbage. Also the idea that compression technology keeps on improving is a myth. Only now, after many years, is MPEG re-issuing a Call for Proposals for video compression technologies because there is a feeling that there may be something worth considering. For audio MPEG is still at the level of issuing a Call for Evidence because the group does not even have a feeling that there may be something worth considering.
The very size of the standard has transformed the development of the reference software into a huge undertaking. It is therefore interesting to see how such a project was managed. These are the most important features:
The condition was set that any component of the standard, both normative (decoder) and informative (encoder), had to be implemented in software. For any proposal to be accepted and adopted, it was a condition that source code be made available and copyright released to ISO.
For each portion of the standard a manager of the code was appointed: a representative of Microsoft and MoMuSys for video in C++ and C, respectively, Fraunhofer for natural audio, MIT for Structured Audio, ETRI for Text-to-Speech interface, Optibase for the so-called "Core" (the code portion on which all media decoders and other components plug in), Apple for the so-called MPEG-4 File Format etc.
Also, for each portion of the standard a manager of experiments was appointed. This manager integrated the code of the accepted tools in the existing code base.
Unlike traditional open source software (OSS) projects only MPEG members could participate in the project. Discussions were usually done (and the practice still continues) on email reflectors that are open to non-MPEG members.
MPEG is a place where new ideas are forged continuously. One idea was generated by
the fact that while the reference code is intended to be “reference”
(normative or informative as the case may be), it is not intended to be
efficient. Therefore since December 1999 MPEG is working on a new part of MPEG-4
that will contain optimised code (to start, optimised ways to search for motion
vectors, a computationally very expensive part of the standard). Any
implementer can take this code and use it free of copyright. The
condition has been set, however, that such optimised code should not require …
A second idea, launched in October 2000, has led to the decision of developing an
MPEG-4 “reference hardware description”. It is expected that this will
further promote the use of MPEG-4 as the basic multimedia infrastructure in both
software and hardware.
The text of the so-called “copyright disclaimer” that is found on all MPEG-4 software modules is given below.
"This software module was originally developed by <FN1> <LN1> (<CN1>) and edited by <FN2> <LN2> (<CN2>), <FN3> <LN3> (<CN3>), … in the course of development of the <MPEG standard>. This software module is an implementation of a part of one or more <MPEG standard> tools as specified by the <MPEG standard>. ISO/IEC gives users of the <MPEG standard> free license to this software module or modifications thereof for use in hardware or software products claiming conformance to the <MPEG standard>. Those intending to use this software module in hardware or software products are advised that its use may infringe existing patents. The original developer of this software module and his/her company, the subsequent editors and their companies, and ISO/IEC have no liability for use of this software module or modifications thereof. Copyright is not released for non <MPEG standard> conforming products. <CN1> retains full right to use the code for its own purpose, assign or donate the code to a third party and to inhibit third parties from using the code for non <MPEG standard> conforming products. This copyright notice must be included in all copies or derivative works. Copyright © 199_".
N.B.: <FN> = First Name, <LN> = Last name, <CN> = Company Name
Currently MPEG is engaged in the final stages of development of MPEG-7
“Multimedia Content Description Interface”, a standard to “describe”
audio and video information, be it at the level of a complete movie or as a
single object in a picture. The standard will be
approved in July 2001. Also for this standard there is a huge body of reference
code that has been developed according to rules very similar to those of MPEG-4.
In June 2000
MPEG started a new project called MPEG-21 “Multimedia Framework”. In
this context, MPEG will develop and integrate, in collaboration with other
bodies, all the technologies that are needed for electronic commerce of digital
content on the network.
The key technologies that are needed by this project are:
1. Digital Item Declaration: a uniform and flexible abstraction and interoperable schema for declaring Digital Items
2. Content Representation: how the data is represented as different media
3. Digital Item Identification and Description: a framework for identification and description of any entity regardless of its nature, type or granularity
4. Content Management and Usage: the provision of interfaces and protocols that enable creation, manipulation, search, access, storage, delivery, and (re)use of content across the content distribution and consumption value chain
5. Intellectual Property Management and Protection: the means to enable content to be persistently and reliably managed and protected across a wide range of networks and devices
6. Terminals and Networks: the ability to provide interoperable and transparent access to content across networks and terminal installations
7. Event Reporting: the metrics and interfaces that enable Users to understand precisely the performance of all reportable events within the framework.
Of particular interest for this paper is item 5, Intellectual Property Management and Protection. Since MPEG-2 times MPEG has been mindful of the need to provide solutions for those – content and service providers – who attach monetary value to content. So far the solutions provided by MPEG have been at the level of enabling the use of proprietary protection technologies, but these have the disadvantage that consumption of protected content is no longer transparent to the user, even in the case he is willing to adhere to the conditions set by the rights holder. This is the reason why MPEG is now developing a solution to provide “interoperability at the level of protected content”.
In the 15th century “letters patent” were already in use in Venice and Florence, but unknown in Mainz. Therefore the only way for Johannes Gutenberg to protect his invention was by hiding the secrets from everybody, including his financial backers, and this eventually led him to ruin. In the 19th century all audio and video related inventions were protected by patents. This continued in the 20th century while the centre of gravity progressively shifted from individuals to the companies hiring them. When the prospects of using digital technologies became clear all companies and organisations started making or funding research in audio and video coding. Today the number of patents is counted by the thousands.
When MPEG started its work in audio-visual coding it became immediately evident that either MPEG played by the existing rules of the audio-visual world, where standards usually require patents for their implementation, or it would have been impossible to produce any standard of practical value. This is not to mention the difficulty for MPEG, with no funds of its own, of becoming aware of the patents required to implement its standards.
The problem of patents in standards is of course well known to the three main international standards organisations IEC, ISO and ITU. They have developed the following general policy:
no patent should be required to implement a standard, or
the rights holder should release the rights, or
the rights holder should make a statement in which he undertakes to give
licence to his patent “on fair and reasonable terms and non-discriminatory conditions”.
MPEG has therefore developed a policy for the development of its standards that
deliberately neglects consideration of patents and only seeks to achieve the …
The result has been that MPEG standards usually require a large number of patents.
As many as 100 different patents are reportedly needed to implement an MPEG-2 decoder. Because of the high interest in a “one-stop shop” for MPEG-2 patents, a private organisation licensing most MPEG-2 patents has been set up. Interestingly, the amount to be paid for the patents in an MPEG-2 decoder has remained constant, while the number of relevant patents has increased.
The same is happening for MPEG-4. The MPEG-4 Industry Forum (http://www.m4if.org/) has been established with the goal of kicking off patent pools for MPEG-4 profiles. Of course the MPEG-4 case is much more complex as many business models require decoder download. A similar organisation for MPEG-7 is likely to be set up very soon.
Through a completely different process MPEG – as representative of the
world of audio and video – has come to a similar conclusion as the world of
data processing as regards the need to provide open solutions expressed in
software (or, as the case may be, hardware) to technologies that are considered
part of the “infrastructure”. The outstanding difference is that while the
data processing world likes to define fully open technologies, MPEG bows to the
reality of the world of digital audio and video where patents are found all over
the place. Therefore reference software (and reference hardware description) is
copyright-free but, in general, not patent-free.
MPEG-21, a project to define an ecosystem of content on the network,
places standardisation of the infrastructure one level higher compared to what
has been done so far. As the provision of reference software, be it normative or
informative, is now an integral part of MPEG standards, it can be expected that
considerable challenges lie ahead when MPEG needs to accommodate libertarian
spirits with other more mundane considerations. But the author believes that it
is better to deal with this problem in a group of technical experts than in a
court of law or in a parliament.
Group 3 is called 1 minute facsimile because it is usually employed with a
vertical resolution that is ½ the horizontal resolution.
More details on the steps that led to the establishment of MPEG can be found
in “Chiariglione, L.: MPEG - From the conception of the idea to its
ConfTele99, Sesimbra, 1999/04/15”
In the case of MPEG-4 this has been formulated as in http://www.cselt.it/mpeg/public/mpeg-4_procedures.htm