Riding the Media Bits  chiariglione.org
Riding the Media Bits
Digital Media Project
Digital Media Manifesto
Leonardo
Acronyms
Site Map
Home

Inside MPEG-2


e-mail

 Last update: 2005/03/08

 

An overview of the technical content of MPEG-2.

 
 

Unlike MPEG-1, where a reference model need only consider the decoder, the full extent of the MPEG-2 standard requires consideration of the complete chain from source to destination. The figure below gives a schematic representation of the scope of the MPEG-2 standard. 

Model of the MPEG-2 standard

The Demultiplexer, Video and Audio decoding blocks correspond to part 1, 2 and 3 (or 7) of ISO/IEC 13818 (although the most common use of the standard is with MPEG-1 Audio Layer II). The DSM-CC block, not in the MPEG-1 model, is specified in part 6. Additionally part 9 provides the jitter specification of the interface between the receiver and the delivery system. Part 4 is conformance testing for parts 1, 2, 3 and 7. Similarly part 5 is the Reference Software for parts 1, 2, 3 and 7 (no reference software exists for part 6). Part 10 is conformance testing of DSM-CC. Recently a new part 11 has been added providing specification for an extension of conditional access.

MPEG-2 Systems is a great piece of engineering work designed to satisfy a a broader range of requirements than any other groups had ever tried to deal with in a standards committee. The figure below illustrates the basic multiplexing approach for a television program composed of a single video stream and a single audio stream. 

The video and audio data are encoded according to MPEG-2 Video and MPEG-2 Audio or AAC, and system level information is added to the resulting compressed streams. These streams in packetised form are called Packetised Elementary Streams (PES). The PESs can then be further combined to form either a PS or a TS. 

The PS is similar to the MPEG-1 Systems Multiplex. It results from combining one or more PESs, all having a common time base, into a single stream. The PS is designed for use in the same relatively error-free environments of MPEG-1 and is suitable for software processing applications. PS packets may be of variable and relatively great length. The TS combines one or more PESs with the same or different time bases into a single stream, a requirement for the carriage of more than one TV programs that in general have been independently produced.

The TS is designed for use in environments where errors are likely, such as storage or transmission in lossy or noisy media. TS packets are 188 bytes long. This choice was made, for lack of better reasons, because at the time it was thought that having a packet length related (188=4x47) to ATM Adaptation Layer 1 (AAL1) expected to be used for real-time video transmission, would be an advantage. 

Part 2 extends MPEG-1 by supplying several tools. The most important ones are those allowing encoding of interlaced video. When video is already in progressive form, MPEG-2 Video coding falls back to MPEG-1. Exploitation of interlace information provides an improvement of compression efficiency of about 20%. MPEG-2 Video also provides tools for various types of scalability, namely SNR and spatial scalability. SNR scalability is a functionality that is provided by multiple layers such that an enhancement layer carries DCT coefficients quantised with improved accuracy. In spatial scalability there is a basic layer and an enhancement layer. The latter carries information that can be used to improve the spatial definition of the picture. 

Part 3 extends MPEG-1 from the stereo to the multichannel case while preserving backwards compatibility with MPEG-1 Audio. This means that an MPEG-1 Audio decoder is able to extract the stereo part from an MPEG-2 Audio bitstream. The opposite is also true. An MPEG-2 Audio decoder is capable of decoding an MPEG-1 Audio bitstream. MPEG-2 Audio also provides an extension to lower sampling-frequency for MPEG-1 Audio.

Parts 4 and 5 correspond to those of MPEG-1. 

Part 6 has the title "Digital Storage Media Command and Control (DSM-CC)" and supplies protocols to establish audio-visual sessions on heterogeneous networks (this is the User to Network part of the standard) and to control audio-visual streams in both interactive and broadcast environments (this is the User to User part of the standard). For interactive environments, the DSM-CC User-to-User protocol allows clients to access a collection of distributed objects (such as files, directories and streams) located on remote servers. For broadcast environments, the DSM-CC User-to-User Object Carousel protocol allows simultaneous access to broadcast objects by various clients (see figure below). 

The DSM-CC model

In the DSM-CC model, a stream is sourced by a Server and delivered to a Client, both considered to be Users of the DSM-CC network. DSM-CC defines a logical entity called the Session and Resource Manager (SRM) which provides a (logically) centralised management of the DSM-CC Sessions and Resources.

Part 7 has the title "Advanced Audio Coding (AAC)" and supplies an alternative, non MPEG-1 Audio compatible, way to encode stereo andmultichannel audio. AAC has 3 profiles: Main Profile, Low Complexity (LC) Profile and Scaleable Sampling Rate (SSR) Profile. The Main Profile provides the best quality but is more complex than the LC Profile. The SSR Profile has lower complexity than the Main and LC Profiles. Additionally the SSR Profile can provide a frequency scaleable signal.

Part 9 has the title "Real Time Interface for System Decoders (RTI)" and is the specification of the RTI to Transport Stream decoders which may be utilised for adaptation to all appropriate networks carrying Transport Streams (see figure below).

Part 10 is the conformance testing for DSM-CC. 

Part 11, approved in 2003, extends the functionality of content protection. This will be described in more detail later.

 

 
 

Send an e-mail to commentSee the communication policy

 

Copyright © 2003 chiariglione.org