Riding the Media Bits | chiariglione.org

The development of MPEG-2 - Part B

Last update: 2005/03/08

The steps that led to the development of MPEG-2 Audio, AAC, Systems, DSM-CC and RTI.

As with MPEG-1, the Audio work in MPEG-2 took a different turn from its original direction. MPEG-1 Audio already provided an excellent way to compress stereo audio, exactly what many broadcasters were thinking of providing as a first step in their soon-to-come digital services. But the future would clearly lie with a further enhancement of the user experience, to be provided by multichannel audio services. For a Service Provider (SP) it made a lot of sense to start with MPEG-1 stereo sound and upgrade it later to a multichannel audio service that could still be received by the existing population of MPEG-1 Audio receivers, even though the latter would get only stereophonic, not multichannel, sound.

This was the same argument that was made by people who wanted to have a scalable MPEG-2 Video. Why was the audio argument accepted and not the apparently similar video argument? The answer to this question has many facets. On the one hand, there was a matter of personalities involved in the discussions in the two groups. On the other, there was the obvious consideration that the video part of a program would require, in general, one order of magnitude more bits than the audio part and, therefore, a slight inefficiency in the use of the total program bitrate for the audio could be tolerated, while for video inefficiency would come at too high a price to pay.

At the Haifa meeting the decision was made to adopt the requirement that MPEG-2 Audio be backward compatible with MPEG-1 Audio. This requirement seemed to considerably restrict the range of technologies that could be submitted in response to the MPEG-2 Audio CfP. Still, 10 submissions were received in response to the call.
After a while, a growing uneasiness was being felt in the Audio group that, by working exclusively on a backward-compatible solution - justified, as shown before, for digital television services - MPEG was excluding pure audio solutions where the excellence of the standard was going to be judged exclusively on the grounds of the highest audio quality at the smallest amount of bits/s. At the July 1993 meeting in New York, hosted by Columbia University, this issue was raised by a US NB contribution. So the decision was made that, when carrying out MPEG-2 Audio Verification Tests on the Backward Compatible (BC) solution, MPEG would also use yet-to-be-identified Non-Backward Compatible (NBC) codecs in order to assess the improved performance that could be obtained with an unconstrained algorithm. If the tests showed that the backward-compatibility constraint did introduce too heavy a compression penalty, MPEG would initiate the development of a new, NBC multichannel audio coding standard.

I personally liked the idea of creating an internal competition between what was bound to be two groups of people working on different technologies. The result of that competition could only improve the performance of both the BC and NBC multichannel audio coding solutions.

At the same meeting I was involved in an unusual case. It was past midnight one evening and I was working with a group of MPEG members in a room at Columbia University, the host of the MPEG meeting. Tristan Savatier, then with Thomson Consumer Electronics, and a very active member of the Video group, felt the need for a cup of coffee and went out to get one but found the kitchen door locked (his intentions became known to me only afterwards). He worked on the lock, got into the kitchen and had his coffee but was caught red-handed by the night security. I had to take responsibility for Tristan's future actions - for that evening, I mean, not forever - or I would have lost his work that night.
Unexpectedly, at the Paris meeting in March 1994, the US NB requested that MPEG endorse a specific proprietary multichannel audio coding solution as one element of an MPEG-2 Audio standard family. My reaction, at the mid-week plenary, that this was not in line with the MPEG policy of standards developed within the group, was greeted with whistles of disapproval on the part of some members. The Friday plenary saw a rather lengthy monologue of mine interrupted by a few exchanges of words with some MPEG members. This was a baptism of fire for Peter Schirling of IBM, who had just been appointed as head of the US delegation: Cliff Reader had left that position one year before at the Sydney meeting and had been replaced by Greg Wallace, then with 3DO, who had in turn left it at the previous meeting. The meeting ended with a confirmation of the MPEG policy that continues to this day.

One MPEG member recorded this monologue on a Compact Cassette (current ISO rules would not allow doing this) and, subsequently, Tristan Savatier got a copy of the tape, converted it to MPEG-1 Audio Layer II and posted it on a web site. The posting was structured in a way that looked like a soloist performance with titles created from the more interesting (for him) passages of my monologue, much as in an Italian opera. My reaction to this initiative was that, since I had not released the copyright of my "performance", the posting was illegal and should be removed (kind of "cease and desist"). This request of mine, however, was met by a shrug of the shoulders (virtual, as this happened by email). Therefore I can (probably) claim to have been the target of the first example of an unauthorised posting of a "performance" (not musical, I agree, but performance it still was) on the web. It is true that the coding technology used was still MPEG-1 Audio Layer II and not the eventually more famous Layer III.
The work on what was eventually called Advanced Audio Coding (AAC) followed the usual steps of requirement definition, CfP and collaborative development. Marina Bosi, then with Dolby Labs, acted as its editor. While returning to the hotel one evening of the AES Convention in New York, she was badly hit by a taxi and had to undergo several surgeries before recovering completely. The group deeply appreciated her determination in the way she carried out her duties while in such terrible personal circumstances that would have crushed the resistance of many. When Marina was reporting the completion of the AAC work at the Bristol meeting in April 1997 before the final approval, I asked her what she was still using her walking stick for (and she still badly needed it at that time). She then defiantly set it aside and, standing, completed her report.

The Verification Tests (VT) showed that subjective transparency was achieved at 128 kbit/s, a 50% gain over MPEG-1 Audio Layer II! As with MP3, the best AAC encoders today can provide even better performance. The VT also confirmed that the original target of "indistinguishable" audio quality at 384 kbit/s for five full-bandwidth channels was achieved and exceeded: tests carried out by BBC and NHK showed that 320 kbit/s was sufficient to achieve the target.

The development of the MPEG-2 Audio and Video standards required the best experts in digital audio and video processing, but the development of the Systems part required seasoned engineers, a species that is unfortunately disappearing. The lucky side for MPEG at that time was that there were plenty of them - and very good ones - because so many companies were waiting for a solution to make products or offer services.
The MPEG-2 media coding parts of the standard had been designed to be "generic" (hence the title eventually given to MPEG-2 as "Generic coding of moving pictures and associated audio") but the application domains impacted on media coding in a rather indirect fashion, while the purpose of the Systems part was to act as the interface with the application domains. Finding out this impact was again the task of the Requirements group.

A major requirement was that digital television would be carried by delivery systems that were mostly analogue, typically Hertzian channels and CATV. Different industries and countries had plans to develop solutions to digitise them with appropriate modulation schemes. So MPEG could assume that digitisation would "happen" (as in fact it did, albeit in a very disorderly and non-uniform fashion) but there were a number of functionalities between the media coding and the physical layer, e.g. multiplexing of different television programs, that were roughly equivalent to an OSI "transport layer" and that were not going to be provided by modulation schemes. A brand new "systems" layer, with completely different requirements from those that had led to the definition of MPEG-1 Systems, was needed.

The MPEG-1 Systems layer had adopted a packet multiplexer, which I consider a great achievement (and personal vindication), as I have already said. This had happened thanks to the positive interaction between a group of IT-prone members and other open-minded groups of telco and CE members. That this outcome was not a foregone conclusion can be seen from the case of DAB: that service uses MPEG-1 Audio, but does not use the MPEG-1 Systems layer and uses a traditional frame-based solution instead.
The reasons are that the MPEG-1 Systems layer does not provide support for adaptation to the physical layer (e.g., it assumes an error-free environment, hardly a valid assumption in a radio channel), but more importantly that a packet-based multiplex was anathema to Audio engineers at that time.

In the digital television domain, we were talking, if not of the same engineers, of people with a similar cultural background. Eventually the decision was made to adopt a fixed-length packet-based multiplexer, a choice that somehow accommodated both views of the world.

This, however, only solved one half of the problem. A multiplex laden with features designed to support transmission in a hostile environment was not a good choice for DSM applications, which required another solution, essentially the same as the MPEG-1 Systems standard. The first definition of the MPEG-2 Systems layer was achieved at the Sydney meeting in March/April 1993, where it was recognised that a single solution encompassing both application domains was not feasible, at least in the very tight timeline of the project. Therefore the systems layer was defined as having two forms, one called Transport Stream (TS) and the other Program Stream (PS).

Today it is too late for regrets, but I am still consumed by my failure to bring together all the industries that had an interest in a "transport solution for real-time media". Granted, reconciling so many conflicting requirements would have been challenging, but now the PS and TS basically have no common root. As a result, the industries in need of a TS or PS solution went away with their part of the booty, while the telcos looked from a distance at the TS/PS debate without even trying to join the discussion, being lost as they were in their ATM Adaptation Layer (AAL) dispute of AAL1/AAL2 vs. AAL5.
My regret is augmented by the fact that MPEG did have enlightened and competent people who could have provided a unifying solution able to withstand the unnatural solution that was forced down our throats for real-time media on the network.

The request that the US National Body had made in Paris about a non-MPEG audio codec had been rejected, but the reasons that had prompted it remained unchanged. Indeed the USA, with their ATV project, were moving ahead with plans to deploy their terrestrial digital television system (which they did in 1997) and they wanted to use MPEG-2 Systems and Video but a non-MPEG audio codec. How was it possible for them to do so if the system did not recognise a non-MPEG audio bitstream? The problem was solved by establishing a Registration Authority (RA), one of the standard ISO mechanisms. Those who wanted to have their proprietary streams carried by the MPEG-2 Systems layer would register that stream with the RA, which would then assign a registration number to be carried in an appropriate field in the bitstream. The Society of Motion Picture and Television Engineers (SMPTE) was eventually appointed by ISO as the RA for this so-called "format identifier".

With the same mechanism it was possible to accept a request made at the Singapore meeting in November 1994 by the Confédération Internationale des Sociétés des Auteurs et des Compositeurs (CISAC), the international confederation of rights societies of authors and composers, namely to be given the means to signal copyright information regarding the video stream, the audio stream, and the audio-visual stream in an MPEG-2 stream. The so-called "copyright identifier" solved the problem. This is structured as a two-field number where the first field identifies the agency managing the rights to the stream and the second field is the identifier assigned by that agency to content managed by them. Again, the solution requires an RA where agencies can go to get their identifiers.
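
The two identifiers described above share one shape: a number assigned through the RA, followed by bytes whose meaning is left to whoever registered. A minimal Python sketch of that shape follows. It assumes the usual MPEG-2 Systems descriptor layout (one tag byte, one length byte, a 32-bit identifier, then additional bytes); the tag values `0x05` and `0x0D`, the function names and the sample identifier values are assumptions made for illustration and should be checked against ISO/IEC 13818-1 before any real use.

```python
import struct
from typing import NamedTuple

# Assumed descriptor tags (registration and copyright); verify against the
# MPEG-2 Systems specification before relying on them.
REGISTRATION_TAG = 0x05
COPYRIGHT_TAG = 0x0D


class IdentifierDescriptor(NamedTuple):
    tag: int           # which kind of descriptor this is
    identifier: int    # 32-bit number obtained through the Registration Authority
    additional: bytes  # bytes defined by the registrant (e.g. a content number)


def build_identifier_descriptor(tag: int, identifier: int,
                                additional: bytes = b"") -> bytes:
    """Serialise tag, length, 32-bit identifier and registrant-defined bytes."""
    body = struct.pack(">I", identifier) + additional
    if len(body) > 255:
        raise ValueError("descriptor body does not fit in a one-byte length")
    return bytes([tag, len(body)]) + body


def parse_identifier_descriptor(data: bytes) -> IdentifierDescriptor:
    """Parse one descriptor from the start of data, ignoring trailing bytes."""
    if len(data) < 6:
        raise ValueError("descriptor too short")
    tag, length = data[0], data[1]
    if tag not in (REGISTRATION_TAG, COPYRIGHT_TAG):
        raise ValueError(f"unexpected descriptor tag 0x{tag:02X}")
    if length < 4 or len(data) < 2 + length:
        raise ValueError("inconsistent descriptor length")
    (identifier,) = struct.unpack(">I", data[2:6])
    return IdentifierDescriptor(tag, identifier, data[6:2 + length])


# Hypothetical values: a four-character format identifier for a non-MPEG
# audio stream, and agency number 1 labelling its content item 42.
audio = build_identifier_descriptor(REGISTRATION_TAG, 0x41432D33)
rights = build_identifier_descriptor(COPYRIGHT_TAG, 1, b"\x00\x2A")
```

A receiver that does not know a given identifier can skip the descriptor by its length byte, so a proprietary stream can travel in the multiplex without the Systems layer having to understand it.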
Another, very important component was added to MPEG-2 Systems. This was in response to the request from pay TV operators to provide an infrastructure on top of which proprietary protection schemes could be implemented. The addition of two special messages solved the problem: Entitlement Control Messages (ECM) and Entitlement Management Messages (EMM). More about this later.

All that has been described so far was sufficient for the particular, though very important, Over-The-Air (OTA) broadcasting constituency, but not for those - the telecommunication and CATV industries - who employed physical delivery means. To stay in or to move into the business of digital television competitively, these industries needed a standard protocol to set up a channel with the remote device and to let a receiver interact with content stored at the source. The DSM group provided the home for this important piece of work. An incredibly active group of people started gathering under the chairmanship of Tom Lookabaugh first and of Chris Adams later, both of Divicom, to develop the Digital Storage Media Command and Control (DSM-CC) standard that became part 6 of MPEG-2. In the best MPEG tradition, MPEG developed a completely generic standard. So, even if the DSM, telco and CATV industries had triggered the work, the final protocol is generic in the sense that it can be used both when a return channel exists and when the channel is unidirectional. In the latter case the transmitter can use a carousel, but the receiver is presented with a single interface. Ironically, because the Video on Demand (VOD) business did not fare as expected, the part of the DSM-CC standard using the carousel is widely in use for broadcast applications.

The last major component of MPEG-2 is the so-called Real-Time Interface (RTI).
This was developed because the MPEG-2 Systems specification assumes that packets arrive at the decoder with zero jitter, clearly an idealised assumption that holds reasonably well in most OTA broadcast, satellite and CATV, but not a valid assumption for such packet-based networks as ATM and Internet Protocol (IP). The purpose of part 9 of MPEG-2 is then to provide a specification for the level of jitter that an implementation is required to withstand.

The MPEG-2 Committee Draft (CD), the first stage of the standard issued for ballot, was approved in Seoul in November 1993, after one of the most intensive weeks in MPEG history. Some delegates worked until 6 am on Friday to produce the three Systems, Video and Audio drafts so that they could be photocopied and distributed to all members for approval at the afternoon plenary. It was at that meeting that the mark of one million photocopied pages was reached.

The short night did not prevent Tristan Savatier from staging another of his tricks. He convinced one of the lady delegates to lend him her stockings and shoes and, during the coffee break of the Friday afternoon plenary, he hid under the Convenor's desk wearing the stockings on his arms and the shoes on his hands. When I resumed the meeting he started showing the stockings and the shoes as if they were mine.

The following Paris meeting allowed people to review the work done at the intense Seoul meeting. The systems part was found to need a major overhaul and so it was decided that a special meeting would be held in June in Atlanta, GA, hosted by Scientific Atlanta, just before the regular July meeting. With this, the final approval by MPEG was given as planned in Singapore in November 1994.

I would like to conclude this chapter by reporting what VADIS did to promote the development of MPEG-2 and specifically CSELT's role in it.
Besides active participation in tens of CEs, VADIS carried out a thorough campaign of field trials to assess the performance of the MPEG-2 standard. Some VADIS members produced audio-visual bitstreams; others made available transmission adaptors, such as one of the first modems for satellite, cable and terrestrial UHF; still others made available their ATM networks. CSELT had continued working on its multiprocessor architecture (the third generation, using an Intel 860 RISC instead of the original 80186 and five 2901 DSPs per board) and produced two real-time MPEG-2 decoders. The two decoders are still in regular use in the lab after so many years. Another achievement of the project was the support given to the development of the VLSI design of an MPEG-2 Video decoder, which enabled Philips to become the 4th worldwide supplier of such chips.
Copyright © 2003 chiariglione.org