MPEG-4 Systems Elementary Stream Management (ESM)

1. What does demultiplexing mean?
The demultiplexing stage retrieves the individual elementary streams, which are usually interleaved for transmission or storage. This functionality is not considered part of MPEG-4 Systems; it is hidden in the delivery layer. MPEG-4 ESM deals only with the demultiplexed, usually still SL-packetized, elementary streams that are accessible through the DMIF Application Interface (DAI). To support upstream information, a receiving terminal might also incorporate multiplexing facilities.

2. Which layers are passed before the objects are composed?
Three layers in an audio-visual terminal should be mentioned: the delivery layer, the sync layer and the compression layer.
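
As a rough illustration of what demultiplexing does, the following Python sketch separates an interleaved sequence of packets into per-stream payloads. The `(channel, payload)` tuple format and the function name `demultiplex` are assumptions made for this example only; they do not reflect the FlexMux syntax or the framing of any real delivery layer.

```python
from collections import defaultdict


def demultiplex(packets):
    """Group interleaved (channel, payload) packets by channel.

    The packet layout is hypothetical; a real delivery layer defines
    its own framing.
    """
    streams = defaultdict(list)
    for channel, payload in packets:
        # Packets keep their arrival order within each channel.
        streams[channel].append(payload)
    # Concatenate the payloads to recover each elementary stream.
    return {channel: b"".join(chunks) for channel, chunks in streams.items()}


# Two elementary streams interleaved on channels 1 and 2.
interleaved = [(1, b"AU"), (2, b"vid"), (1, b"DIO"), (2, b"eo")]
streams = demultiplex(interleaved)
print(streams)
```

In this toy model the receiver only needs the channel number to sort packets; everything about timing and decoding is left to the layers above.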

3. What is the TransMux?
"TransMux" is an outdated synonym for part of what is now called the delivery layer. The terminology was changed to align the Systems and DMIF parts of the standard.

4. What is the delivery layer?
It is a generic abstraction for delivery mechanisms (computer networks, etc.) that are able to store or transmit a number of multiplexed elementary streams or FlexMux streams. To allow maximum flexibility for service creation and application design, it is not specified by MPEG-4. The interface to the delivery layer, however, is well defined, thus allowing transmission of MPEG-4 content over any type of transport layer facility (e.g., ITU-T Recommendations H.22x, MPEG-2 Transport Stream, IETF RTP).

5. And what is the FlexMux?
The FlexMux is a tool that provides a flexible way of interleaving packets of data. It is not meant to be robust to errors, because it can be layered on top of a robust transport layer. The FlexMux is fully defined by MPEG-4, but its use is optional: applications can operate directly on top of a traditional transport layer (formerly called "TransMux") if they so desire.

6. What are these "object descriptors" anyway?
The ESM part of Systems also specifies means to identify and name elementary streams so that they can be referred to in a scene description and be attached to individual objects. This association is performed in object descriptors, which are transmitted in their own elementary streams. Object descriptors are separate from the scene description itself, which simplifies editing and remultiplexing of MPEG-4 content. The descriptors associate audio-visual objects (more precisely, nodes in the scene) with elementary stream identifiers. An additional mapping is required to resolve these identifiers to actual transport layer "channels" (e.g., port numbers). How this mapping is performed depends on the delivery layer instance that is actually used.
In accordance with the goal of allowing the use of any delivery layer, MPEG does not define this mapping. Instead, it expects the parties that define these delivery layers to specify how MPEG-4 content should be mapped to their design in the way they consider most appropriate.

7. What's so special about this Initial Object Descriptor (IOD)?
The IOD is an object descriptor that not only describes a set of elementary streams but also conveys the profile and level information that a receiver needs to assess the processing resources required for that content.

8. What does Management of the Receiving Terminal's Buffer mean?
To predict how the decoder will behave when decoding the various elementary streams that form an MPEG-4 session, Systems provides a Systems Decoder Model. This model provides a well-defined framework in which the receiver's behaviour can be unambiguously characterized. Using this model, the encoder can monitor the buffer resources that are used to decode the session and ensure that they are not exceeded. The required buffer resources are conveyed to the decoder at the beginning of a session, so that the decoder can decide whether it is capable of providing them. Buffer management is critical in broadcast applications, and it also serves as a general framework for defining and enabling synchronization. Real implementations will have to take steps to accommodate non-ideal behaviour of the environment in which they operate; this typically involves additional buffering to filter out network jitter.

9. What are the assumptions of the timing model?
For applications involving real-time transmission, the timing model adopted by MPEG-4 assumes a constant end-to-end delay from the output of the encoder to the input of the decoder. This assumption is made only so that a well-defined and verifiably correct model can be employed. It does not mean that MPEG-4 content cannot be transmitted over variable-delay networks.

10. What should I use timing information for?
Two kinds of timing information can be conveyed in elementary streams. The first kind (clock references) conveys the sender's time base to the receiver. The second kind (time stamps) contains the desired time, in units of the sender's time base, for specific events such as the decoding or composition of portions of the encoded audio-visual information.
With this timing information, the inter-picture interval and audio sample rate at the decoder can be adjusted to match those of the encoder for synchronized operation.

11. What are these 'profiles and levels'?
Profiles are a mechanism to establish well-defined subsets of the overall MPEG-4 functionality. Profiles are defined in multiple dimensions. In the context of MPEG-4 Systems, Scene Graph Profiles and OD Profiles are distinguished.

12. Which MPEG-4 Systems profiles are distinguished?
In the context of MPEG-4 Systems, Scene Graph Profiles and OD Profiles are distinguished. Several scene graph profiles are currently defined: Audio, Simple2D, Complete2D and Complete. They specify the scene description nodes and ROUTEs that are allowed to be present in the scene description. These profiles do not prescribe which media nodes are allowed in the scene description; this information is inferred from the Audio and Visual profiles.
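
The idea that a scene graph profile restricts which nodes may appear in a scene description can be sketched as a simple set-membership check. The node lists below are placeholders chosen for illustration and are not the normative node sets of any MPEG-4 profile; the names `EXAMPLE_PROFILE_NODES` and `disallowed_nodes` are likewise hypothetical.

```python
# Placeholder node sets, NOT the normative lists from the standard.
EXAMPLE_PROFILE_NODES = {
    "Simple2D": {"Group", "Transform2D"},
    "Complete2D": {"Group", "Transform2D", "Shape", "TouchSensor"},
}


def disallowed_nodes(profile, scene_nodes):
    """Return the scene nodes that the given profile does not allow."""
    allowed = EXAMPLE_PROFILE_NODES[profile]
    # Anything used by the scene but missing from the profile is a violation.
    return sorted(set(scene_nodes) - allowed)


scene = ["Group", "Transform2D", "Shape"]
print(disallowed_nodes("Simple2D", scene))
print(disallowed_nodes("Complete2D", scene))
```

A terminal that claims a given profile could run a check of this kind before accepting a scene; the real check would use the node tables from the specification.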

13. How do I know the total amount of memory needed for my MPEG-4 terminal?
In general, profiles and levels help you with this. However, the total amount of memory cannot be derived from the standard, since the amount of memory needed for composition is not derivable in a simple way from the profile and level specification.

14. How do object descriptors relate to elementary streams?
Object descriptors are the glue between the scene description and the elementary streams. An ES_Descriptor (a part of an object descriptor) is associated with each elementary stream and contains all the information needed to find it, describe it and advise the receiver which resources need to be set up to decode it.

15. How do I transmit an object descriptor?
Object descriptors are transported in an elementary stream of their own. They are never sent just "as is" but are always encapsulated in so-called "OD commands".

16. What are OD commands good for?
OD commands separate the descriptive aspect of object descriptors from the need to enable and disable them. The OD commands (ODUpdate, ODRemove) provide the latter functionality.

17. Why are object descriptors streamed?
Indeed, one may argue that all the elementary streams for a presentation are known at the start of that presentation. However, anticipating highly dynamic content, MPEG found it prudent to allow updates of the set of object descriptors partway through a presentation, to cope with newly appearing content streams. The notion of an 'elementary stream' was therefore applied to the object descriptors as well, just as to all the content elementary streams.

18. And, can I also have multiple object descriptor streams?
Yes, even this is possible! Keep in mind that everything in MPEG-4 strives to be object-oriented. An MPEG-4 presentation may consist of several different sub-scenes, each of which could possibly be used independently. Therefore, each of those sub-scenes, at least, will have its own object descriptor stream.

19. What is the sync layer?
The sync layer defines a syntax (the SL packet headers) that can carry timing information, i.e., time stamps and clock references. This data allows a receiver to determine which portions of different streams are to be composed and, hence, presented at the same time.

20. Are object descriptors part of the sync layer?
Good question. It is not explicitly stated like this in the spec; however, on the receiver side you do need the object descriptors in order to start parsing the sync layer (SL). This is because a sub-descriptor (SLConfigDescriptor) determines some of the variable syntax fields of the SL packet header.

21. Why do I sometimes see "adaptation layer" instead of "sync layer"?
You may see the term "adaptation layer" in some older MPEG documents. We decided to rename it to "sync layer" when it became clear that synchronization is the major functionality performed by this layer.

22. Why do I see the words "PDU" and "packet" used for the same thing?
Well, this is another terminology change made during the standardization process. We figured out that the term "PDU" suggests more standardized protocol functionality than actually exists at the sync layer. The sync layer is not a protocol in the sense understood by the networking industry. It is "just" a syntax for adding time stamps and some other information to elementary stream access units in order to allow inter-stream synchronization.

23. What does stream synchronization actually mean?
Stream synchronization involves several things: evaluating the time stamps present in the SL packets, decoding the time-stamped access units in due time, and compositing (and presenting) them at the point in time indicated by the composition time stamp.

24. I know STB from MPEG-2. Now, what is an OTB?
An object time base (OTB) is the flow of time that can be reconstructed from the set of OCRs sent for this time base.
As in MPEG-2, the process of time base reconstruction is not normatively specified in MPEG-4.

25. Ah, time bases relate to objects! So, can there be multiple time bases in a scene?
Indeed, different audio-visual objects may have different OTBs. MPEG-4 permits presentation of multiple objects encoded using different OTBs. However, in that case tight synchronization between such objects may not be possible. It can therefore be anticipated that some application scenarios (e.g., broadcast) may want to restrict content to a single time base.

26. I know PCRs and SCRs from MPEG-2. So, what are OCRs?
Since MPEG-4 defines "audio-visual objects" rather than "programs", it was decided to rename the "program clock reference" to an "object clock reference". An object clock reference is a sample from an object time base. Different objects within the same presentation may have different object time bases, even though this may make tight synchronization impossible.

27. I know PTSs from MPEG-2. Now, what are CTSs?
In contrast to MPEG-2, MPEG-4 inserts a step between decoding and presentation of audio-visual information, which is called composition. The point in time when some 'composition unit' of audio-visual information is available in its decoded representation is therefore called its 'composition time' rather than 'presentation time', and the associated time stamps are called 'composition time stamps' rather than 'presentation time stamps' in MPEG-4.

28. MPEG-4 Systems talks about "Access Units". However, how do I know what a visual or audio access unit is?
Indeed, this is not determined in the Systems spec. The Visual and Audio parts of the standard define what their access units are. Look at Annex K in the Visual spec and Subpart 1 / Clause 6 in the Audio spec.

29. What is Object Content Information (OCI)?
OCI is a set of descriptors that carry basic information (about the content, its creator, etc.) for a certain subset of an MPEG-4 presentation.
This information is not needed to decode and present the content. However, it might be presented to the user in some (non-standardized) way.

30. What is the relation between OCI and MPEG-7?
OCI can be seen as a trivial cousin of MPEG-7. It provides only basic information, while MPEG-7 may supply you with a broad range of information.

31. What is the relation between OCI and IPMP?
Some of the information that might be available through OCI could also be relevant, for example, for determining IP ownership of the content. However, it is exclusively the role of IPMP to manage and protect IP associated with such content. OCI is intended only to inform the user.

32. What is the difference between an OCI descriptor and an OCI stream?
Basic OCI descriptors may be associated with object descriptors and, hence, with (a group of) elementary streams. In other words, this is a more or less static association of OCI with elementary stream(s). However, if the OCI changes frequently over time, it is also possible to establish an OCI stream, which allows the OCI descriptors to be updated over time.
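
The difference between a static OCI descriptor and an OCI stream can be pictured as a fixed value versus a value that is replaced at certain times. The sketch below is purely illustrative: the class `OciTimeline`, its methods and the dictionary fields are hypothetical and do not mirror the OCI syntax defined in the standard.

```python
import bisect


class OciTimeline:
    """Holds an initial OCI record plus timed updates (hypothetical model)."""

    def __init__(self, initial):
        self._times = [0.0]        # start times, kept sorted
        self._records = [initial]  # record valid from the matching time

    def update(self, time, record):
        # Insert the update so that the time list stays sorted.
        index = bisect.bisect_right(self._times, time)
        self._times.insert(index, time)
        self._records.insert(index, record)

    def at(self, time):
        # The latest record whose start time is not after `time` applies.
        index = bisect.bisect_right(self._times, time) - 1
        return self._records[max(index, 0)]


# Static case: one descriptor, never updated.
static = OciTimeline({"title": "Concert"})
# Stream case: the information changes while the presentation runs.
streamed = OciTimeline({"title": "Opening act"})
streamed.update(600.0, {"title": "Main act"})

print(static.at(900.0)["title"])
print(streamed.at(300.0)["title"])
print(streamed.at(900.0)["title"])
```

The static object answers every query with the same record, while the streamed one switches to the updated record once its start time has passed.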