INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11 N7705
MPEG2005/
October 2005, Nice, France
Source: Audio Subgroup
Title: MPEG Technologies: Structured Audio
Status: Approved
Editor: Giorgio Zoia
What it is
Structured audio representations are coding schemes that are built from semantic information about the sounds they represent and that make use of high-level models. Well-known examples in the literature include Musical Instrument Digital Interface (MIDI) event lists and linear-prediction models of speech [1]. MPEG-4 defines a toolset for general structured representations of sound, namely MPEG-4 Structured Audio (SA).
What it is for
Among the numerous applications of structured sound, a very important one is the ultra-low-bitrate transmission of audio content and related processing algorithms, which exploits previously unexplored forms of redundancy in signals. Furthermore, structured sound descriptions allow perceptually or physically meaningful control, providing a more natural interface for searching and manipulating data.
Description of MPEG-4 SA
The general concept of SA is very similar to that of the most popular software sound synthesis (SWSS) systems typical of the computer music world (see e.g. [2] for an applied overview). It is composed of two native tools, SAOL and SASL, and supports two further tools: a sample bank format (SASBF, derived from MIDI DLS-2) and MIDI.
Structured Audio Orchestra Language
SAOL (Structured Audio Orchestra Language) is a C-like programming language. It is used to describe an orchestra of instruments (in the wide sense, including functionality for both sound generation and sound processing) together with their related functions. It uses variables of a single numeric type only (32-bit floating point). Variables are distinguished instead by three different rates: initialization rate, control rate and sampling rate. As a consequence, each statement is characterized by one of the same three rates and is executed periodically at that rate, in program order.
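The three-rate execution model can be illustrated with a small simulation. The following is a minimal sketch in Python, not SAOL code and not the normative decoder behaviour: the function name `run_instrument`, its parameters and the fade-out envelope are hypothetical and chosen only to show that initialization code runs once, control code once per control cycle, and audio code once per sample.

```python
import math

def run_instrument(duration_s, sample_rate=8000, control_rate=100, freq_hz=440.0):
    """Toy three-rate loop (hypothetical names, illustration only)."""
    samples_per_cycle = sample_rate // control_rate
    n_cycles = int(duration_s * control_rate)

    # Initialization rate: executed once, when the instrument is created.
    phase = 0.0
    phase_inc = 2.0 * math.pi * freq_hz / sample_rate
    counts = {"init": 1, "control": 0, "audio": 0}
    out = []

    for k in range(n_cycles):
        # Control rate: executed once per control cycle.
        amp = 1.0 - k / n_cycles  # linear fade-out envelope (an assumption)
        counts["control"] += 1

        for _ in range(samples_per_cycle):
            # Sampling rate: executed once per output sample.
            out.append(amp * math.sin(phase))
            phase += phase_inc
            counts["audio"] += 1

    return out, counts
```

With the default rates, half a second of sound runs the control code 50 times and the audio code 4000 times, while the initialization code runs once.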
Structured Audio Score Language
SASL (Structured Audio Score Language) is used to schedule tasks through instrument instantiations and control statements (affecting control-rate variables). SASL instructions can be dispatched to the decoder at any time, at the initialization of the performance and/or at the beginning of each control cycle, through an MPEG-4 data stream. A built-in scheduler maps SASL instructions onto SAOL instruments, creating a program at run time. The execution of this program can be called either performance or decoding (in this particular case the second term, typical of coding standards, is equivalent to the first).
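The role of the scheduler can be sketched as an event queue that is consulted at the start of every control cycle. The Python below is a simplified illustration under stated assumptions: the class `Scheduler`, the event kinds `"instr"` and `"control"`, and the rounding of times to whole control cycles are hypothetical and do not reproduce SASL syntax or the normative scheduling rules.

```python
import heapq

class Scheduler:
    """Toy score scheduler (hypothetical API, illustration only)."""

    def __init__(self, control_rate=100):
        self.control_rate = control_rate
        self._queue = []      # heap of (cycle, sequence, kind, payload)
        self._seq = 0         # keeps registration order for equal times
        self.active = []      # (instrument name, end cycle)
        self.controls = {}    # control-rate variables set by score events

    def _to_cycle(self, time_s):
        return int(round(time_s * self.control_rate))

    def register(self, time_s, kind, *payload):
        """Register a timed event; it is dispatched later, during run()."""
        heapq.heappush(self._queue, (self._to_cycle(time_s), self._seq, kind, payload))
        self._seq += 1

    def run(self, duration_s):
        """Return, per control cycle, the sorted names of active instruments."""
        log = []
        for cycle in range(self._to_cycle(duration_s)):
            # Dispatch every event that is due at the start of this cycle.
            while self._queue and self._queue[0][0] <= cycle:
                _, _, kind, payload = heapq.heappop(self._queue)
                if kind == "instr":
                    name, dur_s = payload
                    self.active.append((name, cycle + self._to_cycle(dur_s)))
                elif kind == "control":
                    var, value = payload
                    self.controls[var] = value
            # Drop instrument instances whose duration has elapsed.
            self.active = [(n, end) for n, end in self.active if end > cycle]
            log.append((cycle, sorted(n for n, _ in self.active)))
        return log
```

For example, registering an instrument `"tone"` at 0.0 s for 0.3 s and an instrument `"beep"` at 0.2 s for 0.2 s, with a control rate of 10 cycles per second, yields two overlapping instances during the third control cycle.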
MIDI support
In addition to SASL, or instead of it, MIDI score events can be used to control the playing of SAOL instruments. MIDI (Musical Instrument Digital Interface) has been widely used in the music industry since its introduction in 1983. This form of control is included in SA to enable backwards compatibility with MIDI-based synthesis. When used in algorithmic SAOL-based synthesis, the MIDI events are converted to SAOL orchestra control events before execution.
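The idea of converting MIDI events into orchestra control events can be sketched as follows. This is an illustrative Python fragment only: the tuple layout of the output events and the function names `note_to_hz` and `midi_to_events` are assumptions, and the normative MIDI-to-SAOL mapping defined by the standard is not reproduced here. The status byte values (0x80 note off, 0x90 note on, 0xB0 control change) and the equal-tempered pitch formula with A4 = 440 Hz at note number 69 follow common MIDI practice.

```python
def note_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def midi_to_events(messages):
    """Convert (time_s, status, data1, data2) tuples into control events.

    The output tuple layout is hypothetical:
    (time_s, kind, channel, value, level).
    """
    events = []
    for time_s, status, data1, data2 in messages:
        kind = status & 0xF0       # upper nibble: message type
        channel = status & 0x0F    # lower nibble: MIDI channel
        if kind == 0x90 and data2 > 0:
            events.append((time_s, "note_on", channel, note_to_hz(data1), data2 / 127.0))
        elif kind == 0x80 or (kind == 0x90 and data2 == 0):
            # A note-on with velocity 0 is conventionally treated as note-off.
            events.append((time_s, "note_off", channel, note_to_hz(data1), 0.0))
        elif kind == 0xB0:
            events.append((time_s, "control", channel, data1, data2 / 127.0))
        # Other message types are ignored in this sketch.
    return events
```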
Both the MIDI and the SASL control information can be transmitted in the SA stream header and in the bitstream following it. The control events in the header (either in a MIDI file or in a SASL score file) must carry timing information, which is used to register each event with the decoder's scheduler for later use in the decoding process.
The Sample Bank Format
The Structured Audio Sample Bank Format (SASBF) is used for transmission of audio sample banks for wavetable synthesis and associated simple processing algorithms.
The SASBF is based on the MIDI Downloadable Sounds 2 format (DLS 2), which, like MIDI, is specified by the MIDI Manufacturers Association (MMA). The purpose of this format is to guarantee the quality of the synthesized sound and the compatibility between different decoders. In fact, General MIDI only specifies the mapping between MIDI events and musical instruments; it does not normatively define the quality of the music synthesis. MIDI alone enables very low-bitrate transmission of sound, but what the output sounds like depends entirely on the synthesizer. The downloadable sound concept in SASBF is used instead to transmit the wavetables with the bitstream, providing a normative way to control the quality of the played samples.
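Wavetable synthesis, for which such sample banks are transmitted, can be illustrated with a minimal oscillator. The Python sketch below is a generic looped wavetable reader with linear interpolation, offered as an assumption about the simplest possible case; it is not the SASBF or DLS 2 synthesis model, and the function names are hypothetical.

```python
import math

def make_sine_table(size=256):
    """Build one period of a sine wave as an example wavetable."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def wavetable_osc(table, freq_hz, sample_rate, n_samples):
    """Read a looped wavetable at freq_hz using linear interpolation."""
    size = len(table)
    step = freq_hz * size / sample_rate  # table positions advanced per sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % size]        # wrap around at the end of the loop
        out.append(a + frac * (b - a))
        phase = (phase + step) % size
    return out
```

Because the decoder reads the same transmitted table, two decoders implementing the same interpolation produce the same samples, which is the point of sending the wavetables along with the events.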
SA Object Types
There are four SA-based object types that can be defined in an MPEG-4 bitstream:
The following picture shows a block diagram for the most generic SA decoding process, i.e. the one allowed by the Main Synthesis object type.

SA Application Scenarios
The versatile toolset of MPEG-4 Structured Audio enables a rich variety of applications. The main scenarios are the following:
References
[1] C. Roads, The Computer Music Tutorial, Cambridge, MA: MIT Press, 1996, parts II and III.
[2] R. B. Dannenberg and N. Thompson, Real-Time Software Synthesis on Superscalar Architectures, Computer Music Journal, vol. 21 (3), pp. 83-94, MIT Press, 1997.
[3] J. Huopaniemi and R. Väänänen, SNHC Audio and Audio Composition, in F. Pereira and T. Ebrahimi (editors): The MPEG-4 Book, Prentice Hall PTR, 2002.