INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 N7705
MPEG2005/
October 2005, Nice, France

Source:      Audio Subgroup
Title:          MPEG Technologies: Structured Audio
Status:       Approved
Editor:        Giorgio Zoia

What it is

Structured audio representations are coding schemes built from semantic information about the sounds they represent, and they make use of high-level models. Well-known examples in the literature include musical-instrument digital interface (MIDI) event lists and linear-prediction models of speech [1]. MPEG-4 defines a toolset for general structured representations of sound, namely MPEG-4 Structured Audio (SA).

What it is for

Among the many applications of structured sound, a very important one is the ultra-low-bitrate transmission of audio content and related processing algorithms, which exploits previously unexplored forms of redundancy in signals. Furthermore, structured sound descriptions allow perceptually or physically meaningful control, providing a more natural interface for searching and manipulating data.

Description of MPEG-4 SA

The general concept of SA is very similar to that of the most popular software sound synthesis (SWSS) sets typical of the computer-music world (see e.g. [2] for an applied overview). It is composed of two native tools, SAOL and SASL, and supports two further tools: a sound bank format (SASBF, derived from MIDI DLS-2) and MIDI.

Structured Audio Orchestra Language

SAOL (Structured Audio Orchestra Language) is a C-like programming language. It is used to describe an orchestra of instruments (in the wide sense, including functionality for both sound generation and sound processing) together with their related functions. It uses variables of a single numeric type (32-bit floating point). Variables are instead distinguished by one of three rates: initialization rate, control rate and sampling rate. As a consequence, each statement is also characterized by one of the same three rates and is executed periodically at that rate, in program order.
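
The three-rate execution model can be pictured with a small sketch. The following is written in Python rather than SAOL, and everything in it is an assumption made for illustration: the rate values, the ToyInstrument class and the render function are hypothetical and do not reproduce the normative decoder. It only shows initialization-rate code running once, control-rate code running once per control cycle, and sample-rate code running once per output sample.

```python
import math

SRATE = 8000                        # sampling rate in Hz (assumed value)
KRATE = 100                         # control rate in Hz (assumed value)
SAMPLES_PER_CYCLE = SRATE // KRATE  # audio samples computed per control cycle


class ToyInstrument:
    """Hypothetical instrument with one group of statements per rate."""

    def __init__(self, freq):
        # Initialization rate: executed once, when the instrument is created.
        self.freq = freq
        self.phase = 0.0
        self.amp = 0.0
        self.counts = {"init": 1, "control": 0, "audio": 0}

    def control_pass(self):
        # Control rate: executed once per control cycle (a simple fade-in).
        self.amp = min(1.0, self.amp + 0.1)
        self.counts["control"] += 1

    def audio_pass(self):
        # Sampling rate: executed once per output sample.
        sample = self.amp * math.sin(self.phase)
        self.phase += 2.0 * math.pi * self.freq / SRATE
        self.counts["audio"] += 1
        return sample


def render(instr, seconds):
    """Run the control pass, then the audio passes, for each control cycle."""
    out = []
    for _ in range(int(seconds * KRATE)):
        instr.control_pass()
        for _ in range(SAMPLES_PER_CYCLE):
            out.append(instr.audio_pass())
    return out
```

With these assumed rates, rendering half a second runs the control pass 50 times and the audio pass 4000 times, while the initialization code runs only once.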

Structured Audio Score Language

SASL (Structured Audio Score Language) is used to schedule tasks through instrument instantiations and control statements (which affect control-rate variables). SASL instructions can be dispatched to the decoder through an MPEG-4 data stream at any time, that is, at the initialization of the performance and/or at the beginning of each control cycle. A built-in scheduler maps SASL instructions onto SAOL instruments, creating a program at run time whose execution can be called either performance or decoding (in this particular case the second term, typical of coding standards, is equivalent to the first).
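
The role of the scheduler can be sketched as follows, again in Python and under stated assumptions. ToyScheduler, its register and run methods, and the instrument names are hypothetical; the sketch only models timed instrument instantiations being started and stopped at control-cycle boundaries, and it omits control statements and events arriving during the performance.

```python
import heapq
import math


class ToyScheduler:
    """Hypothetical scheduler: score events start instrument instances
    at control-cycle boundaries and stop them when their duration ends."""

    def __init__(self, krate):
        self.krate = krate  # control cycles per second (assumed parameter)
        self._queue = []    # min-heap ordered by start cycle
        self._seq = 0       # tie-breaker that keeps registration order

    def register(self, start_time, instr_name, duration):
        # Quantize times in seconds to whole control cycles.
        start_cycle = math.ceil(start_time * self.krate)
        end_cycle = start_cycle + max(1, math.ceil(duration * self.krate))
        heapq.heappush(self._queue,
                       (start_cycle, self._seq, instr_name, end_cycle))
        self._seq += 1

    def run(self, total_cycles):
        active = []  # (end_cycle, instr_name) for running instances
        log = []
        for cycle in range(total_cycles):
            # Retire instances whose duration has elapsed.
            for end, name in active:
                if end <= cycle:
                    log.append((cycle, "stop", name))
            active = [(end, name) for end, name in active if end > cycle]
            # Dispatch every registered event that is due at this cycle.
            while self._queue and self._queue[0][0] <= cycle:
                _, _, name, end = heapq.heappop(self._queue)
                active.append((end, name))
                log.append((cycle, "start", name))
        return log
```

Events may be registered in any order; the heap ensures that they are dispatched in time order, at the first control cycle at or after their start time.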

MIDI support

MIDI score events can also be used, in conjunction with or instead of SASL, to control the playing of SAOL instruments. MIDI (Musical Instrument Digital Interface) has been widely used in the music industry since its introduction in 1983. This form of control is included in SA to enable backwards compatibility with MIDI-based synthesis. When used in algorithmic SAOL-based synthesis, the MIDI events are converted to SAOL orchestra control events before execution.

Both the MIDI and the SASL control information can be transmitted in the SA stream header and in the bitstream following it. The control events in the header (either in a MIDI file or a SASL score file) must have timing information, which is used to register each event with the scheduler of the decoder to be used later in the decoding process.
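
The conversion of MIDI events into orchestra control events described in this section can be illustrated with a minimal sketch. The tuple layout and the function names below are assumptions for illustration only; the normative mapping is defined by the standard and is not reproduced here. The sketch relies only on well-known MIDI conventions: status bytes 0x80, 0x90 and 0xB0 for note-off, note-on and control change, a note-on with velocity 0 acting as a note-off, and note 69 corresponding to 440 Hz in equal temperament.

```python
def midi_note_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def convert_midi_event(time, status, data1, data2):
    """Map a MIDI channel message to a hypothetical timed control event.

    Returns a tuple (time, kind, channel, value1, value2), or None for
    messages that this sketch does not handle.
    """
    kind = status & 0xF0     # upper nibble: message type
    channel = status & 0x0F  # lower nibble: MIDI channel
    if kind == 0x90 and data2 > 0:
        return (time, "note_on", channel,
                midi_note_to_hz(data1), data2 / 127.0)
    if kind == 0x80 or (kind == 0x90 and data2 == 0):
        return (time, "note_off", channel, midi_note_to_hz(data1), 0.0)
    if kind == 0xB0:
        return (time, "control", channel, data1, data2 / 127.0)
    return None
```

Because every converted event carries its time stamp, it could be registered with a scheduler in the same way as a score event.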

The Sample Bank Format

The Structured Audio Sample Bank Format (SASBF) is used for transmission of audio sample banks for wavetable synthesis and associated simple processing algorithms.

The SASBF is based on the MIDI Downloadable Sounds 2 format (DLS-2), which, like MIDI, is specified by the MIDI Manufacturers Association (MMA). The purpose of this format is to guarantee the quality of the synthesized sound and compatibility between different decoders. General MIDI only specifies the mapping between MIDI events and musical instruments; it does not normatively define the quality of the music synthesis. MIDI alone enables very low-bitrate transmission of sound, but what the output sounds like depends entirely on the synthesizer. The downloadable-sound concept in SASBF instead transmits the wavetables with the bitstream, providing a normative way to control the quality of the played samples.
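
To make the idea of wavetable synthesis concrete, here is a minimal Python sketch of a looped table-lookup oscillator with linear interpolation. It is an assumption-laden illustration, not the SASBF or DLS-2 synthesis model (which also covers features not shown here); the function names and the single-cycle sine table are hypothetical. The point is that the output is fully determined by the table samples, which is why transmitting the tables fixes the sound.

```python
import math


def make_sine_table(size):
    """Build one cycle of a sine wave as a stand-in for a transmitted table."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]


def render_wavetable(table, freq, srate, n_samples):
    """Loop over the table at the requested pitch.

    The read position advances by freq * len(table) / srate per output
    sample; values between table entries are linearly interpolated.
    """
    size = len(table)
    step = freq * size / srate
    pos = 0.0
    out = []
    for _ in range(n_samples):
        i = int(pos)
        frac = pos - i
        a = table[i]
        b = table[(i + 1) % size]  # wrap around at the end of the loop
        out.append(a + frac * (b - a))
        pos = (pos + step) % size
    return out
```

With a 64-entry table, a sampling rate of 6400 Hz and a frequency of 100 Hz, the step is exactly one table entry per sample, so the output simply repeats the table.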

SA Object Types

There are four SA-based object types that can be defined in an MPEG-4 bitstream:

  1. In the MIDI-only object type, only MIDI files and MIDI events are transmitted in the bitstream. This means that the decoding uses non-normative ways to generate sound, and the mapping between the MIDI instruments and the music synthesis is done according to the patch mappings defined in the General MIDI specification. Thus a bitstream of this object type is backwards compatible with the MIDI specification.
  2. In the Wavetable object type, MIDI files and SASBF wavetables can be transmitted in the bitstream, and MIDI events are used to control the playing of the wavetable-based instruments.
  3. In the Algorithmic Synthesis object type, only the SA native tools can be used in the bitstream: the synthetic instruments are defined only with SAOL statements and variables, SASBF is not supported, and only SASL score events can be used to control the sound synthesis process.
  4. In the Main Synthesis object type, all components of SA are allowed in the stream, including SASBF and MIDI files and events.
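
The four object types above can be summarized as a table of permitted tools. The Python sketch below encodes that summary exactly as given in this list; the dictionary, the string labels and the disallowed_tools helper are hypothetical names chosen for illustration and are not identifiers from the standard.

```python
# Tools permitted per object type, as summarized in the list above.
# The string labels are illustrative, not normative identifiers.
ALLOWED_TOOLS = {
    "MIDI-only": frozenset({"MIDI"}),
    "Wavetable": frozenset({"MIDI", "SASBF"}),
    "Algorithmic Synthesis": frozenset({"SAOL", "SASL"}),
    "Main Synthesis": frozenset({"SAOL", "SASL", "SASBF", "MIDI"}),
}


def disallowed_tools(object_type, tools_used):
    """Return the tools used by a stream that its object type does not permit."""
    return set(tools_used) - ALLOWED_TOOLS[object_type]
```

An empty result means that the stream stays within its object type; for example, a Wavetable stream that also carried SAOL code would yield {"SAOL"}.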

The following picture shows a block diagram for the most generic SA decoding process, i.e. the one allowed by the Main Synthesis object type.

[Figure: block diagram of the SA decoding process for the Main Synthesis object type]

SA Application Scenarios

The versatile toolset of MPEG-4 Structured Audio enables a rich variety of applications. The main scenarios are the following:

  1. MIDI over MPEG-4. At the lowest level, SA supports the MIDI specification inside a normative MPEG-4 stream or file.
  2. Wavetable synthesis. The wavetable synthesis engine in MPEG-4 SA offers a bounded-complexity sound synthesis implementation, and enables implementation on low-complexity decoders and terminals. Current applications may include karaoke systems, musical backgrounds for WWW pages, mobile-device tools, etc.
  3. Algorithmic synthesis. The algorithmic synthesis capability of SA offers high-quality user-definable sound synthesis including expressive control. Because any signal processing routine can be written with SAOL, the application area is very wide. Current applications may include generic sound synthesis and computer music, video games, low-bitrate Internet delivery of music, virtual reality models and entertainment.
  4. Audio effects processing. A SAOL program can be used as a custom effects-processing module for natural and synthetic audio.
  5. Generalized structured audio coding. The SA decoder can be used to emulate the behavior of natural audio coders (in essence any decoder can be implemented using SA tools). However, the SA toolset may not provide the computationally optimal tools to achieve the best general audio coding in all cases.

References

[1]      C. Roads, The Computer Music Tutorial, Cambridge, MA: MIT Press, 1996, parts II and III.

[2]      R. B. Dannenberg and N. Thompson, Real-Time Software Synthesis on Superscalar Architectures, Computer Music Journal, vol. 21 (3), pp. 83-94, MIT Press, 1997.

[3]      J. Huopaniemi and R. Väänänen, SNHC Audio and Audio Composition, in F. Pereira and T. Ebrahimi (editors), The MPEG-4 Book, Prentice Hall PTR, 2002.