INTERNATIONAL ORGANIZATION FOR STANDARDIZATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC 1/SC 29/WG 11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC 1/SC 29/WG 11 N7456
July 2005, Poznan

Title

FBA white paper

Source

SNHC

Status

Proposal

Editor

Marius Preda (INT)

What is FBA and why is it useful?

Face and Body Animation (FBA) is a set of tools that enables a specific representation of a humanoid avatar and allows very-low-bitrate compression and transmission of its animation parameters. These features open the way to multimedia applications that add virtual presenters to presentations at reduced cost. For example, website content can be enriched with a human-like synthetic model that gives the user instructions interactively. The animation of an avatar can also be sent over a television channel, multiplexed with the main video and audio streams. Finally, and no less importantly, such tools can be used in online games and 3D movies to ensure a compact representation of the media layer.

FBA technical features

A 3D (or 2D) face and body object is a representation of the human face and body that is structured to portray the visual manifestations of speech, facial expressions, and body posture with enough fidelity for visual speech to be intelligible and for the mood and gestures of the speaker to be recognized. A face and body object is animated by a stream of face and body animation parameters (FBA), encoded for low-bandwidth transmission in broadcast (one-to-many) or dedicated interactive (point-to-point) communications.

The Face Animation Parameters (FAPs) manipulate key feature control points in a mesh model of the face to produce animated visemes for the mouth (lips, tongue, teeth), as well as animation of the head and of facial features such as the eyes. FAPs are quantized with careful consideration of the limited range of movement of facial features; prediction errors are then computed and coded arithmetically. Remotely manipulating a face model in a terminal with FAPs can produce lifelike visual scenes of the speaker in real time without sending pictorial or video details of the face in every frame.
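
The quantize-then-predict chain described above can be sketched for a single FAP track as follows. This is a minimal illustration, assuming a uniform quantization step, first-order prediction from the previous frame, and sample values in arbitrary units; the step size and values are hypothetical, not normative MPEG-4 figures, and the arithmetic-coding stage is omitted.

```python
# Hypothetical sketch: uniform quantization of one FAP track followed by
# first-order prediction. Step size and sample values are illustrative
# assumptions; the arithmetic coder that would consume the residuals is omitted.

def quantize(values, step):
    """Map each FAP value to an integer level using a uniform step size."""
    return [round(v / step) for v in values]

def prediction_errors(levels):
    """Predict each level from the previous one and return the residuals.

    The first residual is the level itself (prediction from zero).
    """
    residuals = []
    previous = 0
    for level in levels:
        residuals.append(level - previous)
        previous = level
    return residuals

def reconstruct(residuals, step):
    """Decoder side: accumulate the residuals, then dequantize."""
    values = []
    level = 0
    for r in residuals:
        level += r
        values.append(level * step)
    return values

fap_track = [0.0, 12.0, 25.0, 31.0, 29.0, 21.0]  # one FAP over six frames (arbitrary units)
step = 4.0
levels = quantize(fap_track, step)
residuals = prediction_errors(levels)
print(levels)     # [0, 3, 6, 8, 7, 5]
print(residuals)  # [0, 3, 3, 2, -1, -2]
```

Because facial features move smoothly, the residuals cluster around zero, which is what makes the subsequent entropy (arithmetic) coding effective.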

The Body Animation Parameters (BAPs) define joint angles with respect to body axes and are independent of a particular body model.
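
Why joint angles are independent of a particular body model can be illustrated with planar forward kinematics: the same angles drive two arms with different bone lengths. This is a minimal sketch, assuming a two-joint planar chain and angles in radians measured relative to the parent segment; the chain, lengths, and function names are hypothetical and do not reproduce the normative BAP definitions.

```python
import math

# Hypothetical sketch: a planar shoulder -> elbow -> wrist chain.
# The joint angles play the role of BAPs; the bone lengths belong to the
# body model held by the terminal.

def wrist_position(angles, bone_lengths):
    """Forward kinematics: accumulate each joint rotation along the chain."""
    x = y = 0.0
    heading = 0.0
    for angle, length in zip(angles, bone_lengths):
        heading += angle  # each angle is assumed relative to the parent segment
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y

baps = [math.pi / 2, -math.pi / 2]  # raise the upper arm, bend the forearm back to horizontal
adult = [0.30, 0.25]                # upper-arm and forearm lengths in metres
child = [0.18, 0.15]

adult_wrist = wrist_position(baps, adult)
child_wrist = wrist_position(baps, child)
```

The adult wrist ends up at approximately (0.25, 0.30) and the child wrist at approximately (0.15, 0.18): one parameter stream produces the same posture on both models.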

A simple streaming connection can be made to a decoding terminal that animates a default face and body model. A more complex session can initialize a custom face and body in a more capable terminal by downloading face definition parameters (FDP) and body definition parameters (BDP) from the encoder. In this way, specific background images, facial textures, and head and body geometry can be portrayed. The composition of specific backgrounds, 2D/3D face and body meshes, texture mapping of the mesh, etc. is described in ISO/IEC 14496 part 1. An FBA stream has a maximum bitrate of 2-3 kbit/s for the face and 40 kbit/s for the body. Optional temporal DCT coding provides further compression efficiency at the cost of additional delay. Using the facilities of ISO/IEC 14496 part 1, composing the animated face and body model with synchronized, coded speech audio (from a low-bitrate speech coder or text-to-speech) can provide an integrated low-bandwidth audio/visual speaker for broadcast applications or interactive conversation.
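
The trade-off between temporal DCT coding and delay can be seen in a small sketch: the encoder must buffer a whole segment of frames before it can transform and quantize it. This sketch assumes 16-frame segments, an orthonormal DCT-II, and a uniform quantization step; the segment length, step, and helper names are illustrative assumptions, and the entropy coding of the coefficients is omitted.

```python
import math

# Hypothetical sketch of temporal DCT coding for one FAP track.
# Segment length (16 frames) and quantization step are assumptions.

def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient k."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    return [_scale(k, n) * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
            for t in range(n)]

def encode_segment(segment, step):
    """Transform the whole buffered segment, then quantize its coefficients."""
    return [round(c / step) for c in dct(segment)]

def decode_segment(levels, step):
    """Dequantize the coefficients and invert the transform."""
    return idct([q * step for q in levels])

# The encoder cannot emit anything until all 16 frames are available,
# which is where the extra delay comes from.
segment = [10.0 * math.sin(2 * math.pi * t / 16) for t in range(16)]
levels = encode_segment(segment, step=2.0)
decoded = decode_segment(levels, step=2.0)
```

Smooth motion concentrates energy in a few low-frequency coefficients, so most quantized levels are small or zero. Because the transform is orthonormal, the overall reconstruction error is bounded by the coefficient quantization error.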

Limited scalability is supported. Face and body animation achieves its efficiency by sending very concise motion animation controls over the channel, while relying on a suitably equipped terminal to render moving 2D/3D faces and bodies with non-normative models held in local memory. Models stored and updated for rendering in the terminal can be simple or complex. To support speech intelligibility, the normative FAP specification allows FAPs to be used either selectively or completely, as signaled by the encoder. A masking scheme provides for selective transmission of FAPs and BAPs according to which parts of the face and body are naturally active from moment to moment.
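
Mask-based selective transmission can be sketched as a per-frame bit mask that tells the decoder which parameter values follow in the stream. This is a minimal illustration, assuming that the decoder holds the last received value for every masked-out parameter; that hold policy, the four-parameter layout, and the function names are hypothetical and not taken from the standard.

```python
from typing import List, Tuple

# Hypothetical sketch of mask-based selective transmission.
# Assumed parameter layout: [jaw, lower lip, left eyelid, right eyelid].

def encode_frame(values: List[float], active: List[bool]) -> Tuple[List[bool], List[float]]:
    """Return the mask and the values of the active parameters only."""
    sent = [v for v, on in zip(values, active) if on]
    return list(active), sent

def decode_frame(mask: List[bool], sent: List[float], previous: List[float]) -> List[float]:
    """Rebuild the full parameter vector; masked-out entries keep their last value."""
    it = iter(sent)
    return [next(it) if on else old for on, old in zip(mask, previous)]

state = [0.0] * 4
# Speaking: only the mouth parameters are active and transmitted.
mask, sent = encode_frame([5.0, 3.0, 0.0, 0.0], [True, True, False, False])
state = decode_frame(mask, sent, state)
# Blinking: only the two eyelid values are transmitted.
mask, sent = encode_frame([5.0, 3.0, 9.0, 9.0], [False, False, True, True])
state = decode_frame(mask, sent, state)
```

In the second frame only two of the four values travel over the channel, yet the decoder still has a complete parameter vector.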

The Face and Body Animation specifications are defined in parts 1 and 2 of the MPEG-4 standard.

Beyond FBA: the BBA specifications

In recent work published as ISO/IEC 14496-16, the SNHC working group extended the FBA concepts and added a new tool, Bone-Based Animation (BBA), which allows higher-quality representation and animation of generic models thanks to a multilayer structure: skeleton, muscle, and skin.
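
The skeleton-to-skin layering can be illustrated with generic linear blend skinning, in which each skin vertex follows a weighted mix of the transforms of the bones that influence it. This is a 2D sketch under stated assumptions: it is a common skinning technique used here for illustration, not the normative BBA algorithm, the muscle layer is omitted, and the bones, weights, and function names are hypothetical.

```python
import math

# Hypothetical sketch: generic 2D linear blend skinning. Not the normative
# BBA algorithm; the muscle layer is omitted.

def bone_transform(angle, pivot):
    """Return a function that rotates a point by `angle` around `pivot`."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot

    def apply(p):
        x, y = p[0] - px, p[1] - py
        return (px + c * x - s * y, py + s * x + c * y)

    return apply

def skin_vertex(vertex, influences):
    """Blend the positions produced by each influencing bone.

    `influences` is a list of (transform, weight) pairs whose weights sum to 1.
    """
    x = y = 0.0
    for transform, weight in influences:
        tx, ty = transform(vertex)
        x += weight * tx
        y += weight * ty
    return x, y

upper_arm = bone_transform(0.0, (0.0, 0.0))        # shoulder stays still
forearm = bone_transform(math.pi / 2, (1.0, 0.0))  # elbow bends 90 degrees

# A vertex near the elbow is shared by both bones; one at the wrist follows the forearm only.
elbow_skin = skin_vertex((1.0, 0.1), [(upper_arm, 0.5), (forearm, 0.5)])
wrist_skin = skin_vertex((2.0, 0.0), [(forearm, 1.0)])
```

The wrist vertex moves rigidly to about (1.0, 1.0), while the shared elbow vertex lands halfway between the two bone-driven positions, at about (0.95, 0.05), which smooths the skin around the joint.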