INTERNATIONAL ORGANISATION FOR STANDARDISATION

ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC1/SC29/WG11

CODING OF MOVING PICTURES AND AUDIO

 

MPEG2005/N7319

July 2005, Poznan, Poland

 

 

Source:           Video

Status:            Draft Output

Title:               Introduction to Low-level Visual Description Tools

 

 

Basic Structure Description Tools

 

This group consists of five supporting tools for visual descriptions of elementary features, such as color, texture and shape. These tools are designed as descriptor containers and enable structural representation of elementary features. The GridLayout container provides efficient representations of visual features on grids. TimeSeries containers represent temporal arrays of several descriptions. The GofGopFeature container describes representative descriptions over a video segment. The MultipleView container describes a 3D object using several pictures captured from different view angles.

 

Color Description Tools

 

Color is the most basic attribute of visual content. MPEG-7 Visual defines five different description tools, each of which represents a different aspect of the color attribute. Color distribution tools include a representative color description (DominantColor), a basic color distribution description (ScalableColor) and an advanced color distribution description (ColorStructure). The remaining tools are ColorLayout, which describes the spatial distribution of colors, and ColorTemperature, which describes the perceptual feeling of the illumination color.

 

DominantColor

The Dominant Color Descriptor characterizes an image or region by a small number of representative colors. These are selected by quantizing pixel colors into (up to 7) principal clusters. The description then consists of the fraction of the image represented by each color cluster and the color variance within each cluster. A measure of the overall spatial coherency of the clusters is also defined. This descriptor is a very compact description of the color distribution in the image.
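The idea can be illustrated with a minimal Python sketch. It is not the normative extraction method: for brevity it groups pixels by uniform quantization of RGB values instead of running a clustering algorithm, and it omits the spatial coherency measure. The function name, the step parameter and the dictionary keys are assumptions made for this illustration.

```python
from collections import defaultdict


def dominant_colors(pixels, max_colors=7, step=64):
    """Summarise a list of (r, g, b) pixels (0..255) by a few representative colors.

    Assumption: pixels are grouped by coarse uniform quantization (bucket width
    `step`), which stands in for the clustering used by a real extractor.
    """
    clusters = defaultdict(list)
    for p in pixels:
        clusters[tuple(c // step for c in p)].append(p)

    # Keep only the most populated clusters.
    largest = sorted(clusters.values(), key=len, reverse=True)[:max_colors]

    total = len(pixels)
    description = []
    for members in largest:
        n = len(members)
        mean = tuple(sum(p[i] for p in members) / n for i in range(3))
        var = tuple(sum((p[i] - mean[i]) ** 2 for p in members) / n for i in range(3))
        description.append({"color": mean, "fraction": n / total, "variance": var})
    return description
```

Each entry carries the three quantities named above: the representative color, the fraction of the image it covers, and the per-channel variance inside the cluster.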

 

ScalableColor

The Scalable Color Descriptor is a color histogram in the HSV color space, which is encoded by a Haar transform. It has a binary representation that is scalable, in terms of bin numbers and bit representation accuracy, over a broad range of granularity. Retrieval accuracy can therefore be balanced against descriptor size. Inversion of the Haar transform is not necessary for consumption of the description, since similarity matching is also effective in the transform domain.
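A rough Python sketch of this pipeline follows. The 16 x 4 x 4 quantization of H, S and V, the plain one-dimensional unnormalised Haar transform and the L1 distance are assumptions chosen for illustration; the normative descriptor prescribes its own bin layout, transform ordering and coefficient quantization.

```python
import colorsys


def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Histogram of (r, g, b) pixels (0..255) over a quantized HSV space."""
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist


def haar(values):
    """Unnormalised 1-D Haar transform; len(values) must be a power of two.

    Coarse (sum) coefficients end up first, detail (difference) coefficients later.
    """
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs
        n = half
    return out


def l1_distance(a, b, n_coeffs):
    """Compare two descriptors in the transform domain using the first n_coeffs."""
    return sum(abs(x - y) for x, y in zip(a[:n_coeffs], b[:n_coeffs]))
```

Because the coarse coefficients come first, truncating the coefficient list to a shorter prefix gives a smaller descriptor that can still be compared directly with l1_distance, which is the trade-off between size and accuracy described above.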

 

ColorLayout

The Color Layout Descriptor represents the spatial layout of color images in a very compact form. It is based on generating a tiny (8x8) thumbnail of an image, which is encoded via DCT and quantized. Besides supporting efficient visual matching, this also offers a quick way to visualize the appearance of an image, by inverting the DCT to reconstruct an approximation of the thumbnail.
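The thumbnail, DCT and reconstruction steps can be sketched in Python as below. This is a single-channel illustration only: the block averaging, the uniform quantization step of 16 and all function names are assumptions, and the selection and ordering of coefficients used by the actual descriptor are not modelled.

```python
import math

N = 8  # thumbnail size


def thumbnail(image):
    """Average a 2-D list of samples into an N x N grid.

    Assumes height and width are at least N; any remainder rows/columns
    beyond a multiple of N are ignored.
    """
    height, width = len(image), len(image[0])
    bh, bw = height // N, width // N
    grid = []
    for by in range(N):
        row = []
        for bx in range(N):
            block = [image[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(block) / len(block))
        grid.append(row)
    return grid


def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block."""
    return [[_scale(u) * _scale(v) * sum(block[y][x] * _basis(y, u) * _basis(x, v)
                                          for y in range(N) for x in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2; rebuilds the N x N thumbnail."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(y, u) * _basis(x, v)
                 for u in range(N) for v in range(N))
             for x in range(N)] for y in range(N)]


def quantize(coeffs, step=16):
    """Uniform quantization (assumed step size, for illustration only)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step=16):
    return [[q * step for q in row] for row in levels]
```

Calling idct2(dequantize(quantize(dct2(thumbnail(image))))) yields the approximate thumbnail that can be shown to a user for quick visualization.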

 

ColorStructure

The Color Structure Descriptor captures both color content and information about the spatial arrangement of the colors. Specifically, it is a histogram that counts the number of times a color is present in an 8x8 windowed neighborhood, as this window progresses over the image rows and columns. This enables it to distinguish, for example, between an image in which pixels of each color are distributed uniformly and an image in which the same colors occur in the same proportions, but are located in distinct blocks.
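The counting rule can be sketched in a few lines of Python. The sketch assumes the image has already been reduced to a 2-D list of quantized color indices; the color space, the quantization and any sub-sampling for large images are outside its scope, and the function name is hypothetical.

```python
def color_structure_histogram(image, n_colors, window=8):
    """Count, for each color index, the window positions in which it occurs.

    image: 2-D list of color indices in range(n_colors).
    A color is counted once per window position, however many pixels it covers.
    """
    height, width = len(image), len(image[0])
    hist = [0] * n_colors
    for top in range(height - window + 1):
        for left in range(width - window + 1):
            present = {image[y][x]
                       for y in range(top, top + window)
                       for x in range(left, left + window)}
            for color in present:
                hist[color] += 1
    return hist


# Two 16x16 images with identical color proportions (50% each):
checkerboard = [[(x + y) % 2 for x in range(16)] for y in range(16)]
two_blocks = [[0 if x < 8 else 1 for x in range(16)] for y in range(16)]
```

For these two images an ordinary histogram would be identical, whereas the structure histogram differs: every window of the checkerboard contains both colors, while some windows of the two-block image contain only one.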

 

IlluminationInvariantColor

IlluminationInvariantColor is another supporting tool in the color description tool group. It is a container and can extend four color descriptors – DominantColor, ScalableColor, ColorLayout and ColorStructure – to support illumination invariant similarity matching.

 

All the tools are applicable to rectangular and arbitrarily shaped still pictures.

 

Texture Description Tools

 

Edge Histogram

The edge histogram descriptor represents the spatial distribution of five types of edges (four directional edges and one non-directional).  It consists of local histograms of these edge directions, which may optionally be aggregated into global or semi-global histograms.
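The following Python sketch shows one way to build such local histograms. The 2x2 filter coefficients are those commonly cited for this descriptor but should be treated as an assumption here, as should the threshold value, the 4x4 grid of local regions, and the simplification that each 2x2 pixel group is classified directly (without first averaging larger sub-blocks).

```python
import math

# Assumed 2x2 filter masks, listed as (top-left, top-right, bottom-left, bottom-right).
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (math.sqrt(2), 0, 0, -math.sqrt(2)),
    "diagonal_135": (0, math.sqrt(2), -math.sqrt(2), 0),
    "non_directional": (2, -2, -2, 2),
}
EDGE_TYPES = list(FILTERS)


def classify_block(a, b, c, d, threshold=10.0):
    """Return the strongest edge type of the 2x2 block [[a, b], [c, d]], or None."""
    strengths = {name: abs(f[0] * a + f[1] * b + f[2] * c + f[3] * d)
                 for name, f in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None


def edge_histogram(image, grid=4, threshold=10.0):
    """Local 5-bin edge histograms for a grid x grid partition of a grayscale image."""
    height, width = len(image), len(image[0])
    hists = [[0] * len(EDGE_TYPES) for _ in range(grid * grid)]
    for y in range(0, height - 1, 2):
        for x in range(0, width - 1, 2):
            edge = classify_block(image[y][x], image[y][x + 1],
                                  image[y + 1][x], image[y + 1][x + 1], threshold)
            if edge is not None:
                cell = (y * grid // height) * grid + (x * grid // width)
                hists[cell][EDGE_TYPES.index(edge)] += 1
    return hists


def global_histogram(hists):
    """Aggregate the local histograms into a single global 5-bin histogram."""
    return [sum(h[i] for h in hists) for i in range(len(EDGE_TYPES))]
```

Semi-global histograms could be formed in the same way as global_histogram by summing over selected groups of cells, for example one row or one column of the grid.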

 

Homogeneous Texture

The Homogeneous Texture descriptor is designed to characterize the properties of texture in an image (or region), based on the assumption that the texture is homogeneous – i.e., the visual properties of the texture are relatively constant over the region.  The descriptive features are extracted from a bank of orientation- and scale-tuned Gabor filters.

 

Texture Browsing

The Texture Browsing Descriptor is useful for representing homogeneous texture for browsing-type applications. This descriptor, combined with the Homogeneous Texture Descriptor, provides a scalable solution for representing homogeneous texture regions in images.

 

Shape Description Tools

 

Shape 3D

The 3D Shape Descriptor provides an intrinsic shape description of 3D mesh models, by exploiting some local attributes of the 3D surface.

 

Motion Description Tools

 

There are four motion Descriptors: Camera Motion, Motion Trajectory, Parametric Motion and Motion Activity.

 

Camera Motion

This descriptor characterizes 3-D camera motion parameters. It is based on 3-D camera motion parameter information, which can be automatically extracted or generated by capture devices. The camera motion descriptor supports the following well-known basic camera operations: fixed, panning, tracking, tilting, booming, zooming, dollying, and rolling.

 

Motion Trajectory

The motion trajectory of an object is a simple, high-level feature, defined as the localization, in time and space, of one representative point of this object. This descriptor is useful for content-based retrieval in object-oriented visual databases.

 

Parametric Motion

The parametric model is associated with arbitrary (foreground or background) objects, defined as regions (groups of pixels) in the image over a specified time interval. Such an approach leads to a very efficient description of several types of motion, including simple translation, rotation and zoom, as well as more complex motions formed by combining these elementary motions.
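As an illustration, a six-parameter affine model can express each of the elementary motions named above as a displacement field over the region. The parameter ordering and the helper names below are assumptions made for this Python sketch; the descriptor itself supports several model types that are not shown.

```python
import math


def affine_motion(params, x, y):
    """Displacement (vx, vy) at point (x, y) under an affine model.

    Assumed parameter order: vx = a1 + a2*x + a3*y, vy = a4 + a5*x + a6*y.
    """
    a1, a2, a3, a4, a5, a6 = params
    return (a1 + a2 * x + a3 * y, a4 + a5 * x + a6 * y)


def translation(dx, dy):
    return (dx, 0.0, 0.0, dy, 0.0, 0.0)


def zoom(s):
    """Zoom about the origin: every point moves by s times its position."""
    return (0.0, s, 0.0, 0.0, 0.0, s)


def rotation(theta):
    """Rotation about the origin by theta radians, written as a displacement field."""
    c, s = math.cos(theta), math.sin(theta)
    return (0.0, c - 1.0, -s, 0.0, s, c - 1.0)


def add_fields(p, q):
    """Sum of two displacement fields, e.g. a zoom followed by a translation."""
    return tuple(a + b for a, b in zip(p, q))
```

Six numbers therefore describe the motion of every pixel in the region, which is where the compactness of the description comes from.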

 

Motion Activity

The Motion Activity descriptor captures the intuitive notion of ‘intensity of action’ or ‘pace of action’ in a video segment. This descriptor is useful for applications such as video re-purposing, surveillance, fast browsing, dynamic video summarization and content-based querying.
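One plausible way to turn this notion into a number is to measure the spread of motion vector magnitudes in a segment and map it to a small set of levels. The Python sketch below does this with the standard deviation and a five-level scale; the threshold values are hypothetical placeholders, not the values defined by the standard, and the function name is an assumption.

```python
import math


def motion_intensity(vectors, thresholds=(1.0, 4.0, 8.0, 16.0)):
    """Return (std, level) for a list of (dx, dy) motion vectors.

    std   : standard deviation of the motion vector magnitudes.
    level : 1 (very low activity) to 5 (very high), using hypothetical thresholds.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in vectors]
    mean = sum(magnitudes) / len(magnitudes)
    std = math.sqrt(sum((m - mean) ** 2 for m in magnitudes) / len(magnitudes))
    level = 1 + sum(std >= t for t in thresholds)
    return std, level
```

A segment whose blocks all move in the same way gets the lowest level, while a segment with widely varying motion magnitudes gets a higher one.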

 

Localization Description Tools

 

The localization description tools can be used to indicate arbitrarily shaped regions of interest in the spatial (RegionLocator) and spatio-temporal (SpatioTemporalLocator) domains.