INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
MPEG2005/N7319
July 2005, Poznan, Poland
Source: Video
Status: Draft Output
Title: Introduction to Low-level Visual Description Tools
This group consists of five tools that support visual descriptions of elementary features such as color, texture and shape. These tools are designed as descriptor containers and enable structured representation of elementary features. The GridLayout container provides efficient representations of visual features on grids. TimeSeries containers represent temporal arrays of several descriptions. The GofGopFeature container describes representative descriptions over a video segment. The MultipleView container describes a 3D object using several pictures captured from different view angles.
Color is the most basic attribute of visual contents. MPEG-7 Visual defines five different description tools, each of which represents a different aspect of the color attribute. Color distribution tools include a representative color description (DominantColor), basic color distribution description (ScalableColor) and an advanced color distribution description (ColorStructure). The remaining tools include ColorLayout describing spatial distribution of colors, and ColorTemperature describing perceptual feeling of illumination color.
The Dominant Color Descriptor characterizes an image or region by a small number of representative colors. These are selected by quantizing pixel colors into (up to 7) principal clusters. The description then consists of the fraction of the image represented by each color cluster and the variance of each one. A measure of overall spatial coherency of the clusters is also defined. This descriptor is a very compact description of the color distribution in the image.
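The clustering idea can be illustrated with a minimal sketch. It assumes plain k-means on RGB triples as a stand-in for the quantizer, and the function name `dominant_colors`, the parameter defaults and the sample pixels are hypothetical. It is not the normative extraction procedure, and the spatial coherency measure is omitted.

```python
from statistics import fmean

def dominant_colors(pixels, k=3, iterations=10):
    """Return (mean_color, fraction, per_channel_variance) for each cluster."""
    # Deterministic start: the first k distinct colors, in order of appearance.
    centers = []
    for p in pixels:
        if p not in centers:
            centers.append(p)
        if len(centers) == k:
            break
    groups = []
    for _ in range(iterations):
        # Assign every pixel to its nearest center (squared Euclidean distance).
        groups = [[] for _ in centers]
        for p in pixels:
            nearest = min(range(len(centers)),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            groups[nearest].append(p)
        # Move each center to the mean of its members; empty clusters stay put.
        centers = [tuple(fmean(ch) for ch in zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    clusters = []
    for g, c in zip(groups, centers):
        if g:
            variance = tuple(fmean((v - m) ** 2 for v in ch)
                             for ch, m in zip(zip(*g), c))
            clusters.append((c, len(g) / len(pixels), variance))
    # Largest cluster first.
    return sorted(clusters, key=lambda cl: cl[1], reverse=True)

# Hypothetical 10-pixel region: 6 reddish pixels and 4 pure blue ones.
pixels = [(250, 0, 0)] * 3 + [(254, 0, 0)] * 3 + [(0, 0, 255)] * 4
clusters = dominant_colors(pixels, k=2)
for color, fraction, variance in clusters:
    print(color, fraction, variance)
```

Each returned tuple carries the three quantities named above: the representative color, the fraction of the region it covers, and the variance of the cluster around that color.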
The Scalable Color Descriptor is a color histogram in the HSV color space, which is encoded by a Haar transform. It has a binary representation that is scalable, in terms of bin numbers and bit representation accuracy, over a broad range of granularity. Retrieval accuracy can therefore be balanced against descriptor size. Inversion of the Haar transform is not necessary for consumption of the description, since similarity matching is also effective in the transform domain.
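The following sketch illustrates the principle under stated assumptions: a 16x4x4 (H, S, V) uniform quantization, an unnormalized sum/difference Haar transform, and L1 matching on the coefficients. The bin counts, function names and sample pixels are illustrative choices, and the non-linear quantization of coefficients used for the binary representation is not modeled.

```python
import colorsys

def hsv_histogram(pixels, h_bins=16, s_bins=4, v_bins=4):
    """Count RGB pixels (0..255 per channel) in uniformly quantized HSV bins."""
    hist = [0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1
    return hist

def haar_1d(values):
    """Unnormalized Haar transform of a power-of-two-length list.

    Adjacent pairs are replaced by their sum and difference; the step is
    repeated on the sums until a single total remains at index 0.
    """
    out = list(values)
    n = len(out)
    while n > 1:
        half = n // 2
        sums = [out[2 * i] + out[2 * i + 1] for i in range(half)]
        diffs = [out[2 * i] - out[2 * i + 1] for i in range(half)]
        out[:n] = sums + diffs
        n = half
    return out

def l1_distance(a, b):
    """Compare two descriptions directly in the transform domain."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical images given as flat pixel lists.
red = [(255, 0, 0)] * 8
mixed = [(255, 0, 0)] * 4 + [(0, 0, 255)] * 4
red_coeffs = haar_1d(hsv_histogram(red))
mixed_coeffs = haar_1d(hsv_histogram(mixed))
print(l1_distance(red_coeffs, red_coeffs), l1_distance(red_coeffs, mixed_coeffs))
```

With this coefficient ordering, keeping only the first m coefficients (m a power of two) is equivalent to transforming a histogram whose adjacent bins have been merged down to m bins, which is one way to picture the scalability in the number of bins.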
The Color Layout Descriptor represents the spatial layout of color images in a very compact form. It is based on generating a tiny (8x8) thumbnail of an image, which is encoded via DCT and quantized. As well as enabling efficient visual matching, this offers a quick way to visualize the appearance of an image: inverting the DCT reconstructs an approximation of the thumbnail.
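A minimal single-channel sketch of this pipeline is given below. It assumes block averaging to build the thumbnail, an orthonormal DCT-II, and a uniform quantization step of 4; the step size, the function names and the test image are hypothetical, and coefficient selection and the handling of separate color channels are left out.

```python
import math

N = 8  # thumbnail size stated in the text

# Orthonormal DCT-II basis: C[u][x] = a(u) * cos((2x + 1) * u * pi / (2N)).
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def thumbnail(image):
    """Average a grayscale image (list of rows, at least 8x8) down to N x N."""
    h, w = len(image), len(image[0])
    thumb = []
    for by in range(N):
        row = []
        for bx in range(N):
            block = [image[y][x]
                     for y in range(by * h // N, (by + 1) * h // N)
                     for x in range(bx * w // N, (bx + 1) * w // N)]
            row.append(sum(block) / len(block))
        thumb.append(row)
    return thumb

def dct2(block):
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):
    return matmul(matmul(transpose(C), coeffs), C)

def quantize(coeffs, step=4.0):
    # Uniform quantization; the step size is an illustrative assumption.
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(levels, step=4.0):
    return [[q * step for q in row] for row in levels]

# Hypothetical 16x16 image with a horizontal brightness ramp.
image = [[x * 16 for x in range(16)] for _ in range(16)]
thumb = thumbnail(image)
coeffs = dct2(thumb)
restored = idct2(dequantize(quantize(coeffs)))
max_error = max(abs(a - b) for ra, rb in zip(restored, thumb)
                for a, b in zip(ra, rb))
print(round(coeffs[0][0], 3), max_error)
```

The `restored` grid is the approximate thumbnail that a consumer of the description could display without access to the original image.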
The Color Structure Descriptor captures both color content and information about the spatial arrangement of the colors. Specifically, it is a histogram that counts the number of times a color is present in an 8x8 windowed neighborhood, as this window progresses over the image rows and columns. This enables it to distinguish, for example, between an image in which pixels of each color are distributed uniformly and an image in which the same colors occur in the same proportions, but are located in distinct blocks.
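The distinction drawn above can be reproduced with a short sketch. It assumes the image has already been quantized to color indices and that the 8x8 window advances one pixel at a time without subsampling; the function names and the two test images are hypothetical.

```python
def plain_histogram(indexed, num_colors):
    """Ordinary histogram: one count per pixel."""
    hist = [0] * num_colors
    for row in indexed:
        for c in row:
            hist[c] += 1
    return hist

def color_structure_histogram(indexed, num_colors, window=8):
    """For each color, count the window positions in which it appears at least once."""
    h, w = len(indexed), len(indexed[0])
    hist = [0] * num_colors
    for y in range(h - window + 1):
        for x in range(w - window + 1):
            present = {indexed[yy][xx]
                       for yy in range(y, y + window)
                       for xx in range(x, x + window)}
            for c in present:
                hist[c] += 1
    return hist

# Two hypothetical 16x16 images with identical color proportions.
size = 16
checker = [[(x + y) % 2 for x in range(size)] for y in range(size)]
halves = [[0 if x < size // 2 else 1 for x in range(size)] for y in range(size)]

print(plain_histogram(checker, 2), plain_histogram(halves, 2))
print(color_structure_histogram(checker, 2), color_structure_histogram(halves, 2))
```

The plain histograms are identical, while the structure histograms differ: in the checkerboard both colors appear in every window position, whereas in the split image each color is missing from the windows that lie entirely in the other half.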
IlluminationInvariantColor is another supporting tool in the color description tool group. It is a container and can extend four color descriptors – DominantColor, ScalableColor, ColorLayout and ColorStructure – to support illumination invariant similarity matching.
All the tools are applicable to both square-formed and arbitrarily shaped still pictures.
The Edge Histogram descriptor represents the spatial distribution of five types of edges (four directional edges and one non-directional edge). It consists of local histograms of these edge directions, which may optionally be aggregated into global or semi-global histograms.
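A minimal sketch of local edge histograms follows. It assumes 2x2 filter masks of the kind commonly cited for this descriptor, a 4x4 grid of sub-images, a fixed strength threshold of 10, and 2x2 pixel blocks in place of larger image blocks; all of these, along with the function names, are illustrative assumptions rather than values taken from this text.

```python
import math

SQRT2 = math.sqrt(2)
# One 2x2 mask per edge type, ordered (top-left, top-right, bottom-left, bottom-right).
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (SQRT2, 0, 0, -SQRT2),
    "diagonal_135": (0, SQRT2, -SQRT2, 0),
    "non_directional": (2, -2, -2, 2),
}
EDGE_TYPES = list(FILTERS)

def classify_block(block, threshold=10.0):
    """Return the strongest edge type for a 2x2 block, or None if it is too weak."""
    strengths = {name: abs(sum(f * v for f, v in zip(mask, block)))
                 for name, mask in FILTERS.items()}
    best = max(strengths, key=strengths.get)
    return best if strengths[best] >= threshold else None

def edge_histogram(image, grid=4, threshold=10.0):
    """Local histograms: five bins for each of grid x grid sub-images."""
    h, w = len(image), len(image[0])
    hist = [0] * (grid * grid * len(EDGE_TYPES))
    for y in range(0, h - 1, 2):
        for x in range(0, w - 1, 2):
            block = (image[y][x], image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1])
            edge = classify_block(block, threshold)
            if edge is not None:
                cell = (y * grid // h) * grid + (x * grid // w)
                hist[cell * len(EDGE_TYPES) + EDGE_TYPES.index(edge)] += 1
    return hist

def global_histogram(hist):
    """Aggregate the local histograms into one five-bin global histogram."""
    n = len(EDGE_TYPES)
    return [sum(hist[i] for i in range(t, len(hist), n)) for t in range(n)]

# Hypothetical 8x8 grayscale image made of vertical stripes.
image = [[0, 100] * 4 for _ in range(8)]
local_hist = edge_histogram(image)
print(global_histogram(local_hist))
```

The local histogram keeps the position of each edge type within the grid, while the aggregated form records only how often each type occurs in the whole image.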
The Homogeneous Texture descriptor is designed to characterize the properties of texture in an image (or region), based on the assumption that the texture is homogeneous – i.e., the visual properties of the texture are relatively constant over the region. The descriptive features are extracted from a bank of orientation- and scale-tuned Gabor filters.
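The filter-bank idea can be sketched as follows. The sketch assumes a small bank of real-valued spatial Gabor kernels (two wavelengths, four orientations, 9x9 support) and summarizes each channel by the mean and standard deviation of the response magnitude. The bank size, kernel parameters and function names are hypothetical, and a practical implementation would more likely operate in the frequency domain.

```python
import math
from statistics import fmean, pstdev

def gabor_kernel(wavelength, theta, sigma, size=9):
    """Real (cosine) Gabor kernel with its mean removed so flat regions give zero."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the carrier oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = fmean(v for row in kernel for v in row)
    return [[v - mean for v in row] for row in kernel]

def filter_response(image, kernel):
    """Magnitude of the filter output at every position where the kernel fits."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [abs(sum(kernel[j][i] * image[y + j][x + i]
                    for j in range(k) for i in range(k)))
            for y in range(h - k + 1) for x in range(w - k + 1)]

def texture_features(image, wavelengths=(4, 8), orientations=4):
    """(mean, standard deviation) of the response for each scale and orientation."""
    features = []
    for wavelength in wavelengths:
        for o in range(orientations):
            theta = o * math.pi / orientations
            kernel = gabor_kernel(wavelength, theta, sigma=wavelength / 2)
            response = filter_response(image, kernel)
            features.append((fmean(response), pstdev(response)))
    return features

# Hypothetical 16x16 texture: vertical stripes with a period of 4 pixels.
image = [[255 if x % 4 < 2 else 0 for x in range(16)] for _ in range(16)]
features = texture_features(image)
for mean, std in features:
    print(round(mean, 1), round(std, 1))
```

For this striped texture the channel tuned to the stripe period and to horizontal variation responds far more strongly than the channel tuned to vertical variation, which is the kind of contrast the feature vector is meant to capture.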
The Texture Browsing Descriptor is useful for representing homogeneous texture in browsing-type applications. This descriptor, combined with the Homogeneous Texture Descriptor, provides a scalable solution for representing homogeneous texture regions in images.
The 3D Shape Descriptor provides an intrinsic shape description of 3D mesh models by exploiting local attributes of the 3D surface.
There are four motion descriptors: Camera Motion, Motion Trajectory, Parametric Motion and Motion Activity.
The Camera Motion descriptor characterizes 3-D camera motion parameters, which can be automatically extracted or generated by capture devices. It supports the following well-known basic camera operations: fixed, panning, tracking, tilting, booming, zooming, dollying, and rolling.
The motion trajectory of an object is a simple, high-level feature, defined as the localization, in time and space, of one representative point of this object. This descriptor is useful for content-based retrieval in object-oriented visual databases.
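As an illustration, a trajectory can be held as a list of time-stamped positions of the representative point. The sketch below assumes 2-D coordinates and linear interpolation between key points; the function name `position_at` and the sample key points are hypothetical.

```python
from bisect import bisect_right

def position_at(keypoints, t):
    """Interpolate the (x, y) position at time t.

    keypoints is a list of (time, x, y) tuples sorted by strictly increasing
    time. Times outside the covered interval are clamped to the nearest end.
    """
    times = [k[0] for k in keypoints]
    if t <= times[0]:
        return keypoints[0][1:]
    if t >= times[-1]:
        return keypoints[-1][1:]
    i = bisect_right(times, t)  # index of the first key point later than t
    (t0, x0, y0), (t1, x1, y1) = keypoints[i - 1], keypoints[i]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

# Hypothetical trajectory: the object moves right and down, then straight down.
trajectory = [(0, 0, 0), (10, 100, 50), (20, 100, 150)]
print(position_at(trajectory, 5), position_at(trajectory, 15))
```

Storing only key points keeps the description compact, and a query such as "objects that pass through this area during this interval" reduces to evaluating `position_at` over the interval.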
The Parametric Motion descriptor associates a parametric motion model with arbitrary (foreground or background) objects, defined as regions (groups of pixels) in the image over a specified time interval. Such an approach leads to a very efficient description of several types of motion, including simple translation, rotation and zoom, as well as more complex motions formed by combining these elementary motions.
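One way to picture such a model is a six-parameter affine mapping, which covers translation, rotation and zoom and is closed under composition. The sketch below assumes that affine form; the parameter ordering and the function names are hypothetical and do not reflect the descriptor's actual syntax.

```python
import math

def affine(tx=0.0, ty=0.0, angle=0.0, zoom=1.0):
    """Build (a, b, c, d, e, f) such that x' = a*x + b*y + c and y' = d*x + e*y + f."""
    cos_z = zoom * math.cos(angle)
    sin_z = zoom * math.sin(angle)
    return (cos_z, -sin_z, tx, sin_z, cos_z, ty)

def apply(model, point):
    """Move one point according to the model."""
    a, b, c, d, e, f = model
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

def compose(second, first):
    """Single model equivalent to applying `first` and then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (a2 * a1 + b2 * d1, a2 * b1 + b2 * e1, a2 * c1 + b2 * f1 + c2,
            d2 * a1 + e2 * d1, d2 * b1 + e2 * e1, d2 * c1 + e2 * f1 + f2)

# A zoom by 2 followed by a shift of 10 pixels to the right.
zoom_then_shift = compose(affine(tx=10.0), affine(zoom=2.0))
moved = apply(zoom_then_shift, (1.0, 1.0))

# A quarter-turn rotation about the origin.
rotated = apply(affine(angle=math.pi / 2), (1.0, 0.0))
print(moved, rotated)
```

Six numbers describe the motion of every pixel in the region, which is where the efficiency of the description comes from.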
The Motion Activity descriptor captures the intuitive notion of ‘intensity of action’ or ‘pace of action’ in a video segment. This descriptor is useful for applications such as video re-purposing, surveillance, fast browsing, dynamic video summarization and content-based querying.
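A rough sketch of an intensity measure is shown below. It assumes that activity is estimated from the standard deviation of motion vector magnitudes and mapped to five levels; the threshold values, the function name and the sample vectors are hypothetical and are not taken from this text.

```python
import math
from statistics import pstdev

def motion_intensity(vectors, thresholds=(1.0, 4.0, 8.0, 16.0)):
    """Return (spread, level) for a list of (dx, dy) motion vectors.

    spread is the standard deviation of the vector magnitudes; level runs
    from 1 (very low activity) to 5 (very high activity). The thresholds
    are placeholders chosen for illustration.
    """
    magnitudes = [math.hypot(dx, dy) for dx, dy in vectors]
    spread = pstdev(magnitudes)
    level = 1 + sum(spread >= t for t in thresholds)
    return spread, level

# Hypothetical motion fields for two segments.
static_scene = [(0, 0)] * 8
busy_scene = [(0, 0), (30, 40)] * 4
print(motion_intensity(static_scene), motion_intensity(busy_scene))
```

A coarse level of this kind is enough to rank segments by pace, for example to skip quiet passages during fast browsing.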
The localization description tools can be used to indicate arbitrarily shaped regions of interest in the spatial (RegionLocator) and spatio-temporal (SpatioTemporalLocator) domains.