Coarse-to-fine stereo with thin structures and transparencies

Mikhail Sizintsev

Contact email: sizints at cse dot yorku dot ca


Coarse-to-fine stereo processing of a typical image.


Overview:

    Dense stereo algorithms rely on matching over a range of disparities. To speed up the search and reduce match ambiguity, processing can be embedded in a hierarchical, or coarse-to-fine (CTF), framework using image pyramids. However, this technique has difficulty resolving thin structures, as they are poorly represented at coarser scales. In this paper we exploit alternative pyramid and search space techniques to ameliorate these difficulties. We propose matching with the Magnitude-extended Laplacian Pyramid (MeLP; see figure below), a generalization of the Laplacian pyramid that explicitly encodes the energy magnitude component of the band-passed images. In essence, MeLP encodes fine scale details in low resolution images, which allows accurate recovery of thin structures during CTF processing. Furthermore, transparencies can be resolved in the common case where the spatial frequency structure is locally different for each layer. See sample results below.

[Figure: Bars and Cones scenes. Columns: left image, right image, true disparity.]

Dataset in use. Left and right images of Bars and Cones scenes along with associated disparity ground truth. Thin structures of interest are outlined with red circles.

[Figure: Recovered disparity maps for the Bars and Cones scenes. Columns: LP-1, LP-3, MeLP.]

Recovered disparity maps with coarse-to-fine stereo operating on a Laplacian pyramid with a disparity search range of 1 (LP-1), a Laplacian pyramid with a disparity search range of 3 (LP-3), and the proposed Magnitude-extended Laplacian Pyramid with a disparity search range of 1 (MeLP). Note the significantly better recovery of thin structures, with the search range kept minimal, when MeLP is used.



Why does traditional coarse-to-fine processing fail?

   
[Figure: Gaussian and Laplacian pyramids.]
Typical image pyramids, ubiquitous in computer vision and related areas.

The basic philosophy behind standard image pyramid techniques is to represent coarse frequency information in an image of smaller size, which the Nyquist sampling theorem permits. This applies both to the Gaussian pyramid, GP (which just stores the low-passed and subsampled versions of the original image), and to the Laplacian pyramid, LP, derived from it (which stores only the highest band of frequency information allowed at each level). Both pyramids are shown schematically above. Thus, by design, high frequency information is not represented in the low resolution images, and we should not expect reliable initial disparity estimates for thin structures.
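
To make the loss of thin structures concrete, here is a minimal sketch on a 1D scanline using only the Python standard library. The helper names (`blur`, `gaussian_pyramid`, `laplacian_pyramid`), the 5-tap binomial kernel, and the simplified Laplacian level (each level minus its own low-pass, with no upsampling step) are assumptions made for illustration, not the construction used in the paper.

```python
def blur(signal):
    """Low-pass a 1D signal with the binomial kernel [1, 4, 6, 4, 1] / 16."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate((1, 4, 6, 4, 1)):
            j = min(max(i + k - 2, 0), n - 1)  # replicate border samples
            acc += w * signal[j]
        out.append(acc / 16.0)
    return out


def gaussian_pyramid(signal, levels):
    """Level 0 is the input; each further level is low-passed and subsampled by 2."""
    pyramid = [[float(v) for v in signal]]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2])
    return pyramid


def laplacian_pyramid(signal, levels):
    """Simplified LP (assumption): each level keeps what its own low-pass
    removes; the coarsest level keeps the low-pass residual."""
    gp = gaussian_pyramid(signal, levels)
    lp = [[a - b for a, b in zip(g, blur(g))] for g in gp[:-1]]
    lp.append(gp[-1])
    return lp


# Toy scanline: a one-sample-wide "thin structure" on a flat background.
scanline = [0.0] * 32
scanline[16] = 1.0

gp = gaussian_pyramid(scanline, 3)
lp = laplacian_pyramid(scanline, 3)
gp_sizes = [len(level) for level in gp]
gp_peaks = [max(abs(v) for v in level) for level in gp]
print(gp_sizes)
print(gp_peaks)
```

The peak response of the thin structure shrinks at every coarser level, which is why a coarse-level match has little to lock onto.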

The major question is: how can high frequency details be introduced into the low resolution images?



Magnitude-extended Laplacian Pyramid (MeLP):

   
[Figure panels: "Coarse frequency content of a small image" and "Packing high frequency information into a small image".]
Consecutive levels in the Laplacian pyramid are high-passed versions of the original image at different resolutions, and thus represent different frequency bands. Computing the energy magnitude of the high-passed image effectively "packs" high frequency information into a small image.

In the Laplacian pyramid, a smaller image contains coarser scale information. But is there some way to "pack" the fine resolution content of the large image into the smaller image? It turns out that computing the energy magnitude image of the original high-passed image does exactly that (see figure above).
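
The effect can be sketched on a 1D scanline. The code below is illustrative only: the helper names, the binomial low-pass kernel, and the choice of the squared high-pass response as the "energy" are assumptions, not details taken from the paper.

```python
def blur(signal):
    """Low-pass a 1D signal with the binomial kernel [1, 4, 6, 4, 1] / 16."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate((1, 4, 6, 4, 1)):
            j = min(max(i + k - 2, 0), n - 1)  # replicate border samples
            acc += w * signal[j]
        out.append(acc / 16.0)
    return out


def scale_child(signal):
    """Usual pyramid step: low-pass, then subsample by 2."""
    return blur(signal)[::2]


def magnitude_child(signal):
    """Half-sized energy image of the high-passed signal.
    Assumption: energy is the squared high-pass response, smoothed and subsampled."""
    low = blur(signal)
    high = [s - l for s, l in zip(signal, low)]
    energy = [h * h for h in high]
    return blur(energy)[::2]


# Toy scanline: a patch of the finest possible texture (+1, -1, +1, ...)
# on a flat background.
scanline = [0.0] * 32
for i in range(12, 20):
    scanline[i] = 1.0 if i % 2 == 0 else -1.0

coarse = scale_child(scanline)
packed = magnitude_child(scanline)
peak_coarse = max(abs(v) for v in coarse)
peak_packed = max(packed)
print(peak_coarse, peak_packed)
```

In this toy case the low-pass almost cancels the texture, so the half-sized scale image barely registers it, while the half-sized energy image still responds strongly where the texture sits.
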
[Figure: Building block of MeLP. Recursive construction of MeLP.]

This idea allows us to recursively construct a pyramid in two directions: (i) scale, which is the same as in a typical Laplacian pyramid, and (ii) magnitude, which involves extracting a half-sized energy image from a high-passed image (see figure above). We call this pyramid the Magnitude-extended Laplacian Pyramid (MeLP).
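
The two-direction recursion can be sketched as a binary tree over a 1D scanline. This is a hypothetical illustration, not the author's implementation: the node layout (a dict with `image`, `scale`, and `magnitude` keys), the binomial kernel, and the squared high-pass response used as energy are all assumptions.

```python
def blur(signal):
    """Low-pass a 1D signal with the binomial kernel [1, 4, 6, 4, 1] / 16."""
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate((1, 4, 6, 4, 1)):
            j = min(max(i + k - 2, 0), n - 1)  # replicate border samples
            acc += w * signal[j]
        out.append(acc / 16.0)
    return out


def build_melp(signal, depth):
    """Recursively build a tree whose root is the full-resolution signal.
    Every non-leaf node has a half-sized child in each of the two directions."""
    image = [float(v) for v in signal]
    node = {"image": image, "scale": None, "magnitude": None}
    if depth == 0:
        return node
    low = blur(image)
    high = [a - b for a, b in zip(image, low)]
    energy = [h * h for h in high]  # assumption: squared response as energy
    node["scale"] = build_melp(low[::2], depth - 1)               # scale direction
    node["magnitude"] = build_melp(blur(energy)[::2], depth - 1)  # magnitude direction
    return node


def count_nodes(node):
    if node is None:
        return 0
    return 1 + count_nodes(node["scale"]) + count_nodes(node["magnitude"])


def leaves(node):
    if node["scale"] is None:
        return [node]
    return leaves(node["scale"]) + leaves(node["magnitude"])


scanline = [0.0] * 32
scanline[16] = 1.0
root = build_melp(scanline, depth=2)
print(count_nodes(root), [len(leaf["image"]) for leaf in leaves(root)])
```

With two levels of recursion the tree has one root, two children, and four leaves, each leaf a quarter of the original length.
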
[Figure: Stereo processing using MeLP.]

Stereo processing with MeLP is straightforward. With LP, the disparity of every point at the finer level is refined from the single hypothesis coming from the coarser level, and the estimation progresses linearly from the coarse to the fine level. With MeLP, there are two hypotheses, one from the scale direction and one from the magnitude direction; the disparity at the finer level is the refinement of whichever hypothesis yields the better match score. Note that these hypotheses are usually identical, especially for natural scenes, and differ only for thin structures, errors, and sometimes 3D boundaries. The CTF disparity estimation process is a leaf-to-root traversal of the pyramid, where the root is the image with the highest resolution.
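
The refinement step at a single point can be sketched as follows. This is a minimal illustration under stated assumptions: the sum-of-absolute-differences cost, the convention that `left[x]` matches `right[x - d]`, and the function names `match_cost` and `refine` are hypothetical and not taken from the paper.

```python
def match_cost(left, right, x, d, half_window=2):
    """Sum of absolute differences between the window around left[x] and the
    window around right[x - d]; indices are clamped to the scanline."""
    n = len(left)
    cost = 0.0
    for k in range(-half_window, half_window + 1):
        xl = min(max(x + k, 0), n - 1)
        xr = min(max(x + k - d, 0), n - 1)
        cost += abs(left[xl] - right[xr])
    return cost


def refine(left, right, x, coarse_hypotheses, search_range=1):
    """Refine each coarse hypothesis within +/- search_range at the finer
    level and keep the candidate with the lowest match cost."""
    best_d, best_cost = None, float("inf")
    for d_coarse in coarse_hypotheses:
        center = 2 * d_coarse  # disparities double when resolution doubles
        for d in range(center - search_range, center + search_range + 1):
            cost = match_cost(left, right, x, d)
            if cost < best_cost:
                best_d, best_cost = d, cost
    return best_d, best_cost


# Toy scanlines: a strictly convex intensity profile, so every wrong shift
# has a non-zero cost. The true disparity is 6.
right = [float(i * i) for i in range(40)]
left = [right[max(x - 6, 0)] for x in range(40)]

# One (wrong) hypothesis, as in a plain LP with search range 1.
single = refine(left, right, 20, [0])
# Two hypotheses, e.g. one from the scale and one from the magnitude direction.
double = refine(left, right, 20, [0, 3])
print(single, double)
```

With only the wrong hypothesis, a search range of 1 cannot reach the true disparity. Adding a second hypothesis recovers it without widening the search range.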



Thin structures versus transparency:

    It is widely accepted that an image of transparent surfaces can be perceived as an additive combination of the foreground and background surfaces. However, thin structures at coarser levels (which result from convolution and subsampling) are represented in an identical fashion, which is why thin structures are sometimes called pseudo-transparency. Transparencies and thin structures can therefore be treated identically in our computational framework. All we need to do is allow multiple disparity estimates at every point!
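
One way to allow several estimates per point is to refine every incoming hypothesis independently and keep the distinct results, instead of keeping only the best one. The sketch below is hypothetical: `refine_each`, the `cost_of` callback, the two-layer limit, and the toy cost function are assumptions for illustration only.

```python
def refine_each(cost_of, coarse_hypotheses, search_range=1, max_layers=2):
    """Refine every coarse hypothesis on its own and return up to max_layers
    distinct disparities, best cost first. Missing layers are None
    (i.e. undefined: fewer distinct estimates than layers)."""
    refined = {}  # refined disparity -> match cost
    for d_coarse in coarse_hypotheses:
        center = 2 * d_coarse  # disparities double when resolution doubles
        candidates = range(center - search_range, center + search_range + 1)
        d_best = min(candidates, key=cost_of)
        refined[d_best] = cost_of(d_best)
    ordered = sorted(refined, key=refined.get)[:max_layers]
    return ordered + [None] * (max_layers - len(ordered))


def toy_cost(d):
    """Toy match cost with two minima, standing in for two layers at
    disparities 2 and 9."""
    return min(abs(d - 2), abs(d - 9))


two_layers = refine_each(toy_cost, [1, 4])
one_layer = refine_each(toy_cost, [1, 1])
print(two_layers, one_layer)
```

When both hypotheses refine to the same disparity, the second layer stays undefined at that point.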

[Figure: Synthetic and real transparency scenes. Columns: left image, right image, disparity 1, disparity 2, 3D view.]

Disparity maps for synthetic and real scenes with 2-layer transparency, recovered with the proposed Magnitude-extended Laplacian Pyramid (MeLP) with disparity search range 1. Multiple disparity estimates were allowed while traversing the MeLP tree-like structure; results for the first two refined hypotheses are shown, with red pixels meaning undefined (i.e. only a single distinct disparity estimate is available at these points). Note that disparity maps are not grouped based on geometry, and explicit 3D reconstruction is required to uncover the scene structure. In our case, to enhance visualization, we robustly fit two planes, because both scenes are composed of two planes.



Related Papers:

Mikhail Sizintsev, Hierarchical Stereo with Thin Structures and Transparency, Fifth Canadian Conference on Computer and Robot Vision (CRV), pages 97-104, 2008. (short version)

Software:

Magnitude-extended Laplacian Pyramid construction MATLAB code (to appear)


Last updated: April 24, 2009.