This special issue includes a selection of papers from SCCG 2004 conference chaired by Prof. Alexander Pasko (Hosei University, Tokyo). The Spring Conference on Computer Graphics is the oldest annual event in Central Europe and the proceedings are later published by ACM SIGGRAPH. This is possible thanks to the fruitful combination of high quality contributions and generous sponsoring from HP Invent Slovakia. The conference celebrated 20th anniversary in 2004. More details can be found at www.sccg.sk.

There are two competitions organized during the conference – SCCG Best Papers and SCCG Best Presentations. They are based on evaluation by reviewers and public voting of SCCG participants. Awarding of winners is a part of closing ceremony and the diplomas with logos of sponsors are available at www.sccg.sk, as well.  As proposed by Alexander Pasko and accepted by the editor-in-chief, Prof. Victor V.  Pilyugin, the winning papers are published in special issue of CGG, a prominent online journal at http://elibrary.ru/cgg. The papers are slightly extended and rewritten, based on SCCG discussions and inspirations. After completing the selection, one can see that the unifying idea of all five papers awarded can be formulated as discovering the tricky solutions between speeding-up (modeling) and rendering quality criteria.

William Van Haevre et al. dealt with ray density estimation for plant growth simulation. In particular, they evaluated the varying indoor environment illumination while growing the plants using intensity-modified rules for L-systems. The novel approach results in a flexible and accurate algorithm to achieve more realistic vegetation. The paper won the 3rd Best Presentation Award.

Mario Sormann et al. focused on a solution of a complex task – creating models from image sequences as fast and as good as possible. VR modeler is a novel interactive monocular 3D modeling system with nicely separated intelligent 2D interaction and 3D reconstruction. Besides that, the coarse and detailed precision of urban models is supported for web presentation and other purposes. The results already contributed to  Virtual Heart of Central Europe (www.vhce.info) which is a recent European cultural heritage project.  The paper won the 3rd Best Paper Award.

Rui Rodrigues and Antonio Ramires Fernandes report on prospective use of graphics cards. A significant part of 3D reconstruction, especially epipolar geometry computations, can be transferred into the GPU. This new idea offers a remarkable gain up to two orders of magnitude in terms of computational times. The paper won the 2nd Best Presentation Award.

Ivan Viola et al. explored frequency domain volume rendering (FVR) because of computational speed. Moving significant parts of computations to GPU, they report acceleration by factor of 17. This allows for highly interactive framerates with varying rendering quality. The quality depends on interpolation schemes. The authors analyzed four of them to clarify the trade-off between performance and quality. The paper won the 2nd Best Paper Award.

Last but not least, Diego Gutierrez et al. contributed by a SIGGRAPH quality paper on global illumination for inhomogeneous media. In total, there are 10 different light-object interactions known and we simplify the model to achieve faster solutions. The authors noticed that light rays travel a curved path while going through inhomogeneous media where the index of refraction is not constant. In addition, they took into account the way how human perception deals with luminances. In total, the phenomena like sunset, green flash, and bleaching are mastered to complete an excellent research and a brilliant presentation. This is why only five papers are here – Diego clearly won in both competitions.  

For conclusion, I have to recall the following. In 2003, one year ago, this message from Alexander Pasko arrived to Budmerice Castle:

“Dear participants and organizers of SCCG, your conference provides unique opportunity for young researchers to make their efforts visible in the world, especially for those who are not hypnotized by the visual quality of modern computer graphics works in modeling, rendering, and animation. We all know that such a work still requires tedious manual labor hampered by errorneous models and algorithms. Let us hope that the next spiral of development will make our work in computer graphics more close to a joyful mind game.”

I have to thank again to Alexander and to all people who contributed to SCCG 2004 in the spirit of these beautiful and clever words.

 Andrej Ferko Comenius University Bratislava, SK-842 48 Bratislava, Slovakia, ferko@fmph.uniba.sk, www.sccg.sk/~ferko



Projective Texturing for the Acceleration of Epipolar Geometry Computation in 3D Reconstruction

 


Rui Rodrigues,
Departamento de Informática, Universidade do
Minho, Braga, Portugal
ruirodrigues@di.uminho.pt

António Ramires Fernandes,
Departamento de Informática, Universidade do
Minho, Braga, Portugal
af@di.uminho.pt


Contents


 

Abstract

The process of 3D reconstruction, or depth estimation, is a complex one. Many methods often have several parameters that may require fine tunning to adapt to the scene and improve reconstruction results. Usability of these methods is directly related to their response time. Epipolar geometry, a fundamental tool used in 3D reconstruction, is commonly computed on the CPU. We propose to take advantage of the advances of graphic cards, to accelerate this process. Projective texturing is used to transfer a significant part of the computational load from the CPU into the GPU. The new approach is illustrated in the context of a previously published work for 3D point reconstruction from a set of static images. Test results show that gains of up to two orders of magnitude in terms of computation times can be achieved, when comparing current CPU's and GPU's. We conclude that this leads to an increase in usability.

Keywords: Epipolar Geometry, Projective Texturing, Depth Estimation, 3D Reconstruction, OpenGL

 

1 Introduction

The reconstruction of view-independent 3D information from a set of static images has taken the attention of researchers for some years [Faugeras 1993Hartley and Zisserman 2000]. The approaches are multiple and with different applications. Many techniques are based on establishing correspondences between entities from different images [Redert et al. 1999]. Common corresponded entities can be points [Kawamoto and Imiya 2001Tell and Carlsson 2000Rodrigues et al. 2002], lines or contours [Sato and Cipolla 1999Schulz-Mirbach and Weiss 1994], and rectangular blocks or segments [Smith 1998]. If a pair of entities from two distinct images correspond, and the camera position, orientation and intrinsic parameters for two images are known (or estimated), it is possible to compute the position of those entities in 3D space. This position can be expressed directly in a 3D referential or as a depth – the distance relative to a reference camera. However, each of these entity types has some type of ambiguity associated such that, given two entities in different images, it is not straightforward to determine whether they correspond or not [Redert et al. 1999].

Determining correspondences between a large number of entities (possibly thousands), in a large number of images (tens), is a time consuming process. Furthermore, the basic geometric theory does not anticipate the problems one encounters when dealing with real scenes, such as noise in the images, approximate camera calibration and perspective distortions, occlusions and highlights. Methods dealing with contours, for instance [Sato and Cipolla 1999Schulz-Mirbach and Weiss 1994Rodrigues et al. 2002], may encounter additional problems with contour extraction.

In addition, for a given technique, the heterogeneity of inputs requires the tuning of a series of parameters (either automatically or interactively) to improve the reconstruction results. Many issues may arise: is the image set sufficient/appropriate for the reconstruction of all the details of the scene? What are are the best parameter settings for a particular scene?

Getting user (or automatic) feedback to deal with these issues calls for interactive reconstruction rates. The longer a method takes to provide a reconstruction the less interactive it becomes. Hence, it can be argued that usability of these methods is related to response time. Achieving real-time 3D reconstruction using off-the-shelf systems would also give way to a wide range of applications, namely in terms of image and scene transmission, such as live 3D video broadcasting and conferencing, or mixed-reality computer games.

In view-dependent 3D reconstruction methods, real-time reconstruction is close to being achieved [Lok 2001Yang et al. 2002]. For view-independent methods, a more realistic goal in the near future is to achieve interactive response times, allowing for fast turnaround time and for off-line (yet fast) reconstruction. This paper presents a method that accelerates a class of view-independent approaches, in order to add interactivity in this sense.

Due to the geometric nature of the computations involved in the reconstruction process, one way to improve reconstruction rates is to explore graphics hardware acceleration using off-the-shelf graphic cards (GC's).

The computational power of consumer-grade graphics hardware (or commodity graphics hardware) has grown tremendously in recent years, and continues to do so. The gaming industry is one of the main forces that pushes the development of graphics hardware, and contributes to its wide availability.

Given the amount of geometry processing involved in reconstruction tasks, we propose to use a well known 3D graphics API (GAPI) - OpenGL [Shreiner 2000] - as the platform to perform (batch) geometric computations. The advantages are twofold: we will be using a language that (1) is naturally oriented to 3D geometry processing, and (2) serves as a seamless link to hardware acceleration, given the wide API support provided by graphic cards manufacturers, and the ever-growing capabilities of graphic cards.

We therefore propose to explore the potential of GCs, using projective texturing [Segal et al. 1992], to accelerate some of the tasks relating to 3D reconstruction of points using epipolar geometry. The use of projective texturing in the context of 3D reconstruction has been explored previously, namely by [Lok 2001], but it was used for a different purpose: determining a convex hull for collision detection.

Using projective texturing for epipolar geometry related computations, a significant amount of the computational load is transferred from the CPU to the GC. Tests performed with an implementation based on the point reconstruction method presented in [Rodrigues et al. 2002] show the very significant performance gain obtained – up to two orders of magnitude – and the potential of the method proposed in this paper.

Section 2 describes the fundamentals of epipolar geometry and how it is used for 3D or depth reconstruction. The application of projective texturing in the context of epipolar geometry is described in section 3. Tests and results of our proposal using a case study are presented in section 4. Finally conclusions and directions for future work are presented in section 5.

 

2 Epipolar Geometry for 3D Point Reconstruction

Typical reconstruction algorithms rely on finding correspondences between entities in two images. In this article we focus on the correspondence of points.

If a point pi in image Ii is corresponded with a point qj in Ij, and camera positions Ci and Cj are known, then the corresponding reconstructed point (r-point) rip is easily determined (in the ideal case) by triangulation using the information of the cameras.

However, finding correspondences between entities of different images is not trivial. On the one hand, errors in the process of image acquisition and camera calibration may lead to distortions of the entities, resulting in missed correspondences (false negatives). It may also happen that ambiguities in the entities corresponded lead to incorrect correspondences (false positives). Whenever possible, it is useful to add constraints to help reducing this correspondence problem.

Epipolar geometry allows the search for correspondences of a pi to be constrained to a subset of the entities in image Ij. In this section we will present the basics of epipolar geometry. A more detailed analysis of epipolar geometry can be found in [Hartley and Zisserman 2000].

 

2.1 Epipolar Geometry for Two-Image Point Reconstruction

Consider two images Ii and Ij, with known cameras Ci and Cj, and a point pi of Ii (see Figure 1). Let Vpi be the view ray of pi associated to Ci (the half-line starting in the centre of Ci and going through pi). The epipolar line Ei,jp is the projection of Vpi in Ij. Points vipx belonging to Vpi are projected in Ei,jp, and points pi,jy of Ei,jp can be projected back to Vpi, using Cj .

 


PIC
Figure 1: Epipolar Geometry


In terms of correspondence search, this means that the correct correspondence for pi in Ij has to be in Ei,jp.

Assume there is an error function ei,jpx that quantifies for a point pi,jx the error of corresponding it with pi (a possible error function is discussed in section 2.3 ). In this case, determining the best correspondence of pi could amount to minimize that error function over a set of candidates pi,jx. In the two-image setting, this could be done by evaluating the error function at the points belonging to the epipolar line Ei,jp, as shown in Algorithm 1 .


  • Let Vpi be the view ray of pi associated to Ci
    • Project Vpi in Ij as the epipolar line Ei,j p
    • choose a set of candidate image points pi,j 1 , pi,j 2 , ... from Ei,j p
    • Store ei,j px as the error of pi,j x in Ij
  • The correspondence pi,j c for pi is selected amongst the candidates with lower ei,j px
  • The reconstructed point (r-point) ri p corresponding to pi is in the intersection of the view rays V pi and Vpi,j c

Algorithm 1:

Reconstruction using epipolar line candidates


2.2 Epipolar Geometry for Multiple-Image Point Reconstruction

For simple entities such as points, the information provided by two images may be insufficient to determine correspondences reliably. It is possible to reduce ambiguity by extending point correspondence to multiple images. In this case, choosing correspondence candidates from epipolar lines in different images gives rise to the problem of fusion: if pi corresponds with qj in Ij and with ok in Ik, the associated r-points rip,j and rip,k will in general not coincide, due to multiple error sources. These two points would have to be fused together in an additional step.

A possible solution for the fusion problem, as suggested in [Rodrigues et al. 2002], is to choose candidates vipx directly from the view ray, project each of them in the images, and accumulate the corresponding error to vipx (see Algorithm 2 ).


  • Let Vpi be the view ray of pi associated to Ci
  • Choose a set of candidate world points vi p1, vi p2, ... from Vpi
  • For each image Ij (j/=i)
    • For each candidate world point vi px
      • Project vi px in Ij as pi,j x
      • Let ei,j px be the error of pi,j x in Ij
      • Add ei,j px to the total error di px of vi px
  • The r-point ri p corresponding to pi is selected amongst the candidates with lower total error values di px

Algorithm 2:

Reconstruction using view ray candidates


This approach has an important advantage: the correspondence, reconstruction and fusion stages are merged into one. When the best candidate is chosen, it is already a reconstructed point (r-point) and that choice already takes into account the contribution of multiple images.

2.3 Distance maps as error functions

The error function chosen to assess the correspondence between points in different images is the distance of the projected candidate to the closest contour (as suggested in [Rodrigues et al. 2002]). This means that, for a given candidate vipx of a contour point pi from Ii, the individual error ei,jpx contributed by Ij is the distance of pi,jx (the projection of vipx in Ij) to its' closest contour in Ij.

There are several advantages in the use of distance as the correspondence error function. First, it introduces an implicit tolerance to contour displacements originated by faulty contour detection or errors in camera calibration. Second, this distance function can be easily and quickly computed in the form of a distance map using distance transforms [Borgefors 1984]. Furthermore, since it is independent of the original point pi, the distance map can be pre-computed, and it only has to be computed once for each image, regardless of the number of points to be reconstructed, or the number of candidates per point.

In this work, the error maps that will be projected over the view rays are the distance maps generated from the contours of all images. Those maps are precomputed once and used for the reconstruction of any point.

3 Projective Texturing and Epipolar Geometry

In section 2 we have seen the application of epipolar geometry to 3D point reconstruction. Epipolar geometry is traditionally computed on the CPU. We propose to perform in the GC a significant part of the computations required.

The algorithm presented in section 2 requires that a view ray was densely sampled into a set of 3D candidates, and that each of those candidates were evaluated according to errors accumulated from the n error maps associated with the images.

The evaluation of the error function for a given 3D candidate requires n projections in error maps, n readings of error map values (or n ×  4 readings and n interpolations) and n sums.

Two features of GC's can be explored to accelerate this computation: first the acceleration provided for operations such as projection, and secondly the pipelined architecture that allows for parallel computing.

GC's provide acceleration for a common graphic procedure known as projective texturing [Segal et al. 1992] that allows one to project images (textures) over 3D geometry, just as a slide projector casts an image over the surfaces in front of it (Figure 2).

 


PIC

 

Figure 2: Projective Texturing. An image is cast over geometry as if it was a slide on a projector

 


This procedure can be used to project an error map Dj from the position of the associated camera Cj over a candidate point vipx. By projecting the error map as a texture, the color value that is assigned to vipx will be a representation of the error value corresponding to pi,jx (the projection of vipx in Ij) on the error map (Figure 3 ).

 


(a)PIC (b)PIC

Figure 3: Projective Texturing for reconstruction.
  
(a) An error map is projected as a texture over a view ray (the ray is represented as thick to show the errors projected)
  
(b) A close-up of the ray with the projected errors coded as intensity (top) and the corresponding error graph (bottom)


GC's make this procedure extremely fast. Furthermore if multiple candidates are packed and sent to the graphics card simultaneously then the pipelined architecture allows for another degree of acceleration. Hence the more candidates are processed on one pass, the more effective the acceleration. Since all 3D candidates are in the 3D line defined by the view ray for a particular contour point, rendering the view ray is a natural option to render all the candidates.

Error accumulation for multiple error maps is also possible with GC's using additive blending. Additive blending can be used to accumulate texture values coming from multiple sources, and projected on the same point. Using the projector analogy, texture additive blending is similar to the effect resulting as lights of different intensity or color are cast from different projectors on a neutral surface, blending together into a new color (Figure 4 ).

 


PIC

Figure 4: Accumulation of projections. Multiple projectors casting different images over the same object result in the images being accumulated

 


In this section we will show how to apply multiple projective texturing with additive blending, to enable the implementation of error measure and accumulation over multiple images for a complete set of candidates in a view ray, in the context of the work in [Rodrigues et al. 2002].

The following sub-sections detail the building blocks of this solution:

  • the use of projective texturing with error maps
  • multiple projective texturing and the use of additive blending for error accumulation
  • the proper setup of buffer and rendering parameters to simplify the post-processing of the values read from the GC's image buffer

3.1 Projective texturing

Projective texturing allows an image to be projected in geometry, providing an effect similar to a slide projector. In the context of epipolar geometry, projective texturing allows one to reverse the common approach, ie instead of projecting a view ray V ip onto an image Ij, an image is projected onto the view ray. The part of the image that is actually projected on the view ray V ip corresponds to the epipolar line Ej,ip of the point pi that originated V ip. By projecting an error map as an image onto the view ray, the errors corresponding to that view ray's candidates are projected on the 3D points themselves. In this case, each camera acts as a projector.

Texturing is a common graphic task supported in GC's. Projective texturing is a special case of texturing in which the texture coordinates of a point are computed based on the point's position in the world, and the parameters of the projector. More specifically, projective texture homogeneous coordinates s, t, r and q are computed by multiplying the point's homogeneous coordinates x, y, z and w by a transformation matrix T that corresponds to the projection of the 3D point in the image plane of the projector (Equation 1 ).

 |_ _| |_ _| s x t y |_ r _| = T |_ z _| q w

(1)

The projector's intrinsic and extrinsic parameters are the same of the corresponding camera. For this reason, computing projective texture coordinates is very similar to computing a vertex's projection in an image using the vertex's position and the corresponding camera's parameters.

The texture transformation is usually divided in four transformations (Equation 2 ), represented here by their respective matrices, namely:

 

Model Matrix (M)

The model matrix is used to transform the point's coordinates from object space to world space. A given point may be specified relative to an object referential which is in turn defined relative to a world referential 1

View Matrix (V)

The view matrix transforms the world-space coordinates in projector-space (or camera-space) coordinates. The base referential for the resulting coordinates is now the camera's (or projector's) position and orientation. This matrix contains the information of the projector's extrinsics.

Projection Matrix (P)

The projection matrix converts the view-space coordinates into texture-space 2D coordinates. This matrix encodes the projector's intrinsics

Normalization Matrix (N)

This matrix scales and offsets the image-space coordinates, such that they are mapped in the normalized range of [0..1] x [0..1]

T = NPVM

(2)

For the purpose of view ray texturing in the context of reconstruction, P and V are dependent only on the projector's parameters, which are constant for each camera. Furthermore, the Object space is made coincident with the World space, making M the identity matrix, hence also constant. Finally, N also consists of constant factors, for each camera. Since all the matrices involved are immutable for a given camera acting as a projector, the complete texture transformation matrix Ti may be computed for each camera Ci in advance and stored, to save computation time.

The use of projective texturing for the computation of the errors of the candidates of a point pi using a single error map Dj associated to Cj is described in Algorithm 3 .


  • Pre-process
    • For each camera Cj
      • Create OpenGL texture Texj using the error map Dj
      • Compute Tj = Nj Pj Vj
    • Configure OpenGL to generate projective texturing coordinates and enable additive blending
  • Reconstruct pi of Ii using camera Cj
    • Bind texture Texj
    • Load Tj as OpenGL texture matrix
    • Render the view ray Vpi (including all candidates) in an OpenGL buffer

Algorithm 3:

Error projection using projective texturing


After performing this algorithm, the GC's buffer will contain each candidate represented with a value corresponding to its error in Dj. Section 3.3 presents a proper buffer and camera setup to simplify the relation between a candidate's position in the buffer and its position in the world.

3.2 Error Accumulation for Multiple Cameras

Section 3.1 describes the process of projecting errors from one error map in a given candidate, considering a two-camera setting. To extend the method to a multiple-camera setting, textures (error maps) projected from different projectors (cameras) over a given candidate have to be accumulated. This accumulation can be performed on the same buffer, using additive blending.

Additive blending adds the texture values of newly rendered geometry to the values already existing in the buffer. In the context of reconstruction, a solution to incrementally accumulate errors (texture values) in the buffer is to render the same candidate multiple times using additive blending, each time textured with a different error map projected from its corresponding camera. In this way the error values from multiple cameras are accumulated for each candidate.

 


(a)PIC (b) PIC

Figure 5: Projective texturing with accumulation for reconstruction.
  
(a) Two error maps are projected as textures using accumulation over a view ray (the errors' origin is represented by different colors)
  
(b) A close-up of the ray with the projected and accumulated errors


Algorithm 4 describes the method to accumulate errors from multiple images using projective texturing and additive blending.


  • Pre-process
    • For each camera Cj
      • Create OpenGL texture Texj using the error map Dj
      • Compute Tj = Nj Pj Vj
    • Configure OpenGL to generate projective texturing coordinates and enable additive blending
  • Reconstruct pi of Ii using all cameras Cj, j/= i
    • For each camera Cj, j/= i
      • Bind texture Texj
      • Load Tj as OpenGL texture matrix
      • Render the view ray Vpi (including all candidates) in an OpenGL buffer

Algorithm 4:

Error projection using projective texturing and additive blending


3.2.1 Channel masking

There are limits to the maximum error value that can be represented in textures and rendering buffers of GC's. One of the most supported pixel buffer formats is the RGB (Red-Green-Blue) buffer, which is composed of three 8-bit channels (one for each of the colors). This means that values stored on this buffer are limited in the range of (0, 0, 0) to (255, 255, 255).

Since each color channel is independent, accumulation can be done on a per-channel basis. To achieve this, color channel masking is used: error maps are coded as gray-scale maps, and each error map is only projected in one of the buffers' color channels (red, green or blue). This allows to distribute the error accumulation over the different channels, in order to use more of the 24-bit accumulation space (resulting on an effective upper limit of 3 ×  256 = 768). The cameras' distribution between the different channels is not relevant, as long as it is guaranteed that accumulation overflow does not occur (ie a given channel does not accumulate to an error value higher than 255).

For instance, if error values are clamped to an upper limit of 10 units, at least 25 cameras can be safely accumulated in a single color channel (25 × 10 < 255), resulting in at least 75 cameras considering the three channels.

 

3.3 Definition of the Candidates' Rendering Buffer and camera

The view ray, with the projected and accumulated error maps, has to be rendered in a buffer using a virtual camera to capture the rendered geometry. The final accumulated errors have to be read from this buffer to choose the best candidate. Therefore, special care should be taken in choosing the representation of candidates, the buffer, and the virtual camera parameters such that:

  • All candidates in a view ray segment range are rendered
  • The CPU post-processing required for reading and analyzing the accumulated errors and selecting the best 3D candidate is minimized

To achieve these goals, some factors should be taken into consideration:

  • Only a segment of the view ray is of actual interest to the reconstruction
  • The smallest entity that can be represented on a buffer (and for which errors may be accumulated) is a pixel
  • There are limits to the size and shape of a buffer
  • Whichever candidates are chosen, they will all be collinear in the buffer

To render a segment of the view ray, only the start and end 3D points have to be passed to the GC as the vertexes of a line. The corresponding connecting line is then rendered in the buffer by the GAPI. Since pixels are the smallest entities that can be represented in the buffer, each pixel rendered from the segment actually becomes a candidate.

The start and end points of the segment can be defined based on previous knowledge of the dimension of the scene being reconstructed. If such information is not available, a large segment can be used initially to obtain a conservative estimate of the scene's dimensions.

Given the collinearity of candidates and the rectangular shape of buffers, the best way to maximize buffer usage is to set it up such that the segment is rendered as a straight line aligned to one of the viewport's axis (e.g. an horizontal line). Furthermore, the final reconstruction is simplified if a pixel selected from the buffer can be easily associated to the corresponding 3D candidate.

Using an orthographic camera CO properly placed relatively to the segment allows the accomplishment of these goals. Such a setup makes it possible to have a segment filling an entire 1-pixel-high buffer, and to associate a pixel to a 3D candidate based on the pixel's x coordinate in the buffer.

 

3.3.1 Extrinsic parameters

The extrinsic parameters of CO are defined as a function of the position, direction and length of the segment (Figure 6 ). Let DIR be the normalised vector of the direction of the view ray. The lookat point is the middle point of the segment in 3D. The camera can be placed in any position pos in the plane containing the lookat point and perpendicular to DIR. The LOOK vector of CO is defined as the unit vector pointing from pos to the lookat point. The RIGHT vector of CO is parallel to DIR. The UP vector of CO is set perpendicular to RIGHT and LOOK. A graphical representation of the setup of the orthographic camera CO is depicted in Figure 6. Assuming the view ray is parallel to the paper plane, LOOK and RIGHT are also parallel to the paper, and UP is perpendicular to the paper.

 


PIC

Figure 6: Orthographic camera (CO) for view ray segment rendering.


Pseudo-Code 1 represents the setup of the extrinsic parameters of CO, using notation inspired in the OpenGL API.


DIR = normalize(end - start);
LOOK = DIR ×Ci.UP;
lookat = (start + end) / 2;
pos = lookat - LOOK;
UP = DIR ×LOOK;
gluLookAt(pos, lookat, UP);

Pseudo-Code 1:

Orthographic camera's extrinsic parameters' setup in OpenGL


The gluLookAt call defines the camera's position to be pos, and the orientation is specified by a lookat point, which is the center of the segment, and the UP vector.

3.3.2 Intrinsic parameters

Assume now that the segment's length in 3D, segLen3D has been defined or computed as suggested inSection 3.3 . The segment size in pixels, segSize2D, corresponds both to the number of candidates that will be rendered, and to the width of the buffer to be used. This number is directly related to the final precision of the reconstruction itself. For a given predefined segLen3D, the larger the set of candidates, the higher the reconstruction precision.

The intrinsic parameters of CO (namely the clipping planes and the viewport) are defined based on segLen3D and segSize2D. Each pixel corresponds to a 3D candidate of the view ray. Pseudo-Code 1 represents the setup of the intrinsic parameters of CO, using notation inspired in the OpenGL API.


segLen3D = |end - start|;
glViewport(0,0,segSize2D,1);
glOrtho( -segLen3D/2 , segLen3D/2 , -0.5 , 0.5 , 0.5 , 2 );

Pseudo-Code 2:

Orthographic camera's intrinsic parameters' setup in OpenGL


The glViewport call defines the buffer's start coordinates and dimensions in pixels. The glOrtho call defines an orthographic camera with left, right, bottom, top, near and far clipping panes. These planes define the volume of space that will be rendered on the buffer. Left, right, bottom and top planes are defined such that they comprise exactly the length and "thickness" of the line segment. Near and far planes are defined such that the segment lies between them.

3.3.3 From 2D pixel buffer coordinates to 3D candidates

Having defined CO as above, the segment's projection will completely fill the buffer, which will be one-pixel tall. In those conditions, a 3D point rip in the view ray can be easily derived from its corresponding pixel with 2D window coordinates (x,0) using Equation 3 .

 ( ) i segLen3D-- rp = start+ x * segSize2D *DIR

(3)

After setting the orthographic camera and the corresponding buffer, the view ray's segment can be rendered on that buffer using the projective texturing with additive blending procedure with multiple cameras, as described in Algorithm 4 . As a result, the buffer will contain an RGB value for each pixel of the segment. Each RGB value encodes the accumulated error as described in Section 3.2.1 . The GAPI provides a call to read the buffer data from the GC into a main memory array. If the errors are distributed over more than one channel, each candidate's final error is computed by adding the corresponding value of each channel, at the same pixel position. After doing so, the right candidate may be chosen from the candidates with the lowest error iterating over the resulting array. The array positions correspond to the x coordinates in the buffer, and therefore can also be used to determine the associated 3D candidate using Equation Equation 3 .

 

4 Tests and Results

To assess the performance improvements achieved with projective texturing we implemented a point reconstruction algorithm based on [Rodrigues et al. 2002], with two variants: a CPU-based one, and a GPU-based (using projective texturing). These variants were submitted to two tests.

The first test serves to show the influence of varying the number of candidates used. The number of candidates plays a major role in the computational load of the reconstruction, but also in its quality, as referred in section ??. The scene's setup consists of a set of images of 512 x 512 pixels of a textured version of the Stanford model repository' bunny, taken from 70 virtual viewpoints around the bunny (Figure 7 (a)).

 


(a) PIC(b) PIC

Figure 7: "Bunny": (a) The synthetic test model and the camera positions and orientations; (b) a view of multi-image point reconstruction

 


Scenes with high level of detail may require a large number of candidates for a proper reconstruction. For this reason, reconstruction was performed using 400, 800 and 1200 candidates, for 16826 contour points of the first image. (Figure 7 (b) illustrates the achieved final reconstruction of contour points from multiple images).

The results in Table 1 show the time in milliseconds required for reconstructing all the 16826 contour points evaluating 400, 800 and 1200 candidates in 69 images each, using the two implementation variants. The test was performed on System A, a Pentium IV computer running at 2.53 GHz, with a GeForce FX 5900 graphics card.


Nr. of Candidates per point

400

800

1200





CPU time

177171

347461

520482





GPU time

4807

5938

6574





CPU/GPU

36.85

58.51

79.17

 

 

 

 

 

Table 1:

Performance results for reconstructing 16826 contour points, evaluating 400, 800 and 1200 candidates in 69 different images (time in milliseconds)


From the table is clear that the time required by the GPU implementation does not increase linearly with the number of candidates, as opposed to the CPU variant. In fact, using the largest set of candidates (1200) more than doubles the advantage of using projective texturing over the CPU implementation, when compared to the smallest set (400).

The second test compares the performance under different hardware settings, to show that even with older GPU's, there is still a major advantage in using projective texturing. Two hardware systems were used: System A is the same system used in the first test. System B consists of a Pentium IV Celeron computer running at 2.4 GHz, with a GeForce 256, an older graphics card.

In this test, a real scene consisting of a small wooden boat was used (see Figure 8 (a)). A set of 39 images of 1024 x 768 pixels was captured around the boat using a digital camera.

 


(a) PIC(b)PIC

 

Figure 8: "Boat": (a) an image of the test scene; (b) a view of a multiple-image point reconstruction


Reconstruction was performed for 21169 contour points of the first image in each of the systems, and using both implementation variants. Table 2 shows the time results in miliseconds of both implementation variants in the two hardware systems. These times include the reconstruction of all the 21169 contour points, by evaluating 1200 candidates for each point, in 38 images. Figure 8 (b) shows a virtual view of the final reconstruction using contour points from multiple images.


System

A

B




CPU time

343036

377052




GPU time

6451

69874




CPU/GPU

53.17

5.39

 

 

 

 

Table 2:

Performance results of the reconstruction of 21169 contour points, by evaluating 1200 candidates for each point in 38 images, in different hardware systems (time in milliseconds)


The results show that, even when using older graphics systems, significant performance gains are obtained with the GPU based implementation, although to a lesser extent.

5 Conclusions and Future Work

Improving performance of 3D reconstruction methods is of growing importance, both as a step towards real-time applications and as a mean for better exploration and interactivity of reconstruction methods. The process of 3D reconstruction is a complex one, and many methods often have several fine tuning parameters. Hence the feasibility of a method is also related to its response time. Fast response times allow the user to fine tune the parameter settings and adjust the image input set to obtain the best reconstruction results. We proposed the application of projective texturing – a technique commonly available in current hardware graphic cards and API's – to perform some crucial steps of reconstruction, namely the computation of epipolar geometry.

The technique was tested using CPU-based and GPU-based implementations (without and with projective texturing, respectively) of a modified version of the point reconstruction algorithm presented in [Rodrigues et al. 2002]. The results show gains of up to two orders of magnitudes in response time, when applying the method in current off-the-shelf CPU's and GPU's. As an example, computing three reconstructions of a synthetic scene, with 400, 800 and 1200 candidates per point, takes less than 18 seconds using the GPU-based implementation, versus over 1000 seconds on the CPU-based implementation. The improved response time provided by the use of projective texturing enables more extensive adjustment of reconstruction settings, when comparing to the CPU-based implementation. We conclude that this leads to an increase in interactivity and usability.

We now plan to explore multi-texturing and graphics hardware programming to further enhance performance, and extend the role of the GPU to other stages of reconstruction. Another possible extension to the algorithm is to compute the 3D reconstructions of multiple points simultaneously, by projecting textures in multiple view rays at a time in the same buffer and reading them in a single pass. This may seem a trivial solution for increasing performance even further, since the projective texturing would be faster for simultaneous multiple rays. However, two problems arise: how to define an appropriate buffer to record the errors of multiple view rays simultaneously and how to read back and interpret the errors. We would like to explore methods to solve these problems.

 

Acknowledgments

This work was funded by FCT (the Portuguese Science and Technology Foundation) under Grant Praxis XXI/BD/20322/99. The authors also wish to thank the support of Philips Research Laboratories (Eindhoven, The Netherlands).

 

References

 

   BORGEFORS, G. 1984. Distance transformations in arbitrary dimensions. Computer Vision, Graphics, and Image Processing 27, 3, 321–345.

   FAUGERAS, O. 1993. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, Cambridge, Massachusetts.

   HARTLEY, R., AND ZISSERMAN, A. 2000. Multiple View Geometry in computer vision. Press Syndicate of the University of Cambridge.

   KAWAMOTO, K., AND IMIYA, A. 2001. Detection of spatial points and lines by random sampling and voting procedure. Pattern Recognition Letters 22, 2 (February), 199–207.

   LOK, B. 2001. Online model reconstruction for interactive virtual environment. In Proceedings 2001 Symposium on Interactive 3D Graphics, 69–72.

   REDERT, A., HENDRIKS, E., AND BIEMOND, J. 1999. Correspondence estimation in image pairs. IEEE Signal Processing Magazine 16, 3, 29–46.

   RODRIGUES, R., FERNANDES, A., VAN OVERVELD, K., AND ERNST, F. 2002. Reconstructing depth from spatiotemporal curves. In Proceedings of the 15th International Conference on Vision Interface, 252–259.

   SATO, J., AND CIPOLLA, R. 1999. Affine reconstruction of curved surfaces from uncalibrated views of apparent contours. IEEE Trans. Pattern Analysis and Machine Intell. 21, 11 (November), 1188–1198.

   SCHULZ-MIRBACH, H., AND WEISS, I. 1994. Projective reconstruction from curve correspondences in uncalibrated views. Technical Report TR-402-94-014, Technical University of Hamburg-Harburg.

   SEGAL, M., KOROBKIN, C., VAN WIDENFELT, R., FORAN, J., AND HAEBERLI, P. 1992. Fast shadows and lighting effects using texture mapping. In Proceedings of SIGGRAPH'92, ACM, E. E. Catmull, Ed., 249–252.

   SHREINER, D., Ed. 2000. OpenGL Reference Manual, OpenGL Architecture Review Board, Addison-Wesley.

   SMITH, S. M. 1998. ASSET-2: Real-time motion segmentation and object tracking. Real-Time Imaging 4, 1, 21–40.

   TELL, D., AND CARLSSON, S. 2000. Wide baseline point matching using affine invariants computed from intensity profiles. In ECCV (1), 814–828.

   YANG, R., WELCH, G., AND BISHOP, G. 2002. Real-time consensus-based scene reconstruction using commodity graphics hardware. In Proceedings of Pacific Graphics 2002.