|
This special issue
includes a selection of papers from SCCG 2004 conference chaired by Prof.
Alexander Pasko ( |
|
|
There are two
competitions organized during the conference SCCG Best Papers and SCCG Best
Presentations. They are based on evaluation by reviewers and public voting of
SCCG participants. Awarding of winners is a part of closing ceremony and the
diplomas with logos of sponsors are available at www.sccg.sk,
as well. As proposed
by Alexander Pasko and accepted by the editor-in-chief, Prof. Victor V. Pilyugin,
the winning papers are published in special issue of CGG, a prominent online
journal at http://elibrary.ru/cgg. The papers are
slightly extended and rewritten, based on SCCG discussions and inspirations.
After completing the selection, one can see that the unifying idea of all
five papers awarded can be formulated as discovering the tricky solutions
between speeding-up (modeling) and rendering quality criteria. William Van Haevre et al. dealt with ray
density estimation for plant growth simulation. In particular, they evaluated
the varying indoor environment illumination while growing the plants using
intensity-modified rules for L-systems. The novel approach results in a
flexible and accurate algorithm to achieve more realistic vegetation. The
paper won the 3rd Best Presentation Award. Mario Sormann et al. focused on a
solution of a complex task creating models from image sequences as fast and
as good as possible. VR modeler is a novel interactive monocular 3D modeling
system with nicely separated intelligent 2D interaction and 3D
reconstruction. Besides that, the coarse and detailed precision of urban
models is supported for web presentation and other purposes. The results
already contributed to Virtual Heart of Central
Europe (www.vhce.info) which is a recent
European cultural heritage project.
The paper won the 3rd Best Paper Award. |
|
|
Rui Rodrigues and Antonio Ramires Fernandes
report on prospective use of graphics cards. A significant part of 3D
reconstruction, especially epipolar
geometry computations, can be transferred into the GPU. This new idea offers
a remarkable gain up to two orders of magnitude in terms of computational
times. The paper won the 2nd Best Presentation Award. |
|
|
Ivan Viola et al.
explored frequency domain volume rendering (FVR) because of computational
speed. Moving significant parts of computations to GPU, they report
acceleration by factor of 17. This allows for highly interactive framerates with varying
rendering quality. The quality depends on interpolation schemes. The authors
analyzed four of them to clarify the trade-off between performance and
quality. The paper won the 2nd Best Paper Award. |
|
|
Last but not
least, Diego Gutierrez et al. contributed by a SIGGRAPH quality paper on
global illumination for inhomogeneous media. In total, there are 10 different
light-object interactions known and we simplify the model to achieve faster
solutions. The authors noticed that light rays travel a curved path while
going through inhomogeneous media where the index of
refraction is not constant. In addition, they took into account the way how
human perception deals with luminances.
In total, the phenomena like sunset, green flash, and bleaching are mastered to complete an
excellent research and a brilliant presentation. This is why only five papers
are here Diego clearly won in both competitions. |
|
|
For conclusion, I
have to recall the following. In 2003, one year ago, this message from
Alexander Pasko arrived
to |
|
|
Dear participants and
organizers of SCCG, your conference provides unique opportunity for young
researchers to make their efforts visible in the world, especially for those
who are not hypnotized by the visual quality of modern computer graphics
works in modeling, rendering, and animation. We all know that such a work
still requires tedious manual labor hampered by errorneous models and algorithms. Let us hope that
the next spiral of development will make our work in computer graphics more
close to a joyful mind game. |
|
|
I have to thank again to Alexander and to all
people who contributed to SCCG 2004 in the spirit of these beautiful and
clever words. |
|
|
Andrej
Ferko
Comenius University Bratislava, SK-842 48
Bratislava, Slovakia, ferko@fmph.uniba.sk, www.sccg.sk/~ferko |
|
Rui Rodrigues,
Departamento de Informática, Universidade do
ruirodrigues@di.uminho.pt
António
Ramires Fernandes,
Departamento de Informática, Universidade do
af@di.uminho.pt
1 Introduction
2 Epipolar Geometry for 3D Point Reconstruction
2.1 Epipolar Geometry for Two-Image Point
Reconstruction
2.2 Epipolar Geometry for Multiple-Image Point
Reconstruction
2.3 Distance maps as error functions
3 Projective Texturing and Epipolar Geometry
3.1 Projective texturing
3.2 Error Accumulation for Multiple Cameras
3.2.1 Channel masking
3.3 Definition of the Candidates' Rendering Buffer
and camera
3.3.1 Extrinsic parameters
3.3.2 Intrinsic parameters
3.3.3 From 2D pixel buffer coordinates to
3D candidates
4 Tests and Results
5 Conclusions and Future Work
Keywords: Epipolar Geometry, Projective Texturing, Depth Estimation, 3D
Reconstruction, OpenGL
The
reconstruction of view-independent 3D information from a set of static images
has taken the attention of researchers for some years [Faugeras 1993, Hartley and Zisserman 2000]. The approaches are multiple and
with different applications. Many techniques are based on establishing
correspondences between entities from different images [Redert
et al. 1999]. Common corresponded entities can be points [Kawamoto and Imiya 2001, Tell and
Carlsson 2000, Rodrigues et al. 2002], lines or contours [Sato and Cipolla 1999, Schulz-Mirbach
and Weiss 1994], and rectangular blocks or segments [Smith 1998]. If a pair of entities from two
distinct images correspond, and the camera position, orientation and intrinsic
parameters for two images are known (or estimated), it is possible to compute
the position of those entities in 3D space. This position can be expressed
directly in a 3D referential or as a depth –
the distance relative to a reference camera. However, each of these entity
types has some type of ambiguity associated such that, given two entities in
different images, it is not straightforward to determine whether they
correspond or not [Redert
et al. 1999].
Determining
correspondences between a large number of entities (possibly thousands), in a
large number of images (tens), is a time consuming process. Furthermore, the
basic geometric theory does not anticipate the problems one encounters when
dealing with real scenes, such as noise in the images, approximate camera
calibration and perspective distortions, occlusions and highlights. Methods
dealing with contours, for instance [Sato
and Cipolla 1999, Schulz-Mirbach and Weiss 1994, Rodrigues
et al. 2002], may encounter additional problems with
contour extraction.
In addition,
for a given technique, the heterogeneity of inputs requires the tuning of a
series of parameters (either automatically or interactively) to improve the
reconstruction results. Many issues may arise: is the image set
sufficient/appropriate for the reconstruction of all the details of the scene? What
are are the best parameter settings for a particular scene?
Getting user
(or automatic) feedback to deal with these issues calls for interactive reconstruction rates. The longer a method
takes to provide a reconstruction the less interactive it becomes. Hence, it
can be argued that usability of these methods is related to response time. Achieving
real-time 3D reconstruction using off-the-shelf systems would also give way to
a wide range of applications, namely in terms of image and scene transmission,
such as live 3D video broadcasting and conferencing, or mixed-reality computer
games.
In
view-dependent 3D reconstruction methods, real-time reconstruction is close to
being achieved [Lok 2001, Yang
et al. 2002]. For view-independent methods, a more
realistic goal in the near future is to achieve interactive response times,
allowing for fast turnaround time and for off-line (yet fast) reconstruction. This
paper presents a method that accelerates a class of view-independent approaches,
in order to add interactivity in this sense.
Due to the
geometric nature of the computations involved in the reconstruction process,
one way to improve reconstruction rates is to explore graphics hardware
acceleration using off-the-shelf graphic cards (GC's).
The
computational power of consumer-grade graphics hardware (or commodity graphics hardware) has grown tremendously in
recent years, and continues to do so. The gaming industry is one of the main
forces that pushes the development of graphics hardware, and contributes to its
wide availability.
Given the
amount of geometry processing involved in reconstruction tasks, we propose to
use a well known 3D graphics API (GAPI) - OpenGL [Shreiner 2000] - as the platform to perform (batch) geometric
computations. The advantages are twofold: we will be using a language that (1)
is naturally oriented to 3D geometry processing, and (2) serves as a seamless
link to hardware acceleration, given the wide API support provided by graphic
cards manufacturers, and the ever-growing capabilities of graphic cards.
We therefore
propose to explore the potential of GCs, using projective texturing [Segal et al. 1992], to accelerate some of the tasks
relating to 3D reconstruction of points using epipolar geometry. The use of
projective texturing in the context of 3D reconstruction has been explored
previously, namely by [Lok 2001], but it was used for a different
purpose: determining a convex hull for collision detection.
Using
projective texturing for epipolar geometry related computations, a significant
amount of the computational load is transferred from the CPU to the GC. Tests
performed with an implementation based on the point reconstruction method
presented in [Rodrigues
et al. 2002] show the very significant performance gain
obtained – up to two orders of magnitude – and the potential of the method
proposed in this paper.
Section 2 describes the fundamentals of
epipolar geometry and how it is used for 3D or depth reconstruction. The
application of projective texturing in the context of epipolar geometry is
described in section 3. Tests and results of our proposal
using a case study are presented in section 4. Finally conclusions and directions
for future work are presented in section 5.
Typical
reconstruction algorithms rely on finding correspondences between entities in
two images. In this article we focus on the correspondence of points.
If a point pi in image Ii is corresponded with a point qj in Ij, and camera positions Ci and Cj are known, then the corresponding reconstructed point (r-point) rip is easily determined (in the ideal
case) by triangulation using the information of the cameras.
However,
finding correspondences between entities of different images is not trivial. On
the one hand, errors in the process of image acquisition and camera calibration
may lead to distortions of the entities, resulting in missed correspondences
(false negatives). It may also happen that ambiguities in the entities
corresponded lead to incorrect correspondences (false positives). Whenever
possible, it is useful to add constraints to help reducing this correspondence problem.
Epipolar
geometry allows the search for correspondences of a pi to be constrained to a subset of
the entities in image Ij. In this section we will present
the basics of epipolar geometry. A more detailed analysis of epipolar geometry
can be found in [Hartley
and Zisserman 2000].
Consider
two images Ii and Ij, with known cameras Ci and Cj, and a point pi of Ii (see Figure 1). Let Vpi be the view ray of pi associated to Ci (the half-line starting in the
centre of Ci and going through pi). The epipolar line Ei,jp is the projection of Vpi in Ij. Points vipx belonging to Vpi are projected in Ei,jp, and points pi,jy of Ei,jp can be projected back to Vpi, using Cj .

Figure 1: Epipolar Geometry
In terms of
correspondence search, this means that the correct correspondence for pi in Ij has to be in Ei,jp.
Assume there
is an error function
i,jpx that quantifies for a point pi,jx the error of corresponding it with pi (a possible error function is
discussed in section 2.3 ). In this case, determining the
best correspondence of pi could amount to minimize that error
function over a set of candidates pi,jx. In the two-image setting, this
could be done by evaluating the error function at the points belonging to the
epipolar line Ei,jp, as shown in Algorithm 1 .
|
For
simple entities such as points, the information provided by two images may be
insufficient to determine correspondences reliably. It is possible to reduce
ambiguity by extending point correspondence to multiple images. In this case,
choosing correspondence candidates from epipolar lines in different images
gives rise to the problem of fusion: if pi corresponds with qj in Ij and with ok in Ik, the associated r-points rip,j and rip,k will in general not coincide, due
to multiple error sources. These two points would have to be fused together in an additional step.
A possible
solution for the fusion problem, as suggested in [Rodrigues et al. 2002], is to choose candidates vipx directly from the view ray, project
each of them in the images, and accumulate the corresponding error to vipx (see Algorithm 2 ).
|
This
approach has an important advantage: the correspondence, reconstruction and fusion stages are merged into one. When the best candidate is chosen, it is
already a reconstructed point (r-point) and that choice already takes into account the
contribution of multiple images.
The
error function chosen to assess the correspondence between points in different
images is the distance of the projected
candidate to the closest contour (as suggested in [Rodrigues et al. 2002]). This means that, for a given candidate vipx of a contour point pi from Ii, the individual error
i,jpx contributed by Ij is the distance of pi,jx (the projection of vipx in Ij) to its' closest contour in Ij.
There are
several advantages in the use of distance as the correspondence error function.
First, it introduces an implicit tolerance to contour displacements originated
by faulty contour detection or errors in camera calibration. Second, this
distance function can be easily and quickly computed in the form of a distance
map using distance transforms [Borgefors
1984]. Furthermore,
since it is independent of the original point pi, the distance map can be
pre-computed, and it only has to be computed once for each image, regardless of
the number of points to be reconstructed, or the number of candidates per
point.
In this work,
the error maps that will be projected over the view rays are the distance maps
generated from the contours of all images. Those maps are precomputed once and
used for the reconstruction of any point.
In
section 2 we have seen the application of epipolar
geometry to 3D point reconstruction. Epipolar geometry is traditionally
computed on the CPU. We propose to perform in the GC a significant part of the
computations required.
The algorithm
presented in section 2 requires that a view ray was
densely sampled into a set of 3D candidates, and that each of those candidates
were evaluated according to errors accumulated from the n
error maps associated with the images.
The evaluation
of the error function for a given 3D candidate requires n
projections in error maps, n readings of error map
values (or n × 4 readings and n interpolations) and n sums.
Two features
of GC's can be explored to accelerate this computation: first the acceleration
provided for operations such as projection, and secondly the pipelined
architecture that allows for parallel computing.
GC's provide
acceleration for a common graphic procedure known as projective
texturing [Segal
et al. 1992] that allows one to project images (textures)
over 3D geometry, just as a slide projector casts an image over the surfaces in
front of it (Figure 2).
This
procedure can be used to project an error map
j from the position of the associated
camera Cj over a candidate point vipx. By projecting the error map as a
texture, the color value that is assigned to vipx will be a representation of the
error value corresponding to pi,jx (the projection of vipx in Ij) on the error map (Figure 3 ).
|
Figure 3: Projective
Texturing for reconstruction. |
GC's
make this procedure extremely fast. Furthermore if multiple candidates are
packed and sent to the graphics card simultaneously then the pipelined
architecture allows for another degree of acceleration. Hence the more
candidates are processed on one pass, the more effective the acceleration. Since
all 3D candidates are in the 3D line defined by the view ray for a particular
contour point, rendering the view ray is a natural option to render all the
candidates.
Error
accumulation for multiple error maps is also possible with GC's using additive blending. Additive blending can be used to
accumulate texture values coming from multiple sources, and projected on the
same point. Using the projector analogy, texture additive blending is similar
to the effect resulting as lights of different intensity or color are cast from
different projectors on a neutral surface, blending together into a new color
(Figure 4 ).
In
this section we will show how to apply multiple projective texturing with
additive blending, to enable the implementation of error measure and
accumulation over multiple images for a complete set of candidates in a view
ray, in the context of the work in [Rodrigues et al. 2002].
The following
sub-sections detail the building blocks of this solution:
Projective
texturing allows an image to be projected in geometry, providing an effect
similar to a slide projector. In the context of epipolar geometry, projective
texturing allows one to reverse the common approach, ie instead
of projecting a view ray V
ip onto an image Ij, an image is
projected onto the view ray. The part of the image that is actually
projected on the view ray V ip corresponds to the epipolar line Ej,ip of the point pi that originated V ip. By projecting an error map as an
image onto the view ray, the errors corresponding to that view ray's candidates
are projected on the 3D points themselves. In this case, each camera acts as a
projector.
Texturing is a
common graphic task supported in GC's. Projective texturing is a special case
of texturing in which the texture coordinates of a
point are computed based on the point's position in the world, and the
parameters of the projector. More specifically,
projective texture homogeneous coordinates s, t, r and q are computed by multiplying the point's homogeneous
coordinates x, y, z and w by a transformation
matrix T that corresponds to the projection of the
3D point in the image plane of the projector (Equation 1 ).
|
(1) |
The
projector's intrinsic and extrinsic parameters are the same of the
corresponding camera. For this reason, computing projective texture coordinates
is very similar to computing a vertex's projection in an image using the
vertex's position and the corresponding camera's parameters.
The texture
transformation is usually divided in four transformations (Equation 2 ), represented here by their
respective matrices, namely:
Model Matrix (M)
The model matrix is used to transform the
point's coordinates from object space to world space. A given point may be
specified relative to an object referential which is in turn defined relative
to a world referential 1
View Matrix (V)
The view matrix transforms the world-space
coordinates in projector-space (or camera-space) coordinates. The base referential
for the resulting coordinates is now the camera's (or projector's) position and
orientation. This matrix contains the information of the projector's
extrinsics.
Projection Matrix (P)
The projection matrix converts the view-space
coordinates into texture-space 2D coordinates. This matrix encodes the
projector's intrinsics
Normalization Matrix (N)
This matrix scales and offsets the image-space
coordinates, such that they are mapped in the normalized range of [0..1] x
[0..1]
|
(2) |
For
the purpose of view ray texturing in the context of reconstruction, P and V are dependent only on
the projector's parameters, which are constant for each camera. Furthermore,
the Object space is made coincident with the World space, making M the identity matrix, hence also constant. Finally, N also consists of constant factors, for each camera. Since
all the matrices involved are immutable for a given camera acting as a
projector, the complete texture transformation matrix Ti may be computed for each camera Ci in advance and stored, to save
computation time.
The use of
projective texturing for the computation of the errors of the candidates of a
point pi using a single error map
j associated to Cj is described in Algorithm 3 .
|
After
performing this algorithm, the GC's buffer will contain each candidate
represented with a value corresponding to its error in
j. Section 3.3 presents a proper buffer and camera
setup to simplify the relation between a candidate's position in the buffer and
its position in the world.
Section 3.1 describes the process of projecting
errors from one error map in a given candidate, considering a two-camera
setting. To extend the method to a multiple-camera setting, textures (error
maps) projected from different projectors (cameras) over a given candidate have
to be accumulated. This accumulation can be performed on the same buffer, using
additive blending.
Additive
blending adds the texture values of newly rendered geometry to the values
already existing in the buffer. In the context of reconstruction, a solution to
incrementally accumulate errors (texture values) in the buffer is to render the
same candidate multiple times using additive blending, each time textured with
a different error map projected from its corresponding camera. In this way the
error values from multiple cameras are accumulated for each candidate.
|
Figure 5: Projective
texturing with accumulation for reconstruction. |
Algorithm 4 describes the method to accumulate
errors from multiple images using projective texturing and additive blending.
|
There
are limits to the maximum error value that can be represented in textures and
rendering buffers of GC's. One of the most supported pixel buffer formats is
the RGB (Red-Green-Blue) buffer, which is composed of three 8-bit channels (one
for each of the colors). This means that values stored on this buffer are
limited in the range of (0, 0, 0) to (255, 255, 255).
Since each
color channel is independent, accumulation can be done on a per-channel basis. To
achieve this, color channel masking is used: error
maps are coded as gray-scale maps, and each error map is only projected in one
of the buffers' color channels (red, green or blue). This allows to distribute
the error accumulation over the different channels, in order to use more of the
24-bit accumulation space (resulting on an effective upper limit of 3 × 256 = 768). The cameras' distribution between the different
channels is not relevant, as long as it is guaranteed that accumulation
overflow does not occur (ie a given channel
does not accumulate to an error value higher than 255).
For instance,
if error values are clamped to an upper limit of 10 units, at least 25 cameras
can be safely accumulated in a single color channel (25
× 10 < 255), resulting in at
least 75 cameras considering the three channels.
The
view ray, with the projected and accumulated error maps, has to be rendered in
a buffer using a virtual camera to capture the rendered geometry. The final
accumulated errors have to be read from this buffer to choose the best candidate.
Therefore, special care should be taken in choosing the representation of
candidates, the buffer, and the virtual camera parameters such that:
To
achieve these goals, some factors should be taken into consideration:
To
render a segment of the view ray, only the start
and end 3D points have to be passed to the GC as
the vertexes of a line. The corresponding connecting line is then rendered in
the buffer by the GAPI. Since pixels are the smallest entities that can be
represented in the buffer, each pixel rendered from the segment actually
becomes a candidate.
The start and end points of the
segment can be defined based on previous knowledge of the dimension of the
scene being reconstructed. If such information is not available, a large segment
can be used initially to obtain a conservative estimate of the scene's
dimensions.
Given the
collinearity of candidates and the rectangular shape of buffers, the best way
to maximize buffer usage is to set it up such that the segment is rendered as a
straight line aligned to one of the viewport's axis (e.g. an horizontal
line). Furthermore, the final reconstruction is simplified if a pixel selected
from the buffer can be easily associated to the corresponding 3D candidate.
Using an
orthographic camera CO properly placed relatively to the
segment allows the accomplishment of these goals. Such a setup makes it
possible to have a segment filling an entire 1-pixel-high buffer, and to
associate a pixel to a 3D candidate based on the pixel's x
coordinate in the buffer.
The
extrinsic parameters of CO are defined as a function of the
position, direction and length of the segment (Figure 6 ). Let DIR
be the normalised vector of the direction of the view ray. The lookat point is the middle point of the segment in 3D. The
camera can be placed in any position pos in the
plane containing the lookat point and perpendicular
to DIR. The LOOK vector
of CO is defined as the unit vector
pointing from pos to the lookat
point. The RIGHT vector of CO is parallel to DIR. The UP vector of CO is set perpendicular to RIGHT and LOOK. A graphical
representation of the setup of the orthographic camera CO is depicted in Figure 6. Assuming the view ray is parallel
to the paper plane, LOOK and RIGHT
are also parallel to the paper, and UP is
perpendicular to the paper.
|
Figure 6: Orthographic camera (CO) for view ray segment rendering. |
Pseudo-Code 1 represents the setup of the
extrinsic parameters of CO, using notation inspired in the
OpenGL API.
|
DIR = normalize(end - start);
|
The
gluLookAt call defines the camera's position to be pos, and the orientation is specified by a lookat point, which is the center of the segment, and the
UP vector.
Assume
now that the segment's length in 3D, segLen3D has
been defined or computed as suggested inSection 3.3 . The segment size in pixels, segSize2D, corresponds both to the number of candidates
that will be rendered, and to the width of the buffer to be used. This number
is directly related to the final precision of the reconstruction itself. For a
given predefined segLen3D, the larger the set of
candidates, the higher the reconstruction precision.
The intrinsic
parameters of CO (namely the clipping planes and the
viewport) are defined based on segLen3D and segSize2D. Each pixel corresponds to a 3D candidate of
the view ray. Pseudo-Code 1 represents the setup of the
intrinsic parameters of CO, using notation inspired in the
OpenGL API.
|
segLen3D = |end - start|;
|
The
glViewport call defines the buffer's start
coordinates and dimensions in pixels. The glOrtho
call defines an orthographic camera with left, right, bottom, top, near and far
clipping panes. These planes define the volume of space that will be rendered
on the buffer. Left, right, bottom and top planes are defined such that they
comprise exactly the length and "thickness" of the line segment. Near
and far planes are defined such that the segment lies between them.
Having
defined CO as above, the segment's projection
will completely fill the buffer, which will be one-pixel tall. In those
conditions, a 3D point rip in the view ray can be easily
derived from its corresponding pixel with 2D window coordinates (x,0) using Equation 3 .
|
(3) |
After
setting the orthographic camera and the corresponding buffer, the view ray's
segment can be rendered on that buffer using the projective texturing with
additive blending procedure with multiple cameras, as described in
Algorithm 4 . As a result, the buffer will contain an RGB
value for each pixel of the segment. Each RGB value encodes the accumulated
error as described in Section 3.2.1 . The GAPI provides a call to read
the buffer data from the GC into a main memory array. If the errors are
distributed over more than one channel, each candidate's final error is
computed by adding the corresponding value of each channel, at the same pixel
position. After doing so, the right candidate may be chosen from the candidates
with the lowest error iterating over the resulting array. The array positions
correspond to the x coordinates in the buffer, and
therefore can also be used to determine the associated 3D candidate using
Equation Equation 3 .
To
assess the performance improvements achieved with projective texturing we
implemented a point reconstruction algorithm based on [Rodrigues et al. 2002], with two variants: a CPU-based one, and a
GPU-based (using projective texturing). These variants were submitted to two
tests.
The first test
serves to show the influence of varying the number of
candidates used. The number of candidates plays a major role in the
computational load of the reconstruction, but also in its quality, as referred
in section ??. The scene's setup consists of a
set of images of 512 x 512 pixels of a textured version of the Stanford model
repository' bunny, taken from 70 virtual viewpoints around the bunny
(Figure 7 (a)).
Scenes
with high level of detail may require a large number of candidates for a proper
reconstruction. For this reason, reconstruction was performed using 400, 800
and 1200 candidates, for 16826 contour points of the first image. (Figure 7 (b) illustrates the achieved final
reconstruction of contour points from multiple images).
The results in
Table 1 show the time in milliseconds required for
reconstructing all the 16826 contour points evaluating
400, 800 and 1200 candidates in 69 images each, using the two implementation
variants. The test was performed on System A, a Pentium IV computer running at
2.53 GHz, with a GeForce FX 5900 graphics card.
|
From
the table is clear that the time required by the GPU implementation does not
increase linearly with the number of candidates, as opposed to the CPU variant.
In fact, using the largest set of candidates (1200) more than doubles the
advantage of using projective texturing over the CPU implementation, when
compared to the smallest set (400).
The second
test compares the performance under different hardware settings, to show that
even with older GPU's, there is still a major advantage in using projective
texturing. Two hardware systems were used: System A
is the same system used in the first test. System B
consists of a Pentium IV Celeron computer running at 2.4 GHz, with a GeForce
256, an older graphics card.
In this test,
a real scene consisting of a small wooden boat was used (see Figure 8 (a)). A set of 39 images of
1024 x 768 pixels was captured around the boat using a digital
camera.
|
Figure 8: "Boat": (a) an image of
the test scene; (b) a view of a multiple-image point reconstruction |
Reconstruction
was performed for 21169 contour points of the first image in each of the
systems, and using both implementation variants. Table 2 shows the time results in
miliseconds of both implementation variants in the two hardware systems. These
times include the reconstruction of all the 21169 contour points, by evaluating
1200 candidates for each point, in 38 images. Figure 8 (b) shows a virtual view of the
final reconstruction using contour points from multiple images.
|
The
results show that, even when using older graphics systems, significant
performance gains are obtained with the GPU based implementation, although to a
lesser extent.
Improving
performance of 3D reconstruction methods is of growing importance, both as a
step towards real-time applications and as a mean for better exploration and
interactivity of reconstruction methods. The process of 3D reconstruction is a
complex one, and many methods often have several fine tuning parameters. Hence
the feasibility of a method is also related to its response time. Fast response
times allow the user to fine tune the parameter settings and adjust the image
input set to obtain the best reconstruction results. We proposed the
application of projective texturing – a technique commonly available in
current hardware graphic cards and API's – to perform some crucial steps
of reconstruction, namely the computation of epipolar geometry.
The technique
was tested using CPU-based and GPU-based implementations (without and with
projective texturing, respectively) of a modified version of the point
reconstruction algorithm presented in [Rodrigues et al. 2002]. The results show gains of up to two orders of
magnitudes in response time, when applying the method in current off-the-shelf
CPU's and GPU's. As an example, computing three reconstructions of a synthetic
scene, with 400, 800 and 1200 candidates per point, takes less than 18 seconds
using the GPU-based implementation, versus over 1000 seconds on the CPU-based
implementation. The improved response time provided by the use of projective
texturing enables more extensive adjustment of reconstruction settings, when
comparing to the CPU-based implementation. We conclude that this leads to an
increase in interactivity and usability.
We now plan to
explore multi-texturing and graphics hardware programming to further enhance
performance, and extend the role of the GPU to other stages of reconstruction. Another
possible extension to the algorithm is to compute the 3D reconstructions of
multiple points simultaneously, by projecting textures in multiple view rays at
a time in the same buffer and reading them in a single pass. This may seem a
trivial solution for increasing performance even further, since the projective
texturing would be faster for simultaneous multiple rays. However, two problems
arise: how to define an appropriate buffer to record the errors of multiple
view rays simultaneously and how to read back and interpret the errors. We
would like to explore methods to solve these problems.
This
work was funded by FCT (the Portuguese Science and Technology Foundation) under
Grant Praxis XXI/BD/20322/99. The authors also wish to thank the support of
Philips Research Laboratories (
BORGEFORS, G. 1984. Distance transformations in arbitrary
dimensions. Computer Vision, Graphics,
and Image Processing 27, 3, 321–345.
FAUGERAS, O. 1993. Three-Dimensional
Computer Vision: A Geometric Viewpoint. MIT Press,
HARTLEY, R., AND ZISSERMAN, A. 2000. Multiple View
Geometry in computer vision. Press Syndicate of the
KAWAMOTO, K., AND IMIYA, A. 2001. Detection of spatial points and lines by
random sampling and voting procedure. Pattern Recognition Letters 22, 2 (February), 199–207.
LOK, B. 2001. Online model reconstruction for
interactive virtual environment. In Proceedings 2001 Symposium on Interactive 3D Graphics, 69–72.
REDERT, A., HENDRIKS, E., AND BIEMOND, J. 1999. Correspondence estimation in image
pairs. IEEE Signal Processing Magazine 16, 3,
29–46.
RODRIGUES, R., FERNANDES, A., VAN OVERVELD, K., AND ERNST, F. 2002. Reconstructing depth from spatiotemporal
curves. In Proceedings of the 15th International Conference on Vision Interface,
252–259.
SATO, J., AND CIPOLLA, R. 1999. Affine reconstruction of curved surfaces
from uncalibrated views of apparent contours. IEEE Trans. Pattern Analysis and Machine Intell. 21, 11
(November), 1188–1198.
SCHULZ-MIRBACH, H., AND WEISS,
SEGAL, M., KOROBKIN, C., VAN WIDENFELT, R., FORAN, J., AND HAEBERLI, P. 1992. Fast shadows and lighting effects using
texture mapping. In Proceedings of SIGGRAPH'92,
ACM, E. E. Catmull, Ed., 249–252.
SHREINER, D., Ed. 2000. OpenGL Reference Manual, OpenGL
Architecture Review Board, Addison-Wesley.
SMITH, S. M. 1998. ASSET-2: Real-time motion
segmentation and object tracking. Real-Time Imaging 4,
1, 21–40.
TELL, D., AND CARLSSON, S. 2000. Wide baseline point matching using
affine invariants computed from intensity profiles. In ECCV
(1), 814–828.
YANG, R., WELCH, G., AND BISHOP, G. 2002. Real-time consensus-based scene reconstruction using commodity graphics hardware. In Proceedings of Pacific Graphics 2002.