This special issue
includes a selection of papers from the SCCG 2004 conference, chaired by Prof.
Alexander Pasko ( |
There are two
competitions organized during the conference: SCCG Best Papers and SCCG Best
Presentations. They are based on evaluation by reviewers and public voting of
SCCG participants. Awarding the winners is a part of the closing ceremony, and the
diplomas with logos of sponsors are available at www.sccg.sk
as well. As proposed
by Alexander Pasko and accepted by the editor-in-chief, Prof. Victor V. Pilyugin,
the winning papers are published in a special issue of CGG, a prominent online
journal at http://elibrary.ru/cgg. The papers are
slightly extended and rewritten, based on SCCG discussions and inspirations.

After completing the selection, one can see that the unifying idea of all
five awarded papers can be formulated as discovering tricky solutions that
balance speed-up (of modeling) against rendering quality criteria.

William Van Haevre et al. dealt with ray
density estimation for plant growth simulation. In particular, they evaluated
the varying indoor environment illumination while growing the plants using
intensity-modified rules for L-systems. The novel approach results in a
flexible and accurate algorithm to achieve more realistic vegetation. The
paper won the 3rd Best Presentation Award.

Mario Sormann et al. focused on the
complex task of creating models from image sequences as quickly and
as well as possible. VR Modeler is a novel interactive monocular 3D modeling
system with nicely separated intelligent 2D interaction and 3D
reconstruction. Besides that, both coarse and detailed urban
models are supported for web presentation and other purposes. The results
have already contributed to Virtual Heart of Central
Europe (www.vhce.info), which is a recent
European cultural heritage project.
The paper won the 3rd Best Paper Award.

Rui Rodrigues and Antonio Ramires Fernandes
report on a prospective use of graphics cards. A significant part of 3D
reconstruction, especially epipolar
geometry computations, can be transferred to the GPU. This new idea offers
a remarkable gain of up to two orders of magnitude in computation
time. The paper won the 2nd Best Presentation Award.

Ivan Viola et al.
explored frequency domain volume rendering (FVR) because of its computational
speed. By moving significant parts of the computation to the GPU, they report
acceleration by a factor of 17. This allows for highly interactive framerates with varying
rendering quality. The quality depends on interpolation schemes. The authors
analyzed four of them to clarify the trade-off between performance and
quality. The paper won the 2nd Best Paper Award.

Last but not
least, Diego Gutierrez et al. contributed a SIGGRAPH-quality paper on
global illumination for inhomogeneous media. In total, there are 10 different
light-object interactions known, and we simplify the model to achieve faster
solutions. The authors noticed that light rays travel a curved path while
going through inhomogeneous media where the index of
refraction is not constant. In addition, they took into account the way
human perception deals with luminances.
In total, phenomena like sunset, green flash, and bleaching are mastered, completing
excellent research and a brilliant presentation. This is why only five papers
are here: Diego clearly won in both competitions.

In conclusion, I
have to recall the following. In 2003, one year ago, this message from
Alexander Pasko arrived
to |
Dear participants and
organizers of SCCG, your conference provides unique opportunity for young
researchers to make their efforts visible in the world, especially for those
who are not hypnotized by the visual quality of modern computer graphics
works in modeling, rendering, and animation. We all know that such a work
still requires tedious manual labor hampered by erroneous models and algorithms. Let us hope that
the next spiral of development will make our work in computer graphics more
close to a joyful mind game.

I have to thank Alexander again, along with all the
people who contributed to SCCG 2004 in the spirit of these beautiful and
clever words.

Andrej
Ferko
Comenius University Bratislava, SK-842 48
Bratislava, Slovakia, ferko@fmph.uniba.sk, www.sccg.sk/~ferko

VR Modeler: From Image Sequences to 3D Models
Mario Sormann, Joachim Bauer, Christopher Zach,
Andreas Klaus, Konrad Karner,
Austria
sormann@vrvis.at
· Abstract
· 3. VR Modeler: 3D Models from Image Sequences
· 3.1 3D Model Representations
· 4. Conclusion and Future Work
· 5. Acknowledgements
Abstract
In this
paper we present a novel interactive modeling system, called VR Modeler,
to create 3D geometric models from a set of photographs. In our approach
standard automatic reconstruction techniques are assisted by a human operator.
The modeling system is efficient and easy to use because the user can
concentrate on the 2D segmentation and interpretation of the scene whereas our
system is responsible for the corresponding 3D information. Therefore we
developed the user interface of VR Modeler as a monocular 3D modeling system.
Additionally, we are able to obtain coarse as well as high resolution models
from architectural scenes. Finally, we tested the modeling system on different
types of datasets to demonstrate the usability of our approach.
Key words: interactive modeling system, user interface, feature based
modeling, photogrammetry, 3D reconstruction, image sequences, 3D modeling
The
creation of 3D models for use in an interactive virtual environment is an
expensive and tedious process and is still a challenging problem in computer
vision. Typically the requirement that the virtual environment should mirror an
existing scene demands accurate three dimensional (3D) geometry, as well as
surface materials or textures. Thus, there is a need for a method to directly
extract realistic 3D models from real photographs. In the fields of
photogrammetry and computer vision many approaches have been developed which
allow the production of photorealistic 3D models [13], [18]. In general, these
algorithms take multiple images of a real environment using a calibrated camera
and then create from these images a 3D structure of the scene. The output of
such an algorithm is a dense point cloud, corresponding to important features
in the scene. These point clouds should be converted into logical objects in
order to create suitable representations for a virtual environment. Currently
available methods for automatic segmentation are not yet robust enough to build
useful geometric models for visualization; fully automatic
segmentation remains an ill-posed problem. In this paper we discuss how we
can make the modeling process more convenient and efficient. So far there are
two separate research areas in computer vision, one is the reconstruction
problem and the other one the recognition problem. In our approach we solve the
reconstruction problem by highly redundant information about the scene, in our
case image sequences. The recognition problem is handed over to a human
operator, who is supported by an intelligent user interface. Thus the operator
can focus on the segmentation and interpretation of the scene using only one
image while the system takes care of the associated 3D information. Essentially,
our interactive modeling system, called VR Modeler (Virtual
Reality Modeler) allows a user to construct a geometric model of the scene from
a set of photographs. The images are taken with a hand-held digital consumer
camera using short baselines. After some preprocessing, the relative orientation
of the image sequence is calculated fully automatically. Our orientation method, which
is not the topic of this paper, is based on work described by Horn [8], Klaus
et al. [9] and Nister [12]. Once we have determined the relative orientation
between all image pairs we are able to extract 3D information from the
photographs automatically by employing area and feature based matching
techniques. The 3D information consists of 3D points, 3D lines and 3D surfaces,
as illustrated in Figure 1. After applying this automatic reconstruction
process all 2D features in the images correspond to their 3D counterparts.
Because we want to obtain a consistent 3D model of the scene, it is
necessary to combine the different extracted model representations. In our
approach we accomplish this task by relying on a human operator and
his or her interpretation and segmentation abilities. As a result of the
modeling process we are able to achieve a coarse as well as a detailed 3D model
of the scene. The remainder of this paper is structured as follows: after a
section on related work we describe our approach to building 3D geometric models from
image sequences. We then present the achieved results, and finally we
summarize our approach and outline some aspects of future work.

Fig. 1.
Fusion of automatically
extracted 3D lines and 3D point cloud. The illustration shows the bell tower on
the castle hill in Graz
The process
of reconstructing 3D models from image sequences is a very active research
topic in computer vision. Nevertheless no general technique exists to obtain
fully automatic 3D models from image sequences. However three different
research fields provide methods to recover 3D information from oriented digital
images. These research fields are known as area based modeling, feature based modeling
and human assisted reconstruction from image information.
2.1 Area based Modeling
The
estimation of dense 3D point clouds from image sequences is discussed by Pollefeys et al.
[13] and Brown et al. [4]. The geometrical theory of these methods relies on being able to solve the
reconstruction problem. From corresponding points the relative orientation can be
estimated and 3D points are extracted. This procedure is applied to many pixels
within stereo or multiview images, which results in dense 3D point clouds.
2.2 Feature Based Modeling
Several
authors discussed the problem of 3D data acquisition from digital images using
various feature extraction and matching methods. A general overview of these
methods is given in Baillard et al. [1], where they propose a line matching
method over multiple oriented views. Schmid and Zisserman [15] assume a 2D
feature extraction method from images including contour chains, line segments
and vanishing points to automatically recover planes from architectural images.
2.3 Human assisted Reconstruction from Image Information
One of the
most popular approaches in this field is the modeling and rendering of
architecture proposed by Debevec [5], called Facade. This approach,
which combines geometry-based modeling and image-based modeling, can be
separated into two main components. The first component facilitates the
recovery of a basic geometric model of the photographed scene. The second
component describes an efficient view dependent texturing method to better represent
geometric details of the basic model. Another project named Realise [10]
is based on a hybrid approach with vision techniques to assist users to extract
models from image sequences. A user specifies interactively the topology of the
scene whereas the system reconstructs the geometry from the images. The
commercial product Image Modeler from Realviz [11] is inspired by both
approaches.
3. VR Modeler: 3D Models from Image Sequences
In this
section we present the underlying model representations and algorithms.
Furthermore we discuss some user interface aspects illustrated on the
implementation of VR Modeler. As already mentioned the input images are
captured with a digital consumer camera using short baselines, thus a digital
video camera with a reasonably high resolution will work as well. The extracted
3D primitives (3D points, 3D lines) and 3D surfaces acquired with current state
of the art techniques will be insufficient in terms of interpretation,
segmentation and visualization aspects. To make these aspects more concrete we
will outline various problems of modeling 3D scenes in more detail.
Consequently in this paper we focus on three aspects:
· The segmentation and interpretation
of a scene to obtain a consistent 3D model from image sequences
· How the modeling process is affected
by the visualization aspect
· How the user interface can make the
process of modeling more convenient and accurate.
Figure 2
shows the Herz-Jesu church in Graz represented as a 3D point cloud. Obviously
it is very difficult to obtain a correct segmentation and interpretation of the
scene from this model representation fully automatically. In fact the
segmentation and interpretation task includes the localization and
classification of facades, windows, doors or any other relevant scene objects.
Therefore, in our opinion, the better choice to acquire 3D models is to combine
3D surface representations with feature based modeling assisted by a human
operator.

Fig. 2. 3D point cloud of the Herz-Jesu
church in Graz. The generated 3D point cloud includes many outliers from areas
like the sky, thus a segmentation and interpretation of the scene is needed.
The second
question leads to the level of detail concept, where geometry that is too
complex to be rendered fast enough is replaced by a simpler model. We observed
that standard simplification methods are not well suited to generate different
levels of detail for architectural models, since they do not preserve, for
example, upright walls, to which humans are very sensitive. Hence, we decided to
produce a coarse as well as a highly detailed polygonal model of the scene.
Furthermore, with such a coarse polygonal model we can achieve realtime
rendering in high quality even on low bandwidth network connections [17].
The last
question is related to the user interface of VR Modeler. In general user
interfaces represent a key concept to computer applications and directly relate
to the usability of a given application [16]. The key concept of our
interactive modeling system is based on the fact that humans are not good at
precise or accurate operations in 3D. Therefore we developed our VR Modeler as
a so called monocular 3D modeling system.
3.1 3D Model Representations
In the VR
Modeler, an architectural building is represented as a set of different model
representations. Various representations can be found in [2] and [9]. Our
approach deals with the following three types of model representations: marker
points, marker lines and 3D point clouds. This ordering also reflects the
complexity of the geometric primitives. The following sections give a detailed
description of the mentioned representation types.
Marker Points
The
extraction of marker points from image sequences is related to the
reconstruction problem, which is to identify the 2D points in two images that
are projections of the same 3D point in the world. From corresponding points within
the image sequences the relative orientation and the 3D positions of the
corresponding points can be estimated. Once we have determined the relative
orientation, additional 3D points can be easily extracted from two 2D points in
the image sequence, as illustrated in Figure 3. In VR Modeler we distinguish
between a semi automatic and a fully automatic marker point extraction. The
semi automatic extraction is supported over an incremental and straightforward
process by a human operator. Consequently the user defines marker points in the
image sequence over a simple point and click interface with subpixel accuracy
by zooming into the images. The automatic marker point generation is based on a
standard point-of-interest detector introduced by Harris and Stephens [7]
followed by an automatic matching procedure [4]. The output of both procedures
is a direct assignment of 2D marker points and their 3D counterpart.
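
The automatic marker point generation starts from a point-of-interest detector in the style of Harris and Stephens [7]. The following is a minimal sketch of the corner response only, not the implementation used in VR Modeler: the 3x3 summation window, the central-difference gradients and the constant k = 0.04 are assumptions made for illustration, and `harris_response` is a hypothetical name.

```python
def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 per pixel.

    img is a list of rows of grey values. The 3x3 window and k are
    illustrative assumptions, not values taken from the paper.
    """
    h, w = len(img), len(img[0])
    # Central-difference gradients; border pixels keep a zero gradient.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = sxy = syy = 0.0
            # Accumulate the structure tensor M over a 3x3 neighbourhood.
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    sxy += gx * gy
                    syy += gy * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            resp[y][x] = det - k * trace * trace
    return resp


# A bright square whose upper-left corner sits at pixel (5, 5).
image = [[1.0 if (x >= 5 and y >= 5) else 0.0 for x in range(12)]
         for y in range(12)]
response = harris_response(image)
print(response[5][5] > 0, response[9][5] < 0)  # corner vs. straight edge
```

A positive response marks a corner, a negative one a straight edge, and a value near zero a flat region; candidate marker points would be the local maxima of the positive responses.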

Fig. 3. Reconstruction of a 3D point from two
2D points over a known relative orientation.
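
The reconstruction shown in Figure 3 can be sketched as the intersection of two viewing rays. The paper does not state which triangulation method is used, so the midpoint method below is an assumption. It also assumes that the camera centres and the ray directions through the two 2D points are already expressed in one common coordinate frame, which is what the known relative orientation provides. The name `triangulate_midpoint` is hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two viewing rays.

    Ray 1 is c1 + s*d1 and ray 2 is c2 + t*d2, with camera centres c1, c2
    and ray directions d1, d2 given in a common frame (an assumption).
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing rays are (nearly) parallel")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Two cameras one unit apart, both looking at the point (1, 2, 5).
point = triangulate_midpoint([0, 0, 0], [1, 2, 5], [1, 0, 0], [0, 2, 5])
print(point)
```

With noisy 2D points the two rays no longer intersect, and the midpoint is a simple compromise; the short baselines mentioned above make the rays nearly parallel, which is why the sketch rejects a vanishing denominator.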
Marker Lines
Man-made
objects, for example architectural buildings, require the usage of so called
marker lines for the modeling step. Marker lines provide higher accuracy and
are simpler to localize during the modeling step than marker points. The
extraction of 2D line segments is based on the method proposed by Rothwell et
al. [14]. Utilizing this algorithm it is possible to extract contour chains
with subpixel accuracy. After applying a RANSAC [6] based line detection method
for all extracted contour chains and an optimization step based on vanishing
points [2], we derive a set of 2D line segments. These line segments in
combination with the known relative orientation represent the input for our automatic
line matching algorithm. The overall procedure of our 3D line matching method
is described in [15]. Similar to marker points, the outcome of the algorithm is
a direct connection between 2D and 3D marker lines.

Fig. 4. Automatically extracted 3D line set from
the Herz-Jesu church in Graz.
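
The RANSAC [6] based line detection on a contour chain can be illustrated as follows. This is a minimal sketch under assumptions: the contour chain is a list of 2D points, and the inlier tolerance, the iteration count and the fixed seed are illustrative choices that the paper does not specify. The vanishing point optimization is not shown, and `ransac_line` is a hypothetical name.

```python
import math
import random


def ransac_line(points, tol=0.5, iterations=200, seed=0):
    """Fit one 2D line a*x + b*y + c = 0 to a contour chain with RANSAC.

    tol (pixels), iterations and seed are illustrative assumptions.
    Returns the line (a, b, c) with unit normal and its inlier points.
    """
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2  # normal of the line through both samples
        norm = math.hypot(a, b)
        if norm == 0.0:
            continue  # identical sample points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        # Consensus set: points closer to the line than the tolerance.
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers


# Twenty points on y = 2x + 1 plus five stray contour points.
chain = [(x, 2 * x + 1) for x in range(20)]
chain += [(0, 10), (3, -5), (7, 30), (12, 2), (15, 50)]
line, inliers = ransac_line(chain)
print(len(inliers))
```

A complete line detector would remove the inliers and repeat the search on the remaining contour points to find further segments.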
3D Point Cloud
The
generation of dense 3D point clouds from calibrated images is performed by an
iterative and hierarchical matching procedure exploiting the already known
epipolar geometry between the images. For every sampling point the matching
procedure optimizes a cost function, which contains the similarity between the
template windows and a regularization term to favour smooth surfaces in
textureless regions. Using a hierarchical approach many problems with
repetitive patterns, as often encountered with building facades, are resolved
[4]. Figure 5 shows a 3D point cloud of the clock tower on the castle hill in
Graz. Another possibility to incorporate 3D surfaces into the VR Modeler is
obviously achievable with 3D laser scanning data.

Fig. 5. 3D point cloud of the clock tower on
the castle hill in Graz.
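
The cost function used for dense matching combines a similarity term with a regularization term. The sketch below illustrates that structure on a single pair of scanlines. Several parts are assumptions, because the paper does not give the formula: images are taken to be rectified so that the epipolar line is a scanline, similarity is measured by normalized cross-correlation, the regularizer is the absolute deviation from a prior disparity (for example the result of a coarser hierarchy level), and `lam` weights the two terms. All function names are hypothetical.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equally long windows."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    if da == 0.0 or db == 0.0:
        return 0.0  # textureless window: similarity carries no information
    return num / (da * db)


def matching_cost(left, right, x, d, half, d_prior, lam):
    """Dissimilarity of the two windows plus a smoothness penalty."""
    win_l = left[x - half:x + half + 1]
    win_r = right[x - d - half:x - d + half + 1]
    return (1.0 - ncc(win_l, win_r)) + lam * abs(d - d_prior)


def best_disparity(left, right, x, candidates, half=2, d_prior=0, lam=0.05):
    """Candidate disparity with the lowest cost at pixel x."""
    valid = [d for d in candidates
             if x - d - half >= 0 and x - d + half < len(right)]
    return min(valid, key=lambda d: matching_cost(left, right, x, d,
                                                  half, d_prior, lam))


right = [3, 7, 1, 9, 4, 8, 2, 6, 5, 0, 7, 3, 9, 1, 4, 8, 2, 6, 0, 5]
left = [0, 0, 0] + right[:-3]  # the left scanline is shifted by 3 pixels
print(best_disparity(left, right, 10, range(0, 7)))
```

In textureless windows the similarity term is flat, so the regularization term decides and the disparity stays close to its prior, which corresponds to the preference for smooth surfaces described above.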
The
construction of coarse 3D models within VR Modeler is an incremental process
supported by a human operator. Our approach allows the user to create 3D
surfaces by applying two different techniques. The key concept of the first
strategy is the selection of the interesting area of the scene by a
human operator. To address this segmentation issue, the user connects the
available marker points to a polygon in image space, whereas the
triangulation of the polygon and the 3D surface generation are performed by our
system. In the second method we use marker lines to reconstruct polygons
from the image sequence. Here the user supplies the input to the VR
Modeler by specifying and grouping the marker lines in one of the images into
facades, windows or doors. As a consequence of the direct link between the 2D
and 3D model representation the 3D surface can be easily extracted.
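
The triangulation of a polygon that the user has drawn by connecting marker points can be sketched with ear clipping. The paper does not name the triangulation algorithm, so ear clipping is an assumption chosen for its simplicity; the polygon is assumed to be simple (not self-intersecting), and `triangulate` is a hypothetical name. Because every 2D marker point is linked to its 3D counterpart, the returned index triples can be applied directly to the 3D points.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def signed_area(poly):
    n = len(poly)
    return 0.5 * sum(poly[i][0] * poly[(i + 1) % n][1]
                     - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))


def in_triangle(p, a, b, c):
    """True if p lies inside or on the counter-clockwise triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def triangulate(polygon):
    """Ear clipping for a simple polygon; returns triples of vertex indices."""
    idx = list(range(len(polygon)))
    if signed_area(polygon) < 0:
        idx.reverse()  # work in counter-clockwise order
    triangles = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = polygon[i], polygon[j], polygon[l]
            if cross(a, b, c) <= 0:
                continue  # reflex or degenerate corner, not an ear
            if any(in_triangle(polygon[m], a, b, c)
                   for m in idx if m not in (i, j, l)):
                continue  # another vertex blocks this ear
            triangles.append((i, j, l))
            idx.pop(k)
            break
        else:
            raise ValueError("polygon is not simple")
    triangles.append(tuple(idx))
    return triangles


# An L-shaped facade outline given as 2D marker points in image space.
outline = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(triangulate(outline))
```

A polygon with n vertices always yields n - 2 triangles, so the L-shaped outline above is split into four.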
Texturing
Typically,
texturing the reconstructed polygon from one image produces various disturbing
artefacts; for instance, occlusions lead to incorrectly textured polygons.
A multi view texturing approach [3] therefore textures a polygon from
all images more accurately and additionally increases the visual quality of the
scene. Hence the texture information of the polygon is generated from all
images in which the polygon is visible.
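
How samples from several views might be combined for one texel can be sketched as below. The actual procedure is described in [3] and is not reproduced here; the per-channel median is only an assumption, chosen because it ignores a minority of views in which the texel is occluded. The name `blend_texel` and the convention that `None` marks a view where the polygon is not visible are hypothetical.

```python
import statistics


def blend_texel(samples):
    """Combine the colour samples of one texel taken from several views.

    samples holds one (r, g, b) tuple per view, or None where the polygon
    is not visible. The median is an illustrative choice, not the method
    of [3].
    """
    visible = [s for s in samples if s is not None]
    if not visible:
        return None
    # The per-channel median discards a minority of occluded samples.
    return tuple(statistics.median(channel) for channel in zip(*visible))


views = [(200, 180, 160), (202, 182, 158), (30, 30, 30),
         (198, 181, 161), (201, 179, 159), None]
print(blend_texel(views))
```

In this example the third view sees an occluding object, and its dark sample does not influence the result.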
Results
Figure 6
shows a coarse model of the bell tower on the castle hill in Graz. Note that
this model has been created with the VR Modeler in approximately 5 minutes.
This time takes into account all 3D modeling steps except the automatic
orientation of the image sequence.

Fig. 6. Two views of the reconstructed bell
tower on the castle hill in Graz.
So far we
have described the creation of a coarse model of the scene, but in general a
facade of an architectural building will have additional geometric details
which are not present in the basic model. Therefore this section is dedicated
to explaining the creation of these detailed 3D models. As already outlined in
section 3.1 we automatically recover a 3D point cloud from the image sequence
of the scene. This point cloud and the marker points, respectively the marker
lines are further used as an input for our detailed modeling process. Note,
that we obtain in both cases a segmentation and interpretation of the scene in
meaningful units, like windows, doors or roofs. Additionally the whole
modeling procedure is supported by a human operator. As outlined the
segmentation process to create a coarse 3D model is based on two different
techniques. We either utilize marker points or marker lines to obtain a
segmented area of the scene. Obviously the detailed reconstruction is performed
exclusively inside of this emphasized borderline. To obtain a polygonal 3D
surface representation a standard image-based triangulation method can be
finally performed. Basically the algorithm works as follows: The already known
region of interest provides a set of 2D line segments. Each of these 2D line
segments corresponds to a 3D marker line, which is further utilized to construct
an object plane. In the next step the final plane parameters are computed with
a robust least-squares fit to the endpoints of the 3D lines. Finally, the acquired
object plane is merged with the 3D point cloud, so that we obtain a high
resolution model of the segmented area of the scene. This process is
illustrated in Figure 7 where the red area indicates the acquired object plane.
Detailed modeling of the scene via marker points or marker lines is advantageous
for a number of reasons:
· Typical problems of area based
modeling approaches are avoided, hence the final high resolution model is free
of disturbing outliers
· As a consequence of the previously
mentioned constraints a significant increase in performance can be achieved
· Almost all architectural models can
be easily created by arrangement of various polygons
The first
point represents the main advantage of our interactive modeling system, which
is shown in more detail in the following section.

Fig. 7. Illustration of the reconstruction
process: All selected marker lines are used to compute an object plane which is
merged with the previously extracted 3D point cloud. The red area indicates the
acquired plane.
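
The plane fit to the endpoints of the selected 3D marker lines can be sketched as an orthogonal least-squares problem. The sketch shows only the ordinary least-squares core; the robust weighting mentioned in the text is not specified in the paper and is therefore omitted. The eigenvector computation by repeated matrix squaring is an implementation choice made to stay within the standard library, and the names `fit_plane` and `signed_distance` are hypothetical.

```python
import math


def fit_plane(points):
    """Orthogonal least-squares plane through 3D points.

    Returns (n, d) with unit normal n and offset d such that n . x = d.
    Plain least squares only; the robust variant is not shown.
    """
    m = len(points)
    c = [sum(p[i] for p in points) / m for i in range(3)]
    # Scatter matrix of the centred points.
    s = [[sum((p[i] - c[i]) * (p[j] - c[j]) for p in points)
          for j in range(3)] for i in range(3)]
    tr = s[0][0] + s[1][1] + s[2][2]
    if tr == 0.0:
        raise ValueError("all points coincide")
    # The normal is the eigenvector of s with the smallest eigenvalue,
    # i.e. the dominant eigenvector of tr*I - s. Repeated squaring
    # drives that matrix towards the rank-1 matrix n * n^T.
    a = [[(tr if i == j else 0.0) - s[i][j] for j in range(3)]
         for i in range(3)]
    for _ in range(40):
        a = [[sum(a[i][k] * a[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
        f = math.sqrt(sum(v * v for row in a for v in row))
        a = [[v / f for v in row] for row in a]
    # Every column is a multiple of n; take the longest one.
    cols = [[a[i][j] for i in range(3)] for j in range(3)]
    n = max(cols, key=lambda v: sum(x * x for x in v))
    length = math.sqrt(sum(x * x for x in n))
    n = [x / length for x in n]
    if n[max(range(3), key=lambda i: abs(n[i]))] < 0:
        n = [-x for x in n]  # fix the sign for reproducible output
    return n, sum(ni * ci for ni, ci in zip(n, c))


def signed_distance(point, n, d):
    """Signed distance of a 3D point from the plane n . x = d."""
    return sum(ni * pi for ni, pi in zip(n, point)) - d


# Endpoints of marker lines lying on a vertical facade at x = 2.
endpoints = [(2, 0, 0), (2, 1, 0), (2, 0, 1), (2, 1, 1), (2, 3, 2)]
normal, offset = fit_plane(endpoints)
print(normal, offset, signed_distance((2.5, 7, 7), normal, offset))
```

The signed distance gives each point of the 3D point cloud an offset relative to the fitted object plane, which is one plausible starting point for merging the two representations; how VR Modeler performs the merge is not detailed in the text.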
Results
Figure 8
illustrates the raw 3D point cloud, whereas in Figure 9 the result of our
detailed reconstruction approach is shown. Another high resolution model, which
presents the bell tower on the castle hill in Graz is illustrated in Figure 10.
As expected in both models the segmented regions are free of disturbing
artifacts.

Fig. 8. Illustration of the Herz-Jesu church as
a raw 3D point cloud.

Fig. 9. Different views of the reconstruction result.
In contrast to the previous illustration the reconstructed regions are free of
disturbing artifacts.

Fig. 10. Two views of the front side of the
bell tower represented as high resolution model.
The
automatic creation of digital 3D models from images of real objects can be
split into two main components: the user component, which is represented by
simple human interaction, and the computer component, which is represented by
more or less computationally complex algorithms. In general, user interfaces are
directly related to the behaviour of the human interaction. Therefore we
designed the user interface as a monocular 3D modeling system, which emphasizes
the advantages of 2D segmentation and interpretation. Figure 11 illustrates our
implemented user interface which contains two types of windows: image viewer
and model viewer. Typically the user supplies the input to the program by
utilizing the image viewer, whereas the model viewer is used to verify the
reconstruction progress. The bottom image preview box comprises an overview of
the captured photographs, thus a human operator can easily select the
appropriate image for the reconstruction process. Since the reconstruction
problem is already solved a human operator can concentrate on the segmentation
and interpretation of the scene. Therefore the general idea of the user
interface is based on the fact that humans are clumsy at simultaneously
controlling multiple degrees of freedom. Furthermore they are not good at
precise or accurate operations in 3D, especially with a 2D interface such as a
standard monitor and mouse. In contrast to humans computers are the better 3D
operators, because they are not limited to two eyes. Additionally, they are
able to handle multiple views simultaneously. For these reasons we implemented
our user interface as a monocular 3D modeling system, where the user is
responsible for the segmentation and interpretation in 2D, while the modeling
system deals with the corresponding 3D information. Another benefit of this
concept is that we obtain a full interpretation of the scene in logical units,
like windows, roofs, doors, facades etc. These are the main differences between
our user interface and those proposed in [5] and [10].

Fig. 11. Overview of the user interface of VR
Modeler, implemented as a monocular 3D modeling system.
4. Conclusion and Future Work
We have
presented a method to semi-automatically reconstruct virtual environments from
a set of photographs. Furthermore we have discussed the underlying model
representations, as well as our incremental method for 3D model building.
Consequently our approach can be separated into two main components. The first
component is a convenient interactive modeling system to recover a coarse
geometric model of the scene. The second component represents a reconstruction
system to generate an accurate high resolution model of the scene. Additionally,
in VR Modeler we focus on the segmentation and interpretation of the
scene, which is supported by an intelligent user interface. During the modeling
process the need for manual interaction should be minimized to obtain a nearly
automatic reconstruction. While the results are very promising and already
satisfying for many scenes, improvements both in the modeling as well as in the
user interface are suggested. Future work includes evaluating the accuracy of
geometric reconstructions and improving the functionality of the user interface.
Further we will concentrate on an almost automatic 3D reconstruction based on
additional knowledge of the scene. Therefore it would be necessary to integrate
standard recognition techniques into the 3D modeling process.
5. Acknowledgements
This work
is partly funded by the VRVis Research Center, Graz and Vienna/Austria and the
Virtual Heart of Central Europe, Towers, Wells, and Rarities 3D Online project,
co-funded by the European Commission in Culture 2000 Framework Programme,
Agreement No. 2003 - 1467/ 001 / 001 CLT CA12.
[1] C. Baillard, C. Schmid, A. Zisserman, and A. Fitzgibbon. Automatic line matching
and 3D reconstruction of buildings from multiple views. In ISPRS Conference
on Automatic Extraction of GIS Objects from Digital Imagery, IAPRS
Vol. 32, Part 3-2W5, pages 69-80, September 1999.
[2] J. Bauer, A. Klaus, K. Karner, K. Schindler, and C. Zach. Metropogis: A
feature based city modeling system. Photogrammetric Computer Vision (PCV),
September 2002.
[3] A. Bornik, K. Karner, J. Bauer, F. Leberl, and H. Mayer. High-quality texture
reconstruction from multiple views. In Journal of Visualization and Computer
Animation, 2002.
[4] M. Z. Brown, D. Burschka, and G. D. Hager. Advances in computational stereo. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 25(8):993-1008,
2003.
[5] P. E. Debevec. Modeling and Rendering Architecture from Photographs. PhD
thesis, University of California at Berkeley, Computer Science Division,
Berkeley CA, 1996.
[6] M. Fischler and R. Bolles. Random sample consensus: A paradigm for model
fitting with applications to image analysis and automated cartography.
Communications of the Association for Computing Machinery, 24(6):381-395, 1981.
[7] C.G. Harris and M. Stephens. A combined corner and edge detector. In Proc. 4th
Alvey Vision Conference, pages 147-151, 1988.
[8] B. Horn. Relative orientation. International Journal of Computer Vision,
4:59-78, 1990.
[9] A. Klaus, J. Bauer, and K. Karner. Metropogis: A semi-automatic city
documentation system. ISPRS Journal of Photogrammetric Computer Vision,
pages Part A 187-192, September 2002.
[10] F. Leymarie, A de la Fortelle, J. Koenderink, A. Kappers, M. Stavridi,
B. van Ginneken, S. Muller, S. Krake, O. Faugeras, L. Robert, C. Gauclin,
S. Leveau, and C. Zeller. Realise: Reconstruction of reality from image
sequences. In IEEE International Conference on Image Processing (ICIP),
pages 651-654, 1996.
[11] REALVIZ Image Modeler. http://www.realviz.com/products/im/index.php, April 2004.
[12] D. Nister. An efficient solution to the five-point relative pose problem. In CVPR03,
pages II: 195-202, 2003.
[13] M. Pollefeys, R. Koch, M. Vergauwen, A. A. Deknuydt, and L. J. Van Gool.
Three-dimensional scene reconstruction from images. In Brian D. Corner and
Joseph H. Nurre, editors, Conference on Three-Dimensional Image Capture and
Applications II, pages 215-226, Bellingham, Washington, January 24-25, 2000. SPIE.
[14] C.A. Rothwell, J.L. Mundy, W. Hoffman, and V.D. Nguyen. Driving vision by topology.
In IEEE Symposium on Computer Vision SCV95, pages 395-400, 1995.
[15] C. Schmid and A. Zisserman. The geometry and matching of lines and curves
over multiple views. IJCV, 40(3):199-233, December 2000.
[16] B. Shneiderman. Designing the User Interface: Strategies for Effective
Human-Computer Interaction. Addison-Wesley, 1998.
[17] J. Zara and P. Slavik. Cultural heritage presentation in virtual environment:
Czech experience. International Workshop on Database and Expert
Systems Applications (DEXA03), 2003.
[18] A. Zisserman, A. Fitzgibbon, C. Baillard, and G. Cross. From images to virtual and
augmented reality. In Ales Leonardis, Franc Solina, and Ruzena Bajcsy, editors,
Conference of Computer Vision and Computer Graphics, NATO Science
Series, pages 1-23. Kluwer Academic Publishers, August 2000. ISBN
0-7923-6612-3.