Visual Querying for Molecular Dynamics
Olga Sourina and Yubo Wang
Nanyang Technological University,
eosourina@ntu.edu.sg
Contents
2. Molecular Visualization and Analysis Systems
3. Spatio-temporal Querying Model
4. Design and Implementation of MDVQS
5. Conclusions and Future Work
Biologists now deal with gigabytes of data produced by molecular dynamics simulations. To understand and interpret these results and to form new hypotheses, the user often needs visualization and querying tools that support interactive visual data mining and querying of spatio-temporal data. In this paper, we propose a novel visual query system for visualizing molecular dynamics simulation results and querying them visually. The system design and its implementation are described. We propose specific visual queries for visual data analysis in molecular dynamics. The proposed visual query system is unique in that it allows us to formulate spatio-temporal queries that cannot be implemented directly with any available spatio-temporal database system and/or molecular visualization program. This paper is the result of collaborative research and graduate study.
The results of molecular dynamics simulation are spatio-temporal data describing the movement of atoms and molecules in a molecular system. The volumes of data are extremely large, so it is tedious to sort and process the records one by one. Usually, it is necessary to study only part of the data, for example, data within a certain region and/or time interval. However, existing systems for molecular visualization and analysis do not provide such geometric queries. After studying the domain of molecular dynamics and the existing molecular visualization and analysis systems, we developed novel techniques for visual data mining of time-dependent data using range querying with arbitrary shapes. In work [1], a geometric query model for relational databases based on implicit functions [2] was proposed. Then, in work [3], a uniform geometric query model for spatio-temporal data was proposed, and its application to computer simulation analysis in molecular dynamics was described.
In this paper, we describe the design and implementation of a novel visual query system for molecular dynamics applications, named Molecular Dynamics Visual Query System (MDVQS). In our system, a function-based model of a geometric solid is used in a spatio-temporal predicate to query spatio-temporal data. Function-based spatio-temporal predicates were first introduced in work [4]. MDVQS thus allows us to visualize spatio-temporal data and to pose time-dependent complex shape queries on them visually. First, the molecular system is visualized for all time frames. Then, the user analyses the data visually and formulates geometric hypotheses that can be tested by posing and executing spatio-temporal queries. We introduced three basic types of queries for data analysis in molecular dynamics. MDVQS was implemented using the 3D computer graphics system Visualization Toolkit (VTK) [5]. MDVQS was coupled to the gOpenMol system to provide the user with the full spectrum of its other analysis tools as well. gOpenMol [6] is a tool for visualization and analysis of molecular structures combined with several applications for data analysis.
The paper is organized as follows. In the next section, the molecular visualization and analysis system gOpenMol is briefly reviewed. Then, the spatio-temporal querying model used in our system is introduced in Section 3.
2. Molecular Visualization and Analysis Systems
Molecular visualization and analysis systems draw on computer graphics, data mining, virtual reality, and even cognitive psychology to provide biologists with deep insight into complex structures, fine features, and obscure patterns in large-scale datasets. Existing packages for molecular dynamics simulation include various molecular visualization programs with facilities for geometry analysis and visualization. With the VMD [7] and gOpenMol [6] software systems, we can display, animate, and analyze large biomolecular systems using 3D graphics and built-in scripting. However, structure geometry analysis could still be improved. In VMD [7], an extensive atom selection syntax is implemented, but range queries are limited to a spherical region, e.g. “find atoms within 6 Å of (1, -2.3, 0)”.
gOpenMol is a tool for visualization and analysis of molecular structures and their chemical properties, combined with several applications for analysis and presentation of data originating from quantum mechanics, molecular dynamics, and other computational chemistry calculations. The program has a graphical user interface (GUI) and an internal command line interpreter based on Tcl, and it can display and analyze molecular structures and properties calculated with external programs. However, gOpenMol also lacks tools for visual analysis of specific space regions. We therefore studied gOpenMol and developed additional functions that could bring new solutions to problems in molecular dynamics. In our system, gOpenMol can be used to look through dynamic molecular data presented in “xmol” format. Figure 1 (a-b) shows a molecular system visualized with gOpenMol at two time frames.

Fig. 1. Visualization of dynamic molecular data at different time points by gOpenMol: (a) time frame #1; (b) time frame #2
An example of an “xmol” format file is shown in Figure 2.

Fig. 2. Snapshot of “xmol” file
This is a common molecular dynamics data format used for input and output by different molecular dynamics software systems. The number in the first line of each snapshot gives the number of atoms and molecules. The second line specifies the snapshot time point. Each of the remaining lines starts with the chemical symbol of an atom or molecule, followed by its x, y, and z coordinates. A file contains many such snapshots, ordered as a time sequence.
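The snapshot layout described above can be read with a few lines of code. The following is a minimal sketch, not the MDVQS reader: the names `Frame` and `read_xmol` are hypothetical, and the sketch assumes whitespace-separated fields and that the second line of each snapshot is kept as free text.

```python
from typing import Iterator, List, NamedTuple, Tuple


class Frame(NamedTuple):
    comment: str                                   # second line: snapshot time point
    atoms: List[Tuple[str, float, float, float]]   # (symbol, x, y, z)


def read_xmol(lines: List[str]) -> Iterator[Frame]:
    """Yield one Frame per snapshot from the lines of an xmol-style file."""
    i = 0
    while i < len(lines):
        if not lines[i].strip():          # tolerate blank lines between snapshots
            i += 1
            continue
        n_atoms = int(lines[i].split()[0])    # first line: number of atoms
        comment = lines[i + 1].strip()        # second line: time point
        atoms = []
        for row in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = row.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        yield Frame(comment, atoms)
        i += 2 + n_atoms


sample = """2
time 1
C 0.0 0.0 0.0
O 1.2 0.0 0.0
2
time 2
C 0.1 0.0 0.0
O 1.3 0.0 0.0
""".splitlines()

for frame in read_xmol(sample):
    print(frame.comment, len(frame.atoms))
```

Reading frame by frame with a generator keeps only one snapshot in memory at a time, which matters for the data volumes mentioned in the Introduction.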
3. Spatio-temporal Querying Model
Now, let us introduce the formal mathematical specification of the spatio-temporal model used in our system. As mentioned in the Introduction, the model was first described in work [1] and was extended to handle time-dependent data in [3-4].
In this model, a geometric object can be a set of points
$P = \{[x_1, x_2, \ldots, x_n, t]\} = \{[X, t]\}$
in n-dimensional Euclidean space $E^n$, where t is time.
Primitive solid objects are defined with implicit functions as $f(x_1, x_2, \ldots, x_n) \geq 0$ in Euclidean space $E^n$. The implicit function $f(x_1, x_2, \ldots, x_n) \geq 0$ can be defined analytically or by procedure. Such functions define closed n-dimensional objects in $E^n$ space under the following conditions:
$f(X) > 0$ - for the points inside the object,
$f(X) = 0$ - for the points on the object boundary,
$f(X) < 0$ - for the points outside the object.
In the model, a query solid can have time-dependent parameters and/or coordinates. Thus, the geometric query model consists of the following geometric objects:
· time-dependent 3-dimensional geometric object formed by n-dimensional points $P = \{[x_1, x_2, \ldots, x_n, t]\}$, where t is time;
· time-dependent 3-dimensional primitive geometric objects for the construction of a query solid using geometric operations.
Here, we give examples of ellipsoid and cone described with implicit functions as follows.
Ellipsoid:
$G_1: f_1(X, t) = 1 - \left(\frac{x_1 - x_{0,1}[t]}{a_1[t]}\right)^2 - \left(\frac{x_2 - x_{0,2}[t]}{a_2[t]}\right)^2 - \left(\frac{x_3 - x_{0,3}[t]}{a_3[t]}\right)^2 \geq 0$
where $x_{0,1}, x_{0,2}, x_{0,3} \in R$ and $a_1, a_2, a_3 \in R$.
Cone:
$G_1: f_1(X, t) = \left(\frac{x_1 - x_{0,1}[t]}{a_1[t]}\right)^2 - \left(\frac{x_2 - x_{0,2}[t]}{a_2[t]}\right)^2 - \left(\frac{x_3 - x_{0,3}[t]}{a_3[t]}\right)^2 \geq 0$
where $x_{0,1}, x_{0,2}, x_{0,3} \in R$ and $a_1, a_2, a_3 \in R$.
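The two primitives above translate directly into code. The sketch below is illustrative only: the function names `ellipsoid` and `cone` are hypothetical, and it assumes that the time-dependent parameters $x_{0,i}[t]$ and $a_i[t]$ are supplied as Python callables of t.

```python
from typing import Callable, Sequence

Vec3 = Sequence[float]
TimeFn = Callable[[float], Vec3]            # maps time t to three parameters
Implicit = Callable[[Vec3, float], float]   # f(X, t)


def ellipsoid(center: TimeFn, semi_axes: TimeFn) -> Implicit:
    """f(X, t) = 1 - sum_i ((x_i - x0_i[t]) / a_i[t])^2."""
    def f(X: Vec3, t: float) -> float:
        c, a = center(t), semi_axes(t)
        return 1.0 - sum(((x - ci) / ai) ** 2 for x, ci, ai in zip(X, c, a))
    return f


def cone(center: TimeFn, scale: TimeFn) -> Implicit:
    """f(X, t) = q_1 - q_2 - q_3 with q_i = ((x_i - x0_i[t]) / a_i[t])^2."""
    def f(X: Vec3, t: float) -> float:
        c, a = center(t), scale(t)
        q = [((x - ci) / ai) ** 2 for x, ci, ai in zip(X, c, a)]
        return q[0] - q[1] - q[2]
    return f


# An ellipsoid whose center moves along the x1 axis with unit speed.
moving = ellipsoid(center=lambda t: (t, 0.0, 0.0),
                   semi_axes=lambda t: (2.0, 1.0, 1.0))
static_cone = cone(center=lambda t: (0.0, 0.0, 0.0),
                   scale=lambda t: (1.0, 1.0, 1.0))

point = (3.0, 0.0, 0.0)
print(moving(point, 0.0) >= 0)   # outside at t = 0
print(moving(point, 3.0) >= 0)   # inside once the center has reached the point
```

The same point is therefore outside the query solid at one time frame and inside it at another, which is what makes the query spatio-temporal.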
Geometric operations are applied to primitive geometric objects to obtain complex geometric shapes at each time point. The set-theoretic operations are defined analytically in the form proposed by Ricci [8], where operations over implicit functions are considered. Affine transformations (translation, rotation, and scaling) are also used to increase the expressive power of the proposed geometric model. Geometric operations include set-theoretic union, intersection, and difference, as well as orthographic projection, and are fully described in works [1, 3-4].
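As an illustration of set-theoretic operations over implicit functions, the sketch below uses the min/max form, which is one common realization under the convention f ≥ 0 for points of the solid. It is an assumption made for illustration; the exact expressions used in [1, 3-4] are not reproduced here, and all function names are hypothetical.

```python
def sphere(cx, cy, cz, r):
    """Helper primitive: f >= 0 inside a sphere of radius r."""
    return lambda X, t: r * r - ((X[0] - cx) ** 2 + (X[1] - cy) ** 2 + (X[2] - cz) ** 2)


def union(f1, f2):
    # A point belongs to the union if it belongs to either solid.
    return lambda X, t: max(f1(X, t), f2(X, t))


def intersection(f1, f2):
    # A point belongs to the intersection if it belongs to both solids.
    return lambda X, t: min(f1(X, t), f2(X, t))


def difference(f1, f2):
    # Points of the first solid that are not interior points of the second.
    return lambda X, t: min(f1(X, t), -f2(X, t))


def translate(f, offset):
    """Translation by a time-dependent offset: evaluate f at the shifted point."""
    return lambda X, t: f([x - d for x, d in zip(X, offset(t))], t)


a = sphere(0.0, 0.0, 0.0, 1.0)
b = sphere(1.0, 0.0, 0.0, 1.0)
lens = intersection(a, b)     # overlap of the two spheres
crescent = difference(a, b)   # first sphere with the second one cut away

print(lens((0.5, 0.0, 0.0), 0.0) >= 0)
print(crescent((0.5, 0.0, 0.0), 0.0) >= 0)
```

Because each operation returns another function of (X, t), the operations compose, and a complex query solid is again a single implicit function.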
For query implementation, we apply the point/solid predicate introduced in work [4]. Let P be a point in Euclidean space $E^n$ at time t, $G_1$ be a query solid described with an implicit function $f_1$ with time-dependent parameters and a location changing over time, $bG_1$ be the boundary of $G_1$, and $iG_1$ be the interior of $G_1$. Then the point/solid predicate is described through the implicit function representation of the geometric object $G_1$ by a 3-valued predicate, which distinguishes whether P lies in $iG_1$, on $bG_1$, or outside $G_1$.
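A minimal sketch of such a point/solid predicate is given below. The labels returned for the three cases and the tolerance `eps` are illustrative assumptions; the numeric values used in [4] are not reproduced here.

```python
from enum import Enum


class Location(Enum):
    OUTSIDE = "outside"     # f1(X, t) < 0
    BOUNDARY = "boundary"   # f1(X, t) = 0, i.e. P lies on bG1
    INTERIOR = "interior"   # f1(X, t) > 0, i.e. P lies in iG1


def point_solid_predicate(f, X, t, eps=1e-9):
    """Classify point X at time t against the solid defined by f(X, t) >= 0.

    eps is a hypothetical tolerance for floating-point comparison with zero.
    """
    value = f(X, t)
    if value > eps:
        return Location.INTERIOR
    if value < -eps:
        return Location.OUTSIDE
    return Location.BOUNDARY


def unit_sphere(X, t):
    # Static example solid: f >= 0 inside the unit sphere at the origin.
    return 1.0 - (X[0] ** 2 + X[1] ** 2 + X[2] ** 2)


for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
    print(p, point_solid_predicate(unit_sphere, p, 0.0).value)
```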

After studying the problems of molecular dynamics and the geometric model for visual mining and querying of spatio-temporal data, we proposed three types of queries that can be easily implemented with MDVQS.
4. Design and Implementation of MDVQS
Based on the spatio-temporal query model, we developed the visual query system MDVQS for visual mining and querying of numerical simulation results in molecular dynamics. We proposed an implementation design for developing MDVQS.
The system should be able to do the following:

Fig. 3. Proposed design for MDVQS coupling with gOpenMol
Figure 3 shows the proposed system design. The molecular system data can be imported from external files in “xmol” format. The results of spatio-temporal querying can be visualized and analyzed in MDVQS or exported to gOpenMol for further analysis. MDVQS was implemented with VTK [5]. The graphical user interface was implemented using the Win32 API. It consists of two separate windows: one for displaying the molecular system and query objects, and one for controlling operations. The “control window” contains all the necessary operations and also shows the operation processing status. The “displaying window” displays the input data structures, query objects, and operation results.
The user loads the molecular data into the system by choosing the “file” option in the menu, as shown in Figure 4(a). MDVQS automatically extracts basic information about the molecular system, such as atomic radii and potential bonds, and performs the molecular structure modeling. The molecular system is displayed as 3D objects in the “displaying window”. Geometric query shapes (including cone, cylinder, cuboid, tube, and ellipsoid) can be chosen from the menu option “shapes”, as shown in Figure 4(b). The query shapes are always visualized as transparent. The displayed objects can be zoomed, translated, and rotated with the mouse. Querying of the displayed molecular system with the posed query shapes is done through the menu option “operation”. The query results are visualized and can be exported to an “xmol” format file (Figure 5) for further analysis by other molecular visualization and analysis software.
There are different types of molecular system representation, e.g. stick, ball-stick, CPK, and licorice [6]; each is designed to show a particular aspect of the molecular system structure. In our system, the ball-stick representation was implemented to provide a comprehensive view of the structure. In the ball-stick view mode, an atom is represented by a colored ball with a specific radius, as shown in Figure 6(a). Different atoms in the molecular system are represented by balls of different colors whose radii correspond to the atoms' covalent radii. A bond is represented by a thin cylinder that connects two atoms. All bonds have the same color and cylinder radius in MDVQS.
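How MDVQS detects potential bonds is not detailed here. One widely used heuristic, shown below purely as an assumption, treats two atoms as potentially bonded when their distance does not exceed the sum of their covalent radii plus a tolerance. The radii are approximate literature values in Å, and the tolerance of 0.4 Å and the function name `potential_bonds` are hypothetical choices.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative subset).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def potential_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms close enough to be bonded.

    atoms: list of (symbol, x, y, z) tuples from one snapshot.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADIUS[a[0]] + COVALENT_RADIUS[b[0]] + tolerance
        if math.dist(a[1:], b[1:]) <= limit:
            bonds.append((i, j))
    return bonds


water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
print(potential_bonds(water))
```

The pairwise loop is quadratic in the number of atoms; for large systems a spatial grid or cell list would normally replace it.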

Fig. 4. The GUI: (a) Menu of data loading, exporting of query result, viewing, and system reset; (b) Menu of querying objects

Fig. 5. Input and output “xmol” files for Query Type 1

Fig. 6. Query Type 1: (a) Before the query; (b) Query result

Fig. 7. Geometric querying with cuboid: (a) Add the query shape; (b) The query result; (c) View of query result at time point t1; (d) View of query result at time point t2
Now, let us introduce the basic functions of MDVQS and how it performs the queries.
Query Type 1. Find and display trajectories of atoms by atom name or by exact location (x, y, z).
The user can pick the specific atoms of interest by name or by location, and only the selected atoms are displayed as they change location over time. The number of atoms that can be chosen may be limited because of the large amount of data the system has to go through. Our tests were done with 500 files of 40-100 MB each.
First, MDVQS displays the whole molecular system from the file in the “displaying window”, as shown in Figure 6(a). Then, the program searches through the whole list of atoms in the file, starting from the first atom. For example, if an atom C matches the atom name given by the user, the program saves its time, atom name, and coordinates in the output “xmol” file, as shown in Figure 5. As the result of the query, only the matched atoms are displayed in the “displaying window”, as shown in Figure 6(b).
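The name-based search described above amounts to a linear scan over every snapshot. The following sketch illustrates it under simplifying assumptions: snapshots are held in memory as (time label, atom list) pairs, and the function names `query_by_name` and `to_xmol` are hypothetical.

```python
def query_by_name(frames, name):
    """Keep, for every snapshot, only the atoms whose symbol equals name.

    frames: list of (time_label, atoms), atoms being (symbol, x, y, z) tuples.
    """
    result = []
    for time_label, atoms in frames:
        matched = [atom for atom in atoms if atom[0] == name]
        result.append((time_label, matched))
    return result


def to_xmol(frames):
    """Serialize snapshots as xmol-style text: count, time line, atom lines."""
    lines = []
    for time_label, atoms in frames:
        lines.append(str(len(atoms)))
        lines.append(time_label)
        for symbol, x, y, z in atoms:
            lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"


frames = [
    ("t1", [("C", 0.0, 0.0, 0.0), ("O", 1.0, 0.0, 0.0)]),
    ("t2", [("C", 0.1, 0.0, 0.0), ("O", 1.1, 0.0, 0.0)]),
]
print(to_xmol(query_by_name(frames, "C")))
```

A query by exact location could follow the same pattern, with the comparison on the symbol replaced by a comparison on the coordinates.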
Query Type 2. Find and display snapshots of atoms over time in the selected region, where the region is the final query solid constructed as a result of operations over primitive solids.
The region can have various shapes, such as cuboid, ellipsoid, cylinder, tube, sphere, and cone. The result of union, intersection, and/or subtraction operations over the primitive solids can be a query solid as well. The user loads the molecular data file and then selects a query shape from the menu of the “control window”, for example, a cuboid, as shown in Figure 7(a). The query object is visualized on the screen as well. The user can translate, shift, and zoom both the query object and the molecular system separately. The system computes the implicit function parameters of the query object. After the intersection operation is selected, the system applies the spatio-temporal predicate to the coordinates of each atom of the molecular system and forms an output file of the query result. The query result is visualized as shown in Figure 7(b). The user can go through all time frames of the query result and adjust its view with translation, shift, and zooming, as shown in Figure 7(c-d).
Figures 8(a-b) and 9(a-b) show the results of visual querying with a cone, cylinder, ellipsoid, and tube, respectively.
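The region query can be sketched as applying an implicit function to every atom of every snapshot and keeping the atoms with f ≥ 0. The example below is an illustration under stated assumptions, not the MDVQS code: the cuboid is written as a minimum over three axis-aligned slabs, its parameters are given as callables of t, and the names `cuboid` and `query_region` are hypothetical.

```python
def cuboid(center, half_sizes):
    """Axis-aligned box: f >= 0 when |x_i - c_i[t]| <= h_i[t] on every axis."""
    def f(X, t):
        c, h = center(t), half_sizes(t)
        return min(hi - abs(x - ci) for x, ci, hi in zip(X, c, h))
    return f


def query_region(frames, f):
    """Keep, for every snapshot, the atoms inside or on the boundary of the solid.

    frames: list of (t, atoms) with numeric time t and (symbol, x, y, z) atoms.
    """
    result = []
    for t, atoms in frames:
        selected = [atom for atom in atoms if f(atom[1:], t) >= 0]
        result.append((t, selected))
    return result


box = cuboid(center=lambda t: (0.0, 0.0, 0.0),
             half_sizes=lambda t: (1.0, 1.0, 1.0))

frames = [
    (0.0, [("C", 0.5, 0.0, 0.0), ("O", 2.0, 0.0, 0.0)]),
    (1.0, [("C", 1.5, 0.0, 0.0), ("O", 0.9, 0.9, 0.9)]),
]
for t, atoms in query_region(frames, box):
    print(t, [atom[0] for atom in atoms])
```

Since the predicate is evaluated per snapshot, an atom may appear in the result at one time point and not at another, as in this example.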

Fig. 8. Visual querying with (a) cone and (b) cylinder

Fig. 9. Visual querying with (a) ellipsoid and (b) tube

Fig. 10. An example of Query Type 3 with the time interval set from 2 to 6
Query Type 3. Find trajectories of atoms for a specified time interval [t1, t2].
From the control window, the user selects the time query type and keys in the time interval, as shown in Figure 10. Only the molecular data between the starting time t1 and the ending time t2 are visualized as the result of the query. The query results are always saved in “xmol” format and can also be visualized and analyzed with gOpenMol.
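A time-interval query reduces to selecting the snapshots whose time lies in [t1, t2] and collecting each atom's positions over them. The sketch below assumes inclusive endpoints, numeric snapshot times, and a fixed atom order across snapshots; the names `query_time_interval` and `trajectories` are hypothetical.

```python
def query_time_interval(frames, t1, t2):
    """Keep only the snapshots with t1 <= t <= t2 (endpoints assumed inclusive)."""
    return [(t, atoms) for t, atoms in frames if t1 <= t <= t2]


def trajectories(frames):
    """Group positions by atom, assuming atoms keep their order in every snapshot."""
    paths = {}
    for t, atoms in frames:
        for index, (symbol, x, y, z) in enumerate(atoms):
            paths.setdefault((index, symbol), []).append((t, x, y, z))
    return paths


# Eight snapshots of a single atom moving along the x axis.
frames = [(t, [("C", 0.1 * t, 0.0, 0.0)]) for t in range(1, 9)]

selected = query_time_interval(frames, 2, 6)
print([t for t, _ in selected])
print(len(trajectories(selected)[(0, "C")]))
```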
5. Conclusions and Future Work
In this paper, we described the visual query system MDVQS for visual data analysis in molecular dynamics. The proposed system allows the user to visualize the results of molecular dynamics simulation, to pose spatio-temporal queries on time-dependent data visually, and to visualize the query results and the querying process. MDVQS was implemented with advanced computer graphics tools on the Microsoft Windows platform. We integrated visual mining with querying of time-dependent data in one GUI and introduced three basic types of queries for visual data analysis in molecular dynamics. In the future, we are going to implement queries with arbitrary shapes changing over time as well. The developed system also allows the user to analyze spatial relationships inside molecular data changing over time. With MDVQS, the user can come up with new biological hypotheses and test their validity.
[1] O. Sourina and S. H. Boey, Geometric Query Types for Data Retrieval in Relational Databases, Data & Knowledge Engineering, Elsevier Science B.V., Vol. 27(2), pp. 207-229, 1998.
[2] J. Bloomenthal, An Introduction to Implicit Surfaces, Morgan-Kaufmann,
[3] O. Sourina and N. Korolev, Geometric querying of time-dependent data for data mining in molecular dynamics, in Proc. of Cyberworlds 2004,
[4] O. Sourina and N. Korolev, Visual Mining and Spatio-Temporal Querying in Molecular Dynamics, Special issue on Computational Intelligence for Molecular Biology and Bioinformatics of the Journal of Computational and Theoretical Nanoscience, American Scientific Publishers, Vol. 2(4), 2005.
[5] W. Schroeder, K. Martin, and B. Lorensen, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics, 3rd Edition, 2000.
[6] D. L. Bergman, A. Laaksonen, and L. Laaksonen, Visualization of Solvation Structures in Liquid Mixtures, J. Mol. Graph. Model., Vol. 15, pp. 301-306, 1997.
[7] W. Humphrey, A. Dalke, and K. Schulten, VMD - Visual Molecular Dynamics, J. Mol. Graph., Vol. 14, pp. 33-38, 1996.
[8] A. Ricci, A Constructive Geometry for Computer Graphics, The Computer Journal, Vol. 16(2), pp. 157-160, 1973.