Computer Graphics & Geometry
Parallel Implementation of Terrain Erosion
B. Beneš and R. Forsbach
ITESM Campus Ciudad de México
e-mail:beda@campus.ccm.itesm.mx, r_forsbach@yahoo.com
Contents
Abstract
We present a parallel version of the algorithm that simulates thermal erosion [11]. The algorithm splits the input data into the strips that are assigned to different processes and runs the erosion in parallel. When this task is finished the processes exchange information about the material transported through their boundaries, update their data, and run new erosion step. We use message passing for process synchronization. First, the information about the material transported through the boundary is saved into files and then messages are sent to the corresponding processes informing that their data is ready. Using files for process synchronization assures the implementation to be platform independent.
The parallel version is stable and runs very well on large data. We have achieved speedup 8.4 on ten CPUs. With small data the method is devoting too much effort to communication and the speedup decreases. We have tested this algorithm on the 3D map of martial surface obtained from Mars Global Surveyor [14].
Keywords: Terrain erosion, message passing, parallel algorithms, Mars Global Surveyor, voxels, volume graphics
1. Introduction
One of the challenging tasks helping us to understand the nature is its simulation. Nowadays, we are receiving data from spacecrafts visiting distant parts of the solar system. Processing and analyzing the data presents a great problem. Mars Global Surveyor spacecraft (MGS) has approached Mars in September 1997 and one of its tasks is to measure altitude of the martial surface, providing a 3D topographical map of Mars. The MGS uses Mars Orbiter Laser Altimeter (MOLA) for measuring its distance from the surface and till now this spacecraft has delivered a great amount of scientifically important data (see the MOLA web page [14] for details).
An important problem is modeling of the geological behavior of planets. Visual simulation, a term coined by computer graphics community, stresses visual plausibility as a proof of correctness of a simulation. That is why many algorithms for generating visually plausible models for purposes of 3D graphics are getting closer to physical simulations. Unfortunately time and space demand of these techniques leaves the majority of them practically unusable for modeling. On the other hand they help to solve scientific problems from geology.
In this paper we are focusing on erosion of the data obtained by the MGS spacecraft. Their amount, and the time and space demand of the algorithms, has led us to parallel implementation of the erosion algorithm on multiprocessor platform. The approach we have implemented works with the algorithm of thermal erosion [11] later extended by [1]. The purpose was not to implement perfect and complex algorithm, but to run an existing one in parallel. Our approach can be thought as a framework for implementation of these classes of algorithms.
This paper is structured as follows. After the section introducing the previous work, we describe the thermal erosion algorithm and make its analysis. The Section 4 describes the parallelization of the algorithms and in the next section we describe implementation of this approach. The most important Section 7 describes results of our simulation and mostly focuses on speedup and limits of the parallel version of our algorithm. The last section of the paper presents opened questions and the future work.
2. Previous Work
This section describes some algorithms simulating soil and fluvial erosion published in the context of computer graphics.
Musgrave in 1989 [11] has introduced probably one of the first approaches to erosion of artificial terrains. The purpose of this erosion model is to improve visual plausibility of fractal surfaces. The algorithm that serves as a basis of our simulation will be described in depth in Section 4.
Kelley et al. [7] introduced ad hoc algorithm working in the opposite way. Instead of eroding perfect fractal terrain, they generate landscape that corresponds to underlying structure of water streams. Fractal interpolation is extensively used here for generation hills supporting the water streams.
The time demand of these techniques has been focused in [2]. The authors are using semi-adaptive algorithm that leaves the non-important parts and is eroding only the areas with a high gradient or importance.
Another physically based model has been introduced by Chiba et al. [3]. Here the running water on the surface defines a velocity field array that is used for terrain erosion. The energy of the running water is evaluated and this is applied to change the shape of the terrain. Some portions of the material are transferred depending on the material resistance and the energy of the water.
The latest works of Musgrave [5,11,12] do not primarily simulate erosion-based models but generate landscapes by blending some elaborated noise functions (Perlin, fBm, etc.) These approaches give visually excellent results, but there are not known any facts that relate these techniques to reality. The author claims that the erosion is a much stronger tool but the number of parameters and the time that is necessary to run the simulation is making these techniques practically unusable.
The second argument leads straight to the idea of parallel implementation of the algorithm. An analysis of the erosion algorithms shows that the process has some good properties that help us to implement the algorithm in parallel. This is the main focus of this paper.
Many soil erosion algorithms have been published in the journals focusing remote sensing (see for example [6,15]. The algorithms are mostly based on moving some material in a discrete matrix from cell to cell according to more or less elaborated physically based techniques. We have seen the same schema in all cases and it could be implemented easily using approach presented in this paper.
3. Erosion Algorithms
Most of the algorithms for soil erosion simulation are based on the following data representation. The terrain is supposed to be represented as a two-dimensional elevation grid consisting of so called cells as displayed in Fig.1.
|

|
|
Fig. 1 Example of one cell (in the middle) and its eight neighbors. |
Each cell of the grid is an average representative
of the underlying area, i.e., one cell can represent
area of n x n meters. All values kept
in the cell are statistical representatives (typically
the average, or one sampled value) of the underlying
area. The cell has a set of properties, namely amount
of water and soils that it contains, altitude, average
gradient plane that corresponds to the slope of the
surface, etc.
The cell communicates with its eight neighbors by
exchanging information about the transferred material.
Typically, depending on the amount of water and the
gradient of the cell, some amount material is entering
the cell and some is leaving. The material is either
dissolved in water, or it can be transported in a form
of dust or mud etc. depending on the simulation.
The flow of one cell can be the best characterized
by the differential equation:
Where Qin is the incoming flow, Qout is the cell’s outlet, S is the volume of transferable material stored in the cell, and t is the time.
The algorithm of [11] also manipulates the material in the volume directly (see next Section) whereas the algorithm of Chiba et al. [3] uses water as a medium that transfers the material. Algorithms published in [6,15] transfer both, water and the material. More sophisticated techniques are used for modeling the transport of eroded material. They include human factors, like hedgerows, roads, etc.
Benes [1] has extended the data representation by a layered data structure. Instead of keeping one value for the top of the surface or representing the volume as a set of voxels, the terrain is supposed to be consisting of layers of equal material as shows Fig.1. This representation allows simulation of subterraneous water, slumping of the surface, simulation of caves etc. Advantage over the voxel representation is smaller demand on the memory and better correspondence to reality, where this kind of structure is obtained by geological measurements. We are using this data representation in our simulations.
4. Thermal Erosion Algorithm Analysis
The algorithm works on a 2D array of cells. Each element of the array keeps complete information about the underlying layers. Example of one element and its neighbors is given in Fig 1. The two leftmost elements in the front row keep three underlying layers, whereas the rightmost element in the front has just two layers. Each layer has different parameters, i.e., it can have certain amount of water, soils, gas, etc.
|

|
|
Fig. 2 Principle of the thermal erosion. One half of the highest gradient of the mid element is proportionally distributed to its neighbors. |
The algorithm of thermal erosion [11] distributes
portions of material depending on the height of the
element. This is schematically shown in Fig.2.
It is supposed that the material is broken-up due to thermal shocks, i.e., due to changes of the temperature during the day and night. Part of the material from one cell simply falls in the direction of the highest gradient i.e., it is distributed to the neighbors. To simulate inner friction of the material, so called talus angle is introduced. If the gradient is smaller than the certain predefined value, the material is not transported.
The total volume outlet S of a cell is given by the equation:
where h is the highest difference to the neighbors, 0<r<1 is the resistance coefficient and a is the surface area of the cell.
The algorithm processes each cell of the terrain and measures highest gradient to its neighbors. Half of this value gives the portion of the material that can be moved. This is the maximum that can be moved and this corresponds to completely weak material with very low inner friction. If the material is harder, this portion is multiplied by coefficient r that corresponds to the resistance. The volume to be removed is ah/2. The volume that is moved to each neighbor is determined by the gradient of the cell i.e., the neighbor with highest gradient will receive the greatest portion of the material.
Physical meaning of the previously mentioned talus angle is following. If the gradient given by the explored cell and its neighbor is smaller than the corresponding talus angle, the material is not transported to this position. If the potential energy of the material cannot overcome the inner friction, the material stays in its position.
Since the represented terrain consists of different layers, we can easily simulate erosion of different materials. Fig. 3 from [1] shows an example of the terrain erosion. The terrain consists of two different materials. One, on the first frame on the top, is very weak and is eroded easily. The underlying material cannot be eroded and remains fixed.
|

|
|
Fig. 3 An example of the layered erosion [1] |
The time demand of the algorithms is very high. We have to process each cell, evaluate gradients and move some volume of the material. The terrains are usually huge and these two facts cause the practical usability of these techniques to be very low.
On the other hand the algorithm itself has some good properties for example the computational time needed for evaluation of one cell is almost constant. The two extreme cases that can happen are completely flat area, where the algorithm performs no material transport and very noisy terrain, where the material is transported from each point to each side. In very unruly terrains, like the ones obtained from real data, we can assume that both limiting cases of the cells will be represented with more or less equal probability, so in average the time needed for evaluation of one cell is constant. The program for erosion of one cell is therefor consisting of two significant parts: gradient evaluation and the material transport. The first part must be performed for every cell and the second just in some cases.
5. Parallel Implementation
5.1. Data Subdivision
One important characteristic of the erosion algorithm is that it can be also described as a process communication. Cells can be thought of independently and can be treated as communicating computational units. Fig.4 schematically shows this view of the data. The cell communicates with its neighbors using duplex channel that transfers the material for each layer. This viewpoint is important for a parallel implementation.
|

|

|
|
Fig. 4 Material transfer among one cell and its neighbors |
Fig. 5 Data subdivision. |
Moving on a higher level of hierarchy, we can divide the underlying surface to some smaller areas that communicate with their neighborhood. We can group some cells, evaluate their input and output and let them communicate. This, together with the above-mentioned facts about the almost constant computation time needed for one cell, has led us to the following parallel implementation.
We divide entire extent, i.e., the 2D array of layers, into equally long and wide strips. Each strip is hosted by one process (in the best case also by one CPU) and the erosion algorithm is run in parallel. When the erosion of each strip is evaluated, the boundary is explicitly recomputed. The processes exchange some data, recompute the boundaries, and then each process continues with the next erosion step.
5.2. Algorithm
This parallel implementation can be schematically described as follows:
1. Split the terrain into n strips.
2. Evaluate one erosion step for each strip in parallel.
3. Exchange information about the material passed through the boundary areas in parallel.
4. Update the data in parallel.
5. Go to step 2
There are some reasons for doing this. First, the advantage of this algorithm is that each computational unit keeps the portion of the data and we exchange just the boundary information. We do not need shared memory and each computational unit can use independent memory. This allows us to implement our algorithm in a heterogeneous environment. Dividing the data into the strips is done just once at the beginning. We have also tried version, where the data were repeatedly divided and melted after each erosion step, but the time demand of this brute force technique was very high.
Second reason for division the data into the strips is the communication demand. In this case each process communicates at most with the two neighbors. Because there is certain time needed for synchronization, i.e., when one process finishes before another one it cannot continue without updating the boundary and has to wait for data, we want to diminish the communication as much as possible. Fig 5 shows the data subdivision, each strip is hosted by one process and the arrows show their communication. The first and the last process communicate only with one neighbor, whereas the processes keeping inner strips have to exchange some data to both sides. In the case of the thermal erosion algorithm implementation the only information that is exchanged is the amount of material transported by the neighbor cells from each strip.
5.3. Process Synchronization
Each process first runs the erosion independently to the neighbors. When all the cells of the terrain processed by one computational unit are eroded, the process waits for the data from the neighbors and sends its data to them. We are using message passing for this purpose. This is implemented as a file generation that assures the system to be running on any platform (even Win98).
After finishing the computation of the strip the process saves the boundary data on disk (in our implementation the largest file had 1M) and sends message to the neighbor. Schematically, for the process keeping the inner strip, this can be written as follows (this is detailed description of the steps 2-4 from the previous section.):
1. Erode the strip.
2. Save the left boundary.
3. Send message that the data is ready to the left neighbor.
4. Save the right boundary.
5. Send message that the data is ready to the right neighbor.
6. Wait for message from the left neighbor.
7. Read the data generated by the left neighbor.
8. Erode the left boundary
9. Wait for message from the right neighbor.
10. Read the data generated by the right neighbor.
11. Erode the right boundary
12. Go to the step 1.
The leftmost and the rightmost processes do not perform step 2, 3, 6, 7, 8 and 4, 5, 9, 10, 11 respectively, because the left, resp. the right neighbor does not exist. It is important to save the data first and then to wait for the message otherwise the processes will deadlock.
There is no need for a message delivery confirmation. The process just saves the data and sends the message that the data is ready.
The processes are identified by the number of the strip that host (i.e., 1,2,…,n), where n is the number of the running process. After erosion of the entire data every inner process saves two files, called i.L and i.R, where i is the number of the process and L resp. R signifies the left and the right boundary. This corresponds to the steps two and four from the algorithm. Then the messages informing the neighbors are sent (steps three and five).
We have implemented the message passing in the form of files as well. We could use IBM POE library, but we want to have system that is platform independent and simple. Each file (that has zero size) is called ij.msg, where number i describes the sender and j the receiver of the message. So for example the seventh strip will be expecting data from the left neighbor in the file 6.R and it will be waiting for existence of the file 67.msg. At the same time, if this is the process hosting the inner strip, it will generate two files 7.L and 7.R. Then it sends two messages, 76.msg for the process six informing that the data from the left boundary is ready, and 78.msg for the process eight. It is important to note that the files with data cannot be used as messages, because the waiting process could try to read to the file into which the other process is just writing. So our message passing can be thought as a form of locking the files.
The following example with three processes should explain better the synchronization. The image shows the situation when all processes have finished their work and have saved the material from their boundaries. The leftmost process (denoted by one) has generated file 1.R, the inner process (two) has saved files 2.L and 2.R and the process three has saved file 3L.

Then, as displayed in the next figure, all processes generate messages informing the neighbors about the existence of the files with the material from the boundaries.

Here the process one has sent message 12.msg to the process two. This process has sent message 21.msg to the first process and message 23.msg to the process three. Finally the last process has sent message 32.msg to the second process.

Sending the messages about the availability of the data causes reading the files with data by the corresponding processes as displayed in the last figure.
This kind of synchronization has an important advantage. Since all processes will finish at finite time no deadlock can occur.
The only real problem that can occur is in the case when one of the processes is killed. In this case the message will be missing and the other process will be waiting forever. We do not consider any dead reckoning in our implementation.
The synchronization by files is simple to implement and has also one important practical advantage. Since the waiting for the message is active, the operating system will probably not dealocate the process from the CPU that can occur in the case of the passive waiting. This means that the process will remain with high probability at one CPU and this will not delay the synchronization of the other processes and the entire system.
6. Implementation
The program was implemented in C under IBM AIX version 4.3. We have run it on IBM S80 R6000 with 12 CPUs RS64-III equipped by 18GB of the operating memory.
The entire system is consisting of three groups of algorithms. First is the basic set of routines manipulating the data. We have programs for manipulating the strips - one for splitting the original data and another one for gluing the strips back for further processing. We have also implemented set of routines for converting the data from the NASA EGDR data format [14] into our inner representation. We have programs converting this into the data suitable for displaying (we use OpenGL for previews and POVRay for raytraced images).
The second group of programs includes the UNIX scripts running all the processes.
The third, and the most important program, is the erosion algorithm itself. As mentioned above, the program reads the strip just once, at the startup, and keeps it in the memory.
7. Results
7.1. Speedup
To test the system, we have run 100 erosion steps of the surface of Mars on different levels of detail. The graph and the table in the Fig. 6 show the results. All values are expressed relatively to the speed of the sequential version and the last row shows the real runtimes of the sequential version in minutes. The measured tests were done on 24 million cells (denoted by MC). Corresponding size of the file is 98MB and the sequential version of the erosion runs 2.7 hours. We have also tested the system on 20MC, 15MC, 10MC, 5MC, 2MC and 1MC. All measurements were done three times, values differing more than 20% were excluded, and the average value is displayed. The total runtime of the tests was 8 days. During these tests the computer was totally devoted to the project and no other computations have been done there. This helped clarify the results, because the amount of swapping, cache misses, etc. caused by other running processes was not influencing the computations. It is important to note that the system was perfectly stable and the program did not crash at all.
 |
 |
|
Fig. 6 The speedup as a function of the number
of CPUs and size of the data. The last row of the table shows runtimes of the sequential version and the other values are expressed relatively to t |
The results are not very surprising. The speedup depends on the communication and it shows that the idea of decreasing the amount of communication with dividing the data into strips is correct. The small amount of data means a little computation and high communication overhead needed for the synchronization of the processes. This is clearly visible from the graph. For small data, like 1MC, it makes no sense to run the program on more than four or five CPUs. The speedup goes down very fast because of the communication overhead. The speedup is almost linear for huge data. Of course this is true as far as we can measure. Certainly with higher number of CPUs the speedup will go down as well, because it will drown in communication. For 24MC the speedup was 8.4 on 10 CPUs.
Number of the erosion steps
We were also interested on how the speedup depends on the number of iterations. We have measured the speedup after 20, 50, 100 and 150 iterations for two till ten CPUs. We ran the test on all data we have, i.e., 1MC, 2MC, 5MC, 10MC, 15MC, 20MC and 24MC. We have performed the test three times and computed average of the obtained values. The graph and table in Fig. 7 show the result.
|

|

|
|
Fig. 7 The speedup as a function of the number
of the erosion steps and used CPUs.The last row of the table shows runtimes of the sequential version. The speedup of the erosion increase as the surface flattens. |
As mentioned above, the communication is higher for higher number of processors as well as for wild terrains. The erosion smoothes the terrain, that leads to the equalization of the runtime of each processing unit. This is clearly visible from the graph. For low number of CPUs (2-4) the speedup is almost constant, regardless to the number of iterations. With higher number of CPUs, the speedup goes up with the number of iterations.
Figures 8 and 9 show ray-traced images of some interesting areas of Mars. Animations can be found at http://paginas.ccm.itesm.mx/~beda
|

|

|
|
Fig. 8 Part of the Valles Marineris (left) after
150 erosion steps (right). The images capture area 1000x1000km.
One cell corresponds to 3.7x3.7km |
|

|

|
|

|

|
|
Fig. 9 Side (up) and top view (down) of part of the area Cimmeria Terra (left) after 50 erosion steps (right). Images display area 1800x1800km. One cell corresponds to 3.7x3.7km |
8. Future Work
One of the ways to improve the speedup of the algorithm is by elaborating the assumption that the computational time for each cell is constant. This is certainly not true and this could be taken into account. We could make some statistics of the data distribution and run adaptive splitting of the data. More wild terrains would be split into small parts, because here the transport of the material is more intensive.
Another option is to incorporate some kind of dynamical load balancing. Apparently, the structure of the terrain will change over time, but there is no problem to keep track about the transported material. This information can serve as an input of some critical function. If the value of this critical function exceeds predefined threshold for some strip, it means that the load of the host CPU is too high and portion of the data should be moved to the neighbor. This dynamical load balancing can certainly be implemented. The question is how high would be the overhead for measuring the amount of transferred material, evaluating the critical function, and redistributing the data again.
Possible future works include implementation of algorithms from [3,6,15].
9. Conclusions
In this paper we have shown that the recently published erosion algorithms simulating soil erosion can be implemented in parallel. We have implemented thermal erosion algorithm as a representative of these techniques. We divide the terrain data into blocks that are run in parallel. After each erosion step the host process synchronize the data by exchanging the material transported through their boundaries and sending messages informing neighbors that the data is ready. We have shown that this implementation has good properties in the case of huge data and slows down with small ones.
The speedup is higher for bigger data, because for the small one the communication delays the entire computation. In the case of huge data the units run more or less independently and the communication occurs almost in the same time.
Acknowledgements
We would like to thank Jake de Boni and Pavel Tvrdik for proofreading of the paper.
This work is supported by CONACyT and ITESM.
References
B.Benes, and R.Forsbach, Layered data Representation for Visual Simulation of Terrain Erosion, Accepted to IEEE SCCG’2001
B.Benes, and I.Marak, and P.Simek, and P.Slavik, Hierarchical Erosion of Synthetical Terrains, In Proceedings of SCCG’97, pp.93-100
N.Chiba, and K.Muraoka, and K.Fujita, An Erosion Model Based on Velocity Fields for the Visual Simulation of Mountain Scenery, In The Journal of Visualization and Computer Animation 9, 1998, pp.185-194
A.R.Dixon, and G.H.Kirby, Data Structures for Artificial Terrain Generation, In Computer Graphics Forum 13, 1994, pp.37-48
D.S.Eckbert, and F.K.Musgrave, and P.Prusinkiewicz, and J.Stam, and J.Tessendors Simulating Nature: From Theory to Applications. In SIGGRAPH 2000 Course Notes, 2000
C.Gailard, and F.Zagoski, and F.Bonn, Modelling of Human Dimension on Soil Erosion Processes for Remote Sensing Applications, In Proceedings of IEEE Geoscience and Remote Sensing Symposium 1999, IGARSS’99, Vol.4, pp. 2137-2139
A.D.Kelley, and M.C.Malin, and G.M.Nielson; Terrain Simulation Using a Model of Stream Erosion, In Proceedings of SIGGRAPH 1989, Vol 23(3), pp.263-268
B.B.Mandelbrot, The Fractal Geometry of Nature, W.H.Freeman, 1982
I.Marak, and B.Benes, and P.Slavik, Terrain Erosion Based on Rewriting of Matrices, In Proceedings of The Fifth International Conference in Central Europe on Computer Graphics and Visualization – WSCG, 1997, pp.341-351
D.Merritts, and A.de Wet, and K.Menking Environmental Geology. To W.H. Freeman and Company, 1998
F.K.Musgrave, The Synthesis and Rendering of Eroded Fractals Terrains, In Proceedings of SIGGRAPH 1989, Vol 23(3), pp.11.1-11.9
F.K.Musgrave, Towards the Synthetic Universe, In IEEE Computer Graphics and Applications, November-December 1999, pp.10-13
A. Watt and M.Watt, Advanced Animation and Rendering Techniques: Theory and Practice, Addison-Wesley Readings, 1992
http://ltpwww.gsfc.nasa.gov/tharsis/mola.html
F. Zagolski, and C. Gaillard, A Modelling Study of the Human Impact on Soil Erosion Processes with an Agricultural Watershed, In Proceedings of IEEE Geoscience and Remote Sensing Symposium 1999, IGARSS’99, Vol.4 pp. 2152-2154