Gis Spatial Data Structures

95.5204: Spatial Data Structures for GIS
Jörg-Rüdiger Sack
School of Computer Science, Carleton University

Ottawa, Canada K1S 5B6, sack@scs.carleton.ca
© Jörg-Rüdiger Sack Course Notes

School of Computer Science Computational Aspects of GIS
Carleton University
Geometric Objects
A geometric object is characterized by its geometric component, i.e., the
• location and
• shape
of the object in space.
In addition, there is the attribute component, which we will ignore for the discussion in this chapter.

Example
Planar subdivisions, for example, are collections of polygons
which represent towns or municipal regions.
The geometric information about the location of a place is
stored through its polygon.
(Non-geometric information such as name, size, … is also stored.)

Operations
There are many operations that need to be carried out on
geometric objects; these include:
• point in polygon (point location)
• traversal of a subregion (window queries)
• intersection tests
• …
• other operations include:
– distance, containment, intersection

Operations cont’d
1. Objects are stored on disc; examining, i.e., retrieving, all objects
is extremely inefficient!
2. Checking each object is time-consuming (even after retrieval), as
the geometry may be complex.
Idea: support spatial queries on geometric objects by realizing a
filter, i.e., first provide a superset of the solution set and
subsequently refine that set to the correct solution.

Filter
Sometimes this approach is referred to as
• coarse filter
• fine filter
where the coarse filter refers to the retrieval of a subset of
adjacent objects, followed by the fine filter, which analyzes the
geometric properties of the objects.

The Idea of a Filter
Create a bounding box for 2-d geometric objects.
Bounding box := smallest axis-parallel rectangle containing the geometric object.
The database search key for the geometric object is now that of the
bounding box.
There are many data structures for multi-dimensional data.
For d-dimensional objects, let Ui be the universe in the i-th dimension. Then
U = U1 × U2 × U3 × … × Ud is the d-dimensional universe containing all
geometric objects.

Filter cont’d
Let G be a particular set of geometric objects.
Each g ∈ G is described as g = (b, rest):
– g.b: d-dim bounding box
– g.rest: other attributes that are not relevant for the search
b = (l1, r1, l2, r2, …, ld, rd) is the d-dim interval
[l1, r1] × … × [ld, rd], where b.li is the left and b.ri the right
boundary of the i-th interval.
We write g.li for g.b.li and g.ri for g.b.ri.
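
The notation g = (b, rest) can be sketched in Python as follows. This is only an illustration under assumptions: the names `Box`, `GeoObject` and `bounding_box` are hypothetical and not part of the course notes, and an object's geometry is taken to be a list of d-dimensional points.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """d-dimensional interval [l1, r1] x ... x [ld, rd]."""
    l: Tuple[float, ...]  # left boundaries l1..ld
    r: Tuple[float, ...]  # right boundaries r1..rd


@dataclass(frozen=True)
class GeoObject:
    """g = (b, rest): bounding box plus attributes not relevant for the search."""
    b: Box
    rest: Any = None


def bounding_box(points: Iterable[Sequence[float]]) -> Box:
    """Smallest axis-parallel box containing all the given d-dim points."""
    pts = [tuple(p) for p in points]
    if not pts:
        raise ValueError("need at least one point")
    # zip(*pts) groups the i-th coordinates of all points together.
    return Box(l=tuple(min(c) for c in zip(*pts)),
               r=tuple(max(c) for c in zip(*pts)))


triangle = [(1.0, 2.0), (4.0, 0.5), (3.0, 5.0)]
g = GeoObject(b=bounding_box(triangle), rest={"name": "example town"})
print(g.b)  # Box(l=(1.0, 0.5), r=(4.0, 5.0))
```

Only `g.b` would serve as the search key; `g.rest` travels along with the object but plays no role in the queries.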

Example
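[Figure: a 2-d bounding box; the interval [l1, r1] is marked on the dim 1 axis and the interval [l2, r2] on the dim 2 axis.]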

The Task
Task: find a secondary storage structure S supporting the
following operations:
(1) Range query
(2) Search
(3) Insert
(4) Remove (delete)
These are defined more formally next.

Rangequery
Rangequery (w, S(G))
for a range w, where G is stored in S:
report all objects g in G with g.b ∩ w ≠ Ø
Assumption: two rectangles that only intersect at a boundary
do not intersect, i.e.,
intersection (A, B) := closure (interior of A ∩ interior of B)
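
The intersection assumption translates into strict inequalities on the interval boundaries. The following is a minimal sketch, not a secondary storage structure: the names `intersects` and `range_query` are hypothetical, boxes are assumed to be non-degenerate, and the query is answered by a plain linear scan.

```python
from typing import List, Sequence, Tuple

# A box is a pair (l, r) of coordinate tuples: l[i] and r[i] bound dimension i.
Box = Tuple[Sequence[float], Sequence[float]]


def intersects(a: Box, b: Box) -> bool:
    """True if the interiors of a and b overlap in every dimension.

    Strict inequalities implement the assumption that boxes which only
    touch at a boundary do not intersect.
    """
    (al, ar), (bl, br) = a, b
    return all(al[i] < br[i] and bl[i] < ar[i] for i in range(len(al)))


def range_query(w: Box, boxes: List[Box]) -> List[int]:
    """Naive Rangequery: indices of all boxes intersecting window w."""
    return [i for i, b in enumerate(boxes) if intersects(b, w)]


boxes = [((0, 0), (2, 2)), ((2, 0), (4, 2)), ((5, 5), (6, 6))]
w = ((1, 1), (2, 3))
print(range_query(w, boxes))  # [0]: the second box only touches w along x = 2
```

The linear scan examines every object, which is exactly the cost a spatial structure S is meant to avoid.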

Rangequery cont’d
[Figure: numbered rectangles and a query range w; the range query reports 1, 6, 3, 5.]

Search
Search (b, S(G))
for a bounding box b, where G is stored in S:
report all objects g in G with g.b = b

search - example
[Figure: the object g (blue) has a bounding box matching the query box; a second object g’ is also shown.]

Insert
Insert (g, S(G))
S(G) := S(G ∪ {g}), i.e., add g to G and store the result in S

Remove (Delete)
Remove (Delete) (b, S(G))
remove object g if g.b = b, and set
S(G) := S(G \ {g}), i.e., remove g from G and store the result in S

Comments
1. Uniqueness of the bounding boxes is the underlying assumption, but
dropping it does not pose any serious implementation difficulties.
2. For insert, search and delete
• the key is spatial, but
• the spatial location is not referenced
-> these can be handled by traditional secondary storage structures such
as B-trees, dynamic hashing, …
e.g., map the 2-d key components into one 1-dimensional key
(lexicographic)
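
The lexicographic mapping can be sketched as follows. This is an illustration under assumptions: the class name `LexIndex` is hypothetical, and a sorted in-memory list maintained with the standard `bisect` module stands in for a B-tree. Python tuples already compare lexicographically, so the 2-d box (l1, r1, l2, r2) serves directly as a 1-dimensional key.

```python
import bisect
from typing import List, Tuple

Key = Tuple[float, float, float, float]  # (l1, r1, l2, r2), compared lexicographically


class LexIndex:
    """Sorted list of keys standing in for a B-tree on a 1-dimensional key."""

    def __init__(self) -> None:
        self.keys: List[Key] = []

    def insert(self, b: Key) -> None:
        bisect.insort(self.keys, b)

    def search(self, b: Key) -> bool:
        i = bisect.bisect_left(self.keys, b)
        return i < len(self.keys) and self.keys[i] == b

    def remove(self, b: Key) -> bool:
        i = bisect.bisect_left(self.keys, b)
        if i < len(self.keys) and self.keys[i] == b:
            del self.keys[i]
            return True
        return False


idx = LexIndex()
for box in [(0, 2, 0, 2), (5, 6, 5, 6), (0, 2, 3, 4)]:
    idx.insert(box)
print(idx.search((0, 2, 3, 4)))  # True
print(idx.keys)  # [(0, 2, 0, 2), (0, 2, 3, 4), (5, 6, 5, 6)]
```

Exact-match operations only compare keys, so the key order is sufficient. The order says little about spatial overlap, however: a box such as (1, 3, 0, 2) overlaps (0, 2, 0, 2) but would be sorted after (0, 2, 3, 4), which it does not touch.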

Comments
Thus searches can be handled!
Problem: queries of type Rangequery.
They are space-relevant, and the above storage schemes show
serious deficiencies for them.

Objective
Find data structures for geometric objects such as points,
polygons, etc. that allow efficient retrieval.
Primary concern:
When accessing data, long chains of pointers that
cross disk block boundaries must be avoided!
Game: design data structures with
– a small internal-memory access structure
– efficient dynamic updates

Basic Concepts
Basic Concepts for spatial structures
Access time: DRAM (dynamic random access memory) chips for
personal computers have access times of 50 to 150 nanoseconds
(billionths of a second).
Fast hard disk drives for personal computers boast access times of about 9
to 15 milliseconds.
Note that this is roughly 10^5 times slower than average DRAM (e.g., 10 ms / 100 ns = 10^5).

Basic Concepts
Typical orders of magnitude are:

Memory access time (seconds): 10^-7 … 10^-6
Disc access time (seconds): 10^-2 … 10^-1
Ratio disc/memory access time: 10^4 … 10^5

Basic Concepts
Typical size of transfer unit (bits):

Memory: 10 … 10^2
Disc: 10^4 … 10^5
Ratio disc/memory transfer size: 10^2 … 10^3

Basic Concepts
The time for an operation is thus determined by the time to
retrieve the data + the time required to carry out the local
computation.
For many operations, the number of disc accesses is the dominating
factor. However, there are geometric problems where the
internal computations are also costly.
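
A back-of-the-envelope version of this cost model, as a sketch: the function name `operation_time` is hypothetical, and the default unit costs are assumptions taken from the lower ends of typical ranges (10^-2 s per disc access, 10^-7 s per memory access).

```python
def operation_time(disc_accesses: int, internal_steps: int,
                   t_disc: float = 1e-2, t_mem: float = 1e-7) -> float:
    """Estimated time in seconds: retrieval time plus local computation time."""
    return disc_accesses * t_disc + internal_steps * t_mem


# 20 block reads and 10,000 in-memory steps: the disc part dominates.
print(operation_time(20, 10_000))      # about 0.201 s, of which 0.2 s is disc time
# 2 block reads but 10 million steps of geometry: computation dominates.
print(operation_time(2, 10_000_000))   # about 1.02 s, of which 1.0 s is computation
```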

Proximity
Data on discs are seen to be organized in BLOCKS.
A block is a unit of data that is retrieved in one shot from a
disc.
A block contains many data items; these should all be useful for the
algorithm and its execution.
1. Local maintenance of proximity, i.e., objects stored in the same
block are physically close in space.
2. Global maintenance of proximity, i.e., objects stored in
adjacent blocks are physically close.

Proximity
The last point especially is very difficult to achieve.
There is no perfect data organization!
Even small improvements in the data organization yield
noticeable speed-ups.

Central issue
Organizing the embedding space versus organizing its
content.
We will discuss data organizations that depend on the
data and, mostly, those that depend on the space.
This is the key distinction between spatial and non-spatial data
structures.

Non-spatial data structures
Data structures for non-spatial data: any search structure that
you may have encountered, for example a binary search tree.
• searches are comparative
• structures exist and are readily available, also balanced ones
– AVL trees, 2-3 trees, red-black trees
• excellent search structures, also for statistical queries including
median and percentiles

Non-spatial data structures
Such data structures are not designed for, nor can they
efficiently handle:
• general location queries
– nearest neighbour
– identify clusters in data

Review of address computation schemes
1. Hashing
2. radix trees
3. tries
These assign the address of a storage cell to any key value x
(see course notes).

k-d trees
k-d trees were invented by Bentley ’75
as generalizations of search trees, i.e., they are comparative.
Other relevant structures:
Lueker ’78, Lee & Wong ’77, Willard ’78, Bentley ’79,
Bentley and Maurer ’80

k-d trees
An example:
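[Figure: a k-d tree whose root compares x with 50 (dim 1) and whose two children compare y with 15 and y with 4 (dim 2); deeper levels use dim 3, …, dim d and then cycle back to dim 1, dim 2.]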
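
A small k-d tree can be sketched as follows. This is an illustrative in-memory version under assumptions, not Bentley's original formulation: the names `Node`, `build` and `range_search` are hypothetical, the tree is built statically by median splits (which sidesteps dynamic balancing), and the sample points are chosen so that the root splits at x = 50 and its children at y = 15 and y = 4.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced k-d tree by median splits; axes cycle with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def range_search(node: Optional[Node], lo: Point, hi: Point,
                 out: List[Point]) -> None:
    """Collect all points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    if node is None:
        return
    if all(l <= c <= h for l, c, h in zip(lo, node.point, hi)):
        out.append(node.point)
    a = node.axis
    if lo[a] <= node.point[a]:   # query range reaches into the left subtree
        range_search(node.left, lo, hi, out)
    if node.point[a] <= hi[a]:   # query range reaches into the right subtree
        range_search(node.right, lo, hi, out)


pts = [(50, 10), (20, 15), (70, 4), (10, 30), (60, 40), (80, 2), (30, 5)]
root = build(pts)
found: List[Point] = []
range_search(root, (15, 0), (65, 20), found)
print(sorted(found))  # [(20, 15), (30, 5), (50, 10)]
```

Every node follows a pointer to its children, so a direct disc-based version of this layout would chase pointers across block boundaries.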

k-d trees
Problems:
• it is hard to balance these structures, i.e., to get logarithmic
height
• the 1-d case is easy
• the space partitioning created lacks regularity
• neighbour queries are difficult

First approaches
First approaches to spatial data structures
• were based on the existing search structures
• organized the data stored,
• not the space in which the data was embedded

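Filter illustration for a rectangular space partitioning
[Figure: a query q over a rectangular space partitioning. Task: report all objects that intersect q. The cells hit by the query are retrieved; objects are labelled hit, drop, or ignored (not retrieved). The oval is examined and then dropped.]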

Comment
Spatial data structures cover the space with cells.
Each cell is stored on disc and therefore is associated with a
disc block or blocks.

Three-phase model
Three steps:
1. Cell addressing
for a given query, find all “cells” of the partitioning
that could contain elements relevant to the query
2. Coarse filter
retrieve the elements found in Step 1 from disc
3. Fine filter
examine the elements (from Step 2) to see whether they fit the query
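
The three steps can be sketched over a uniform grid. This is a simplified in-memory illustration under assumptions: the names `GridStore`, `cells_of` and `overlaps` and the cell size `CELL` are hypothetical, each grid cell stands for one disc block, and the stored objects are rectangles, so the fine filter is a rectangle overlap test rather than a test on complex geometry.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (l1, r1, l2, r2)
CELL = 10.0  # hypothetical side length of a square grid cell


def cells_of(b: Rect) -> List[Tuple[int, int]]:
    """Step 1, cell addressing: all grid cells that rectangle b touches."""
    l1, r1, l2, r2 = b
    return [(i, j)
            for i in range(int(l1 // CELL), int(r1 // CELL) + 1)
            for j in range(int(l2 // CELL), int(r2 // CELL) + 1)]


def overlaps(a: Rect, b: Rect) -> bool:
    """Exact test; rectangles touching only at a boundary do not intersect."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]


class GridStore:
    def __init__(self) -> None:
        # Each cell stands for one disc block (or several).
        self.blocks: Dict[Tuple[int, int], Set[int]] = defaultdict(set)
        self.rects: Dict[int, Rect] = {}

    def insert(self, oid: int, b: Rect) -> None:
        self.rects[oid] = b
        for c in cells_of(b):
            self.blocks[c].add(oid)

    def range_query(self, q: Rect) -> List[int]:
        hit_cells = cells_of(q)                       # 1. cell addressing
        candidates: Set[int] = set()
        for c in hit_cells:                           # 2. coarse filter
            candidates |= self.blocks.get(c, set())
        return sorted(o for o in candidates           # 3. fine filter
                      if overlaps(self.rects[o], q))


s = GridStore()
s.insert(1, (1, 4, 1, 4))
s.insert(2, (6, 9, 6, 9))      # same cell as the query, but disjoint from it
s.insert(3, (25, 28, 1, 4))    # in a cell the query never touches
print(s.range_query((2, 5, 2, 5)))  # [1]
```

Object 2 is retrieved by the coarse filter and dropped by the fine filter; object 3 is never retrieved at all.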

Tree-based schemes
Work has been done on the internal-memory data structures
segment trees and range trees,
and on how they can be extended to external storage.
This is not covered here. It could be a good topic for a class
presentation.

Three philosophies
1. Space driven
   1. multi-dimensional linear hashing
   2. space-filling curves
   3. ...
2. Data driven
   1. k-d-B-trees
   2. …
3. Combinations
   1. grid file and its variants
   2. BANG file, …

Linear hashing
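viewed as a spatial data structure
Partition the 1-d data space into intervals.
Block addresses of the intervals, from left to right, after successive doublings:
0 1
0 2 1 3
0 4 2 5 1 6 3 7
Each interval is half the size of the previous one; the addressing scheme is simple.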
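
The three address rows can be reproduced by a short function. The rule used here is inferred from the rows themselves and is an assumption, not a statement from the notes: on each doubling, the left half of an interval keeps the old address and the new right halves are numbered consecutively from left to right. The name `interval_address` is hypothetical.

```python
def interval_address(p: int, level: int) -> int:
    """Block address of the p-th interval (0-based, left to right) after `level` doublings."""
    if not 0 <= p < 2 ** level:
        raise ValueError("position out of range")
    if p == 0:
        return 0
    # Undo the doublings in which this interval was merely the kept left half.
    while p % 2 == 0:
        p //= 2
        level -= 1
    # Odd position: created as a right half when 2**(level-1) blocks were doubled.
    return 2 ** (level - 1) + (p - 1) // 2


for level in (1, 2, 3):
    print([interval_address(p, level) for p in range(2 ** level)])
# [0, 1]
# [0, 2, 1, 3]
# [0, 4, 2, 5, 1, 6, 3, 7]
```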

doubling
Doubling typically means adding a bit to the front (or back) of the bit string
created thus far.
E.g., in some of the schemes you would see
0 1 -> 00 10 01 11 (added bit)
This means that when you run out of space, a piece of the same size is appended,
resulting in a doubling of the space used.
However, address calculations are simple!

MOLHPE
Multidimensional Order-Preserving Linear Hashing with Partial Expansions
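[Figure: block addresses after successive splits.
initially: 0
after the 1st split: 0 1
after the 2nd split: top row 2 3, bottom row 0 1
after the 3rd split: top row 2 5 3 7, bottom row 0 4 1 6]
Note the alternation of the split dimension: the 1st split is by x, the 2nd by y,
and the 3rd again by x. Note also that each block is split.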

z-hashing
Dynamic z-hashing
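[Figure: block addresses after successive splits.
initially: 0
after the 1st split: 0 1
after the 2nd split: top row 1 3, bottom row 0 2
after the 3rd split: top row 2 3 6 7, bottom row 0 1 4 5]
Note that the addressing function differs from the one given above.
The reason is that proximity between adjacent blocks is better maintained.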

space-filling curves
The above schemes define a traversal of the space.
Here we list other space filling curves that are typically used.
They have different properties and studies have been carried
out on them.
E.g., Peano, z-ordering and Hilbert

space-filling curves
[Figure: the Hilbert curve and the Z-order curve (G.M. Morton).]

Z-order
The z-order of a point with coordinates (x, y) is obtained by
bit-wise interleaving of the x and y bits.
Ex.:
y = 2 = 010
x = 5 = 101
z = 25 = 011001 (bits interleaved as y2 x2 y1 x1 y0 x0)
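
The interleaving in this example can be written out as a short function. This is a minimal sketch: the name `z_order` and the `bits` parameter are hypothetical, and the y bit is placed before the x bit in each pair because that ordering reproduces the value 25 for x = 5, y = 2.

```python
def z_order(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y, y bit first: y2 x2 y1 x1 y0 x0."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((y >> i) & 1)
        z = (z << 1) | ((x >> i) & 1)
    return z


print(z_order(5, 2, bits=3))                 # 25
print(format(z_order(5, 2, bits=3), "06b"))  # 011001
```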

Z-order
Range queries are possible; slight care needs to be taken to find
the successors of a point in z-order.

Hilbert curve: mapping
Range queries are more natural, but the successor function is more
difficult than with z-ordering.
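
One way to compute a cell's position along a Hilbert curve is the well-known iterative bit-manipulation formulation sketched below. It is not taken from the course notes; the name `hilbert_index` is hypothetical, the grid side n is assumed to be a power of 2, and the orientation of the resulting curve may differ from the one drawn on the slides.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve filling an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-quadrant is seen in the canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


n = 4
order = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda c: hilbert_index(n, c[0], c[1]))
print(order[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Consecutive cells along this curve are always grid neighbours, a property that z-order does not have.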

Hilbert curve cont’d
[Figure: the direction in which to draw the elements of the Hilbert curve.]

Peano
