
GIS representation
A BEHIND-THE-SCENES LOOK AT THE
CONSTRAINTS AND POSSIBILITIES OF GIS MAPPING
Morgan Ostrander
PHIL 595
April 28, 2014
Introduction
The prevalence of Google and other web technologies has made maps accessible to anyone with a data connection. We are all map users, but for many people, the details of creating maps are not well-known. This project describes some considerations that must be taken into account when designing and constructing a map. The examples I show are maps created in the framework of Geographic Information Systems (GIS). GIS is a widely used term that can be defined as a science, a discipline, and a technology, but it broadly refers to manipulating geographic data with computer software. GIS not only provides ways to visualize existing data, but also has a whole suite of analysis tools that can run statistics and build models to generate new data.
This report has three major sections. The first, "Finding your purpose," describes how maps are selective and can convey only a limited set of information according to a specific purpose. The second section, "Choosing your representation," explains how some datasets are better represented by certain visualizations than others, and shows some examples of how to visualize complex data. "Creating new data," the final section, discusses the challenges of generating accurate new data, especially in multistage processes where error can be introduced at many different points.
This report was completed as a final project for a philosophy of geography course, and the various authors I cite in this report were studied earlier in the course. I have connected the opinions of these philosophers and geographers to practical problems in GIS to show the relevance of critical approaches to map creation and analysis.
The figures and maps in this report tend to show data from the Anhui province, China, because this data was used in a recent GIS project. This report can be read in conjunction with that project to give readers a better idea of the processes and considerations that underlie GIS analysis.
(NOTE: Although the term "data" is plural, I have chosen to use it in the colloquial singular sense, e.g., "the data shows" instead of "the data show." This was chosen because the style of this report is more informal, and when I use "data," I am usually referring to a single collection of geographic information rather than many individual points of information.)
[Map: The Anhui province, one of China's central provinces, with Hefei marked.]
Finding your purpose
MAPS CONVEY ONLY A LIMITED SET OF INFORMATION
"...it is necessary to know how a map will be used...[and] what the map is for."
- Hopkin & Taylor (Board, 1981)[2]
Think of media reporting in an election. Maps are everywhere, dividing the country into party colours by voting region. These coloured maps showing voter preference are an example of choropleth maps: thematic maps representing certain classes of information by colouring or shading areas to assign them a certain value. Often tied to statistical data, thematic maps convey information that can't be directly experienced by someone out in the field;[1] population density, average income, and crime rates are common quantities represented by choropleth maps.
Although choropleth maps are very useful, they underscore a
limitation of all maps: maps can convey only a limited set of
information. Choropleth maps, like other map types, must
be selective in the information they choose to represent.
However, the bright colour schemes and crisp boundaries can give a misleading impression of a definitive representation. There are multiple ways of representing the same data in a choropleth map, and with neat lines and attractive colours, it is easy to forget that beautiful maps may or may not be scientifically accurate.[3]
Hopkin and Taylor emphasize that maps are limited by their purpose: "In designing a map it is necessary to know how a map will be used. In assessing a map it is equally necessary to know what the map is for. It cannot be judged in isolation."[4]
Fig. 1: A choropleth map representing counties in the Anhui province, China, by their population density (people/km²). Darker colours symbolize higher population density.
The choropleth map on the previous page (Fig. 1) shows counties shaded by their population density. The visual appearance of a choropleth map depends on how the cartographer chooses to break the data into categories. The histogram to the right (Fig. 2) shows where the breaks were set for Fig. 1. But when the data is grouped differently, the choropleth map can show a very different picture.
What if the breaks were set differently?
Helen Couclelis further clarifies the selective nature of cartographic visualizations like choropleth maps. Information presented in GIS science is a model of a real-world phenomenon, not the real phenomenon itself.[6] Maps are a particular lens through which we see a certain interpretation of the world. Couclelis writes that rather than thinking of representing 'The' World, cartographers can instead represent an artificial world with legitimate representations or models.[7] In this way, map-makers create accurate maps that serve specific purposes.
"[GIS]...is about constructing useful representations (models) of...real-world phenomena, not about studying the phenomena themselves."
- Couclelis (2009)[5]
Fig. 3: A choropleth map representing counties in the Anhui province, China, by their population density (people/km²). This map uses the same dataset as Fig. 1, but groups the data into different categories to obtain a different visual result.
Although Figs. 1 and 3 are visualizations of the same dataset, they can leave the viewer with very different impressions. Fig. 1 might lead a viewer to conclude that population density varies greatly from county to county, while Fig. 3 suggests a fairly homogeneous distribution of density throughout the province, with the exception of a few isolated high-density areas.
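To make the effect of break placement concrete, here is a small Python sketch that classes the same values in two ways. Equal-interval breaks use classes of equal width between the minimum and maximum, and quantile breaks put roughly the same number of counties in each class. The density values are hypothetical, not the Anhui data behind Figs. 1 and 3. I am also not claiming that either figure used these exact methods. The point is only that the same numbers can be sorted into very different class counts.

```python
# Hypothetical county population densities (people/km^2), for illustration only.
DENSITIES = [120, 150, 180, 210, 260, 300, 340, 420, 510, 650, 900, 1400, 2800]


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k + 1)]


def quantile_breaks(values, k):
    """Upper bounds chosen so each class holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[max(0, (n * i) // k - 1)] for i in range(1, k + 1)]


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


def class_counts(values, breaks):
    """How many values land in each class."""
    counts = [0] * len(breaks)
    for value in values:
        counts[classify(value, breaks)] += 1
    return counts


print(class_counts(DENSITIES, equal_interval_breaks(DENSITIES, 4)))  # [10, 2, 0, 1]
print(class_counts(DENSITIES, quantile_breaks(DENSITIES, 4)))        # [3, 3, 3, 4]
```

With equal intervals, ten of the thirteen hypothetical counties fall into the lightest class, which would read as a fairly uniform map with a few dark outliers. Quantile breaks spread the counties evenly across the classes, which would read as a map with strong contrasts everywhere.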
Fig. 2: Histogram of the Fig. 1 choropleth map, showing categorical data breaks.
SELECTIVE REPRESENTATION
Fig. 4: Histogram of the Fig. 3 choropleth map, showing categorical data breaks.
Choosing your representation
THE NATURE OF THE DATA DETERMINES THE STYLE OF REPRESENTATION
Visual representations are usually needed to make sense of data. Tables, graphs, and infographics give us a richer understanding of relationships within the data, especially when a dataset is complex. Cartography adds specialized kinds of representations to the toolset of data visualization techniques. However, some visual representations are better suited to certain kinds of datasets than others, so the type of representation must be chosen carefully. Laura Perini argues that when defending research, a scientist's success is directly tied to the style of visualization chosen for their figures: the visual format directly affects how effectively the figures function as evidence.[8]
8
"...specific symbolic features of these figures are involved in the evidential support they provide."
- Perini (2005)[9]
FIG. 5: MAGNITUDE
As datasets become more complex, researchers need to find clear ways to show relationships between multiple variables. Zachary Irving describes one research team's approach to visualizing the structure of a severe storm: a 2D graphic showing the storm's intertwined structure was the most effective way of understanding the dynamics of wind currents from the dataset's 3,000 points.[10] Multidimensional datasets are often best represented by visualization techniques that group data into patterns, because complex datasets tend to have too many data points for individual relationships to be visible.
The next few pages explore how different styles of representation can present datasets of increasing complexity.
Fig. 5 is a bar graph showing the
population of major cities in the
Anhui province. Each city has one
representative variable: the magnitude
of its population. Because there is
only one relationship in the data
(between the city name and the population count), this dataset is neatly
and simply illustrated with a chart.
Fig. 5: Major cities by population, Anhui Province.
[Bar chart: population, on an axis from 0 to 800,000, of Hefei, Wuhu, Bengbu, Huainan, Ma'anshan, Huaibei, Tongling, Anqing, Huangshan, Fuyang, Bozhou, Jieshou, Suzhou, Chuzhou, Liu'an, Xuanzhou, Chaohu, and Guichi.]
However, if a geographic location is added to each city, there will be two relationships in the data. This is not easily represented by a graph. Each city must show both the magnitude of its population and its geographic location in the Anhui province. A map with proportional symbols (Fig. 6) is a viable solution to this increased data complexity.
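Proportional symbols need a rule for turning a population count into a symbol size. A common convention is to scale the circle's area, not its radius, with the value, so the radius grows with the square root of the population. I have assumed that convention here; it is not taken from Fig. 6. The sketch uses hypothetical populations, and `max_radius` is an arbitrary display size.

```python
import math


def symbol_radius(value, max_value, max_radius):
    """Radius of a circle whose area is proportional to `value`.

    The largest value gets `max_radius`; area scaling is an assumed convention.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


# Hypothetical cities: (name, population)
cities = [("City A", 800_000), ("City B", 200_000), ("City C", 50_000)]
largest = max(pop for _, pop in cities)
for name, pop in cities:
    print(name, round(symbol_radius(pop, largest, max_radius=20), 1))
# City A 20.0
# City B 10.0
# City C 5.0
```

City A has four times City B's population, and its circle has four times the area even though its radius is only twice as large. Scaling the radius directly would make City A's circle sixteen times the area of City B's, which exaggerates the difference.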
Fig. 6: Major cities by population, Anhui Province.
Proportional symbols used to represent city size.
FIG. 6: MAGNITUDE + LOCATION
FIG. 8: MAGNITUDE + LOCATION + DENSITY
But sometimes the dataset is simply too large to clearly distinguish population values with a technique like proportional symbols. Fig. 7 shows 15,000 cities across China, each represented as a black dot. How can we clearly get a sense of their population values?
Fig. 7: Cities in China
Fig. 8 shows a kernel density map, a visualization technique that calculates the density of features within a given area to generate hot spots and cold spots.[11] Features can be weighted more or less heavily based on a variable. In this case, the density of cities was weighted by their population counts. With this type of representation, users can easily see groupings of population hot spots in a complex dataset with thousands of points.
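The weighting idea can be sketched in a few lines. This version uses a quartic (biweight) kernel, which is the kernel ESRI's kernel density help page (note 11) describes. Treat the exact constants as my reading of that page, not as a reimplementation of the ArcGIS tool. Coordinates are assumed to be planar (projected), and the points and populations are hypothetical.

```python
import math


def kernel_density(x, y, points, radius):
    """Population-weighted density at (x, y) with a quartic kernel.

    points: iterable of (px, py, population) in planar units.
    Points at or beyond `radius` contribute nothing; closer points count more.
    """
    total = 0.0
    for px, py, population in points:
        d = math.hypot(x - px, y - py)
        if d < radius:
            total += (3 / math.pi) * population * (1 - (d / radius) ** 2) ** 2
    return total / radius ** 2


def hottest_cell(points, radius, xs, ys):
    """Grid cell (x, y) with the highest weighted density."""
    cells = [(x, y) for x in xs for y in ys]
    return max(cells, key=lambda c: kernel_density(c[0], c[1], points, radius))


# Hypothetical cities: two large neighbours and one small, isolated town.
towns = [(0, 0, 1000), (1, 0, 500), (10, 10, 50)]
print(hottest_cell(towns, radius=3, xs=range(12), ys=range(12)))  # (0, 0)
```

The radius controls how smooth the surface is. A larger radius merges nearby hot spots and a smaller one keeps them separate, so the radius is another choice the map-maker has to justify.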
Fig. 8: Kernel density map of cities in China, weighted by
population. Warm colours indicate areas of high population
density, while cool colours indicate low population density.
CHOOSING REPRESENTATIONS
Perini points out that some kinds of representations are insufficient to explain unfamiliar features. Although her examples focus on biological features, they can be applied in geographical contexts. Perini compares two ways of communicating information about cells: providing a linguistic description of the way the cells look, and providing a scan of the micrograph that shows the cells themselves. She argues that the micrograph scan will provide more specific information than the linguistic representation, because the details of the cell shapes are made available for direct interpretation.[12]
What if the representation doesn't provide enough detail?
Fig. 9 (above): A vector
map showing the Hefei
city area and Chaohu
Lake.
Fig. 10 (left): Screenshot
of a Google Earth satellite
image showing Hefei city
and Chaohu Lake.
Because geographic representations can never represent all the information on the ground (as mentioned in the choropleth map section), the user's purpose dictates the level of detail required. Figs. 9 and 10 show two representations that, like Perini's examples, show the same area with different levels of precision. Fig. 9 has clean boundaries, but when it is compared to the satellite image (Fig. 10), it is obvious that the boundaries of the lake and the extent of the city have been fairly generalized. Fig. 9 may be a good data source at a provincial or national scale, when precise boundary measurements are less important. However, if a representation must be used at a fine scale (such as when measuring a city's extent), a high-resolution representation like Fig. 10 is more likely to be used.
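Generalization like that in Fig. 9 is often produced by line-simplification algorithms. I don't know how Fig. 9's boundaries were generalized, so the sketch below only illustrates one widely used technique, the Douglas-Peucker algorithm. It finds the vertex farthest from the segment joining a line's endpoints. If that vertex lies farther away than a tolerance, it is kept and both halves are simplified in turn; otherwise, all intermediate vertices are dropped. The shoreline coordinates are hypothetical.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Simplify a polyline, keeping only vertices that deviate more than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, -1.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


shoreline = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 2), (5, 0)]
print(douglas_peucker(shoreline, tolerance=0.5))        # [(0, 0), (3, 0), (4, 2), (5, 0)]
print(len(douglas_peucker(shoreline, tolerance=0.05)))  # 6
```

At a tolerance of 0.5, the two small wiggles near the start disappear; at 0.05, every vertex survives. The tolerance roughly corresponds to the intended map scale: the coarser the use, the larger the tolerance one can accept.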
NEXT STEPS: Emerging forms of visualization
More complex tools are evolving to meet the need for visualizing larger and more complex datasets. 3D modeling and technologies like Google Earth provide more realistic ways of communicating data. However, the realistic nature of Google Earth and other virtual globe systems can lead both scientists and laypeople to place more trust in the data than is warranted. Sheppard and Cizek point out that the ability to look down into your own backyard gives users a very misleading impression of accuracy, because virtual globes are still in development and do not always have the tools to provide precise results. The authors stress that high-resolution imagery must be treated with caution, because enthusiasm over the realistic nature of the data can overwhelm valid cognitive responses.[13]
Creating new data
MAP-MAKING INVOLVES COMPLEX PROCESSES THAT LEAD TO GAINS AND LOSSES OF INFORMATION
Maps don't merely visualize existing data. They also show new data, derived with various algorithms and statistical tools. Multiple steps in complex models are often needed to generate new data. However, these multistep processes give many opportunities for significant error to enter the dataset. Such error not only propagates through each successive step, but can also be magnified as the data is manipulated.
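A toy calculation shows why error can grow rather than merely carry over. Suppose, purely as an assumption, that each processing step multiplies a value by a factor that is off by some relative error, and that all the errors push in the same direction (a pessimistic case). The errors then compound multiplicatively, so the total exceeds their simple sum.

```python
def compounded_relative_error(step_errors):
    """Worst-case relative error after a chain of steps.

    Each step is assumed to scale the value by (1 + e), and all errors are assumed to
    share the same sign. This illustrates compounding; it is not a real error model.
    """
    factor = 1.0
    for e in step_errors:
        factor *= 1.0 + e
    return factor - 1.0


four_steps = [0.05, 0.05, 0.05, 0.05]  # hypothetical 5% error per step
print(round(compounded_relative_error(four_steps), 4))  # 0.2155
print(round(sum(four_steps), 4))                        # 0.2
```

Four steps that are each 5% off produce an error of roughly 21.6%, not 20%. The gap widens with more steps or larger per-step errors.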
The following section describes how data is transformed in a multistep GIS analysis. The case study is a multicriteria evaluation (MCE) that was used to identify the best potential landfill sites in Anhui.
Fig. 11: Suitability of Potential Landfill Locations, Anhui, China. The output was generated after several stages of weighting and processing various layers of data (see Fig. 12 for a flowchart of the process).
An MCE is a type of analysis that determines optimal locations by identifying and weighting a number of criteria. MCEs can be applied to such diverse problems as urban planning (finding an appropriate site for a new facility) and conservation (determining areas of prime habitat). A number of relevant input layers are first identified; common examples are land use, administrative boundaries, and distances from relevant features. These layers, representing criteria for the site selection, are then weighted to determine their importance. This ranking allows certain factors to have a stronger influence in determining the final sites.
Fig. 11 shows the end product of this MCE, which identified the best (green) and worst (red) landfill sites.
Bruno Latour argues that knowledge of the real world is built on what he calls inscriptions. Inscriptions are abstracted representations of the world: figures, drawings, and numbers that simplify and make sense of complex natural phenomena. But inscriptions aren't static; they exist in cascades.[14] Each time knowledge is spread to other people, the original inscription is modified to fit new contexts. These cascades, argues Latour, cause a loss of information, because the inscriptions become simpler and simpler as they are merged with other data to be used for different purposes. Board, too, in his survey of the cartographic literature, notes that a number of authors have commented on the loss of data through progressive steps in the cartographic process.[15] Information is transformed at every stage, from the initial data collection through to the user's interpretation of the final map.
Latour and Board's ideas are highly relevant to current GIS analysis, because, like Latour's cascades of inscriptions, GIS models take an input dataset and transform it in multiple ways to produce an altered output. Although the analysis doesn't always lead to a simplification of the data (as Latour suggests), there is always some alteration of information along the way.
Background information
Fig. 12: Model of the MCE process.
Multi-step process
Fig. 12 shows the steps involved in the MCE process. The data layers used as inputs are represented by blue ovals. The yellow boxes indicate a process, and the green ovals represent the data outputs. There were two stages where weights were assigned to data in order to specify certain site criteria. The first weighting occurred during the Reclassify stage, where features in each layer were assigned a value between 1 (unsuitable) and 9 (suitable). The same process occurred during the Weighted Overlay stage, where entire layers (the green ovals) were ranked against each other.
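The second weighting can be sketched as a weighted sum. Each location's reclassified 1-9 scores are multiplied by the percent influence of their layer, and the products are added up. The layer names, influences, and scores below are hypothetical and are not the values in Fig. 14. Some GIS tools may also round the combined score, which this sketch does not do.

```python
def weighted_overlay(layer_scores, influences):
    """Combine one location's 1-9 layer scores into a single suitability score.

    layer_scores: dict mapping layer name -> reclassified score (1 = unsuitable, 9 = suitable)
    influences:   dict mapping layer name -> percent influence; must total 100
    """
    if sum(influences.values()) != 100:
        raise ValueError("influences must total 100 percent")
    return sum(layer_scores[name] * pct for name, pct in influences.items()) / 100


influences = {"rivers": 40, "roads": 35, "land_use": 25}  # hypothetical weights
site_a = {"rivers": 9, "roads": 5, "land_use": 1}
site_b = {"rivers": 5, "roads": 9, "land_use": 5}
print(weighted_overlay(site_a, influences))  # 5.6
print(weighted_overlay(site_b, influences))  # 6.4
```

Although site A has the best possible river score, site B ranks higher once the other layers are weighed in. This is how the second weighting lets some factors outweigh others.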
CREATING NEW DATA
1. Choices in the first weighting affect the quality of the second weighting
Because this MCE was a multistep process, error could be spread and magnified through the process. I will show two points in this MCE where error may have been introduced and magnified.
In the first weighting stage, identified in Fig. 12, the features of each individual layer were weighted on a scale of 1-9. The distances to rivers, for example, were broken down into categories to identify the safest areas for a potential landfill site. Areas too close to the rivers were given a low weighting, because the landfills would risk contaminating the river. Distances to roads took economic factors into account: while roads couldn't be too close to the actual landfill site, they still needed to be close enough to save transportation costs. Each of the layers was independently weighted in this way to generate a suitability map for each layer. These suitability maps were then weighted a second time and combined to generate the final suitability map (Fig. 11).
Error could be introduced during the first weighting if the criteria for the study area aren't well-known. Perhaps the researcher completing the weightings is unaware that the landfills will hold a significant amount of hazardous waste, and fails to require a large enough distance from rivers and lakes. If this is the case and the weights are assigned inappropriately, the analysis will favour unsuitable areas, and that preference will carry into the final suitability map.
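The consequence described above can be reproduced with two hypothetical sites and a deliberately crude Reclassify rule for river distance: 1 inside a buffer, 9 outside it. Site A sits 1.5 km from a river but close to a road, while site B is 6 km from the river but has poorer road access. All the numbers are assumptions for illustration, including the equal 50/50 layer weights.

```python
# Hypothetical candidate sites: distance to the nearest river (km) and a road-access score (1-9).
SITES = {
    "A": {"river_km": 1.5, "road_score": 9},
    "B": {"river_km": 6.0, "road_score": 3},
}


def river_score(distance_km, buffer_km):
    """Toy Reclassify rule: 1 (unsuitable) inside the buffer, 9 (suitable) beyond it."""
    return 9 if distance_km > buffer_km else 1


def site_score(name, buffer_km, river_weight=50, road_weight=50):
    """Weighted combination of the river and road scores for one site."""
    site = SITES[name]
    river = river_score(site["river_km"], buffer_km)
    return (river * river_weight + site["road_score"] * road_weight) / (river_weight + road_weight)


def best_site(buffer_km):
    """Name of the highest-scoring site for a given river buffer."""
    return max(SITES, key=lambda name: site_score(name, buffer_km))


print(best_site(buffer_km=1.0))  # A  (buffer too small: the riverside site wins)
print(best_site(buffer_km=3.0))  # B
```

With a 1 km buffer, the riverside site A comes out on top. Widening the buffer to 3 km, as a researcher aware of the hazardous waste might, flips the result to site B. The later weighting step cannot correct a first-stage rule that was set too loosely.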
Fig. 13 shows the first stage of weighting (the Reclassify stage). Each layer is considered separately, and the individual features within each layer are weighted on a scale of 1-9.
[Figure: the Distance from lakes layer is reclassified as > 5 km = 9, 2-5 km = 5, < 2 km = 1; the other layers stacked on the right are Distance from cities, Distance from towns, Land use, Railroads, Hot spot clusters, Distance from roads, and Distance from rivers.]
Fig. 13: Assigning weights to individual features in a layer (Reclassify stage). In this figure, the distance ranges in the Distance from lakes layer have been weighted. The same method will be applied to each of the layers stacked on the right side.
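The Distance from lakes rule in Fig. 13 translates directly into a small function. I read the third range as distances under 2 km, since that is the only reading consistent with the other two ranges. I also assume that exactly 2 km and exactly 5 km belong to the middle class, because the figure does not say. The raster row of distances is hypothetical.

```python
def reclassify_lake_distance(distance_km):
    """Score distance from lakes on the 1-9 suitability scale using the Fig. 13 ranges.

    > 5 km -> 9, 2-5 km -> 5, under 2 km -> 1. Placing exactly 2 km and exactly 5 km
    in the middle class is an assumption.
    """
    if distance_km < 0:
        raise ValueError("distance cannot be negative")
    if distance_km > 5:
        return 9
    if distance_km >= 2:
        return 5
    return 1


# A hypothetical row of raster cells, each holding its distance to the nearest lake (km).
row = [0.5, 1.9, 2.0, 3.7, 5.0, 5.1, 12.0]
print([reclassify_lake_distance(d) for d in row])  # [1, 1, 5, 5, 5, 9, 9]
```

The same pattern, with different breaks and scores, would apply to each of the other layers before they reach the Weighted Overlay stage.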
[Bar graphic: layers Lakes, Rivers, Density clusters, Towns, Cities, Land use, Roads, and Railroads; influence percentages shown: 2%, 5%, 12%, 26%, 26%, 12%, 7%, 7%.]
2. Poor-quality source data leads to uncertainty throughout the whole process
The second instance where error may have been introduced was at the onset of the entire process, when the input layers were first included. For this particular MCE, there was poor-quality source data for a number of layers. It was very difficult to determine the extent of the city boundaries, and the river data didn't give any indication of the rivers' breadth (although a literature search revealed that some rivers in the Anhui province can be several kilometres wide). The poor quality of the data meant that there was a high degree of uncertainty in the results, since the same degree of error existed in every step of the process. Fig. 14 shows the final weighting of each layer during the second weighting phase. The percentages on each bar indicate the degree of influence the layer has, which is the weighting used in the Weighted Overlay stage. The layers with poor-quality source data are blurred to show the degree of uncertainty in the final weighting. As the graphic shows, over half of the layers have poor-quality data, which gives the final output a high degree of uncertainty.
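One simple way to express this uncertainty is to give each poor-quality layer a range of plausible scores instead of a single value. Carrying those ranges through the weighted sum then yields the lowest and highest possible suitability for a site. The weights and score ranges below are hypothetical. Treating each layer's range independently is also an assumption.

```python
def overlay_interval(layers):
    """Lowest and highest possible weighted-overlay score for one site.

    layers: list of (percent_influence, low_score, high_score). A layer with reliable
    data has low_score == high_score; a poor-quality layer gets a range.
    """
    if sum(weight for weight, _, _ in layers) != 100:
        raise ValueError("influences must total 100 percent")
    low = sum(weight * lo for weight, lo, _ in layers) / 100
    high = sum(weight * hi for weight, _, hi in layers) / 100
    return low, high


site = [
    (40, 7, 7),  # hypothetical well-documented layer
    (30, 3, 9),  # hypothetical poor-quality layer (e.g. river breadth unknown)
    (30, 5, 9),  # hypothetical poor-quality layer (e.g. city extent unclear)
]
print(overlay_interval(site))  # (5.2, 8.2)
```

Even though a well-documented layer carries 40% of the weight, the site's score could fall anywhere from 5.2 to 8.2. That is a spread of three points on a nine-point scale.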
Fig. 14: Quality of final weighted layers. Blurred layers are poor-quality data sources, while sharply defined layers have a higher confidence level.
Conclusion
In this report, I outlined some considerations that geographers need to be aware of when creating a map or performing GIS analyses. Although I have focused mainly on the limitations of geospatial methods, GIS should be understood as a powerful set of tools that, given accurate data and appropriate analyses, can provide highly effective solutions. As we continue to use digital maps at home, at work, or on the road, we should know not only how to use them, but also how to create them.
Endnotes
1 Board, C. (1981) "Cartographic Communication," Cartographica 18: 42-78, 44.
2 Board, "Cartographic Communication," 68 (citing Hopkin & Taylor).
3 Board, "Cartographic Communication," 64 (citing Wright).
4 Board, "Cartographic Communication," 68 (citing Hopkin & Taylor).
5 Couclelis, H. (2009) "Ontology, Epistemology, Teleology: Triangulating Geographic Information Science," in G. Navratil (ed.) Research Trends in Geographic Information Science, Springer-Verlag, 3-15, 13.
6 Couclelis, "Ontology, Epistemology, Teleology," 13.
7 Couclelis, "Ontology, Epistemology, Teleology," 6, emphasis removed.
8 Perini, L. (2005) "Visual Representations and Confirmation," Philosophy of Science 72: 913-926, 913.
9 Perini, "Visual Representations and Confirmation," 921.
10 Irving, Z. C. (2011) "Style, but Substance: An Epistemology of Visual versus Numerical Representation in Scientific Practice," Philosophy of Science 78: 774-787, 777.
11 ESRI (2013) "How Kernel Density Works," http://resources.arcgis.com/en/help/main/10.2/index.html#//009z00000011000000.
12 Perini, "Visual Representations and Confirmation," 923.
13 Sheppard, S.R.J. and Cizek, P. (2009) "The ethics of Google Earth: Crossing thresholds from spatial data to landscape visualisation," Journal of Environmental Management 90: 2102-2117, 2108.
14 Latour, B. (2011) "Drawing Things Together," in M. Dodge, R. Kitchin, and C. Perkins (eds.) The Map Reader: Theories of Mapping Practice and Cartographic Representation, Wiley-Blackwell, Oxford, 65-72, 68.
15 Board, "Cartographic Communication," 56-57.