2001
ESRI USER CONFERENCE
Pre-Conference Seminar
SPATIAL ANALYSIS and GIS
Michael F. Goodchild
National Center for Geographic Information and Analysis
University of California
Santa Barbara, CA 93106
805 893 8049 (phone)
805 893 3146 (FAX)
805 893 8224 (NCGIA)
good@geog.ucsb.edu
July 8, 2001
Schedule
Four sessions:
Sunday July 8:
8:30am - 10:00am
10:30am - 12:00pm
Lunch
1:30pm - 3:00pm
3:30pm - 5:00pm
Instructor profile
Michael F. Goodchild is Professor of Geography at the University
of California, Santa Barbara; Chair of the Executive Committee, National
Center for Geographic Information and Analysis (NCGIA); Associate Director
of the Alexandria Digital Library Project; and Director of NCGIA’s Center
for Spatially Integrated Social Science. He received his BA degree from
Cambridge University in Physics in 1965 and his PhD in Geography from
McMaster University in 1969. After 19 years at the University of Western
Ontario, including three years as Chair, he moved to Santa Barbara in
1988. He was Director of NCGIA from 1991 to 1997. He has been awarded
honorary doctorates by Laval University (1999) and the University of Keele
(2001). In 1990 he was given the Canadian Association of Geographers Award
for Scholarly Distinction, in 1996 the Association of American Geographers
award for Outstanding Scholarship, and in 1999 the Canadian Cartographic
Association’s Award of Distinction for Exceptional Contributions to Cartography;
he has won the American Society of Photogrammetry and Remote Sensing Intergraph
Award and twice won the Horwood Critique Prize of the Urban and Regional
Information Systems Association. He was Editor of Geographical Analysis
between 1987 and 1990, and serves on the editorial boards of ten other
journals and book series. In 2000 he was appointed Editor of the Methods,
Models, and Geographic Information Sciences section of the Annals of the
Association of American Geographers. His major publications include Geographical
Information Systems: Principles and Applications (1991); Environmental
Modeling with GIS (1993); Accuracy of Spatial Databases (1989);
GIS and Environmental Modeling: Progress and Research Issues (1996);
Scale in Remote Sensing and GIS (1997); Interoperating Geographic
Information Systems (1999); Geographical Information Systems: Principles,
Techniques, Management and Applications (1999); and Geographic
Information Systems and Science (2001); in addition he is author of
some 300 scientific papers. He was Chair of the National Research Council’s
Mapping Science Committee from 1997 to 1999. His current research interests
center on geographic information science, spatial analysis, the future
of the library, and uncertainty in geographic data.
For a complete CV see the NCGIA web site www.ncgia.ucsb.edu under Personnel.
Other related web sites: UCSB Geography www.geog.ucsb.edu, Alexandria Digital Library www.alexandria.ucsb.edu
TABLE OF CONTENTS
Outline:
1. What is Spatial Analysis?
- Basic GIS data models
- GIS function descriptions
2. Spatial Statistics
- Spatial interpolation
3. Spatial Interaction Models
4. Spatial Dependence
5. Spatial Decision Support
- Spatial search
- Districting
What is Spatial Analysis?
GIS is designed to support a range of different kinds of
analysis of geographic information: techniques to examine and explore
data from a geographic perspective, to develop and test models, and to
present data in ways that lead to greater insight and understanding. All
of these techniques fall under the general umbrella of "spatial analysis".
Statistical packages like SAS, SPSS, S, or Systat allow the user to analyze numerical data using statistical techniques; GIS packages like ArcInfo give access to a powerful array of methods of spatial analysis.
Purpose of the Course
The course will introduce participants with some knowledge
of GIS to the capabilities of spatial analysis. Each of the five major
sections will cover a major application area and review the techniques
available, as well as some of the more fundamental issues encountered
in doing spatial analysis with a GIS.
Outline
Section 1 - What is spatial analysis? - Basic GIS concepts for spatial analysis - GIS functionality - Integrating GIS and spatial analysis - Issues of error and uncertainty:
- Definition of spatial analysis, major types and areas for application.
- How should an analyst view a spatial database? Fields and discrete objects, attributes, relationships.
- How to organize the functions of a GIS into a coherent scheme.
- Levels of integration of GIS and spatial analysis - loose and tight coupling, and full integration. Scripts and macros, lineage and analytical toolboxes.
- The uncertainty problem - why is it such an issue in spatial analysis? What can we do now about data quality?
Section 2 - Spatial statistics - Simple measures for exploring geographic information - The value of the spatial perspective on data - Intuition and where it fails - Applications in crime analysis, emergencies, incidence of disease:
- Measures of spatial form - centrality, dispersion, shape.
- Spatial interpolation - intelligent spatial guesswork - spatial outliers.
- Exploratory spatial analysis - moving windows, linking spatial and other perspectives.
- Hypothesis tests - randomness, the null hypothesis, and how intuition can be misleading.
Section 3 - Spatial interaction models - What they are and where they're used - Calibration and "what-if" - Trade area analysis and market penetration:
- The Huff model and variations.
- Site modeling for retail applications - regression, analog, spatial interaction.
- Modeling the impact of changes in a retail system.
- Calibrating spatial interaction models in a GIS environment.
Section 4 - Spatial dependence - Looking at causes and effects in a geographical context:
- Spatial autocorrelation - what is it, how to measure it with a GIS.
- The independence assumption and what it means for modeling spatial data.
- Applying models that incorporate spatial dependence - tools and applications.
Section 5 - Site selection - Locational analysis and location/allocation - Other forms of operations research in spatial analysis - Spatial decision support systems - Linking spatial analysis with GIS to support spatial decision-making:
- Shortest path, traveling salesman, traffic assignment.
- What is location/allocation, and where can it be applied?
- Modeling the process of retail site selection. Criteria.
- Electoral districting and sales territories.
- What is an SDSS? What are its component parts? How does it compare to a GIS or a DSS? Why would you want one? Building SDSS.
- Examples of SDSS use - site selection, districting.
SECTION 1
WHAT IS SPATIAL ANALYSIS?
Section 1 - What is spatial analysis? - Basic GIS concepts for spatial analysis - GIS functionality - Integrating GIS and spatial analysis - Issues of error and uncertainty:
- Definition of spatial analysis, major types and areas for application.
- How should an analyst view a spatial database? Objects, layers, relationships, attributes, object pairs, data models.
- How to organize the functions of a GIS into a coherent scheme.
- Levels of integration of GIS and spatial analysis - loose and tight coupling, and full integration. Scripts and macros, lineage and analytical toolboxes.
- The uncertainty problem - why is it such an issue in spatial analysis? What can we do now about data quality?
What is spatial analysis?
A set of techniques for analyzing spatial data:
- used to gain insight as well as to test models
- ranging from inductive to deductive
- finding new theories as well as testing old ones
- can be highly technical and mathematical, but can also be very simple and intuitive
Definitions
"A set of techniques whose results are dependent on the locations of the objects being analyzed"
- move the objects, and the results change
- e.g. move the people, and the US Center of Population moves
- e.g. move the people, and average income does not change
- most statistical techniques are invariant under changes of location
- compare the techniques in SAS, SPSS, Systat, etc.
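The contrast can be sketched in a few lines of Python (the coordinates and incomes are invented for illustration): relocating one person moves the mean center, a spatial result, but leaves average income, an aspatial result, unchanged.

```python
# Hypothetical people: ((x, y) location, income).
people = [((2.0, 3.0), 40000), ((5.0, 1.0), 55000), ((8.0, 6.0), 62000)]

def mean_center(pts):
    """Mean center of a set of located observations - location-dependent."""
    xs = [x for (x, _), _ in pts]
    ys = [y for (_, y), _ in pts]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def mean_income(pts):
    """Average income - invariant under relocation."""
    return sum(inc for _, inc in pts) / len(pts)

before_center = mean_center(people)
before_income = mean_income(people)

# Move the first person far to the east; incomes are unchanged.
moved = [((20.0, 3.0), 40000)] + people[1:]
after_center = mean_center(moved)
after_income = mean_income(moved)
```

The mean center shifts while the average income stays exactly the same, which is the sense in which most statistical techniques are not spatial analysis.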
"A set of techniques requiring access both to the locations of objects and also to their attributes"
- requires methods for describing locations (i.e. a GIS)
- some techniques do not look at attributes
- is mapping itself a form of spatial analysis?
Is spatial analysis the ultimate objective of GIS?
Some books on spatial analysis:
- Anselin L (1988) Spatial Econometrics: Methods and Models. Kluwer
- Bailey T C, Gatrell A C (1995) Interactive Spatial Data Analysis. Harlow: Longman Scientific & Technical
- Berry B J L, Marble D F (1968) Spatial Analysis: A Reader in Statistical Geography. Prentice-Hall
- Boots B N, Getis A (1988) Point Pattern Analysis. Sage
- Burrough P A, McDonnell R A (1998) Principles of Geographical Information Systems. New York: Oxford University Press
- Cliff A D, Ord J K (1973) Spatial Autocorrelation. Pion
- Cliff A D, Ord J K (1981) Spatial Processes: Models and Applications. Pion
- Fischer M, Scholten H J, Unwin D J, editors (1996) Spatial Analytical Perspectives on GIS. London: Taylor & Francis
- Fotheringham A S, O'Kelly M E (1989) Spatial Interaction Models: Formulations and Applications. Kluwer
- Fotheringham A S, Rogerson P A (1994) Spatial Analysis and GIS. Taylor and Francis
- Fotheringham A S, Wegener M (2000) Spatial Models and GIS: New Potential and New Models. London: Taylor and Francis
- Fotheringham A S, Brunsdon C, Charlton M (2000) Quantitative Geography: Perspectives on Spatial Data Analysis. London: SAGE
- Getis A, Boots B N (1978) Models of Spatial Processes: An Approach to the Study of Point, Line and Area Patterns. Cambridge University Press
- Ghosh A, Ingene C A (1991) Spatial Analysis in Marketing: Theory, Methods, and Applications. JAI Press
- Ghosh A, Rushton G (1987) Spatial Analysis and Location-Allocation Models. Van Nostrand Reinhold
- Goodchild M F (1986) Spatial Autocorrelation. CATMOG 47, GeoBooks
- Griffith D A (1987) Spatial Autocorrelation: A Primer. Association of American Geographers
- Griffith D A (1988) Advanced Spatial Statistics. Special Topics in the Exploration of Quantitative Spatial Data Series. Kluwer
- Haggett P, Chorley R J (1970) Network Analysis in Geography. St Martin's Press
- Haggett P, Cliff A D, Frey A (1977) Locational Methods. Wiley
- Haggett P, Cliff A D, Frey A (1978) Locational Models. Wiley
- Haining R P (1990) Spatial Data Analysis in the Social and Environmental Sciences. Cambridge University Press
- Harries K (1999) Mapping Crime: Principle and Practice. Washington, DC: Crime Mapping Research Center, Department of Justice
- Haynes K E, Fotheringham A S (1984) Gravity and Spatial Interaction Models. Sage
- Hodder I, Orton C (1979) Spatial Analysis in Archaeology. Cambridge: Cambridge University Press
- Leung Y (1988) Spatial Analysis and Planning under Imprecision. Amsterdam: North Holland
- Longley P A, Batty M, editors (1996) Spatial Analysis: Modelling in a GIS Environment. Cambridge: GeoInformation International
- Mitchell A (1999) The ESRI Guide to GIS Analysis, Volume 1: Geographic Patterns and Relationships. ESRI Press
- Odland J (1988) Spatial Autocorrelation. Sage
- Raskin R G (1994) Spatial Analysis on the Sphere: A Review. Santa Barbara, CA: National Center for Geographic Information and Analysis
- Ripley B D (1981) Spatial Statistics. Wiley
- Ripley B D (1988) Statistical Inference for Spatial Processes. Cambridge University Press
- Taylor P J (1977) Quantitative Methods in Geography: An Introduction to Spatial Analysis. Houghton Mifflin
- Unwin D (1981) Introductory Spatial Analysis. Methuen
- Upton G J G, Fingleton B (1985) Spatial Data Analysis by Example. Wiley
Geographic Information Systems and Science
Paul Longley,
Mike Goodchild, David Maguire, and David Rhind
Wiley, 2001
Some background slides:
- Landsat image of New York area
- Indianapolis database
- Snow map of Soho, 1854
- the pump
- Openshaw GAM map of NE England
- Atlantic Monthly mystery map
- Northridge earthquake epicenters
- Environmental justice in LA
- World map
- England and Wales demography
- South Wales demography
- Vandenberg service station
- Service station subsurface
- Service station plume
How does an analyst/modeler/decision-maker work with a GIS?
What tools exist for helping/conceptualizing/problem-solving?
Assumption: these (analysis, modeling, decisionmaking)
are the primary purposes of GIS technology.
The cost of input to a GIS is high, and can only be justified by the benefits of the analysis/modeling/decision-making performed with the data.
- 60 polygons per hour = $1 per polygon
- estimates run as high as $40 per polygon
- a 500,000-polygon database therefore costs $500,000 to create using the low estimate, or $20m using the high estimate
What types of analysis can justify these costs?
- Query (if it is faster than manual lookup)
  - very repetitive
  - highly trained user
- Analyses which are simple in nature but difficult to execute manually
  - overlay (topological)
  - map measurement, particularly area
  - buffer zone generation
- Analyses which can take advantage of GIS capabilities for data integration
- Browsing/plotting independently of map boundaries and with zoom/scale change
  - seamless database
  - need for automatic generalization
  - editing
- Complex modeling/analysis (based on the above and extensions)
The list of possibilities is endless:
- a list of generic GIS functions has 75 entries
- ESRI's ARC/INFO has over 1,000 commands/functions
How can we organize/conceptualize the possibilities?
- A taxonomy/classification of GIS functions
- A customized view of a spatial database designed for the needs of the analyst/modeler
- A set of tools to support analysis and database manipulation
- Associated tools for defining needs in the analysis/modeling area, and testing systems against those needs
- Methods for dealing with problems associated with analysis/modeling of spatial databases, particularly error/inaccuracy
A geographical data model consists of the set of entities and relationships used to create a representation of the geographical world. The choices made when the world is modeled determine how the database is structured, and what kinds of analysis can be done with it. These choices occur when the data are captured in the field, recorded, mapped, digitized, and processed.
There are two distinct ways of conceiving of the geographical
world.
In the field view, the world is conceived as a finite set of variables, each having a single value at every point on the Earth's surface (or every point in a three-dimensional space, or a four-dimensional space if time is included).
Examples of fields: elevation, temperature, soil type, vegetation cover type, land ownership.
Some field-like phenomena: elevation, spectral response.
To be represented digitally, a field must be constructed out of primitive one-, two-, three-, or four-dimensional objects. There are six ways of representing fields in common use in GIS.
Other methods can be found in environmental modeling, but not commonly in GIS:
- finite element methods
- splines
The field view underlies the following ESRI implementation models:
- coverage
- TIN
- grid
- but not shapefiles
- in the Arc8 Geodatabase the distinction can be implemented in object behaviors
In the discrete object view an otherwise empty space is littered with objects, each of which has a series of attributes. Any point in space (two-, three-, or four-dimensional) can lie in any number of discrete objects, including zero, and objects can therefore overlap, and need not exhaust the space.
- objects can be counted
  - how many mountains are there in Scotland? what's a mountain?
- objects can be manipulated
  - they maintain integrity as they move
- objects are homogeneous
  - the whole thing is the object
  - parts can inherit properties from the whole
Field and discrete object views can be implemented in either raster or vector forms
- compare manipulation of shapefiles (objects) and coverages (fields)
- the distinction concerns how the world is conceived, and the rules governing object behavior
- a field can be represented as raster cells, points (e.g., spot heights), triangles (TIN), lines (contours), or areas (land ownership)
- in many of these cases the primitive elements are not real (cannot be located on the ground), but are artifacts of the representation
If we ignore the field/discrete object distinction we may easily apply meaningless forms of analysis:
- buffer makes sense only for discrete objects
- interpolation makes sense only for fields
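As a concrete illustration of an operation that is meaningful only for fields, inverse-distance weighting is one common way to estimate a field value between sample points. This is a minimal sketch, not a prescribed method; the function name and spot-height data are invented.

```python
def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate of a field value at (x, y)
    from (xi, yi, zi) sample points."""
    num = den = 0.0
    for xi, yi, zi in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return zi                      # query is exactly at a sample
        w = 1.0 / d2 ** (power / 2.0)      # weight = 1 / distance^power
        num += w * zi
        den += w
    return num / den

# Hypothetical elevation samples (x, y, z) - a field sampled at points.
spot_heights = [(0.0, 0.0, 100.0), (10.0, 0.0, 120.0), (0.0, 10.0, 110.0)]
z_estimate = idw(spot_heights, 5.0, 5.0)
```

The same operation applied to discrete objects (say, interpolating "between" two buildings) would be meaningless, which is the point of the distinction above.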
Attributes can be of several types:
- numeric or alphanumeric
- quantitative or qualitative
- nominal, ordinal, interval/ratio, cyclic
Spatial objects are distinguished by their dimensions or topological properties:
- points (0-cells)
- lines (1-cells)
- areas (2-cells)
- volumes (3-cells)
A class of objects is a set with the same topological
properties (e.g. all points) and with the same set of attributes (e.g.
a set of wells or quarter sections or roads). In the Arc8 Geodatabase
a class also has the same behaviors, and may inherit behaviors from other
classes. A class is associated with an attribute table.
Geodatabase introduces a consistent set of terms
for primitive geometric objects
When a class represents a field, certain rules apply to the component objects. The objects belonging to one class of area or volume objects will fill the area and will not overlap (they are space-exhausting; they partition or tessellate the space; they are planar enforced). The layer provides one value at every point (recall the definition of a field):
- e.g. soil type
- e.g. elevation
- e.g. zoning
Slide: Planar enforcement
Spatial objects are abstractions of reality. Some objects are well-defined (e.g. road, bridge) but others are not. Objects representing the discrete object view tend to be well-defined; objects representing a field are not.
- A TIN or DEM is an approximation to a topographic surface, with an accuracy that is usually undetermined. Even if accuracy is known at the sampled points, it is unknown between them.
- We assume that all of the points within an area object have the attributes ascribed to the object. In reality the area inside the object is not homogeneous, and the boundaries are zones of transition rather than sharp discontinuities (e.g. soil maps, climatic zones, geological maps).
A topographic surface can be represented as either a TIN or
a DEM.
Slides: Elevation model options
- digital elevation model (raster)
- digitized contours
- triangular mesh
- TIN
Advantages of TIN:
- sampling intensity can adapt to local variability
- many landforms are approximated well by triangular mosaics
- triangles can be rendered quickly by graphics processors
Advantages of DEM:
- uniform sampling intensity is suited to automatic data collection via e.g. analytical stereoplotter
- many applications require uniform-sized spatial objects.
A spatial database consists of a number of classes of spatial
objects with associated attribute tables.
The methods used to store the attribute and locational
information about the objects are not of immediate concern to the analyst/modeler.
In fact this object/attribute view of the database
may have little in common with the actual data structures/models used
by the system designer.
A database encodes and represents the complex relationships
which exist between objects.
 spatial relationships
 functional relationships
A GIS must be capable of computing these relationships
through such geometrical operations as intersection.
Spatial relationships include:
- Relationships between objects of different classes
- Relationships between objects of the same class
The potential set of relationships within a complex spatial
database is enormous. No system can afford to compute and store all of
them in the database.
A cartographic data structure stores no spatial
relationships among objects.
Since it must compute any relationship as and when
needed it is inefficient for complex spatial analyses.
A topological data structure stores certain spatial
relationships among objects. Common stored relationships are:
- IDs of incident links stored as attributes of nodes in line networks
UML relationship types:
- association: a functional linkage between objects in different classes
- aggregation and composition: linkage between an object and its component objects
- type inheritance: classes inherit properties from more general classes
Relations between objects
An object pair is a combination of objects of the
same or different types/classes which may have its own attributes.
- e.g. the hydrologic relationship between a spring and a sink may have attributes (direction, volume of flow, flow-through time) but may not exist as a spatial object itself.
The ability to generate object pairs, give them attributes
and include them in analysis is an important component of a full GIS.
giving attributes to associations
Examples of object pairs:
- Matrix of distances between pairs of objects
- Traffic flows between origin/destination pairs
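A distance matrix of the kind listed above can be held as a simple object-pair table, one row per pair with the distance as the pair's attribute. The town names and coordinates here are hypothetical.

```python
from math import hypot

# Hypothetical point objects: name -> (x, y).
towns = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}

# One row per object pair: (first object, second object, distance attribute).
distance_rows = [
    (i, j, hypot(towns[j][0] - towns[i][0], towns[j][1] - towns[i][1]))
    for i in sorted(towns) for j in sorted(towns) if i < j
]
```

Each row attaches an attribute (the distance) not to either object alone but to the pair, which is exactly what a turntable or attributed relationship class does at scale.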
Object pairs in ESRI products:
- turntable (link-link pairs)
- distance matrix (first object, second object, distance)
- association class in UML
- attributed relationship class in Geodatabase
Visio example
Example: Data Model for Traffic Routing
What are the essential components of a data model for route
planning in a complex street network?
- Stop signs are attributes of link/node object pairs.
Visio example
Data modeling examples
1. Design a database to capture and analyze data on recreational fishing in the Scottish Highlands, to support decision-making by the tourist industry and regulatory agencies. The database should be able to represent the following:
- locations of fishing (rivers, lakes)
- locations of accommodation (hotels, guest houses)
- preferences and rights (fishing locations owned by hotels, locations accessible to hotels)
2. Design a database to support analysis and modeling of shoreline
erosion on the Great Lakes. It is necessary to represent conditions and
processes transverse to the shoreline in much more detail than variation
parallel to the shoreline.
3. Design a database to support water resource analysis
and planning for complex hydrographic networks that include streams, rivers,
lakes and reservoirs.
GEOGRAPHIC INFORMATION SYSTEM FUNCTION DESCRIPTIONS
A. BASIC SYSTEM CAPABILITIES
A1 Digitizing (di)
Digitizing is the process of converting point and line data from source documents to a machine-readable format.
A2 Edgematching (ed)
Edgematching is the process of joining lines and polygons across map boundaries in creation of a "seamless" database. The join should be topological as well as graphic; that is, a polygon so joined should become a single polygon in the database, and a line so joined should become a single line segment.
A3 Polygonization (po)
Polygonizing is the process of connecting together arcs
("spaghetti") to form polygons.
A4 Labelling (la)
This process transfers labels describing the contents (attributes) of polygons, and the characteristics of lines and points, to the digital system. This input of labels must not be confused with the process of symbolizing and labelling output described below.
A5 Reformatting digital data for input from other systems
(rf)
Data previously digitized are made accessible through an
interface or converted by software to the system format, and made to be
topologically useful as well as graphically compatible.
A6 Reformatting for output to other systems (ro)
This function is the inverse of the previous one. Internal
data is reformatted to meet the requirements of other systems or standards.
A7 Data base creation and management (db)
Data is typically digitized from mapsheets, and may be edge-matched. The creation of a true "seamless" database requires the establishment of a map sheet directory, and may include tiling to partition the database.
A8 Raster/vector conversion (rv)
The ability to convert data between vector and raster forms
with grid cell size, position and orientation selected by the user.
A9 Edit and display on input (ei)
This function allows continuous display and editing of
input data, usually in conjunction with digitizing.
A10 Edit and display on output (eo)
The ability to preview and edit displays before creation
of hard copy maps.
A11 Symbolizing (sy)
To create high quality output from a GIS, it is necessary
to be able to generate a wide variety of symbols to replace the primitive
point, line and area objects stored in the database.
A12 Plotting (pl)
Creation of hard copy map output.
A13 Updating (up)
Updating of the digital data base with new points, lines,
polygons and attributes.
A14 Browsing (br)
Browse is used to search the data base to answer simple
locational queries, and includes pan and zoom.
B. DATA MANIPULATION AND ANALYSIS FUNCTIONS
B1 Create lists and reports (cl)
This is the ability to create lists and reports on objects and their attributes in user-defined formats, and to include totals and subtotals.
B2 Reclassify attributes (ra)
Reclassification is the change in value of a set of existing attributes based on a set of user-specified rules.
B3 Dissolve lines and merge attributes (dm)
Boundaries between adjacent polygons with identical attributes
are dissolved to form larger polygons.
B4 Line thinning and weeding (lt)
This process is used to reduce the number of points defining
a line or set of lines to a user defined tolerance.
B5 Line smoothing (ls)
Automatically smooth lines to a user-defined tolerance, creating a new set of points (compare B4).
B6 Complex generalization (cg)
Generalization which may require change in the type of
an object, or relocation in response to cartographic rules.
B7 Windowing (wi)
The ability to clip features in the database to some defined
polygon.
B8 Centroid calculation and sequential numbering (cn)
Calculate a contained, representative point in a polygon
and assign a unique number to the new object.
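One common way to compute such a representative point is the area-weighted centroid built from shoelace terms; note that the centroid of a concave polygon can fall outside it, so a production system may need to adjust the result. A minimal sketch (function and data names are illustrative):

```python
def polygon_centroid(pts):
    """Area-weighted centroid of a simple polygon given as a vertex list,
    using the shoelace cross-product terms."""
    a = cx = cy = 0.0
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        cross = x0 * y1 - x1 * y0   # signed contribution of this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                        # signed polygon area
    return (cx / (6.0 * a), cy / (6.0 * a))

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = polygon_centroid(square)
```

For the 2x2 square the centroid is its geometric center, to which a sequential identifier could then be assigned as the function description requires.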
B9 Spot heights (sh)
Given a digital elevation model, interpolate the height
at any point.
B10 Heights along streams (hs)
Given a digital elevation model and a hydrology net, interpolate
points along streams at fixed increments of height.
B11 Contours (isolines) (ci)
Given a set of regularly or irregularly spaced point values, interpolate contours at user-specified intervals.
B12 Elevation polygons (ep)
Given a digital elevation model, interpolate contours of height at user-specified intervals.
B13 Watershed boundaries (wb)
Given a digital elevation model and a hydrology net, interpolate
the position of the watershed between basins.
B14 Scale change (sc)
Perform the operations associated with change of scale,
which may include line thinning and generalization.
B15 Rubber sheet stretching (rs)
The ability to stretch one map image to fit over another,
given common points of known locations.
B16 Distortion elimination (de)
The ability to remove various types of systematic distortion
generated by different input methods.
B17 Projection change (pc)
The ability to transform maps from one map projection to
another.
B18 Generate points (gp)
The ability to generate points and insert them in the database.
B19 Generate lines (gl)
The ability to generate lines and insert them in the database.
B20 Generate polygons (ga)
The ability to generate polygons and insert them in the
database.
B21 Generate circles (gc)
The ability to generate circles defined by center point
and radius.
B22 Generate grid cell nets (gg)
The ability to generate a network of grid cells given a
point of origin, grid cell dimension and orientation.
B23 Generate latitude/longitude nets (gn)
The ability to generate graticules for a variety of map
projections.
B24 Generate corridors (gb)
This process generates corridors of given width around
existing points, lines or areas.
B25 Generate graphs (gr)
Create a graph illustrating attribute data by symbols,
bars or fitted trend line.
B26 Generate viewshed maps (gv)
Given a digital elevation model and the locations of one
or more viewpoints, generate polygons enclosing the area visible from
at least one viewpoint.
B27 Generate perspective views (ge)
From a digital elevation model, generate a three-dimensional block diagram.
B28 Generate cross sections (cs)
Given a digital elevation model, show the cross-section along a user-specified line.
B29 Search by attribute (sa)
The ability to search the data base for objects with certain
attributes.
B30 Search by region (sr)
The ability to search the data base within any region defined
to the system.
B31 Suppress (su)
The ability to exclude objects by attribute (the converse
of selecting by attribute).
B32 Measure number of items (mi)
The ability to count the number of objects in a class.
B33 Measure distances along straight and convoluted
lines (md)
The ability to measure distances along a prescribed line.
B34 Measure length of perimeter of areas (mp)
The ability to measure the length of the perimeter of a
polygon.
B35 Measure size of areas (ma)
The ability to measure the area of a polygon.
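Polygon area is computed directly from the vertex coordinates by the standard shoelace formula; a minimal sketch with an invented parcel:

```python
def polygon_area(pts):
    """Shoelace formula: area of a simple polygon given as a vertex list."""
    s = 0.0
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        s += x0 * y1 - x1 * y0      # signed cross-product of each edge
    return abs(s) / 2.0

# Hypothetical 4 x 3 rectangular parcel.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
area = polygon_area(parcel)         # 12.0
```

The same formula underlies area measurement in vector GIS, applied ring by ring with holes subtracted.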
B36 Measure volume (mv)
The ability to compute the volume under a digital representation
of a surface.
B37 Calculate - arithmetic (ca)
The ability to perform arithmetic, algebraic and Boolean
calculations separately and in combination.
B38 Calculate bearings between points (cb)
The ability to calculate the bearing (with respect to True
North) from a given point to another point.
B39 Calculate vertical distance or height (ch)
Given a digital elevation model, calculate the vertical
distance (height) between two points.
B40 Calculate slopes along lines (gradients) (al)
The ability to measure the slope between two points of
known height and location or to calculate the gradient between any two
points along a convoluted line which contains two or more points of known
elevation.
B41 Calculate slopes of areas (sl)
Given a digital elevation model and the boundary of a specified
region (e.g., a part of a watershed), calculate the average slope of the
region.
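Slope at a single DEM cell is commonly estimated by finite differences on the elevation grid (averaging over the region's cells then gives the B41 result); Horn-style 3x3 neighborhoods are a common refinement. A minimal central-difference sketch with an invented grid and cell size:

```python
from math import atan2, degrees, hypot

# Hypothetical 3x3 DEM: elevation rises 2 units per cell to the east.
dem = [
    [10.0, 12.0, 14.0],
    [10.0, 12.0, 14.0],
    [10.0, 12.0, 14.0],
]
cell = 10.0  # cell size, same units as elevation

def cell_slope_deg(grid, i, j, cell):
    """Slope in degrees at interior cell (i, j) by central differences."""
    dzdx = (grid[i][j + 1] - grid[i][j - 1]) / (2.0 * cell)
    dzdy = (grid[i + 1][j] - grid[i - 1][j]) / (2.0 * cell)
    return degrees(atan2(hypot(dzdx, dzdy), 1.0))

slope = cell_slope_deg(dem, 1, 1, cell)
```

Here the gradient is 0.2 (2 units of rise per 10 units of run), so the slope is a little over 11 degrees; aspect (B42) comes from the same two partial derivatives via atan2(dzdy, dzdx).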
B42 Calculate aspect of areas (aa)
Given a digital elevation model and the boundary of a specified
region, calculate the average aspect of the region.
B43 Calculate angles and distances along linear features
(ad)
Given a prescribed linear feature, generalize its shape into a set of angles and distances from a start point, at user-set angular increments, and constrained to any known points along the linear feature.
B44 Subdivide area according to a set of rules (sb)
Given the corner points of a rectangular area, topologically
subdivide the area into four quarters.
B45 Locations from traverses (lo)
Given a direction (one of eight radial directions) and
distance from a given point, calculate the end point of the traverse.
B46 Statistical functions (sf)
The ability to carry out simple statistical analyses and
tests on the database.
B47 Graphic overlay (go)
The ability to superimpose graphically one map on another
and display the result on a screen or on a plot.
B48 Point in polygon (pp)
The ability to superimpose a set of points on a set of
polygons and determine which polygon (if any) contains each point.
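A common implementation of this test is even-odd ray casting; this sketch (names and coordinates are illustrative) toggles a flag each time a ray running to the right from the query point crosses a polygon edge:

```python
def point_in_polygon(x, y, pts):
    """Even-odd ray casting: point is inside if a ray from (x, y)
    toward +x crosses the polygon boundary an odd number of times."""
    inside = False
    n = len(pts)
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        if (y0 > y) != (y1 > y):          # edge spans the ray's y level
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:               # crossing lies to the right
                inside = not inside
    return inside

# Hypothetical 4x4 square polygon.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
```

Running every point against every polygon this way is O(points x edges); real GIS implementations add spatial indexing so each point is tested only against nearby polygons.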
B49 Line on polygon overlay (lp)
The ability to superimpose a set of lines on a set of polygons,
breaking the lines at intersections with polygon boundaries.
B50 Polygon overlay (op)
The ability to overlay digitally one set of polygons on
another and form a topological intersection of the two, concatenating
the attributes.
B51 Sliver polygon removal (sp)
The ability to delete automatically the small sliver polygons
which result from a polygon overlay operation when certain polygon lines
on the two maps represent different versions of the same physical line.
B52 Line of sight (ln)
The ability to determine the intervisibility of two points,
or to determine those parts of pairs of lines or polygons which are intervisible.
B53 Nearest neighbor search (nn)
The ability to identify points, lines or polygons that
are nearest to points, lines or polygons specified by location or attribute.
B54 Shortest route (ps)
The ability to determine the shortest or minimum cost route
between two points or specified sets of points.
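Dijkstra's algorithm is the classic method for this; a minimal sketch over a hypothetical directed road network stored as an adjacency dict of (neighbor, cost) lists:

```python
import heapq

def shortest_path_cost(graph, src, dst):
    """Dijkstra's algorithm: minimum total cost from src to dst,
    or infinity if dst is unreachable."""
    dist = {src: 0.0}
    pq = [(0.0, src)]                     # priority queue of (cost, node)
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

# Hypothetical network: A-D direct via B costs 7, via C then B costs 6.
roads = {
    "A": [("B", 4.0), ("C", 2.0)],
    "C": [("B", 1.0), ("D", 7.0)],
    "B": [("D", 3.0)],
}
```

The same machinery, with link attributes as costs, underlies minimum-cost routing; the traveling salesman and traffic assignment problems in Section 5 need heavier heuristics on top of it.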
B55 Contiguity analysis (co)
The ability to identify areas that have a common boundary
or node.
B56 Connectivity analysis (cy)
The ability to identify areas or points that are (or are
not) connected to other areas or points by linear features.
B57 Complex correlation (cx)
The ability to compare maps representing different time
periods, extracting differences or computing indices of change.
B58 Weighted modelling (wm)
The ability to assign weighting factors to individual data
sets according to a set of rules and to overlay those data sets and carry
out reclassify, dissolve and merge operations on the resulting concatenated
data set.
B59 Scene generation (sg)
The ability to simulate an image of the appearance of an
area from map data. The image would normally consist of an oblique view,
with perspective.
B60 Network analysis (na)
Simple forms of network analysis are covered in Shortest
route and Connectivity. More complex analyses are frequently carried out
on network data by electrical and gas utilities, communications companies
etc. These include the simulation of flows in complex networks, load balancing
in electrical distribution, traffic analysis, and computation of pressure
loss in gas pipes. In many cases these capabilities can be found in existing
packages which can be interfaced to the GIS database.
Other groupings of GIS functions:
Berry, J.K., 1987, "Fundamental operations in computer-assisted
map analysis". International Journal of GIS 1: 119-136.
 Measuring distance and connectivity
 Characterizing neighborhoods
Goodchild, M.F., 1988, "Towards an enumeration and classification
of GIS functions". Proceedings, IGIS '87
Tomlin, Dana, 1990. Geographic Information Systems and
Cartographic Modeling. Prentice Hall.
based on a standard, semi-formal taxonomy of analytic functions
for raster data
 Local: operations that process a single cell
 Focal: operations that process a cell and a fixed neighborhood
 Zonal: operations that process an area of homogeneous
characteristics
 Global: operations that process the entire map
Maguire, David, 1991. Chapter 21: The Functionality of GIS.
In D.J. Maguire, M.F. Goodchild and D.W. Rhind, editors, Geographical
Information Systems: Principles and Applications. Longman, London.
A Six-way Classification of Spatial Analysis
1. Query and reasoning
based on database views
catalog
map
table
histogram
scatterplot
linked views
2. Measurement
simple geometric measurements associated with
objects
area, distance, length, perimeter, shape
3. Transformation
buffers
point in polygon
polygon overlay
interpolation
density estimation
4. Descriptive summaries
centers
dispersion
spatial dependence
fragmentation
5. Optimization
best routes
raster version
network version
Paul's ride
best locations
6. Hypothesis testing
inference from sample to population
Integration of GIS and Spatial Analysis
1. Full integration (embedding)
 spatial analysis as GIS commands
 requires modification of source code
 difficult with proprietary packages
 analysis is not the strongest commercial motivation
 third party macros, scripting languages
2. Loose coupling
 unsatisfactory
 hooks too awkward
 loss of higher structures in data
 transfer of simple tables
3. Close coupling
 discretization problem
 discretization often not explicit in models
 e.g. slope, length
 user interface design
 models easy to use?
 the user-friendly grand piano
 user community is already frustrated
SECTION 2
SPATIAL STATISTICS
Section 2: Spatial statistics. Simple measures
for exploring geographic information; the value of the spatial perspective
on data; intuition and where it fails; applications in crime analysis,
emergencies, incidence of disease:
 Measures of spatial form  centrality, dispersion,
shape.
 Spatial interpolation  intelligent spatial guesswork
 spatial outliers.
 Exploratory spatial analysis  moving windows, linking
spatial and other perspectives.
 Hypothesis tests  randomness, the null hypothesis,
and how intuition can be misleading.
Measures of spatial form:
How to sum up a geographical distribution in a simple measure?
Two concepts of space are relevant:
Continuous:
 travel can occur anywhere
 best for small scales, or where a network is too
complex or too costly to capture or represent
an infinite number of locations exist
a means must exist to calculate distances between
any pair of locations, e.g. using straight lines
Discrete:
 travel can occur only on a network
 only certain locations (on the network) are feasible
 all distances (between all possible pairs of locations)
can be evaluated using any measure (travel time, cost of transportation
etc.)
In discrete space places are identified as objects; in continuous
space, places are identified by coordinates
A metric is a means of measuring distance between
pairs of places (in continuous space)
 e.g. straight lines (the Pythagorean metric)
e.g. by moves in N-S and E-W directions (the Manhattan
or city-block metric)
simple metrics can be improved using barriers or routes
of lower travel cost (freeways)
The most useful single measure of a geographical distribution
of objects is its center
Definitions of center:
The centroid
 computed by taking a weighted average of coordinates
 the point about which the distribution would balance
 the basis for the US Center of Population (now in MO
and still moving west)
The centroid is not the point for which half of the distribution
is to the left, half to the right, half above and half below
 this is the bivariate median
The centroid is not the point that minimizes aggregate distance
(the point such that, if the objects were people and they all traveled to it,
the total distance traveled would be a minimum)
 this is the point of minimum aggregate travel
(MAT), sometimes called the median (very confusingly)
 for many years the US Bureau of the Census calculated
the Center of Population as the centroid, but gave the MAT definition
 there is a long history of confusion over the MAT
 no ready means exist to calculate its location
 the MAT must be found by an iterative process
 an interesting way of finding the MAT makes use of
a valid physical analogy to the resolution of forces  the Varignon
frame
 on a network, the MAT is always at a node (junction
or point where there is weight)
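Since no closed form exists, the MAT is usually found by Weiszfeld's iterative algorithm, which starts from the centroid and repeatedly re-averages the coordinates, weighting each object by the inverse of its distance to the current estimate. A minimal sketch (the function name and the fixed iteration count are our own choices; a production version would test for convergence and handle the estimate landing exactly on a data point):

```python
def weiszfeld(points, weights, iterations=100):
    """points: list of (x, y); weights: matching list of weights."""
    # start from the centroid (the weighted mean of coordinates)
    total_w = sum(weights)
    x = sum(w * px for w, (px, py) in zip(weights, points)) / total_w
    y = sum(w * py for w, (px, py) in zip(weights, points)) / total_w
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for w, (px, py) in zip(weights, points):
            d = ((px - x) ** 2 + (py - y) ** 2) ** 0.5
            if d == 0:
                continue  # sketch only: estimate sits on a data point
            num_x += w * px / d
            num_y += w * py / d
            denom += w / d
        # re-average, weighted by inverse distance
        x, y = num_x / denom, num_y / denom
    return x, y
```

Each iteration reduces aggregate weighted distance, mirroring the physical balancing of forces in the Varignon frame.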
The definition of centrality becomes more difficult
on the sphere
e.g. the centroid is below the surface
the centroid of the Canadian population in 1981 was
about 90km below White River, Ontario
the bivariate median (defined by latitude and longitude)
was at the intersection of the meridian passing through Toronto and
the parallel through Montreal, near Burke's Falls, Ontario
the MAT point (assuming travel on the surface by
great circle paths) was in a school yard in Richmond Hill, Ontario
What use are centers?
 for tracking change in geographic distributions, e.g.
the march of the US Center of Population westward is still worth national
news coverage
 for identifying most efficient locations for activities
 location at the MAT minimizes travel
 a central facility should be located to minimize
travel to the geographic distribution that it serves
 should we use continuous or discrete space?
 this technique was considered so important to central
planning in the Soviet Union in the early 20th century that an entire
laboratory was founded
 the Mendeleev Centrographic Laboratory flourished
in Leningrad around 1925
 centers are often used as simplifications of complex
objects
 at the lower levels of census geography in many countries
 e.g. ED in US, EA in Canada, ED in UK
 to avoid the expense of digitizing boundaries
 e.g. land parcel databases
 or where boundaries are unknown or undefined
 e.g. ZIPs
 in the census examples, common practice is to eyeball
a centroid
 some very effective algorithms have been developed
for redistributing population from centroids
Measures of dispersion:
 what you would want to know if you could have two measures
of a geographical distribution
 the spread of the distribution around its center
 average distance from the center
 measures of dispersion are used to indicate positional
accuracy
 the error ellipse
 the Tissot indicatrix
 the CMAS
Potential measures:
 a measure which increases with the weight of geographic
objects and with proximity to them
 calculated as:
V = summation of (w(i)/d(i))
where i is an object, d is the distance to the object and
w is its weight
the summation can be carried out at any location
V can be mapped  a "potential" map
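The potential calculation can be sketched directly from the formula V = sum of w(i)/d(i); the objects (weights and coordinates) in the example are invented, and distances are straight-line:

```python
def potential(x, y, objects):
    """objects: list of (weight, ox, oy) tuples.
    Returns V = sum over objects of w / d at location (x, y)."""
    v = 0.0
    for w, ox, oy in objects:
        d = ((ox - x) ** 2 + (oy - y) ** 2) ** 0.5
        v += w / d  # undefined at d == 0, as in the formula itself
    return v
```

Evaluating the function over a grid of locations yields the "potential" map.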
Potential is a useful measure of:
 the market share obtainable by locating at a point
 the best location is the place of maximum potential
 population pressure on a recreation facility
 accessibility to a geographic distribution
 e.g. a network of facilities
 potential measures omit the "alternatives" factor
 imply that market share can potentially increase
without limit
 potential measures have been used as predictors of
growth
 economic growth most likely in areas of highest potential
 potential calculation exists as a function in SPANS
GIS
 the objects used to calculate potential must be discrete
in an empty space
 adding new objects will increase potential without
limit
 it makes no sense to calculate potential for a set
of points sampled from a field
 potential makes sense only in the object view
Potential measures and density estimation
think of a scatter of points representing people
how to map the density of people?
replace each dot by a pile of sand, superimposing the
piles
the amount of sand at any point represents
the number and proximity of people
the shape of the pile of sand is called the kernel
function
example  density estimation
and Chicago crime
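The pile-of-sand description corresponds to kernel density estimation: each point contributes one kernel and the kernels are summed. A minimal sketch using a Gaussian kernel, one common choice of kernel function (the bandwidth h and coordinates are invented):

```python
import math

def density(x, y, points, h=1.0):
    """Kernel density estimate at (x, y) from a list of (px, py) points."""
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        # Gaussian kernel: the "shape of the pile of sand";
        # h controls how widely each pile spreads
        total += math.exp(-d2 / (2 * h * h)) / (2 * math.pi * h * h)
    return total
```

A larger h gives a smoother surface; a smaller h preserves local detail.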
Measures of shape:
 shape has many dimensions, no single measure can capture
them all
 many measures of shape try to capture the difference
between compact and distended
 many of these are based on a comparison of the shape's
perimeter with that of a circle of the same area
 e.g. shape = perimeter / (3.54 * sqrt(area))
 this measure is 1.0 for a circle, larger for a distended
shape
 all of these measures based on perimeter suffer from
the same problem
 within a GIS, lines and boundaries are represented
as straight line segments between points
 this will almost always result in a length that is
shorter than the real length of the curve, unless the real shape is
polygonal
 consequently the measure of shape will be too low,
by an undetermined amount
 shape (compactness) is a useful measure to detect
gerrymanders in political districting
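The perimeter-based compactness measure can be computed for a digitized polygon using the shoelace formula for area; note that 3.54 is 2*sqrt(pi) rounded. A minimal sketch assuming a vertex-list representation:

```python
import math

def compactness(polygon):
    """shape = perimeter / (2 * sqrt(pi) * sqrt(area));
    1.0 for a circle, larger for a distended shape."""
    n = len(polygon)
    perimeter = 0.0
    area2 = 0.0  # twice the signed area
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        perimeter += math.hypot(x2 - x1, y2 - y1)
        area2 += x1 * y2 - x2 * y1  # shoelace formula
    area = abs(area2) / 2.0
    return perimeter / (2 * math.sqrt(math.pi) * math.sqrt(area))
```

Because digitized boundaries understate true perimeter, values computed this way are biased low, as noted above.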
Spatial Interpolation
Spatial interpolation is defined as a process of
determining the characteristics of objects from those of nearby objects
 of guessing the value of a field at locations where
value has not been measured
The objects are most often points (sample observations) but
may be lines or areas
The attributes are most often interval-scaled (elevations)
but may be of any type
From a GIS perspective, spatial interpolation is a process
of creating one class of objects from another class
Spatial interpolation is often embedded in other processes,
and is often used as part of a display process
e.g. to contour a surface from a set of sample points,
it is necessary to use a method of spatial interpolation to determine
where to place the contours among the points
Many methods of spatial interpolation exist:
Distanceweighted interpolation
Known values exist at n locations i=1,...,n
The value at a location x_{i} is denoted
by z(x_{i})
We need to guess the value at location x, denoted
by z(x)
The guessed value is an average over the known values at
the sample points
 the average is weighted by distance so that nearby
points have more influence.
Let d(x_{i},x) denote the distance from
location x, where we want to make a guess, to the ith sample point.
Let w[d] denote the weight given to a point at distance
d in calculating the average.
The estimate at x is calculated as:
z(x) = summation over every point i (w[d(x_{i},x)]
z(x_{i})) / summation over every point i (w[d(x_{i},x)])
in other words, the average weighted by distance.
The simplest kind of weight is a switch  a weight of 1
is given to any points within a certain distance of x, and a weight
of 0 to all others
this means in effect that z(x) is calculated
as the average over points within a window of a certain radius.
Better methods include weights which are continuous, decreasing
functions of distance such as an inverse square:
w[d] = d^{-2}
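Distance-weighted (IDW) interpolation can be sketched directly from the weighted-average formula, using an inverse-power weight (sample points are invented):

```python
def idw(x, y, samples, power=2):
    """Inverse-distance-weighted estimate at (x, y).
    samples: list of (sx, sy, sz) tuples of known values."""
    num = denom = 0.0
    for sx, sy, sz in samples:
        d2 = (sx - x) ** 2 + (sy - y) ** 2
        if d2 == 0:
            return sz  # exactly on a sample point
        w = d2 ** (-power / 2)  # w[d] = d^(-power)
        num += w * sz
        denom += w
    return num / denom
```

Because the weights are all positive, every estimate lies between the minimum and maximum sample values, illustrating the drawbacks listed below.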
All of the distance-weighted methods (e.g. IDW) share the
same positive features and drawbacks. They are:
 easy to implement and conceptually simple
 adaptable  the weighting function can be changed to
suit the circumstances. It is even possible to optimize the weighting
function in this sense:
Suppose the weighting function has a parameter, such
as the size of the window
Set the window size to some test value
Then select one of the sample points, and use the method
to interpolate at that point by averaging over the remaining n-1 sample
values
Compare the interpolated value to the known value at
that point. Repeat for all n points and average the errors
Then the best window size (parameter value) is the
one which minimizes total error
In most cases this will be a non-zero and non-infinite
value.
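The leave-one-out procedure just described can be sketched directly. Here the tunable parameter is the exponent of an inverse-distance weight (our own choice of example parameter, standing in for the window size in the text):

```python
def loo_error(samples, power):
    """Total squared leave-one-out error for a given weight exponent.
    samples: list of (sx, sy, sz) tuples."""
    total = 0.0
    for i, (sx, sy, sz) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # drop point i
        num = denom = 0.0
        for ox, oy, oz in rest:
            d2 = (ox - sx) ** 2 + (oy - sy) ** 2
            w = d2 ** (-power / 2)
            num += w * oz
            denom += w
        total += (num / denom - sz) ** 2  # squared error at point i
    return total

def best_power(samples, candidates):
    """The candidate parameter value minimizing total error."""
    return min(candidates, key=lambda p: loo_error(samples, p))
```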
 all interpolated values must lie between the minimum
and maximum observed values, unless negative weights are used
 This means that it is impossible to extrapolate trends
 If there is no data point exactly at the top of a
hill or the bottom of a pit, the surface will smooth out the feature
 the interpolated surface cannot extrapolate a trend
outside the area covered by the data points  the value at infinity
must be the arithmetic mean of the data points
Although distanceweighted methods underlie many of
the techniques in use, they are far from ideal
Polynomial surfaces
A polynomial function is fitted to the known values  interpolated
values are obtained by evaluating the function
e.g. planar surface  z(x,y) = a + bx + cy
e.g. cubic surface  z(x,y) = a + bx + cy + dx^{2}
+ exy + fy^{2} + gx^{3} + hx^{2}y + ixy^{2}
+ jy^{3}
 useful only when there is reason to expect that the
surface can be described by a simple polynomial in x and y
 very sensitive to boundary effects
Kriging
Most real surfaces are observed to be spatially autocorrelated
 that is, nearby points have values which are more similar than distant
points.
The amount and form of spatial autocorrelation can be described
by a variogram, which shows how differences
in values increase with geographical separation
Observed variograms tend to have certain common features
 differences increase with distance up to a certain value known as the
sill, which is reached at a distance known as the range.
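An empirical variogram can be computed by binning all pairs of sample points by separation distance and averaging half the squared difference in value within each bin. A minimal sketch (bin width and sample values invented):

```python
def variogram(samples, bin_width=1.0, n_bins=5):
    """samples: list of (x, y, z) tuples.
    Returns mean semivariance per distance bin (None if bin empty)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(samples)
    for i in range(n):
        for j in range(i + 1, n):
            x1, y1, z1 = samples[i]
            x2, y2, z2 = samples[j]
            d = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            b = int(d / bin_width)
            if b < n_bins:
                sums[b] += 0.5 * (z1 - z2) ** 2  # semivariance
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Plotting the binned values against distance reveals the sill and range described above.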
To make estimates by Kriging, a variogram is obtained from
the observed values or past experience
 Interpolated best-estimate values are then calculated
based on the characteristics of the variogram.
 perhaps the most satisfactory method of interpolation
from a statistical viewpoint
 difficult to execute with large data sets
 decisions must be made by the user, requiring either
experience or a "cookbook" approach
 a major advantage of Kriging is its ability to output
a measure of uncertainty of each estimate
 This can be used to guide sampling programs by identifying
the location where an additional sample would maximally decrease uncertainty,
or its converse, the sample which is most readily dropped.
Locally-defined functions
Some of the most satisfactory methods use a mosaic approach
in which the surface is locally defined by a polynomial function, and
the functions are arranged to fit together in some way
With a TIN data structure it is possible to describe the
surface within each triangle by a plane
 Planes automatically fit along the edges of each triangle,
but slopes are not continuous across edges
 This discontinuity can be masked by smoothing the appearance
of contours drawn to represent the surface
 Alternatively a higherorder polynomial can be used
which is continuous in slopes.
Another popular method fits a plane at each data point, then
achieves a smooth surface by averaging planes at each interpolation point
Hypothesis tests:
 compare patterns against the outcomes expected from
well-defined processes
 if the fit is good, one may conclude that the process
that formed the observed pattern was like the one tested
 unfortunately, there will likely be other processes
that might have formed the same observed pattern
 in such cases, it is reasonable to ignore them as
long as a) they are no simpler than the hypothesized process, and
b) the hypothesized process makes conceptual sense
 the best known examples concern the processes that
can give rise to certain patterns of points
 attempts to extend these to other types of objects
have not been as successful
Point pattern analysis
 a commonly used standard is the random or Poisson process
 in this process, points are equally likely to occur
anywhere, and are located independently, i.e. one point's location
does not affect another's
 CSR = complete spatial randomness
 a real pattern of points can be compared to this process
 most often, the comparison is made using the average
distance between a point and its nearest neighbor
 in a random pattern (a pattern of points generated
by the Poisson process) this distance is expected to be 1/(2 * sqrt(density))
where density is the number of points per unit area, and area is measured
in units consistent with the measurement of distance
 when the number of points is limited, we would expect
to come close to this estimate in a random pattern
 theory gives the limits within which average distance
is expected to lie in 95% of cases
 if the actual average distance falls outside these
limits, we conclude that the pattern was not generated randomly
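The comparison can be sketched as an observed-to-expected ratio (the nearest-neighbor index of Clark and Evans); the 95% limits from theory are not computed here, and the example coordinates are invented:

```python
def nn_statistic(points, area):
    """Ratio of observed mean nearest-neighbor distance to the
    Poisson expectation 1 / (2 * sqrt(density)).
    ~1 random, < 1 clustered, > 1 uniformly spaced."""
    n = len(points)
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        nearest = min(
            ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
            for j, (x2, y2) in enumerate(points) if j != i
        )
        total += nearest
    observed = total / n
    expected = 1.0 / (2.0 * (n / area) ** 0.5)
    return observed / expected
```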
There are two major options for non-random patterns:
 the pattern is clustered
 points are closer together than they should be
 the presence of one point has made other points more
likely in the immediate vicinity
 some sort of attractive or contagious process is
inferred
 the pattern is uniformly spaced
 points are further apart than they should be
 the presence of one point has made other points less
likely in the vicinity
 some sort of repulsive process is inferred, or some
sort of competition for space
Unfortunately it is easy for this process of inference to
come unstuck
 the process that generated the pattern may be non-random,
but not sufficiently so to be detectable by this test
 this false conclusion is more likely reached when
there is little data  the more data we have, the more likely we are
to detect differences from a simple random process
 in statistics, this is known as a Type II error 
accepting the null hypothesis when in fact it is false
 the process may be non-random, but not in either of
the senses identified above  contagious or repulsive
 points may be located independently, but with non-uniform
density, so that points are not equally likely everywhere
 it is possible to hypothesize more complex processes,
but the test becomes progressively weaker at confirming them
SECTION 3
SPATIAL INTERACTION MODELS
Section 3: Spatial interaction models. What they
are and where they're used; calibration and "what-if"; trade area analysis
and market penetration:
 The Huff model and variations.
 Site modeling for retail applications  regression,
analog, spatial interaction.
 Modeling the impact of changes in a retail system.
 Calibrating spatial interaction models in a GIS environment.
What is a spatial interaction model?
 a model used to explain, understand, predict the level
of interaction between different geographic locations
 examples of interactions:
 migration (number of migrants between pairs of states)
 phone traffic (number of calls between pairs of cities)
 commuting (number of vehicles from home to workplace)
 shopping (number of trips from home to store)
 recreation (number of campers from home to campsite)
 trade (amount of goods between pairs of countries)
 interaction is always expressed as a number or quantity
per unit of time
 interaction occurs between defined origin and destination
 these may be the same or different classes of objects
 e.g. the same class in the case of migration between
states
 e.g. different classes in the case of journeys to
shop or work
 the matrix of interactions can be square or rectangular
Interaction is believed to be dependent on:
 some measure of the origin (its propensity to generate
interaction)
 some measure of the destination (its propensity to
attract interaction)
 some measure of the trip (its propensity to deter interaction)
 these measures are assumed to multiply
Let:
i denote an origin object (often an area)
j denote a destination object (a point or area)
I^{*}_{ij} denote the observed interaction
between i and j, measured in appropriate units (e.g. numbers of trips,
flow of goods, per defined interval of time)
I_{ij} denote the interaction predicted by
the spatial interaction model
 if the model is good (fits well), the predicted
interactions per interval of time will be close in value to the
observed interactions
 each I_{ij} will be close to its corresponding
I^{*}_{ij}
E_{i} denote the emissivity of the origin area
i
A_{j} denote the attraction of the destination
area j
C_{ij} denote the deterrence of the trip between
i and j (probably some measure of the trip length or cost)
a denote a constant to be determined
Then the most general form of spatial interaction model is:
I_{ij} = a E_{i} A_{j} C_{ij}
 that is, interaction can be predicted from the product
of a constant, emissivity, attraction and deterrence
The model began life in the mid 19th century as an attempt
to apply laws of gravitation to human communities  the gravity model
 such ideas of social physics have long since
gone out of fashion, but the name is still sometimes used
 even in the form above, the model bears some relationship
to Newton's Law of Gravitation
In any application of the model, some aspects are assumed
to be unknown, and determined by calibration
 e.g. the value of a might be unknown in a given application
 its value would be calibrated by finding the value
that gives the best fit between the observed interactions and the interactions
predicted by the model
 the conventional measure of fit is the total squared
difference between observation and prediction, that is, the summation
over i and j of (I_{ij} - I^{*}_{ij})^{2}
 this is known as least squares calibration
 other unknowns might be the method of calculating deterrence
(C_{ij}) from distance, or the attraction value to give to certain
retail stores
Measurement of the variables:
C_{ij}
 deterrence is often strongly related to distance
 the further the distance, the less interaction and
thus the lower C_{ij}
 a common choice is a decreasing function of distance:
C_{ij} = d_{ij}^{-b}
(that is, C_{ij} = 1 / d_{ij}^{b})
or C_{ij} = exp(-b d_{ij})
(exp denotes e = 2.71828... raised to the given power)
 generally the fit of the model is not sufficiently
good to distinguish between these two, that is, to identify which gives
the better fit
 the negative exponential has a minor technical advantage
in not creating problems when d_{ij} = 0 (origin and destination
are the same place)
 the b parameter is unknown and must be calibrated
 its value depends on the type of interaction, and
also probably on the region
 b has units in the negative exponential case (1/distance)
but none in the negative power case
 other measures of deterrence include:
 some function of transport cost
 some function of actual travel time
 in either case the function used is likely to be
the negative power or negative exponential above
 there are examples where distance has a positive
effect on interaction
E_{i}
 how to measure the propensity of each origin to emit
interaction?
 the simplest measure is the total population of the origin
 a more appropriate measure weights each cohort,
e.g. age and sex cohorts
 some cohorts are more likely to interact than others
 E_{i} could be treated as unknown and calibrated
A_{j}
 the propensity of each destination to attract interaction
 could be unknown and calibrated
 for shopping models, gross floor area of retail space
is often used
 some forms of interaction are symmetrical
 flow from origin to destination equals reverse flow
 e.g. phone calls
 requires E_{i} and A_{j} to be the
same, e.g. population
The Huff model
what happens when a new destination is added?
interactions with existing destinations are unaffected
assumes outflow from origins can increase without
limit
in practice, in many applications flow from origin
to existing destinations will be diverted
we need some form of "production constraint"
Huff proposed this change: divide each predicted flow by the
sum of interaction to all destinations from a given origin:
I_{ij} = E_{i} A_{j} C_{ij} / summation over every destination k of (A_{k} C_{ik})
 that is, total interaction from an origin will always
equal E_{i} regardless of the number and locations of destinations
 flow will now be partially diverted from existing
destinations to new ones
 E_{i} is now the total outflow, can be set
equal to the total of observed outflows from origin i
 the Huff model is consistent with the axiom of Independence
of Irrelevant Alternatives (IIA)
 the ratio of flows to two destinations from a given
origin is independent of the existence and locations of other destinations
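The production-constrained allocation can be sketched as follows; the attraction values, distances, and the negative-power deterrence exponent b are all invented. Adding a destination diverts flow from existing ones while total outflow stays at E_i:

```python
def huff_flows(E_i, attractions, distances, b=2.0):
    """Split total outflow E_i among destinations in proportion to
    A_j * C_ij, with negative-power deterrence C_ij = d^(-b)."""
    scores = [A / d ** b for A, d in zip(attractions, distances)]
    total = sum(scores)
    # flows always sum to E_i: the production constraint
    return [E_i * s / total for s in scores]
```

Because each flow is a fixed share of E_i, the ratio of flows to any two destinations is unaffected by other destinations, which is the IIA property noted above.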
Because of its production constraint, the Huff model is very
popular in retail analysis
it is often desirable to predict how much business
a new store will draw from existing ones
e.g. how much will a new mall draw business away
from downtown?
Other "what-if" questions:
 population of a neighborhood increases by x%
 ethnic mix of a neighborhood changes
 a new bridge is constructed
 an earthquake takes a freeway out of operation
 an anchor store moves out
 a store changes its signage
Site modeling for retail applications
three major areas:
use of the spatial interaction model
analog techniques
regression models
Analog:
 the business done by a new store or an old store operating
under changed circumstances is best estimated by finding the closest
analog in the chain
 criteria include:
 physical characteristics of each store
 intangibles such as management, signage
 local market area
 a GIS can help compare market areas (local densities,
street layouts, traffic patterns)
 a multimedia GIS can help with the intangibles
 bring up images of site, layout, signage...
Regression:
 identify all of the factors affecting sales, and construct
a model to predict based on these factors
 an enormous range of factors can affect sales
 some factors are exogenous
 determined by external, physical, measurable variables
 some of these travel with the store if it moves (site
factors), others are attributes of place (situation factors)
 other factors are endogenous
 determined by crowding, types of customers, trends,
advertising
 unpredictable, determined by the state of the system
Exogenous factors:
 site layout  on a corner? parking spaces, etc.
 trade area  number of households in primary, etc
 characteristics of neighborhood
Example model:
Sales per 2week period for convenience store:
$12749
+ 4542 if gas pumps on site
+ 3172 if major shopping center in vicinity
+ 3990 if side street traffic is transient
+ 3188 per curb cut on side street
+ 2974 if store stands alone
 1722 per lane on main street
 use of surrogate variables
 problems in use of model for prediction in planning
Calibration of the spatial interaction model
 many different circumstances
 major issues involved in calibration
 specific tools are available
 SIMODEL
 possible to use standard tools in e.g. SAS, GLIM
 calibration possible using aggregate flows or individual
choices
Linearization:
transformations to make the right hand side of the
equation a linear combination of unknowns, the left hand side known
Linearization of the unconstrained model:
 suppose the E_{i} are known, the A_{j}
unknown
 the constant a can be absorbed into the A_{j}
(i.e. find aA_{j})
 suppose we use the negative power deterrence function
I_{ij} = E_{i} A_{j}
/ d_{ij}^{b}
 move the E_{i} to the left:
I_{ij}/E_{i} = A_{j} / d_{ij}^{b}
take the logs of both sides:
log (I_{ij}/E_{i}) = log A_{j} - b log d_{ij}
 now a trick: introduce a set of dummy variables
u_{ijk}, set to 1 if j=k, otherwise zero:
log (I_{ij}/E_{i}) = u_{ij1} log A_{1} + u_{ij2} log A_{2} + ... - b log d_{ij}
 now the left hand side is all knowns, the right hand
side is a linear combination of unknowns (the logs of the As and b)
 the model can now be calibrated (the unknowns can be
determined) using ordinary multiple regression in a package like SAS
 it may be easier to avoid linearizing altogether by
using the nonlinear regression facilities in many packages
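The linearized calibration can be sketched end to end: build the dummy-variable design matrix, solve the least-squares normal equations (a tiny stand-in for the regression step done in a package like SAS), and recover the A_j and b. All data in the test are invented, and the small Gaussian-elimination solver is our own:

```python
import math

def solve(A, y):
    """Gauss-Jordan elimination for a small square system A x = y."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def calibrate(E, d, I):
    """E[i]: known emissivities; d[i][j]: distances; I[i][j]: observed flows.
    Fits I_ij = E_i * A_j / d_ij^b; returns [A_1, ..., A_J, b]."""
    J = len(d[0])
    rows, ys = [], []
    for i in range(len(E)):
        for j in range(J):
            # log(I_ij / E_i) = log A_j - b * log d_ij
            dummies = [1.0 if k == j else 0.0 for k in range(J)]
            rows.append(dummies + [-math.log(d[i][j])])
            ys.append(math.log(I[i][j] / E[i]))
    # least squares via the normal equations X'X beta = X'y
    p = J + 1
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    Xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(p)]
    beta = solve(XtX, Xty)
    return [math.exp(v) for v in beta[:J]] + [beta[J]]
```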
The objective function:
 normally, we would try to maximize the fit of the observed
and predicted interactions
 linearization changes this
 e.g. we minimize the squared differences between
observed and predicted values of log (I_{ij}/E_{i})
if ordinary regression is used on the linearized form above
 this is easy in practice, but makes no sense
 intuitively, an error of 30 in a prediction of 1000
trips is much more acceptable than an error of 30 in a prediction of
10 trips
 these ideas are formalized in the technique of Poisson
regression, which assumes that I_{ij} is a count of events,
and sets up the objective function accordingly
 the function minimized to get a good fit is roughly
the difference between observed and predicted, squared, divided by
the predicted flow
SECTION 4
SPATIAL DEPENDENCE
Section 4: Spatial dependence. Looking at causes
and effects in a geographical context:
 Spatial autocorrelation  what is it, how to measure
it with a GIS.
 The independence assumption and what it means for modeling
spatial data.
 Applying models that incorporate spatial dependence
 tools and applications.
Two concepts:
Spatial dependence
 what happens at one place depends on events in nearby
places
 all things are related but nearby things are more related
than distant things (Tobler's first law of geography)
 positive spatial dependence:
 nearby things are more alike than things are in general
 negative spatial dependence:
 nearby things are less alike than things are in general
 conceptual problems with negative spatial dependence
 e.g. the chessboard
 spatial autocorrelation measures spatial dependence
 an index, rather than a parameter of a process
 dependence between discrete objects, or dependence
in a continuous field?
 a world without positive spatial dependence would be
an impossible world
 impossible to map
 impossible to describe, live in
 hell is a place with no spatial dependence
Geary index:
compares the squared differences in value between neighboring
objects with overall variance in values
Moran index:
 calculates the product of values in neighboring objects
 related to Geary but not in a simple algebraic sense
Calculation of the Geary index of spatial autocorrelation:
c = (n-1) summation over i,j of (w_{ij} (x_{i} - x_{j})^{2}) / (2W summation over i of ((x_{i} - a)^{2}))
where a is the mean of the x values
w_{ij} = 1 if i,j adjacent, else 0
W is the sum of the w_{ij}
c is 1 if neighbors vary as much as the sample as a whole
c < 1 if neighbors are more similar than the sample
as a whole (positive dependence)
c > 1 if neighbors are less similar (negative dependence)
In the worked example (four areas): c = 3 x 16 / (2 x 10 x 2) = 48 / 40 = 1.2
 i.e. neighboring values are slightly more similar than
one would expect if the values were randomly allocated to the four areas
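The Geary calculation can be sketched generically; the four-area example below (values on a 2-by-2 grid with rook adjacency) is invented, not the one from the seminar:

```python
def geary(values, adjacency):
    """Geary's c for a list of area values and a list of symmetric
    (i, j) neighbor pairs (each pair listed once)."""
    n = len(values)
    mean = sum(values) / n
    var_sum = sum((v - mean) ** 2 for v in values)
    # count each pair in both directions, as the double sum does
    diff_sum = sum(2 * (values[i] - values[j]) ** 2 for i, j in adjacency)
    W = 2 * len(adjacency)
    return (n - 1) * diff_sum / (2 * W * var_sum)

# four areas in a 2x2 grid, rook adjacency
c = geary([1.0, 2.0, 1.0, 2.0], [(0, 1), (0, 2), (1, 3), (2, 3)])
```

Here c < 1, indicating neighbors slightly more similar than the sample as a whole.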
Continuous space
see the discussion of variograms and Kriging
the term geostatistics is normally associated
with continuous space, spatial statistics more with discrete
space
Measures of spatial dependence can be calculated in GIS:
 Idrisi calculates autocorrelation over a raster
 code has been written to calculate autocorrelation
in ARC/INFO (see NCGIA Technical Paper 91-5)
More extensive codes have been written using the statistical
packages, e.g. MINITAB, SAS
 contact Dan Griffith, Syracuse University; Luc Anselin,
University of Illinois
 some of these fail to take advantage of GIS capabilities,
for generating input data and displaying output
Spatial heterogeneity:
 suppose there is a relationship between number of AIDS
cases and number of people living in an area
 the form of this relationship will vary spatially
 in some areas the number of cases per capita will
be higher than in others
 we could map the constant of proportionality
 spatial heterogeneity describes this geographic variation
in the constants or parameters of relationships
 when it is present, the outcome of an analysis depends
on the area over which the analysis is made
 often this area is arbitrarily determined by a map
boundary or political jurisdiction
Geographically weighted regression (GWR)
fits a model such as y = a + bx
but assumes that the values of a and
b will vary geographically
determines a and b at any point by weighting
observations inversely by distance from that point
diagram
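The idea can be sketched in a few lines: fit an ordinary least-squares model at a point, but down-weight observations by their distance from that point. The data below are hypothetical, constructed so the slope is about 1 near one cluster and about 2 near another; production GWR implementations use kernel weights with a fitted bandwidth rather than raw inverse distance:

```python
import numpy as np

def gwr_point(x, y, coords, p, eps=1e-6):
    """Fit y = a + b*x at location p by weighted least squares,
    weighting each observation inversely by its distance from p."""
    d = np.linalg.norm(coords - np.asarray(p, dtype=float), axis=1)
    w = 1.0 / (d + eps)                       # inverse-distance weights
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                             # equivalent to X.T @ diag(w)
    a, b = np.linalg.solve(XtW @ X, XtW @ y)
    return a, b

# Hypothetical data: slope near 1 in the left cluster, near 2 in the right
coords = np.array([[0., 0.], [0., 1.], [1., 0.],
                   [10., 0.], [10., 1.], [11., 0.]])
x = np.array([1., 2., 3., 1., 2., 3.])
y = np.array([1., 2., 3., 2., 4., 6.])
a_left,  b_left  = gwr_point(x, y, coords, (0, 0.3))
a_right, b_right = gwr_point(x, y, coords, (10, 0.3))
```

Evaluating `gwr_point` over a grid of locations and mapping `b` is exactly "mapping the constant of proportionality" described above.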
Geographical brushing:
 a user-defined window is moved over the map
 analysis occurs only within the window
Conventional analysis (analysis done aspatially, e.g. using
a statistical package) assumes independence (no spatial dependence) and
homogeneity (no spatial heterogeneity)
 e.g. regression analysis assumes that the observations
(cases) are statistically independent
 this violates the first law of geography
 in general, analysis in space is very different from
conventional statistical analysis (although this is very often carried
out on spatial data)
An example:
 the relationship between land devoted to growing corn
and rainfall in a Midwestern state like Kansas
 rainfall available at 50 weather stations
 percent of land growing corn available for 100 counties
 use a method of spatial interpolation to estimate rainfall
in each county from the weather station data
 plot one variable against the other, and perhaps fit
a regression equation
 how many data points are there?
 the more data points, the more significant the results
 100 (the number of counties)?
 50 (the real number of weather observations)?
 something in between?
 more data points can be invented by intensifying the
sample network using spatial interpolation, but no more real data has
been created by doing so
 both variables are strongly spatially autocorrelated,
violating an assumption of regression
 the significance of the analysis is now uncertain
 methods of spatial regression try to overcome this
problem in a systematic way
 see Spacestat
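The interpolation step in the corn/rainfall example can be sketched as simple inverse-distance weighting; the station coordinates and rainfall values below are hypothetical, and note that applying this to 100 counties creates no new information beyond the 50 real observations:

```python
import numpy as np

def idw(stations, values, targets, power=2):
    """Estimate values at target points as inverse-distance-weighted
    averages of the station values."""
    stations = np.asarray(stations, dtype=float)
    values = np.asarray(values, dtype=float)
    out = []
    for t in np.asarray(targets, dtype=float):
        d = np.linalg.norm(stations - t, axis=1)
        if d.min() == 0.0:                    # target sits on a station
            out.append(values[d.argmin()])
            continue
        w = 1.0 / d ** power
        out.append((w * values).sum() / w.sum())
    return np.array(out)

# Two hypothetical stations; a county centroid midway between them
# receives the average of their values
rain = idw([[0, 0], [1, 0]], [10.0, 20.0], [[0.5, 0], [0, 0]])
print(rain)  # [15. 10.]
```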
An example
Crime and income in Los Angeles
rate of car thefts (per sq km per year)
median annual income in thousands
per census tract
5,000 observations
b = increase in car thefts per sq km per thousand
dollars median income
= 0.22
R^{2} = proportion of variation in car thefts
explained by income
= 0.26
is this significant?
is it significant at the 95% level of confidence?
in a population of millions of census tracts, exhibiting
the same range of rates of car thefts and median incomes, but no relationship
between them (b = 0, R^{2} = 0), could a sample of
5,000 census tracts have exhibited the same
degree of apparent relationship, or more, purely
by chance?
but, but, but...we don't have a random sample of a larger
population
there are only 5,000 tracts in LA and we have
all there is
A related issue - the modifiable areal unit problem (MAUP)
 many statistics are reported by averaging or summing
over polygons - e.g. populations of counties, average elevation
 it is commonly necessary to interpolate such values
to new polygons which do not coincide
 e.g. from census tracts with known populations to
school districts
 source zones have known populations
 populations of target zones are unknown
 the best method of solving this problem is to create
a continuous surface from the source data, then to integrate this surface
over the new target areas
Various assumptions can be made about the underlying surface:
 density is constant within source zones
 density is constant within target zones
 density is constant within some third set of control
zones
 density varies smoothly (Tobler's Pycnophylactic interpolation)
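Under the first assumption (constant density within source zones), areal interpolation reduces to allocating each source population in proportion to overlap area. A minimal sketch, with hypothetical zones and populations:

```python
import numpy as np

def areal_interpolate(source_pop, source_area, overlap):
    """Allocate source-zone populations to target zones assuming constant
    density within each source zone; overlap[i, j] is the intersection
    area of source zone i with target zone j."""
    density = source_pop / source_area              # people per unit area
    return (density[:, None] * overlap).sum(axis=0)

# Two hypothetical source zones; each target zone covers half of each source
pop = np.array([100.0, 200.0])
area = np.array([10.0, 20.0])
overlap = np.array([[5.0, 5.0],
                    [10.0, 10.0]])
target_pop = areal_interpolate(pop, area, overlap)
print(target_pop)  # [150. 150.] -- total population (300) is preserved
```

Preserving the total when reaggregating is the pycnophylactic (mass-preserving) property that Tobler's smooth-surface method also enforces.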
Analysis carried out on modifiable units can produce frightening
results
 two variables  % over 65, and % Republican
 correlation for the counties was .3466
Results of analysis using some alternative reporting zones:
6 Republican-proposed congressional districts .4823
6 Democrat-proposed congressional districts .6274
6 existing congressional districts .2651
6 urban/rural regional types .8624
6 functional regions .7128
By regrouping the counties into larger regions, Openshaw and
Taylor were able to generate a vast range of outcomes of the analysis:
 e.g. 48 regions - correlations between -.548 and +.886
 e.g. 12 regions - correlations between -.936 and +.996
What to do?
 are we asking the right question?
 is scale part of the question rather than a mere
matter of implementation?
SECTION 5
SPATIAL DECISION SUPPORT
Section 5 - Site selection - Locational analysis
and location/allocation - Other forms of operations research in spatial
analysis - Spatial decision support systems - Linking spatial analysis
with GIS to support spatial decision-making:
 Shortest path, traveling salesman, traffic assignment.
 What is location/allocation, and where can it be applied?
 Modeling the process of retail site selection. Criteria.
 Electoral districting and sales territories.
 What is an SDSS? What are its component parts? How
does it compare to a GIS or a DSS? Why would you want one? Building
SDSS.
 Examples of SDSS use  site selection, districting.
Methods of analysis on networks
A spatial database can be used to support the solution
of a variety of network problems, including optimal location, routing
and vehicle scheduling
these include:
Routing:
Shortest path problem
Traveling salesman problem and variants
Transshipment problem
Hitchcock transportation problem
Traffic assignment problem
Location:
P-median problem
Coverage problems
Minimax location problems
Plant location problem
 many of these are implemented in current GIS, e.g.
network extensions in ArcGIS, TransCAD and GIS*PLUS from Caliper Corp.
 additional code interfaced with GIS has been developed
 solution of many of these problems raises a number
of issues of data modeling
 some of these have been raised earlier in the example
of modeling a street network for shortest path analysis
Example: Brine disposal in the Petrolia, Ontario oil field
 oil extraction from the field generates large quantities
of waste fluid
 there are 14 active producers in the field, each operating
a single extraction facility
 the only effective method of disposal is by pumping
to a formation below the oil producing layer
 options include:
 a single, central disposal facility
 requiring each producer to install a facility
 some intermediate configuration of shared facilities
One disposal well per producer:
One central facility:
The location-allocation problem:
find locations for one or more central facilities and
allocate producers to them in order to minimize the total of capital
and transport costs
Two alternatives for transport of waste brine to central facilities:
pipe and truck.
Pipe cost:
A - installed cost per metre
D_{0} - distance in metres
B - pipe life in years
C - pump cost per year
Truck cost:
C2 = E D V_{0} / (365 F) + Q V_{0} (H + D_{0}/(1000 P)) / G
E - holding period, days
D - holding capacity, m^{3}
V_{0} - volume of brine, m^{3} per year
F - life of holding capacity, years
Q - truck cost, $/hour
H - time to load and unload truck, hours
P - speed of truck in km/hour
G - truck load, m^{3}
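The truck-cost formula can be evaluated directly. The parameter values below are purely illustrative, not taken from the Petrolia study:

```python
def truck_cost(E, D, V0, F, Q, H, D0, P, G):
    """C2 = E*D*V0/(365*F) + Q*V0*(H + D0/(1000*P))/G
    First term: amortized cost of holding capacity; second term:
    annual trucking cost over the haul distance D0 (metres)."""
    return E * D * V0 / (365 * F) + Q * V0 * (H + D0 / (1000 * P)) / G

# Illustrative values: 7-day holding, 50 m^3 tank, 1000 m^3/yr of brine,
# 10-yr tank life, $60/hr truck, 0.5 hr load/unload, 5 km haul at 50 km/h,
# 10 m^3 per load
c2 = truck_cost(E=7, D=50, V0=1000, F=10, Q=60, H=0.5, D0=5000, P=50, G=10)
```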
Disposal well cost:
Slide: Petrolia
area
Slide: Transport
cost functions
GIS implementation:
Network of streets and rights of way - potential
routes for trucks/pipes
Links with attributes of length
Nodes with attributes of volume produced - producer
sites plus other potential well locations
GIS database with nodes and links and associated
attributes:
 data input functions (editing)
 data display - graphics, plots
 storage of geographic data
 provides data to the analysis module
Analysis module interacting with GIS database
 obtains nodes and links from the GIS
 performs analysis, reports results directly to the
user
 includes several heuristic methods for solving the
optimization problem
 allows the user access to the display/analysis functions
of the GIS
An analysis module supported by a GIS database provides a
spatial decision support system (SDSS) tailored to specific,
advanced forms of spatial analysis
Location-allocation analysis module:
1. Find shortest paths between points on the network (could
be a GIS function)
2. Define and modify model parameters
3. Use paths and parameters to calculate transport costs
4. Search for optimum solution using add, drop and swap
heuristics
5. Evaluate solutions and print results
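Step 4 can be illustrated with a swap heuristic for the vertex p-median problem. This is a minimal sketch: the producer positions and distance matrix are hypothetical, and a real module would combine add, drop, and swap moves:

```python
import itertools
import numpy as np

def pmedian_swap(dist, p):
    """p-median by a swap heuristic: start from the first p candidate
    sites, then keep swapping one facility for a non-facility while the
    total travel cost improves."""
    n = dist.shape[0]
    sites = set(range(p))
    cost = lambda s: dist[:, sorted(s)].min(axis=1).sum()
    improved = True
    while improved:
        improved = False
        for out_f, in_f in itertools.product(list(sites), range(n)):
            if in_f in sites:
                continue
            cand = (sites - {out_f}) | {in_f}
            if cost(cand) < cost(sites):   # accept first improving swap
                sites, improved = cand, True
                break
    return sorted(sites), cost(sites)

# Six producers on a line at these positions; locate 2 central facilities
pos = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(pos[:, None] - pos[None, :])
sites, total = pmedian_swap(dist, 2)
print(sites, total)  # [1, 4] 4.0  (one facility in each cluster)
```

Like the add/drop/swap heuristics in the module, this finds a local optimum only; there is no guarantee of global optimality.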
Option                  | Number | Facility cost | Transport cost | $/m^{3} brine | $/m^{3} oil
------------------------|--------|---------------|----------------|---------------|------------
All producers           | 14     | 165,000       | 0              | 1.32          | 26.42
Central by truck        | 2      | 45,000        | 395,827        | 3.53          | 70.59
Central any nodes       | 2      | 60,000        | 79,619         | 1.12          | 22.36
Central any producers   | 2      | 60,000        | 80,658         | 1.13          | 22.52
Existing disposal wells | 2      | 30,000        | 92,031         | 0.98          | 19.54

Parameter      | Value   | % pipe | % truck | Optimum sites | Cost $000s
---------------|---------|--------|---------|---------------|-----------
Pipe cost A    | 30      | 74     | 26      | 4,8           | 80.7
               | 60      | 53     | 47      | 2,4,7,9       | 76.3
               | 15      | 87     | 13      | 4,8           | 56.6
Pipe life B    | 10      | 74     | 26      | 4,8           | 80.7
               | 8       | 67     | 33      | 2,4,7         | 73.0
               | 6       | 62     | 38      | 2,4,7,9       | 69.4
               | 4       | 47     | 53      | 2,4,7,9       | 86.0
Pump cost C    | 2000    | 74     | 26      | 4,8           | 80.7
               | 1000    | 77     | 23      | 2,4,7         | 52.8
               | 500     | 77     | 23      | 2,4,7         | 46.8
Well cost R    | 60,000  | 74     | 26      | 4,8           | 80.7
               | 100,000 | 74     | 26      | 4,8           | 80.7
               | 40,000  | 74     | 26      | 2,4,7,9       | 54.6
Life of well S | 4       | 74     | 26      | 4,8           | 80.7
               | 8       | 74     | 26      | 2,4,7,9       | 54.6
Brine ratio U  | 25      | 74     | 26      | 4,8           | 80.7
               | 30      | 82     | 18      | 2,4,7         | 69.0
               | 40      | 90     | 10      | 2,4,7,9       | 59.8
               | 60      | 96     | 4       | 2,4,7         | 70.1
Other examples of complex GIS-based analysis:
Vehicle routing and scheduling
Traffic modeling
Corridor location for pipelines/powerlines/highways
Runoff modeling based on DEM
Load balancing in electrical networks
Spatial search
Boolean search
Search through an attribute table to find objects satisfying
a set of criteria
Example:
Forest stands - area object type, non-overlapping
Attributes: area (reserved)
species
age
For each stand, compare species and age to desired criteria.
Dissolve and merge boundaries between neighboring stands
if both fit the criteria
Use tables to obtain estimated yield for given species/age
and area
Generate a map showing merged groups of cuttable stands,
with new IDs, plus a table showing yield for each group.
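The search-and-summarize steps can be sketched on a toy attribute table; all stand values, criteria thresholds, and yield figures below are hypothetical:

```python
# Hypothetical stand attribute table: (stand id, species, age, area in acres)
stands = [
    (1, "fir",  85, 120.0),
    (2, "fir",  40, 300.0),
    (3, "pine", 90, 250.0),
    (4, "fir",  95,  80.0),
]

# Boolean search: mature fir stands (criteria are illustrative)
cuttable = [s for s in stands if s[1] == "fir" and s[2] >= 80]

# Hypothetical yield table: m^3 per acre by species for mature stands
yield_per_acre = {"fir": 40.0, "pine": 35.0}
total_yield = sum(yield_per_acre[s[1]] * s[3] for s in cuttable)
print([s[0] for s in cuttable], total_yield)  # [1, 4] 8000.0
```

The dissolve-and-merge and mapping steps would then operate on the geometry of the selected stands, which a pure attribute table cannot represent.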
Topological overlay
Two or more coverages can be overlaid to obtain new object
types with concatenated attributes. This allows Boolean search and related
operations to be conducted on multiple object types, i.e. with more information
available.
Example:
Add soil moisture information, from a separate coverage,
to the criteria used to identify cuttable stands.
Buffer zone generation
A buffer zone allows Boolean searches to include criteria
based on distance
Example:
A stand is cuttable only if it is not less than 200m
from the nearest stream/lake
In many cases it is not possible to reduce all criteria to
simple yes/no requirements.
e.g. from those stands satisfying criteria 1 and 2, select
that stand which minimizes total cost (sum of criteria 3, 4 and 5)
When all non-conditional criteria are commensurate (dollars)
they can be summed.
In many cases criteria are not commensurate and cannot
be summed.
Example
1. Timber extraction/hauling costs - direct $ costs
2. Environmental cost of extraction - intangible
3. Road construction cost - $, but long-term benefits
Decision Theory provides methods for determining:
Single Utility Functions (SUFs) for each criterion
Multiple Utility Functions (MUFs) to combine criteria.
Both SUFs and MUFs can be determined by experimental designs
involving groups of decision-makers
Decision theoretic methods can be incorporated into GIS
technology. The GIS is used to evaluate the criteria for each alternative,
then to weigh them using SUFs and MUFs to arrive at a decision.
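A minimal sketch of the SUF/MUF idea with an additive MUF; every scaling function, weight, and dollar figure below is an illustrative assumption, not an elicited utility:

```python
# Hypothetical single-utility functions (SUFs): each maps one raw criterion
# onto a common 0-1 utility scale (all scalings here are illustrative)
sufs = {
    "haul_cost":  lambda dollars: max(0.0, 1.0 - dollars / 100_000),
    "env_impact": lambda score: 1.0 - score / 10.0,   # 0 (none) .. 10 (severe)
    "road_cost":  lambda dollars: max(0.0, 1.0 - dollars / 50_000),
}
# Hypothetical MUF weights, as might be elicited from decision-makers
weights = {"haul_cost": 0.5, "env_impact": 0.3, "road_cost": 0.2}

def muf(alternative):
    """Additive multi-utility function: weighted sum of single utilities."""
    return sum(weights[k] * sufs[k](v) for k, v in alternative.items())

stand_a = {"haul_cost": 40_000, "env_impact": 2, "road_cost": 10_000}
stand_b = {"haul_cost": 20_000, "env_impact": 8, "road_cost": 30_000}
best = max([stand_a, stand_b], key=muf)
```

The additive form is only one choice of MUF; it assumes the criteria are preferentially independent, which the elicitation experiments would need to confirm.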
A model for spatial analysis with a GIS
Example of multi-stage GIS analysis
Generation of a Recreation Opportunity Spectrum (ROS) map
for a National Forest 1:24,000 quad (7.5 minute)
Problem: generate zones and associated ROS classes for
Forest Service land based on distance from transportation features, with
urban exclusions.
Data needed:
D1: Roads and railways (1:24,000)  line objects
D2: Forest Service ownership map (1:24,000)  area objects
D3: City and town boundaries map (1:24,000)  area objects
GIS functions:
Reclassify attributes (B2)
Dissolve and merge (B3)
Generate corridors (B24)
Topological overlay (B50)
Measure size of areas (B35)
Centroid calculation and sequential numbering (B8)
Plot (A12)
Create list and report (B1)
Steps to make product:
1. Using the forest service ownership data, reclassify
area objects as forest land / not forest land. (B2)
2. Dissolve boundaries between polygons with the same
value of the forest land / not forest land attribute, and merge polygons
(B3)
3. Using the transportation map, generate corridors 0.5
miles wide around all roads and railways. (B24)
4. Using the transportation map, generate corridors 1.0
miles wide around all roads and railways. (B24)
5. Topologically overlay the results of 2, 3 and 4 and
concatenate the attributes, to obtain polygons with the following attributes:
forest land / not forest land
within/outside 0.5 mile corridor
within/outside 1.0 mile corridor (B50)
6. Topologically overlay the urban boundary map, and concatenate
attributes, adding urban/non-urban to the list in 5. (B50)
7. Reclassify the area objects resulting from 6 according
to the following rules:
Class | Criteria
------|---------
Null  | not forest land
RMU   | forest land and urban
SPM   | forest land, non-urban and within 0.5 miles of road/rail
SPN   | forest land, non-urban, outside 0.5 mile and inside 1.0 mile corridors
P     | forest land, non-urban, outside both 0.5 mile and 1.0 mile corridors
(B2)
8. Dissolve and merge adjacent polygons with the same class
(B3)
9. Measure areas of polygons resulting from 8 (B35)
10. Reclassify polygons of class SPM according to the
following rules:
Class | Criteria
------|---------
SPM   | areas of less than 2500 acres
RN    | areas of more than 2500 acres
(B2)
11. Calculate centroids and sequentially number polygons
(B8)
12. Plot classified polygons with classes and numbers
assigned in 11, plus roads and railways and urban areas (A12)
13. Create a list of all polygons, with IDs, areas and
classes. (B1)
Summary sequence of operations:
Initial data sets: D1, D2, D3
1. B2 on D2 > E1
2. B3 on E1 > E2
3. B24 on D1 > E3
4. B24 on D1 > E4
5. B50 on E2, E3, E4 > E5
6. B50 on E5, D3 > E6
7. B2 on E6 > E7
8. B3 on E7 > E8
9. B35 on E8 > E9
10. B2 on E9 > E10
11. B8 on E10 > E11
12. A12 on E11, D1, D3
13. B1 on E11
Many GIS applications require complex decision rules in
reclassification operations.
e.g. finding the most cuttable stand of timber:
Criterion
1. Area
of stand > 100 acres (B35)
2. More
than 100m from stream/lake (B24)
3. Subrules
based on slope, aspect and soil mechanics determine method of timber
extraction.
4. Analysis
of existing roads and terrain leads to estimates of costs of constructing
new
roads and hauling timber to mill
5. Subrules
based on costs of replanting, silviculture
Districting
 GIS technology useful in designing sales areas, analyzing
trade areas of stores
 similar applications occur in politics
 design of voting districts (apportionment, gerrymandering)
has enormous impact on outcome of elections
 major interest in reapportionment after 1990 census
 GIS applications in these areas are still at early
stage
Characteristics of application area
 scale:
 street centerline, census reporting zones - i.e.
1:24,000 and smaller
 data at block group/enumeration district scale (250
households) is needed for locating smaller commercial operations like
gas stations and convenience stores
 data at census tract scale (2,000 households) is
good for the location of larger facilities like supermarkets and fast
food outlets
 data sources:
 much reliance on existing sources of digital data
 especially TIGER and DIME
 similar data available in other countries
 additional data added to standard datasets by vendors
 e.g. updating TIGER files by digitizing new roads,
correcting errors
 e.g. adding ZIP code boundaries, locations of existing
retailers
 functionality:
 dissolve and merge operations, e.g. to build voting
districts out of small building blocks
 modeling, e.g. to predict consumer choices, future
population growth
 overlay operations, e.g. to estimate populations
of user-defined districts, correlate ZIP codes with census zones
 point in polygon operations, e.g. to identify census
zone containing customer's residence
 mapping, particularly choropleth and point maps of
consumers
 geocoding, address matching
 data quality:
 more concern with accuracy of statistics, e.g. population
counts, than accuracy of locations
Types of applications
 districting
 designing districts for sales territories, voting
 objective is to group areas so that they have a given
set of characteristics
 "geographical spreadsheets" allow interactive grouping
and analysis of characteristics
 e.g. Geospreadsheet program from GDT
 site selection
 evaluating potential locations by summarizing demographic
characteristics in the vicinity
 e.g. tabulating populations within 1 km rings
 searching for locations that meet a threshold set
of criteria
 e.g. a minimum number of people in the appropriate
age group are within trading distance
 market penetration analysis
 analyzing customer profiles by identifying characteristics
of neighborhoods within which customers live
 targeting
 identifying areas with appropriate demographic characteristics
for marketing, political campaigns
Organizations
 many data vendors and consulting companies active in
the field, many large retailers
 no organization unique to the field
 American Demographics is influential magazine
Districting example
 GIS has applications in design of electoral districts,
sales territories, school districts
 each area of application has its own objectives, goals
 this example looks at designing school districts
Background
 the Catholic school system of London, Ontario, Canada
provides elementary schools for Kindergarten through Grade 8 to a city
of approx. 250,000
 about 25% of school children attend the Catholic
system
 27 elementary schools were open prior to the study
 population data is available for polling subdivisions
from taxation records
 approx. 700 polling subdivisions have average population
of 350 each
 forecasts of school age populations are available for
5, 10, 15 years from the base year at the polling subdivision level
 children are bussed to school if their home location
is more than 2 miles away, or if the walking route to school involves
significant traffic hazard
Objectives
 minimal changes to the existing system of school districts
 minimal distances between home and school, and minimal
need for bussing
 longterm stability in school district boundaries
 preservation of the concepts of community and parish
 if possible a school should serve an identifiable community, or be
associated with a parish church
 maintenance of a viable minimal enrollment level in
each school, defined as 75% of school capacity and > 200 enrollment
Technical requirements
 digitized boundaries of the polling subdivision "building
blocks"
 an attribute file of building blocks giving current
and forecast enrollment data
 for forecasting, we must include developable tracts
of land outside the current city limits, plus potential "infill" sites
within the limits
 748 polygons
 development tracts are isolated areas outside the
contiguous polling subdivisions
 infill sites are shown as points
 the ability to merge building blocks and dissolve boundaries
to create school districts
 school districts are not required to be contiguous
 if necessary a school can serve several unconnected
subdistricts
 a table indicating whether walking or bussing is required
for each buildingblock/school combination
Slide: City
and development areas
Current districts
"starbursts" show allocations of building blocks to
29 current schools (includes two special education centers)
note bussed areas in NW and SW - separate enclaves
of recent high-density housing allocated to distant schools
this strategy allows an expanding city to deal with:
 dropping school populations in the core, leading
to an excess of capacity
 rising school populations in the periphery, but
lack of funds for new school construction
without constantly adjusting boundaries
Slide: Current districts
Projections of enrollment based on current school districts
 rapid increase in developing areas, e.g. St Joseph's
(#3), St Thomas More (#4) - NW
 decrease in maturing areas of periphery, e.g. St Jude's
(#8) - SW area
 rejuvenation in some inner-city schools due to infilling,
e.g. St Martin's (#15) - lower center
 stagnation in other inner-city schools, e.g. St Mary's
(#17), decline e.g. St John's (#14) - center
Redistricting
 general strategy  begin with current allocations,
shift building blocks between districts in order to satisfy objectives
 requires interaction between graphic display and tabular
output
 quick response to "what if this block is reassigned
to the school over here?"
 implementation allowed School Board members to make
changes during meetings, observe results immediately
 using map on digitizer tablet, tables on adjacent
screen
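The interactive "what if this block is reassigned?" step amounts to recomputing school totals after changing one allocation. In the sketch below, the block IDs, school names, and enrollment figures are hypothetical:

```python
# Hypothetical allocation of polling-subdivision building blocks to schools,
# with forecast enrollment per block
assignment = {"pb101": "St. Mary's", "pb102": "St. Mary's", "pb103": "St. John's"}
enrollment = {"pb101": 120, "pb102": 95, "pb103": 210}

def school_totals(assignment, enrollment):
    """Recompute projected enrollment per school from block allocations."""
    totals = {}
    for block, school in assignment.items():
        totals[school] = totals.get(school, 0) + enrollment[block]
    return totals

before = school_totals(assignment, enrollment)
# "What if pb102 is reassigned?" -- change one allocation and recompute
after = school_totals({**assignment, "pb102": "St. John's"}, enrollment)
print(before)  # {"St. Mary's": 215, "St. John's": 210}
print(after)   # {"St. Mary's": 120, "St. John's": 305}
```

Because only totals are recomputed, the response is immediate, which is what makes in-meeting experimentation by Board members feasible.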
Proposals
 one of the alternative plans developed
 note:
 assumes closure of 6 schools
 rise in enrollment as percent of capacity
 stability of projections through time
 reduction in number of "non-viable" schools (<200
enrollment)
 increase in percent not assigned to nearest school
 increase in average distance traveled
Slide: Projected
enrollments
Slide: Planned
enrollments
