
Data Analysis

John Bing-Canar
FIELDS Group
U.S. EPA, Region 5, Superfund
312.886.6182
bing-canar.john@epa.gov
Outline
Data Analysis topics:
ƒ overview
ƒ data requirements
ƒ data processing
ƒ statistical analysis
ƒ secondary sampling
ƒ cleanup objectives
ƒ software and resources
ƒ case studies
Overview
Stating the obvious:
Overview
Why this topic?
ƒ data, data issues, in-field devices, clean-up
estimation
ƒ available software and where to get it
ƒ case studies to highlight concepts
Data Requirements
Basic data needs
ƒ contaminant concentration(s)
ƒ locational data (X and Y)
ƒ true coordinates (e.g., UTM)
ƒ latitude/longitude
ƒ local coordinates (e.g., northing and easting from one
corner of site)
ƒ date/time collected
ƒ additional data
ƒ elevation (Z)
Data Processing
Exploratory Data Analysis (EDA)
ƒ data error detection
ƒ contaminant, X, Y, Z, and date values
ƒ units (ppm vs. ppb)
ƒ data error detection techniques
ƒ descriptive statistics such as maximum/minimum,
graphing (boxplots, histograms)
ƒ negative values, values outside of range (% oxygen > 100)
ƒ mapping (spatial max/min values)
ƒ are locations far from majority of sample locations?
ƒ outlier tests?
ƒ not a good idea; high contamination values usually
indicate a source, not an outlier
Data Processing
Exploratory Data Analysis (EDA)
ƒ LOD (limit of detection) treatment
ƒ 1/2 LOD?
ƒ treatment of duplicates and splits
ƒ take maximum?
ƒ take average?
ƒ take median?
ƒ Summary
ƒ choose a method and document it (transparency)
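Writing the chosen rules as code keeps them documented alongside the data. A minimal sketch, assuming hypothetical (sample_id, result, LOD) records in which a result of None marks a non-detect; the 1/2 LOD default and the max/mean/median options simply mirror the alternatives listed above and are not a recommendation.

```python
from statistics import mean, median

def substitute_nondetects(records, fraction=0.5):
    """Replace each non-detect (result None) with fraction * LOD; keep detects as reported."""
    return [(sid, result if result is not None else fraction * lod)
            for sid, result, lod in records]

# One documented rule for collapsing a sample and its duplicate or split.
COMBINE_RULES = {"max": max, "mean": mean, "median": median}

def combine_duplicates(values, method="max"):
    """Reduce a sample and its field duplicate or lab split to a single value."""
    return COMBINE_RULES[method](values)

# Hypothetical data for illustration only.
records = [("S1", 12.0, 1.0), ("S2", None, 2.0), ("S3", 0.8, 0.5)]
print(substitute_nondetects(records))  # [('S1', 12.0), ('S2', 1.0), ('S3', 0.8)]
print(combine_duplicates([40.0, 52.0], "max"), combine_duplicates([40.0, 52.0], "mean"))  # 52.0 46.0
```

Whichever rule is chosen, recording the `fraction` and `method` values used for a data set provides the transparency the slide calls for.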
Statistical Analysis
Calibration of field methods with lab methods
ƒ Goal: calibration equation, e.g.,
ƒ y = (m)(x) + b
ƒ adjusted value = (slope)(lab value) + intercept
ƒ For example, InnovX XRF
ƒ adjusted lead = (1.54)(lab value) – 14.13
ƒ Natural log adjusted lead = (1.00)(Natural log of lab
value) + 0.37
ƒ if lab value = 100ppm, then adjusted lead value = 149.75
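The two InnovX XRF equations above can be evaluated directly. This minimal sketch uses the slide's coefficients as defaults; the function names are hypothetical. At a lab value of 100 ppm, the linear equation gives about 139.87 ppm and the log equation gives about 144.77 ppm. The 149.75 quoted above therefore does not follow from either equation as transcribed here, and the coefficients may have been rounded or garbled on the original slide.

```python
import math

def linear_adjust(lab_value, slope=1.54, intercept=-14.13):
    """Linear calibration: adjusted = slope * lab_value + intercept."""
    return slope * lab_value + intercept

def log_adjust(lab_value, slope=1.00, intercept=0.37):
    """Log-log calibration: ln(adjusted) = slope * ln(lab_value) + intercept."""
    return math.exp(slope * math.log(lab_value) + intercept)

lab_ppm = 100.0
print(round(linear_adjust(lab_ppm), 2))  # 139.87
print(round(log_adjust(lab_ppm), 2))     # 144.77
```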
Statistical Analysis
Calibration of field methods with lab methods
ƒ problems with a calibration equation, y = (m)(x) + b
ƒ non-linearity
[Figure panels: linear; curvilinear]
Statistical Analysis
Calibration of field methods with lab methods
• problems with a calibration equation, y = (m)(x) + b
• “outliers”
Statistical Analysis
Calibration of field methods with lab methods
• problems with a calibration equation, y = (m)(x) + b
• heteroscedasticity of residuals
Statistical Analysis
Calibration of field methods with lab methods
ƒ factors to consider when comparing field to lab
methods:
ƒ differences in field equipment
ƒ Niton XLp712 with an americium-241 radioactive source versus
InnovX 4000 with an X-ray source (“apples to oranges”)
ƒ differences in weather conditions
ƒ weather (dry soil versus wet soil)
ƒ differences in material tested
ƒ slag or ground slag versus “soil”
ƒ residential soil versus cattail marsh (huge differences in
organic matter, i.e., roots); XRF measures metals in soil, not
vegetation
Statistical Analysis
Calibration of field methods with lab methods
ƒ basic guidelines for using in-field devices:
ƒ send a range of sample values to the lab (especially
higher values). DO NOT send samples to the lab at random
or send every 5th sample to the lab
ƒ try to send at least 20 samples to the lab (20 lab
values is the minimum needed to calibrate field
values to lab values)
ƒ Method 6200 (SW-846) calls for a minimum of 5% of
samples to be sent to the lab.
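The two counting rules can be combined, and the lab subset can be spread across the range of field results instead of being chosen at random. This minimal sketch picks evenly spaced ranks of the field readings, so the lowest and highest readings are always included. The function names and the rank-spacing rule are one hypothetical interpretation of "send a range of sample values."

```python
import math

def lab_sample_count(n_field, minimum=20, fraction=0.05):
    """Number of lab samples: at least `minimum` and at least `fraction` of field samples."""
    return max(minimum, math.ceil(fraction * n_field))

def pick_across_range(field_values, k):
    """Indices of k samples spread evenly over the ranked field results.

    Always includes the lowest and highest readings; never random or every 5th."""
    ranked = sorted(range(len(field_values)), key=lambda i: field_values[i])
    if k >= len(ranked):
        return ranked
    if k < 2:
        return ranked[-1:] if k == 1 else []
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(j * step)] for j in range(k)]

# Hypothetical: 400 XRF readings (ppm).
field = [(i * 37) % 1000 for i in range(400)]
k = lab_sample_count(len(field))
picks = pick_across_range(field, k)
print(k, sorted(field[i] for i in picks))
```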
Statistical Analysis
Upper confidence limits
ƒ use upper confidence limits
(UCLs) to determine if site
has met cleanup goals
ƒ EPA-funded software:
ProUCL
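ProUCL computes many UCLs, both parametric and nonparametric, and suggests one based on the data distribution. This sketch covers only the simplest case: a one-sided Student's t UCL of the mean, which assumes approximately normal data. Skewed site data usually call for other methods, so this is not a substitute for ProUCL. The t quantile is computed numerically with the standard library so the example has no dependencies; with SciPy installed, `scipy.stats.t.ppf` gives the same value. The results and cleanup goal are hypothetical.

```python
import math
from statistics import mean, stdev

def t_cdf(t, df, steps=2000):
    """Student's t CDF for t >= 0: 0.5 plus a Simpson's-rule integral of the pdf on [0, t]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    c = math.exp(log_c)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Upper quantile (p > 0.5) of Student's t, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def students_t_ucl(values, confidence=0.95):
    """One-sided Student's t UCL of the mean: mean + t * s / sqrt(n)."""
    n = len(values)
    return mean(values) + t_quantile(confidence, n - 1) * stdev(values) / math.sqrt(n)

# Hypothetical exposure-area results (mg/kg) and cleanup goal.
results = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
cleanup_goal = 20.0
ucl = students_t_ucl(results)
print(f"95% UCL = {ucl:.2f}; meets goal: {ucl <= cleanup_goal}")
```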
Statistical Analysis
Upper confidence limits
Secondary Sampling
Secondary or phased sampling
ƒ given interpretation of previous sampling event,
where to continue sampling?
Secondary Sampling
Adaptive fill sampling
ƒ user-specified number of samples in the most poorly
sampled areas

[Map, Kalamazoo River: initial and secondary sample locations]
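One way to place a user-specified number of samples in the most poorly sampled areas is a greedy maximin rule over a grid of candidate locations. Each new sample goes to the candidate farthest from every sample placed so far. This is a minimal sketch with hypothetical coordinates. The algorithm the FIELDS tools actually use is not described here and may differ.

```python
import math

def adaptive_fill(existing, candidates, n_new):
    """Greedy maximin sketch: repeatedly pick the candidate farthest from all samples so far.

    `existing` must contain at least one location; coordinates are (x, y) in one unit."""
    placed = list(existing)
    pool = list(candidates)
    chosen = []
    for _ in range(min(n_new, len(pool))):
        best = max(pool, key=lambda c: min(math.dist(c, p) for p in placed))
        chosen.append(best)
        placed.append(best)
        pool.remove(best)
    return chosen

# Hypothetical site: initial samples clustered in one corner of a 10 x 10 grid.
initial = [(0, 0), (1, 0), (0, 1)]
grid = [(x, y) for x in range(11) for y in range(11)]
print(adaptive_fill(initial, grid, 3))
```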
Secondary Sampling
Radial sampling
ƒ to meet the need to adequately describe spatial
variation over distance

[Map, Kalamazoo River: initial and radial sample locations]
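Radial sampling places samples at increasing distances from an initial location. This captures variation at short separation distances, which is the kind of information a variogram needs. This minimal sketch generates such a layout. The center, distances, and number of directions are hypothetical parameters.

```python
import math

def radial_locations(center, distances, n_directions=8):
    """Points at each distance along n equally spaced directions around `center`."""
    cx, cy = center
    points = []
    for k in range(n_directions):
        angle = 2 * math.pi * k / n_directions
        for d in distances:
            points.append((cx + d * math.cos(angle), cy + d * math.sin(angle)))
    return points

# Hypothetical: 4 directions, samples at 5, 10 and 20 ft from an initial location.
for x, y in radial_locations((100.0, 200.0), [5, 10, 20], n_directions=4)[:3]:
    print(f"{x:.1f}, {y:.1f}")
```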
Cleanup Objectives
Four components:
ƒ 2D and 3D data
ƒ spatial estimation (interpolation)
ƒ identify cleanup areas
ƒ estimate mass/volume
Cleanup Objectives
2D and 3D data
ƒ 2D data (e.g., mercury levels in sediment surface)
ƒ 3D data (e.g., downhole gamma logging)
Cleanup Objectives
Spatial estimation (interpolation)
ƒ interpolation generates estimates at unsampled
locations

[Figure: interpolation, Scio Pottery]
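Interpolation methods range from simple inverse distance weighting (IDW) to kriging, and the geostatistical packages used for site work typically offer several. As a sketch of the idea only, IDW estimates an unsampled location as a distance-weighted average of the samples. The power parameter (2 here) and the data are assumptions, and this is not necessarily the method behind the Scio Pottery map.

```python
import math

def idw(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    num = den = 0.0
    for x, y, value in samples:
        d = math.dist((x, y), target)
        if d == 0:
            return value  # target coincides with a sample location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical lead results (ppm) at three locations.
samples = [(0, 0, 10.0), (10, 0, 30.0), (0, 10, 50.0)]
print(round(idw(samples, (5, 0)), 2))
```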
Cleanup Objectives
Spatial estimation (interpolation)
ƒ why interpolate?
ƒ hot spot delineations
ƒ mass/volume calculations
ƒ determine remediation areas

[Figure: interpolation]
Cleanup Objectives
Identify cleanup areas
ƒ goal:
ƒ average concentration; or
ƒ any area above a specified value

[Figure: Deer Lake]
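The two goals lead to different cleanup footprints. This minimal sketch assumes equal-area grid cells with interpolated estimates and clean backfill at zero concentration. These are illustrative assumptions, not the approach used at Deer Lake.

```python
def cells_above(estimates, threshold):
    """Goal: any area above a specified value. Flag every cell whose estimate exceeds it."""
    return [i for i, v in enumerate(estimates) if v > threshold]

def cells_for_average(estimates, goal, backfill=0.0):
    """Goal: average concentration. Remove the highest cells until the site-wide
    average is at or below `goal`, assuming removed cells are backfilled at `backfill`."""
    values = list(estimates)
    removed = []
    for i in sorted(range(len(values)), key=lambda j: values[j], reverse=True):
        if sum(values) / len(values) <= goal:
            break
        values[i] = backfill
        removed.append(i)
    return removed

# Hypothetical interpolated estimates (mg/kg) for six equal-area grid cells.
estimates = [5, 50, 8, 120, 10, 7]
print(cells_above(estimates, 40))        # [1, 3]
print(cells_for_average(estimates, 15))  # [3]
```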
Cleanup Objectives
Estimate mass and volume
ƒ what is expected volume of soil (sediment) and
mass of contaminant to meet a cleanup goal?

[Figure: Deer Lake]
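Once cleanup cells are selected, volume is cell area times depth, and contaminant mass is concentration times soil mass. This minimal sketch assumes dry-weight concentrations in mg/kg and a hypothetical bulk density of 1,500 kg/m³. Sediments and wet soils can differ substantially, so the density should come from site data.

```python
def volume_and_mass(concentrations, cell_area_m2, depth_m, bulk_density_kg_m3=1500.0):
    """Volume (m^3) and contaminant mass (kg) for cells selected for removal.

    concentrations are in mg/kg dry soil; mg/kg * kg soil = mg contaminant."""
    cell_volume = cell_area_m2 * depth_m
    volume_m3 = len(concentrations) * cell_volume
    soil_kg_per_cell = cell_volume * bulk_density_kg_m3
    mass_mg = sum(c * soil_kg_per_cell for c in concentrations)
    return volume_m3, mass_mg / 1e6  # 1 kg = 1e6 mg

# Hypothetical: two 5 m x 5 m cells removed to 0.3 m depth.
vol, mass = volume_and_mass([100.0, 300.0], cell_area_m2=25.0, depth_m=0.3)
print(f"{vol:.1f} m^3 of soil, {mass:.2f} kg of contaminant")
```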
Cleanup Objectives
Estimate mass and volume

[Figure: Deer Lake]
Software and Resources
Software
Freeware
ƒ FIELDS Tools for ArcGIS (http://epa.instepsoftware.com/FIELDS/)
ƒ F/S Plus [stand-alone] (http://epa.instepsoftware.com/FIELDS/)
ƒ SADA [stand-alone] (http://www.tiem.utk.edu/~sada/index.shtml)
ƒ GMS [stand-alone] (http://www.ems-i.com/)

Proprietary Software
ƒ WinGslib (www.statios.com)
ƒ EVS/MVS (www.ctech.com)
ƒ SAGE2001 (www.isaaks.com)
ƒ earthVision (www.dgi.com)
Software and Resources
Software
Freeware
ƒ FORMS II Lite (http://www.epa.gov/superfund/programs/clp/f2lite.htm)
ƒ Generates sample labels, bottle tags, and Chain of Custody (COC) forms;
ƒ Tracks samples from the field to the laboratory;
ƒ Facilitates electronic capture of sample information into databases; and
ƒ Exports data electronically as .xml, .dbf or .txt files.
ƒ MARSSIM (http://www.epa.gov/radiation/marssim/index.html)
ƒ The Multi-Agency Radiation Surveys and Site Investigation Manual
(MARSSIM) provides detailed guidance for planning, implementing, and
evaluating environmental and facility radiological surveys conducted to
demonstrate compliance with a dose- or risk-based regulation
ƒ ProUCL (http://www.epa.gov/esd/tsc/software.htm)
ƒ Estimates Upper Confidence Limit (UCL) of the mean using various
parametric and nonparametric methods
ƒ Statistical methods that can be used to verify the attainment of cleanup
standards
Software and Resources
Software
Freeware
ƒ VSP (http://dqo.pnl.gov/index.htm)
ƒ Statistically based sample design software
ƒ Supported and financed by the USEPA to meet DQOs (Data Quality
Objectives)

Proprietary Software
ƒ TerraSeer (http://www.terraseer.com/index.php)
ƒ Space-Time Intelligence System (STIS)
ƒ GIS software that supports space-time analyses (e.g., cluster
analysis)
End of Talk
Site Info Management: Decision Making
Tools
™ ArcGIS

™ Internet Tools
ƒ Google Earth
ƒ ArcGIS Explorer
ƒ MS Virtual Earth

™ IMAAC
How Robust are these tools?
™ ArcGIS
ƒ Capable of sophisticated Spatial and Statistical Analysis
ƒ On-the-fly complex queries
ƒ Requires extensive knowledge of GIS
™ Internet Tools (Google Earth, ArcGIS Explorer, MS
Virtual Earth)
ƒ More for presentation purposes and general use
ƒ Less functionality for on-the-fly or detailed data analysis
ƒ Designed for non-GIS personnel
ƒ More extensive analysis would require customized
programming
™ IMAAC
ƒ Capable of short-term, quick-response plume mapping
ƒ More detailed modeling can be conducted
GIS = Geographic Information Systems

It is a collection of computer hardware, software, and geographic data for capturing,
managing, analyzing, and displaying all forms of geographically referenced
information.

How does a full-blown GIS software like ArcGIS differ from Google Earth or other
Internet-based free software?

ArcGIS can answer the following questions:

Query and display:
Find and display all samples collected in the last week
Find and display all H2S levels above a criterion
Find and display the highest cadmium levels in the top 2 feet of soil
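The logic behind these queries can be written outside a GIS as simple filters. This plain-Python sketch runs over hypothetical records. The field names are illustrative, not an ArcGIS schema, and it does not use the ArcGIS API. In ArcGIS, the same questions would be answered interactively, with results shown on the map.

```python
from datetime import date, timedelta

# Hypothetical sample records; field names are illustrative only.
samples = [
    {"id": "A1", "date": date(2008, 6, 2), "analyte": "H2S", "depth_ft": 0.0, "value": 12.0},
    {"id": "A2", "date": date(2008, 6, 9), "analyte": "H2S", "depth_ft": 0.0, "value": 3.5},
    {"id": "B1", "date": date(2008, 6, 8), "analyte": "Cd", "depth_ft": 1.5, "value": 45.0},
    {"id": "B2", "date": date(2008, 6, 8), "analyte": "Cd", "depth_ft": 4.0, "value": 90.0},
]

def collected_since(records, today, days=7):
    """Samples collected within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in records if r["date"] >= cutoff]

def above_criterion(records, analyte, criterion):
    """Samples of one analyte exceeding a criterion."""
    return [r for r in records if r["analyte"] == analyte and r["value"] > criterion]

def highest_in_top(records, analyte, max_depth_ft=2.0):
    """Highest result for an analyte within the top `max_depth_ft` feet."""
    hits = [r for r in records if r["analyte"] == analyte and r["depth_ft"] <= max_depth_ft]
    return max(hits, key=lambda r: r["value"], default=None)

today = date(2008, 6, 10)
print([r["id"] for r in collected_since(samples, today)])        # ['A2', 'B1', 'B2']
print([r["id"] for r in above_criterion(samples, "H2S", 10.0)])  # ['A1']
print(highest_in_top(samples, "Cd")["id"])                       # B1
```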
ArcGIS can answer the following questions:

Quantification of changes that have occurred over space or time:
What is the area (yd²) of a proposed cleanup?
How much has the elevation of a landfill decreased (subsided) over
time?
Where have PCB concentrations in the sediment surface increased since
last summer?
What is the mass of mercury in a proposed removal area?
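The area of a proposed cleanup polygon follows from its vertices using the shoelace formula. This minimal sketch assumes the vertices are in a projected planar coordinate system in feet. Latitude/longitude must be projected first. The footprint is hypothetical.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon from ordered (x, y) vertices."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2

SQ_FT_PER_SQ_YD = 9  # 1 yd = 3 ft

# Hypothetical cleanup footprint in projected coordinates (feet).
footprint = [(0, 0), (90, 0), (90, 60), (30, 90), (0, 60)]
area_ft2 = polygon_area(footprint)
print(f"{area_ft2:.0f} ft^2 = {area_ft2 / SQ_FT_PER_SQ_YD:.0f} yd^2")  # 6750 ft^2 = 750 yd^2
```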

Statistical analysis:
Is there a statistically significant relationship between residential soil levels for lead?
Is there a pattern of arsenic levels in soil from an airborne release?
Case Study
Nicor Gas Company Response, Park Ridge, IL
ƒ In June 2007, Nicor Gas Company (Nicor) discovered PCBs in gas
meters at four homes in Park Ridge, IL. Nicor cleaned up the
homes and contacted U.S. EPA, whose inspectors then performed
follow-up testing of indoor air, soil, and hard surfaces.
ƒ In July and August, further sampling of other homes in Park Ridge
was conducted to determine the extent of contaminated gas
meters. In total, more than 140 homes were sampled. Nicor,
U.S. EPA Region 5 Land and Chemicals Division (LCD), U.S. EPA
Region 5 Superfund Emergency Response Branch (ERB), and
Illinois Environmental Protection Agency (IEPA) contributed to the
sampling effort.
Nicor Gas Company Response
™ In September, ERB requested FIELDS
assistance in producing maps of PCB data
collected in Park Ridge.
™ These data included “source” samples, or
samples from components of the gas
transmission system, interior surface wipe
samples, and interior air samples.
™ Sample data was compiled from multiple
formats into a single database.
™ Geographic coordinates for individual
properties were obtained using an Internet
geocoding service. Using this database
with ESRI ArcMap GIS software, various
maps showing the first seven properties
with detected levels of PCBs were created.
™ For a presentation to the RA, data files
were converted to ESRI ArcExplorer and
displayed.
Nicor Gas Company Response
™ Following the production of these
maps, new sample data was
received and added to the
database. In preparation for
another site meeting, additional
data queries were created using
ArcMap. For example, queries
were generated that showed the
results data and locations of the
original four sampled homes, all
commercial sampled properties,
and properties at which detected
PCBs were found concurrently in
meter, air, and wipe samples.
PCBs were detected in “source” samples. “Source”
samples were from gas meters, regulators, and drip
legs.
Decision Making Tools: Internet Tools
™ Google Earth
™ ArcGIS Explorer
™ MS Virtual Earth
Google Earth
™ Utilizes Google search (address) with satellite imagery, maps, and terrain.
™ Free and licensed versions with varying functionality
ƒ Google Earth (free)
ƒ Earth Plus ($20)
ƒ Import GPS data
ƒ Earth Pro ($400): EPA OSCs all have a 1-year license
ƒ Increased functionality and import/export capabilities
ƒ Allows GIS coverages to be read directly
ƒ Enterprise
™ Limitations
ƒ Requires Internet
ƒ Aerials can’t be downloaded and used somewhere else.
ƒ Limited cartographic features
ƒ Limited GIS functionality
™ Advantages
ƒ Simple
ƒ Fast
ƒ Maintained
Ohio River Tools
™ Sample Designs and navigation
™ Map generation with save
™ Data saved as ArcView shapefile, DBF,
and AutoCAD DXF
Data Collection Dictionary
Query Excel Spreadsheet

Queried data can be saved and exported to CSV and KML formats
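Query results can be written to both formats with the Python standard library. This minimal sketch is not the Ohio River Tools code. The column names and the example row are hypothetical. KML expects WGS84 decimal-degree coordinates, with longitude listed before latitude.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLUMNS = ["id", "analyte", "value", "lat", "lon"]

def to_csv(rows):
    """Write query results as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_kml(rows):
    """Minimal KML: one Placemark per row (WGS84 decimal-degree coordinates)."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    for r in rows:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = r["id"]
        ET.SubElement(placemark, "description").text = f"{r['analyte']}: {r['value']}"
        point = ET.SubElement(placemark, "Point")
        # KML lists longitude first, then latitude.
        ET.SubElement(point, "coordinates").text = f"{r['lon']},{r['lat']}"
    return ET.tostring(kml, encoding="unicode")

# Hypothetical queried rows.
rows = [{"id": "OF-1", "analyte": "Pb", "value": 250, "lat": 38.5, "lon": -85.7}]
print(to_csv(rows))
print(to_kml(rows))
```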
Google KML Display
Outfall Point Link
Kerr-McGee
ArcGIS Explorer
™ Layers
ƒ Search (address)
ƒ Imagery
ƒ Topographic maps
ƒ historical maps
ƒ street maps
™ Free
™ Pros
ƒ Simple
ƒ Fast
ƒ Maintained
ƒ Connect to local data and services
from ArcGIS Server
™ Cons
ƒ Requires Internet
ƒ Limited GIS functionality
MW Flood
The EOC was staffed for a week during the Midwest floods, including GIS support:
1. Incident coordinates and information were reported
and brought into ArcGIS Explorer
2. ArcGIS Explorer 9.2 was used to rapidly add incident
labels and coordinates, along with SF
sites, flood plains, etc.
[Map legend: NRC locations of interest; NPL locations of interest; FEMA 500-year flood plain; FEMA 100-year flood plain]
MS Virtual Earth
™ Build/develop web pages on top of their platform
™ Free to EPA for the next 2 years
™ Flexibility
™ Pros
ƒ Flexibility and customizability
ƒ Everything is accessed on the web
™ Cons
ƒ Not an out-of-the-box tool
ƒ Web security
ƒ Making it dynamic requires setting up and maintaining a
database
Virtual Earth at EPA
What is IMAAC?
™ The IMAAC (Interagency Modeling and
Atmospheric Assessment Center) provides
atmospheric hazards predictions in support of
Federal agencies responding to incidents of
airborne releases with national significance.

™ The IMAAC leverages existing Federal
capabilities and is responsible for providing
accurate, reliable estimates of predicted hazard
areas, with associated concentrations, that
serve as the foundation for decisions by the
authorized emergency managers.
What is NARAC?
™ The NARAC (National Atmospheric Release Advisory
Center), located at the University of California’s
Lawrence Livermore National Laboratory, has been
designated as the primary interim provider of IMAAC
capabilities and is currently supporting hundreds of
Department of Homeland Security stakeholders in
addition to its traditional suite of customers and users

™ NARAC is a distributed system, providing modeling and
geographical information tools for deployment to an end
user's computer system as well as real-time access to
global meteorological and geographical databases and
advanced three-dimensional model predictions from the
national center
NARAC Features
™ NARAC provides a suite of multi-scale (local-, regional-,
continental- and global-scale) atmospheric flow and
dispersion models for a wide range of hazards. Some of
the key features of this modeling system are:
™ Automated and validated real-time 3-D centralized
modeling system at NARAC that simulates complex
wind flows, detailed particle dispersion, and wet and dry
deposition on multiple spatial scales:
ƒ Local-scale and regional-scale meteorological forecast and
dispersion models
ƒ Long-range meteorological forecast and dispersion models
™ Models re-locatable anywhere in the world in real-time
™ Nuclear explosion fallout model
™ Fast-running, deployable local-scale dispersion model
Types of NARAC Support
™ The IMAAC/NARAC supports customers
through several channels:
ƒ Direct telephone calls to expert operations
staff
ƒ Internet interface (NARAC Web) to a high-
performance computing center

ƒ Internet-based remote access software
installed on a customer's local computer.
Case Study: Detroit Refinery Fire
Exercise
Ardent Sentry
Database Query based on NARAC Plume
Contact Information

NARAC - http://narac.llnl.gov
Customer Support
Emergency Only: 925-424-6465 (24x7)
Non-emergency: 935-424-2722 (daytime)
e-mail: narac@llnl.gov
To request account:
https://naracwebx2.llnl.gov/NaracWeb/jsp/RequestAccount.jsp

IMAAC Web
http://imaacweb.llnl.gov
925-422-9159 7:30 am - 4:15 pm (PT), M-F
925-422-7627

IMAAC emergency contact dispatcher 925-422-9100 (24x7)
