Introduction To Artificial Intelligence

Data-driven methods in Environmental Sciences
Exploration of Artificial Intelligence Techniques
Valliappa.Lakshmanan@noaa.gov
lakshman@ou.edu 1
Data Driven Methods
What is Artificial Intelligence?
Common AI techniques
Choosing between AI techniques
Pre and post processing
What is AI?
Machines that perceive, understand and react to their environment
Goal of Babbage, etc.
Oldest endeavor in computer science
Machines that think
Robots: factory floors, home vacuums
Still quite impractical
AI vs. humans
AI applications built on Aristotelian logic
Induction, semantic queries, system of logic
Human reasoning involves more than just induction
Computers are never as good as humans
In reasoning and making sense of data
In obtaining a holistic view of a system
Computers are much better than humans
In processing reams of data
In performing complex calculations
Successful AI applications
Targeted tasks are more amenable to automated methods
Build special-purpose AI systems
Determine appropriate dosage for a drug
Classify cells as benign or cancerous
Called expert systems
Methodology based on expert reasoning
Quick and objective ways to obtain answers
Fuzzy logic
Fuzzy logic addresses a key problem in expert systems
How to represent domain knowledge
Humans use imprecisely calibrated terms
How to build decision trees on imprecise thresholds
Fuzzy logic example
Source: Matlab fuzzy logic toolbox tutorial
http://www.mathworks.com/access/helpdesk/help/toolbox/fuzzy/fp350.html
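A minimal sketch of how imprecise terms can be turned into rules, assuming a zero-order Sugeno-style weighted average of rule outputs. It does not reproduce the Matlab tutorial; the variable names, breakpoints (35 to 55 dBZ, 0 to 15 C) and rule outputs are hypothetical values chosen only for illustration.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership that falls linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def storm_severity(reflectivity_dbz, temperature_c):
    """Combine hypothetical verbal rules into a severity between 0 and 1."""
    # Fuzzify: degree to which each imprecise term applies.
    strong_echo = ramp_up(reflectivity_dbz, 35.0, 55.0)
    weak_echo = ramp_down(reflectivity_dbz, 35.0, 55.0)
    cold = ramp_down(temperature_c, 0.0, 15.0)

    # Each rule: (firing strength, severity the rule argues for).
    rules = [
        (min(strong_echo, cold), 1.0),  # strong echo AND cold -> severe
        (strong_echo, 0.6),             # strong echo -> moderate
        (weak_echo, 0.0),               # weak echo -> benign
    ]

    # Aggregate: weighted average of the rule outputs.
    # strong_echo + weak_echo is always 1, so the total is never zero.
    total = sum(weight for weight, _ in rules)
    return sum(weight * out for weight, out in rules) / total


benign = storm_severity(20.0, 20.0)
severe = storm_severity(60.0, -5.0)
borderline = storm_severity(45.0, 7.5)
```

The thresholds are ramps instead of hard cutoffs, so a value just below "strong" still contributes partially to the rules that mention it.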
Advantages of fuzzy logic
Considerable skill for little investment
Fuzzy logic systems piggyback on human analysis
Humans encode rules after intelligent analysis of lots of data
Verbal rules generated by humans are robust
Simple to create
Not much need for data or ground truth
Logic tends to be easy to program
Fuzzy rules are human-understandable
Where not to use fuzzy logic
Do not use fuzzy logic if:
Humans do not understand the system
Different experts disagree
Knowledge cannot be expressed with verbal rules
Gut instinct is involved
Not just objective analysis
A fuzzy logic system is limited
Piece-wise linear approximation to a system
Non-linear systems cannot be approximated
Many environmental applications are non-linear
Neural Networks
Neural networks can approximate non-linear systems
Evidence-based
Weights chosen through optimization procedure on known dataset (training)
Works even if experts can't verbalize their reasoning, or if there is ground truth
An example neural network
Diagram from:
http://www.codeproject.com/useritems/GA_ANN_XOR.asp
Advantages of neural networks
Can approximate any smooth function
The three-layer neural network
Can yield true probabilities
If output node is a sigmoid node
Not hard to train
Training process is well understood
Fast in operations
Training is slow, but once trained, the network can
calculate the output for a set of inputs quite fast
Easy to implement
Just a sum of exponential functions
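As a rough illustration of the forward computation in a three-layer network with a sigmoid output node, the sketch below evaluates a tiny network on the XOR function. The weights are hand-picked for the example, not obtained by training, and the function and variable names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_layer, output_node):
    """Input layer -> sigmoid hidden layer -> single sigmoid output node."""
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))
        for weights, bias in hidden_layer
    ]
    weights, bias = output_node
    return sigmoid(bias + sum(w * h for w, h in zip(weights, hidden)))


# Hand-picked weights (an assumption for illustration, not a trained result):
# the first hidden node acts like OR, the second like NAND, the output like AND.
HIDDEN = [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)]
OUTPUT = ([20.0, 20.0], -30.0)

xor_outputs = {
    (a, b): forward([a, b], HIDDEN, OUTPUT)
    for a in (0, 1)
    for b in (0, 1)
}
```

Once the weights are fixed, each evaluation is just a few multiplications and exponentials, which is why run-time operation is fast even when training is slow.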
Disadvantages of neural networks
A black box
The final set of weights yields no insights
Magnitude of weights doesn't mean much
Measure of skill needs to be differentiable
RMS error, etc.
Cannot use Probability of Detection, for example
Training set has to be complete
Unpredictable output on data unlike training
Need lots of data
Need expert willing to do a lot of truthing
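The differentiability point can be seen in a small sketch: nudging the network outputs slightly changes the RMS error, but leaves a thresholded score such as Probability of Detection unchanged, so it gives a gradient-based optimizer nothing to follow. The data values and the 0.5 threshold are made up for the illustration.

```python
import math


def rms_error(predicted, observed):
    """Root-mean-square difference between outputs and ground truth."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def probability_of_detection(predicted, observed, threshold=0.5):
    """hits / (hits + misses); an output >= threshold counts as a 'yes'."""
    hits = sum(1 for p, o in zip(predicted, observed) if o == 1 and p >= threshold)
    misses = sum(1 for p, o in zip(predicted, observed) if o == 1 and p < threshold)
    return hits / (hits + misses)


observed = [1, 1, 0, 1, 0]            # made-up ground truth
outputs = [0.9, 0.4, 0.2, 0.7, 0.6]   # made-up network outputs
nudged = [p + 0.01 for p in outputs]  # small change, e.g. one training step

rms_before, rms_after = rms_error(outputs, observed), rms_error(nudged, observed)
pod_before = probability_of_detection(outputs, observed)
pod_after = probability_of_detection(nudged, observed)
```

Here `rms_after` differs from `rms_before`, while `pod_after` equals `pod_before` because no output crossed the threshold.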
Recap:
Fuzzy logic
Humans provide the rules
Not optimal
Neural network
Humans cannot understand the system
Optimal
Middle ground?
Genetic Algorithms
Decision Trees
Genetic algorithms
In genetic algorithms
One fixes the model (rule base, equations, class of functions, etc.)
Optimize the parameters of the model on the training data set
Use the optimal set of parameters for unknown cases
An example genetic algorithm
Sources:
http://tx.technion.ac.il/~edassau/web/genetic_algorithms.htm
http://cswww.essex.ac.uk/research/NEC/
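A minimal sketch of a continuous genetic algorithm, assuming the fixed model is a power law y = a * x^b and the cost is a mean absolute error (which is not differentiable everywhere). It is not taken from the sources above; the population size, mutation scale, bounds and synthetic data are all hypothetical choices.

```python
import random


def power_law(x, a, b):
    """The fixed model whose parameters (a, b) are to be found."""
    return a * x ** b


def cost(params, data):
    """Mean absolute error of the model over the training data."""
    a, b = params
    return sum(abs(power_law(x, a, b) - y) for x, y in data) / len(data)


def run_ga(data, bounds, pop_size=40, generations=60, mutation_scale=0.1, seed=0):
    """Evolve real-valued chromosomes [a, b]; returns (best, cost history)."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda p: cost(p, data))
        history.append(cost(population[0], data))
        survivors = population[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            mom, dad = rng.sample(survivors, 2)
            mix = rng.random()
            # Crossover: blend the two parents gene by gene.
            child = [mix * m + (1.0 - mix) * d for m, d in zip(mom, dad)]
            # Mutation: add Gaussian noise, then clip back into the bounds.
            child = [
                max(lo, min(hi, gene + rng.gauss(0.0, mutation_scale * (hi - lo))))
                for gene, (lo, hi) in zip(child, bounds)
            ]
            children.append(child)
        population = survivors + children
    best = min(population, key=lambda p: cost(p, data))
    history.append(cost(best, data))
    return best, history


# Synthetic training data generated from a = 2.0, b = 1.5 (illustration only).
training = [(float(x), 2.0 * float(x) ** 1.5) for x in range(1, 11)]
bounds = [(0.1, 5.0), (0.5, 3.0)]
best_params, cost_history = run_ga(training, bounds)
```

Because the fitter half survives unchanged into the next generation, the best cost never gets worse from one generation to the next. No gradient of the cost is used anywhere.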
Advantages of genetic algorithms
Near-optimal parameters for given model
Human-understandable rules
Best parameters for them
Cost function need not be differentiable
The process of training uses natural selection, not gradient descent
Requires less data than a neural network
Search space is more limited
Disadvantages of genetic algorithms
Highly dependent on class of functions
If a poor model is chosen, poor results
Optimization may not help at all
Known model does not always lead to better understanding
Magnitude of weights, etc. may not be meaningful if inputs are correlated
Problem may have multiple parametric solutions
Decision trees
Can automatically build decision trees from known data
Prune trees
Select thresholds
Choose operators
Disadvantages:
Piece-wise linear, so typically less skilled than neural networks
Large decision trees are effectively a black box
Cannot do regression, only classification
Advantages:
Fast to train
New advances: bagged, boosted decision trees approach skill of neural networks, but are no longer fast to train
[Figure: example decision tree with a pair of counts at each node. Root (30, 50) splits into T < 10C (20, 15) and T > 10C (10, 35); T < 10C splits into Z > 45 (18, 2) and Z < 45 (2, 13); T > 10C splits into V < 5 (8, 2) and V > 5 (2, 33)]
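One step of building a tree from known data, selecting a threshold on a single input, can be sketched as below. This assumes two classes labeled 0 and 1 and uses Gini impurity as the split criterion, which is one common choice and not necessarily the one the slide had in mind. The temperature values are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of class 1
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Try each midpoint between sorted values; return (threshold, impurity)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_split = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Size-weighted impurity of the two children of this candidate split.
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_split = 0.5 * (pairs[i - 1][0] + pairs[i][0])
    return best_split, best_impurity


temperatures = [2.0, 4.0, 6.0, 8.0, 12.0, 14.0, 16.0, 18.0]  # made-up data
event = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_threshold(temperatures, event)
```

A full tree builder would apply this search to every input at every node and then recurse on the two resulting subsets.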
Radial Basis Functions
Diagram from: A. W. Jayawardena & D. Achela K. Fernando 1998: Use of Radial Basis Function Type Artificial Neural Networks for Runoff Simulation, Computer-Aided Civil and Infrastructure Engineering 13:2
Radial Basis Functions are a form of neural network
Localized Gaussians
Linear sum of non-linear functions
Advantage: Can be solved by inverting a matrix, so very fast
Disadvantage: Not a general-enough model
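A minimal sketch of why the weights come from linear algebra and not from iterative training, assuming one Gaussian is centered on each training point and all Gaussians share one width. The linear system is solved by Gaussian elimination, which gives the same result as inverting the matrix. The sample data and the width are hypothetical.

```python
import math


def gaussian(distance, width):
    """Localized basis function: 1 at the center, decaying with distance."""
    return math.exp(-((distance / width) ** 2))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_rbf(xs, ys, width):
    """Weights of a linear sum of Gaussians centered on the training points."""
    phi = [[gaussian(abs(xi - xj), width) for xj in xs] for xi in xs]
    return solve(phi, ys)


def predict_rbf(x, centers, weights, width):
    """Evaluate the linear sum of non-linear (Gaussian) functions at x."""
    return sum(w * gaussian(abs(x - c), width) for w, c in zip(weights, centers))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]    # made-up inputs
ys = [0.0, 0.8, 0.9, 0.1, -0.8]   # made-up targets
weights = fit_rbf(xs, ys, width=1.0)
fitted = [predict_rbf(x, xs, weights, 1.0) for x in xs]
```

With one center per training point the fit passes through the training targets; with fewer centers than points, a least-squares solve would take the place of the exact one.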
Typical data-driven application
[Diagram: Input Data -> Features -> f(features) -> Result; this pipeline is the AI application in run-time]
Which features?
How do we find f()?
What is the role of the data?
Validation
Test known model
Technique:
Difference between model output and ground truth helps to validate the
model
Calibration
Find parameters to model with desired structure
Technique:
Tuned fuzzy logic method
Genetic algorithms
Induction
Find model and parameters from just data
Technique:
Neural network methods, bagged/boosted decision trees, support vector
machines, etc.
What is the problem to solve?
Do you have a bunch of data and want to:
Estimate an unknown parameter from it?
True rainfall based on radar observations?
Amount of liquid content from in-situ measurements of
temperature, pressure, etc?
Regression
Classify what the data correspond to?
A water surge?
A temperature inversion?
A boundary?
Classification
Regression and classification aren't that different
Classification: estimate probability of an event
A function with values from 0 to 1
Which AI technique?
Do you have expert knowledge?
Humans have a model in their head? Should the final f() be understandable?
Create fuzzy logic rules from experts' reasoning
Aggregate the individual fuzzy logic rules
Can tune the fuzzy rules based on data
Using regression, decision trees or neural networks for RMS error criterion
Genetic algorithms for error criteria like ROC, economic cost, etc.
Many times the original rules are just fine
Do you already know the model?
A power-law relationship? Gaussian? Quadratic? Rules?
Just need to find parameters to this model?
If linear, just use linear regression
If non-linear, use genetic algorithms
Use continuous GAs
Both of these can be used for regression (therefore, also classification)
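For the linear case, the parameters have a closed-form least-squares solution, so no search is needed. The sketch below assumes a single input and a straight-line model y = slope * x + intercept; the function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0]  # made-up inputs
ys = [1.0, 3.0, 5.0, 7.0]  # made-up observations lying on y = 2x + 1
slope, intercept = fit_line(xs, ys)
```

Minimizing the squared error this way matches the RMS error criterion; for other error criteria or non-linear models, the closed form no longer applies.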
Which AI technique (contd.)
Do you know nothing about the data?
Not the suspected equation/model (GA)?
Not the suspected rules (fuzzy logic)?
Use an AI technique that supplies its own equations/rules: a black box
For classification, use:
Bagged decision trees or Support Vector Machines
If the output needs to be probabilistic, remember to apply Platt scaling
Summary statistics on bagged DTs can help answer why
Neural Networks
For regression, use:
Neural networks
Where do your data come from?
Observed data
Compute features
Choose AI technique
The 4 choices in the previous two slides
Simulated data:
Example: trying to replicate a very complex model
Throw randomly-generated data at model
Compute features
Choose AI technique:
GA for parametric approximations
NN when you don't know how to approximate
Where do you get your inputs?
What type of data do you have?
Individual observations?
Sample them (choose at random) and use directly
Sparse observations in a time series?
Generate time-based features (1D moving windows)
Signal processing features from time series
Data from remotely sensed 2D grids?
Generate image-based features using convolution filters
Do you need:
Pixel-based regression/classification?
Use convolution features directly
Object-based regression/classification?
Identify regions using region growing
Use region-aggregate features
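The object-based path can be sketched as follows, assuming regions are 4-connected groups of grid cells at or above a threshold and that the region-aggregate feature is the mean value. The grid values and the threshold of 30 are made up for the illustration.

```python
def grow_regions(grid, threshold):
    """Label 4-connected regions of cells whose value is >= threshold."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not in any region"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or labels[r][c]:
                continue
            # Start a new region at this seed and grow it outward.
            count += 1
            labels[r][c] = count
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and grid[ni][nj] >= threshold):
                        labels[ni][nj] = count
                        stack.append((ni, nj))
    return labels, count


def region_means(grid, labels, count):
    """Region-aggregate feature: mean grid value inside each region."""
    sums = [0.0] * (count + 1)
    sizes = [0] * (count + 1)
    for grid_row, label_row in zip(grid, labels):
        for value, label in zip(grid_row, label_row):
            sums[label] += value
            sizes[label] += 1
    return {k: sums[k] / sizes[k] for k in range(1, count + 1)}


grid = [
    [50, 55, 5, 0],
    [45, 10, 0, 40],
    [0, 0, 60, 65],
]
labels, count = grow_regions(grid, threshold=30)
means = region_means(grid, labels, count)
```

The same label grid can be reused after classification to average per-pixel outputs over each object.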
Typical data-driven application
[Diagram: a data-driven application in run-time]
Observed data
-> Signal/image processing; sampling
-> Features
-> Normalize / create chromosome / determine confidences
-> f(): FzLogic / GenAlg / NN / DecTree
-> Platt method / region-average / threshold
-> Result
Preprocessing
Often cannot use pixel data directly
Too much data, too highly correlated
May need to segment pixels into objects and use features computed on the objects
Different data sets may not be collocated
Need to interpolate to line them up
Mapping, objective analysis
Noise in data may need to be reduced
Smoothing
Present statistics of data, rather than data itself
Features need to be extracted from data
Human experts are often a good source of ideas on signatures to extract from data
Postprocessing
The output of an expert system may be grid point by grid point
May need to provide output on objects
Storms, forests, etc.
Can average outputs over each object's pixels
May need probabilistic output
Scale output of maximum-margin techniques
Use a sigmoid function
Called Platt scaling
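A minimal sketch of the sigmoid scaling step, assuming the classifier produces a raw score per case and that the two sigmoid parameters are fitted by plain gradient descent on the log-loss. Platt's original procedure uses a more careful optimizer and smoothed targets; those refinements are omitted here. The scores and labels are made up.

```python
import math


def platt_probability(score, a, b):
    """Map a raw classifier score to a probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def fit_platt(scores, labels, learning_rate=0.1, iterations=2000):
    """Fit (a, b) by gradient descent on the mean negative log-likelihood."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(iterations):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = platt_probability(s, a, b)
            # With p = 1 / (1 + exp(a*s + b)), the log-loss gradient is
            # (y - p) * s with respect to a and (y - p) with respect to b.
            grad_a += (y - p) * s
            grad_b += (y - p)
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


# Made-up margin scores from a held-out set, with their true labels.
scores = [-2.0, -1.5, -0.5, 0.3, 0.8, 1.7, 2.2]
labels = [0, 0, 0, 1, 0, 1, 1]
a, b = fit_platt(scores, labels)
calibrated = [platt_probability(s, a, b) for s in scores]
```

The fitted `a` comes out negative, so larger scores map to larger probabilities. In practice the fit would be done on data not used to train the classifier.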
Summary
Data-driven methods to perform specific targeted tasks
Fuzzy logic, neural networks, genetic algorithms, decision
trees
Understand the role of your data
Do experts understand the system? (have a model)
Do experts expect to understand the system? (readability)
Image processing techniques on spatial grids