Contents
Introduction
Simple Neural Networks for Pattern Classification
Pattern Association
Neural Networks Based on Competition
Backpropagation Neural Network
November 11, 2004
Introduction
Much of this material comes from Fundamentals of Neural Networks: Architectures, Algorithms, and Applications by Laurene Fausett, Prentice Hall, Englewood Cliffs, NJ, 1994.
Aims
Introduce some of the fundamental techniques and principles of neural network systems.
Investigate some common models and their applications.
A Brief History
1949 Hebb published his book The Organization of Behavior, in which the Hebbian learning rule was proposed.
1958 Rosenblatt introduced the simple single-layer networks now called Perceptrons.
1969 Minsky and Papert's book Perceptrons demonstrated the limitations of single-layer perceptrons, and almost the whole field went into hibernation.
1982 Kohonen developed the Self-Organizing Maps that now bear his name.
1986 The Back-Propagation learning algorithm for Multi-Layer Perceptrons was rediscovered and the whole field took off again.
2000s The power of Ensembles of Neural Networks and Support Vector Machines becomes apparent.
Overview
Artificial Neural Networks are powerful computational systems consisting of many simple processing elements connected together to perform tasks analogously to biological brains.
They are massively parallel, which makes them efficient, robust, fault tolerant and noise tolerant.
They can learn from training data and generalize to new situations.
They are useful for brain modeling and real-world applications involving pattern recognition, function approximation and prediction.
Networks of McCulloch-Pitts Neurons
Artificial neurons have the same basic components as biological neurons. The simplest ANNs consist of a set of McCulloch-Pitts neurons labeled by indices k, i, j, with activation flowing between them via synapses with strengths wki, wij.
Bipolar sigmoid
$$g(x) = 2f(x) - 1 = \frac{2}{1 + e^{-x}} - 1 = \frac{1 - e^{-x}}{1 + e^{-x}}$$
where f(x) = 1/(1 + e^{-x}) is the binary sigmoid.
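A minimal sketch of these activation functions in Python (the function names are ours, not the slides'):

import math

def binary_sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)); output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def bipolar_sigmoid(x):
    # g(x) = 2 f(x) - 1; output in (-1, 1), equivalent to tanh(x / 2)
    return 2.0 * binary_sigmoid(x) - 1.0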
Review
Biological neurons, consisting of a cell body, axons, dendrites and synapses, are able to process and transmit neural activation.
The McCulloch-Pitts neuron model (Threshold Logic Unit) is a crude approximation to real neurons that performs a simple summation and thresholding function on activation levels.
Appropriate mathematical notation facilitates the specification and programming of artificial neurons and networks of artificial neurons.
Networks of McCulloch-Pitts Neurons
One neuron can't do much on its own. Usually we will have many neurons labeled by indices k, i, j, and activation flows between them via synapses with strengths wki, wij.
The Perceptron
We can connect any number of McCulloch-Pitts neurons together in any way we like.
An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.
Logical OR
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 1
[Figure: a single McCulloch-Pitts unit computing y from inputs x1 and x2 with threshold θ = 2.]
Logical AND
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 0
 1   1 | 1
[Figure: a single unit computing y from x1 and x2 with threshold θ = 2.]
Logical NOT
x1 | y
 0 | 1
 1 | 0
[Figure: a single unit with weight -1 from x1, weight 2 from a constant bias input of 1, and threshold θ = 2.]
Logical AND NOT (x1 AND NOT x2)
x1  x2 | y
 0   0 | 0
 0   1 | 0
 1   0 | 1
 1   1 | 0
[Figure: a single unit with weight -1 from x2 and threshold θ = 2.]
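These gates can be implemented directly as McCulloch-Pitts units. The sketch below uses one common choice of weights consistent with the thresholds shown above; the slides leave some weights unstated, so the exact values here are an assumption:

def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts unit: fire (1) iff the weighted sum reaches the threshold
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

OR      = lambda x1, x2: mp_neuron([x1, x2], [2, 2], 2)
AND     = lambda x1, x2: mp_neuron([x1, x2], [1, 1], 2)
AND_NOT = lambda x1, x2: mp_neuron([x1, x2], [2, -1], 2)  # x1 AND NOT x2
NOT     = lambda x1:     mp_neuron([x1, 1], [-1, 2], 2)   # second input is a constant bias of 1

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, OR(x1, x2), AND(x1, x2), AND_NOT(x1, x2), NOT(x1))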
Logical XOR
x1  x2 | y
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0
[Figure: a single unit with inputs x1 and x2; the weights are marked "?" because no single-unit choice reproduces this table.]
Logical XOR
How long do we keep looking for a
solution? We need to be able to calculate
appropriate parameters rather than
looking for solutions by trial and error.
Each training pattern produces a linear
inequality for the output in terms of the
inputs and the network parameters. These
can be used to compute the weights and
thresholds.
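For XOR, writing out the four inequalities for a single unit with weights w1, w2 and threshold θ makes the difficulty concrete:

$$\begin{aligned}
(0,0)\mapsto 0:&\quad 0 < \theta\\
(0,1)\mapsto 1:&\quad w_2 \ge \theta\\
(1,0)\mapsto 1:&\quad w_1 \ge \theta\\
(1,1)\mapsto 0:&\quad w_1 + w_2 < \theta
\end{aligned}$$

Adding the middle two inequalities gives w1 + w2 ≥ 2θ, while the first forces θ > 0, so w1 + w2 ≥ 2θ > θ contradicts the last inequality. No choice of weights and threshold works.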
ANN Topologies
Mathematically, ANNs can be represented as weighted directed graphs. For our purposes, we can simply think in terms of activation flowing between processing units via one-way connections.
Single-Layer Feed-forward NNs: one input layer and one output layer of processing units, with no feed-back connections (for example, a simple Perceptron).
[Figure: a McCulloch-Pitts network for the hot-and-cold perception example, with inputs x1 and x2 (the "Cold" receptor), interneurons z1 and z2, outputs y1 and y2, and connection weights 2 and -1.]
[Figure sequence: stepping the network through time. Applying cold for one time step and then removing it causes the network to perceive heat; applying cold for two successive time steps causes it to perceive cold.]
Example: Classification
Consider the example of classifying airplanes given their masses and speeds.
How do we construct a neural network that can classify any type of bomber or fighter?
1. Understand and specify your problem in terms of inputs and required outputs, e.g. for classification the outputs are the classes, usually represented as binary vectors.
2. Take the simplest form of network you think might be able to solve your problem, e.g. a simple Perceptron.
3. Try to find appropriate connection weights (including neuron thresholds) so that the network produces the right outputs for each input in its training data.
4. Make sure that the network works on its training data, and test its generalization by checking its performance on new testing data.
5. If the network doesn't perform well enough, go back to stage 3 and try harder.
6. If the network still doesn't perform well enough, go back to stage 2 and try harder.
7. If the network still doesn't perform well enough, go back to stage 1 and try harder.
[Figure: a two-layer solution for XOR, with hidden units z1 = x1 AND NOT x2 and z2 = x2 AND NOT x1 (weights 2 and -1), and output y = z1 OR z2.]
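A quick sketch of this two-layer solution in Python, reusing the McCulloch-Pitts unit from the earlier sketch (the weight values follow the AND NOT and OR gates above):

def mp_neuron(inputs, weights, theta):
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

def xor(x1, x2):
    z1 = mp_neuron([x1, x2], [2, -1], 2)   # z1 = x1 AND NOT x2
    z2 = mp_neuron([x2, x1], [2, -1], 2)   # z2 = x2 AND NOT x1
    return mp_neuron([z1, z2], [2, 2], 2)  # y = z1 OR z2

assert [xor(0, 0), xor(0, 1), xor(1, 0), xor(1, 1)] == [0, 1, 1, 0]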
General Decision Boundaries
Generally, we will want to deal with input patterns that are not binary, and expect our neural networks to form complex decision boundaries.
We may also wish to classify inputs into many classes (such as the three shown here).
Generalization in Classification
Suppose the task of our network is to learn a classification decision boundary.
Our aim is for the network to generalize to classify new inputs appropriately. If we know that the training data contains noise, we don't necessarily want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.
Generalization in Function Approximation
Suppose we wish to recover a function for which we only
have noisy data samples
We can expect the neural network output to give a better
representation of the underlying function if its output curve
does not pass through all the data points. Again, allowing a
larger error on the training data is likely to lead to better
generalization.
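A rough illustration of this trade-off, using polynomial fitting in place of a neural network (the data, noise level and degrees are arbitrary demo choices): the higher-degree fit drives the training error down, but typically tracks the underlying function worse.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy samples of a known function

x_test = np.linspace(0, 1, 200)
for degree in (3, 9):
    coeffs = np.polyfit(x, y, degree)                    # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    true_err = np.mean((np.polyval(coeffs, x_test) - np.sin(2 * np.pi * x_test)) ** 2)
    print(degree, train_err, true_err)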
Perceptron Learning
For simple Perceptrons performing classification,
we have seen that the decision boundaries are
hyperplanes, and we can think of learning as the
process of shifting around the hyperplanes until
each training pattern is classified correctly
Somehow, we need to formalize that process of
shifting around into a systematic algorithm that
can easily be implemented on a computer
The shifting around can conveniently be split up
into a number of small steps.
Perceptron Learning
If the network weights at time t are wij(t), then the shifting process corresponds to moving them by an amount Δwij(t) so that at time t+1 we have weights
$$w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)$$
It is convenient to treat the thresholds as weights, as discussed previously, so we don't need separate equations for them.
Perceptron Algorithm
Step 0: Initialize weights and bias. For simplicity, set weights and bias to zero. Set learning rate α (0 < α ≤ 1).
Step 1: While the stopping condition is false, repeat Steps 2-5 (stop when the weights no longer change).
Step 2: For each training pair s : t, do Steps 3-5.
Step 3: Set the input unit activations: xi = si.
Perceptron Algorithm
Step 4: Compute the response of the output unit:
$$y_{in} = b + \sum_i x_i w_i$$
$$y = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}$$
Perceptron Algorithm
Step 5: Update weights and bias if an error occurred for this pattern:
if y ≠ t:
    wi(new) = wi(old) + α t xi
    b(new) = b(old) + α t
else:
    wi(new) = wi(old)
    b(new) = b(old)
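Putting Steps 0-5 together, a minimal sketch of the algorithm in Python (bipolar targets, following the formulation above; the AND data at the end is just a demo):

import numpy as np

def train_perceptron(X, t, alpha=1.0, theta=0.2, max_epochs=100):
    w = np.zeros(X.shape[1])          # Step 0: zero weights and bias
    b = 0.0
    for _ in range(max_epochs):       # Steps 1-2: loop over epochs and training pairs
        changed = False
        for x, target in zip(X, t):
            y_in = b + x @ w          # Step 4: compute the response
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != target:           # Step 5: update only when an error occurred
                w = w + alpha * target * x
                b = b + alpha * target
                changed = True
        if not changed:               # stopping condition: an epoch with no updates
            break
    return w, b

# Logical AND with bipolar inputs and targets
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])
print(train_perceptron(X, t))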
Convergence of Perceptron Learning
The weight changes Δwij need to be applied repeatedly, for each weight wij in the network and for each training pattern in the training set. One pass through all the weights for the whole training set is called one epoch of training.
Eventually, usually after many epochs, when all the network outputs match the targets for all the training patterns, all the Δwij will be zero and the process of training will cease. We then say that the training process has converged to a solution.
Convergence of Perceptron Learning
It can be shown that if there does exist a
possible set of weights for a Perceptron
which solves the given problem correctly,
then the Perceptron Learning Rule will find
them in a finite number of iterations
Moreover, it can be shown that if a problem
is linearly separable, then the Perceptron
Learning Rule will find a set of weights in a
finite number of iterations that solves the
problem correctly
Hebbian Learning
In 1949 neuropsychologist Donald Hebb postulated how biological neurons learn: "When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
In other words: if two connected neurons are active simultaneously, the strength of the connection between them should be increased.
Hebbian vs Perceptron Learning
In the notation used for Perceptrons, the Hebbian learning weight update rule is:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \eta \cdot \text{out}_j \cdot \text{in}_i$$
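A sketch of the rule as a one-pass training procedure, with the learning rate folded in (η = 1) and the threshold treated as a bias weight, as before:

import numpy as np

def hebb_train(X, t):
    # Accumulate w_i += x_i * y and b += y over the training pairs
    w = np.zeros(X.shape[1])
    b = 0.0
    for x, y in zip(X, t):
        w += x * y
        b += y
    return w, b

# Bipolar AND learned with the Hebb rule: gives w = [2, 2], b = -2
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])
print(hebb_train(X, t))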
Adaline
Adaline (ADAptive LInear NEuron) was developed by Widrow and Hoff in 1960.
Uses bipolar activations (-1 and 1) for its input signals and target values.
Weight connections are adjustable.
Trained using the delta rule for weight update:
$$w_{ij}(\text{new}) = w_{ij}(\text{old}) + \alpha\,(\text{targ}_j - \text{out}_j)\,x_i$$
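A minimal sketch of the delta rule in code (the learning rate and epoch count are arbitrary demo values; the output used in the update is the raw linear activation, not a thresholded value):

import numpy as np

def adaline_train(X, t, alpha=0.1, epochs=50):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, t):
            out = b + x @ w                  # linear activation
            w += alpha * (target - out) * x  # delta rule
            b += alpha * (target - out)
    return w, b

# Bipolar AND: the weights approach w ≈ [0.5, 0.5], b ≈ -0.5
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
t = np.array([1, -1, -1, -1])
print(adaline_train(X, t))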
Autoassociative Net
The feed-forward autoassociative net has the following diagram.
Useful for determining whether something is a part of the test pattern or not.
The diagonal of the weight matrix is usually set to zero, which improves generalization.
Hebbian learning can be used if the stored vectors are mutually orthogonal.
[Figure: input units x1 ... xi ... xn fully connected to output units y1 ... yj ... ym.]
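A sketch of the construction: Hebbian outer-product weights with the diagonal zeroed, plus a bipolar threshold for recall (the 4-component pattern is just a demo):

import numpy as np

def autoassociative_weights(patterns):
    # W = sum of s^T s over stored patterns, with the diagonal set to zero
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    np.fill_diagonal(W, 0)
    return W

def recall(W, x):
    return np.where(x @ W >= 0, 1, -1)  # bipolar threshold output

stored = np.array([[1, 1, 1, -1]])
W = autoassociative_weights(stored)
print(recall(W, np.array([1, 1, 1, -1])))  # the stored pattern is recovered
print(recall(W, np.array([1, 1, 0, -1])))  # a zeroed ("missing") component is filled in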
BAM Net
Bidirectional Associative Memory net