
www.myreaders.info/ , RC Chakraborty, e-mail rcchak@gmail.com , Aug. 10, 2010


http://www.myreaders.info/html/soft_computing.html

Fundamentals of Neural Networks : Soft Computing Course, Lectures 7 - 14, notes, slides

Fundamentals of Neural Networks

Soft Computing
Neural network, topics : Introduction, biological neuron model, artificial neuron model, neuron equation. Artificial neuron : basic elements, activation and threshold function, piecewise linear and sigmoidal function. Neural network architectures : single layer feed-forward network, multi layer feed-forward network, recurrent networks. Learning methods in neural networks : unsupervised learning - Hebbian learning, competitive learning; supervised learning - stochastic learning, gradient descent learning; reinforced learning. Taxonomy of neural network systems : popular neural network systems, classification of neural network systems as per learning methods and architecture. Single-layer NN system : single layer perceptron, learning algorithm for training perceptron, linearly separable task, XOR problem, ADAptive LINear Element (ADALINE) : architecture and training. Applications of neural networks : clustering, classification, pattern recognition, function approximation, prediction systems.

Topics

(Lectures 07, 08, 09, 10, 11, 12, 13, 14 : 8 hours)

1. Introduction
   Why neural network ?, Research History, Biological Neuron model, Artificial Neuron model, Notations, Neuron equation.

2. Model of Artificial Neuron
   Artificial neuron - basic elements, Activation functions - Threshold function, Piecewise linear function, Sigmoidal function, Example.

3. Neural Network Architectures
   Single layer Feed-forward network, Multi layer Feed-forward network, Recurrent networks.

4. Learning Methods in Neural Networks
   Learning algorithms: Unsupervised Learning - Hebbian Learning, Competitive learning; Supervised Learning - Stochastic learning, Gradient descent learning; Reinforced Learning.

5. Taxonomy Of Neural Network Systems
   Popular neural network systems; Classification of neural network systems with respect to learning methods and architecture types.

6. Single-Layer NN System
   Single layer perceptron : Learning algorithm for training Perceptron, Linearly separable task, XOR Problem; ADAptive LINear Element (ADALINE) : Architecture, Training.

7. Applications of Neural Networks
   Clustering, Classification / pattern recognition, Function approximation, Prediction systems.

8. References

What is Neural Net ?

A neural net is an artificial representation of the human brain that tries to simulate its learning process. An artificial neural network (ANN) is often called a "Neural Network" or simply Neural Net (NN).

Traditionally, the term neural network referred to a network of biological neurons in the nervous system that process and transmit information.

An artificial neural network is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.

Artificial neural networks are made of interconnecting artificial neurons which may share some properties of biological neural networks.

An artificial neural network is a network of simple processing elements (neurons) which can exhibit complex global behavior, determined by the connections between the processing elements and the element parameters.

SC - Neural Network Introduction

1. Introduction

Neural Computers mimic certain processing capabilities of the human brain.

- Neural Computing is an information processing paradigm, inspired by biological systems, composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
- Artificial Neural Networks (ANNs), like people, learn by example.
- An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process.
- Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.

1.1 Why Neural Network

Neural networks follow a different paradigm for computing.

- Conventional computers are good at fast arithmetic and at doing exactly what the programmer asks them to do.
- Conventional computers are not so good at interacting with noisy data or data from the environment, massive parallelism, fault tolerance, and adapting to circumstances.
- Neural network systems help where we cannot formulate an algorithmic solution or where we can get lots of examples of the behavior we require.
- Von Neumann machines are based on the processing/memory abstraction of human information processing.
- Neural networks are based on the parallel architecture of biological brains.

Neural networks are a form of multiprocessor computer system, with
- simple processing elements,
- a high degree of interconnection,
- simple scalar messages, and
- adaptive interaction between elements.

1.2 Research History

The history is relevant because for nearly two decades the future of neural networks remained uncertain.

McCulloch and Pitts (1943) are generally recognized as the designers of the first neural network. They combined many simple processing units together, which could lead to an overall increase in computational power. They suggested many ideas, for example that a neuron has a threshold level and once that level is reached the neuron fires. This is still the fundamental way in which ANNs operate. The McCulloch and Pitts network had a fixed set of weights.

Hebb (1949) developed the first learning rule: if two neurons are active at the same time, then the strength of the connection between them should be increased.

In the 1950s and 60s, many researchers (Block, Minsky, Papert, and Rosenblatt) worked on the perceptron. The neural network model could be proved to converge to the correct weights that will solve the problem. The weight adjustment (learning algorithm) used in the perceptron was found to be more powerful than the learning rules used by Hebb. The perceptron caused great excitement. It was thought to produce programs that could think.

Minsky & Papert (1969) showed that the perceptron could not learn those functions which are not linearly separable.

Neural network research declined throughout the 1970s and until the mid 80s because the perceptron could not learn certain important functions.

Neural networks regained importance in 1985-86. The researchers Parker and LeCun discovered a learning algorithm for multi-layer networks called back propagation that could solve problems that were not linearly separable.

1.3 Biological Neuron Model

The human brain consists of a large number (more than a billion) of neural cells that process information. Each cell works like a simple processor. Only the massive interaction between all cells and their parallel processing makes the brain's abilities possible.

[Fig. Structure of Neuron]

Dendrites are branching fibers that extend from the cell body or soma.

Soma or cell body of a neuron contains the nucleus and other structures that support chemical processing and production of neurotransmitters.

Axon is a singular fiber that carries information away from the soma to the synaptic sites of other neurons (dendrites and somas), muscles, or glands.

Axon hillock is the site of summation for incoming information. At any moment, the collective influence of all neurons that conduct impulses to a given neuron will determine whether or not an action potential will be initiated at the axon hillock and propagated along the axon.

Myelin Sheath consists of fat-containing cells that insulate the axon from electrical activity. This insulation acts to increase the rate of transmission of signals. A gap exists between each myelin sheath cell along the axon. Since fat inhibits the propagation of electricity, the signals jump from one gap to the next.

Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells along the axon. Since fat serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical impulse along the axon.

Synapse is the point of connection between two neurons or a neuron and a muscle or a gland. Electrochemical communication between neurons takes place at these junctions.

Terminal Buttons of a neuron are the small knobs at the end of an axon that release chemicals called neurotransmitters.

Information flow in a Neural Cell

The input/output and the propagation of information are shown below.

[Fig. Structure of a neural cell in the human brain]

- Dendrites receive activation from other neurons.
- Soma processes the incoming activations and converts them into output activations.
- Axons act as transmission lines to send activation to other neurons.
- Synapses, the junctions, allow signal transmission between the axons and dendrites.
- The process of transmission is by diffusion of chemicals called neurotransmitters.

McCulloch and Pitts introduced a simplified model of these real neurons.

1.4 Artificial Neuron Model

An artificial neuron is a mathematical function conceived as a simple model of a real (biological) neuron.

The McCulloch-Pitts Neuron

This is a simplified model of real neurons, known as a Threshold Logic Unit.

[Fig. Threshold Logic Unit: Input 1, Input 2, ..., Input n feed a single Output]

- A set of input connections brings in activations from other neurons.
- A processing unit sums the inputs, and then applies a non-linear activation function (i.e. squashing / transfer / threshold function).
- An output line transmits the result to other neurons.

In other words,
- The input to a neuron arrives in the form of signals.
- The signals build up in the cell.
- Finally the cell discharges (cell fires) through the output.
- The cell can start building up signals again.

1.5 Notations

Recaps : Scalar, Vectors, Matrices and Functions

Scalar : The numbers $x_i$ can be added up to give a scalar number.

$s = x_1 + x_2 + x_3 + \ldots + x_n = \sum_{i=1}^{n} x_i$

Vectors : An ordered set of related numbers. Row vectors (1 x n) :

$X = (x_1, x_2, x_3, \ldots, x_n)$ , $Y = (y_1, y_2, y_3, \ldots, y_n)$

Add : Two vectors of the same length are added to give another vector.

$Z = X + Y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)$

Multiply : Two vectors of the same length are multiplied to give a scalar.

$p = X \cdot Y = x_1 y_1 + x_2 y_2 + \ldots + x_n y_n = \sum_{i=1}^{n} x_i y_i$
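The scalar and vector operations above can be checked with a few lines of code. This is a minimal sketch in plain Python; the function names (`scalar_sum`, `vector_add`, `dot`) and the sample vectors are illustrative choices, not part of the lecture notes.

```python
def scalar_sum(xs):
    """s = x1 + x2 + ... + xn."""
    return sum(xs)


def vector_add(x, y):
    """Z = X + Y, component by component; both vectors need the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(x, y)]


def dot(x, y):
    """p = X . Y = sum of x_i * y_i; both vectors need the same length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


X = [1, 2, 3]
Y = [4, 5, 6]
print(scalar_sum(X))     # 6
print(vector_add(X, Y))  # [5, 7, 9]
print(dot(X, Y))         # 32
```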

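Matrices : m x n matrix , row no = m , column no = n

$W = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{bmatrix}$

Add or Subtract : Matrices of the same size are added or subtracted component by component.

$A + B = C$ , $c_{ij} = a_{ij} + b_{ij}$

$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$

$c_{11} = a_{11} + b_{11}$ , $c_{12} = a_{12} + b_{12}$
$c_{21} = a_{21} + b_{21}$ , $c_{22} = a_{22} + b_{22}$

Multiply : matrix A (m x n) multiplied by matrix B (n x p) gives matrix C (m x p), with elements

$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$

$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$

$c_{11} = (a_{11} \times b_{11}) + (a_{12} \times b_{21})$
$c_{12} = (a_{11} \times b_{12}) + (a_{12} \times b_{22})$
$c_{21} = (a_{21} \times b_{11}) + (a_{22} \times b_{21})$
$c_{22} = (a_{21} \times b_{12}) + (a_{22} \times b_{22})$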
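The matrix rules above translate directly into nested loops. The following is a small sketch using lists of rows; `mat_add` and `mat_mul` are hypothetical helper names, and the 2 x 2 values are made up for illustration.

```python
def mat_add(a, b):
    """C = A + B with c_ij = a_ij + b_ij (A and B must have the same size)."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]


def mat_mul(a, b):
    """C = A B with c_ij = sum over k of a_ik * b_kj (A is m x n, B is n x p)."""
    n = len(b)     # shared inner dimension
    p = len(b[0])  # number of columns of B
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(a))]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(mat_add(A, B))  # [[6, 8], [10, 12]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```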

1.6 Functions

The function y = f(x) describes a relationship, an input-output mapping, from x to y.

Threshold or Sign function : sgn(x) defined as

$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$

[Fig. sgn(x): output (O/P) between 0 and 1 plotted against input (I/P) from -4 to 4]

Sigmoid function : sigmoid(x) defined as a smoothed (differentiable) form of the threshold function

$\operatorname{sigmoid}(x) = \frac{1}{1 + e^{-x}}$

[Fig. sigmoid(x): output (O/P) between 0 and 1 plotted against input (I/P) from -4 to 4]
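To make the two definitions concrete, here is a short sketch that evaluates both functions at a few points. It follows the notes in returning 0 (not -1) from `sgn` for negative inputs; the sample inputs are arbitrary.

```python
import math


def sgn(x):
    """Threshold function as defined in the notes: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def sigmoid(x):
    """Smoothed, differentiable form of the threshold: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-4, -1, 0, 1, 4):
    # The sigmoid stays strictly between 0 and 1 and passes through 0.5 at x = 0.
    print(x, sgn(x), round(sigmoid(x), 3))
```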

SC - Neural Network Artificial Neuron Model

2. Model of Artificial Neuron

A very simplified model of real neurons is known as a Threshold Logic Unit (TLU). The model is said to have :

- A set of synapses (connections) that brings in activations from other neurons.
- A processing unit that sums the inputs, and then applies a non-linear activation function (i.e. squashing / transfer / threshold function).
- An output line that transmits the result to other neurons.

2.1 McCulloch-Pitts (M-P) Neuron Equation

The McCulloch-Pitts neuron is a simplified model of a real biological neuron.

[Fig. Simplified Model of Real Neuron (Threshold Logic Unit): Input 1, Input 2, ..., Input n feed a single Output]

The equation for the output of a McCulloch-Pitts neuron as a function of 1 to n inputs is written as

$\text{Output} = \operatorname{sgn}\left( \sum_{i=1}^{n} \text{Input}_i - \Phi \right)$

where $\Phi$ is the neuron's activation threshold.

If $\sum_{i=1}^{n} \text{Input}_i \ge \Phi$ then Output = 1
If $\sum_{i=1}^{n} \text{Input}_i < \Phi$ then Output = 0

In this McCulloch-Pitts neuron model, the missing features are :

- Non-binary input and output,
- Non-linear summation,
- Smooth thresholding,
- Stochastic, and
- Temporal information processing.
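The M-P neuron equation can be sketched in a few lines. The function name `mp_neuron` and the use of thresholds 2 and 1 to obtain logical AND and OR on two binary inputs are illustrative assumptions, not taken from the notes.

```python
def mp_neuron(inputs, threshold):
    """McCulloch-Pitts neuron: output 1 if the sum of inputs reaches the threshold, else 0."""
    return 1 if sum(inputs) - threshold >= 0 else 0


# With two binary inputs, a threshold of 2 behaves like AND
# and a threshold of 1 behaves like OR (illustrative choice of thresholds).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", mp_neuron([a, b], 2), "OR:", mp_neuron([a, b], 1))
```

Note that the inputs are unweighted here, which matches the fixed-weight character of the original M-P network.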

2.2 Artificial Neuron - Basic Elements

A neuron consists of three basic components - weights, thresholds, and a single activation function.

[Fig. Basic Elements of an Artificial Linear Neuron: inputs x1, x2, ..., xn with synaptic weights W1, W2, ..., Wn feed a summing unit with a threshold, followed by the activation function]

Weighting Factors w

The values $w_1, w_2, \ldots, w_n$ are weights that determine the strength of the input vector $X = [x_1, x_2, \ldots, x_n]^T$. Each input is multiplied by the associated weight of the neuron connection, $X^T W$. A +ve weight excites and a -ve weight inhibits the node output.

$I = X^T \cdot W = x_1 w_1 + x_2 w_2 + \ldots + x_n w_n = \sum_{i=1}^{n} x_i w_i$

Threshold

The node's internal threshold $\Phi$ is the magnitude offset. It affects the activation of the node output Y as:

$Y = f(I) = f\left( \sum_{i=1}^{n} x_i w_i - \Phi_k \right)$

To generate the final output Y, the sum is passed on to a non-linear filter f called the Activation Function, Transfer function or Squash function, which releases the output Y.

Threshold for a Neuron

In practice, neurons generally do not fire (produce an output) unless their total input goes above a threshold value.

The total input for each neuron is the sum of the weighted inputs to the neuron minus its threshold value. This is then passed through the sigmoid function. The equation for the transition in a neuron is :

$a = 1 / (1 + \exp(-x))$ , where $x = \sum_i a_i w_i - Q$

- $a$ is the activation for the neuron
- $a_i$ is the activation for neuron i
- $w_i$ is the weight
- $Q$ is the threshold subtracted

Activation Function

An activation function f performs a mathematical operation on the signal output. The most common activation functions are:

- Linear Function,
- Threshold Function,
- Piecewise Linear Function,
- Sigmoidal (S shaped) function,
- Tangent hyperbolic function

The activation functions are chosen depending upon the type of problem to be solved by the network.

2.2 Activation Functions f - Types

Over the years, researchers have tried several functions to convert the input into an output. The most commonly used functions are described below.

- I/P : Horizontal axis shows the sum of inputs.
- O/P : Vertical axis shows the value the function produces, i.e. the output.
- All functions f are designed to produce values between 0 and 1.

Threshold Function

A threshold (hard-limiter) activation function is either a binary type or a bipolar type as shown below.

Binary threshold

[Fig. Binary threshold: O/P versus I/P]

Output of a binary threshold function produces :
- 1 if the weighted sum of the inputs is +ve,
- 0 if the weighted sum of the inputs is -ve.

$Y = f(I) = \begin{cases} 1 & \text{if } I \ge 0 \\ 0 & \text{if } I < 0 \end{cases}$

Bipolar threshold

[Fig. Bipolar threshold: O/P versus I/P]

Output of a bipolar threshold function produces :
- 1 if the weighted sum of the inputs is +ve,
- -1 if the weighted sum of the inputs is -ve.

$Y = f(I) = \begin{cases} 1 & \text{if } I \ge 0 \\ -1 & \text{if } I < 0 \end{cases}$

A neuron with a hard-limiter activation function is called the McCulloch-Pitts model.

Piecewise Linear Function

This activation function is also called the saturating linear function and can have either a binary or bipolar range for the saturation limits of the output. The mathematical model for a symmetric saturation function is described below.

[Fig. Piecewise Linear: O/P versus I/P, saturating at -1 and +1]

This is a sloping function that produces :
- -1 for a weighted sum of inputs below -1,
- +1 for a weighted sum of inputs above +1,
- an output proportional to the input for weighted sums between -1 and +1.

$Y = f(I) = \begin{cases} 1 & \text{if } I > 1 \\ I & \text{if } -1 \le I \le 1 \\ -1 & \text{if } I < -1 \end{cases}$
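The hard-limiter and saturating linear functions described in this section can be compared side by side. This is a sketch only: the function names are illustrative, and `piecewise_linear` assumes the symmetric form that saturates at -1 and +1 with unit slope in between.

```python
def binary_threshold(i):
    """Binary hard limiter: 1 if I >= 0, else 0."""
    return 1 if i >= 0 else 0


def bipolar_threshold(i):
    """Bipolar hard limiter: 1 if I >= 0, else -1."""
    return 1 if i >= 0 else -1


def piecewise_linear(i):
    """Symmetric saturating linear function: I clipped to the range [-1, 1]."""
    return max(-1.0, min(1.0, float(i)))


for i in (-2.5, -0.4, 0.0, 0.4, 2.5):
    print(i, binary_threshold(i), bipolar_threshold(i), piecewise_linear(i))
```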

Sigmoidal Function (S-shape function)

The nonlinear curved S-shape function is called the sigmoid function. This is the most common type of activation used to construct neural networks. It is a mathematically well behaved, differentiable and strictly increasing function.

A sigmoidal transfer function can be written in the form:

$Y = f(I) = \frac{1}{1 + e^{-\alpha I}} = 1 / (1 + \exp(-\alpha I))$ , $0 \le f(I) \le 1$

[Fig. Sigmoidal function: O/P versus I/P for $\alpha$ = 2.0, 1.0 and 0.5]

This is explained as
- approximately 0 for large -ve input values,
- approximately 1 for large +ve values, with a smooth transition between the two.
- $\alpha$ is the slope parameter, also called the shape parameter; the symbol $\lambda$ is also used to represent this parameter.

The sigmoidal function is achieved using an exponential equation. By varying $\alpha$, different shapes of the function can be obtained, which adjusts the abruptness of the function as it changes between the two asymptotic values.
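The effect of the slope parameter is easy to see numerically. The sketch below evaluates the sigmoid at the same input for three slope values; the parameter name `alpha` and the chosen values are assumptions made for illustration.

```python
import math


def sigmoid(i, alpha=1.0):
    """Sigmoidal transfer function 1 / (1 + exp(-alpha * I))."""
    return 1.0 / (1.0 + math.exp(-alpha * i))


# A larger slope parameter makes the transition between 0 and 1 more abrupt:
# at the same positive input the output is closer to 1.
for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(sigmoid(1.0, alpha), 4))
```

For every slope value the function still passes through 0.5 at I = 0; only the steepness around that point changes.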

Example :

The neuron shown consists of four inputs with the weights.

[Fig. Neuron Structure of Example: inputs x1 = 1, x2 = 2, x3 = 5, xn = 8 with synaptic weights +1, +1, -1, +2 feed a summing junction with threshold $\Phi = 0$, followed by the activation function]

The output I of the network, prior to the activation function stage, is

$I = X^T \cdot W = \begin{bmatrix} 1 & 2 & 5 & 8 \end{bmatrix} \begin{bmatrix} +1 \\ +1 \\ -1 \\ +2 \end{bmatrix} = (1 \times 1) + (2 \times 1) + (5 \times -1) + (8 \times 2) = 14$

With a binary activation function, the output of the neuron is:

y (threshold) = 1
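The worked example can be reproduced directly. The variable names below are illustrative; the inputs, weights and zero threshold are the ones given in the example.

```python
x = [1, 2, 5, 8]    # inputs x1, x2, x3, xn from the example
w = [1, 1, -1, 2]   # synaptic weights from the example
threshold = 0       # threshold from the example

# Weighted sum prior to the activation function stage.
I = sum(xi * wi for xi, wi in zip(x, w))

# Binary threshold activation applied to the sum minus the threshold.
y = 1 if I - threshold >= 0 else 0

print(I)  # 14
print(y)  # 1
```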

SC - Neural Network Architecture

3. Neural Network Architectures

An Artificial Neural Network (ANN) is a data processing system, consisting of a large number of simple, highly interconnected processing elements (artificial neurons) in a network structure that can be represented using a directed graph G, an ordered 2-tuple (V, E), consisting of a set V of vertices and a set E of edges.

- The vertices may represent neurons (input/output) and
- The edges may represent synaptic links labeled by the weights attached.

Example :

[Fig. Directed Graph with vertices v1 to v5 and edges e1 to e5]

Vertices V = { v1 , v2 , v3 , v4 , v5 }
Edges E = { e1 , e2 , e3 , e4 , e5 }

3.1 Single Layer Feed-forward Network

The Single Layer Feed-forward Network consists of a single layer of weights, where the inputs are directly connected to the outputs via a series of weights. The synaptic links carrying weights connect every input to every output, but not the other way. This way it is considered a network of feed-forward type. The sum of the products of the weights and the inputs is calculated in each neuron node, and if the value is above some threshold (typically 0) the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1).

[Fig. Single Layer Feed-forward Network: inputs x_i (x1, x2, ..., xn) connected through weights w_ij (w11, ..., wnm) to a single layer of neurons with outputs y_j (y1, y2, ..., ym)]
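A forward pass through a single layer feed-forward network, as described above, might look like the following sketch. The function name, the layout `w[i][j]` (weight from input i to output j) and the numeric values are assumptions chosen for illustration.

```python
def single_layer_forward(x, w, threshold=0.0):
    """Compute outputs y_j: +1 if the weighted sum of inputs exceeds the threshold, else -1.

    x is the list of n inputs; w[i][j] is the weight from input i to output j.
    """
    n_outputs = len(w[0])
    outputs = []
    for j in range(n_outputs):
        net = sum(x[i] * w[i][j] for i in range(len(x)))
        outputs.append(1 if net > threshold else -1)
    return outputs


x = [1.0, 0.5]
w = [[0.4, -0.6],   # weights from x1 to y1, y2
     [0.2, 0.3]]    # weights from x2 to y1, y2
print(single_layer_forward(x, w))  # [1, -1]
```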

3.2 Multi Layer Feed-forward Network

As the name suggests, it consists of multiple layers. The architecture of this class of network, besides having the input and the output layers, also has one or more intermediary layers called hidden layers. The computational units of the hidden layer are known as hidden neurons.

[Fig. Multilayer feed-forward network in ($\ell$ - m - n) configuration: input layer neurons x_i, input-hidden layer weights v_ij, hidden layer neurons y_j, output-hidden layer weights w_jk, output layer neurons z_k]

- The hidden layer does intermediate computation before directing the input to the output layer.
- The input layer neurons are linked to the hidden layer neurons; the weights on these links are referred to as input-hidden layer weights.
- The hidden layer neurons are linked to the output layer neurons; the corresponding weights are referred to as output-hidden layer weights.
- A multi-layer feed-forward network with $\ell$ input neurons, m1 neurons in the first hidden layer, m2 neurons in the second hidden layer, and n output neurons in the output layer is written as ($\ell$ - m1 - m2 - n).

The Fig. above illustrates a multilayer feed-forward network with a configuration ($\ell$ - m - n).
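The layered computation can be sketched as two successive weighted sums, each followed by an activation. This is an illustration under assumptions: the notes do not fix the activation function here, so a sigmoid is used, thresholds are omitted, and the (2 - 3 - 1) weights are made-up values.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(inputs, weights):
    """One layer: weights[i][j] links input i to unit j; returns the unit activations."""
    n_units = len(weights[0])
    return [sigmoid(sum(inputs[i] * weights[i][j] for i in range(len(inputs))))
            for j in range(n_units)]


def forward(x, v, w):
    """Input -> hidden (weights v_ij) -> output (weights w_jk)."""
    y = layer(x, v)   # hidden layer neurons y_j
    z = layer(y, w)   # output layer neurons z_k
    return y, z


x = [1.0, 0.0]                  # 2 input neurons
v = [[0.5, -0.5, 0.1],
     [0.3, 0.8, -0.2]]          # 2 x 3 input-hidden layer weights
w = [[0.2], [-0.4], [0.6]]      # 3 x 1 output-hidden layer weights
hidden, output = forward(x, v, w)
print(hidden, output)
```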

3.3 Recurrent Networks

Recurrent networks differ from the feed-forward architecture. A recurrent network has at least one feedback loop.

Example :

[Fig. Recurrent Neural Network: input layer neurons x_i, hidden layer neurons y_j, output layer neurons z_k, with feedback links]

There could be neurons with self-feedback links; that is, the output of a neuron is fed back into itself as input.

SC - Neural Network Learning methods

4. Learning Methods in Neural Networks

The learning methods in neural networks are classified into three basic types :

- Supervised Learning,
- Unsupervised Learning and
- Reinforced Learning

These three types are classified based on :

- presence or absence of a teacher and
- the information provided for the system to learn.

These are further categorized, based on the rules used, as

- Hebbian,
- Gradient descent,
- Competitive and
- Stochastic learning.

Classification of Learning Algorithms

The Fig. below indicates the hierarchical representation of the algorithms mentioned in the previous slide. These algorithms are explained in subsequent slides.

Neural Network Learning algorithms
  - Supervised Learning (Error based)
      - Stochastic
      - Error Correction / Gradient descent
          - Least Mean Square
          - Back Propagation
  - Reinforced Learning (Output based)
  - Unsupervised Learning
      - Hebbian
      - Competitive

Fig. Classification of learning algorithms

Supervised Learning
- A teacher is present during the learning process and presents the expected output.
- Every input pattern is used to train the network.
- The learning process is based on a comparison between the network's computed output and the correct expected output, generating an "error".
- The "error" generated is used to change network parameters, which results in improved performance.

Unsupervised Learning
- No teacher is present.
- The expected or desired output is not presented to the network.
- The system learns on its own by discovering and adapting to the structural features in the input patterns.

Reinforced learning
- A teacher is present but does not present the expected or desired output; it only indicates whether the computed output is correct or incorrect.
- The information provided helps the network in its learning process.
- A reward is given for a correct answer computed and a penalty for a wrong answer.

Note : The Supervised and Unsupervised learning methods are the most popular forms of learning compared to Reinforced learning.

Hebbian Learning

Hebb proposed a rule based on correlative weight adjustment.

In this rule, the input-output pattern pairs $(X_i, Y_i)$ are associated by the weight matrix W, known as the correlation matrix, computed as

$W = \sum_{i=1}^{n} X_i Y_i^T$

where $Y_i^T$ is the transpose of the associated output vector $Y_i$.

There are many variations of this rule proposed by other researchers (Kosko, Anderson, Lippman).
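The correlation matrix is a sum of outer products, one per pattern pair. The sketch below builds it with plain lists; the helper name `hebbian_weights` and the two bipolar pattern pairs are hypothetical examples.

```python
def hebbian_weights(pairs):
    """Correlation matrix W = sum over pairs of the outer product X_i Y_i^T.

    pairs is a list of (X, Y) tuples; W has len(X) rows and len(Y) columns.
    """
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    W = [[0] * n_out for _ in range(n_in)]
    for x, y in pairs:
        for r in range(n_in):
            for c in range(n_out):
                W[r][c] += x[r] * y[c]
    return W


pairs = [([1, -1, 1], [1, -1]),
         ([-1, 1, 1], [-1, 1])]
print(hebbian_weights(pairs))  # [[2, -2], [-2, 2], [0, 0]]
```

Entries grow where an input component and an output component agree in sign across the pairs, and cancel where they disagree, which is the "correlative" adjustment the rule refers to.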

Gradient descent Learning

This is based on the minimization of the error E, defined in terms of the weights and the activation function of the network.

- Here, the activation function of the network is required to be differentiable, because the update of a weight is dependent on the gradient of the error E.
- If $\Delta W_{ij}$ is the weight update of the link connecting the i-th and the j-th neuron of the two neighboring layers, then $\Delta W_{ij}$ is defined as

  $\Delta W_{ij} = -\eta \, (\partial E / \partial W_{ij})$

  where $\eta$ is the learning rate parameter and $(\partial E / \partial W_{ij})$ is the error gradient with reference to the weight $W_{ij}$. The minus sign moves each weight against the gradient, so that E decreases.

Note : The Widrow-Hoff Delta rule and the Back-propagation learning rule are examples of Gradient descent learning.
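As a small illustration of gradient descent learning, the sketch below trains a single linear neuron on one pattern. The setup is an assumption made for the example: the error is taken as E = 0.5 (d - y)^2 with y = sum of w_i x_i, so the gradient with respect to w_i is -(d - y) x_i, and each step subtracts the learning rate times that gradient so that E decreases.

```python
def output(w, x):
    """Linear neuron output y = sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def error(w, x, d):
    """Squared error E = 0.5 * (d - y)^2 for one pattern (illustrative choice of E)."""
    return 0.5 * (d - output(w, x)) ** 2


def gradient_step(w, x, d, eta):
    """One update: w_i <- w_i - eta * dE/dw_i, where dE/dw_i = -(d - y) * x_i."""
    y = output(w, x)
    grad = [-(d - y) * xi for xi in x]
    return [wi - eta * gi for wi, gi in zip(w, grad)]


w = [0.0, 0.0]
x, d, eta = [1.0, 2.0], 1.0, 0.1
initial_error = error(w, x, d)
for _ in range(20):
    w = gradient_step(w, x, d, eta)
final_error = error(w, x, d)
print(initial_error, final_error)
```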

Competitive Learning
- In this method, those neurons which respond strongly to the input stimuli have their weights updated.
- When an input pattern is presented, all neurons in the layer compete, and the winning neuron undergoes weight adjustment.
- This strategy is called "winner-takes-all".
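One winner-takes-all step can be sketched as follows. Two details are assumptions, since the notes do not specify them: the response of a neuron is taken as the dot product of its weight vector with the input, and the winner's weights are moved toward the input by w <- w + eta (x - w), a commonly used competitive update.

```python
def winner_takes_all_step(weights, x, eta):
    """Find the neuron with the strongest response and adjust only its weights.

    weights is a list of weight vectors, one per neuron; it is updated in place.
    Returns the index of the winning neuron.
    """
    responses = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    winner = max(range(len(weights)), key=lambda k: responses[k])
    # Assumed update rule: move the winner's weights toward the input pattern.
    weights[winner] = [wi + eta * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return winner


weights = [[1.0, 0.0],
           [0.0, 1.0]]
winner = winner_takes_all_step(weights, [0.9, 0.1], eta=0.5)
print(winner, weights)
```

Only the winning neuron's weight vector changes; the losing neuron keeps its weights untouched.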

Stochastic Learning
- In this method the weights are adjusted in a probabilistic fashion.
- Example : Simulated annealing, which is a learning mechanism employed by Boltzmann and Cauchy machines.

SC - Neural Network Systems

5. Taxonomy Of Neural Network Systems

In the previous sections, the Neural Network Architectures and the Learning methods have been discussed. Here the popular neural network systems are listed. The grouping of these systems in terms of architectures and learning methods is presented in the next slide.

Neural Network Systems

- ADALINE (Adaptive Linear Neural Element)
- ART (Adaptive Resonance Theory)
- AM (Associative Memory)
- BAM (Bidirectional Associative Memory)
- Boltzmann machines
- BSB (Brain-State-in-a-Box)
- Cauchy machines
- Hopfield Network
- LVQ (Learning Vector Quantization)
- Neocognitron
- Perceptron
- RBF (Radial Basis Function)
- RNN (Recurrent Neural Network)
- SOFM (Self-organizing Feature Map)

Classification of Neural Network

A taxonomy of neural network systems based on Architectural types and Learning methods is illustrated below.

                                               Learning Methods
Architecture type           Gradient descent                 Hebbian               Competitive   Stochastic
Single-layer feed-forward   ADALINE, Hopfield, Perceptron    AM, Hopfield          LVQ, SOFM     -
Multi-layer feed-forward    CCM, MLFF, RBF                   Neocognitron          -             -
Recurrent Networks          RNN                              BAM, BSB, Hopfield    ART           Boltzmann and Cauchy machines

Table : Classification of Neural Network Systems with respect to learning methods and Architecture types

SC - Neural Network Single Layer learning

6. Single-Layer NN Systems

Here, a simple Perceptron Model and an ADALINE Network Model are presented.

6.1 Single layer Perceptron

Definition : An arrangement of one input layer of neurons feeding forward to one output layer of neurons is known as a Single Layer Perceptron.

[Fig. Simple Perceptron Model: inputs x_i (x1, x2, ..., xn) connected through weights w_ij (w11, ..., wnm) to a single layer perceptron with outputs y_j (y1, y2, ..., ym)]

$y_j = f(net_j) = \begin{cases} 1 & \text{if } net_j \ge 0 \\ 0 & \text{if } net_j < 0 \end{cases}$ where $net_j = \sum_{i=1}^{n} x_i w_{ij}$

Learning Algorithm : Training Perceptron

The training of a Perceptron is a supervised learning algorithm where weights are adjusted to minimize error whenever the output does not match the desired output.

- If the output is correct then no adjustment of weights is done,
  i.e. $W_{ij}^{K+1} = W_{ij}^{K}$
- If the output is 1 but should have been 0 then the weights are decreased on the active input link,
  i.e. $W_{ij}^{K+1} = W_{ij}^{K} - \alpha \cdot x_i$
- If the output is 0 but should have been 1 then the weights are increased on the active input link,
  i.e. $W_{ij}^{K+1} = W_{ij}^{K} + \alpha \cdot x_i$

where $W_{ij}^{K+1}$ is the new adjusted weight, $W_{ij}^{K}$ is the old weight, $x_i$ is the input and $\alpha$ is the learning rate parameter. A small $\alpha$ leads to slow learning and a large $\alpha$ leads to fast learning.

Perceptron and Linearly Separable Task

Perceptron cannot handle tasks which are not linearly separable.

- Definition : Sets of points in 2-D space are linearly separable if the sets can be separated by a straight line.
- Generalizing, sets of points in n-dimensional space are linearly separable if there is a hyperplane of (n-1) dimensions that separates the sets.

Example

[Fig. (a) Linearly separable patterns S1 and S2; (b) Not linearly separable patterns S1 and S2]

Note : Perceptron cannot find weights for classification problems that are not linearly separable.

XOR Problem : Exclusive OR operation

XOR truth table

Input x1   Input x2   Output
   0          0         0       Even parity
   1          1         0       Even parity
   0          1         1       Odd parity
   1          0         1       Odd parity

[Fig. Output of XOR in the x1 , x2 plane, with points (0, 0), (0, 1), (1, 0) and (1, 1)]

Even parity is an even number of 1 bits in the input.
Odd parity is an odd number of 1 bits in the input.

- There is no way to draw a single straight line so that the circles are on one side of the line and the dots on the other side.
- Perceptron is unable to find a line separating even parity input patterns from odd parity input patterns.

fo
.in
rs
de
ea

SC - Neural Network Single Layer learning

ty

,w

.m

yr

Perceptron Learning Algorithm


The algorithm is illustrated step-by-step.

ab

or

Step 1 :

ha

kr

Create a peceptron with (n+1) input neurons x0 , x1 , . . . . . , . xn ,

where

x0 = 1

is the bias input.

Let O be the output neuron.


Step 2 :

Initialize weight W = (w0 , w1 , . . . . . , . wn ) to random weights.


Step 3 :

Iterate through the input patterns


weight set;

Xj of the training set using the

ie compute the weighted sum of inputs net j =

for each input pattern j .

i=1

xi wi

Step 4 :

Compute the output y j using the step function


1

if net j

where

y j = f (net j) =
0

if net j

< 0

net j

i=1

xi wij

Step 5 :

Compare the computed output


each input pattern

yj

with the target output

yj

for

If all the input patterns have been classified correctly, then output
(read) the weights and exit.
Step 6 :

Otherwise, update the weights as given below :

If the computed output yj is 1 but should have been 0,
then wi = wi - α xi , i = 0, 1, 2, . . . , n

If the computed output yj is 0 but should have been 1,
then wi = wi + α xi , i = 0, 1, 2, . . . , n

where α is the learning parameter and is constant.


Step 7 :

goto step 3
END
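The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names, the name `alpha` for the learning parameter, the fixed random seed, the per-pattern (online) weight update, and the `max_epochs` cap (added so the loop also ends on tasks that are not linearly separable) are all choices made here.

```python
import random


def predict(w, x):
    """Steps 3-4: weighted sum with bias input x0 = 1, then the step function."""
    xb = [1.0] + list(x)
    net = sum(xi * wi for xi, wi in zip(xb, w))
    return 1 if net >= 0 else 0


def train_perceptron(patterns, targets, alpha=1.0, max_epochs=100, seed=0):
    """Return (weights, converged). max_epochs is an assumed safety cap."""
    rng = random.Random(seed)
    n = len(patterns[0])
    # Step 2: random initial weights; w[0] is the weight of the bias input.
    w = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(patterns, targets):
            xb = [1.0] + list(x)
            y = predict(w, x)
            if y == 1 and target == 0:
                # Step 6: output too high, subtract alpha * x_i from each w_i.
                w = [wi - alpha * xi for wi, xi in zip(w, xb)]
                errors += 1
            elif y == 0 and target == 1:
                # Step 6: output too low, add alpha * x_i to each w_i.
                w = [wi + alpha * xi for wi, xi in zip(w, xb)]
                errors += 1
        if errors == 0:
            # Step 5: every pattern classified correctly, so stop.
            return w, True
    return w, False


PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND is linearly separable, so training reaches an error-free pass.
w_and, ok_and = train_perceptron(PATTERNS, [0, 0, 0, 1])
print("AND converged:", ok_and)

# XOR is not linearly separable, so the epoch cap is reached instead.
w_xor, ok_xor = train_perceptron(PATTERNS, [0, 1, 1, 0])
print("XOR converged:", ok_xor)
```

On the AND targets the loop exits through Step 5; on the XOR targets every pass contains at least one misclassified pattern, so only the assumed cap stops it.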

SC - Neural Network ADALINE


6.2 ADAptive LINear Element (ADALINE)

An ADALINE consists of a single neuron of the McCulloch-Pitts type, where its weights are determined by the normalized least mean square (LMS) training law. The LMS learning rule is also referred to as the delta rule. It is a well-established supervised training method that has been used over a wide range of diverse applications.

Architecture of a simple ADALINE

[Figure labels: inputs x1 , x2 , . . . , xn ; weights W1 , W2 , . . . , Wn ; Neuron; Output; Desired Output; Error]

The basic structure of an ADALINE is similar to a neuron with a linear activation function and a feedback loop. During the training phase of ADALINE, the input vector as well as the desired output are presented to the network.

[The complete training mechanism has been explained in the next slide. ]

ADALINE Training Mechanism

(Ref. Fig. in the previous slide - Architecture of a simple ADALINE)


The basic structure of an ADALINE is similar to a linear neuron with an extra feedback loop.

During the training phase of ADALINE, the input vector X = [x1 , x2 , . . . , xn] as well as the desired output are presented to the network.

The weights are adaptively adjusted based on the delta rule.

After the ADALINE is trained, an input vector presented to the network with fixed weights will result in a scalar output.

Thus, the network performs an n-dimensional mapping to a scalar value.

The activation function is not used during the training phase.

Once the weights are properly adjusted, the response of the trained unit can be tested by applying various inputs which are not in the training set. If the network produces consistent responses to a high degree with the test inputs, it is said that the network could generalize. Training and generalization are two important attributes of this network.
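The training mechanism described above can be sketched as follows. The slides name a normalized LMS law but give no formula, so this sketch assumes the plain delta-rule update w_i = w_i + eta * (d - y) * x_i; the learning rate `eta`, the number of epochs, the zero initial weights, the function names and the training data are all hypothetical.

```python
def adaline_output(w, x):
    """Linear response of the unit: a scalar weighted sum, with no threshold applied."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_adaline(inputs, desired, eta=0.05, epochs=500):
    """Delta-rule (LMS) training; eta and epochs are assumed values."""
    w = [0.0] * len(inputs[0])
    for _ in range(epochs):
        for x, d in zip(inputs, desired):
            # Error between the desired output and the linear output (the feedback signal).
            error = d - adaline_output(w, x)
            # Move each weight in proportion to the error and its own input.
            w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical training set generated by the linear mapping d = 2*x1 - x2.
train_x = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
train_d = [2.0 * x1 - x2 for x1, x2 in train_x]

w = train_adaline(train_x, train_d)

# Test with an input that is not in the training set; the underlying mapping gives 4.
print(adaline_output(w, (3.0, 2.0)))
```

Testing on an input outside the training set mirrors the generalization check described above: the weights are held fixed and only the scalar response is examined.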


Usage of ADALINE :

In practice, an ADALINE is used to

- Make binary decisions; the output is sent through a binary threshold.
- Realize logic gates such as AND, NOT and OR.
- Realize only those logic functions that are linearly separable.
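As a small illustration of the first two uses, the sketch below sends a linear weighted sum through a binary threshold to realize AND, OR and NOT. The weights and the bias term are hand-picked assumptions for this example (the architecture figure does not show a bias), and the function names are hypothetical.

```python
def threshold_unit(weights, bias, inputs):
    """Linear sum followed by a binary threshold: 1 if the sum is >= 0, else 0."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= 0 else 0


def gate_and(a, b):
    # Fires only when both inputs are 1 (1 + 1 - 1.5 >= 0).
    return threshold_unit((1.0, 1.0), -1.5, (a, b))


def gate_or(a, b):
    # Fires when at least one input is 1 (1 - 0.5 >= 0).
    return threshold_unit((1.0, 1.0), -0.5, (a, b))


def gate_not(a):
    # Fires only when the input is 0 (0.5 >= 0, but -1 + 0.5 < 0).
    return threshold_unit((-1.0,), 0.5, (a,))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", gate_and(a, b), "OR:", gate_or(a, b))
print("NOT 0:", gate_not(0), "NOT 1:", gate_not(1))
```

Each of these gates is linearly separable, which is why a single unit suffices; XOR has no such weight setting.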


SC - Neural Network Applications

7. Applications of Neural Network

Neural Network Applications can be grouped in the following categories:

Clustering:

A clustering algorithm explores the similarity between patterns and places similar patterns in a cluster. Best known applications include data compression and data mining.

Classification/Pattern recognition:

The task of pattern recognition is to assign an input pattern (like handwritten symbol) to one of many classes. This category includes algorithmic implementations such as associative memory.

Function approximation :

The task of function approximation is to find an estimate of the unknown function subject to noise. Various engineering and scientific disciplines require function approximation.

Prediction Systems:

The task is to forecast some future values of time-sequenced data. Prediction has a significant impact on decision support systems. Prediction differs from function approximation by considering the time factor. The system may be dynamic and may produce different results for the same input data based on system state (time).

