Widrow-Hoff Learning


(LMS Algorithm)

In this chapter we apply the principles of performance learning to a single-layer linear neural network. Widrow-Hoff learning is an approximate steepest descent algorithm in which the performance index is mean square error. The rule originated in the late 1950s, at about the same time that Frank Rosenblatt developed the perceptron learning rule.

In 1960 Widrow and Hoff introduced the ADALINE (ADAptive LInear NEuron) network. Its learning rule is called the LMS (Least Mean Square) algorithm.

ADALINE is similar to the perceptron, except that its transfer function is linear instead of hard-limiting.


IUT-Ahmadzadeh

1430/10/28

References:

Widrow, B., and Hoff, M. E., 1960, Adaptive switching circuits, in 1960 IRE WESCON Convention Record, Part 4, New York: IRE, pp. 96-104.

Widrow, B., and Lehr, M. A., 1990, 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation, Proc. IEEE, 78:1415-1441.

Widrow, B., and Stearns, S. D., 1985, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall.

Both ADALINE and the perceptron can only solve linearly separable problems. However, the LMS algorithm minimizes mean square error, and therefore tries to move the decision boundaries as far from the training patterns as possible. The LMS algorithm has found many more practical uses than the perceptron (for example, most long-distance phone lines use ADALINE networks for echo cancellation).


ADALINE Network

$a = \mathrm{purelin}(Wp + b) = Wp + b$

The $i$th row of the weight matrix is ${}_i w = [\,w_{i,1} \;\; w_{i,2} \;\; \cdots \;\; w_{i,R}\,]^T$.

Two-Input ADALINE

$a = {}_1w^T p + b = w_{1,1}\,p_1 + w_{1,2}\,p_2 + b$

The decision boundary is determined by the input vectors for which the net input $n$ is zero.
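The two-input ADALINE above can be sketched directly. A minimal sketch, assuming illustrative weight and bias values (not taken from the text):

```python
import numpy as np

def purelin(n):
    """Linear transfer function: a = n."""
    return n

def adaline(W, b, p):
    """ADALINE output: a = purelin(W p + b) = W p + b."""
    return purelin(W @ p + b)

# Two-input ADALINE with illustrative (assumed) weights and bias.
W = np.array([[1.0, 1.0]])    # 1 x 2 weight matrix (row is 1w^T)
b = np.array([-1.0])

# For these weights the decision boundary a = 0 is the line p1 + p2 - 1 = 0.
print(adaline(W, b, np.array([0.5, 0.5])))   # on the boundary: output 0
print(adaline(W, b, np.array([1.0, 1.0])))   # positive side: output 1
```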


The LMS algorithm is an example of supervised training.

Training Set: $\{p_1, t_1\}, \{p_2, t_2\}, \ldots, \{p_Q, t_Q\}$

Input: $p_q$    Target: $t_q$

Notation: $x = \begin{bmatrix} {}_1w \\ b \end{bmatrix}$, $z = \begin{bmatrix} p \\ 1 \end{bmatrix}$, so that $a = {}_1w^T p + b = x^T z$.

Mean square error:

$F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]$

The expectation is taken over all sets of input/target pairs.

Error Analysis

$F(x) = E[e^2] = E[(t - a)^2] = E[(t - x^T z)^2]$

$F(x) = E[t^2 - 2\,t\,x^T z + x^T z z^T x]$

$F(x) = E[t^2] - 2\,x^T E[tz] + x^T E[zz^T]\,x$

This can be written in the following convenient form:

$F(x) = c - 2\,x^T h + x^T R\,x$

where $c = E[t^2]$, $h = E[tz]$, $R = E[zz^T]$.
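The equivalence between $E[(t - x^T z)^2]$ and the quadratic form $c - 2x^Th + x^TRx$ can be checked numerically with sample moments. A sketch, where the data distribution and coefficients are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw (z, t) pairs from an assumed toy distribution and compare E[e^2]
# against the quadratic form c - 2 x^T h + x^T R x.
z = rng.normal(size=(100_000, 3))          # stand-in for augmented input z = [p; 1]
t = z @ np.array([0.5, -1.0, 0.2]) + 0.1 * rng.normal(size=100_000)

c = np.mean(t**2)                          # c = E[t^2]
h = (z * t[:, None]).mean(axis=0)          # h = E[t z]
R = z.T @ z / len(z)                       # R = E[z z^T]

x = np.array([0.3, -0.8, 0.1])             # arbitrary weight vector
mse_direct = np.mean((t - z @ x) ** 2)     # E[(t - x^T z)^2]
mse_quadratic = c - 2 * x @ h + x @ R @ x
print(abs(mse_direct - mse_quadratic))     # agree up to floating point
```

The two numbers agree exactly (up to rounding), since the expansion is an algebraic identity on the sample moments.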


The vector $h$ gives the cross-correlation between the input vector and its associated target. $R$ is the input correlation matrix; its diagonal elements are equal to the mean square values of the elements of the input vectors.

The mean square error for the ADALINE network is a quadratic function:

$F(x) = c + d^T x + \tfrac{1}{2} x^T A x$,  where $d = -2h$ and $A = 2R$.

Stationary Point

Hessian matrix: $A = 2R$. It can be shown that all correlation matrices are either positive definite or positive semidefinite. If there are any zero eigenvalues, the performance index will either have a weak minimum or no stationary point (depending on $d = -2h$); otherwise there will be a unique global minimum $x^*$ (see Ch. 8).

$\nabla F(x) = \nabla\left(c + d^T x + \tfrac{1}{2} x^T A x\right) = d + Ax = -2h + 2Rx$

Stationary point: $-2h + 2Rx = 0$


If $R$ is positive definite:

$x^* = R^{-1} h$

If $h$ and $R$ were known, we could find the minimum point directly from this equation. But it is usually not desirable or convenient to calculate $h$ and $R$, so we use an approximation instead.


Approximate mean square error (one sample):

$\hat F(x) = (t(k) - a(k))^2 = e^2(k)$

The expectation of the squared error has been replaced by the squared error at iteration $k$.

Approximate (stochastic) gradient: $\hat\nabla F(x) = \nabla e^2(k)$

$[\nabla e^2(k)]_j = \dfrac{\partial e^2(k)}{\partial w_{1,j}} = 2\,e(k)\,\dfrac{\partial e(k)}{\partial w_{1,j}}, \qquad j = 1, 2, \ldots, R$

$[\nabla e^2(k)]_{R+1} = \dfrac{\partial e^2(k)}{\partial b} = 2\,e(k)\,\dfrac{\partial e(k)}{\partial b}$


$\dfrac{\partial e(k)}{\partial w_{1,j}} = \dfrac{\partial\,[t(k) - a(k)]}{\partial w_{1,j}} = \dfrac{\partial}{\partial w_{1,j}}\left[ t(k) - \left({}_1w^T p(k) + b\right) \right] = \dfrac{\partial}{\partial w_{1,j}}\left[ t(k) - \left( \sum_{i=1}^{R} w_{1,i}\, p_i(k) + b \right) \right]$

where $p_i(k)$ is the $i$th element of the input vector at the $k$th iteration. Therefore

$\dfrac{\partial e(k)}{\partial w_{1,j}} = -p_j(k), \qquad \dfrac{\partial e(k)}{\partial b} = -1$

$\hat\nabla F(x) = \nabla e^2(k) = -2\,e(k)\,z(k)$


The key step is to approximate the mean square error by the single squared error at iteration $k$, as in $\hat F(x) = (t(k) - a(k))^2 = e^2(k)$. This approximation to $\nabla F(x)$ can now be used in the steepest descent algorithm.

LMS Algorithm

$x(k+1) = x(k) - \alpha\,\hat\nabla F(x)\Big|_{x = x(k)}$


If we substitute $\hat\nabla F(x) = -2e(k)z(k)$:

$x(k+1) = x(k) + 2\alpha\,e(k)\,z(k)$

${}_1w(k+1) = {}_1w(k) + 2\alpha\,e(k)\,p(k)$

$b(k+1) = b(k) + 2\alpha\,e(k)$

These last two equations make up the LMS algorithm, also called the Delta Rule or the Widrow-Hoff learning algorithm.
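A minimal single-neuron LMS loop following these update equations. The `lms_step` helper and the noiseless linear data stream below are assumptions made for illustration:

```python
import numpy as np

def lms_step(w, b, p, t, alpha):
    """One LMS update: w <- w + 2*alpha*e*p,  b <- b + 2*alpha*e."""
    e = t - (w @ p + b)                     # e(k) = t(k) - a(k)
    return w + 2 * alpha * e * p, b + 2 * alpha * e, e

# Illustrative run on an assumed noiseless linear data stream.
rng = np.random.default_rng(1)
w_true, b_true = np.array([2.0, -1.0]), 0.5
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = rng.normal(size=2)
    t = float(w_true @ p + b_true)          # target generated by a linear rule
    w, b, e = lms_step(w, b, p, t, alpha=0.05)
print(w, b)                                 # approaches [2.0, -1.0] and 0.5
```

Because the targets are exactly linear in the inputs, the error can be driven to zero and the weights recover the generating rule.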


Multiple-Neuron Case

${}_iw(k+1) = {}_iw(k) + 2\alpha\,e_i(k)\,p(k)$

$b_i(k+1) = b_i(k) + 2\alpha\,e_i(k)$

Matrix form:

$W(k+1) = W(k) + 2\alpha\,e(k)\,p^T(k)$

$b(k+1) = b(k) + 2\alpha\,e(k)$


Analysis of Convergence

Note that $x(k)$ is a function only of $z(k-1), z(k-2), \ldots, z(0)$. If we assume that successive input vectors are statistically independent, then $x(k)$ is independent of $z(k)$. We will show that for stationary input processes meeting this condition, the expected value of the weight vector converges to $x^* = R^{-1}h$, the minimum mean square error solution, as we saw before.


$x(k+1) = x(k) + 2\alpha\,e(k)\,z(k)$

$E[x(k+1)] = E[x(k)] + 2\alpha\,E[e(k)\,z(k)]$

Substituting $e(k) = t(k) - x^T(k)\,z(k)$:

$E[x(k+1)] = E[x(k)] + 2\alpha\left\{ E[t(k)\,z(k)] - E[(x^T(k)\,z(k))\,z(k)] \right\}$

Since $x^T(k)\,z(k) = z^T(k)\,x(k)$:

$E[x(k+1)] = E[x(k)] + 2\alpha\left\{ E[t(k)\,z(k)] - E[z(k)\,z^T(k)\,x(k)] \right\}$


Because $x(k)$ and $z(k)$ are independent:

$E[x(k+1)] = E[x(k)] + 2\alpha\,(h - R\,E[x(k)])$

$E[x(k+1)] = [I - 2\alpha R]\,E[x(k)] + 2\alpha h$

For stability, the eigenvalues of this matrix must fall inside the unit circle:

$\mathrm{eig}[I - 2\alpha R] = 1 - 2\alpha\lambda_i, \qquad |1 - 2\alpha\lambda_i| < 1$

(where $\lambda_i$ is an eigenvalue of $R$). Since $\lambda_i \ge 0$, we always have $1 - 2\alpha\lambda_i < 1$, so the condition reduces to

$1 - 2\alpha\lambda_i > -1 \quad\Rightarrow\quad \alpha < \dfrac{1}{\lambda_i} \quad\text{for all } i$

$0 < \alpha < \dfrac{1}{\lambda_{\max}}$

Note that in steepest descent the stability condition involved the eigenvalues of the Hessian matrix $A$; here it involves the input correlation matrix $R$ (recall that $A = 2R$).


$E[x(k+1)] = [I - 2\alpha R]\,E[x(k)] + 2\alpha h$

If the system is stable, then a steady-state condition will be reached:

$E[x_{ss}] = [I - 2\alpha R]\,E[x_{ss}] + 2\alpha h$

The solution to this equation is

$E[x_{ss}] = R^{-1} h = x^*$

Thus the LMS solution, obtained by applying one input at a time, is the same as the minimum mean square error solution $x^* = R^{-1}h$.
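The steady-state result can be verified by iterating the expected-weight recursion directly. A sketch with assumed illustrative values of $R$ and $h$ (any positive definite $R$ with a stable $\alpha$ behaves the same way):

```python
import numpy as np

# Assumed illustrative values.
R = np.array([[0.72, -0.36],
              [-0.36, 0.72]])
h = np.array([-0.51, 0.70])
alpha = 0.2                       # satisfies 0 < alpha < 1/lambda_max(R)

x = np.zeros(2)
for _ in range(500):
    # E[x(k+1)] = [I - 2*alpha*R] E[x(k)] + 2*alpha*h
    x = (np.eye(2) - 2 * alpha * R) @ x + 2 * alpha * h

print(x)                          # approaches x* = R^{-1} h
```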


Example

Banana: $p_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}$, $t_1 = -1$.   Apple: $p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, $t_2 = 1$.

Assuming each pattern occurs with probability $1/2$, the input correlation matrix is:

$R = E[pp^T] = \tfrac{1}{2}\,p_1 p_1^T + \tfrac{1}{2}\,p_2 p_2^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix}$

Its eigenvalues are $\lambda_1 = 1.0$, $\lambda_2 = 0.0$, $\lambda_3 = 2.0$, so for stability

$\alpha < \dfrac{1}{\lambda_{\max}} = \dfrac{1}{2.0} = 0.5$

(In practice $R$ is usually not known, so $\alpha$ is chosen by trial and error.)


Iteration One

Present the banana, with $W(0)$ selected arbitrarily as zero:

$a(0) = W(0)\,p(0) = W(0)\,p_1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = 0$

$e(0) = t(0) - a(0) = t_1 - a(0) = -1 - 0 = -1$

With $\alpha = 0.2$:

$W(1) = W(0) + 2\alpha\,e(0)\,p^T(0) = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + 2(0.2)(-1)\begin{bmatrix} -1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix}$


Iteration Two

Present the apple:

$a(1) = W(1)\,p(1) = W(1)\,p_2 = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -0.4$

$e(1) = t(1) - a(1) = t_2 - a(1) = 1 - (-0.4) = 1.4$

$W(2) = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix} + 2(0.2)(1.4)\begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix}$


Iteration Three

Present the banana again:

$a(2) = W(2)\,p(2) = W(2)\,p_1 = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = -0.64$

$e(2) = t(2) - a(2) = t_1 - a(2) = -1 - (-0.64) = -0.36$

If we continue this procedure, the algorithm converges to $W = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$.
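The three iterations above can be reproduced in a few lines, using the banana/apple patterns from the example:

```python
import numpy as np

# Banana/apple training pairs and initial conditions from the example.
p1, t1 = np.array([-1.0, 1.0, -1.0]), -1.0    # banana
p2, t2 = np.array([ 1.0, 1.0, -1.0]),  1.0    # apple
alpha = 0.2
W = np.zeros(3)                                # W(0) selected as zero

errors = []
for p, t in [(p1, t1), (p2, t2), (p1, t1)]:
    a = W @ p                                  # a(k) = W(k) p(k)
    e = t - a                                  # e(k) = t(k) - a(k)
    errors.append(e)
    W = W + 2 * alpha * e * p                  # W(k+1) = W(k) + 2*alpha*e(k)*p^T(k)

print(errors)   # hand calculation gives -1.0, 1.4, -0.36
print(W)        # one step past iteration three, moving toward [1, 0, 0]
```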


Computationally, the learning process goes through all training examples (an epoch) a number of times, until a stopping criterion is reached. The convergence process can be monitored with a plot of the mean-squared error function $F(W(k))$.


Common stopping criteria:

- The mean-squared error is sufficiently small: $F(W(k)) < \varepsilon$.
- The rate of change of the mean-squared error is sufficiently small.
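A training loop combining both stopping criteria might look like the sketch below. The function name `train_lms` and the thresholds `eps` and `eps_rate` are assumed for illustration, and the bias is omitted for brevity:

```python
import numpy as np

def train_lms(patterns, targets, alpha=0.1, eps=1e-6, eps_rate=1e-9, max_epochs=10_000):
    """Run LMS epochs until F(W) < eps or |dF| < eps_rate (assumed thresholds)."""
    W = np.zeros(patterns.shape[1])
    F_prev = np.inf
    for epoch in range(max_epochs):
        for p, t in zip(patterns, targets):
            e = t - W @ p
            W = W + 2 * alpha * e * p                # LMS update
        F = np.mean((targets - patterns @ W) ** 2)   # mean-squared error F(W(k))
        if F < eps or abs(F_prev - F) < eps_rate:    # stopping criteria
            return W, F, epoch
        F_prev = F
    return W, F, max_epochs

# Banana/apple patterns from the earlier example.
patterns = np.array([[-1.0, 1.0, -1.0],
                     [ 1.0, 1.0, -1.0]])
targets = np.array([-1.0, 1.0])
W, F, epochs = train_lms(patterns, targets)
print(W, F, epochs)
```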


Adaptive Filtering

ADALINE is one of the most widely used neural networks in practical applications. One of the major application areas has been adaptive filtering.

[Figures: Adaptive Filter; Tapped Delay Line]


$a(k) = \mathrm{purelin}(Wp + b) = \sum_{i=1}^{R} w_{1,i}\, y(k - i + 1) + b$

In digital signal processing language, we recognize this network as a finite impulse response (FIR) filter.
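The tapped-delay-line output equation can be sketched as a direct FIR sum. The `adaline_fir` helper and the moving-average taps are illustrative assumptions (zero-based `w[i] * y(k - i)` indexing is equivalent to the one-based sum above):

```python
import numpy as np

def adaline_fir(w, b, y):
    """Tapped-delay-line ADALINE: a(k) = sum_i w[i] * y(k - i) + b,
    i.e. an FIR filter (zero initial conditions assumed)."""
    a = np.zeros(len(y))
    for k in range(len(y)):
        for i in range(len(w)):
            if k - i >= 0:
                a[k] += w[i] * y[k - i]     # delayed input taps
        a[k] += b
    return a

# A 3-tap moving average is one special case of this linear filter.
y = np.array([1.0, 2.0, 3.0, 4.0])
a_out = adaline_fir(np.array([1/3, 1/3, 1/3]), 0.0, y)
print(a_out)
```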


A two-input filter can attenuate and phase-shift the noise in the desired way.


Correlation Matrix

To analyze this system we need to find the input correlation matrix $R$ and the input/target cross-correlation vector $h$:

$R = E[zz^T], \qquad h = E[tz]$

where

$z(k) = \begin{bmatrix} v(k) \\ v(k-1) \end{bmatrix}, \qquad t(k) = s(k) + m(k)$

$R = \begin{bmatrix} E[v^2(k)] & E[v(k)\,v(k-1)] \\ E[v(k-1)\,v(k)] & E[v^2(k-1)] \end{bmatrix}$

$h = \begin{bmatrix} E[(s(k) + m(k))\,v(k)] \\ E[(s(k) + m(k))\,v(k-1)] \end{bmatrix}$


We must specify the statistics of the noise source $v$, the EEG signal $s$, and the filtered noise $m$ to obtain specific values.

We assume: the EEG signal is a white (uncorrelated from one time step to the next) random signal uniformly distributed between $-0.2$ and $+0.2$; the noise source (a 60 Hz sine wave sampled at 180 Hz) is given by

$v(k) = 1.2 \sin\!\left( \dfrac{2\pi k}{3} \right)$

and the filtered noise is attenuated by a factor of 1.0 and shifted in phase by $-3\pi/4$:

$m(k) = 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right)$

$E[v^2(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin \dfrac{2\pi k}{3} \right)^2 = (1.2)^2 (0.5) = 0.72$

$E[v^2(k-1)] = E[v^2(k)] = 0.72$

$E[v(k)\,v(k-1)] = \dfrac{1}{3} \sum_{k=1}^{3} \left( 1.2 \sin \dfrac{2\pi k}{3} \right)\left( 1.2 \sin \dfrac{2\pi (k-1)}{3} \right) = (1.2)^2 (0.5) \cos \dfrac{2\pi}{3} = -0.36$

$R = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}$


Stationary Point

$E[(s(k) + m(k))\,v(k)] = E[s(k)\,v(k)] + E[m(k)\,v(k)]$

The first term is zero because $s(k)$ and $v(k)$ are independent and zero mean.

$E[m(k)\,v(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left[ 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right]\left[ 1.2 \sin \dfrac{2\pi k}{3} \right] = -0.51$

$E[(s(k) + m(k))\,v(k-1)] = E[s(k)\,v(k-1)] + E[m(k)\,v(k-1)]$

Again the first term is zero, and

$E[m(k)\,v(k-1)] = \dfrac{1}{3} \sum_{k=1}^{3} \left[ 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right]\left[ 1.2 \sin \dfrac{2\pi (k-1)}{3} \right] = 0.70$

$h = \begin{bmatrix} E[(s(k) + m(k))\,v(k)] \\ E[(s(k) + m(k))\,v(k-1)] \end{bmatrix} = \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix}$

$x^* = R^{-1} h = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}^{-1} \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 0.82 \end{bmatrix}$

Now, what kind of error will we have at the minimum solution?
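The values of $R$, $h$, and $x^*$ above can be reproduced by averaging over one period ($k = 1, 2, 3$) of the sampled sinusoids:

```python
import numpy as np

# Average over one period (k = 1, 2, 3) of the sampled sinusoids.
k = np.arange(1, 4)
v = 1.2 * np.sin(2 * np.pi * k / 3)                    # noise source v(k)
m = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)    # filtered noise m(k)
v_prev = np.roll(v, 1)                                 # v(k-1), using periodicity

R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
              [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
h = np.array([np.mean(m * v), np.mean(m * v_prev)])    # the E[s*v] terms vanish

x_star = np.linalg.solve(R, h)
print(np.round(R, 2))       # approximately [[0.72, -0.36], [-0.36, 0.72]]
print(np.round(h, 2))       # approximately [-0.51, 0.70]
print(np.round(x_star, 2))  # approximately [-0.30, 0.82]
```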


Performance Index

$F(x) = c - 2\,x^T h + x^T R\,x$

$c = E[t^2(k)] = E[(s(k) + m(k))^2] = E[s^2(k)] + 2\,E[s(k)\,m(k)] + E[m^2(k)]$

The middle term is zero because $s(k)$ and $v(k)$ (and hence $m(k)$) are independent and zero mean.

$E[s^2(k)] = \dfrac{1}{0.4} \int_{-0.2}^{0.2} s^2 \, ds = \dfrac{1}{3(0.4)}\, s^3 \Big|_{-0.2}^{0.2} = 0.0133$

$E[m^2(k)] = \dfrac{1}{3} \sum_{k=1}^{3} \left[ 1.2 \sin\!\left( \dfrac{2\pi k}{3} - \dfrac{3\pi}{4} \right) \right]^2 = 0.72$

$c = 0.0133 + 0.72 = 0.7333$

The minimum mean square error, $F(x^*) = c - h^T R^{-1} h = 0.7333 - 0.72 = 0.0133$, is the same as the mean square value of the EEG signal. This is what we expected, since the error of this adaptive noise canceller is in fact the reconstructed EEG signal.
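The value of $c$ and the minimum mean square error can be checked numerically, reusing the signal definitions from the example:

```python
import numpy as np

# c = E[t^2] = E[s^2] + E[m^2]   (the cross term vanishes by independence).
E_s2 = (1 / 0.4) * (0.2**3 - (-0.2)**3) / 3   # exact integral for uniform[-0.2, 0.2]
k = np.arange(1, 4)
m = 1.2 * np.sin(2 * np.pi * k / 3 - 3 * np.pi / 4)
E_m2 = np.mean(m * m)                          # = 0.72
c = E_s2 + E_m2                                # = 0.7333

# Minimum mean square error: F(x*) = c - h^T R^{-1} h.
v = 1.2 * np.sin(2 * np.pi * k / 3)
v_prev = np.roll(v, 1)
R = np.array([[np.mean(v * v),      np.mean(v * v_prev)],
              [np.mean(v_prev * v), np.mean(v_prev * v_prev)]])
h = np.array([np.mean(m * v), np.mean(m * v_prev)])

F_min = c - h @ np.linalg.solve(R, h)
print(round(c, 4), round(F_min, 4))            # 0.7333 0.0133
```

$F(x^*)$ equals $E[s^2]$ exactly here because $m(k)$ lies in the span of $v(k)$ and $v(k-1)$, so the optimal filter removes the noise completely.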


[Figure: LMS trajectory in the $(w_{1,1}, w_{1,2})$ plane, compared with steepest descent.]

Note that the contours in this figure reflect the fact that the eigenvalues and eigenvectors of the Hessian matrix $A = 2R$ are

$\lambda_1 = 2.16, \; z_1 = \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix}; \qquad \lambda_2 = 0.72, \; z_2 = \begin{bmatrix} 0.7071 \\ 0.7071 \end{bmatrix}$

If we use a smaller learning rate, the trajectory is smoother, but learning proceeds more slowly. Note that $\alpha_{\max} = 2/2.16 = 0.926$ for stability.


Remember that the LMS algorithm is approximate steepest descent; it uses an estimate of the gradient, not the true gradient. (Demo: nnd10eeg)

Echo Cancellation


HW

Ch 4: E 2, 4, 6, 7

Ch 5: 5, 7, 9

Ch 6: 4, 5, 8, 10

Ch 7: 1, 5, 6, 7

Ch 8: 2, 4, 5

Ch 9: 2, 5, 6

Ch 10: 3, 6, 7
