You are on page 1of 79

M2AA3 Orthogonality

Lectured by John Barrett


Lyxed by jm407
December 13, 2008
www.ma.ic.ac.uk/~jwb/teaching
2hr exam in the summer term (4 questions)
2 small assessed projects (involving computation - MatLab or whatever you
want)
exam 6 : project 1
deadline for the 2 assessed projects
1st project - mid/late november
2nd project - rst week of spring term
1
Contents
1 Applied Linear Algebra 3
1.1 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Inner Product: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Outer Product: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Classical Gram-Schmidt Algorithm . . . . . . . . . . . . . . . . . . . 12
1.3 QR Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Cauchy-Schwartz Inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.5 Gradients and Hessians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6 Inner Products Revisited and Positive Denite Matrices . . . . . . . . . . . . 36
1.7 Least Squares Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2 Least Squares Problems 52
3 Orthogonal Polynomials 58
4 Polynomial Interpolation 65
5 Best Approximation in |.|

77
2
1 Applied Linear Algebra
1.1 Orthogonality
When two vectors are perpendicular to each other
a R
n
R
n1
a =
_
_
_
_
_
a
1
a
2
.
.
.
a
n
_
_
_
_
_
n1
n rows, 1 column a
i
R
Transpose of a:
a
T
= (a
1
. . . a
n
)
1n
R
1n
1 row, n columns
Given a, b R
n
1.1.1 Inner Product:
a
T
b = (a
1
. . . a
n
)
_
_
_
b
1
.
.
.
b
n
_
_
_
1 n n 1
. .
11
R
=
n

i=1
a
i
b
i
R
1.1.2 Outer Product:
ab
T
=
_
_
_
a
1
.
.
.
a
n
_
_
_
(b
1
. . . b
n
)
n 1 1 n
. .
nn
=
_
_
_
_
_
_
a
1
b
1
. . . a
1
b
n
a
2
b
1
.
.
.
.
.
.
.
.
.
a
n
b
1
. . . a
n
b
n
_
_
_
_
_
_
nn
Therefore ab
T
R
nn
_
ab
T
_
jk
= a
j
b
k
such that
j = 1 n
k = 1 n
3
Useful for some questions on sheet 1
u R
n
ab
T
n 1 1 n
. .
nn
u
n1
= a
n1
_
b
T
u
_
11
=
_
b
T
u
_
a
is always a multiple of a u, a, b R
n
Note:
let A & B be matrices of dimensions p q, r s respectively
A B = C
with C being a matrix of dimensions p s
Given a, b R
n
, let
a, b) = a
T
b =
n

i=1
a
i
b
i
. . . ) : R
n
R
n
R inner product
a, b) =
n

i=1
a
i
b
i
=
n

i=1
b
i
a
i
= b, a) a, b R
n
symmetric (1)
the order doesnt matter
a, b +c) = a
T
(b +c)
=
n

i=1
a
i
(b
i
+c
i
) (2)
=
n

i=1
a
i
b
i
+
n

i=1
a
i
c
i
= a, b) +a, c)
linear with respect to the 2nd argument a, b, c R
n
and , R
(1) + (2) (3)
4
a +b, c)
(1)
= c, a +b)
(2)
= c, a) +c, b)
(1)
= c, a) +b, c) (3)
linear with respect to the 1st argument
a, a) = a
T
a =
n

i=1
a
2
i
0
Let |a| = [a, a)]
1
2
=
_
n
i=1
a
2
i
_1
2
length or norm of a
|a| 0
= 0
a R
n
if and only if a = 0
Recall - Geometric Vectors in R
3
see diagram 20081009.M2AA3.1
a = a
1
i +a
2
j +a
3
k
b = b
1
i +b
2
j +b
3
k
1.1.3 Dot (Scalar) Product
a b = b a = [a[ [b[ cos
order doesnt matter
a a = [a[
2
therefore
[a[ = (a a)
1
2
_
= 0
cos = 1
_
see diagram 20081009.M2AA3.2
5
i i = j j = k k = 1 (as i, j, k are unit vectors)
i j = j k = i k = 0 (as =

2
)
Easy to show that
a (b +c) = a b +a c (4)
Non Trivial Vectors a & b (a ,= 0, b ,= 0)
a and b are orthogonal (perpendicular) if and only if their dot product = 0
a b = 0 cos = 0 =

2
a b = a(b
1
i +b
2
j +b
3
k)
(4)
= b
1
a i +b
2
a j +b
3
a k
= b
1
a
1
+b
2
a
2
+b
3
a
3
therefore
a b =
3

i=1
a
i
b
i
Given
a = a
1
i +a
2
j +a
3
k a =
_
_
a
1
a
2
a
3
_
_
R
3
a b =
3

i=1
a
i
b
i
= a
T
b a, b)
which is the inner product of a & b
Non-trivial vectors a & b are orthogonal if and only if (the inner product)
a, b) = 0
Denition:
Dot product = Inner product
6
see diagram 20081009.M2AA3.3
a, b) = a
T
b =
n

i=1
a
i
b
i
a, b R
n
Inner product: take two vectors in R
n
and spue out a vector in R
3 rules
1. isometric, order doesnt matter
2. linearity (linear combination of inner product see above)
3. again linearity on the other argument
Length/norm as above
Ex.
a, b R
n
orthogonal |a +b|
2
= |a|
2
+|b|
2
(Generalised Pythagoras)
see diagram 20081014.M2AA3.1
Proof:
|a +b|
2
def
= a +b, a +b)
(2)
= a +b, a) +a +b, b)
(3)
= a, a) +b, a) +a, b) +b, b)
def|.|
= |a|
2
+|b|
2
+ 2 a, b) = 0
hence result.
q
k

n
k=1
, q
k
R
m
, q
k
,= 0 k = 1 n
is ORTHOGONAL if and only if
q
k
, q
j
) = 0 j, k = 1 n j ,= k
7
Kronecker delta notation

jk
=
_
1 if j = k
0 if j ,= k
identity matrix I R
nn
I =
_
_
_
1 0 0
0
.
.
.
0
0 0 1
_
_
_
I
jk
=
jk
j, k = 1 n (5)
q
k

n
k=1
, q
k
R
m
k = 1 n
is ORTHONORMAL if and only if
q
k
, q
j
) =
jk
j, k = 1 n
Denition:
i.e. ORTHONORMAL ORTHOGONAL + each vector has unit length
|q
k
| = [q
k
, q
k
)]
1
2
= 1 k = 1 n
Linearly Independent Vectors
a
k

n
k=1
, a
k
R
m
k = 1 n
a
k

n
k=1
is said to be LINEARLY INDEPENDENT if
n

k=1
c
k
a
k
= 0 = c
k
= 0 k = 1 n
(only choice)
a
k

n
k=1
is said to be LINEARLY DEPENDENT if
c
k

n
k=1
not all zero such that
n

k=1
c
k
a
k
= 0
8
(e.g. if c
i
,= 0 a
i
=

n
k=1
k=i
c
k
c
i
a
k
)
Let A R
mn
that has a
k

n
k=1
as its columns
A
mn
=
n
(a
1
, a
2
, ..., a
n
)
R
m
A
mn
c
n1
= (a
1
, a
2
, ..., a
n
)
_
_
_
c
1
.
.
.
c
n
_
_
_
=
n

k=1
c
k
a
k
R
m
therefore if the only solution to Ac = 0 is c = 0 then a
k

n
k=1
is linearly independent
however if a non-trivial solution, c ,= 0, then a
k

n
k=1
is linearly dependent.
Restrict to the case m = n
A = (a
1
, ..., a
n
) , a
k
R
n
k = 1 n
(a) If A
1
exists, then
Ac = 0 A
1
Ac = A
1
0
Ic = 0
c = 0 a
k

n
k=1
is lin. ind.
(b) If a
k

n
k=1
is lin. ind.
they form a basis for R
n
, i.e. span R
n
b R
n
c
k

n
k=1
such that b =
n

k=1
c
k
a
k
(6)
Is c
k

n
k=1
unique?
Assume the contrary
b =
n

k=1
d
k
a
k
(7)
(6) (7) = 0 =
n

k=1
(c
k
d
k
) a
k
a
k

n
k=1
lin. ind. c
k
d
k
= 0 c
k
= d
k
k = 1 n
therefore the representation of b by a
k

n
k=1
is unique
b =
n

k=1
c
k
a
k
= Ac
9
(linear combination), where
A = (a
1
, ..., a
n
) R
nn
c =
_
_
_
c
1
.
.
.
c
n
_
_
_
therefore b R
n
, !c R
n
(! = unique) such that Ac = b
see diagram 20081014.M2AA3.2
Hence (a) & (b) yield for m = n
A
1
exists
A
nn
=(a
1
,...,a
n
)
a
k

n
k=1
lin. indep.
Therefore given
e
i
=
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
1
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
i
A
position R
n
unique s
i
R
n
i.e. (e
i
)
j
=
ij
j = 1 n
such that
As
j
= e
i
i = 1 n
therefore
S = A
1
i.e. A
1
exists
Lemma:
a
k

n
k=1
, a
k
R
m
, a
k
,= 0, k = 1 n
and orthogonal
a
j
, a
k
) = 0 j, k = 1 n
j,=k
a
k

n
k=1
linearly independent
n m
(Cant have n > m linearly independent vectors in R
m
- recall exchange lemma)
10
Proof: If
n

k=1
c
k
a
k
= 0

n
k=1
c
k
a
k
, a
j
) = 0, a
j
)
(3)


n
k=1
c
k
a
k
, a
j
)
=0 if k,=j
= 0
c
k
a
k
, a
j
) = 0
a
j
,= 0
|a
j
|
2
= a
j
, a
j
)
,= 0
therefore
c
j
= 0
Repeat for j = 1 n
therefore c
j
= 0 for j = 1 n
therefore a
k

n
k=1
lin. indep.
orthogonality implies linear independence
therefore non-trivial orthogonal vectors are lin. ind.
However, lin. ind. orthogonal
Ex.
n = m = 2
a
1
=
_
2
0
_
a
2
=
_
3
1
_
c
1
a
1
+c
2
a
2
= 0
2c
1
+ 3c
2
= 0
c
2
= 0
_
c
1
= c
2
= 0
therefore a
i

2
i=1
lin. ind.
a
1
, a
2
) = a
T
1
a
2
= 6 ,= 0
11
1.2 Gram-Schmidt
Given
a
i

n
i=1
, a
i
R
m
i = 1 n, lin. ind.
_

n m
_
j
nd
q
i

n
i=1
, q
i
R
m
i = 1 n, ORTHOGONAL
i.e.
q
i
, q
j
) =
ij
i, j = 1 n
with span q
i

n
i=1
= span a
i

n
i=1
1.2.1 Classical Gram-Schmidt (CGS) Algorithm
v
1
= a
1
, q
1
=
v
1
|v
1
|
for k = 2 n
v
k
= a
k

k1

l=1
a
k
, q
l
) q
l
(8)
q
k
=
v
k
|v
k
|
Proof:
q
1
=
v
1
|v
1
|
=
a
1
|a
1
|
a
i

n
i=1
lin. ind. a
i
,= 0 i = 1 n
therefore
|a
1
| ,= 0
|q
1
|
2
= q
1,
q
1
) =
_
a
1
|a
1
|
,
a
1
|a
1
|
_
=
1
|a
1
|
2
a
1
, a
1
) = 1
span a
1
= span q
1

Set
v
2
= a
2
a
2
, q
1
) q
1
(8)
v
2
, q
1
) = a
2
a
2
, q
1
) q
1
, q
1
)
(2)
= a
2
, q
1
) a
2
, q
1
) q
1
, q
1
)
. .
1
= 0
12
see diagram 20081015.M2AA3.1
check
Is v
2
= 0?
If v
2
= 0 = a
2
is a multiple of q
1
(so a
2
a multiple of a
1
which is impossible since they are lin. ind.) contradiction
therefore v
2
,= 0
therefore q
2
=
v
2
|v
2
|
v
2
, q
1
) = 0 q
2
, q
1
) =
_
v
2
|v
2
|
, q
1
_
= 0
Also
q
2
, q
2
) =
_
v
2
|v
2
|
,
v
2
|v
2
|
_
= 1
therefore
q
i

2
i=1
ORTHOGONAL
v
2
is a lin. combination of a
2
and q
1
so v
2
is a lin. combination of a
2
and a
1
so q
2
is a lin. combination of a
2
and a
1
Similarly a
2
is a lin. comb. of q
1
and q
2
Therefore
span q
i

2
i=1
= span a
i

2
i=1
Continue by induction
assume that when weve done up to k 1
q
i

k1
i=1
are all ORTHONORMAL
q
j
= lin. comb. of a
i

j
i=1
j = 1 k 1
a
j
= lin. comb. of q
i

j
i=1
j = 1 k 1
true for k = 2 and 3 from the above
Set
v
1
= a
k

k1

l=1
a
k
, q
l
) q
l
13
= v
k
, q
j
) =
(2)
a
k
, q
j
)
k1

l=1
a
k
, q
l
) q
l
, q
j
) j = 1 l 1
= a
k
, q
j
) a
k
, q
j
) = 0
therefore
v
k
, q
j
) = 0 j = 1 k 1
If v
k
= 0 this would tell us that a
k
is a lin. comb. of q
l

k1
l=1
but the inductive hypothesis
said that the qs can be written like the as, therefore it would tell us that a
k
is a lin. comb.
of a
l

k1
l=1
= contradiction to a
i

n
i=1
lin. ind.
therefore
q
k
=
v
k
|v
k
|
v
k
, q
j
) = 0 j = 1 k 1 q
k
, q
j
) = 0 j = 1 k 1
Also q
k
, q
k
) = 1
therefore
q
k

k
i=1
ORTHONORMAL
v
k
lin. comb. of a
k
and q
l

k1
l=1
v
k
lin. comb. of a
k
and a
l

k
l=1
q
k
lin. comb. of a
k
and a
l

k
l=1
therefore q
j
= lin. comb. of a
i

j
i=1
j = 1 k
Similarly
a
j
= lin. comb. of q
i

j
i=1
j = 1 k
Ex.
n = m = 2
a
1
=
_
3
4
_
a
2
=
_
1
2
_
clearly lin. ind.
rst step:
q
1
=
a
1
|a
1
|
rst we need to work out the length of a
1
|a
1
| = a
1
, a
1
) = a
T
1
a
1
= (3)
2
+ (4)
2
= 25
this implies
|a
1
| = 5
14
so q
1
q
1
=
1
5
_
3
4
_
|q
1
| = 1
v
2
= a
2
a
2
, q
1
) q
1
(9)
rst calculate a
2
, q
1
)
a
2
, q
1
) = a
T
2
q
1
=
1
5
(3 8) = 1
now put that back into (9)
= a
2
+q
1
=
_
1
2
_
+
1
5
_
3
4
_
=
1
5
_
8
6
_
so
|v
2
|
2
= v
2
, v
2
) = v
T
2
v
2
=
_
_
8
5
_
2
+
_
6
5
_
2
_
=
100
25
= 4
|v
2
| = 2 q
2
=
v
2
|v
2
|
=
1
5
_
4
3
_
__
3
4
_
,
_
1
2
__
CGS

_
1
5
_
3
4
_
,
1
5
_
4
3
__
1.3 QR Factorisation
a
i

n
i=1
lin. ind.
CGS
q
i

n
i=1
a
i
R
m
i = 1 n

lin. ind.
nm
Look at this from a dierent viewpoint
Let
A = (a
1
, a
2
, ..., a
n
) R
mn

Q = (q
1
, q
2
, ..., q
n
) R
mn
Let

R R
nn
be the upper trianglar matrix

R =
_
_
_
_
_
r
11
r
12
. . . r
1n
r
22
. . . r
2n
.
.
.
.
.
.
0 r
nn
_
_
_
_
_

R
lk
=
_
r
lk
if l k
0 if l > k
r
lk
will be determined later
15
Let e
(n)
k
R
n
((n) is to stress it is in R
n
as opposed to R
m
)
e
(n)
k
=
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
1
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
k
th
row
_
e
(n)
k
_
j
=
jk
j, k = 1 n
B
mn
e
(n)
k
n1
= (b
1
, b
2
, ..., b
n
)
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
1
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
= b
k
R
m
k
th
column of B

Q
mn

R
nn
e
(n)
n1
. .
=

Q
_
_
_
_
_
_
_
_
_
_
_
_
r
1k
r
2k
.
.
.
r
kk
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
=
k

l=1
r
lk
q
l
(10)
CGS
a
1
= v
1
= |v
1
| q
1
where r
11
= |v
1
|
a
k
= v
k
+
k1

l=1
a
k
, q
l
) q
l
= |v
k
| q
k
+
k1

l=1
a
k
, q
l
) q
l
=
k

l=1
r
lk
q
l
where r
kk
= |v
k
|
r
lk
=a
k
,q
l
) l=1k1
therefore
A
mn
e
(n)
k
n1
= a
k
=
k

l=1
r
lk
q
l
(10)
=

Q

Re
(n)
k
16

R is the upper triangular n n matrix with coecients as above.


therefore columns of A and

Q

R are the same


therefore A
mn
=

Q
mn

R
nn

Q has orthonormal columns


whereas

R is a square matrix, upper triangular where its diagonal entries are the lengths of
the v
k
(so r
kk
= |v
k
| > 0 k = 1 n) and since the vs are non-trivial,

R has strictly
positive diagonal entries.
therefore CGS yields a factorisation of A
A
mn
=

Q
mn

R
nn
if m > n A rectangular,

Q rectangular,

R square
with n m REDUCED QR FACTORISATION of A
QR Factorisation of A
A
mn
= Q
mm
R
mn
where
Q =
_

Q
n
q
n+1
. . . q
m
mn
_

m

where
q
j

m
j=n+1
are chosen so that all columns of Q are orthonormal
q
i
, q
j
) =
5

ij
i, j = 1 m
R
mn
=
_

R
0
_
n

mn

QR =
_

Qq
n+1
. . . q
m
_
mm
_

R
0
_
mn
=

Q

R = A
A
mn
= Q
mm
R
mn
Note:
Q
T
mm
Q
mm
=
_

_
q
T
1
.
.
.
q
T
m
_

_
[q
1
. . . q
m
]
17
_
_
Q
TQ
..
mm
_
_
jk
= q
T
j
q
k
= q
j
, q
k
) =
jk
j, k = 1 n
Q
T
Q = I
(m)
R
mm
identity matrix
therefore
Q
T
= Q
1
therefore
Q
T
Q = I
(m)
= QQ
T
therefore the columns of Q orthonormal rows of Q orthonormal (columns of Q
T
orthonor-
mal)
Denition: Q R
mm
is called ORTHOGONAL if Q
T
Q = I
(m)
= QQ
T
(orthonormal
would be a better name however due to historical reasons it is named orthogonal)
Denition: A R
mn
and A = QR where Q R
mm
is orthogonal and R R
mn
is an upper triangular matrix, then we say that we have a QR factorisation of A
Proposition
Orthogonal matrices preserve length and angle
If Q R
mm
and Q
T
Q = I
(m)
, then v, w R
m
Qv, Qw) = v, w) angle ()
and
|Qv| = |v| length ()
Proof:
Qv, Qw) = (Qv)
T
Qw
= v
T
Q
T
Qw = v
T
I
(m)
w
= v
T
w = v, w)
|Qv| = [Qv, Qv)]
1
2
=
by the above
[v, v)]
1
2
= |v|
Geometric vectors
ab = [a[ [b[ cos
18
see diagram 20081009.M2AA3.2
One can show, see section 1.4 Cauchy-Schwarz, that
v, w) = |v| |w| cos
Qv, Qw) = |Qv| |Qw| cos
=
()
|v| |w| cos
But
() cos = cos
= as , [0, ]
Proposition
Q
1
, Q
2
R
mm
, orthogonal i.e. Q
T
1
Q
1
= I
(m)
= Q
T
2
Q
2
then Q
1
Q
2
R
mm
is orthogonal.
Proof:
(Q
1
Q
2
)
T
Q
1
Q
2
= Q
T
2
Q
T
1
Q
1
Q
2
= Q
T
2
I
(m)
Q
2
= Q
T
2
Q
2
= I
(m)
therefore Q
1
Q
2
is orthogonal.
Ex: (rotation matrices) m = 2
Q =
_
cos sin
sin cos
_
= (q
1
, q
2
)
therere
q
i
, q
j
) =
ij
i, j = 1 2
therefore Q is orthogonal as its columns are orthonormal
Q represents rotation through an angle
_
x
y
_
= l
_
cos
sin
_
l =
_
x
2
+y
2
_1
2
_
a
b
_
= Q
_
x
y
_
19
Transform
_
x
y
_

_
l
0
_
choose
=
therefore
cos = cos () = cos =
x
l
sin = sin () = sin =
y
l
therefore
Q =
_
x
l
y
l

y
l
x
l
_
where
l =
_
x
2
+y
2
_1
2
therefore the rotation matrix
Q =
1
(x
2
+y
2
)
1
2
_
x y
y x
_
(orthogonal)
takes
_
x
y
_

_
_
x
2
+y
2
_1
2
0
_
for 0 p < q m introduce G
pq
() R
mm
G
pq
() =
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1 0
0
.
.
.
1
cos sin p
1
.
.
.
1
sin cos q
1
.
.
.
0
p

q
1
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
=
20
=
_

_
e
(m)
j
=
_
_
_
_
_
_
_
_
_
_
_
_
_
_
1
.
.
.
0
1
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
j
th
position of j ,= p q
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
cos p
th
0
.
.
.
0
sin q
th
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
position if j = p
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
sin p
th
0
.
.
.
0
cos q
th
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
_
position if j = q
each column of G
pq
() has unit length, the columns are also orthogonal
therefore the columns of G
pq
() are orthonormal.
G
pq
() R
mm
an orthogonal matrix
a R
m
21
G
pq
() a = b b
j
= a
j
if j ,= p q
b
p
= cos a
p
sin a
q
b
q
= sin a
p
+ cos a
q
Similarly
G
pq
()
mm
A
mn
= B
mn
All rows of B are the same as A except rows p and q
G
pq
() are called GIVENS Rotation Matrices (circa. 1950)
Obtain a QR factorisation of A using a sequence of Givens rotations (an alternative pro-
cedure: Householder reections)
Ex: m = 3, n = 2
A =
_
_
3 65
4 0
12 13
_
_
take a sequence of Givens Rotation so that
A
_
_
X X
0 X
0 0
_
_
= R
Choose G
12
() such that
G
12
() A =
_
_
X X
0 X
12 13
_
_
last row not aected
G
12
() =
_
_
cos sin 0
sin cos 0
0 0 1
_
_
G
12
()
_
_
3 65
4 0
12 13
_
_
x = 3, y = 4, l = 5
1
5
_
3 4
4 3
__
3
4
_
=
_
5
0
_
therefore choose
G
12
() =
_
_
3
4
4
5
0

4
5
3
5
0
0 0 1
_
_
A
(1)
= G
12
() A =
_
_
3
4
4
5
0

4
5
3
5
0
0 0 1
_
_
_
_
3 65
4 0
12 13
_
_
=
_
_
5 39
0 52
12 13
_
_
22
use a rotation matrix to obtain 0 in row 3, column 1
choose either G
13
() or G
23
()?
choose G
13
() as G
23
() will aect row 2, column 1 which would be counter productive
G
13
() =
_
_
cos 0 sin
0 1 0
sin 0 cos
_
_
choose on x = 5 and y = 12 l = 13
G
13
() =
_
_
5
13
0
12
13
0 1 0

12
13
0
5
13
_
_
A
(2)
= G
13
() A
(1)
=
_
_
5
13
0
12
13
0 1 0

12
13
0
5
13
_
_
_
_
5 39
0 52
12 13
_
_
=
_
_
13 27
0 52
0 31
_
_
now use G
13
() or G
23
()? G
13
() would mess up the 0 in 3,1, therefore use G
23
()
x = 52, y = 31 l =

3665
A
(3)
= G
23
() A
(2)
=
_
_
13 27
0

3665
0 0
_
_
= R upper triangular
with strictly positive diagonal entries.
Note:
G
pq
(.) makes the (q, p)
th
element in the current A zero.
Therefore
R = A
(3)
= G
23
() A
(2)
= G
23
() G
13
() A
(1)
= G
23
() G
13
() G
12
()
. .
G
A
G is a product of Givens rotations
each G
pq
(.) is orthogonal
therefore G is orthogonal (a product of orthogonal matrices)
G
T
G = I = GG
T
23
therefore
GA = R
G
T
GA = G
T
R
A = QR
where
Q = G
T
Note:
Q
T
Q =
_
G
T
_
T
G
T
= GG
T
= I
therefore Q orthogonal
therefore it is a QR Factorisation of A
General A R
mn
with m n
Apply a sequence of Givens Rotations to take A to R R
mn
upper triangular with strictly
positive diagonal entries
GA = R
where
G = G
nm
. . . G
nn+1
. .
column n
. . . G
2m
. . . G
23
. .
column 2
G
1m
. . . G
12
. .
column 1
G
pq
makes (q, p)
th
element zero
if y = 0, then G
pq
= I G
pq
R
mm
Let Q = G
T
R
mm
Q is orthogonal
GA = R A = QR (QR factorisation)
We might be interested in solving
A

mn
x
?
n1
= b

m1
(m n)
Apply G R
mm
to Ax = b
GAx = Gb
R
mn
x
n1
= c R
m1
(equiv. system to Ax = b)
see diagram 20081028.M2AA3.1
24
If m > n and if c
i
,= 0 for some i = n + 1 m, there is no solution to Rx = c(there is
no solution to Ax = b)
INCONSISTENT SYSTEM (Return to this later in the course)
Otherwise i.e. c
i
= 0 i = n + 1 m
!x R
n
!=unique
such that Ax = b (Rx = c)
Solve by backward substitution
x
n
=
c
n
r
nn
x
i
=
_
c
i

n
j=i+1
r
ij
x
j
_
r
ii
i = n 1, n 2, . . . , 2, 1
This is all thats needed to do questions on sheet 1.
1.4 Cauchy-Schwartz Inequality
For geometric vectors in R
3
a b = [a[ [b[ cos
see diagram 20081009.M2AA3.2
Generalises to R
n
a, b) = a
T
b = |a| |b| cos
[a, b)[ = |a| |b| [cos [ |a| |b|
Theorem (Cauchy-Schwarz Inequality)
If you take any vector a, b R
n
, then
[a, b)[ |a| |b|
with equality if and only if a and b are linearly dependent.
Proof:
If a = 0, then a, b) = 0 and |a| = 0 so result is trivial
25
If a ,= 0, then q =
a
|a|
q, q) = |q|
2
= 1
let
c = b b, q) q
c, q) = b k, q) q, q)
=
(3)
b, q) b, q) q, q)
. .
1
= 0
see diagram 20081028.M2AA3.2
0 |c|
2
= c, c) = c, b b, q) q)
=
(2)
c, b) b, q) c, q)
= c, b)
=
(3)
b b, q) q, b)
= b, b) b, q) q, b)
= |b|
2
[q, b)]
2
[q, b)]
2
|b|
2
[a, b)]
2
|a|
2
|b|
2
Taking square root desired result
Equality if and only if c = 0
i.e. b a multiple of q
i.e. b a multiple of a
i.e. a and b lin. dep.
1.5 Gradients and Hessians
f : R R, f (x)
one independent variable
see diagram 20081029.M2AA3.1
Taylor Series
f (a +h) = f (a) +hf
t
(a) +
h
2
2
f
tt
(a) +
O
(
h
3
)

R , where [R[ Ch
3
, write O
_
h
3
_
26
We want to generalise this to functions of n independent variables
f : R
n
R f (x
1
, x
2
, . . . x
n
)
Write f (x) where x =
_
_
_
x
1
.
.
.
x
n
_
_
_
R
n
Partial derivative of f with respect to x
i
, write as
f
x
i
(x);
(dierentiate f with respect to x; holding x
1
, . . . x
i1
, x
i+1
, . . . , x
n
as constants)
Ex. n = 2, f (x
1
, x
2
), x =
_
x
1
x
2
_
R
2
f (x) = sin x
1
sin x
2
f
x
1
(x) = cos x
1
sin x
2
f
x
2
(x) = sin x
1
cos x
2

2
f
x
i
x
j
=

x
i
_
f
x
j
_
()
=

x
j
_
f
x
i
_
=

2
f
x
j
x
i
i, j = 1 n
() if both derivatives exist and are continuous
Ex.

2
f
x
2
x
1
(x) =

x
2
_
f
x
1
(x)
_
= cos x
1
cos x
2
|

2
f
x
1
x
2
(x) =

x
1
_
f
x
2
(x)
_
= cos x
1
cos x
2

2
f
x
2
1
=

x
1
_
f
x
1
(x)
_
= sin x
1
sin x
2

2
f
x
2
2
=

x
2
_
f
x
2
(x)
_
= sin x
1
sin x
2
Chain Rule
(n = 1)
f : R R, f (x)
Change variables t = t (x) x = x(t)
e.g.
x(t) = t
2
t (x) = x
1
2
27
Let w(t) = f (x(t))
dw
dt
(t) =
df
dx
(x(t))
dx
dt
(t)
Extend this to
f : R
n
R, f (x) , x =
_

_
x
1
.
.
.
x
n
_

_
R
n
Example:
x(t) =

a +t

h (

= xed)
see diagram 20081029.M2AA3.2
x
i
(t) = a
i
+th
i
i = 1 n
In general
f (x) , x(t)
Let w(t) = f (x(t))
dw
dt
(t) =
f
x
1
(x(t))
dx
1
dt
(t) +. . .
f
x
n
(x(t))
dx
n
dt
(t)
dw
dt
(t) =
n

i=1
f
x
i
(x(t))
dx
i
dt
(t) (11)
Ex. n = 2, f (x
1
, x
2
) , x =
_
x
1
x
2
_
R
2
f (x) = sin x
1
sin x
2
x
1
(t) = t
2
, x
2
(t) = cos t
w(t) = f (x(t)) =
u
sin t
2
v
sin (cos t)
dw
dt
(t) = cos t
2
sin (cos t) 2t
. .
v
u

+ sin t
2
u
cos (cos t) (sin t)
. .
v

f
x
1
(x(t))
dx
1
dt
(t) +
f
x
2
(x(t))
dx
2
dt
(t)
28
going back to example
f : R
n
R, f (x) , x =
_
_
_
x
1
.
.
.
x
n
_
_
_
R
n
x(t) = a +th
see diagram 20081029.M2AA3.3
x
i
(t) = a
i
+th
i

dx
i
dt
(t) = h
i
i = 1 n
Let
w(t) = f (x(t)) = f (a +th) (12)
=
see diagram 20081029.M2AA3.4
Taylor series for w(t)
w(1) = w(0) + 1 w
t
(0) +
1
2
1
2
w
tt
(0) +. . .
= w(0) +w
t
(0) +
1
2
w
tt
(0)
(11)

(12)
f (a +h) = f (a) +
n

9=1
f
x
i
(a) h
i
+. . . (13)
29
(11)
dw
dt
(t) =
n

i=1
h
i

x
i
f (x(t))

d
dt

n

i=1
h
i

x
i

_
d
dt
_
m

_
n

i=1
h
i

x
i
_
m

d
2
w
dt
2
(t) =
_
_
n

j=1
h
j

x
j
_
_
_
n

i=1
h
i

x
i
_
f (x(t))
=
n

j=1
n

i=1
h
j
h
i

2
f
x
j
x
i
(x(t))
w
tt
(0) =
n

i=1
n

j=1
h
i
h
j

2
f
x
j
x
i
(a)
inserting this into (13)
= f (a +h) = f (a) +
n

i=1
h
i
f
x
i
(a) +
1
2
n

i=1
n

j=1
h
i
h
j

2
f
x
j
x
i
(a) +O
_
|h|
3
_
compare this to (11).
We introduce the GRADIENT of f (gard f - vector of rst order partial derivatives)
f (x) R
n
f (x) =
_

_
f
x
1
(x)
.
.
.
f
x
n
(x)
_

_
i.e.
[f (x)]
i
=
f
x
i
(x) i = 1 n
Introduce the HESSIAN of f (matrix of second derivatives)
D
2
f (x) R
nn
_
D
2
f (x)

ij
=

2
f
x
i
x
j
(x) i, j = 1 n
"smooth" f D
2
f (x) is symmetric
30
n = 2
D
2
f (x) =
_
_

2
f
x
2
1
(x)

2
f
x
1
x
2
(x)


2
f
x
2
2
(x)
_
_
A R
nn
A
ij
A
nn
x
n1
R
n
[Ax]
i
=
n

j=1
A
ij
x
j
x
T
1n
A
nn
x
n1
= x
T
(Ax) =
n

i=1
x
i
(Ax)
i
=
n

i=1
n

j=1
x
i
A
ij
x
j
f (a +h) = f (a) +h
T
f (a) +
1
2
h
T
D
2
f (a) h +O
_
|h|
3
_
Ex.
f (x) = x
T
Ax x R
n
where A R
nn
and is symmetric.
f : R
n
R
nd (i) f (x) (the gradient of f) (ii) D
2
f (x) (the hessian of f)
(i)
f (x) = x
T
Ax =
n

i=1
n

j=1
A
ij
x
i
x
j
[f (x)]
p
=
f
x
p
(x) =
n

i=1
n

j=1
A
ij

x
p
(x
i
x
j
)

x
p
(x
i
x
j
) =
x
i
x
p
x
j
+x
i

x
p
x
j
x
1
, . . . , x
n
are independent variables

x
i
x
p
=
ip
i, p = 1 n
31
[f (x)]
p
=
n

i=1
n

j=1
A
ij
(
ip
x
j
+x
i

jp
)
=
n

j=1
A
pj
x
j
+
(
A
T
)
pi
|
n

i=1
A
ip
x
i
= [Ax]
p
+
_
A
T
x

p
f (x) = Ax +A
T
x
= 2Ax if A
T
= A
(ii)
_
D
2
f (x)

qp
=

2
f
x
q
x
p
(x)
we know that
f
x
p
(x) =
n

j=1
A
pj
x
j
+
n

i=1
_
A
T
_
pi
x
i


2
f
x
q
x
p
(x) =
n

j=1
A
pj

jq
+
n

i=1
_
A
T
_
pi

iq
= A
pq
+
_
A
T
_
pq
Note:
= kronecker delta
f
x
= partial derivative
D
2
f (x) = A+A
T
= 2A if A
T
= A
f (x) = x
T
Ax
f : R
n
R
f (x) = 2Ax R
n
D
2
f (x) = 2A R
nn
Analogue of f (x) = ax
2
a R
f : R R f
t
(x) = 2ax f
tt
(x) = 2a
32
Denition:
f : R
n
R
f (x) has a LOCAL MINIMUM at x = a
if u R
n
, |u| = 1,
> 0 such that
f (a +hu)
()
f (a) h [0, ]
see diagram 20081030.M2AA3.1
n = 1, f : R Ru
see diagram 20081030.M2AA3.2
Reminder: Taylor Series
f (a +h) = f (a) +h
T
f (a) +
1
2
h
T
D
2
f (a) h +O
_
|h|
3
_
(14)
Proposition
Let n = 1. Then f
t
(a) = 0 and f
tt
(a) > 0
<0
are sucient conditions for f to have a local
minimum
maximum
at x = a.
Proof: n = 1 u = 1
see diagram 20081104.M2AA3.1
f (a h) = f (a) hf
t
(a) +
1
2
(h)
2
f
tt
(a) +O
_
h
3
_
= f (a) +
1
2
h
2
f
tt
(a)
. .
>0
+O
_
h
3
_
. .
0 for h suciently small
as f
t
(a) = 0

()
f (a) for h suciently small x = a is a local minimum
(maximum)
of f
33
Proposition
If f (a) ,= 0, then f (x) does not have a local minimum or maximum at x = a
Proof: Put h = hu, |u| = 1, in (14)
f (a +hu) = f (a) +hu
T
f (a) +O
_
h
2
_
for h 0
f (a) ,= 0
let
u =
f (a)
|f (a)|
|u| = 1
f
_
a h
f (a)
|f (a)|
_
= f (a) +
h
|f (a)|
|f (a)|
2
+O
_
h
2
_
= f (a) h|f (a)|
. .
>0
+O
_
h
2
_
>
(<)
f (a) for h suciently small, no local min or max
f (a) = 0 is a necessary condition for f (x) to have a local minimum or maximum at
x = a. Points a where f (a) are called stationary points of f (x).
Proposition
If f (a) = 0 and
w
T
D
2
f (a) w >
(<)
0 w R
n
and w ,= 0
then f (x) has a local minimum
maximum
at x = a.
Proof: h = hu, |u| = 1, in (14)
f (a +hu) = f (a) +hu
T
f (a)
. .
=0
as f(a)=0
+
1
2
h
2
u
T
D
2
f (a) u +O
_
h
3
_
. .

()
0 for h su. small if w
2
D
2
f(a)w >0
(<)
w=0

()
f (a) for h su. small local min
max
Ex.
n = 2, x =
_
x
1
x
2
_
R
2
f (x) =
_
x
2
1
2x
1
+ 1
_
+
_
x
2
2
2x
2
+ 1
_
f : R
2
R
34
f (x) =
_
f
x
1
(x)
f
x
2
(x)
_
=
_
2x
1
2
2x
2
2
_
= 2
_
x
1
1
x
2
1
_
R
2
f (a) = 0 a =
_
1
1
_
only one stationary point
Find
D
2
f (x) =
_
_

2
f
x
2
1
(x)

2
f
x
1
x
2
(x)


2
f
x
2
2
(x)
_
_
=
_
2 0
0 2
_
= 2I R
22
w
T
D
2
f (a) w = 2w
T
w = 2 |w|
2
> 0
a =
_
1
1
_
is a local minimum (also a global minimum as its the only stationary point).
[Obvious as f (x) = (x
1
1)
2
+ (x
2
1)
2
]
Denition
A R
nn
is called positive denite if
x
T
Ax > 0 x R
n
, x ,= 0
or negative denite if
x
T
Ax < 0 x R
n
, x ,= 0
or non-negative denite if
x
T
Ax 0 x R
n
or non-positive denite if
x
T
Ax 0 x R
n
Example:
n = 2, A =
_
1 1
1 1
_
, x =
_
x
1
x
2
_
R
2
x
T
Ax =
2

i=1
2

j=1
A
ij
x
i
x
j
= x
2
1
+x
2
2
2x
1
x
2
= (x
1
x
2
)
2
0 x R
2
A is non-negative denite but not positive denite.
e.g. a =
_
1
1
_
a
T
Aa = 0
Using these denitions, we can rewrite the above proposition
35
Proposition
If f (a) = 0 and D
2
f (a) is
_
positive
negative
_
denite, then f (x) has a local
_
minimum
maximum
_
at x = a.
1.6 Inner Products Revisited and Positive Denite Matrices
Let A R
nn
be symmetric
_
A
T
= A
_
and positive denite
_
x
T
Ax > 0 x R
n
, x ,= 0
_
Generalise the idea of an inner product by dening
u, v)
A
= u
T
Av u, v R
n
(perviously u, v) u, v)
I
= u
T
Iv = u
T
v)
Make sure the properties of the inner product still hold with this new denition
., .)
A
: R
n
R
n
R
v, u)
A
= v
T
Au =
_
v
T
Au
_
T
= u
T
A
T
v
A
T
=A
= u
T
Av
= u, v)
A
v, u)
A
= u, v)
A
u, v R
n
(symmetric)
Easy to show that
u, v +w)
A
= u, v)
A
+ u, w)
A
u +v, w)
A
= u, w)
A
+ v, w)
A
_
u, v, w R
n
, , R
Introduce the idea of a generalised norm (length) by dening
|u|
A
= [u, u)
A
]
1
2
u R
n
Note:
u, u)
A
= u
T
Au > 0 if u ,= 0
= 0 if and only if u = 0
|u|
A
0 u R
n
|u|
A
= 0 if and only if u = 0
A key property of positive denite matrices is that they are invertible
i.e.
Ax = 0 x
T
Ax = 0 x = 0
columns of A are linearly independent A
1
exists.
36
Theorem (Generalised Cauchy-Schwarz Inequality)
If A R
nn
is symmetric positive denite, then
[u, v)
A
[ |u|
A
|v|
A
u, v R
n
with equality if and only if u and v are linearly dependent.
Proof: Simply replace ., .) by ., .)
A
and |.| by |.|
A
in the original proof.
It is easy to generate symmetric positive denite matrices.
Given P R
nn
, which is invertible i.e. P
1
exists, then A = P
T
P is symmetric positive
denite.
Check:
A R
nn

A
T
=
_
P
T
P
_
T
= P
T
_
P
T
_
T
= P
T
P = A

any x R
n
x
T
Ax = x
T
P
T
Px = (Px)
T
(Px) = |Px|
2
0
Note
|Px| = 0 Px = 0 x = 0 as P
1
exists.

A is positive denite.
We now prove the reverse implication.
Theorem
Let A R
nn
be any symmetric positive denite matrix.
Then an invertible P R
nn
such that A = P
T
P.
Furthermore we can choose P to be upper triangular with P
ii
> 0 i = 1 n (diagonal
entries stricly positive) in which case we say that A = P
T
P is a Cholesky Decomposi-
tion/Factorisation of A.
Proof:
Let v
n
i=1
be any n linearly independent vectors in R
n
.
Using the inner product ., .)
A
induced by A, a, b)
A
= a
T
Ab, we apply the Classical Gram-
Schmidt process to v
i

n
i=1
.
u
1
=
v
1
|v
1
|
A
|u
1
|
A
= 1
w
i
= v
i

i1

j=1
v
i
, u
j
)
A
u
j
w
i
, w
j
) = 0 j = 1 i 1, i = 2 n
u
i
=
w
i
|w
i
|
A
|u
i
|
A
= 1 i = 2 n
37
u
i
is a linearl combination of v
j

i
j=1
, i = 1 n and
u
i
, u
j
)
A
=
ij
i, j = 1 n
Let
U = [u
1
, u
2
, . . . , u
n
] R
nn
AU = [Au
1
, Au
2
, . . . , Au
n
] R
nn
U
T
AU R
nn
U
T
=
_

_
u
T
1
u
T
2
.
.
.
u
T
n
_

_
_
U
T
AU

ij
= u
T
i

i
th
row of U
T
Au
j

j
th
col. of AU
= u
i
, u
j
)
A
=
ij
i, j = 1 n
U
T
AU = I
(n)
U
1
= U
T
A exists

_
U
1
_
T
=
_
U
T
A
_
T
= A
T
U = AU
Let
P = U
1
R
nn
P
1
= U exists
P
T
P =
_
U
1
_
T
U
1
= AUU
1
= A

To show that we can choose P upper triangular with P


ii
> 0, i = 1 n, we choose
particular v
i

n
i=1
Let v
i
= e
(n)
i
=
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
1
0
.
.
.
0
_
_
_
_
_
_
_
_
_
_
_
_
i
th
position R
n
_
e
(n)
i
_
j
=
ij
i, j = 1 n
u
1
is a multiple of e
1
=
_
_
_
_
_

0
.
.
.
0
_
_
_
_
_
u
i
is a linearl combination of
_
e
(n)
j
_
i
j=1
, i = 2 n
u
i
R
n
with (u
i
)
k
= 0 if k > i i = 1 n
U = [u
1
, u
2
, . . . , u
n
] R
nn
upper triangular
38
we now show that (u
i
)
j
> 0 i = 1 n
u
1
=
e
(n)
1
_
_
_e
(n)
1
_
_
_
A
=
_
_
_
_
_

0
.
.
.
0
_
_
_
_
_
strictly positive
u
i
=
w
i
|w
i
|
A
(u
i
)
i
> 0 if and only if (w
i
)
i
> 0 i = 2 n
w
i
= e
(n)
i

i1

j=1
_
e
(n)
i
, u
j
_
A
u
j
(u
j
)
k
= 0 if k > j j = 1 n
(w
i
)
i
=
_
e
(n)
i
_
i
= 1 > 0
U R
nn
is upper triangular with U
ii
= (u
i
)
i
> 0 i = 1 n
Find
P = U
1
= [p
1
, p
2
, . . . , p
n
]
UP = I
(n)
=
_
e
(n)
1
, e
(n)
2
, . . . , e
(n)
n
_
i.e.
[U
p
1
, U
p
2
, U
p
3
, . . . ] =
_
e
(n)
1
, e
(n)
2
, . . .
_
i.e.
U
p
i
= e
(n)
i
i = 1 n (15)
solve by backwards substitution
(p
i
)
n
= (p
i
)
n1
= (p
i
)
i+1
= 0
i.e.
(p
i
)
k
= 0 for k > i i = 1 n
i
th
row of (15)
U
ii
(p
i
)
i
+
n

k=i+1
U
ik
(p
i
)
k
. .
=0 as(p
i
)
k
=0 for k>i
=
_
e
(n)
i
_
i
= 1
(p
i
)
i
=
1
U
ii
> 0 i = 1 n
P is upper triangular with P
ii
= (p
i
)
i
> 0 i = 1 n
Proposition
A R
nn
symmetric positive denite
A
kk
> 0 k = 1 n and
[A
jk
[ < (A
jj
)
1
2
(A
kk
)
1
2
j, k = 1 n j ,= k
39
Proof: From the above theorem A = P
T
P, P R
nn
, P
1
exists
Let
P p
1
, p
2
, . . . , p
n
p
i
R
n
P
1
exists p
i

n
i=1
lin. indep.
A = P
T
P =
_

_
p
T
1
p
T
2
.
.
.
p
T
n
_

_
[p
1
p
2
. . . p
n
]
A
jk
= p
T
j
p
k
j, k = 1 n
A
kk
= p
T
k
p
k
= |p
k
|
2
> 0 k = 1 n as p
k
,= 0
[A
jk
[ =

p
T
j
p
k

= [p
j
, p
k
)[ <
cauchy schwarz inequality
|p
j
| |p
k
| j, k = 1 n j ,= k
it is a strict inequality as p
i

n
i=1
lin. ind.
Using the result |p|
k
= (A
kk
)
1
2
k = 1 n
[A
jk
[ < (A
jj
)
1
2
(A
kk
)
1
2
j, k = 1 n j ,= k
Compute a Cholesky Decomposition of A.
Let L = P
T
. Find a lower triangular matrix L R
nn
with L
ii
> 0 i = 1 n such that
A = LL
T
.
Could compute L = P
T
, where P = U
1
and U = [u
1
. . . u
n
] with
_
e
(n)
i
_
n
i=1
CGS

.,.)
A
u
i

n
i=1
there is, however, an easier way.
Let L = [l
1
l
2
. . . l
n
] l
i
R
n
(lower triangular and L
ii
> 0 i = 1 n)
L =
_
_
_
_
_
_
_
_
_
0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

_
_
_
_
_
_
_
_
_
A = LL
T
A
ij
=
n

k=1
L
ik
_
L
T
_
kj
=
n

k=1
L
ik
L
jk
=
n

k=1
(l
k
)
i
(l
k
)
j
40
Note:
l
k
n1
l
T
k
1n
R
nn

_
l
k
l
T
k
_
ij
= (l
k
)
i
(l
k
)
j
A
ij
=
n

k=1
_
l
k
l
T
k
_
ij

A =
n

k=1
l
k
l
T
k
Example
n = 3
Find the Cholesky Decomposition of
A =
A
12
_
_
2 1 0
1
5
2
1
0 1
5
2
_
_
i.e. Find L R
33
lower triangular L
ii
> 0 i = 1 3 such that A = LL
T
.
Is A symmetric positive denite?
Cearly A
T
= A

A
kk
> 0 k = 1 3
[A
12
[ = [1[ = 1 <

2
_
5
2
=

5 = (A
11
)
1
2
(A
22
)
1
2
etc.
The above are necessary, and not sucient.
Check directly that A is positive denite.
x =
_
_
x
1
x
2
x
3
_
_
,= 0 x
T
Ax =
3

i=1
3

j=1
A
ij
x
i
x
j
= 2x
2
1
+
5
2
x
2
2
+
5
2
x
2
3
2x
1
x
2
2x
2
x
3
. .
+2

x
1
r

x
2
s
+2

x
2
r

x
3
s
41
Note:
(r +s)
2
= r
2
+ 2rs +s
2
0
rs
1
2
_
r
2
+s
2
_
r, s R (16)

applying (16)
2x
2
1
+
5
2
x
2
2
+
5
2
x
2
3

_
x
2
1
+x
2
2
_

_
x
2
2
+x
2
3
_
= x
2
1
+
1
2
x
2
2
+
3
2
x
2
3

1
2
3

i=1
x
2
i
=
1
2
|x|
2
> 0 x R
3
x ,= 0
Recapp from exercise
n = 3
A =
A
12
_
_
2 1 0
1
5
2
1
0 1
5
2
_
_
symmetric, positive denite

L = [l
1
l
2
l
3
] lower triangular
A = l
1
l
T
1
+l
2
l
T
2
+l
3
l
T
3
nd L
l
1
=
_
_

_
_
l
2
=
_
_
0

_
_
l
3
=
_
_
0
0

_
_
A = l
T
1
l
1
+l
2
l
T
2
+l
3
l
T
3
(17)
_
_



_
_
symmetric
+
_
_
0 0 0
0
0
_
_
symmetric
+
_
_
0 0 0
0 0 0
0 0
_
_
rst column/row of A is generated by l,
Equate rst columns of l
1
l
T
1
33
and A
33
_
_
(l
1
)
1
(l
1
)
1
(l
1
)
2
(l
1
)
1
(l
1
)
3
(l
1
)
1
_
_
=
_
_
A
11
A
21
A
31
_
_
(l
1
)
i
=
A
i1
(l
1
)
1
i = 1 3
42
but [(l
1
)
1
]
2
= A
11
(l
1
)
i
=
A
i1

A
11
i = 1 3
in this example
l
1
=
1

2
_
_
2
1
0
_
_
Let
A
(1)
= Al
1
l
T
1
=
_
_
2 1 0
1
5
2
1
0 1
5
2
_
_

1
2
_
_
4 2 0
2 1 0
0 0 0
_
_
A
(1)
=
_
_
0 0 0
0 2 1
0 1
5
2
_
_
= l
2
l
T
2
+l
3
l
T
3
by (17)
2
nd
column/row of A
(1)
is generated by l
2
(l
2
)
i
=
A
(1)
i2
_
A
(1)
22
l
2
=
1

2
_
_
0
2
1
_
_
A
(2)
= A
(1)
l
2
l
T
2
=
_
_
0 0 0
0 2 1
0 1
5
2
_
_

1
2
_
_
0 0 0
0 4 2
0 2 1
_
_
=
_
_
0 0 0
0 0 0
0 0 2
_
_
= l
3
l
T
3
by (17)
l
3
=
1

2
_
_
0
0
2
_
_
L = [l
1
l
2
l
3
] =
1

2
_
_
2 0 0
1 2 0
0 1 2
_
_
lower triangular L
ii
> 0 i = 1 3
Check A = LL
T

43
Now consider the above constructive algorithm in the general case
i.e. A R
nn
symmetric positive denite.
Since A
11
> 0, we can start the algorithm
l
1
=
1

A
11
_
_
_
_
_
A
11
A
21
.
.
.
A
n1
_
_
_
_
_
Let A
(1)
= Al
1
l
T
1
R
nn
(symmetric as A, l
1
l
T
1
are both symmetric)
and has the structure
A
(1)
=
_
_
0 0

0 B
_
_
, where B R
(n1)(n1)
is symmetric
To continue we need
A
(1)
22
= B
11
> 0
We will now prove that B is positive denite B
kk
> 0 k = 1 n 1
To do this, we note that
e
1
=
_
_
_
_
_
1
0
.
.
.
0
_
_
_
_
_
R
n
Ae
1
=
_
_
_
_
_
A
11
A
21
.
.
.
A
n1
_
_
_
_
_
rst column of A
e
T
1
Ae
1
= A
11
l
1
=
1
_
e
T
1
A
Ae
1
Theorem
B R
(n1)(n1)
, as dened above, is positive denite
Proof: We need to show that u
T
Bu > 0 u R
n1
, u ,= 0.
Given u R
n1
, u ,= 0, let v =
_
0
u
_
R
n
so v ,= 0
e
T
1
v = 0, e
1
, v
1
,= 0 e
1
, v lin. ind.
44
u
T
Bu = v
T
A
(1)
v
= v
T
_
Al
1
l
T
1
_
v
= v
T
Av v
T
l
1
l
T
1
v
1 n n 1
. .
1 n n 1
. .
(
l
T
1
v
)
T
= v
T
Av
_
l
T
1
v
_
2
l
1
=
1
_
e
T
1
Ae
1
Ae
1
l
T
1
=
1

e
1
Ae
1
e
T
1
A

A
T
u
T
Bu = v
T
Av
_
e
T
1
Av
_
2
e
T
1
Ae
1
= v, v)
A

[e
1
, v)
A
]
2
e
1
, e
1
)
A
=
|v|
2
A
|e
1
|
2
A
[e
1
, v)
A
]
2
|e
1
|
2
A
Apply Cauchy-Schwarz inequality
[e
1
, v)
A
[ <
e
1
,v lin ind.
|e
1
|
A
|v|
A
u
T
Bu > 0 uR
n1
, u ,= 0
B is positive denite.
and
B = B
T
B
kk
> 0 k = 1 n 1
A
(1)
22
= B
11
> 0 Cholesky Decomposition can continue etc.
Application of the Cholesky Decomposition
Given A R
nn
symmetric positive denite.
If we nd the Cholesky Decomposition of A
i.e. A = LL
T
L R
nn
lower triangular with L
ii
> 0 i = 1 n,
then it is easy to solve A
nn
x
n1
= b
n1
for a given b R
n
.
Ax = b

L
T
?
x
..
z?
=

b
45
L
lower triang.
z = b and L
T
upper triang.
x = z
Solve for z by a forward solve
z
1
=
b
1
L
11
z
k
=
_
b
k

k1
j=1
L
kj
z
j
_
L
kk
k = 2 n
Solve for x by a back solve
x
n
=
z
n
L
T
nn
=
z
n
L
nn
x
k
=
_

_
z
k

n
j=k+1
_
L
T
_
kj
. .
L
jk
x
j
_

_
_
L
T
_
kk
. .
L
kk
k = (n 1) 1
1.7 Least Squares Problems
Give A R
mn
(m n) , b R
m
Find x R
n
such that
Ax = b
_
m equations
n unknowns
_
If m > n, then generally there is no solution x to Ax = b.
So nd an approximate solution in some sense.
Find x

R
n
such that Ax

b is small. Make this precise.


Example
Pendulum
see diagram 20081113.M2AA3.1
length l, period T
_
g
l
T = 2
estimate g (acceleration due to gravity) from the above relationship.
46
Let L =

l and c =
2

Lc =

T
Do m experiments.
L
m1
c
11
= T
m1
L =
_
_
_
L
1
.
.
.
L
m
_
_
_
T =
_
_
_
T
1
.
.
.
T
m
_
_
_
see diagram 20081113.M2AA3.2
Fit a straight line through the origin to the dots
Choose c R to minimise the sum of the squares of the errors
min
cR
m

i=1
(T
i
cL
i
)
2
= min
cR
|TcL|
2
Let
S (c) = |TcL|
2
= TcL, TcL)
= T, T) 2c L, T) +c
2
L, L)
S (c) = |T|
2
2c L, T) +c
2
|L|
2
dS
dc
(c) = 2 L, T) + 2c |L|
2
d
2
S
dc
2
(c) = 2 |L|
2
> 0
dS
dc
(c

) = 0 c

=
L, T)
|L|
2
S (c

) S (c) c R
c

is the global minimum of S (c).


Note:
c

R is such that
L, T) +c

|L|
2
L,L)
= 0
Tc

L, L) = 0
47
see diagram 20081113.M2AA3.3
Generalise to

A
mn
x
n1
=

b
m1
m > n generally no solution x as we have an overdetermined system.
Find x

R
n
such that
|Ax

b| |Ax b| x R
n
i.e.
min
xR
n
|Ax b| min
xR
n
|Ax b|
2
Let Q(x) = |Ax b|
2
Q : R
n
R
minQ(x)
Q1Sheet2
Recall -
Given A R
mn
, b R
m
(m n)
For m > n, in general, no solution x R
n
to Ax = b.
Find approximate solution x

R
n
such that
Q(x

) Q(x) = |Ax b|
2
x R
n
Q : R
n
R
Q(x) = |Ax b|
2
= Ax b, Ax b) = (Ax b)
T
(Ax b)
=
_
x
T
A
T
b
T
_
(Ax b) = x
T
A
T
Ax b
T
Ax x
T
A
T
b +b
T
b
= x
T
A
T
Ax 2b
T
Ax +b
T
b since x
T
A
T
b =
_
x
T
A
T
b
_
T
= b
T
Ax R
= x
T
Gx 2
T
x +|b|
2
where G = A
T
nn
A
mn
R
nn
and = A
T
nm
b
m1
R
n
Note
G
T
=
_
A
T
A
_
T
= A
T
A = G G R
nn
symmetric
= A
T
b
T
= b
T
A
Recall Q1 on Sheet2
Q(x) = 2
_
Gx
_
D
3
Q(x) = 2G
48
Theorem
Let A R
mn
(m n) has n linearly independent columns, and b R
m
.
Then A
T
A R
nn
is symmetric positive denite
Moreever unique x

R
n
such that A
T
Ax

= A
T
b [Normal Equations of Ax = b] and
x

is the global minimum of Q(x) = |Ax b|


2
x

is called the least squares solution of


Ax = b.
Proof: A
T
A is symmetric - see above
A = [a
1
a
2
. . . a
n
] , a
i
R
m
lin. ind.
c
T
A
T
Ac = (Ac)
T
Ac = |Ac|
2
0
and = 0 if and only if Ac = 0 i.e.
n

i=1
c
i
a
i
= 0
a
i

n
i=1
lin. ind. c = 0
c
T
A
T
Ac > 0 c R
n
, c ,= 0
A
T
A R
nn
is symmetric positive denite.

_
A
T
A
_
1
exists
unique x

R
n
solving A
T
Ax

= A
T
b
Now show x

is the global minimum of Q(x)


Q(x) = |Ax b|
2
= x
T
A
T
Ax 2
_
A
T
b
_
T
x +|b|
2
Q(x) = 2
_
A
T
Ax A
T
b
_
D
2
Q(x) = 2A
T
A
For x

R
n
a local minimum of Q(x), we require Q(x

) = 0 and D
2
Q(x

) to be sym-
metric positive denite.
Q(x

) = 0 A
T
Ax

= A
T
b
!x

such that Q(x

) = 0, D
2
Q(x

) = A
T
A s.p.d. (symmetric positive denite) x

global minimum of Q(x).


Ex m = 3, n = 2
A =
_
_
3 65
4 0
12 13
_
_
b =
_
_
1
1
1
_
_
easy to show that no x R
2
solving Ax = b
compute the least squares solution x

R
2
such that
A
T
Ax

= A
T
b
A
T
A =
_
3 4 12
65 0 13
_
A
T
23
_
_
3 65
4 0
12 13
_
_
A
32
=
_
169 351
351 4394
_
A
T
23
b
31
=
_
19
78
_
21
49
_
169 351
351 4394
_
x

=
_
19
78
_
x

=
_
0.090587 . . .
0.010515 . . .
_
Note Ax

b ,= 0
In practice, it is not a good idea to solve the normal equations (A
T
Ax

= A
T
b) since
A
T
A is generally ill-conditioned.
A matrix B R
nn
is ill-conditioned if small changes to the RHS of the system Bx = b
lead to large changes in the solution, unacceptable errors on a computer.
i.e.
Bx = b, B(x +x) = b +b
B ill-conditioned: small b large x
Alternative procedure to nd x

In practice, nd x

using the QR approach.


Take a sequence of Givens rotations
G = G
mn
. . . G
13
G
12
such that
G
mm
A
mn
= R
mn
upper triangular with r
ii
> 0 i = 1 n
G
pq
p < q
G
pq
R
mm
orthogonal
G
T
pq
G
pq
= I
(m)
= G
pq
G
T
pq
take out original system
Ax = b GAx = Gb Rx = Gb R
m
=
_

_
(Gb)
1
.
.
.
(Gb)
n
0
.
.
.
0
_

_
+
_

_
0
.
.
.
0
(Gb)
n+1
.
.
.
(Gb)
m
_

_
= + , R
m

,
_
= 0
50
Recall
R =
_

_
X X
0
X

0
0 0
_

mn

Ax b Rx = +
if = 0 unique solution x R
n
to Rx = = Gb
unique solution x R
n
to the original system Ax = b
if ,= 0 Rx = Gb = + is an inconsistant system
no solution x
no solution x to Ax = b
If ,= 0, solve the consistent system
Rx

=
!x

solving this, solve by a back solve


Claim: x

R
n
is the least squares solution to Ax = b
i.e.
|Ax

b|
2
|Ax b|
2
x R
n
Orthogonal matrices preserve length |Gy| = |y| y R
m
|Ax b|
2
= |G(Ax b)|
2
=
_
_
Rx
_
+
__
_
2
=

(Rx ) , (Rx )
_
|Ax b|
2
= |Rx |
2
+
_
_

_
_
2
2

Rx ,
_
. .
=0
since

Rx,
_
xR
n
=

,
_
= 0
min
xR
n
|Ax b|
2
= min
xR
n
|Rx |
2
+
_
_

_
_
2
Rx

= |Rx

| = 0
x

is such that
_
_

_
_
2
= |Ax

b|
2
|Ax b|
2
x R
n
51
Example
Use QR approach on the example above
m = 3, n = 2
A =
_
_
3 65
4 0
12 13
_
_
b =
_
_
1
1
1
_
_
G = G
23
() G
13
() G
12
() (recall from QR notes)
GA = R =
_
_
13 27
0 (3665)
1
2
0 0
_
_
Gb =
_
_
_
_
19
13
501
13(3665)
1
2
41
(3665)
1
2
_
_
_
_
=
_
_
1.46154 . . .
0.63659 . . .
0.67725 . . .
_
_
=
_
_
1.46154 . . .
0.63659 . . .
0.67725 . . .
_
_
=
_
_
0
0
0.67725
_
_
Solve Rx

= x

=
_
0.090587
0.010515
_
|Ax

b|
2
=
_
_

_
_
2
= (0.67725)
2
2 Least Squares Problems (A more abstract approach)
A more abstract denition of an inner product
Denition:
Let V be a real vector space
An inner product on V V is a function ., .) : V V R such that
(i) u +v, w) = u, w) +v, w)
(ii) u, v) = v, u)
(i)+(ii) w, u +v) = w, u) +w, v)
(iii) u, u) 0 with equality if and only if u = 0 V
An inner product induces a norm
|u| = [u, u)]
1
2
u V
|u| = 0 if and only if u = 0
52
Example
V = C [a, b] continuous functions over the closed interval [a, b]
Let w C [a, b] with w(x) > 0 x [a, b] - w is the weight function
Dene (two continous functions over the interval [a, b]) f, g)=
_
b
a
w(x) f (x) g (x) dx f, g
C [a, b]
Clearly ., .) : V V R and (i), (ii) clearly hold.
(iii)
f, f) =
_
b
a
w(x) [f (x)]
2
dx 0 f C [a, b]
= 0 if and only if f 0 function
Cauchy-Schwarz Inequality
[u, v)[ |u| |v| u, v V
with strict inequality if and only if u, v are linearly independent.
Proof: same as before.
Abstract Form of the Least Squares Problem
Let V be a real vector space with inner product ., .). Let U be a nite dimensional subspace
of V with basis
i

n
i=1
, by basis we mean linearly independent and span the subspace U.
Given v V , nd u

U such that
|v u

| |v u| u U
Example
V = C [a, b] , f, g) =
_
b
a
f (x) g (x) dx
(i.e. w(x)1)
U = P
n1
(polynomials of degree n 1)
Basis
i
= x
i1
i = 1 n
|v u

|
2
|v u|
2
=
_
b
a
[v (x) u(x)]
2
dx
Return to the general case
u U u =
n

i=1

i
for some =
_

1
.
.
.

n
_

_
R
53
u

U u

=
n

i=1

R
n
Finding u

U such that |v u

|
2
|v u|
2
u U
nding

R
n
such that |v

n
i=1

i
|
2
|v

n
i=1

i
|
2
R
n
Let
E () =
_
_
_
_
_
v
n

i=1

i
_
_
_
_
_
2
E : R
n
R
0
Find

R
n
such that E (

) E () R
n
.
E () =
_
v
n

i=1

i
, v
n

j=1

j
_
i, j dummy variable
= v, v)
n

i=1

i
, v)
. .

j=1

j
v,
j
)
. .
the same
+
n

i=1
n

j=1

i
,
j
)
Let R
n
such that
i
= v,
i
) i = 1 n
G R
nn
such that G
ij
=
i
,
j
) i, j = 1 n
E () = |v|
2
2
T
+
T
G
= E () = 2
_
G
_
D
2
E () = 2G
E (

) is a local minimum of E () if
E (

) = 0
and G is positive denite.
E (

) = 0 G

= normal equations
G is called the Gram matrix, it depends on the basis
i

n
n=1
for U.
[Sometimes write G(
1

2
. . .
n
)]
Lemma:
i

n
i=1
basis for U

i

n
i=1
lin. ind.
G is symmetric positive denite
Proof: G R
nn
G
ij
=
i
,
j
) i, j = 1 n
G
ji
=
j
,
i
)
(ii)
=
i
,
j
) = G
ij
i, j = 1 n
54

T
G =
n

i=1
n

j=1
G
ij

i
,
j
)

j
(i),(ii)
=
_
n

i=1

i
,
n

j=1

j
_
=
_
_
_
_
_
n

i=1

i
_
_
_
_
_
2
0
and = 0 if and only if

n
i=1

i
= 0 V = 0 as
i

n
i=1
linearly independent.

T
G > 0 R
n
, ,= 0 = G is symmetric positive denite.
G positive denite G
1
exists
!

R
n
solving G

= normal equations
E (

) = 0
(and is unique - no other stationary points).
D
2
E (

) = 2G symmetric positive denite


R
n
which solves the normal equations is the global minimum of E ()
u

=

n
i=1

i
Recall -
V a real Vector Space, Inner Product ., .)
U a nite dimensional subspace, basis
i

n
i=1
Given v V , nd u

U such that
|v u

| |v u| u U
Then u

=

n
i=1

i
, where

R
n
is the unique solution of
G

= Normal Equations
G R
nn
is the GRAM MATRIX depends on basis for U.
G
ij
=
i
,
j
) i, j = 1 n
symmetric positive denite
R
n

i
= v,
i
) i = 1 n
Theorem (orthogonality property):
_
v u

. .
, u
error
_
= 0 u U
55
see diagram 20081124.M2AA3.1
Proof:
G

=

T
G

=
T
R
n
Implication goes the other way as well

T
G

=
T
R
n
Let
= e
i
=
_
_
_
_
_
_
_
_
_
_
_
_
0
.
.
.
0
1
0
.
.
.
1
_
_
_
_
_
_
_
_
_
_
_
_
ith position
=
(G

)
i
=
i
Repeat for i = 1 n = G

=
G

=
T
G

=
T
R
n

i=1
n

j=1
G
ij

i
,
j
)

j
=
n

i=1

i

i

v,
i
)

_
n

i=1

i
. .
uU
,
n

j=1

j
. .
u

_
=
_
v,
n

i=1

i
. .
uU
_
R
n
v u

, u) = 0 u U
Example
1. V = C [0, 1], f, g) =
_
1
0
f (x) g (x) dx
U = P
n1
Basis
_
x
i1
_
n
i=1
i.e.
i
= x
i1
u P
n1
u(x) =

n
i=1

i
x
i1
Given v C [0, 1], nd u

(x) =

n
i=1

i
x
i1
i
such that
|v u

| |v u| u P
n1
56
|v u

|
2
|v u|
2
u P
n1

_
1
0
(v u

)
2
dx
_
1
0
(v u)
2
dx
Find

from solving the normal equations G

i
= v,
i
) =
_
1
0
v (x) x
i1
dx i = 1 n
G
ij
=
i
,
j
) =
_
1
0
x
i1
x
j1
dx =
_
1
0
x
i+j2
dx =
1
i+j1
i, j = 1 n
= G =
_

_
1
1
2
. . .
1
n
1
2
1
3
. . .
1
n+1
.
.
.
.
.
.
.
.
.
.
.
.
1
n
1
n+1
. . .
1
2n1
_

_
the n n Hilbert Matrix
Badly conditioned, columns linear dependence as n .
2. V = R
m
, a, b) = a
T
b a, b R
m
U = span a
i

n
i=1
where n m, linearly independent
i.e.
i
= a
i
R
m
Given v R
m
, nd u

=

n
i=1

i
a
i
such that
|v u

| |v u| u U (18)
Let A = [a
1
a
2
. . . a
n
] R
mn
A
mn

n1
=
n

i=1

i
a
i
(18) |v A

| |v A| R
n
Find

from solving the Normal Equations G

=
R
n

i
= v,
i
) = v, a
i
) = a
T
i
v i = 1 n
G R
nn
G
ij
=
i
,
j
) = a
i
, a
j
) = a
T
i
a
j
i, j = 1 n
A = [a
1
a
2
. . . a
n
] mn
A
T
=
_

_
a
T
1
a
T
2
.
.
.
a
T
n
_

_
n m
A
T
A R
nn
_
A
T
A
_
ij
= a
T
1
a
j
G = A
T
A
57
A
T
nm
v
m1
R
n

_
A
T
v
_
i
= a
T
i
v i = 1 n
= A
T
v
G

= A
T
A

= A
T
v Normal Equations for A = v
Change basis
1.
_
x
i1
_
n
i=1
Gram-Schmidt

n
i=1
orthonormal
G
ij
=
i
,
j
) =
ij
i, j = 1 n = G I

=
where
i
= v,
i
) i = 1 n
2.
_
x
i1
_
n
i=1
Gram-Schmidt

n
i=1
orthogonal
G
ij
=
i
,
j
) = 0 i, j = 1 n
i,=j
and
G
ii
= |
i
|
2
> 0 i = 1 n
G is a diagonal matrix
G

i
=

i
|
i
|
2
i = 1 n
u

=
n

i=1
v,
i
)
|
i
|
2

i
It is very easy to construct this orthogonal basis.
3 Orthogonal Polynomials
V = C [a, b] f, g) =
_
b
a
w(x) f (x) g (x) dx
Weight function w C (a, b) with w(x) > 0 x [a, b]
(w(x) as x a, or x b possibly)
see diagram 20081126.M2AA3.1
58
Require integral to be well-dened
[f, g)[ =

_
b
a
w(x) f (x) g (x) dx

_
b
a
[w(x) f (x) g (x)[ dx
=
_
b
a
w(x) [f (x)[ [g (x)[ dx
_
b
a
w(x) dx max
axb
[[f (x)[ [g (x)[]
. .
< as f,gC[a,b]
require
_
b
a
w(x) dx <
Ex
[a, b] = [0, 1], w(x) = x

> 0
see diagram 20081126.M2AA3.2
w C (0, 1)
_
1
0
x

dx =
_
x
1
1
_
1
0
< if < 1
Note, = 1
_
1
0
x
1
dx = [ln x]
1
0

U = P
n
polynomials of degree n
Canonical basis
_
x
i
_
n
i=0
ill-conditioned Gram matrix
so construct a new basis for P
n

i
(x)
n
i=0
where
j
(x) is a Monic polynomial of degree j, and is also orthogonal
i
,
j
) = 0 i ,= j

j
(x) = x
j
+
j1

i=0
a
ji
x
i
(monic-leading coecient is 1).

0
(x) = 1,
1
(x) = x a
0
, where a
0
R
chosen so that
0
,
1
) = 0.
59
Theorem
Monic orthogonal polynomials,
j
P
j
, satises the three term recurrence relation

j+1
= (x a
j
)
j
(x) b
j

j1
(x) for j 1
where a
j
=
x
j
,
j
)
|
j
|
2
and b
j
=
|
j
|
2
|
j1
|
2
also for j 1.
Proof:

j
(x) P
j
, monic

j+1
(x) x
j
(x) P
j

j+1
(x) x
j
(x) =
j

k=0
c
k

k
(x)
Find c
k
, k = 0 n
_
n

k=0
c
k

(x)
k
,
i
(x)
_
=
j+1
(x) x
j
(x) ,
i
(x)) i = 0 j
c
i
|
i
(x)|
2
= x
j
(x) ,
i
(x)) i = 0 j (19)
since
j
orthogonal
x
j
(x) ,
i
(x)) =
_
b
a
w(x) x
j
(x)
i
(x) dx
=
_

j
(x) , x
i
(x)
. .
P
i+1
_

j
(x) is orthogonal to
k

j1
k=0

j
(x) is orthogonal to P
j1
degree j 1
if i + 1 j 1, i.e. i j 2
then x
j
(x) ,
i
(x)) = 0
(19) c
i
= 0 if i j 2

j+1
(x) x
j
(x) = c
j1

j1
(x) +c
j

j
(x)
all other coecients are 0, where
c
j1
=
x
j
(x) ,
j1
(x))
|
j1
(x)|
2
c
j
=
x
j
(x) ,
j
(x))
|
j
(x)|
2
=
x
j
,
j
)
|
j
|
2

j
, x
j1
) =
_

j
,
P
j1
..
x
j1

j
_
. .
=0
+
j
,
j
)
60
c
j1
=
|
j
|
2
|
j1
|
2
Let a
j
= c
j
=
x
j
,
j
)
|
j
|
2
Let b
j
= c
j1
=
|
j
|
2
|
j1
|
2

j+1
(x) x
j
(x) = a
j

j
(x) b
j

j1
(x)

j+1
(x) = (x a
j
)
j
(x) b
j

j1
(x) j 1
where
a
j
=
x
j
,
j
)
|
j
|
2
b
j
=
|
j
|
2
|
j1
|
2
_
_
_
j 1

(20)

0
(x) = 1

1
(x) = x a
0
, where a
0
R
such that

1
,
0
) = 0
i.e.
x a
0
1, 1) = 0
a
0
1, 1) = x, 1)
a
0
=
x, 1)
|1|
2
=
x
0
,
0
)
|
0
|
2
extend (20) to j 0

j+1
(x) = (x a
j
)
j
(x) b
j

j1
(x) j 0
with
0
(x) = 1,
1
(x) = 0
a
j
=
x
j
,
j
)
|
j
|
2
j 0 b
j
=
||
2
|
j1
|
2
j 1 (21)
Recall -
g (x) is even g(x) = g (x) x
_
2
2
g (x) dx = 2
_
2
0
g (x) dx
61
see diagram 20081127.M2AA3.1
g (x) is odd if g (x) = g (x) x
_
2
2
g (x) dx = 0
see diagram 20081127.M2AA3.2
Ex
f, g) =
_
1
1
f (x) g (x) dx
i.e. [a, b] = [1, 1], w(x) = 1 x [1, 1]
Find the monic orthogonal polynomials with respect to this inner product.
Apply (21)

0
(x) = 1
1
(x) = x a
0
a
0
=
x
0
,
0
)
|
0
|
2
=
_
1
1
xdx
|
0
|
2
= 0 as x is odd

1
(x) = x

2
(x) = (x a
1
)
1
(x) b
1

0
(x)
= (x a
1
) x b
1
a
1
=
x
1
,
1
)
|
1
|
2
=
_
1
1
x
3
dx
|
1
|
2
= 0
b
1
=
|
1
|
2
|
0
|
2
=
_
1
1
x
2
dx
_
1
1
1
2
dx
=
2
_
1
0
x
2
dx
2
=
1
3

2
(x) = x
2

1
3
etc.

3
(x) = x
3

3
5
x
Summary
V = C [a, b],
f, g) =
_
b
a
w(x) f (x) g (x) dx f, g C [a, b]
62
with constraint that w C (a, b), w(x) > 0 x [a, b] and it is integrable
_
b
a
w(x) dx <
Given f C [a, b], we are looking to approximate this by a polynomial of degree n, nd
p

n
(x) P
n
such that the associated norm with this product is minimal, |f p

n
|
|f p
n
| p
n
P
n
Orthogonal basis
j
(x)
n
j=0
for P
n
p

n
(x) =

n
i=0
f,
i
)
|
i
|
2

i
(x)
p

n
P
n
is the best approximation to f from P
n
, in that norm |.|.
Ex
Show that polynomials T
k
(x) = cos
_
k cos
1
x
_
for x [1, 1] are orthogonal with respect
to the inner product
f, g) =
_
1
1
_
1 x
2
_

1
2
. .
w(x)
f (x) g (x) dx
see diagram 20081127.M2AA3.3
T
k
(x) polynomial?
T
0
(x) = 1
T
1
(x) = x
Introduce change of variable
= cos
1
x x = cos
see diagram 20081127.M2AA3.4
x [1, 1] [0, ]
T
k
(x) = cos k
Recall the trigonometric identity
cos (k + 1) + cos (k 1) = 2 cos k cos
T
k+1
(x) = 2xT
k
(x) T
k1
(x) k 1
63
T
2
(x) = 2xT
1
(x) T
0
(x)
= 2x
2
1 P
2
T
3
(x) = 2xT
2
(x) T
1
(x)
= 2x
_
2x
2
1
_
x
= 4x
3
3x
by induction
T
k
(x) = 2
k1
x
k
+ P
k
not monic.
Show T
k
(x)
k1
is orthogonal with respect to
f, g) =
_
1
1
_
1 x
2
_

1
2
. .
w(x)
f (x) g (x) dx
see diagram 20081202.M2AA3.1
_
1
1
_
1 x
2
_

1
2
T
k
(x) T
j
(x) dx
x = cos
dx
d
= sin
_
0

(sin )
1
cos k cos j (sin ) d
=
_

0
cos k cos jd
=
1
2
_

0
[cos [(k +j) ] + cos [(k j) ]] d
=
1
2
_
sin (k +j)
k +j
+
sin (k j)
k j
_

0
not valid if k = j k = j = 0
=
_

_
0 if k ,= j

2
if k = j ,= 0
if k = j = 0
T
k
(x)
k0
orthogonal, not orthonormal. These polynomials are called Chebyshev Poly-
nomials.
64
4 Polynomial Interpolation
Abandon best approximation, and consider the more practical approach of polynomial inter-
polation.
Given (z
j
, f
j
)
n
j=0
with z
j
, f
j
C as j = 0 n, nd p
n
(z) P
n
such that p
n
(z
j
) = f
j
with j = 0 n.
Ex. z
j
, f
j
R j = 0 n
see diagram 20081202.M2AA3.2
p
n
is called the interpolating polynomial for this data.
Natural Questions
1. Does p
n
exist?
2. Is p
n
unique?
3. What is the construction of p
n
?
1. Prove the existence by a construction proof. Clearly z
j

n
j=0
should be distinct.
Lemma
Given (z
j
, f
j
)
n
j=0
with z
j
, f
j
C and j = 0 n, z
j
distinct.
Let
l
j
(z) =
n

k=0
k=j
(z z
k
)
(z
j
z
k
)
j = 0 n
Then l
j
(z) P
n
j = 0 n and l
j
(z
r
) =
jr
j, r = 0 n.
Proof
l
j
(z) is a product of n factors of the form
zz
k
z
j
z
k
k ,= j l
j
(z) P
n
l
j
(z
r
) =
n

k=0
k=j
z
r
z
k
z
j
z
k
If r = j l
j
(z
j
) = 1
If r ,= j one factor = 0 when k = r
l
j
(z
r
) = 0
65
Example
z
j
R j = 0 n
see diagram 20081202.M2AA3.3
l
j
(z)
n
j=0
are the lagrange basis functions.
Lemma
The interpolating polynomials p
n
(z) P
n
for the data (z
j
, f
j
)
n
j=0
, z
j
, f
j
C j = 0 n,
z
j
distinct is such that
p
n
(z) =
n

j=0
f
j
l
j
(z)
Proof
l
j
(z) P
n
j = 0 n
p
n
(x) =
n

j=0
f
j
l
j
(z) P
n
p
n
(z
r
) (polynomial with data point z
r
) you want to guarantee it spews out f
r
.
p
n
(z
r
) =
n

j=0
f
j
l
j
(z
r
) =
n

j=0
f
j

jr
= f
r
r = 0 n
p
n
(z) interpolates the data (z
j
, f
j
)
n
j=0
2. Is p
n
unique?
Theorem (Fundamental Theorem of Algebra)
Let
p
n
(z) = a
0
+a
1
z = a
2
z
2
+ +a
n
z
n
a
i
C, i = 0 n
Then p
n
(z) has at most n distinct roots (zeros) in C, unless a
i
= 0, i = 0 n p
n
(z) 0.
Recall
Given (z
j
, f
j
)
n
j=0
, z
j
, f
j
C, z
j
distinct; nd the interpolating polynomial p
n
P
n
such
that
p
n
(z
j
) = f
j
j = 0 n
66
Lagrange Construction
p
n
(z) =
n

j=0
f
j
l
j
(z) where l
j
(z)
P
n
=
n

k=0
k=j
(z z
k
)
(z
j
z
k
)
j = 0 n
l
j
(z
r
) =
jr
j, r = 0 n
Is the interpolating polynomial unique?
Assume the contrary
p
n
, q
n
P
n
such that
p
n
(z
j
) = q
n
(z
j
) = f
j
j = 0 n
to get a contradiction, we will use the fundamental theorem of algebra
p
n
q
n
P
n
and
(p
n
q
n
) (z
j
) = 0 j = 0 n
p
n
q
n
P
n
has (n + 1) distinct roots (zeros) as z
j
are distinct
F.T.A. p
n
q
n
= 0
p
n
= q
n
uniqueness
Example
Find p
2
P
2
such that p
2
(0)
z
0
= a
f
0
, p
2
(1)
z
1
= b
f
1
, p
2
(4)
z
2
= c
f
2
n = 2
p
2
(z) =
2

j=0
f
j
l
j
(z)
l
0
(z) =
(z z
1
) (z z
2
)
(z
0
z
1
) (z
0
z
2
)
=
(z 1) (z 4)
(1) (4)
=
1
4
_
z
2
5z + 4
_
l
1
(z) = =
1
3
_
z
2
4z
_
l
2
(z) = =
1
12
_
z
2
z
_
p
2
(z) = al
0
(z) +bl
1
(z) +cl
2
(z) lagrange form
=
_
a
4
+
b
3
+
c
12
_
z
2

_
5a
4

4b
3
+
c
12
_
z +a canonical form
One could nd the coecients in the canonical form directly by using p
n
(z) =

n
k=0
a
k
z
k
.
We know that
p
n
(z
j
) =
n

k=0
a
k
z
k
j
= f
j
, j = 0 n,
67

_
_
_
_
_
1 z
0
. . . z
n
0
1 z
1
. . . z
n
1
.
.
.
.
.
.
.
.
.
1 z
n
. . . z
n
n
_
_
_
_
_
_
_
_
_
_
a
0
a
1
.
.
.
a
n
_
_
_
_
_
=
_
_
_
_
_
f
0
f
1
.
.
.
f
n
_
_
_
_
_
,
V a

C
(n+1)(n+1)
= f , a, f C
n+1
,
V
jk
= z
k
j
, j, k = 0 n,
V is called Vandermonde matrix (Q4, Sheet5). In general V is ill-conditioned (as z
j
gets
close to z
i
, columns i and j become linearly independent (this is why it is ill-conditioned)).
Canonical Basis p
n
(z) =
n

k=0
a
k
z
k
,
_
z
k
_
n
k=0
V a = f ,
You should certainly not use the canonical basis, it looks as if we should use the Lagrange
basis, however there is a aw in this basis as we will see later, even though it is far better.
Lagrange Basis p
n
(z) =
n

k=0
f
k
l
k
(z) ,
l
k
(z)
n
k=0
If = f ,
The Lagrange basis if far better than the canonical basis. However, this basis has to be
constructed. Assume we have found p
n1
P
n1
, interpolating (z
j
, f
j
)
n1
j=0
and one is
then given a new data point (z
n
, f
n
) . One cannot use p
n1
P
n1
to nd p
n
P
n
. One
has to compute new Lagrange basis functions P
n
.
We now look for an alternative construction. If p
n1
P
n1
such that p
n1
(z) =
f
j
, j = 0 n 1, now nd p
n
P
n
such that p
n
(z
j
) = f
j
, j = 0 n. Let
p
n
(z) = p
n1
(z) +C
n1

k=0
(z z
k
)
. .
P
n
vanishes at z
j
, ,j=0n1
p
n
(z
j
) = p
n1
(z
j
) = f
j
, j = 0 n 1.
Then choose C C such that
p
n
(z
n
) = p
n1
(z
n
) +C
n1

k=0
(z
n
z
k
) = f
n
,
z
j

n
j=0
distinct C =
f
n
p
n1
(z
n
)

n1
k=0
(z
n
z
k
)
,
68
C depends on all data points (z
j
, f
j
)
n
j=0
.
Classical Notation C = f [z
0
, z
1
, . . . , z
n
] ,
This is called a divided dierence of order n (depends on (n + 1) points).
p
n
(z) = p
n1
(z) +f [z
0
, z
1
, . . . , z
n
]
n1

k=0
(z z
k
) ,
so the coecient of z
n
in p
n
(z) is f [z
0
, z
1
, . . . , z
n
].
Note that p
n
is unique and p
n
(z
j
) = f
j
, j = 0 n,
f [z

0
, z

1
, . . . z

n
] = f [z
0
, z
1
, . . . , z
n
] ,
for any permutation of the points z
0
, z
1
, . . . , z
n
.
Lemma
If (z
j
, f
j
)
n
j=0
, z
j
, f
j
C, z
j
distinct, then
f [z
0
, z
1
, z
2
, . . . , z
n
] =
n

j=0
f
j

n
k=0
k=j
(z
j
z
k
)
.
Furthermore, if f
j
= f (z
j
) , j = 0 n for some function f (z), then f [z
0
, z
1
, . . . z
n
] = 0
if f P
n1
.
Proof
Compare coecient of z
n
in the Lagrange form of p
n
(z) with
p
n
(z) = p
n1
(z) +f [z
0
, z
1
, . . . , z
n
]
n1

k=0
(z z
k
) . (22)
Coecient of z
n
in (22) is f [z
0
, z
1
, . . . , z
n
].
Recall, Lagrange form
p
n
(z) =
n

j=0
f
j
l
j
(z) =
n

j=0
f
j
n

k=0
k=j
(z z
k
)
(z
j
z
k
)
, (23)
coecient of z
n
in (23),
n

j=0
f
j
n

k=0
k=j
1
(z
j
z
k
)
,
69
hence the result.
If f
j
= f (z
j
), j = 0 n, when f P
n1
, then the uniqueness of the interpolating
polynomial,
p
n
(z) = f (z) P
n1
.
see diagram 20081204.M2AA3.1
The coecient of z
n
in p
n
(z) is f [z
0
, z
1
, . . . , z
n
]. But p
n
P
n1
in this case,
f [z
0
, z
1
, . . . , z
n
] = 0.
Note that,
p
n
(z)

interpolates
(z
j
,f
j
)
n
j=0
= p
n1
(z)

interpolates
(z
j
,f
j
)
n1
j=0
+f [z
0
, z
1
, . . . , z
n
]
n1

k=0
(z z
k
) ,
p
n1
(z) = p
n2
(z)

(z
j
,f
j
)
n2
j=0
+f [z
0
, z
1
, . . . , z
n1
]
n2

k=0
(z z
k
) ,
.
.
.
p
1
(z)

(z
j
,f
j
)
1
j=0
= p
0
(z)

(z
0
,f
0
)
f[z
0
]=f
0
+f [z
0
, z
1
] (z z
0
) ,
p
n
(z) = f [z
0
]

f
0
+
n

j=1
f [z
0
, . . . , z
j
]
j1

k=0
(z z
k
) .
This is the Newton Form of the Interpolating Polynomial.
Note that,
f [z
0
, z
1
, . . . , z
j
] is the coecient of z
j
in p
j
(z) ,
where p
j
P
j
and p
j
(z
k
) = f
k
, k = 0 j.
Theorem
For any distinct z
0
, z
1
, . . . , z
n+1
C, the divided dierence based on all the points,
f[z
0
, z
1
, . . . , z
n+1
]
n+2 points
=
f
n+1 points
[z
0
, z
1
, . . . , z
n
] f
n+1 points
[z
1
, z
2
, . . . , z
n+1
]
z
0
z
n+1
70
Proof
Given (z
j
, f
j
)
n+1
j=0
, we construct p
n
, q
n
P
n
such that,
p
n
(z
j
) = f
j
, j = 0 n coecient of z
n
in p
n
(z) is f [z
0
, z
1
, . . . , z
n
] ,
q
n
(z
j
) = f
j
, j = 1 n + 1 coecient of z
n
in q
n
(z) is f [z
1
, z
2
, . . . , z
n+1
]
Let
r
n+1
(z) =
(z z
n+1
) p
n
(z) (z z
0
) q
n
(z)
z
0
z
n+1
P
n+1
, (24)
r
n+1
(z
0
) = p
n
(z
0
) = f
0
r
n+1
(z
j
) =
(z
j
z
n+1
) f
j
(z
j
z
0
) f
j
z
0
z
n+1
, j = 1 n,
= f
j
r
n+1
(z
n+1
) = q
n
(z
n+1
) = f
n+1
,
r
n+1
(z) P
n+1
is such that,
r
n+1
(z
j
) = f
j
, j = 0 n.
Compare the coecient of z
n+1
in (24),
f [z
0
, z
1
, . . . , z
n+1
]
n+2
=
f
n+1
[z
0
, z
1
, . . . , z
n
] f
n+1
[z
1
, . . . , z
n+1
]
z
0
z
n+1
,
hence result. This is the divided dierence recurrence relation.
Divided Dierence Tableau
z
0
f [z
0
] = f
0
z
1
f [z
1
] = f
1
_
f [z
0
, z
1
] =
f [z
0
] f [z
1
]
z
0
z
1
z
2
f [z
2
]
.
.
.
= f
2
_
f [z
1
, z
2
] =
f [z
1
] f [z
2
]
z
1
z
2
_
f[z
0
, z
1
, z
2
]
.
.
.
=
f [z
0
, z
1
] f [z
1
, z
2
]
z
0
z
2
z
n
f [z
n
] = f
n
_
f [z
n1
, z
n
] =
f [z
n1
] f [z
n
]
z
n1
z
n
_
f [z
n2
, z
n1
, z
n
] etc.
Diagonal entries in this tableau appear in the Newton form of p
n
(z).
Example
$n = 2$:
$$z_0 = 0, \quad z_1 = 1, \quad z_2 = 4,$$
$$f_0 = a, \quad f_1 = b, \quad f_2 = c.$$
$z_0 = 0$ : $f[z_0] = a$
$z_1 = 1$ : $f[z_1] = b$,  $f[z_0, z_1] = \dfrac{a - b}{0 - 1} = b - a$
$z_2 = 4$ : $f[z_2] = c$,  $f[z_1, z_2] = \dfrac{b - c}{1 - 4} = \dfrac{c - b}{3}$,  $f[z_0, z_1, z_2] = \dfrac{(b - a) - \frac{c - b}{3}}{0 - 4} = \dfrac{a}{4} - \dfrac{b}{3} + \dfrac{c}{12}$

so
$$p_2(z) = f[z_0] + f[z_0, z_1](z - z_0) + f[z_0, z_1, z_2](z - z_0)(z - z_1)$$
$$= a + (b - a) z + \left( \frac{a}{4} - \frac{b}{3} + \frac{c}{12} \right) z (z - 1).$$
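(A quick numerical sanity check of this example, not from the lectures, with hypothetical values $a = 1$, $b = 2$, $c = 5$:)

# Check the worked example with illustrative values a, b, c (hypothetical data).
a, b, c = 1.0, 2.0, 5.0

def p2(z):
    # p_2(z) = a + (b - a) z + (a/4 - b/3 + c/12) z (z - 1)
    return a + (b - a) * z + (a / 4 - b / 3 + c / 12) * z * (z - 1)

# p_2 should interpolate the data at z = 0, 1, 4
assert abs(p2(0.0) - a) < 1e-12
assert abs(p2(1.0) - b) < 1e-12
assert abs(p2(4.0) - c) < 1e-12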
We may be interested in approximating a function $f(z)$ that is complicated to evaluate by a polynomial $p_n(z) \in P_n$. Evaluate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^{n}$ and form the interpolating polynomial $p_n(z)$ with $p_n(z_j) = f(z_j)$, $j = 0 \to n$. Then approximate $f(z)$ by $p_n(z)$.
see diagram 20081209.M2AA3.1
Theorem
Let $p_n(z)$ interpolate $f(z)$ at $n+1$ distinct points $\{z_j\}_{j=0}^{n}$, $z_j \in \mathbb{C}$. Then the error $e(z) = f(z) - p_n(z)$ is such that
$$e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j, \; j = 0 \to n.$$
(Note that $e(z_j) = 0$, $j = 0 \to n$.)
Proof
$p_n(z)$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n}$. We now add a new point, different from the points we already have: $z_{n+1} \neq z_j$, $j = 0 \to n$. The new polynomial $p_{n+1}(z) \in P_{n+1}$ interpolates $f(z)$ at $\{z_j\}_{j=0}^{n+1}$.
The Newton form of $p_{n+1}(z)$ is
$$p_{n+1}(z) = p_n(z) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z - z_k),$$
so
$$f(z_{n+1}) = p_{n+1}(z_{n+1}) = p_n(z_{n+1}) + f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k)$$
$$\Rightarrow \quad e(z_{n+1}) = f[z_0, z_1, \ldots, z_n, z_{n+1}] \prod_{k=0}^{n} (z_{n+1} - z_k),$$
but $z_{n+1}$ is any point with $z_{n+1} \neq z_j$, $j = 0 \to n$; setting $z_{n+1} = z \neq z_j$, $j = 0 \to n$, gives
$$e(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k). \qquad \square$$
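(A minimal numerical check of this error formula, not in the original notes: for a sample function $f$ the interpolation error at a non-node point $z$ should match $f[z_0, \ldots, z_n, z] \prod_k (z - z_k)$. The helper names and data are illustrative.)

import numpy as np

def divided_difference(z, f):
    # f[z_0, ..., z_m] via the explicit sum from the earlier Lemma
    m = len(z)
    return sum(f[j] / np.prod([z[j] - z[k] for k in range(m) if k != j])
               for j in range(m))

f = np.exp                    # sample function, f(z) = e^z
z_nodes = [0.0, 0.5, 1.0]     # illustrative interpolation points
z = 0.3                       # a point distinct from all nodes

# e(z) = f(z) - p_n(z), with p_n evaluated in Lagrange form
p_at_z = sum(f(z_nodes[j]) * np.prod([(z - z_nodes[k]) / (z_nodes[j] - z_nodes[k])
             for k in range(3) if k != j]) for j in range(3))
lhs = f(z) - p_at_z

# Right-hand side: f[z_0, ..., z_n, z] * prod_k (z - z_k)
zs = z_nodes + [z]
rhs = divided_difference(zs, [f(t) for t in zs]) * np.prod([z - zk for zk in z_nodes])
assert abs(lhs - rhs) < 1e-12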
For the above result to be useful, we need to bound
$$f[z_0, z_1, \ldots, z_n, z].$$
We restrict ourselves from now on to the real case:
$$z_j = x_j \in \mathbb{R}, \; j = 0 \to n, \text{ distinct}, \qquad f(z) = f(x) \text{ a real function}.$$
Here $f[x_j] = f(x_j)$, $j = 0 \to n$, is a zero order divided difference, based on one point.
A first order divided difference is based on 2 points, e.g.
$$f[x_0, x_1] = \frac{f[x_0] - f[x_1]}{x_0 - x_1} = \frac{f(x_0) - f(x_1)}{x_0 - x_1}.$$
Mean Value Theorem
$$f(x_1) = f(x_0) + \underbrace{(x_1 - x_0)}_{\text{distance moved}} f'(\xi), \quad \text{where } \xi \text{ lies between } x_0 \text{ and } x_1;$$
this assumes that $f \in C^1[x_0, x_1]$ if $x_0 < x_1$ ($C^1[x_1, x_0]$ if $x_1 < x_0$).
$$\Rightarrow \quad f[x_0, x_1] = f'(\xi) \qquad \text{(1st order divided difference)}.$$
Recall
$$e(z) = f(z) - p_n(z) = f[z_0, z_1, \ldots, z_n, z] \prod_{k=0}^{n} (z - z_k), \quad z \neq z_j, \; j = 0 \to n,$$
with $e(z_j) = 0$, $j = 0 \to n$.
Theorem
Let $f \in C^n[x_0, x_n]$, i.e. $f$ and its first $n$ derivatives are continuous on $[x_0, x_n]$, where for ease of exposition we have assumed that the real interpolation points are ordered,
$$x_0 < x_1 < \cdots < x_n.$$
Then $\exists \, \eta \in [x_0, x_n]$ such that
$$f[x_0, x_1, \ldots, x_n] = \frac{1}{n!} f^{(n)}(\eta)$$
($n+1$ points give an $n$th order divided difference).
Proof
Let $p_n \in P_n$ interpolate $f(x)$ at $x_i$, $i = 0 \to n$, and let
$$e(x) = f(x) - p_n(x) \quad \Rightarrow \quad e(x_i) = 0, \; i = 0 \to n,$$
so $e(x)$ has at least $(n+1)$ zeros in $[x_0, x_n]$.
see diagram 20081210.M2AA3.1
Rolle's Theorem:
$e'(x)$ has at least $n$ zeros in $[x_0, x_n]$,
$e''(x)$ has at least $(n-1)$ zeros in $[x_0, x_n]$,
$\vdots$
$e^{(n)}(x)$ has at least 1 zero in $[x_0, x_n]$.
Let $\eta \in [x_0, x_n]$ be such that
$$e^{(n)}(\eta) = 0.$$
Now
$$e^{(n)}(x) = f^{(n)}(x) - p_n^{(n)}(x).$$
Recall the Newton form of $p_n(x)$:
$$p_n(x) = f[x_0, x_1, \ldots, x_n] \, x^n + \ldots \quad \Rightarrow \quad p_n^{(n)}(x) = n! \, f[x_0, x_1, \ldots, x_n] \in \mathbb{R},$$
$$\Rightarrow \quad f^{(n)}(\eta) = p_n^{(n)}(\eta) = n! \, f[x_0, x_1, \ldots, x_n],$$
hence the result.
We now combine the above theorems.
Theorem
Let $f \in C^{n+1}[a, b]$. Let $\{x_i\}_{i=0}^{n}$ be our interpolation points, distinct and in the interval $[a, b]$. If $p_n \in P_n$ interpolates $f$ at $\{x_i\}_{i=0}^{n}$, then the error $e(x) = f(x) - p_n(x)$ satisfies
$$|e(x)| \leq \frac{1}{(n+1)!} \left| \prod_{i=0}^{n} (x - x_i) \right| \max_{a \leq y \leq b} \left| f^{(n+1)}(y) \right| \quad \forall x \in [a, b].$$
Proof
The result is clearly true for the interpolation points $x = x_i$, $i = 0 \to n$, as $e(x_i) = 0$ ($0 \leq 0$), since the product of factors $\prod_{i=0}^{n} (x - x_i)$ also $= 0$ there.
By the 1st Theorem,
$$e(x) = f[x_0, x_1, \ldots, x_n, x] \prod_{k=0}^{n} (x - x_k), \quad x \neq x_i, \; i = 0 \to n.$$
By the 2nd Theorem,
$$e(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} \prod_{k=0}^{n} (x - x_k) \quad \text{for some } \eta \in [a, b],$$
$$\Rightarrow \quad |e(x)| = \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \left| f^{(n+1)}(\eta) \right| \leq \frac{1}{(n+1)!} \left| \prod_{k=0}^{n} (x - x_k) \right| \max_{a \leq y \leq b} \left| f^{(n+1)}(y) \right|.$$
Let $\|g\|_\infty = \max_{a \leq x \leq b} |g(x)|$ ($\|.\|_\infty$ = infinity norm); then
$$\|e\|_\infty \leq \frac{1}{(n+1)!} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \left\| f^{(n+1)} \right\|_\infty.$$
Does $\|e\|_\infty \to 0$ as $n \to \infty$, assuming $f \in C^\infty[a, b]$?
Ex. 1
$[a, b] = \left[ -\frac{1}{2}, \frac{1}{2} \right]$, $f(x) = e^x$.
We know that
$$x, x_i \in \left[ -\tfrac{1}{2}, \tfrac{1}{2} \right] \Rightarrow |x - x_i| \leq 1 \quad \Rightarrow \quad \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \leq \left\| \prod_{i=0}^{n} |x - x_i| \right\|_\infty \leq 1 \quad \forall n,$$
$$\left\| f^{(n+1)} \right\|_\infty = \| e^x \|_\infty = e^{\frac{1}{2}},$$
$$\Rightarrow \quad \|e\|_\infty \leq \frac{1}{(n+1)!} \, e^{\frac{1}{2}} \to 0 \text{ as } n \to \infty.$$
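(A short numerical illustration of Ex. 1, my addition: interpolating $e^x$ at equally spaced points on $[-\frac{1}{2}, \frac{1}{2}]$ and estimating $\|e\|_\infty$ on a fine grid; the observed maximum error should sit below the bound $e^{1/2}/(n+1)!$ derived above. The grid size and choice of $n$ are illustrative.)

import numpy as np
from math import factorial, exp

def lagrange_eval(x_nodes, f_vals, x):
    # Evaluate the interpolating polynomial in Lagrange form at the points x
    p = np.zeros_like(x)
    for j, (xj, fj) in enumerate(zip(x_nodes, f_vals)):
        lj = np.ones_like(x)
        for k, xk in enumerate(x_nodes):
            if k != j:
                lj *= (x - xk) / (xj - xk)
        p += fj * lj
    return p

xs = np.linspace(-0.5, 0.5, 1000)  # fine grid to estimate the sup norm
for n in [2, 4, 8]:
    nodes = np.linspace(-0.5, 0.5, n + 1)   # equally spaced interpolation points
    err = np.max(np.abs(np.exp(xs) - lagrange_eval(nodes, np.exp(nodes), xs)))
    bound = exp(0.5) / factorial(n + 1)     # the bound derived above
    print(n, err, bound)                    # err should be <= bound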
Ex. 2
General $[a, b]$, $f(x) = \cos x$:
$$\left\| f^{(n+1)} \right\|_\infty \leq 1, \qquad x, x_i \in [a, b] \Rightarrow |x - x_i| \leq b - a,$$
$$\Rightarrow \quad \|e\|_\infty \leq \frac{1}{(n+1)!} (b - a)^{n+1} \to 0 \text{ as } n \to \infty.$$
Ex. 3
$f(x) = (1 + x)^{-1}$ on $[0, 1]$:
$$f'(x) = -(1 + x)^{-2}, \quad \ldots, \quad f^{(n+1)}(x) = (-1)^{n+1} (n+1)! \, (1 + x)^{-(n+2)}.$$
Does $\|e\|_\infty \to 0$ as $n \to \infty$, i.e. does $\|f - p_n\|_\infty \to 0$ as $n \to \infty$?
See Sheet 5, Q12.
Can we choose the interpolation points $\{x_i\}_{i=0}^{n}$ in a smart way?
Fix $[a, b]$, fix $n$, and suppose we are given $f$. Choose distinct interpolation points $\{x_i\}_{i=0}^{n} \subset [a, b]$ so as to minimise the product of factors:
$$\min_{\{x_i\}_{i=0}^{n}} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \quad (25)$$
$$\equiv \quad \min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty. \quad (26)$$
To solve (26) is to find $q_n^* \in P_n$ such that
$$\left\| x^{n+1} - q_n^*(x) \right\|_\infty \leq \left\| x^{n+1} - q_n(x) \right\|_\infty \quad \forall q_n \in P_n.$$
If $x^{n+1} - q_n^*(x)$ has $n+1$ distinct zeros $\{x_i\}_{i=0}^{n}$ in $[a, b]$, then we have solved (25):
$$\min_{\{x_i\}_{i=0}^{n}} \left\| \prod_{i=0}^{n} (x - x_i) \right\|_\infty \equiv \min_{q_n \in P_n} \left\| x^{n+1} - q_n(x) \right\|_\infty.$$
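(To make (25) concrete, a small experiment, my addition: on $[-1, 1]$ it compares $\| \prod_i (x - x_i) \|_\infty$ for equally spaced points against the zeros of the Chebyshev polynomial $T_{n+1}$, which the theorem at the end of these notes identifies as optimal on $[-1, 1]$. The node formula $\cos\!\big( (2i+1)\pi / (2(n+1)) \big)$ for the zeros of $T_{n+1}$ is standard but quoted here rather than derived.)

import numpy as np

def sup_norm_of_node_product(nodes, grid):
    # || prod_i (x - x_i) ||_inf, estimated on a fine grid
    vals = np.ones_like(grid)
    for xi in nodes:
        vals *= (grid - xi)
    return np.max(np.abs(vals))

grid = np.linspace(-1.0, 1.0, 10001)
n = 8
equi = np.linspace(-1.0, 1.0, n + 1)
# zeros of T_{n+1} (quoted formula; see the Chebyshev theorem below)
cheb = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
print(sup_norm_of_node_product(equi, grid))   # noticeably larger
print(sup_norm_of_node_product(cheb, grid))   # ~ 2^{-n}, up to grid resolution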
5 Best Approximation in $\|.\|_\infty$
(Best approximation in the uniform sense, or Minimax approximation.)
Given $g \in C[a, b]$, find $q_n^* \in P_n$ such that
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n,$$
i.e.
$$\|g - q_n^*\|_\infty = \min_{q_n \in P_n} \left[ \max_{a \leq x \leq b} |g(x) - q_n(x)| \right].$$
Theorem
Let $g \in C[a, b]$ and $n \geq 0$.
Suppose $\exists \, q_n^* \in P_n$ and $(n+2)$ distinct points $\{x_j^*\}_{j=0}^{n+1}$, where $a \leq x_0^* < x_1^* < \cdots < x_n^* < x_{n+1}^* \leq b$, such that
$$g(x_j^*) - q_n^*(x_j^*) = \sigma (-1)^j \|g - q_n^*\|_\infty, \quad j = 0 \to n+1, \quad (27)$$
where $\sigma = +1$ or $-1$.
Then $q_n^* \in P_n$ is the Best Approximation to $g$ from $P_n$ in $\|.\|_\infty$, i.e.
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n.$$
Example
$n = 3$, $\sigma = +1$ and $E = \|g - q_n^*\|_\infty$:
see diagram 20081211.M2AA3.1
$\Rightarrow$ 5 alternating extremes.
Proof
Let $E = \|g - q_n^*\|_\infty$. If $E = 0$ then $q_n^* = g$ is the best approximation. Assume $E > 0$ and suppose $\exists \, q_n \in P_n$ doing better than $q_n^*$, i.e.
$$\|g - q_n\|_\infty < \|g - q_n^*\|_\infty = E.$$
Consider $q_n^* - q_n \in P_n$ at the $n+2$ points $\{x_j^*\}_{j=0}^{n+1}$:
$$q_n^*(x_j^*) - q_n(x_j^*) = \left[ q_n^*(x_j^*) - g(x_j^*) \right] + \left[ g(x_j^*) - q_n(x_j^*) \right] = \sigma (-1)^{j+1} E + \eta_j, \quad \text{with } |\eta_j| < E,$$
$$\Rightarrow \quad \operatorname{sign}\left( (q_n^* - q_n)(x_j^*) \right) = \operatorname{sign}\left( \sigma (-1)^{j+1} E \right), \quad j = 0 \to n+1,$$
so $q_n^* - q_n \in P_n$ changes sign at least $n+1$ times $\Rightarrow q_n^* - q_n \in P_n$ has $(n+1)$ distinct zeros $\overset{\text{FTA}}{\Rightarrow} q_n^* - q_n \equiv 0 \Rightarrow q_n = q_n^*$, a contradiction to $\|g - q_n\|_\infty < \|g - q_n^*\|_\infty$.
$\Rightarrow q_n^* \in P_n$ is the best approximation. $\blacksquare$
A polynomial satisfying the condition (27) in the above theorem is said to have the Equioscillation Property (or the error $g - q_n^*$ is said to have the equioscillation property; note that $q_n^*$ may degenerate and have degree $< n$, see Sheet 5, Q10).
The above theorem is one half of the Chebyshev Equioscillation Theorem:
Let $g \in C[a, b]$ and $n \geq 0$. Then $\exists$ a unique $q_n^* \in P_n$ such that
$$\|g - q_n^*\|_\infty \leq \|g - q_n\|_\infty \quad \forall q_n \in P_n,$$
and hence $q_n^*$ satisfies (27).
Proof
Omitted (straightforward apparently...)
Construction of $q_n^*$ is difficult in general; this is why we study best least squares approximation and interpolation instead. However, for $g(x) = x^{n+1}$ it is easy to construct $q_n^*$.
Theorem
Let $[a, b] = [-1, 1]$. Consider $g(x) = x^{n+1}$. Then the best approximation to $x^{n+1}$ from $P_n$ in $\|.\|_\infty$ on $[-1, 1]$ is
$$q_n^*(x) = x^{n+1} - 2^{-n} T_{n+1}(x),$$
where $T_{n+1}(x)$ is the Chebyshev polynomial of degree $n+1$.
Proof
Recall $T_n(x) = \cos(n \cos^{-1} x)$ with $n \geq 0$; remember the change of variable
$$\theta = \cos^{-1} x \; \Leftrightarrow \; x = \cos \theta, \qquad x \in [-1, 1] \; \leftrightarrow \; \theta \in [0, \pi],$$
$$T_n(x) = \cos n\theta,$$
$$T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x), \quad n \geq 1, \qquad T_0(x) = 1, \; T_1(x) = x,$$
$$\Rightarrow \quad T_{n+1}(x) = 2^n x^{n+1} + \ldots \quad \Rightarrow \quad q_n^*(x) = x^{n+1} - 2^{-n} T_{n+1}(x) \in P_n.$$
The error:
$$x^{n+1} - q_n^*(x) = 2^{-n} T_{n+1}(x) = 2^{-n} \cos((n+1)\theta).$$
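(A small check of the ingredients of this proof, my addition: building $T_{n+1}$ from the recurrence $T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$, confirming the leading coefficient $2^n$, and confirming that the error $2^{-n} T_{n+1}(x)$ stays within $\pm 2^{-n}$ on $[-1, 1]$. The function name and choice of $n$ are illustrative.)

import numpy as np

def chebyshev_coeffs(n):
    # Coefficients of T_n (lowest degree first) via the recurrence
    # T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x), with T_0 = 1, T_1 = x.
    t_prev, t = np.array([1.0]), np.array([0.0, 1.0])
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        shifted = np.concatenate(([0.0], 2.0 * t))   # coefficients of 2x * T_k
        padded = np.concatenate((t_prev, np.zeros(len(shifted) - len(t_prev))))
        t_prev, t = t, shifted - padded              # T_{k+1} = 2x T_k - T_{k-1}
    return t

n = 5
c = chebyshev_coeffs(n + 1)
assert c[-1] == 2.0 ** n                      # leading coefficient of T_{n+1} is 2^n
xs = np.linspace(-1.0, 1.0, 1001)
err = 2.0 ** (-n) * np.polyval(c[::-1], xs)   # the error x^{n+1} - q_n^*(x)
assert np.max(np.abs(err)) <= 2.0 ** (-n) + 1e-12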