Volume 1
Mathematical Analysis
JEFFREY HUMPHERYS
TYLER J. JARVIS
EMILY J. EVANS
SOCIETY FOR INDUSTRIAL
AND APPLIED MATHEMATICS
PHILADELPHIA
Copyright © 2017 by the Society for Industrial and Applied Mathematics
10 9 8 7 6 5 4 3 2 1
All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.
No warranties, express or implied, are made by the publisher, authors, and their employers that the programs contained in this volume are free of error. They should not be relied on as the sole basis to solve a problem whose incorrect solution could result in injury to person or property. If the programs are employed in such a manner, it is at the user's own risk and the publisher, authors, and their employers disclaim all liability for such misuse.
Contents
List of Notation ix
Preface xiii
I Linear Analysis I 1
6 Differentiation 241
6.1 The Directional Derivative 241
6.2 The Fréchet Derivative in $\mathbb{R}^n$ 246
6.3 The General Fréchet Derivative 252
6.4 Properties of Derivatives 256
6.5 Mean Value Theorem and Fundamental Theorem of Calculus 260
6.6 Taylor's Theorem 265
Exercises 272
8 Integration I 319
8.1 Multivariable Integration 320
8.2 Overview of Daniell–Lebesgue Integration 326
8.3 Measure Zero and Measurability 331
9 *Integration II 361
9.1 Every Normed Space Has a Unique Completion 361
9.2 More about Measure Zero 364
9.3 Lebesgue-Integrable Functions 367
9.4 Proof of Fubini's Theorem 372
9.5 Proof of the Change of Variables Theorem 374
Exercises 378
V Appendices 625
Bibliography 671
Index 679
List of Notation
$S([a,b];X)$   the set of all step functions mapping $[a,b]$ into $X$   228, 323
$\operatorname{sign}(z)$   the complex sign $z/|z|$ of $z$   656
$\operatorname{Skew}_n(\mathbb{F})$   the space of skew-symmetric $n \times n$ matrices   5, 29
$\operatorname{Sym}_n(\mathbb{F})$   the space of symmetric $n \times n$ matrices   5, 29
$\mathscr{E}_\lambda(L)$   the $\lambda$-eigenspace of $L$   141
$\sigma(L)$   the spectrum of $L$   141
$\sigma_\varepsilon(A)$   the $\varepsilon$-pseudospectrum of $A$   554
Overview
Why Mathematical Analysis?
Mathematical analysis is the foundation upon which nearly every area of applied
mathematics is built. It is the language and intellectual framework for studying
optimization, probability theory, stochastic processes, statistics, machine learning,
differential equations, and control theory. It is also essential for rigorously describing the theoretical concepts of many quantitative fields, including computer science, economics, physics, and several areas of engineering.
Beyond its importance in these disciplines, mathematical analysis is also fun-
damental in the design, analysis, and optimization of algorithms. In addition to
allowing us to make objectively true statements about the performance, complex-
ity, and accuracy of algorithms, mathematical analysis has inspired many of the
key insights needed to create, understand, and contextualize the fastest and most
important algorithms discovered to date.
In recent years, the size, speed, and scale of computing have had a profound impact on nearly every area of science and technology. As future discoveries and in-
novations become more algorithmic, and therefore more computational, there will be
tremendous opportunities for those who understand mathematical analysis. Those
who can peer beyond the jargon-filled barriers of various quantitative disciplines
and abstract out their fundamental algorithmic concepts will be able to move fluidly
across quantitative disciplines and innovate at their crossroads. In short, mathemat-
ical analysis gives solutions to quantitative problems, and the future is promising
for those who master this material.
To the Instructor
About this Text
This text modernizes and integrates a semester of advanced linear algebra with
a semester of multivariable real analysis to give a new and redesigned year-long
curriculum in linear and nonlinear analysis. The mathematical prerequisites are a first rigorous course in single-variable real analysis.¹ The computer labs that accompany this text are available at
http://www.siam.org/books/ot152
The intent of this text and the computer labs is to attract and retain students
into the mathematical sciences by modernizing the curriculum and connecting the-
ory to application in a way that makes the students want to understand the theory,
rather than just tolerate it. In short, a major goal of this text is to entice them to
hunger for more.
Detailed Description
Chapters 1–3 We give a rigorous treatment of the basics of linear algebra over both $\mathbb{R}$ and $\mathbb{C}$, including abstract vector spaces, linear transformations, matrices, the LU decomposition, inner product spaces, the QR decomposition, and least squares. As much as possible, we try to frame things in a way that does not require vector spaces to be finite dimensional, and we give many infinite-dimensional examples.
Chapter 4 We treat the spectral theory of matrices, including the spectral theorem
for normal matrices. We give special attention to the singular value decom-
position and its applications.
¹Specifically, the reader should have had exposure to a rigorous treatment of continuity, convergence, differentiation, and Riemann–Darboux integration in one dimension, as covered, for example, in [Abb15].
Chapter 5 We present the basics of metric topology, including the ideas of com-
pleteness and compactness. We define and give many examples of Banach
spaces. Throughout the rest of the text we formulate results in terms of Ba-
nach spaces, wherever possible. A highlight of this chapter is the continuous
linear extension theorem (sometimes called the bounded linear transforma-
tion theorem), which we use to give a very slick construction of Riemann (or
rather regulated) Banach-valued integration (single-variable in this chapter
and multivariable in Chapter 8).
Chapters 8-9 We use the same basic ideas to develop Lebesgue integration as we
used in the development of the regulated integral in Chapter 5. This approach
could be called the Riesz or Daniell approach. Instead of developing measure theory and creating integrals from simple functions, we define what it means for a set to have zero measure and create integrals from step functions. This
is a very clean way to do integration, which has the additional benefit of
reinforcing many of the functional-analytic ideas developed earlier in the text.
Chapters 12-13 One of the biggest innovations in the book is our treatment of
spectral theory. We take the Dunford-Schwartz approach via resolvents. This
approach is usually only developed from an advanced functional-analytic point
of view, but we break it down to the level of an undergraduate math major,
using the tools and ideas developed earlier in this text.
In this setting, we put a strong emphasis on eigenprojections, providing in-
sights into the spectral resolution theorem. This allows for easy proofs of
the spectral mapping theorem, the Perron-Frobenius theorem, the Cayley-
Hamilton theorem, and convergence of the power method. This also allows
for a nice presentation of the Drazin inverse and matrix perturbation theory.
These ideas are used again in Volume 4 with dynamical systems, where we
prove the stable and center manifold theorems using spectral projections and
corresponding semigroup estimates.
Chapter 15 We conclude the book with a chapter on applied ring theory, focused
on the algebraic structure of polynomials and matrices. A major focus of this
chapter is the Chinese remainder theorem, which we use in many ways, includ-
ing to prove results about partial fractions and Lagrange interpolation. The
highlight of the chapter is Section 15.7.3, which describes a striking connection
between Lagrange interpolation and the spectral decomposition of a matrix.
Alternatively, Chapters 1–4 (linear analysis part I), Section 7.5 (conditioning), and Chapters 12–14 (linear analysis part II), as well as parts of Chapter 15, as time
permits, make up a very good one-semester advanced linear algebra course for
students who have already completed undergraduate-level courses in linear algebra,
complex analysis, and multivariate real analysis.
Advanced Analysis
This book can also be used to teach a one-semester advanced analysis course for stu-
dents who have already had a semester of basic undergraduate analysis (say, at the
level of [Abb15]). One possible path through the book for this course would be to
briefly review Chapter 1 (vector spaces), Sections 2.1–2.2 (basics of linear transformations), and Sections 3.1 and 3.5 (inner product spaces and norms), in order to set
notation and to remind the students of necessary background from linear algebra,
and then proceed through Part II (Chapters 5-7) and Part III (Chapters 8-11).
Figure 1 indicates the dependencies among the chapters.
Advanced Sections
A few problems, sections, and even chapters are marked with the symbol * to
indicate that they cover more advanced topics. Although this material is valuable,
it is not essential for understanding the rest of the text, so it may safely be skipped,
if necessary.
To the Student
Examples
Although some of the topics in this book may seem familiar to you, especially many of the linear algebra topics, we have taken a different approach by integrating many topics together in our presentation. For example, examples treated in a discussion of vector spaces appear again in sections on nonlinear analysis and elsewhere throughout the text. Also, notation introduced in the examples is often used again later in the text.
Because of this, we recommend that you read all the examples in each section, even if the definitions, theorems, and other results look familiar.
[Figure 1: Chapter dependency chart. The chapters shown include 1: Vector Spaces; 4: Spectral Theory; 6: Differentiation; 7: Contraction Mappings; 8: Integration I; 9: *Integration II; 10: Calculus on Manifolds; 11: Complex Analysis; 12: Spectral Decomposition; and 13: Iterative Methods.]
Exercises
Each section of the book has several exercises, all collected at the end of each
chapter. Horizontal lines separate the exercises for each section from the exer-
cises for the other sections. We have carefully selected these exercises. You should work them all (but your instructor may choose to let you skip some of the advanced exercises marked with *); each is important for your ability to understand subsequent material.
Although the exercises are gathered together at the end of the chapter, we
strongly recommend that you do the exercises for each section as soon as you have
completed the section, rather than saving them until you have finished the entire
chapter. Learning mathematics is like developing physical strength. It is much
easier to improve, and improvement is greater, when exercises are done daily, in
measured amounts, rather than doing long, intense bouts of exercise separated by
long rests.
Origins
This curriculum evolved as an outgrowth of lecture notes and computer labs that
were developed for a 6-credit summer course in computational mathematics and
statistics. This was designed to introduce groups of undergraduate researchers to
a number of core concepts in mathematics, statistics, and computation as part of
a National Science Foundation (NSF) funded mentoring program called CSUMS:
Computational Science Training for Undergraduates in the Mathematical Sciences.
This NSF program sought out new undergraduate mentoring models in the
mathematical sciences, with particular attention paid to computational science
training through genuine research experiences. Our answer was the Interdisciplinary
Mentoring Program in Analysis, Computation, and Theory (IMPACT), which took
cohorts of mathematics and statistics undergraduates and inserted them into an in-
tense summer "boot camp" program designed to prepare them for interdisciplinary
research during the school year. This effort required a great deal of experimenta-
tion, and when the dust finally settled, the list of topics that we wanted to teach
blossomed into 8 semesters of material: essentially an entire curriculum.
After we explained the boot camp concept to one visitor, he quipped, "It's
the minimum number of instructions needed to create an applied mathematician."
Our goal, however, is much broader than this. We don't want to train or create
a specific type of applied mathematician; we want a curriculum that supports all
types, simultaneously. In other words, our goal is to take in students with diverse
and evolving interests and backgrounds and provide them with a common corpus of
mathematical, statistical, and computational content so that they can emerge well
prepared to work in their own chosen areas of specialization. We also want to draw
their attention to the core ideas that are ubiquitous across various applications so
that they can navigate fluidly across fields.
Acknowledgments
We thank the National Science Foundation for their support through the TUES
Phase II grant DUE-1323785. We especially thank Ron Buckmire at the National
Science Foundation for taking a chance on us and providing much-needed advice and
guidance along the way. Without the NSF, this book would not have been possible.
We also thank the Department of Mathematics at Brigham Young University for
their generous support and for providing a stimulating environment in which to
work.
Many colleagues and friends have helped shape the ideas that led to this
text, especially Randy Beard, Rick Evans, Shane Reese, Dennis Tolley, and Sean
Warnick, as well as Bryant Angelos, Jonathan Baker, Blake Barker, Mylan Cook,
Casey Dougal, Abe Frandsen, Ryan Grout, McKay Heasley, Amelia Henricksen, Ian
Henricksen, Brent Kerby, Steven Lutz, Shane McQuarrie, Ryan Murray, Spencer
Patty, Jared Webb, Matthew Webb, Jeremy West, and Alexander Zaitzeff, who
were all instrumental in helping to organize this material.
We also thank the students of the BYU Applied and Computational Mathe-
matics Emphasis (ACME) cohorts of 2013-2015, 2014- 2016, 2015-2017, and 2016-
2018, who suffered through our mistakes, corrected many errors, and never hesitated
to tell us what they thought of our work.
We are deeply grateful to Chris Grant, Todd Kapitula, Zach Boyd, Rachel
Webb, Jared McBride, and M.A. Averill, who read various drafts of this volume very
carefully, corrected many errors, and gave us a tremendous amount of helpful feed-
back. Of course, all remaining errors are entirely our fault. We also thank Amelia
Henricksen, Sierra Horst, and Michael Hansen for their help illustrating the text
and Sarah Kay Miller for her outstanding graphic design work, including her beau-
tifully designed book covers. We also appreciate the patience, support, and expert
editorial work of Elizabeth Greenspan and the other editors and staff at SIAM.
Finally, we thank the folks at Savvysherpa, Inc., for corporate sponsorship
that greatly helped make the transition from IMPACT to ACME and their help
nourishing and strengthening the ACME development team.
Part I
Linear Analysis I
Abstract Vector Spaces
Remark 1.1.1. Many of the properties of vector spaces described in this text hold over arbitrary fields;² however, we restrict our discussion here to vector spaces over the real field $\mathbb{R}$ or the complex³ field $\mathbb{C}$. We denote the field by $\mathbb{F}$ when a statement is true for both $\mathbb{R}$ and $\mathbb{C}$.
Remark 1.1.2. We use the notation $|\cdot|$ to denote the absolute value of a real number and the modulus of a complex number; that is, if $z = a + bi \in \mathbb{C}$, where $a, b \in \mathbb{R}$, then $|z| = \sqrt{a^2 + b^2}$. The reader should verify that $|z|^2 = z\bar{z} = \bar{z}z$, where $\bar{z} = a - bi$ denotes the complex conjugate of $z$.
Definition 1.1.3. A vector space over a field $\mathbb{F}$ is a set $V$ with two operations: addition, mapping the Cartesian product⁴ $V \times V$ to $V$ and denoted by $(x, y) \mapsto x + y$; and scalar multiplication, mapping $\mathbb{F} \times V$ to $V$ and denoted by $(a, x) \mapsto ax$. Elements of the vector space $V$ are called vectors, and the elements of the field $\mathbb{F}$ are called scalars. These operations must satisfy the following properties for all $x, y, z \in V$ and all $a, b \in \mathbb{F}$:
(i) Commutativity of vector addition: $x + y = y + x$.
(ii) Associativity of vector addition: $(x + y) + z = x + (y + z)$.
(iii) Existence of an additive identity: There exists an element $0 \in V$ such that $0 + x = x$.
(iv) Existence of an additive inverse: For each $x \in V$ there exists an element $-x \in V$ such that $x + (-x) = 0$.
(v) First distributive law: $a(x + y) = ax + ay$.
(vi) Second distributive law: $(a + b)x = ax + bx$.
(vii) Associativity of scalar multiplication: $a(bx) = (ab)x$.
(viii) Scalar multiplication identity: $1x = x$.
Nota Bene 1.1.4. A subtle point that is sometimes missed because it is not included in the numbered list of properties is that the definition of a vector space requires the operations of vector addition and scalar multiplication to take their values in $V$. More precisely, if we add two vectors together in $V$, the result must be a vector in $V$, and if we multiply a vector in $V$ by a scalar in $\mathbb{F}$, the result must also be a vector in $V$. When these properties hold, we say that $V$ is closed under vector addition and scalar multiplication. If $V$ is not closed under an operation, then, strictly speaking, we don't have an operation on $V$ at all. Checking closure under operations is often the hardest part of verifying that a given set is a vector space.
²For more information on general fields, see Appendix B.2.
³For more information on complex numbers, see Appendix B.1.
⁴See Definition A.1.10(viii) in Appendix A.
(i) The $n$-tuples $\mathbb{F}^n$ forming the usual Euclidean space. Vector addition is given by $(a_1, \dots, a_n) + (b_1, \dots, b_n) = (a_1 + b_1, \dots, a_n + b_n)$, and scalar multiplication is given by $c(a_1, \dots, a_n) = (ca_1, \dots, ca_n)$.
(iii) For $[a, b] \subset \mathbb{R}$, the space $C([a, b]; \mathbb{F})$ of continuous $\mathbb{F}$-valued functions. Vector addition is given by defining the function $f + g$ as $(f + g)(x) = f(x) + g(x)$, and scalar multiplication is given by defining the function $cf$ by $(cf)(x) = c \cdot f(x)$. Note that $C([a, b]; \mathbb{F})$ is closed under vector addition and scalar multiplication because sums and scalar products of continuous functions are continuous.
(iv) For $[a, b] \subset \mathbb{R}$ and $1 \le p < \infty$, the space $L^p([a, b]; \mathbb{F})$ of $p$-integrable functions $f : [a, b] \to \mathbb{F}$, that is, those satisfying $\int_a^b |f(x)|^p \, dx < \infty$. For $p = \infty$, let $L^\infty([a, b]; \mathbb{F})$ be the set of all functions $f : [a, b] \to \mathbb{F}$ such that $\sup_{x \in [a, b]} |f(x)| < \infty$.
Proposition 1.1.7. Let $V$ be a vector space. If $x, y \in V$, then the following hold:
(i) $x + y = x$ implies $y = 0$. In particular, the additive identity is unique.
(ii) $x + y = 0$ implies $y = -x$. In particular, additive inverses are unique.
(iii) $0x = 0$.
(iv) $(-1)x = -x$.
Proof.
(i) If $x + y = x$, then $0 = -x + x = -x + (x + y) = (-x + x) + y = 0 + y = y$.
(ii) If $x + y = 0$, then $-x = -x + 0 = -x + (x + y) = (-x + x) + y = 0 + y = y$.
(iii) For each $x$, we have that $x = 1x = (1 + 0)x = 1x + 0x = x + 0x$. Hence, (i) implies that $0x = 0$.
(iv) For each $x$, we have that $0 = 0x = (1 + (-1))x = 1x + (-1)x = x + (-1)x$. Hence, (ii) implies that $(-1)x = -x$. □
Remark 1.1.8. Subtraction in a vector space is really just addition of the negative. Thus, the expression $x - y$ is just shorthand for $x + (-y)$.
Proposition 1.1.9. Let $V$ be a vector space. If $x, y, z \in V$ and $a, b \in \mathbb{F}$, then the following hold:
(i) $a0 = 0$.
(ii) If $ax = ay$ and $a \neq 0$, then $x = y$.
(iii) If $x + y = x + z$, then $y = z$.
(iv) If $ax = bx$ and $x \neq 0$, then $a = b$.
Proof.
(i) For each $x \in V$ and $a \in \mathbb{F}$, we have that $ax = a(x + 0) = ax + a0$. Since the additive identity is unique by Proposition 1.1.7, it follows that $a0 = 0$.
(ii) Since $a \neq 0$, we have that $a^{-1} \in \mathbb{F}$. Thus, $x = 1x = (a^{-1}a)x = a^{-1}(ax) = a^{-1}(ay) = (a^{-1}a)y = 1y = y$.
Nota Bene 1.1.10. It is important to understand that Proposition 1.1.9(iv) is not saying that we can divide one vector by another vector. That is absolutely not the case. Indeed, this cancellation only works in the special case that both sides of the equation are scalar multiples of the same vector $x$.
1.1.2 Subspaces
We conclude this section by defining subspaces of a vector space and then providing
several examples.
Proof. The hypothesis of this theorem shows that $W$ is closed under vector addition and scalar multiplication, so vector addition does indeed map $W \times W$ to $W$ and scalar multiplication does indeed map $\mathbb{F} \times W$ to $W$, as required. Properties (i)–(ii) and (v)–(viii) of Definition 1.1.3 follow because they hold for the space $V$. The proofs of the remaining two properties (iii) and (iv) are left as an exercise; see Exercise 1.2. □
(i) If $W$ is the union of the $x$- and $y$-axes in the plane $\mathbb{R}^2$, then it is not a subspace of $\mathbb{R}^2$. Indeed, the sum of the vectors $(1, 0)$ (in the $x$-axis) and $(0, 1)$ (in the $y$-axis) is $(1, 1)$, which is not in $W$, so $W$ is not closed with respect to vector addition.
(ii) Any line or plane in $\mathbb{R}^3$ that does not contain the origin $0$ is not a subspace of $\mathbb{R}^3$ because all subspaces are required to contain $0$.
(i) Any line that passes through the origin in $\mathbb{R}^3$ is a subspace of $\mathbb{R}^3$ and can be written as $\{tx \mid t \in \mathbb{R}\}$ for some $x \in \mathbb{R}^3$. Thus, any scalar multiple $a(tx)$ is on the line, as is any sum $tx + sx = (t + s)x$. More generally, for any vector space $V$ over a field $\mathbb{F}$ and any $x \in V$, the set $W = \{tx \mid t \in \mathbb{F}\}$ is a subspace of $V$.
(iii) For $[a, b] \subset \mathbb{R}$, the space $C_0([a, b]; \mathbb{F})$ of all functions $f \in C([a, b]; \mathbb{F})$ such that $f(a) = f(b) = 0$ is a subspace of $C([a, b]; \mathbb{F})$. To see this, we must check that if $f$ and $g$ both vanish at the endpoints, then so does $cf + dg$ for any $c, d \in \mathbb{F}$.
(v) For any vector space $V$, both $\{0\}$ and $V$ are subspaces of $V$. The subspace $\{0\}$ is called the trivial subspace of $V$. Any subspace of $V$ that is not equal to $V$ itself is called a proper subspace.
(vi) For $[a, b] \subset \mathbb{R}$, the set $C([a, b]; \mathbb{F})$ is a subspace of $L^p([a, b]; \mathbb{F})$ when $1 \le p < \infty$.
(vii) The space $C^n((a, b); \mathbb{F})$ of $\mathbb{F}$-valued functions whose $n$th derivative is continuous on $(a, b)$ is a subspace of $C((a, b); \mathbb{F})$.
Sound waves also behave like vectors in a vector space. This allows the construction of technologies like noise-canceling headphones. The idea is very simple. When an undesired noise $n$ is approaching the ear, produce the signal that is the additive inverse $-n$, and play it at the same time. The signal heard is the sum $n + (-n) = 0$, which is silent.
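As a concrete illustration (our addition, not the text's), a sampled noise signal and its additive inverse sum to the zero vector:

```python
import numpy as np

# Sample an undesired noise signal n on [0, 1] (two sine components).
t = np.linspace(0.0, 1.0, 1000)
n = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.2 * np.sin(2 * np.pi * 880 * t)

anti_noise = -n                  # the additive inverse played by the headphones

# The superposition n + (-n) is the zero vector, i.e., silence.
print(np.allclose(n + anti_noise, 0.0))  # True
```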
Definition 1.2.1. The span of $S$, denoted $\operatorname{span}(S)$, is the set of linear combinations of elements of $S$, that is, the set of all finite sums of the form
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n, \quad \text{where each } x_i \in S \text{ and each } a_i \in \mathbb{F}.$$
If $S$ is empty, then we define $\operatorname{span}(S)$ to be the set $\{0\}$. If $\operatorname{span}(S) = W$ for some subspace $W$ of $V$, then we say that $S$ spans $W$.
Proposition 1.2.7. If $v \in \operatorname{span}(S)$, then $\operatorname{span}(S) = \operatorname{span}(S \cup \{v\})$.
Proof. By the previous proposition, we have that $\operatorname{span}(S) \subset \operatorname{span}(S \cup \{v\})$. Thus, assuming the hypothesis, it suffices to show that $\operatorname{span}(S \cup \{v\}) \subset \operatorname{span}(S)$.
Given $x \in \operatorname{span}(S \cup \{v\})$ we have $x = \sum_{i=1}^n a_i s_i + cv$ for some $\{s_i\}_{i=1}^n \subset S$. Since $v \in \operatorname{span}(S)$, we also have that $v = \sum_{j=1}^m b_j t_j$ for some $\{t_j\}_{j=1}^m \subset S$. Thus, we have that $x = \sum_{i=1}^n a_i s_i + c \sum_{j=1}^m b_j t_j$, which is a linear combination of elements of $S$. Therefore, $x \in \operatorname{span}(S)$. □
Definition 1.2.8. The set $S$ is linearly dependent if there exists a nontrivial linear combination of elements of $S$ that equals zero; that is, for some nonempty subset $\{x_1, \dots, x_m\} \subset S$ we have
$$a_1 x_1 + a_2 x_2 + \cdots + a_m x_m = 0,$$
where the elements $x_i$ are distinct, and not all of the coefficients $a_i$ are zero. If no such linear combination exists, then the set $S$ is linearly independent.
Remark 1.2.9. The empty set is vacuously linearly independent. Because there are no vectors in $\emptyset$, there is no nontrivial linear combination of vectors.
Proposition 1.2.11. If $S$ is linearly independent, then any vector $v \in \operatorname{span}(S)$ can be written uniquely as a (finite) linear combination of elements of $S$. More precisely, for any distinct elements $x_1, \dots, x_m$ of $S$, if $v = \sum_{i=1}^m a_i x_i$ and $v = \sum_{i=1}^m b_i x_i$, then $a_i = b_i$ for every $i = 1, \dots, m$.
Proof. Suppose that $v = \sum_{i=1}^m a_i x_i$ and $v = \sum_{i=1}^m b_i x_i$. Subtracting gives $0 = \sum_{i=1}^m (a_i - b_i) x_i$. Since $S$ is linearly independent, each term is equal to zero, which implies $a_i = b_i$ for each $i = 1, \dots, m$. □
The next lemma is an important tool for two theorems in a later section (the replacement theorem (Theorem 1.4.1) and the extension theorem (Corollary 1.4.5)). It states that linear independence is inherited by subsets. This is, in a sense, dual to Proposition 1.2.5, which states that supersets of spanning sets also span.
Lemma 1.2.12. If $S$ is linearly independent and $S' \subset S$, then $S'$ is also linearly independent.
Lemma 1.2.12 and Proposition 1.2.5 suggest that linear independence and spanning are complementary in some sense. There are some special sets that have both
properties. These important sets are called bases of the vector space, and they act
as a coordinate system for the vector space.
Example 1.2.14.
(iii) The vectors $(1, 1, 1)$, $(2, 1, 1)$, and $(1, 0, 1)$ also form a basis for $\mathbb{F}^3$. To show that the vectors form a basis, set
$$a(1, 1, 1) + b(2, 1, 1) + c(1, 0, 1) = (x, y, z).$$
Solving gives
$$a = y - x + z, \quad b = x - z, \quad c = z - y.$$
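As a numerical cross-check (our addition), the matrix whose columns are the three vectors is invertible, and solving the linear system reproduces the coordinate formulas above; the variable names below are ours:

```python
import numpy as np

# Columns are the candidate basis vectors (1,1,1), (2,1,1), (1,0,1).
A = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])

print(np.linalg.det(A))          # -1.0: nonzero, so the vectors form a basis

# Coordinates (a, b, c) of an arbitrary vector (x, y, z) in this basis:
x, y, z = 2.0, 3.0, 5.0
coords = np.linalg.solve(A, np.array([x, y, z]))
print(np.allclose(coords, [y - x + z, x - z, z - y]))  # True
```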
This corollary means that if we have a basis $B = \{x_1, \dots, x_n\}$ of $V$, we can write any element $v \in V$ uniquely as a linear combination $v = \sum_{i=1}^n a_i x_i$, and thus $v$ can be identified with the $n$-tuple $(a_1, \dots, a_n) \in \mathbb{F}^n$. That is to say, the basis $B$ has given us a coordinate system for $V$. A different basis would, of course, give us a different coordinate system. In the next chapter, we show how to transform vectors from one coordinate system to another.
1.3.1 Products
Proposition 1.3.1. Let $V_1, V_2, \dots, V_n$ be a collection of vector spaces over the field $\mathbb{F}$. The Cartesian product
$$V = \prod_{i=1}^n V_i = V_1 \times V_2 \times \cdots \times V_n = \{(v_1, v_2, \dots, v_n) \mid v_i \in V_i\}$$
forms a vector space over $\mathbb{F}$ with additive identity $(0, 0, \dots, 0)$, and vector addition and scalar multiplication defined componentwise as
$$(x_1, x_2, \dots, x_n) + (y_1, y_2, \dots, y_n) = (x_1 + y_1, x_2 + y_2, \dots, x_n + y_n),$$
$$a(x_1, x_2, \dots, x_n) = (ax_1, ax_2, \dots, ax_n)$$
for all $(x_1, x_2, \dots, x_n), (y_1, y_2, \dots, y_n) \in V$ and $a \in \mathbb{F}$.
Example 1.3.2. The product space $\prod_{i=1}^n \mathbb{F}$ is exactly the vector space $\mathbb{F}^n$.
Example 1.3.3. The space $\mathbb{R}^2 \times \mathbb{R}$ can be written as the set of pairs $(x, y)$, where $x = (x_1, x_2) \in \mathbb{R}^2$ and $y = y_1 \in \mathbb{R}$. The points of $\mathbb{R}^2 \times \mathbb{R}$ are in a natural bijective correspondence with the points of $\mathbb{R}^3$ by sending $((x_1, x_2), y_1)$ to $(x_1, x_2, y_1)$ as in Figure 1.1.
Definition 1.3.6. Let $W_1, W_2$ be subspaces of the vector space $V$. If $W_1 \cap W_2 = \{0\}$, then the sum $W_1 + W_2$ is called a direct sum and is denoted $W_1 \oplus W_2$. More generally, if $(W_i)_{i=1}^n$ is a collection of subspaces of $V$, then the sum $W = \sum_{i=1}^n W_i$ is a direct sum if
$$W_i \cap \Big( \sum_{j \neq i} W_j \Big) = \{0\} \quad \text{for all } i = 1, 2, \dots, n. \tag{1.2}$$
Proof.
(i)⇒(ii): If $w = x_1 + x_2 + \cdots + x_n$ and $w = y_1 + y_2 + \cdots + y_n$, each $x_i, y_i \in W_i$, then for each $i$, we have $x_i - y_i = \sum_{j \neq i} (y_j - x_j) \in W_i \cap \sum_{j \neq i} W_j = \{0\}$. Hence, uniqueness holds.
(ii)⇒(iii): Because each $S_i$ is a basis of $W_i$, it is linearly independent; hence, $0 \notin S_i$. Since this holds for every $i$, we also have $0 \notin S$. If $S$ is linearly dependent, then there exists some nontrivial linear combination of elements of $S$ that equals zero. This contradicts the uniqueness of the representation in (ii), since the unique representation of zero is $0 = 0 + 0 + \cdots + 0$.
Moreover, the set $S$ spans $W$, since every element of $W$ can be expressed as a linear combination of elements from $\{W_i\}_{i=1}^n$, which in turn can be expressed as a linear combination of elements of the $S_i$.
Finally, if $s \in S_i \cap S_j$ for some $i \neq j$, then the uniqueness of the representation in (ii) would be violated, so the sets $S_i$ are pairwise disjoint.
Corollary 1.4.2. If V has a basis of n elements, then all other bases of V have n
elements.
Proof. These statements follow immediately from the replacement theorem. See Exercise 1.20. □
Proof. By the replacement theorem, there exists $S' \subset S$ such that $T \cup S'$ spans $V$. It suffices to show that $T \cup S'$ is linearly independent. Assume without loss of generality that $S' = \{s_1, \dots, s_{m-n}\}$ and suppose, by way of contradiction, there exists a nontrivial linear combination of elements of $T \cup S'$ satisfying
$$a_1 t_1 + \cdots + a_n t_n + b_1 s_1 + \cdots + b_{m-n} s_{m-n} = 0.$$
Since $T$ is linearly independent, this implies that at least one of the $b_i$ is nonzero. We assume without loss of generality that $b_1 \neq 0$. Thus, we have
$$s_1 = -b_1^{-1}\left( a_1 t_1 + \cdots + a_n t_n + b_2 s_2 + \cdots + b_{m-n} s_{m-n} \right).$$
Hence, $T \cup S''$ spans $V$, where $S'' = \{s_2, \dots, s_{m-n}\}$. This is a contradiction since $T \cup S''$ has only $m - 1$ elements, and Corollary 1.4.4 requires any set that spans the $m$-dimensional space $V$ to have at least $m$ elements. Thus, $T \cup S'$ is linearly independent. □
Consider the polynomials
$$f_1 = x^2 - 1, \quad f_2 = x^3 - x, \quad \text{and} \quad f_3 = x^3 - 2x^2 - x + 1$$
in the space $\mathbb{F}[x; 3]$. The set of monomials $\{1, x, x^2, x^3\}$ forms a basis for $\mathbb{F}[x; 3]$. Let $T = \{f_1, f_2, f_3\}$ and $S = \{1, x, x^2, x^3\}$. By the extension theorem there exists a subset $S'$ of $S$ such that $S' \cup T$ forms a basis for $\mathbb{F}[x; 3]$. One option is $S' = \{x\}$. A straightforward calculation shows
$$1 = f_2 - 2f_1 - f_3, \quad x^2 = f_2 - f_1 - f_3, \quad \text{and} \quad x^3 = f_2 + x,$$
so $\{f_1, f_2, f_3, x\}$ forms a basis for $\mathbb{F}[x; 3]$.
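A numerical sanity check of this example (ours, not the book's): expanding each of $f_1, f_2, f_3, x$ in the monomial basis $\{1, x, x^2, x^3\}$ gives a $4 \times 4$ coefficient matrix whose full rank confirms that $\{f_1, f_2, f_3, x\}$ is a basis, and solving linear systems recovers the expansions of $1$ and $x^2$:

```python
import numpy as np

# Coefficient vectors in the monomial basis [1, x, x^2, x^3]:
f1 = [-1, 0, 1, 0]    # x^2 - 1
f2 = [0, -1, 0, 1]    # x^3 - x
f3 = [1, -1, -2, 1]   # x^3 - 2x^2 - x + 1
g  = [0, 1, 0, 0]     # x

A = np.array([f1, f2, f3, g], dtype=float).T   # columns are the polynomials
print(np.linalg.matrix_rank(A))                # 4, so {f1, f2, f3, x} is a basis

# Expansion coefficients of 1 and x^2 in the basis (f1, f2, f3, x):
print(np.linalg.solve(A, np.array([1.0, 0, 0, 0])))  # [-2.  1. -1.  0.]
print(np.linalg.solve(A, np.array([0, 0, 1.0, 0])))  # [-1.  1. -1.  0.]
```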
Proof. This follows trivially from the extension theorem (Corollary 1.4.5). It is also proved in Exercise 1.21. □
Vista 1.4.8. Both the replacement and extension theorems require that the underlying vector space be finite dimensional. This raises the question: Are infinite-dimensional vector spaces important? Or can we get away with narrowing our focus to the finite-dimensional case? Although you may have come pretty far in life just using finite-dimensional vector spaces, many important areas of mathematics rely heavily on infinite-dimensional vector spaces, such as $C([a, b]; \mathbb{F})$ and $L^\infty([a, b]; \mathbb{F})$. One example where such a vector space occurs is the study of differential equations, which includes most, if not all, of the laws of physics. Infinite-dimensional vector spaces are also widely used in other fields such as finance, economics, geology, climate science, biology, and nearly every area of engineering.
We conclude this section with a simple but useful result on dimension of direct
sums and Cartesian products.
Proposition 1.4.9. The Cartesian product
$$V = \prod_{i=1}^n V_i = V_1 \times V_2 \times \cdots \times V_n = \{(v_1, v_2, \dots, v_n) \mid v_i \in V_i\}$$
of finite-dimensional vector spaces $V_1, \dots, V_n$ over $\mathbb{F}$ satisfies
$$\dim(V_1 \times V_2 \times \cdots \times V_n) = \sum_{i=1}^n \dim(V_i).$$
Theorem 1.4.10 (Zorn's Lemma). Let $(X, \le)$ be a nonempty partially ordered set.⁶ If every chain in $X$ has an upper bound in $X$, then $X$ contains a maximal element.
By chain we mean any subset $C \subset X$ such that $C$ is totally ordered; that is, for every $\alpha, \beta \in C$ we have either $\alpha \le \beta$ or $\beta \le \alpha$. A chain $C$ is said to have an upper bound in $X$ if there is an element $\gamma \in X$ such that $\alpha \le \gamma$ for every $\alpha \in C$.
Although this theorem is about bases of vector spaces, the idea of its proof is
useful in many settings where one needs to prove that a maximal set having certain
properties exists.
Proof. Let $V$ be any vector space. If $V = \{0\}$, then the empty set is a basis. Hence, we may assume that $V$ is nontrivial.
Let $\mathscr{Y} = \{S_\alpha\}_{\alpha \in I}$ be the set of all linearly independent subsets $S_\alpha$ of $V$. Since $V$ is nontrivial, $\mathscr{Y}$ is not empty. The set $\mathscr{Y}$ is partially ordered by set inclusion $\subset$. To use Zorn's lemma, we must show that every chain in $\mathscr{Y}$ has an upper bound.
⁵Proofs that rely on the axiom of choice are usually only proofs of existence. Specifically, the theorem here about existence of a basis doesn't say anything about how to construct a basis.
⁶See Definition A.3.15.
1.5.1 Cosets
Proof. It suffices to show that the relation $\sim$ is (i) reflexive, (ii) symmetric, and (iii) transitive.
Recall that an equivalence relation⁷ defines a partition. More precisely, for each $x \in V$, we define the equivalence class of $x$ to be the set $[[x]] = \{y \mid y \sim x\}$. Thus, every element $x \in V$ is in exactly one equivalence class.
The equivalence classes defined by $\sim$ have a particularly useful structure. Note that
$$\{y \mid y \sim x\} = \{y \mid y - x \in W\} = \{y \mid y = x + w \text{ for some } w \in W\} = \{x + w \mid w \in W\}.$$
⁷The definition of an equivalence relation is given in Appendix A.1.
[Figure 1.2: A coset $a + W$ is a translate of the subspace $W$.]
Hence, the equivalence classes of $\sim$ are just translates of the subspace $W$, and we often denote them by $x + W = \{x + w \mid w \in W\} = [[x]]$; see Figure 1.2. The equivalence classes of $W$ are called the cosets of $W$. Cosets are either identical or disjoint, and we have $x + W = x' + W$ if and only if $x - x' \in W$. As we show in Theorem 1.5.7 below, the set of all cosets of $W$ is a vector space.
Definition 1.5.3. The set $\{x + W \mid x \in V\}$ (or equivalently $\{[[x]] \mid x \in V\}$) of all cosets of $W$ in $V$ is denoted $V/W$ and is called the quotient of $V$ modulo $W$.
Example 1.5.4.
(i) Let $V = \mathbb{R}^3$ and let $W = \operatorname{span}((0, 1, 0))$ be the $y$-axis. We show that there is a natural bijective correspondence between the elements (cosets) of the quotient $V/W$ and the elements of $\mathbb{R}^2$.
Note that any $(a, b, c) \in V$ can be written as $(a, b, c) = (a, 0, c) + (0, b, 0)$, and $(0, b, 0) \in W$. Therefore, the coset $(a, b, c) + W$ in the quotient $V/W$ is equal to the coset $(a, 0, c) + W$, and we have a surjection $\varphi : \mathbb{R}^2 \to V/W$, defined by sending $(a, c) \in \mathbb{R}^2$ to $(a, 0, c) + W \in V/W$.
If $\varphi(a, c) = \varphi(a', c')$, then $(a, 0, c) \sim (a', 0, c')$, implying that $(a, 0, c) - (a', 0, c') = (a - a', 0, c - c') \in W = \{(0, b, 0) \mid b \in \mathbb{R}\}$. It follows that $a - a' = 0 = c - c'$, and so $a = a'$ and $c = c'$. Thus, the map $\varphi$ is injective. This gives a bijection from $\mathbb{R}^2$ to $V/W$. Below we show that $V/W$ has a natural vector-space structure, and the bijection $\varphi$ preserves all the properties of a vector space.
(ii) Let $V = C([0, 1]; \mathbb{R})$ be the vector space of real-valued functions defined and continuous on the interval $[0, 1]$, and let $W = \{f \in V \mid f(0) = 0\}$ be the subspace of all functions that vanish at $0$. We show that there is a natural bijective correspondence between the elements (cosets) of $V/W$ and the real numbers.
Given any function $f \in V$, let $\tilde{f}(x) = f(x) - f(0)$, and let $f_0$ be the constant function $f_0(x) = f(0)$. We can check that $\tilde{f} \in W$ and that $f_0 \in V$. Thus, $f = f_0 + \tilde{f}$. This shows that $f(x) \sim f_0(x)$ modulo $W$, and the coset $f + W$ in $V/W$ can be written as $f_0 + W$.
Now we can proceed as in the previous example. Given any $a \in \mathbb{R}$ there is a corresponding constant function in $V$, which we also denote by $f_a(x) = a$. Let $\psi : \mathbb{R} \to V/W$ be defined as $\psi(a) = f_a + W$. This map is surjective, since any $f + W \in V/W$ can be written as $f_0 + W = \psi(f(0))$. Also, given any $a, a' \in \mathbb{R}$, if $\psi(a) = \psi(a')$, then $f_a - f_{a'} \in W$. But $f_a - f_{a'}$ is constant, and since it vanishes at $0$, it must vanish everywhere; that is, $a = a'$. It follows that $\psi$ is injective and thus also a bijection.
(iii) Let $V = \mathbb{F}[x]$ and let $W = \operatorname{span}(\{x^3, x^4, \dots\})$ be the subspace of $\mathbb{F}[x]$ consisting of all polynomials with no nonzero terms of degree less than 3. Any $f = a_0 + a_1 x + \cdots + a_n x^n \in \mathbb{F}[x]$ is equivalent mod $W$ to $a_0 + a_1 x + a_2 x^2$, and so the coset $f + W$ can be written as $a_0 + a_1 x + a_2 x^2 + W$. An argument similar to the previous example shows that the quotient $V/W$ is in bijective correspondence with the set $\{(a_0, a_1, a_2) \mid a_i \in \mathbb{F}\} = \mathbb{F}^3$ via the map $(a_0, a_1, a_2) \mapsto a_0 + a_1 x + a_2 x^2 + W$.
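A small sketch (our addition, not the book's) of this quotient in code: a coset of $W = \operatorname{span}(\{x^3, x^4, \dots\})$ is determined by the coefficients of $1, x, x^2$, so truncation gives a canonical representative.

```python
def coset_rep(coeffs):
    """Canonical representative of p + W in F[x]/W, where
    W = span({x^3, x^4, ...}) and coeffs = [a0, a1, a2, ...]:
    keep only the coefficients of 1, x, x^2 (padded with zeros)."""
    kept = list(coeffs[:3])
    return tuple(kept + [0] * (3 - len(kept)))

# f = 1 + 2x + 3x^2 + 7x^3 + 5x^5 and g = 1 + 2x + 3x^2 differ by an
# element of W, so they represent the same coset.
f = [1, 2, 3, 7, 0, 5]
g = [1, 2, 3]
print(coset_rep(f) == coset_rep(g))  # True: f + W == g + W
```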
Lemma 1.5.6. The operations $\boxplus : V/W \times V/W \to V/W$ and $\boxdot : \mathbb{F} \times V/W \to V/W$ are well defined for all $x, y \in V$ and $a \in \mathbb{F}$.
Figure 1.3 shows the additive part of this lemma in $\mathbb{R}^2$: different representatives of each coset can sum to different vectors, but those different sums still lie in the same coset.
Proof.
(i) We must show that the definition of $\boxplus$ does not depend on the choice of representative of the coset; that is, if $x + W = x' + W$ and $y + W = y' + W$, then we must show $(x + W) \boxplus (y + W) = (x' + W) \boxplus (y' + W)$.
Figure 1.3. For any two vectors $x_1$ and $x_2$ in the coset $x + W$ and any other two vectors $y_1$ and $y_2$ in the coset $y + W$, the sums $x_1 + y_1$ and $x_2 + y_2$ both lie in the coset $(x + y) + W$.
Theorem 1.5.7. The quotient space $V/W$ is a vector space when endowed with the operations $\boxplus$ and $\boxdot$ of vector addition and scalar multiplication, respectively.
Example 1.5.8.
(i) Consider again the example of $V = \mathbb{R}^3$ and $W = \operatorname{span}((0, 1, 0))$, as in Example 1.5.4(i). We saw already that the map $\varphi$ is a bijection from $\mathbb{R}^2$ to $V/W$, but now $V/W$ also has a vector-space structure. We now show that $\varphi$ has the special property that it preserves both vector addition and scalar multiplication. (Note: Not every bijection from $\mathbb{R}^2$ to $V/W$ has this property.)
(iii) In the case of $V = \mathbb{F}[x]$ and $W = \operatorname{span}(\{x^3, x^4, \dots\})$, any two cosets can be written in the form $(a_0 + a_1 x + a_2 x^2) + W$ and $(b_0 + b_1 x + b_2 x^2) + W$, and the operation $\boxplus$ is given by $((a_0 + a_1 x + a_2 x^2) + W) \boxplus ((b_0 + b_1 x + b_2 x^2) + W) = ((a_0 + b_0) + (a_1 + b_1) x + (a_2 + b_2) x^2) + W$. Similarly, the operation $\boxdot$ is given by $d \boxdot ((a_0 + a_1 x + a_2 x^2) + W) = (d a_0 + d a_1 x + d a_2 x^2) + W$.
For example, a function $h$ that takes the value 1 at every point of $[0, 1]$ except for finitely many points, where it takes the value 0, has the same integral as the constant function $f(x) = 1$, which also has integral 1 on the interval $[0, 1]$. When studying integration, it is helpful to think of these two functions as being the same.
A set Z has measure zero if it is so small that any two functions that
differ only on Z must have the same integral. The set of all integrable functions
that are supported only on a set of measure zero is a subspace of the space of
all integrable functions.
The precise way to make sense of the idea that functions like f and h
should be "the same" is to work with the quotient of the space of all integrable
functions by the subspace of functions supported on a set of measure zero. We
treat these ideas much more carefully in Chapters 8 and 9.
Exercises
Note to the student: Each section of this chapter has several corresponding exercises, all collected here at the end of the chapter. The exercises between the first and second line are for Section 1, the exercises between the second and third lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with & are especially important and are likely to be used later in this book and beyond. Those marked with † are harder than average, but should still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
1.1. Show that the set $V = (0, \infty)$ is a vector space over $\mathbb{R}$ with vector addition ($\oplus$) and scalar multiplication ($\odot$) defined as
$$x \oplus y = xy, \qquad a \odot x = x^a.$$
1.15. Prove that there is a basis for $\mathbb{F}[x; n]$ consisting only of polynomials of degree $n \in \mathbb{N}$.
1.16. For every positive integer n, write IF[x] as the direct sum of n subspaces.
1.17. Prove Proposition 1.3.1.
1.18. Let
Show that
1.19. Show that any function $f \in C([-1, 1]; \mathbb{R})$ can be uniquely expressed as an even continuous function on $[-1, 1]$ plus an odd continuous function on $[-1, 1]$. Then show that the spaces of even and odd continuous functions of $C([-1, 1]; \mathbb{R})$, respectively, form complementary subspaces of $C([-1, 1]; \mathbb{R})$.
1.26. Using the operations defined in Definition 1.5.5, prove the remaining details
in Theorem 1.5.7.
1.27. Prove that the quotient $V/W$ satisfies
1.30. Let $V = \mathbb{F}[x]$ and $W = \operatorname{span}(\{x, x^3, x^5, \dots\})$ be the span of the set of all odd-degree monomials. Prove that there is a bijective map $\psi : V/W \to \mathbb{F}[y]$ satisfying $\psi((p + W) \boxplus (q + W)) = \psi(p + W) + \psi(q + W)$ and $\psi(c \boxdot (p + W)) = c\,\psi(p + W)$.
Notes
For a friendly description of many modern applications of linear algebra, we recommend Tim Chartier's little book [Cha15].
The reader who wants to review elementary linear algebra may find some of the following books useful [Lay02, Leo80, OS06, Str80]. G. Strang also has some very clear video explanations of many important linear algebra concepts available through the MIT Open Courseware page [Str10].
Linear Transformations and Matrices
The Matrix is everywhere. It is all around us. Even now, in this very room. You can
see it when you look out your window, or when you turn on your television. You
can feel it when you go to work, when you go to church, when you pay your taxes.
It is the world that has been pulled over your eyes to blind you from the truth.
-Morpheus
A linear transformation (or linear map) is a function between two vector spaces that preserves all the linear structures; that is, lines map to lines, the origin maps to the origin, and subspaces map to subspaces.
The study of linear transformations of vector spaces can be broken down
into roughly three areas, namely, algebraic properties, geometric properties, and
operator-theoretic properties. In this chapter, we explore the algebraic properties
of linear transformations. We discuss the geometric properties in Chapter 3 and
the operator-theoretic properties in Chapter 4.
We begin by defining linear transformations and describing their attributes.
For example, when a linear transformation has an inverse, we say that it is an
isomorphism, and that the domain and codomain are isomorphic. This is a math-
ematical way of saying that, as vector spaces, the domain and codomain are the
same. One of the big results in this chapter is that all $n$-dimensional vector spaces over the field $\mathbb{F}$ are isomorphic to $\mathbb{F}^n$.
We also consider basic features of linear transformations such as the kernel
and range. Of particular interest is the first isomorphism theorem, which states that
the quotient space of the domain of a linear transformation, modulo its kernel, is
isomorphic to its range. This is used to prove several results about the dimensions
of various spaces. In particular, an important consequence of the first isomorphism
theorem is the rank-nullity theorem, which tells us that for a given linear map, the
dimension of the domain is equal to the dimension of its range (called rank) plus
the dimension of the kernel (called nullity). The rank-nullity theorem is a major
result in linear algebra and is frequently used in applications.
Another useful consequence of the first isomorphism theorem is the second
isomorphism theorem, which provides an identity involving sums and intersections of subspaces.
Definition 2.1.1. Let $V$ and $W$ be vector spaces over a common field $\mathbb{F}$. A map $L : V \to W$ is a linear transformation from $V$ into $W$ if
$$L(a x_1 + b x_2) = a L(x_1) + b L(x_2) \quad \text{for all } a, b \in \mathbb{F} \text{ and } x_1, x_2 \in V. \tag{2.1}$$
Example 2.1.3.
(i) For any positive integer $n$, the projection map $\pi_i : \mathbb{F}^n \to \mathbb{F}$ given by $(a_1, \dots, a_n) \mapsto a_i$ is linear for each $i = 1, \dots, n$.
(ii) For any positive integer $n$, the $n$-tuple $(a_1, \dots, a_n)$ of scalars defines a linear map $(x_1, \dots, x_n) \mapsto a_1 x_1 + \cdots + a_n x_n$ from $\mathbb{F}^n$ to $\mathbb{F}$.
(iii) More generally, any $m \times n$ matrix $A$ with entries in $\mathbb{F}$ defines a linear transformation $\mathbb{F}^n \to \mathbb{F}^m$ by the rule $x \mapsto Ax$.
(iv) For any positive integer $n$, the map $C^n((a, b); \mathbb{F}) \to C^{n-1}((a, b); \mathbb{F})$ defined by $f \mapsto \frac{d}{dx} f$ is linear.
(v) The map $C([a, b]; \mathbb{F}) \to \mathbb{F}$ defined by $f \mapsto \int_a^b f(x) \, dx$ is linear.
(vi) For any interval $[a, b] \subset \mathbb{R}$, the map $C([a, b]; \mathbb{F}) \to C([a, b]; \mathbb{F})$ given by $f \mapsto \int_a^x f(t) \, dt$ is linear.
(vii) For any interval $[a, b] \subset \mathbb{R}$ and any $p \in [a, b]$, the evaluation map $e_p : C([a, b]; \mathbb{F}) \to \mathbb{F}$ defined by $f \mapsto f(p)$ is linear.
(viii) For any interval $[a, b] \subset \mathbb{R}$, a polynomial $p \in \mathbb{F}[x]$ is continuous on $[a, b]$, and thus the identity map $\mathbb{F}[x] \to C([a, b]; \mathbb{F})$ is linear.
(ix) For any positive integer $n$, the projection map $\ell^p \to \mathbb{F}^n$ given by $(a_1, a_2, a_3, \dots) \mapsto (a_1, \dots, a_n)$ is a linear transformation.
is a linear operator.
(xi) For any vector space $V$ defined over the field $\mathbb{F}$ and any scalar $a \in \mathbb{F}$, the scaling operator $h_a : V \to V$ given by $h_a(x) = ax$ is linear.
(xii) For any $\theta \in [0, 2\pi)$, the map $\rho_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ given by rotating around the origin counterclockwise by angle $\theta$ is a linear operator. This can be written as
$$\rho_\theta(x, y) = (x \cos \theta - y \sin \theta, \; x \sin \theta + y \cos \theta).$$
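A quick numerical check (our addition, not the text's) that $\rho_\theta$ is linear: applying the rotation to a linear combination agrees with taking the same linear combination of rotated vectors.

```python
import numpy as np

def rho(theta, v):
    """Rotate v in R^2 counterclockwise about the origin by the angle theta."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ v

theta = 0.7
x = np.array([1.0, 2.0])
y = np.array([-3.0, 0.5])
a, b = 2.0, -1.5

# Linearity: rho(a x + b y) == a rho(x) + b rho(y).
print(np.allclose(rho(theta, a * x + b * y),
                  a * rho(theta, x) + b * rho(theta, y)))  # True
```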
Unexample 2.1.4.
(ii) maps the origin to the origin; that is, $L(0) = 0$; and
(iii) maps subspaces to subspaces; that is, if $X$ is a subspace of $V$, then the set $L(X)$ is a subspace of $W$.
Proof.
(ii) Since $L(x) = L(x + 0) = L(x) + L(0)$, it follows by the uniqueness of the additive identity (Proposition 1.1.7) that $L(0) = 0$.
(iii) Assume that $y_1, y_2 \in L(X)$. Hence, $y_1 = L(x_1)$ and $y_2 = L(x_2)$ for some $x_1, x_2 \in X$. Since $a x_1 + b x_2 \in X$, we have that $a y_1 + b y_2 = a L(x_1) + b L(x_2) = L(a x_1 + b x_2) \in L(X)$. □
Remark 2.1.6. In Proposition 2.1.5(i), the line $tx + (1 - t)y$ maps to the origin if $L(x) = L(y) = 0$. If we consider a point as a degenerate line, we can still say that linear transformations map lines to lines.
Definition 2.1.7. Let $V$ and $W$ be vector spaces. The kernel (or null space) of a linear transformation $L : V \to W$ is the set $\mathscr{N}(L) = \{x \in V \mid L(x) = 0\}$. The range (or image) of $L$ is the set $\mathscr{R}(L) = \{L(x) \in W \mid x \in V\}$.
Proposition 2.1.8. Let $V$ and $W$ be vector spaces, and let $L : V \to W$ be a linear map. We have the following:
(i) The kernel $\mathscr{N}(L)$ is a subspace of $V$.
(ii) The range $\mathscr{R}(L)$ is a subspace of $W$.
Proof. Note that both $\mathscr{N}(L)$ and $\mathscr{R}(L)$ are nonempty since $L(0) = 0$.
(i) If $x_1, x_2 \in \mathscr{N}(L)$, then $L(x_1) = L(x_2) = 0$, which implies that $L(a x_1 + b x_2) = a L(x_1) + b L(x_2) = 0$. Therefore $a x_1 + b x_2 \in \mathscr{N}(L)$.
(ii) This follows immediately from Proposition 2.1.5(iii). □
(iv) Let $L : C^1([0, 1]; \mathbb{F}) \to C([0, 1]; \mathbb{F})$ be given by $L[f] = f' + f$. It is easy to show that $L$ is linear. Note that $\mathscr{N}(L) = \operatorname{span}\{e^{-x}\}$. To prove that $L$ is surjective, let $g(x) \in C([0, 1]; \mathbb{F})$, and for $0 \le x \le 1$, define
$$f(x) = e^{-x} \int_0^x e^t g(t) \, dt.$$
Then $f'(x) = -f(x) + g(x)$, so $L[f] = f' + f = g$, and $g \in \mathscr{R}(L)$.
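A quick numerical check of this construction (our addition, not the book's), using $g(x) = \cos x$ and standard quadrature and differencing routines:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

g = np.cos                                   # target function in C([0, 1])
x = np.linspace(0.0, 1.0, 10001)

# f(x) = e^{-x} * integral_0^x e^t g(t) dt, the candidate preimage of g.
f = np.exp(-x) * cumulative_trapezoid(np.exp(x) * g(x), x, initial=0.0)

# L[f] = f' + f should reproduce g up to discretization error.
fprime = np.gradient(f, x)
print(np.max(np.abs(fprime + f - g(x))))     # small (order of the grid error)
```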
Remark 2.1.10. Given a linear map $L$ from one function space to another, we typically write $L[f]$ as the image of $f$. To evaluate the function at a point $x$, we write $L[f](x)$.
Proposition 2.2.1.
(i) If $L : V \to W$ and $K : W \to X$ are linear transformations, then the composition $K \circ L : V \to X$ is also a linear transformation.
(ii) If $L : V \to W$ and $K : V \to W$ are linear transformations and $r, s \in \mathbb{F}$, then the map $rL + sK$, defined by $(rL + sK)(x) = rL(x) + sK(x)$, is also a linear transformation.
Proof.
(i) If $a, b \in \mathbb{F}$ and $x, y \in V$, then
$$(K \circ L)(ax + by) = K(L(ax + by)) = K(aL(x) + bL(y)) = a(K \circ L)(x) + b(K \circ L)(y).$$
Definition 2.2.2. Let $V$ and $W$ be vector spaces over the same field $\mathbb{F}$. Let $\mathscr{L}(V; W)$ be the set of linear transformations $L$ mapping $V$ into $W$ with the pointwise operations of vector addition and scalar multiplication:
(i) If $f, g \in \mathscr{L}(V; W)$, then $f + g$ is the map defined by $(f + g)(x) = f(x) + g(x)$.
(ii) If $a \in \mathbb{F}$, then $af$ is the map defined by $(af)(x) = af(x)$.
Corollary 2.2.3. Let $V$ and $W$ be vector spaces over the same field $\mathbb{F}$. The set $\mathscr{L}(V; W)$ with the pointwise operations of vector addition and scalar multiplication forms a vector space over $\mathbb{F}$.
Remark 2.2.4. For notational convenience, we denote the composition of two linear transformations $K$ and $L$ as the product $KL$ instead of $K \circ L$. We express the repeated composition of a linear operator $K : V \to V$ with itself using powers of $K$; for example, we write $K^2$ instead of $K \circ K$.
Nota Bene 2.2.5. The distributive laws hold for compositions of sums and sums of compositions; that is, $K(L_1 + L_2) = KL_1 + KL_2$ and $(L_1 + L_2)K = L_1 K + L_2 K$. However, commutativity generally fails. In fact, if $L : V \to W$ and $K : W \to X$ and $X \neq V$, then the composition $LK$ doesn't even make sense. Even when $X = V$, we seldom have commutativity.
2.2.2 Invertibility
We finish this section by developing the main ideas of invertibility and defining the
concept of an isomorphism, which tells us when two vector spaces are essentially
the same.
Example 2.2.10.
(i) For any positive integer $n$, let $W = \{(0, a_2, a_3, \dots, a_n) \mid a_i \in \mathbb{F}\}$ be a subspace of $\mathbb{F}^n$. We claim $\mathbb{F}^{n-1}$ is isomorphic to $W$. The isomorphism between them is the linear map $(b_1, b_2, \dots, b_{n-1}) \mapsto (0, b_1, b_2, \dots, b_{n-1})$ with the obvious inverse.
⁸For experts, we note that it is common to define an isomorphism to be a bijective linear transformation, and although that is equivalent to our definition, it is the "wrong" definition. What we really care about is that all the properties of the two vector spaces match up; that is, any relation in one vector space maps to (by the isomorphism or by its inverse) the same relation in the other vector space. The importance of requiring the inverse to preserve all the important properties is evident in categories where bijectivity is not sufficient. For example, an isomorphism of topological spaces (a homeomorphism) is a continuous map that has a continuous inverse, but a continuous bijective map need not have a continuous inverse.
⁹It is worth noting the Greek roots of these words: iso means equal, morph means shape or form, and auto means self.
2.2.3 Isomorphisms
Remark 2.2.13. Of course not every operator is invertible, but even when an
operator is not invertible, many of the results of invertibility can still be used. In
Sections 4.6.1 and 12.9, we examine certain generalized inverses of linear operators.
These are operators that are not quite inverses but do have some of the properties
enjoyed by inverses.
Two vector spaces that are isomorphic are essentially the same with respect to the linear structure. The following shows that every property of vector spaces that we have discussed so far in this book is preserved by isomorphism. We use this often because many vector spaces that look complicated can be shown to be isomorphic to a simpler-looking vector space. In particular, we show in Corollary 2.3.12 that all vector spaces of dimension $n$ are isomorphic.
(iii) The set of all subspaces of $V$ is in bijective correspondence with the set of all subspaces of $W$.
(iv) If $K : W \to X$ is any linear transformation, then the composition $KL : V \to X$ is also a linear transformation, and we have
$$\mathscr{N}(KL) = L^{-1}(\mathscr{N}(K)) \quad \text{and} \quad \mathscr{R}(KL) = \mathscr{R}(K).$$
Proof.
(ii) Since $L$ is surjective, any element $w \in W$ can be written as $L(v)$ for some $v \in V$. Because $B$ is a basis, it spans $V$, and we can write $v = \sum_{i=1}^n a_i v_i$ for some choice of $a_1, \dots, a_n \in \mathbb{F}$. Applying $L$ gives $w = L(v) = L(\sum_{i=1}^n a_i v_i) = \sum_{i=1}^n a_i L(v_i)$, so the set $LB = \{L(v_1), \dots, L(v_n)\}$ spans $W$. It is also linearly independent since $\sum_{i=1}^n c_i L(v_i) = 0$ implies $\sum_{i=1}^n c_i L^{-1} L(v_i) = 0$ by part (i). Since $B$ is linearly independent, we must have $c_i = 0$ for all $i$.
The converse follows by applying the same argument with $L^{-1}$.
Similarly, $L L^{-1} = I_W$, so the maps $L$ and $L^{-1}$ are inverses and thus bijective.
formula relating the dimension of the image (called the rank) and the dimension of the kernel (called the nullity).
We also use the first isomorphism theorem to prove the extremely important result that all $n$-dimensional vector spaces over $\mathbb{F}$ are isomorphic to $\mathbb{F}^n$.
Finally, we use the first isomorphism theorem to prove another theorem, called
the second isomorphism theorem, about the relation between quotients of sums and
intersections of subspaces. This theorem provides a formula, called the dimension
formula, relating the dimensions of sums and intersections of subspaces.
Example 2.3.2. Exercise 1.29 shows for a vector space $V$ that $V/\{0\} \cong V$. Alternatively, this follows from the proposition as follows. Define a linear map $\pi : V \to V/\{0\}$ by $\pi(x) = x + \{0\}$. By the proposition, $\pi$ is surjective. Thus, it suffices to show that $\pi$ is injective. If $\pi(x) = \pi(y)$, then $x + \{0\} = y + \{0\}$, which implies that $x - y \in \{0\}$. Therefore, $x = y$ and $\pi$ is injective.
Proof. $L$ is injective if and only if $L(x) = L(y)$ implies $x = y$, which holds if and only if $L(z) = 0$ implies $z = 0$, which holds if and only if $\mathscr{N}(L) = \{0\}$. □
Proof. We show that $\bar{L}$ is well defined (see Appendix A.2.4) and linear. If $x + W = y + W$, then $x - y \in W$, which implies that $L(x) - L(y) \in L(W) \subset Y$. Hence, $\bar{L}(x + W) = L(x) + Y = L(y) + Y = \bar{L}(y + W)$. Thus, $\bar{L}$ is well defined. To show
¹⁰An epimorphism is a surjective linear transformation. The name comes from the Greek root epi-, meaning on, upon, or above.
Proof. Apply the previous lemma with $W = \mathscr{N}(L)$, with $Y = \{0\}$, and with $Z = \mathscr{R}(L)$ to get an induced linear transformation $\bar{L} : V/\mathscr{N}(L) \to \mathscr{R}(L)/\{0\} \cong \mathscr{R}(L)$ that is clearly surjective since $x + \mathscr{N}(L)$ maps to $L(x) + \{0\}$, which then maps to $L(x) \in \mathscr{R}(L)$. Thus, it suffices to show that $\bar{L}$ is injective. If $\bar{L}(x + \mathscr{N}(L)) = \bar{L}(y + \mathscr{N}(L))$, then $L(x) + \{0\} = L(y) + \{0\}$. Equivalently, $L(x - y) \in \{0\}$, which implies that $x - y \in \mathscr{N}(L)$, and so $x + \mathscr{N}(L) = y + \mathscr{N}(L)$. Thus, $\bar{L}$ is injective. □
The above theorem is incredibly useful because we can use it to prove that a quotient $V/W$ is isomorphic to another space $X$. The standard way to do this is to construct a surjective linear transformation $V \to X$ that has kernel equal to $W$. The first isomorphism theorem then gives the desired isomorphism.
Example 2.3.6.
(i) If $V$ is a vector space, then we can show that $V/V \cong \{0\}$. Define the linear map $L : V \to \{0\}$ by $L(x) = 0$. The kernel of $L$ is all of $V$, and so the result follows from the first isomorphism theorem.
(ii) Let $V = \{(0, a_2, a_3, \dots, a_n) \mid a_i \in \mathbb{F}\} \subset \mathbb{F}^n$ for some integer $n \ge 2$. The quotient $\mathbb{F}^n / V$ is isomorphic to $\mathbb{F}$. To see this use the map $\pi_1 : \mathbb{F}^n \to \mathbb{F}$ (see Example 2.1.3(i)) defined by $(a_1, \dots, a_n) \mapsto a_1$. One can readily check that this is linear and surjective with kernel equal to $V$, and so the first isomorphism theorem gives the desired result.
(iii) The set of constant functions $\mathrm{Const}_{\mathbb{F}}$ forms a subspace of $C^n([a, b]; \mathbb{F})$. The quotient $C^n([a, b]; \mathbb{F}) / \mathrm{Const}_{\mathbb{F}}$ is isomorphic to $C^{n-1}([a, b]; \mathbb{F})$. To see this, define the linear map $D : C^n([a, b]; \mathbb{F}) \to C^{n-1}([a, b]; \mathbb{F})$ as the derivative $D[f](x) = f'(x)$. Its kernel is precisely $\mathrm{Const}_{\mathbb{F}}$, and so the first isomorphism theorem implies the quotient is isomorphic to $\mathscr{R}(D)$. But $D$ is also surjective, since, by the fundamental theorem of calculus, it has a right inverse $\mathrm{Int} : f \mapsto \int_a^x f(t) \, dt$ (see Example A.2.22). Since $D$ is surjective, the induced map $\bar{D} : C^n([a, b]; \mathbb{F}) / \mathrm{Const}_{\mathbb{F}} \to C^{n-1}([a, b]; \mathbb{F})$ is an isomorphism.
(iv) For any $p \in [a, b] \subset \mathbb{R}$ let $M_p = \{f \in C([a, b]; \mathbb{F}) \mid f(p) = 0\}$. We claim that $C([a, b]; \mathbb{F}) / M_p \cong \mathbb{F}$. Use the linear map $e_p : C([a, b]; \mathbb{F}) \to \mathbb{F}$ given by $e_p(f) = f(p)$. The kernel of $e_p$ is precisely $M_p$, and so the result follows from the first isomorphism theorem.
Proof. If $W = \{0\}$ or $W = V$, then the result follows trivially from Example 2.3.2 or Example 2.3.6(i). Thus, we assume that $W$ is a proper, nontrivial subspace of $V$. Let $S = \{x_1, \dots, x_r\}$ be a basis for $W$. By the extension theorem (Corollary 1.4.5) we can choose $T = \{y_{r+1}, \dots, y_n\}$ so that $S \cup T$ is a basis for $V$ and $S \cap T = \emptyset$. We claim that the set $T_W = \{y_{r+1} + W, \dots, y_n + W\}$ is a basis for $V/W$. It follows that $\dim V = n = r + (n - r) = \dim W + \dim V/W$.
We show that $T_W$ is a basis. To show that $T_W$ is linearly independent, assume
$$\beta_{r+1} \boxdot (y_{r+1} + W) \boxplus \cdots \boxplus \beta_n \boxdot (y_n + W) = 0 + W.$$
Thus, $\sum_{j=r+1}^n \beta_j y_j \in W$, which implies that $\sum_{j=r+1}^n \beta_j y_j = 0$ (since $S$ spans $W$ and $S \cup T$ is linearly independent). However, since $T$ is linearly independent, we have that each $\beta_j = 0$. Thus, $T_W$ is linearly independent. To show that $T_W$ spans $V/W$, let $v + W \in V/W$ for some $v \in V$. Thus, we have that $v = \sum_{i=1}^r \alpha_i x_i + \sum_{j=r+1}^n \beta_j y_j$. Hence, $v + W = \beta_{r+1} \boxdot (y_{r+1} + W) \boxplus \cdots \boxplus \beta_n \boxdot (y_n + W)$. □
Proof. Note that (2.2) implies that $\dim V = \dim \mathscr{N}(L) + \dim V/\mathscr{N}(L)$. The first isomorphism theorem (Theorem 2.3.5) tells us that $V/\mathscr{N}(L) \cong \mathscr{R}(L)$, and dimension is preserved by isomorphism (Proposition 2.2.14(ii)), so $\dim V/\mathscr{N}(L) = \dim \mathscr{R}(L)$. □
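A numerical illustration of the rank-nullity theorem (our addition): for a random matrix map $L : \mathbb{R}^5 \to \mathbb{R}^3$, the rank and the nullity sum to the dimension of the domain.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # a linear map L: R^5 -> R^3

rank = np.linalg.matrix_rank(A)          # dim of the range (generically 3)
nullity = null_space(A).shape[1]         # dim of the kernel, computed directly

print(rank, nullity)                     # 3 2
print(rank + nullity == A.shape[1])      # True: rank + nullity = dim(domain)
```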
Remark 2.3.10. So far, when talking about addition and scalar multiplication of cosets we have used the notation $\boxplus$ and $\boxdot$ in order to help the reader see that these operations on $V/W$ differ from the addition and scalar multiplication in $V$. However, most authors use $+$ and $\cdot$ (or juxtaposition) for these operations on the quotient space and just expect the reader to be able to identify from context when the operators are being used in $V/W$ and when they are being used in $V$. From now on we also use this more standard, but possibly more confusing, notation.
Proof. By Lemma 2.3.3, $L$ is injective if and only if $\dim(\mathscr{N}(L)) = 0$. By the rank-nullity theorem, this holds if and only if $\operatorname{rank}(L) = n$. However, by Corollary 1.4.7 we know $\operatorname{rank}(L) = n$ if and only if $\mathscr{R}(L) = W$, which means $L$ is surjective. □
Corollary 2.3.11 implies that two vector spaces of the same finite dimension,
and over the same field lF, are isomorphic if there is either an injective or a surjective
linear mapping between the two spaces. The next corollary shows that such a map
always exists.
Proof. Let $T = \{x_1, x_2, \dots, x_n\}$ be a basis for $V$. Define the map $L : \mathbb{F}^n \to V$ as $L((a_1, a_2, \dots, a_n)) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$. It is straightforward to check that this is a linear transformation. We wish to show that $L$ is bijective and hence an isomorphism. By Corollary 2.3.11, it suffices to show that $L$ is surjective. Because $T$ is a basis, every element $x \in V$ can be written as a linear combination $x = \sum_{i=1}^n a_i x_i$ of vectors in $T$, so $x = L(a_1, \dots, a_n)$. Hence, $L$ is surjective, as required. □
Remark 2.3.14. Corollary 2.3.12 is a big deal. Among other things, it means
that even though there are many different descriptions of finite-dimensional vector
spaces, there is essentially (up to isomorphism) only one n-dimensional vector space
over 𝔽 for each nonnegative integer n. When combined with the results of the next
two sections, this allows essentially all of finite-dimensional linear algebra to be
reduced to matrix analysis.
Remark 2.3.15. Again, we note that many vector spaces also carry additional
structure that is not part of the vector-space structure, and the isomorphisms we
have discussed do not necessarily preserve this other structure. So, for example,
while 𝔽[x; n² − 1] is isomorphic to 𝔽^{n²} as a vector space, and these are both isomorphic
to M_n(𝔽) as vector spaces, these isomorphisms say nothing about the matrix
multiplication that M_n(𝔽) carries.
$$(I - LK)L = L - L(KL) = L - L = 0.$$
This implies that every element of ℛ(L) is in the kernel of (I − LK). But L is
surjective, so 𝒩(I − LK) ⊇ ℛ(L) = V. Hence, (I − LK) = 0, and I = LK. □
The following diagram is sometimes helpful when thinking about the second
isomorphism theorem:
[Diagram: the subspace lattice V₁ ∩ V₂ ⊆ V₁, V₂ ⊆ V₁ + V₂.]
The next corollary tells us about the dimension of sums and intersections of
subspaces. This result should seem geometrically intuitive: for example, if W₁ and
W₂ are two distinct planes (through the origin) in ℝ³, then their sum is all of ℝ³,
and their intersection is a line. Thus, we have dim W₁ + dim W₂ = 2 + 2 = 4, and
dim(W₁ ∩ W₂) + dim(W₁ + W₂) = 1 + 3 = 4; see Figure 2.1.
We call the a_i above the coordinates of x in the basis S and denote the m × 1
(column) matrix of these coordinates by [x]_S. We proceed similarly for T:
$$[t_1, \dots, t_m] = [s_1, \dots, s_m]\begin{bmatrix} c_{11} & \cdots & c_{1m}\\ c_{21} & \cdots & c_{2m}\\ \vdots & & \vdots\\ c_{m1} & \cdots & c_{mm} \end{bmatrix}.$$
Thus,
¹¹In what follows, we want to keep track of the order of the elements of a set. Strictly speaking,
a set isn't an object that maintains order (for example, the set {x, y, z} is equal to the set
{z, y, x}), and so we use square brackets to mean that the set is ordered. Thus, [x, y, z] means
that x is the first element, y is the second element, and so on. That said, we do not always put
the word "ordered" in front of the word basis; it is implied by the use of square brackets.
$$[x]_S = \begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m}\\ c_{21} & c_{22} & \cdots & c_{2m}\\ \vdots & \vdots & & \vdots\\ c_{m1} & c_{m2} & \cdots & c_{mm} \end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix} = C[x]_T,$$
where C is the matrix [c_{ij}]. Hence, C transforms the coordinates from the basis
T into coordinates in the basis S. We call C the transition matrix from T into S
and sometimes denote it with the subscripts C_{ST} to indicate that it provides the
S-coordinates for a vector written originally in terms of T, that is,
$$[x]_S = C_{ST}[x]_T.$$
For example, with S = [x², x, 1] and T = [x² − 1, x + 1, x − 1] we have
$$[x^2 - 1,\; x + 1,\; x - 1] = [x^2,\; x,\; 1]\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 1\\ -1 & 1 & -1 \end{bmatrix}.$$
This gives
$$[q]_T = C_{TS}[q]_S = \begin{bmatrix} 1 & 0 & 0\\ \tfrac12 & \tfrac12 & \tfrac12\\ -\tfrac12 & \tfrac12 & -\tfrac12 \end{bmatrix}\begin{bmatrix} 2\\ -2\\ 6 \end{bmatrix} = \begin{bmatrix} 2\\ 3\\ -5 \end{bmatrix},$$
which corresponds to q(x) = 2x² − 2x + 6 = 2(x² − 1) + 3(x + 1) − 5(x − 1).
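For readers who wish to check such computations numerically, the following Python sketch (our illustration, assuming NumPy; the variable names are ours and not from the text) builds the transition matrix of this example and converts coordinates in both directions.

```python
import numpy as np

# Columns of C are the S-coordinates (in the power basis [x^2, x, 1])
# of the T-basis vectors x^2 - 1, x + 1, x - 1.
C = np.array([[ 1.0, 0.0,  0.0],
              [ 0.0, 1.0,  1.0],
              [-1.0, 1.0, -1.0]])

q_T = np.array([2.0, 3.0, -5.0])   # q = 2(x^2-1) + 3(x+1) - 5(x-1)
q_S = C @ q_T                      # S-coordinates: [2, -2, 6]
assert np.allclose(q_S, [2.0, -2.0, 6.0])

# Converting back uses the inverse transition matrix C_TS = C^{-1}.
assert np.allclose(np.linalg.solve(C, q_S), q_T)
```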
Example 2.4.2. If S = T , then the matrix Css is just the identity matrix.
Example 2.4.3. To illustrate the importance of order, assume that the vectors
of S and T are the same but are ordered differently. For example, if m = 2
and if T = [s₂, s₁] is obtained from S = [s₁, s₂] by switching the basis vectors,
then the transition matrix is
$$C_{ST} = \begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}.$$
2.4.2 Constructing the Matrix Representation
The next theorem tells us that a given linear map between two finite-dimensional
vector spaces with ordered bases for the domain and codomain has a unique matrix
representation.
Theorem 2.4.4. Let V and W be finite-dimensional vector spaces over the field 𝔽
with bases S = [s₁, s₂, …, s_m] and T = [t₁, t₂, …, t_n], respectively. Given a linear
transformation L : V → W, there exists a unique n × m matrix M_{TS} describing L
in terms of the bases S and T; that is, there exists a unique matrix M_{TS} such that
$$[L(x)]_T = M_{TS}[x]_S \quad\text{for all } x \in V.$$
Proof. If $x = \sum_{j=1}^{m} a_j s_j$, then $L(x) = \sum_{j=1}^{m} a_j L(s_j)$. Since each L(s_j) can be
written uniquely as a linear combination of elements of T, we have that $L(s_j) = \sum_{i=1}^{n} c_{ij} t_i$ for some matrix M = [c_{ij}]. Thus,
$$[L(x)]_T = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m}\\ c_{21} & c_{22} & \cdots & c_{2m}\\ \vdots & \vdots & & \vdots\\ c_{n1} & c_{n2} & \cdots & c_{nm} \end{bmatrix}\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_m \end{bmatrix} = M[x]_S. \;\square$$
Nota Bene 2.4.5. It is very important to remember that the matrix representation
of L depends on the choices of ordered bases S and T. Different bases
and different orderings almost always give different matrix representations.
Remark 2.4.6. The transition matrix we defined in the previous section is just
the matrix representation of the identity transformation Iv, but with basis S used
to represent the inputs of Iv and basis T used to represent the outputs of Iv.
One can check directly in each case that $C_{TS}[x]_S = [L(x)]_T$. In particular
(Example 2.4.8), the derivative operator on 𝔽[x; 4] in the standard basis
S = [1, x, x², x³, x⁴] has the matrix representation
$$C_{SS} = \begin{bmatrix} 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 2 & 0 & 0\\ 0 & 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 0 & 4\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
$$[L[p]]_S = \begin{bmatrix} 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 2 & 0 & 0\\ 0 & 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 0 & 4\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 4\\ -12\\ 6\\ -8\\ 3 \end{bmatrix} = \begin{bmatrix} -12\\ 12\\ -24\\ 12\\ 0 \end{bmatrix}.$$
This agrees with the direct calculation of p′(x) = 12x³ − 24x² + 12x − 12,
written in terms of the basis S.
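The computation above is easy to reproduce numerically. The sketch below (our illustration, assuming NumPy; it is not part of the text) rebuilds C_SS for the derivative operator and checks the result.

```python
import numpy as np

# Derivative operator on F[x;4] in the basis S = [1, x, x^2, x^3, x^4]:
# column j holds the S-coordinates of D[x^j] = j x^(j-1).
D = np.zeros((5, 5))
for j in range(1, 5):
    D[j - 1, j] = j

p = np.array([4.0, -12.0, 6.0, -8.0, 3.0])  # p(x) = 4 - 12x + 6x^2 - 8x^3 + 3x^4
assert np.allclose(D @ p, [-12.0, 12.0, -24.0, 12.0, 0.0])  # coordinates of p'
```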
$$S = [e_1, e_2, \dots, e_n] = [(1, 0, 0, \dots, 0),\; (0, 1, 0, \dots, 0),\; \dots,\; (0, 0, 0, \dots, 1)].$$
Hence, an n-tuple (a₁, …, a_n) can be written x = a₁e₁ + ⋯ + a_ne_n, or in matrix
form as
$$[x]_S = [a_1\;\; a_2\;\; \cdots\;\; a_n]^T.$$
We often write vectors in column form and suppress the notation [·]_S if the choice
of basis is clear.
Remark 2.4.10. Unless the vector space and basis are already otherwise defined,
when using matrices to represent linear operators, we assume that a matrix defines
a linear operator on 𝔽ⁿ with the standard basis {e₁, e₂, …, e_n}.
$$D_{US} = C_{UT}B_{TS}.$$
So [KL(v)]_U = D_{US}[v]_S, and D_{US} is the matrix for the transformation KL. In
matrix notation, this gives
$$\begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1m}\\ d_{21} & d_{22} & \cdots & d_{2m}\\ \vdots & \vdots & & \vdots\\ d_{p1} & d_{p2} & \cdots & d_{pm} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n}\\ c_{21} & c_{22} & \cdots & c_{2n}\\ \vdots & \vdots & & \vdots\\ c_{p1} & c_{p2} & \cdots & c_{pn} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1m}\\ b_{21} & b_{22} & \cdots & b_{2m}\\ \vdots & \vdots & & \vdots\\ b_{n1} & b_{n2} & \cdots & b_{nm} \end{bmatrix},$$
which we may write as D_{US} = C_{UT}B_{TS}. □
Remark 2.5.2. This result tells us, among other things, that the matrix represen-
tation of a linear transformation is an invertible matrix precisely when the linear
transformation is invertible. In this case, we say that the matrix is nonsingular.
If the matrix (or the corresponding transformation) is not invertible, we say it is
singular.
$$(AB)C = A(BC).$$
Proof. First observe that function composition is associative; that is, given any
functions f : V → W, g : W → X, and h : X → Y, we have (h ∘ (g ∘ f))(v) =
h((g ∘ f)(v)) = h(g(f(v))) = (h ∘ g)(f(v)) = ((h ∘ g) ∘ f)(v) for every v ∈ V. The
corollary follows since matrix multiplication is function composition. □
Theorem 2.5.4. Let C_{TS} be the matrix representation of L : V → W from the
basis S into the basis T. If P_{SS̃} is the transition matrix on V from the basis S̃ into
S and Q_{T̃T} is the transition matrix on W from the basis T into T̃, then the matrix
representation B_{T̃S̃} of L in terms of S̃ and T̃ is given by B_{T̃S̃} = Q_{T̃T}C_{TS}P_{SS̃}.

Proof. Let [y]_T = C_{TS}[x]_S. We have [x]_S = P_{SS̃}[x]_{S̃} and [y]_{T̃} = Q_{T̃T}[y]_T. When
we combine these expressions we get [y]_{T̃} = Q_{T̃T}C_{TS}P_{SS̃}[x]_{S̃}. By uniqueness of
matrix representations (Theorem 2.4.4), we know B_{T̃S̃} = Q_{T̃T}C_{TS}P_{SS̃}. □
[Diagram: the change-of-basis square, with [x]_{S̃} ↦ [x]_S via P_{SS̃}, [x]_S ↦ [y]_T via C_{TS}, [y]_T ↦ [y]_{T̃} via Q_{T̃T}, and [x]_{S̃} ↦ [y]_{T̃} directly via B_{T̃S̃}.]
Corollary 2.5.6. Let S and S̃ be bases for the vector space V with transition
matrix P_{SS̃} from S̃ into S. Let C_{SS} be the matrix representation of the operator
L : V → V in the basis S. The matrix B_{S̃S̃} = (P_{SS̃})⁻¹C_{SS}P_{SS̃} is the unique matrix
representation of L in the basis S̃.
Remark 2.5.7. When the bases we are working with are understood from the con-
text, we often drop the basis subscripts. In these cases, we usually also drop the
square brackets [·] and denote a vector as an n-tuple x = (x 1 , ... , Xn) of its coordi-
nates. However, when multiple bases are being considered in the same problem, it
is wise to be very explicit with bases and write out all the basis subscripts so as to
avoid confusion or mistakes.
Definition 2.5.8. Two square matrices A, B ∈ M_n(𝔽) are similar if there exists a
nonsingular P ∈ M_n(𝔽) such that B = P⁻¹AP.
Remark 2.5.11. It is common to talk about the rank and nullity of a matrix when
one really means the rank and nullity of the linear transformation represented by
the matrix. Since any two similar matrices represent the same linear transformation
(but with different bases), they must have the same rank and nullity.
Many other common properties of matrices are actually properties of the linear
transformations that the matrices describe (for example, the determinant and the
eigenvalues), both of which are discussed later. All of these properties must be
the same for any two similar matrices.
$$P = \begin{bmatrix} 1 & 1\\ -2 & 0 \end{bmatrix} \quad\text{and}\quad P^{-1} = \frac12\begin{bmatrix} 0 & -1\\ 2 & 1 \end{bmatrix}.$$
Hence,
$$D = P^{-1}BP = \frac12\begin{bmatrix} 0 & -1\\ 2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1\\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1\\ -2 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0\\ 0 & 1 \end{bmatrix},$$
which defines the linear transformation L in the basis S. Thus, D is similar
to B.
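A quick numerical check of this similarity computation (our illustrative Python, assuming NumPy; the matrices mirror the reconstructed example above):

```python
import numpy as np

P = np.array([[1.0, 1.0], [-2.0, 0.0]])
B = np.array([[1.0, 1.0], [0.0, -1.0]])

D = np.linalg.solve(P, B @ P)   # computes P^{-1} B P without forming P^{-1}
assert np.allclose(D, np.diag([-1.0, 1.0]))
```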
where
$$\binom{n}{j} = \frac{n!}{j!\,(n-j)!}. \tag{2.6}$$
The first interesting property of the Bernstein polynomials is that they sum
to 1, as can be seen from the binomial theorem:
$$\sum_{j=0}^{n} B_j^n(x) = \sum_{j=0}^{n} \binom{n}{j} x^j (1-x)^{n-j} = \bigl(x + (1-x)\bigr)^n = 1.$$
We show that the set {B_j^n(x)}_{j=0}^n forms a basis for 𝔽[x; n], and we exhibit the transition
matrices between the Bernstein basis and the usual monomial basis [1, x, x², …, xⁿ].
For n = 4 these transition matrices are
$$P_{ST} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ -4 & 4 & 0 & 0 & 0\\ 6 & -12 & 6 & 0 & 0\\ -4 & 12 & -12 & 4 & 0\\ 1 & -4 & 6 & -4 & 1 \end{bmatrix} \quad\text{and}\quad Q_{TS} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ 1 & 1/4 & 0 & 0 & 0\\ 1 & 1/2 & 1/6 & 0 & 0\\ 1 & 3/4 & 1/2 & 1/4 & 0\\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}. \tag{2.7}$$
[Figure: plots of the degree-4 Bernstein polynomials B_j^4(x) on [0, 1], shown individually and combined.]
Proof.
$$B_j^n(x) = \binom{n}{j}x^j(1-x)^{n-j} = \sum_{i=j}^{n} (-1)^{i-j}\,\frac{n!}{j!\,(i-j)!\,(n-i)!}\,x^i.$$
Theorem 2.6.5. For any n ∈ ℕ the set T_n = {B_j^n(x)}_{j=0}^n of degree-n Bernstein
polynomials forms a basis for 𝔽[x; n].
Proof. Define the matrix P = [p_{jk}] by
$$p_{jk} = \begin{cases} (-1)^{j-k}\dbinom{n}{j}\dbinom{j}{k} & \text{if } j \ge k,\\[4pt] 0 & \text{if } j < k \end{cases} \qquad\text{for } j, k \in \{0, 1, \dots, n\}. \tag{2.8}$$
If we already knew that the Bernstein polynomials formed a basis, then (2.7) tells
us that P would be the transition matrix P_{ST} from the Bernstein basis T_n into the
power basis S.
Now define the matrix Q = [q_{ij}] by
$$q_{ij} = \begin{cases} \dbinom{i}{j}\Big/\dbinom{n}{j} & \text{if } i \ge j,\\[4pt] 0 & \text{if } i < j \end{cases} \qquad\text{for } i, j \in \{0, 1, \dots, n\}. \tag{2.9}$$
Once we see that the set T_n of Bernstein polynomials forms a basis, it will be clear
that the matrix Q is the transition matrix Q_{TS} from the basis S to the basis T_n.
We verify that QP = I = PQ by direct computation. When 0 ≤ k ≤ i ≤ n,
the product QP takes the form
$$(QP)_{ik} = \sum_{j=0}^{n} q_{ij}p_{jk} = \sum_{j=k}^{i} q_{ij}p_{jk} = \sum_{j=k}^{i} (-1)^{j-k}\binom{i}{j}\binom{j}{k} = B_k^i(1).$$
When i = k we have that B_k^i(1) = 1, and when k < i we have that B_k^i(1) = 0.
Also, when 0 ≤ i < k ≤ n, we have that (QP)_{ik} = 0 since both matrices are
lower triangular. Hence, QP = I, and by Proposition 2.3.17 we also have PQ = I.
Thus, P and Q have the necessary properties, and the argument outlined
above shows that {B_j^n(x)}_{j=0}^n is a basis of 𝔽[x; n]. □
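The matrices P and Q of (2.8) and (2.9) are easy to generate for any n, and the identity QP = I can be checked numerically. The following sketch is our illustration, assuming NumPy and SciPy; the function names are ours.

```python
import numpy as np
from scipy.special import comb

def bernstein_P(n):
    # P[j, k] = (-1)^(j-k) * C(n, j) * C(j, k) for j >= k, as in (2.8)
    P = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        for k in range(j + 1):
            P[j, k] = (-1) ** (j - k) * comb(n, j) * comb(j, k)
    return P

def bernstein_Q(n):
    # Q[i, j] = C(i, j) / C(n, j) for i >= j, as in (2.9)
    Q = np.zeros((n + 1, n + 1))
    for i in range(n + 1):
        for j in range(i + 1):
            Q[i, j] = comb(i, j) / comb(n, j)
    return Q

n = 4
assert np.allclose(bernstein_Q(n) @ bernstein_P(n), np.eye(n + 1))
```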
Example 2.6.6. Using the matrix A in Example 2.4.8, representing the derivative
of a polynomial in 𝔽[x; 4], we use similarity to transform it to the Bernstein
basis. Multiplying out P_{ST}⁻¹AP_{ST}, we get
$$P_{ST}^{-1}AP_{ST} = \begin{bmatrix} -4 & 4 & 0 & 0 & 0\\ -1 & -2 & 3 & 0 & 0\\ 0 & -2 & 0 & 2 & 0\\ 0 & 0 & -3 & 2 & 1\\ 0 & 0 & 0 & -4 & 4 \end{bmatrix}.$$
Thus, using the representation of p(x) in Example 2.6.3, we express the
derivative as
$$\begin{bmatrix} -4 & 4 & 0 & 0 & 0\\ -1 & -2 & 3 & 0 & 0\\ 0 & -2 & 0 & 2 & 0\\ 0 & 0 & -3 & 2 & 1\\ 0 & 0 & 0 & -4 & 4 \end{bmatrix}\begin{bmatrix} 4\\ 1\\ -1\\ -4\\ -7 \end{bmatrix} = \begin{bmatrix} -12\\ -9\\ -10\\ -12\\ -12 \end{bmatrix}.$$
Thus, p′(x) = −12B₀⁴(x) − 9B₁⁴(x) − 10B₂⁴(x) − 12B₃⁴(x) − 12B₄⁴(x).
A Bézier curve is the map $\gamma(t) = \sum_{j=0}^{n} P_j B_j^n(t)$, where P₀, …, P_n are the control points and B_j^n(t) are the Bernstein
polynomials.
the linear system into a form that is easy to solve, either directly by inspection or
through a process called back substitution. In this section, we explore row reduction
in greater depth.
Type I: Swapping rows, denoted R_i ↔ R_j. The corresponding elementary matrix
(called a type I matrix) is formed by interchanging rows i and j of the identity
matrix. Left multiplying by this matrix performs the row operation. For
example,
$$\begin{bmatrix} 0 & 1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix} = \begin{bmatrix} d & e & f\\ a & b & c\\ g & h & i \end{bmatrix}.$$
A type I matrix is also called a transposition matrix.

Type II: Multiplying a row by a nonzero scalar, denoted R_i → aR_i. The corresponding
type II elementary matrix is formed by replacing the (i, i) entry of the
identity matrix with a.
Type III: Adding a scalar multiple of one row to another row, denoted R_i → aR_j + R_i
(with i ≠ j). A type III elementary matrix is formed by inserting a in the
(i, j) entry of the identity matrix. For example, if (i, j) = (2, 1), then we have
$$\begin{bmatrix} 1 & 0 & 0\\ a & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & i \end{bmatrix} = \begin{bmatrix} a & b & c\\ aa + d & ab + e & ac + f\\ g & h & i \end{bmatrix}.$$
Remark 2.7.1. All three elementary matrices are found by performing the desired
row operation on the identity matrix.
Remark 2.7.3. Since the elementary matrices are invertible, row reduction can be
viewed as repeated left multiplication of both sides of a linear system by invertible
matrices. Multiplication by invertible matrices represents a change of basis, and
left multiplication by the elementary matrices can be thought of as simply changing
the basis of the range, but leaving the domain unchanged. Thus, any solution of the
system EAx = 0 is also a solution of the system Ax = 0, and conversely.
More generally, any solution of the system Ax = b is a solution of the system
EAx = Eb for any elementary matrix E, and conversely. This is equivalent to
saying that a solution to Ax = b can be found by row reducing both the matrix
A and the vector b using the same operations in the same order. The goal is to
judiciously choose these elementary matrices so that the linear system is reduced
to some nice form that is easy to solve. Typically this is either an upper-triangular
matrix or a diagonal matrix.
$$\begin{bmatrix} 2 & 1\\ 1 & 3 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 3\\ 4 \end{bmatrix}.$$
To solve it using row operations, or equivalently elementary matrices, we first
swap the rows. This is done by left multiplying both sides by the type I matrix
corresponding to R₁ ↔ R₂. Thus, we have
$$\begin{bmatrix} 1 & 3\\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 4\\ 3 \end{bmatrix}.$$
Now we eliminate the (2, 1) entry by using the type III matrix corresponding
to R₂ → −2R₁ + R₂. This yields
$$\begin{bmatrix} 1 & 3\\ 0 & -5 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 4\\ -5 \end{bmatrix}. \tag{2.10}$$
The original system corresponds to the augmented matrix
$$\left[\begin{array}{cc|c} 2 & 1 & 3\\ 1 & 3 & 4 \end{array}\right].$$
Thus, we can more compactly carry out the row reduction process by writing
$$\left[\begin{array}{cc|c} 2 & 1 & 3\\ 1 & 3 & 4 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 3 & 4\\ 2 & 1 & 3 \end{array}\right] \to \left[\begin{array}{cc|c} 1 & 3 & 4\\ 0 & -5 & -5 \end{array}\right].$$
Definition 2.7.6. The matrix B is row equivalent to the matrix A if there exists a
finite collection of elementary matrices E₁, E₂, …, E_n such that B = E₁E₂ ⋯ E_nA.
Remark 2.7.8. Sometimes we say that one matrix is row equivalent to another
matrix, but more commonly we say that one augmented matrix is row equivalent
to another augmented matrix, or, in other words, that one linear system is row
equivalent to another linear system. Thus, if two linear systems (or augmented
matrices) are row equivalent, they have the same set of solutions.
Definition 2.7.9. A matrix A is in row echelon form (REF) if the following hold:
(i) The first nonzero entry, called the leading entry, of each nonzero row is always
strictly to the right of the leading entry of the row above it.
(ii) All nonzero rows are above any zero rows.
A matrix A is in reduced row echelon form (RREF) if the following hold:
(i) It is in REF.
(ii) The leading entry of every nonzero row is equal to one.
(iii) The leading entry of every nonzero row is the only nonzero entry in its column.
Example 2.7.10. The first two of the following matrices are in REF, the
next is in RREF, and the last two are not in REF. Can you say why?
$$\begin{bmatrix} 1 & 2 & 3\\ 0 & 2 & 5\\ 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 3 & 4 & 7 & 0\\ 0 & 6 & 9 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 4 & 0\\ 0 & 1 & 5 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 0 & 2\\ 4 & 0 \end{bmatrix},\quad \begin{bmatrix} 1 & 2\\ 0 & 0\\ 0 & 5 \end{bmatrix}.$$
Remark 2.7.11. REF is the canonical form of a linear system that can then be
solved by what we call back substitution. In other words, when an augmented matrix
is in REF, we can solve for the coordinates of x and substitute them back into the
partial solution, as we did with (2.10).
Proof. (⟹) By the previous proposition, any square matrix A can be row reduced
to an RREF upper-triangular matrix B by a sequence E₁, E₂, …, E_k of elementary
matrices such that E₁E₂ ⋯ E_kA = B. If A is nonsingular, then it has an inverse
A⁻¹, and so

elementary matrices E₁, E₂, …, E_k, we have E₁E₂ ⋯ E_k[A | I] = [I | E₁E₂ ⋯ E_k].
This shows that E₁E₂ ⋯ E_kA = I and hence E₁E₂ ⋯ E_k = A⁻¹. □
(2.11)
¹³The widely used computing libraries for solving linear systems have many efficiencies and enhancements
built in that are quite sophisticated. We present only the entry-level approach
here.
We can write this as an augmented linear system and row reduce to RREF
form (using elementary matrices)
$$\left[\begin{array}{ccc|c} 1 & 0 & -1 & 0\\ 0 & 1 & 2 & 0 \end{array}\right]. \tag{2.12}$$
The first two columns correspond to basic variables (x₁ and x₂), and the third
column corresponds to the free variable x₃. The augmented system can then
be written as
x₁ = x₃,
x₂ = −2x₃,
$$\left[\begin{array}{ccc|c} 1 & 2 & 3 & 4\\ 5 & 6 & 7 & 8 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 2 & 3 & 4\\ 0 & 1 & 2 & 3 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & -1 & -2\\ 0 & 1 & 2 & 3 \end{array}\right].$$
Thus, the system is row equivalent to the reduced system Bx = d, where B is
given in (2.12) and d = [−2 3]^T. The augmented system can be written as
x₁ = x₃ − 2,
x₂ = −2x₃ + 3,
Example 2.7.19. In the previous two examples, the first and second columns
are basic and obviously linearly independent, and the corresponding columns
from the original matrix are also linearly independent and form a basis of the
range, so we have
$$\mathscr{R}(A) = \operatorname{span}\left\{\begin{bmatrix} 1\\ 5 \end{bmatrix}, \begin{bmatrix} 2\\ 6 \end{bmatrix}\right\}.$$
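Row reduction to RREF, including the identification of basic and free columns, can be reproduced with a computer algebra system. The sketch below is our illustration, assuming SymPy; it uses the augmented matrix of the second example above.

```python
import sympy as sp

# The augmented system [A | b] from the example above.
M = sp.Matrix([[1, 2, 3, 4],
               [5, 6, 7, 8]])
R, pivots = M.rref()   # reduced row echelon form and pivot columns
print(R)               # Matrix([[1, 0, -1, -2], [0, 1, 2, 3]])
print(pivots)          # (0, 1): columns 1 and 2 are basic, column 3 is free
```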
2.8 Determinants I
In this section and the next, we develop the classical theory of determinants. Most
linear algebra texts define the determinant using the cofactor expansion. This is a
fair approach, but it is difficult to prove some of the properties of the determinant
rigorously this way, and so a little hand waving is common. Instead, we develop
the determinant using permutations. While this approach is initially a little more
difficult, it allows us to give a rigorous derivation of the main properties of the
determinant. Moreover, permutations are a powerful mathematical tool with broad
applicability beyond determinants, and so we favor this approach pedagogically.
2.8.1 Permutations
Throughout this section, assume that n is a positive integer and T_n = {1, 2, …, n}.
Remark 2.8.2. The composition of two permutations on a given set is again a permutation,
and the inverse of every permutation is again a permutation. A nonempty
set of bijections that is closed under composition and inverses is called a group.
Notation 2.8.3. If σ ∈ S_n is given by σ(j) = i_j for every j ∈ T_n, then we denote
σ by the ordered tuple [i₁, …, i_n]. For example, if σ ∈ S₃ is such that σ(1) = 2,
σ(2) = 3, and σ(3) = 1, then we write σ as the 3-tuple [2, 3, 1].
S₁ = {[1]},
S₂ = {[1, 2], [2, 1]},
S₃ = {[1, 2, 3], [2, 3, 1], [3, 1, 2], [2, 1, 3], [1, 3, 2], [3, 2, 1]}.
Note that |S₁| = 1, |S₂| = 2, and |S₃| = 6. For general values of n, we have
the following theorem.
Theorem 2.8.5. The number of permutations of the set T_n is n!; that is, |S_n| = n!.
$$[n, \sigma(1), \dots, \sigma(n-1)],\; [\sigma(1), n, \dots, \sigma(n-1)],\; \dots,\; [\sigma(1), \dots, \sigma(n-1), n].$$
All permutations of T_n can be produced in this way, and no two are the same. Thus,
there are n · (n − 1)! = n! permutations in S_n. □
$$\operatorname{sign}(\sigma) = \begin{cases} 1 & \text{if } \sigma \text{ is even},\\ -1 & \text{if } \sigma \text{ is odd}. \end{cases}$$
Example 2.8.9. A permutation that swaps two elements, fixing the rest, is
a transposition. More precisely, σ is a transposition of T_n if there exist
i, j ∈ T_n, i ≠ j, such that σ(i) = j, σ(j) = i, and σ(k) = k for every
k ≠ i, j. For example, the permutations [2, 1, 3, 4, 5] and [1, 2, 5, 4, 3] are both
transpositions of T₅.
Assuming i < j, the inversions of the transposition are (j, i), (j, i+1),
(j, i+2), …, (j, j−1) and (i+1, i), (i+2, i), …, (j−1, i). This adds up to
2(j − i) − 1, which is odd. Hence, the sign of the transposition is −1.
Lemma 2.8.10 (Equivalent Definition of sign(σ)). Let σ ∈ S_n. Consider the
polynomial p(x₁, x₂, …, x_n) = ∏_{i<j}(x_i − x_j). The sign of a permutation is given as
$$\operatorname{sign}(\sigma) = \frac{p(x_{\sigma(1)}, x_{\sigma(2)}, \dots, x_{\sigma(n)})}{p(x_1, x_2, \dots, x_n)}. \tag{2.13}$$
Theorem 2.8.12. The sign of the composition of two permutations is equal to the
product of their signs; that is, if σ, τ ∈ S_n, then sign(σ ∘ τ) = sign(σ) sign(τ).
Proof. Note that x₁, x₂, …, x_n in (2.13) are dummy variables, so we could just
as well have used y_n, y_{n−1}, …, y₁ or any other set of variables in any other initial
order. Thus, for any σ, τ ∈ S_n we have
$$\operatorname{sign}(\sigma\tau) = \frac{p(x_{\sigma(\tau(1))}, \dots, x_{\sigma(\tau(n))})}{p(x_1, \dots, x_n)} = \frac{p(x_{\sigma(\tau(1))}, \dots, x_{\sigma(\tau(n))})}{p(x_{\tau(1)}, \dots, x_{\tau(n)})}\cdot\frac{p(x_{\tau(1)}, \dots, x_{\tau(n)})}{p(x_1, \dots, x_n)} = \operatorname{sign}(\sigma)\operatorname{sign}(\tau). \;\square$$
Proof. If e is the identity map, then 1 = sign(e) = sign(σσ⁻¹) = sign(σ) sign(σ⁻¹).
Thus, if σ is even (respectively, odd), then so is σ⁻¹. □
Definition 2.8.14. Let A = [a_{ij}] be in M_n(𝔽) and σ ∈ S_n. The elementary
product associated with σ ∈ S_n is the product a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)}.
Remark 2.8.15. An elementary product contains exactly one element from each
row of A and exactly one element from each column.
Consider the generic matrix
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \tag{2.14}$$
and let σ ∈ S₃ be given by [3, 1, 2]. The elementary product is a₁₃a₂₁a₃₂.
Table 2.1 contains a complete list of permutations and their corresponding
elementary products for a 3 × 3 matrix.

Table 2.1. A list of the permutations, their signs, and the corresponding
elementary products for a generic 3 × 3 matrix.

  σ           sign(σ)   elementary product
  [1, 2, 3]     +1       a₁₁a₂₂a₃₃
  [2, 3, 1]     +1       a₁₂a₂₃a₃₁
  [3, 1, 2]     +1       a₁₃a₂₁a₃₂
  [2, 1, 3]     −1       a₁₂a₂₁a₃₃
  [1, 3, 2]     −1       a₁₁a₂₃a₃₂
  [3, 2, 1]     −1       a₁₃a₂₂a₃₁
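The permutation definition of the determinant translates directly into code. The following Python sketch (our illustration, assuming NumPy; the function names are ours) sums signed elementary products exactly as in Table 2.1 and compares the result against a library routine.

```python
import itertools
import numpy as np

def sign(perm):
    # The sign is (-1) raised to the number of inversions.
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
              if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_by_permutations(A):
    n = A.shape[0]
    return sum(sign(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

A = np.array([[-2.0, -4.0, -6.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
assert np.isclose(det_by_permutations(A), np.linalg.det(A))  # both give 6
```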
Proof. Since A = [a_{ij}] is upper triangular, a_{ij} = 0 whenever i > j. Thus,
for an elementary product a_{1σ(1)} a_{2σ(2)} ⋯ a_{nσ(n)} to be nonzero, it is necessary that
σ(k) ≥ k for each k. But σ(n) ≥ n implies that σ(n) = n. Similarly, σ(n−1) ≥ n − 1,
but since σ(n) = n, it follows that σ(n−1) = n − 1. Working backwards, we see
that σ(k) = k for all k. Thus, σ is the identity, and all other elementary products
are zero. Therefore, det(A) = a₁₁a₂₂ ⋯ a_{nn}. □
Proof. Let B = [b_{ij}] = A^T; that is, b_{ij} = a_{ji}. It follows that
$$\det(B) = \sum_{\sigma\in S_n} \operatorname{sign}(\sigma)\, b_{1\sigma(1)}\cdots b_{n\sigma(n)} = \sum_{\sigma\in S_n} \operatorname{sign}(\sigma)\, a_{\sigma(1)1}\cdots a_{\sigma(n)n}.$$
As σ runs through all the elements of S_n, so does σ⁻¹; thus, by writing τ = σ⁻¹,
we have
$$\det(B) = \sum_{\tau\in S_n} \operatorname{sign}(\tau)\, a_{1\tau(1)}\cdots a_{n\tau(n)} = \det(A). \;\square$$
2.9 Determinants II
In this section we show that determinants can be computed using row reduction.
This provides a practical approach to computing the determinant, as well as a
way to prove several important properties of determinants. We also show that
the determinant can be computed using cofactor expansions. From there we prove
Cramer's rule and define the adjugate of a matrix, which relates the determinant
to the inverse of a matrix.
(i) type I (R_i ↔ R_j), then det(B) = −det(A);
(ii) type II (R_k → aR_k), then det(B) = a det(A) (this holds even if a = 0);
(iii) type III (R_ℓ → aR_k + R_ℓ), with k ≠ ℓ, then det(B) = det(A).
Here the last equality comes from setting ν = στ and noticing that when σ
ranges through every element of S_n, then so does ν.
(ii) Note that b_{ij} = a_{ij} when i ≠ k and b_{ij} = a·a_{ij} when i = k. Thus,
(iii) Note that b_{ij} = a_{ij} when i ≠ ℓ and b_{ij} = a_{ij} + a·a_{kj} when i = ℓ. Thus,
In the proof of (iii) above, we also proved the following corollary of (i) .
Remark 2.9.3. For notational convenience, we often write the determinant of the
matrix A = [a_{ij}] ∈ M_n(𝔽) as
$$\det(A) = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$
$$\begin{vmatrix} -2 & -4 & -6\\ 4 & 5 & 6\\ 7 & 8 & 10 \end{vmatrix} = -2\begin{vmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 10 \end{vmatrix} = -2\begin{vmatrix} 1 & 2 & 3\\ 0 & -3 & -6\\ 0 & -6 & -11 \end{vmatrix} = 6\begin{vmatrix} 1 & 2 & 3\\ 0 & 1 & 2\\ 0 & -6 & -11 \end{vmatrix} = 6\begin{vmatrix} 1 & 2 & 3\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{vmatrix} = 6.$$
Corollary 2.9.5. If A ∈ M_n(𝔽), then det(aA) = aⁿ det(A) for any a ∈ 𝔽.
$$\det(AB) = \det(E_kE_{k-1}\cdots E_1B) = \det(E_k)\det(E_{k-1})\cdots\det(E_1)\det(B) = \det(E_kE_{k-1}\cdots E_1)\det(B) = \det(A)\det(B). \;\square$$
Proof. det(B) = det(P⁻¹AP) = det(P⁻¹) det(A) det(P) = det(A). □
Definition 2.9.11. The (i, j) submatrix A_{ij} of A ∈ M_n(𝔽) is the (n−1) × (n−1)
matrix obtained by removing row i and column j from A. The (i, j) minor M_{ij} of
A is the determinant of A_{ij}. We call (−1)^{i+j} M_{ij} the (i, j) cofactor of A.
Example 2.9.12. Using the matrix A from Example 2.9.4, we have that
$$A_{23} = \begin{bmatrix} -2 & -4\\ 7 & 8 \end{bmatrix}.$$
Lemma 2.9.13. If A_{ij} is the (i, j) submatrix of A ∈ M_n(𝔽), then the signed
elementary products associated with (−1)^{i+j} a_{ij} A_{ij} are the same as those associated
with A.
Remark 2.9.14. How can the elementary products associated with (−1)^{i+j} a_{ij} A_{ij}
be the same as those associated with A? Suppose we fix i and j and then sort
the elementary products of the matrix into two groups: those that contain a_{ij}, and
those that do not. We find that there are (n−1)! elementary products that contain
a_{ij} and (n−1) · (n−1)! elementary products that do not contain a_{ij}. So, the number
of elementary products of A_{ij} is the same as the number of elementary products of
A that contain a_{ij}.
Recall also that an elementary product contains exactly one element from each
row of A and each column of A (see Remark 2.8.15). The elementary products of A
that contain aij as a factor cannot have another element that lies in row i or column
j as a factor; in fact the other factors of the elementary product must come from
precisely the submatrix Aij. All that remains is to verify the correct sign of the
elementary product and to formalize the proof of the lemma. This is done below.
The number of inversions of σ equals the number of inversions of the induced permutation σ̂ plus the number
of inversions that involve j. Thus, we need only count the inversions that
involve j.
Let k be the number of elements of the n-tuple σ = [σ(1), σ(2), …, σ(n)]
that are greater than j, yet whose position in the n-tuple precedes that of j. Thus,
n − j − k is the number of elements of σ greater than j that follow j. Since there
are n − i spots that follow j, it follows that there are (n − i) − (n − j − k) elements
smaller than j that follow j. Hence, there are k + (n−i) − (n−j−k) = i + j + 2(k−i)
inversions due to j. Since (−1)^{i+j+2(k−i)} = (−1)^{i+j}, the signs of the elementary
products of A match those in (−1)^{i+j} a_{ij} A_{ij}. □
Theorem 2.9.16. Let M_{ij} be the (i, j) minor of A ∈ M_n(𝔽). If A = [a_{ij}], then
(i) for any fixed j ∈ {1, …, n}, we have $\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij}$;
(ii) for any fixed i ∈ {1, …, n}, we have $\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}$.
Proof. For each (i, j) pair, there are (n−1)! elementary products in A_{ij}. By taking
a column sum of cofactors of A (as in (i)), or by taking a row sum (as in (ii)), we
get n! unique elementary products, and by the previous lemma, each of these occurs
with the correct sign. □
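Cofactor expansion likewise translates into a short recursive routine. The sketch below is our illustration (assuming NumPy; the function name is ours), expanding along the first row as in Theorem 2.9.16(ii).

```python
import numpy as np

def det_by_cofactors(A):
    # Expand along the first row (i = 1 in Theorem 2.9.16(ii)).
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_by_cofactors(minor)
    return total

A = np.array([[-2.0, -4.0, -6.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
assert np.isclose(det_by_cofactors(A), 6.0)
```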
Example 2.9.17. Theorem 2.9.16 says that we can split the determinant into
a sum of cofactors down any fixed column,
$$\det(A) = (-1)^{1+j}a_{1j}M_{1j} + (-1)^{2+j}a_{2j}M_{2j} + \cdots + (-1)^{n+j}a_{nj}M_{nj},$$
or alternatively across any fixed row,
$$\det(A) = (-1)^{i+1}a_{i1}M_{i1} + (-1)^{i+2}a_{i2}M_{i2} + \cdots + (-1)^{i+n}a_{in}M_{in}.$$
Example 2.9.18. For a 4 × 4 matrix A we may expand det(A) in cofactors
along the bottom row (choose i = 4 in Theorem 2.9.16(ii)). One of the
resulting minors is
$$M_{42} = \begin{vmatrix} 1 & 3 & 4\\ 5 & 7 & 8\\ 1 & 2 & 3 \end{vmatrix} = \sum_{i=1}^{3}(-1)^{i+1} a_{i1} M_{i1},$$
which can itself be expanded in cofactors down its first column.
Definition 2.9.19. Assume M_{ij} is the (i, j) minor of A = [a_{ij}] ∈ M_n(𝔽). The
adjugate adj(A) = [c_{ij}] of A is given by c_{ij} = (−1)^{i+j} M_{ji} (note that the indices of
the cofactors are transposed, (i, j) ↦ (j, i), in the adjugate).
Remark 2.9.20. Some people call this matrix the classical adjoint, but this is a
confusing name because the adjoint has another meaning that is more standard
(used in Chapter 3). We always call the matrix in Definition 2.9.19 the adjugate.
Example 2.9.21. Consider the matrix A from Example 2.9.4. Exercise 2.48
shows that
$$\operatorname{adj}(A) = \begin{bmatrix} 2 & -8 & 6\\ 2 & 22 & -12\\ -3 & -12 & 6 \end{bmatrix}.$$
The adjugate can be used to give an explicit formula for the inverse of a
matrix. Although this formula is not well suited to computation, it is a powerful
theoretical tool.
Proof. Let A = [a_{ij}] and adj(A) = [c_{ij}]. Note that the (i, k) entry of A adj(A) is
$$\sum_{j=1}^{n} a_{ij}c_{jk} = \sum_{j=1}^{n} (-1)^{k+j} a_{ij} M_{kj}.$$
Now if i = k, then the (i, i) entry of A adj(A) is $\sum_{j=1}^{n}(-1)^{i+j}a_{ij}M_{ij}$, which is equal
to det(A) by Theorem 2.9.16. If i ≠ k, the sum is the cofactor expansion of the
determinant of a matrix with two identical rows, which is zero. Hence,
$$\sum_{j=1}^{n} a_{ij}c_{jk} = \begin{cases} \det(A) & \text{if } i = k,\\ 0 & \text{if } i \neq k. \end{cases}$$
One of the most important applications of the adjugate is Cramer's rule, which
allows us to write an explicit formula for the solution to a system of linear equations.
Although the equations become unwieldy very rapidly, they are useful for proving
various properties of the solutions, and this can often make solving the system much
simpler.
Moreover, if A_i(b) ∈ M_n(𝔽) is the matrix A with the ith column replaced by b, then
the ith coordinate of x is
$$x_i = \frac{\det(A_i(b))}{\det(A)}. \tag{2.16}$$
Consider the system Ax = b, where
$$A = \begin{bmatrix} -2 & -4 & -6\\ 4 & 5 & 6\\ 7 & 8 & 10 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 6\\ 6\\ 1 \end{bmatrix}.$$
From Example 2.9.4 we have det(A) = 6, and
$$\det(A_1(b)) = \begin{vmatrix} 6 & -4 & -6\\ 6 & 5 & 6\\ 1 & 8 & 10 \end{vmatrix} = -30, \qquad \det(A_2(b)) = \begin{vmatrix} -2 & 6 & -6\\ 4 & 6 & 6\\ 7 & 1 & 10 \end{vmatrix} = 132,$$
$$\text{and}\quad \det(A_3(b)) = \begin{vmatrix} -2 & -4 & 6\\ 4 & 5 & 6\\ 7 & 8 & 1 \end{vmatrix} = -84.$$
Thus, by Cramer's rule, x₁ = −30/6 = −5, x₂ = 132/6 = 22, and x₃ = −84/6 = −14.
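Cramer's rule is straightforward to implement. The following sketch (our illustration, assuming NumPy; the function name is ours) reproduces the example above.

```python
import numpy as np

def cramer(A, b):
    # x_i = det(A_i(b)) / det(A), replacing column i of A by b; see (2.16).
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[-2.0, -4.0, -6.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
b = np.array([6.0, 6.0, 1.0])
x = cramer(A, b)            # [-5., 22., -14.]
assert np.allclose(A @ x, b)
```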
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with ▲ are especially important and are likely to be used later
in this book and beyond. Those marked with † are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
2.1. Determine whether each of the following is a linear transformation from ℝ²
to ℝ². If it is a linear transformation, give 𝒩(L) and ℛ(L).
(i) L(x, y) = (y, x).
(ii) L(x, y) = (x, 0).
(iii) L(x, y) = (x + 1, y + 1).
(iv) L(x, y) = (x², y²).
2.2. Recall the vector space V = (0, ∞) given in Exercise 1.1. Show that the
function T(x) = log x is a linear transformation from V into ℝ.
2.3. Determine whether each of the following is a linear transformation from 𝔽[x; 2]
into 𝔽[x; 4]. Prove your answers.
(i) L maps any polynomial p(x) to x²; that is, L[p](x) = x².
(ii) L maps any polynomial p(x) to xp(x); that is, L[p](x) = xp(x).
Hint: Use the rank-nullity theorem (Corollary 2.3.9) on K|_{ℛ(L)} : ℛ(L) → X,
which is K restricted to the domain ℛ(L).
2.14. Given the setup of Exercise 2.13, prove the following inequalities:
(i) rank(KL) ≤ min(rank(L), rank(K)).
(ii) rank(K) + rank(L) − dim(W) ≤ rank(KL).
2.15. Let {W₁, W₂, …, W_n} be a collection of subspaces of the vector space V.
Show that the mapping L : V → V/W₁ × V/W₂ × ⋯ × V/W_n defined
by L(v) = (v + W₁, v + W₂, …, v + W_n) is a linear transformation and
𝒩(L) = ⋂_{i=1}^n W_i. This is the vector-space analogue of the Chinese remainder
theorem. The Chinese remainder theorem is treated in more detail in
Chapter 15.
2.16.* Let V be a vector space and suppose that S ⊆ T ⊆ V are subspaces of V.
Prove
2.17. Let L be a linear operator on 𝔽² that maps the basis vectors e₁ and e₂ as
follows:
L(e₁) = e₁ + 21e₂ and L(e₂) = 2e₁ − e₂.
(i) Compute L(2e₁ − 3e₂) and L²(2e₁ − 3e₂) in terms of e₁ and e₂.
(ii) Determine the matrix representations of L and L².
2.18. Assuming the polynomial bases [1, x, x²] and [1, x, x², x³, x⁴] for 𝔽[x; 2] and
𝔽[x; 4], respectively, find the matrix representations for each of the linear
transformations in Exercise 2.3.
2.19. Let L : 𝔽[x; 2] → 𝔽[x; 2] be given by L[p] = p + 4p′. Find the matrix representation
of L with respect to the basis S = [x² + 1, x − 1, 1] (for both the
domain and codomain).
2.20. Let a ≠ 0 be fixed, and let V be the space of infinitely differentiable
real-valued functions spanned by S = [e^{ax}, xe^{ax}, x²e^{ax}]. Let D be a linear
operator on V given by D[f](x) = f′(x). Find the matrix A representing D
on S. (Tip: Save your answer. You need it to solve Exercise 2.41.)
2.21. Recall Euler's identity: e^{iθ} = cos θ + i sin θ. Consider C([0, 2π]; ℂ) as a vector
space over ℂ. Let V be the subspace spanned by the vectors S = [e^{iθ}, e^{−iθ}].
Given that T = [cos θ, sin θ] is also a basis for V,
(i) find the transition matrix from S into T;
(ii) find the transition matrix from T into S;
(iii) verify that the transition matrices are inverses of each other.
2.22. Let V be the vector space V of the previous exercise. Let D : V → V be the
derivative operator D(f) = d/dθ f(θ). Write the matrix representation of D in
terms of
(i) the basis S on both domain and codomain;
(ii) the basis T on both domain and codomain;
(iii) the basis S on the domain and basis T on the codomain.
2.23. Given a linear transformation defined by a matrix A, prove that the range of
the transformation is the span of the columns of A.
2.24. Show that the matrix representation of the operator L² : 𝔽[x; 4] → 𝔽[x; 4]
given by L²[p](x) = p″(x) on the standard polynomial basis [1, x, x², x³, x⁴]
equals (C_SS)², as given in Example 2.4.8. In other words, the matrix representation
of the second derivative operator is the square of the matrix
representation of the derivative operator.
2.25. The trace of an n × n matrix A = [a_{ij}], denoted tr(A), is the sum of its
diagonal entries; that is, tr(A) = a₁₁ + a₂₂ + ⋯ + a_{nn}. Show the following
for any n × n matrices A, B ∈ M_n(ℝ) (where B = [b_{ij}]):
(i) tr(AB) = tr(BA).
(ii) If A and B are similar, then tr(A) = tr(B). This implies, among
other things, that the trace is determined only by the linear operator
and does not depend on the choice of basis we use to write the matrix
representation of that operator.
(iii) tr(AB^T) = $\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}b_{ij}$.
2.26. Let p(z) = a_kz^k + a_{k−1}z^{k−1} + ⋯ + a₁z + a₀ be a polynomial. For A ∈ M_n(𝔽),
we define p(A) = a_kA^k + a_{k−1}A^{k−1} + ⋯ + a₁A + a₀I. Prove: If A and B are
similar, then p(A) and p(B) are also similar.
2.27. Prove that similarity is an equivalence relation on the set of all n x n matrices.
What are all the equivalence classes of 1 x 1 matrices?
2.28. Prove that if two matrices are similar and one is invertible, then so is the
other.
2.29. Prove that similarity preserves rank and nullity of matrices.
2.30. Prove: If 𝔽^m ≅ 𝔽^n, then m = n. Hint: Consider using the rank-nullity
theorem.
(ii) Write the matrix representation of the derivative D : 𝔽[x; n] → 𝔽[x; n−1]
in terms of the Bernstein bases.
(iii) The previous step is different from Example 2.6.6 because the domain
and codomain of the derivative are different. Use the previous step and
Exercise 2.32 to write a product of matrices that gives the matrix representation
of the derivative 𝔽[x; n] → 𝔽[x; n] (as an operator) in terms
of the Bernstein basis.
2.34. The map I : 𝔽[x; n] → 𝔽 given by f ↦ ∫₀¹ f(x) dx is a linear transformation.
Write the matrix representation of this linear transformation in terms of the
Bernstein basis for 𝔽[x; n] and the (single-element) standard basis for 𝔽.
2.35.† Define the nth Bernstein operator B_n : C([0, 1]; 𝔽) → 𝔽[x; n] to be the linear
transformation
$$B_n[f](x) = \sum_{j=0}^{n} f(j/n)\, B_j^n(x).$$
For any subspace W ⊂ C([0, 1]; 𝔽), the transformation B_n also defines a linear
transformation W → 𝔽[x; n].
(i) Find the matrix representation of B₁ : 𝔽[x; 2] → 𝔽[x; 1] in terms of the
Bernstein bases.
(ii) Let V ⊂ C([0, 1]; 𝔽) be the two-dimensional subspace spanned by the
set S = [sin x, cos x]. For every n ≥ 0, write the matrix representation
of the linear transformation B_n : V → 𝔽[x; n] in terms of the basis S of
V and the Bernstein basis of 𝔽[x; n].
2.36. Using only type III elementary matrices, reduce the matrix
[l ~]
0
(i) A= 1
0
A~ [~ ~]
0
(ii) 1
0
[~ l]
2
(iii) A= 5
7
[1 ~]
2
(iv) A = 2
0
A~ [l ~]
2
(v) 25
8
2.38.▲
(i) Let e_i and e_j be the ith and jth standard basis elements, respectively
(thought of as column vectors). Prove that the product e_i e_j^T is the
matrix with a one in its (i, j) entry and zeros everywhere else.
(ii) Let u, v ∈ 𝔽ⁿ, a ∈ 𝔽, and av^Tu ≠ 1. Prove that
$$(I - auv^T)^{-1} = I - \frac{auv^T}{av^Tu - 1}.$$
2.43. How many inversions are there in the permutation σ = [1, 4, 3, 2, 6, 5, 9, 8, 7]?
2.44. List all the permutations in S4 .
2.45. Use the definition of determinant to compute the determinant of the matrix
2.46. Prove that if a matrix A has a row (or column) of zeros, then det(A) = 0.
2.47. Recall (see Definition C.1.3) that the Hermitian conjugate A^H of any m × n
matrix is the conjugate transpose
$$A^H = \bar{A}^T = [\bar{a}_{ji}].$$
Prove that for any A ∈ M_n(ℂ) we have $\det(A^H) = \overline{\det(A)}$.
2.50. Show that if
$$M = \begin{bmatrix} A & B\\ 0 & D \end{bmatrix},$$
then det(M) = det(A) det(D). Hint: Use row operations.
2.51. Let x, y ∈ 𝔽ⁿ, and assume y^Hx ≠ 1. Show that det(I − xy^H) = 1 − y^Hx.
Hint: Note that
2.52. Show that if
$$M = \begin{bmatrix} A & B\\ C & D \end{bmatrix},$$
where A is invertible, then det(M) = det(A) det(D − CA⁻¹B). Hint: Use
Exercise 2.50 and the fact that
$$\begin{bmatrix} A & B\\ C & D \end{bmatrix} = \begin{bmatrix} A & 0\\ C & I \end{bmatrix}\begin{bmatrix} I & A^{-1}B\\ 0 & D - CA^{-1}B \end{bmatrix}.$$
2.53. Use Corollary 2.9.23 to compute the inverse of the matrix
A~[~ Hl
2.54. Use Cramer's rule to solve the system Ax= b in Exercise 2.37(v).
2.55.† Consider the matrix
$$V_n = \begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix}.$$
Prove that
$$\det(V_n) = \prod_{i<j} (x_j - x_i).$$
Hint: Row reduce the transpose. Subtract x₀ times row k − 1 from row k for
k = 1, …, n. Then factor out all the (x_k − x₀) terms to reduce the problem
and proceed recursively.
2.56.† The Wronskian gives a sufficient condition for determining that a set of n
functions in the space C^{n−1}([a, b]; 𝔽) is linearly independent. The Wronskian
W(x) of a set of functions 𝒮 = {y₁(x), y₂(x), …, y_n(x)} is defined to be
$$W(x) = \begin{vmatrix} y_1(x) & y_2(x) & \cdots & y_n(x)\\ y_1'(x) & y_2'(x) & \cdots & y_n'(x)\\ \vdots & \vdots & & \vdots\\ y_1^{(n-1)}(x) & y_2^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{vmatrix}.$$
Notes
As mentioned at the end of the previous chapter, some standard references for the
material of this chapter include [Lay02, Leo80, OS06, Str80, Axl15] and the video
explanations of G. Strang in [Str10].
Inner Product Spaces
¹⁴By the word algebraic we mean anything related to the study of mathematical symbols and the
rules for manipulating these symbols.
¹⁵In a real vector space, the inner product gives us angles by way of the law of cosines (see
Section 3.1), but in a complex vector space, the idea of an angle breaks down somewhat and
isn't very intuitive. However, for both real and complex vector spaces, orthogonality is well
defined by the inner product and is an extremely useful concept in theory, application, and
computation.
¹⁶An orthonormal set is one where each vector is orthogonal to every other vector and all of the
vectors are of unit length.
one typically has to solve a linear system in order to determine the coordinates; see
Example 1.2.14(iii) and Exercise 1.6 for details.
After establishing this and several other important properties of orthonormal
bases, we show that any finite linearly independent set can be transformed into
an orthonormal one that preserves the span. This algorithm is called the Gram-
Schmidt orthonormalization method, and it is widely used in both theory and applications.
This means that we can always construct an orthonormal basis for a
given finite-dimensional vector space or subspace, and then bring to bear the power
of the inner product in determining the coefficients of a given vector in this new
orthonormal basis.
More generally, the inner product also gives us the ability to compute the
lengths of vectors (or distances between vectors by computing the length of their
difference). In other words, the inner product induces a norm (or length function)
on a vector space. Armed with this induced norm, we can examine many ideas
from geometry, including perimeters, angles between vectors and subspaces, and
the Pythagorean theorem. A norm function also gives us the ability to establish
convergence properties for sequences, which in turn enables us to take limits of
vector-valued functions; this is at the heart of the theory of calculus on vector spaces.
Because of the far-reaching consequences of norm functions, we take a brief
detour from inner products and devote a section to the general properties of norms.
In particular, we consider examples of norm functions that are very useful in math-
ematical analysis but cannot be induced by an inner product. This allows us to
expand our way of thinking and understand the key properties of norm functions
apart from inner products and Euclidean geometry. Given a linear map from one
normed vector space to another, we can look at how these linear maps transform
unit vectors. In particular, we look at the maximum distortion obtained by map-
ping the set of all unit vectors. We show that this quantity is a norm on the set of
linear transformations from the domain to the codomain.
We return to properties of the inner product by considering how they affect
linear transformations. We introduce the adjoint map and fundamental subspaces.
These ideas help us to decompose the domains and codomains of a linear trans-
formation into complementary subspaces that are orthogonal to one another. As a
result, we are able to form orthonormal bases for each of these subspaces. One of
the biggest applications of this is least squares, which is fundamental to classical
statistics. It gives us the ability to fit lines to data and, more generally, to solve
many curve-fitting problems where the unknown coefficients can be formulated as
linear combinations of certain basis functions.
A vector space V together with an inner product (·, ·) is called an inner product
space and is denoted by the pair (V, (·, ·)).
Remark 3.1.2. If 𝔽 = ℝ, then the conjugate symmetry condition given in (iii)
simplifies to ⟨x, y⟩ = ⟨y, x⟩.
Proposition 3.1.3. Let (V, ⟨·,·⟩) be an inner product space. For any x, y, z ∈ V
and any a ∈ 𝔽, we have
(i) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩;
(ii) ⟨ax, y⟩ = ā⟨x, y⟩.
Proof.
Remark 3.1.4. From the proposition, we see that an inner product on a real vector
space is also linear in its first entry; that is, ⟨ax + by, z⟩ = a⟨x, z⟩ + b⟨y, z⟩. Since
it is linear in both entries, we say that the inner product on a real vector space is
bilinear. For complex vector spaces, however, the conjugate symmetry only makes
the inner product half linear in the first entry since sums can be pulled out of
the inner product, but scalars come out conjugated. Thus, the inner product on a
complex vector space is called sesquilinear,¹⁷ meaning one-and-a-half linear.
Example 3.1.5. In ℝⁿ, the standard inner product (or dot product) of
x = (a₁, …, a_n) and y = (b₁, …, b_n) is
$$\langle x, y\rangle = x^Ty = \sum_{i=1}^{n} a_i b_i. \tag{3.1}$$
¹⁷Some books define the inner product to be linear in the first entry instead of the second, but
linearity in the second entry gives a more natural connection to complex inner products.
The corresponding complex inner products are
$$\langle f, g\rangle = \int_a^b \overline{f(x)}\, g(x)\, dx \tag{3.4}$$
on L²([a, b]; ℂ), and the Frobenius inner product
$$\langle A, B\rangle = \operatorname{tr}(A^HB) \tag{3.5}$$
on M_{m×n}(ℂ).
Remark 3.1.8. In each of the three examples above, we give the real case and the
complex case separately to emphasize the importance of the conjugate term in the
inner product. Hereafter, we dispense with the distinction and simply write 𝔽ⁿ,
L²([a, b]; 𝔽), and M_{m×n}(𝔽), respectively. In each case, we use the complex version
of the inner product, since the complex inner product reduces to the real version if
the corresponding field is real.
Remark 3.1.9. Usually when denoting the inner product we just write ⟨·,·⟩. However,
if the theorem or problem we are studying has multiple vector spaces and/or
inner products, we distinguish the different inner products with a subscript denoting
the space, for example, ⟨·,·⟩_V.
Example 3.1.10. On the vector space ℓ² over the field 𝔽 (see Examples
1.1.6(iv) and 1.1.6(vi)), the standard inner product is given by
$$\langle x, y\rangle = \sum_{k=1}^{\infty} \bar{x}_k y_k.$$
Remark 3.1.12. By Definition 3.1.1(i), we know that ‖x‖ ≥ 0 for all x ∈ V.
We call this property positivity. Moreover, we know that ‖x‖ = 0 if and only
if x = 0. We can also show that the length function preserves scale; that is,
$\|ax\| = \sqrt{\langle ax, ax\rangle} = \sqrt{|a|^2\langle x, x\rangle} = |a|\,\|x\|$. The function ‖·‖ also has another key
property called the triangle inequality; this is examined in Section 3.5.
Remark 3.1.15. Note that ⟨y, x⟩ = 0 if and only if ⟨x, y⟩ = 0. In other words,
orthogonality is a symmetric relation between two vectors.
The zero vector 0 is orthogonal¹⁸ to every x ∈ V, since ⟨x, x⟩ = ⟨x, x + 0⟩ = ⟨x, x⟩ + ⟨x, 0⟩
forces ⟨x, 0⟩ = 0. In the following proposition, we show that the converse is also true.
Proof. If ⟨x, y⟩ = 0 holds for all x ∈ V, then it holds when x = y. Thus, we have
that 0 = ⟨y, y⟩ = ‖y‖², which implies that y = 0. □
¹⁸There is no consensus in the literature as to whether the two vectors in the definition should
have to be nonzero. As we see in the following section, it doesn't really matter that much since
we are usually more interested in orthonormality, which is when the vectors are orthogonal and
of unit length.
Definition 3.1.18. Let (V, ⟨·,·⟩) be an inner product space over ℝ. We define the
angle between two nonzero vectors x and y to be the unique angle θ ∈ [0, π] such
that
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}. \tag{3.8}$$
Remark 3.1.19. In ℝ² it is straightforward to show that (3.8) follows from the
law of cosines (see Figure 3.1), given by
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta.$$
Example 3.1.21. In L²([0, 1]; ℂ) the vectors f(t) = e^{2πit} and g(t) = e^{10πit}
are orthogonal, since
$$\langle f, g\rangle = \int_0^1 \overline{e^{2\pi i t}}\, e^{10\pi i t}\, dt = \int_0^1 e^{8\pi i t}\, dt = \left.\frac{e^{8\pi i t}}{8\pi i}\right|_0^1 = 0.$$
A similar computation shows that ‖f‖ = ‖g‖ = 1. If h = 4e^{2πit} + 3e^{10πit}, then
by the Pythagorean law, we have ‖h‖² = ‖4f‖² + ‖3g‖² = 25.
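Orthogonality claims like these can be sanity-checked numerically by approximating the integral on a fine grid. The sketch below is our illustration, assuming NumPy; the grid size is arbitrary.

```python
import numpy as np

# Approximate <f, g> = integral over [0, 1) of conj(f) * g by a Riemann sum.
t = np.linspace(0.0, 1.0, 200000, endpoint=False)
f = np.exp(2j * np.pi * t)
g = np.exp(10j * np.pi * t)

inner = np.mean(np.conj(f) * g)          # left Riemann sum, step 1/N absorbed by mean
assert abs(inner) < 1e-8                 # f and g are orthogonal
assert np.isclose(np.mean(np.abs(f) ** 2), 1.0)  # ||f|| = 1
```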
Definition 3.1.22. For any vector x ∈ V, x ≠ 0, and any v ∈ V, the orthogonal
projection of v onto span({x}) is the vector
$$\operatorname{proj}_{\operatorname{span}(\{x\})}(v) = \langle x, v\rangle\,\frac{x}{\|x\|^2}. \tag{3.9}$$
Replacing x by any nonzero scalar multiple of x leaves (3.9) unchanged,
so despite the fact that the definition of proj_{span({x})} depends explicitly on x, it
really is determined only by the one-dimensional subspace span({x}). Nevertheless,
for convenience we usually write proj_x(v), instead of the more cumbersome
proj_{span({x})}(v).
[Figure: the orthogonal projection proj_x(v) of v onto x and the residual r = v − proj_x(v).]
Proof. It is clear from the definition that proj_x is a linear operator. Since the
projection depends only on the span of x, we may replace x by the unit vector
u = x/‖x‖.
(i) For any v ∈ V we have
$$\operatorname{proj}_u(\operatorname{proj}_u(v)) = \langle u, \operatorname{proj}_u(v)\rangle u = \langle u, \langle u, v\rangle u\rangle u = \langle u, v\rangle\langle u, u\rangle u = \langle u, v\rangle u = \operatorname{proj}_u(v).$$
(ii) To show that au ⊥ r for any a ∈ 𝔽, take the inner product
$$\langle au, r\rangle = \langle au, v - \operatorname{proj}_u(v)\rangle = \langle au, v - \langle u, v\rangle u\rangle = \bar a\langle u, v\rangle - \bar a\langle u, v\rangle\langle u, u\rangle = 0.$$
(iii) Given any vector x ∈ span({u}), the square of the distance from v to x is
$$\|v - x\|^2 = \|r + \operatorname{proj}_u(v) - x\|^2 = \|r\|^2 + \|\operatorname{proj}_u(v) - x\|^2.$$
The last term is nonnegative and is zero only when x = proj_u(v), so ‖v − x‖ is
minimized when x = proj_u(v). □
Example 3.1.25. Consider the vector space ℝ[x] with inner product ⟨f, g⟩ =
∫_{−1}^{1} f(x)g(x) dx. If f(x) = 5x³ − 3x, then
$$\langle x_i, x_j\rangle = \delta_{ij}, \quad\text{where}\quad \delta_{ij} = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j \end{cases}$$
is the Kronecker delta.
Example 3.2.2. Let V = C([0, 1]; ℝ). We verify that S = {1, √12(x − 1/2)}
is an orthonormal set with the inner product
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\, dx. \tag{3.10}$$
Note that ⟨1, √12(x − 1/2)⟩ = √12 ∫₀¹ (x − ½) dx = √12(½ − ½) = 0. Also
⟨1, 1⟩ = ∫₀¹ 1 dx = 1 and
$$\langle \sqrt{12}(x - 1/2), \sqrt{12}(x - 1/2)\rangle = \int_0^1 12(x - 1/2)^2\, dx = \int_{-1/2}^{1/2} 12u^2\, du = 1.$$
Proof.
Remark 3.2.4. The values a_i = ⟨x_i, x⟩ in Theorem 3.2.3(i) are called the Fourier
coefficients of x. The ability to find these coefficients by simply taking the inner
product is the hallmark of orthonormal sets.
Definition 3.2.6. Let {x_i}_{i=1}^m be an orthonormal set that spans the subspace
X ⊂ V. For any v ∈ V we define the orthogonal projection onto X to be the sum
of the vector projections along each x_i; that is,
$$\operatorname{proj}_X(v) = \sum_{i=1}^{m} \langle x_i, v\rangle x_i. \tag{3.11}$$
(ii) For any x ∈ X, write x = $\sum_{i=1}^{m} c_i x_i$. Then
$$\langle x, r\rangle = \langle x, v - \operatorname{proj}_X(v)\rangle = \langle x, v\rangle - \left\langle \sum_{i=1}^{m} c_i x_i, v\right\rangle = \langle x, v\rangle - \langle x, v\rangle = 0.$$
(iii) Given any vector x ∈ X, the square of the distance from x to v is
$$\|v - x\|^2 = \|r + \operatorname{proj}_X(v) - x\|^2 = \|r\|^2 + \|\operatorname{proj}_X(v) - x\|^2.$$
The last term ‖proj_X(v) − x‖² is always nonnegative and is zero only when
proj_X(v) = x, and thus ‖v − x‖ is minimized when x = proj_X(v). □
Remark 3.2.8. Theorem 3.2.7(iii) above shows that the linear map proj_X depends
only on the subspace X and not on the particular choice of orthonormal basis
{x_i}_{i=1}^m.
¹⁹Many textbooks use the term unitary for complex-valued orthonormal square matrices, and they
call real-valued orthonormal matrices orthogonal matrices. In this text we prefer to use the term
orthonormal for both of these because it more accurately describes the essential nature of these
matrices, and it allows us to treat the real and complex cases identically.
Nota Bene 3.2.18. One way to determine if a given linear operator on 𝔽ⁿ
is orthonormal (with the usual inner product) is to represent it as a matrix in
the standard basis. If that matrix is orthonormal, then the linear operator is
orthonormal.
$$q_1 = \frac{x_1}{\|x_1\|} \quad\text{and}\quad q_k = \frac{x_k - p_{k-1}}{\|x_k - p_{k-1}\|}, \qquad k = 2, \dots, n,$$
where
$$p_{k-1} = \operatorname{proj}_{Q_{k-1}}(x_k) = \sum_{i=1}^{k-1} \langle q_i, x_k\rangle\, q_i \tag{3.15}$$
is the projection of x_k onto Q_{k−1} = span({q₁, …, q_{k−1}}).
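The Gram-Schmidt recursion above translates directly into code. The following Python sketch is our illustration (assuming NumPy; the function name is ours); it orthonormalizes the columns of a matrix using exactly the projections p_{k−1} of (3.15).

```python
import numpy as np

def gram_schmidt(X):
    # Orthonormalize the columns of X, assumed linearly independent.
    m, n = X.shape
    Q = np.zeros((m, n))
    for k in range(n):
        p = Q[:, :k] @ (Q[:, :k].T @ X[:, k])  # projection onto span(q_1..q_{k-1})
        r = X[:, k] - p
        Q[:, k] = r / np.linalg.norm(r)
    return Q

A = np.array([[1.0, -2.0, 3.5],
              [1.0, 3.0, -0.5],
              [1.0, 3.0, 2.5],
              [1.0, -2.0, 0.5]])
Q = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))  # the columns are orthonormal
```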
and
$$q_2 = \frac{x_2 - p_1}{\|x_2 - p_1\|}.$$
Example 3.3.3. Consider the inner product space 𝔽[x; 2] with the inner
product
$$\langle f, g\rangle = \int_{-1}^{1} f(x)g(x)\, dx. \tag{3.16}$$
Applying Gram-Schmidt to the basis [1, x, x²], we have
$$q_1 = \frac{1}{\|1\|} = \frac{1}{\sqrt2} \quad\text{and}\quad p_1 = \left\langle \frac{1}{\sqrt2}, x\right\rangle\frac{1}{\sqrt2} = 0.$$
It follows that
$$q_2 = \frac{x - p_1}{\|x - p_1\|} = \sqrt{\frac32}\, x.$$
Corollary 3.3.6. If (V, ⟨·,·⟩_V) is an n-dimensional inner product space over 𝔽,
then it is isomorphic to 𝔽ⁿ with the standard inner product (3.2).
Proof. Using Theorem 3.3.1, we choose an orthonormal basis [x₁, …, x_n] for V
and define the map L : V → 𝔽ⁿ by
$$L(v) = (\langle x_1, v\rangle, \langle x_2, v\rangle, \dots, \langle x_n, v\rangle).$$
This is clearly a bijective linear transformation. To see that the inner product is
preserved, we note that if x = $\sum_{j=1}^{n} b_j x_j$ and y = $\sum_{k=1}^{n} c_k x_k$, then
$$\langle x, y\rangle_V = \sum_{j=1}^{n}\sum_{k=1}^{n} \bar b_j c_k \langle x_j, x_k\rangle = \sum_{j=1}^{n} \bar b_j c_j = \langle L(x), L(y)\rangle_{\mathbb{F}^n}. \;\square$$
Remark 3.3.7. Just as Corollary 2.3.12 is a big deal for vector spaces, it follows
that Corollary 3.3.6 is a big deal for inner product spaces. It means that although
there are many different descriptions of finite-dimensional inner product spaces,
there is essentially (up to isomorphism) only one n-dimensional inner product space
over 𝔽 for each n ∈ ℕ. Anything we can prove on 𝔽ⁿ with the standard inner
product automatically holds for any finite-dimensional inner product space over 𝔽.
Example 3.3.8. Recall that the orthonormal basis {q₁, q₂, q₃}, computed in
Example 3.3.3, has the following transition matrix:
$$[q_1, q_2, q_3] = [1, x, x^2]\begin{bmatrix} \frac{1}{\sqrt2} & 0 & -\sqrt{\frac58}\\[3pt] 0 & \sqrt{\frac32} & 0\\[3pt] 0 & 0 & 3\sqrt{\frac58} \end{bmatrix}.$$
Consider the linear isomorphism L : 𝔽[x; 2] → 𝔽³, given by L(q_i) = e_i.
Since [q₁, q₂, q₃] is orthonormal, this map also preserves the inner product,
so ⟨f, g⟩_{𝔽[x;2]} = ⟨L(f), L(g)⟩_{𝔽³}.
To express this map in terms of the power basis [1, x, x²], first write an
arbitrary element of 𝔽[x; 2] as p(x) = ax² + bx + c = [1, x, x²][c b a]^T, and
then compute L(p) by applying the inverse of the transition matrix above to [c b a]^T.
Proof. Since A has rank n, the columns of A are linearly independent, so we can
apply Theorem 3.3.1. Let x₁, …, x_n be the columns of A. By the replacement
theorem (Theorem 1.4.1) there are vectors x_{n+1}, …, x_m such that the space 𝔽^m is
spanned by x₁, …, x_m.
Using the notation in Theorem 3.3.1, let p₀ = 0, let r_{jk} = ⟨q_j, x_k⟩ for j =
1, 2, …, k − 1, and let r_{kk} = ‖x_k − p_{k−1}‖. We have
x₁ = r₁₁q₁,
x₂ = r₁₂q₁ + r₂₂q₂,
⋮
x_n = r₁ₙq₁ + r₂ₙq₂ + ⋯ + r_{nn}q_n.
Let Q be the orthonormal matrix [q₁, …, q_m], and let R be the first n columns of
the upper-triangular matrix [r_{ij}] above. Since the columns of A are x₁, …, x_n, this
gives A = QR, as required. □
Remark 3.3.10. Along with the full QR decomposition given in Theorem 3.3.9,
another useful decomposition is the reduced QR decomposition. Given a matrix
A ∈ M_{m×n} of rank n ≤ m, the reduced QR decomposition of A is the factorization
A = Q̂R̂, where Q̂ is an m × n orthonormal matrix and R̂ is an n × n upper-triangular
matrix.
Example 3.3.11. To compute the reduced QR decomposition of
$$A = \begin{bmatrix} 1 & -2 & 3.5\\ 1 & 3 & -0.5\\ 1 & 3 & 2.5\\ 1 & -2 & 0.5 \end{bmatrix},$$
write the columns as x₁ = [1 1 1 1]^T, x₂ = [−2 3 3 −2]^T, and x₃ = ½[7 −1 5 1]^T.
Then r₁₁ = ‖x₁‖ = 2 and q₁ = ½[1 1 1 1]^T, and with r₁₂ = ⟨q₁, x₂⟩ = 1 we get
$$x_2 - p_1 = x_2 - r_{12}q_1 = \frac52[-1\;\; 1\;\; 1\;\; {-1}]^T \quad\text{and}\quad q_2 = \frac12[-1\;\; 1\;\; 1\;\; {-1}]^T.$$
Continuing in this way gives
$$\widehat R = \begin{bmatrix} r_{11} & \langle q_1, x_2\rangle & \langle q_1, x_3\rangle\\ 0 & \|x_2 - p_1\| & \langle q_2, x_3\rangle\\ 0 & 0 & \|x_3 - p_2\| \end{bmatrix} = \begin{bmatrix} 2 & 1 & 3\\ 0 & 5 & -1\\ 0 & 0 & 3 \end{bmatrix}.$$
For the full QR decomposition, we can choose any additional x₄ such that
[x₁, x₂, x₃, x₄] spans ℝ⁴, and then continue the Gram-Schmidt process. A convenient
choice in this example is x₄ = e₄ = [0 0 0 1]^T. We calculate r₁₄ =
⟨q₁, x₄⟩ = 1/2 and r₂₄ = r₃₄ = −1/2. This gives p₄ = ½(q₁ − q₂ − q₃) and
x₄ − p₄ = ¼[−1 −1 1 1]^T, from which we find q₄ = ½[−1 −1 1 1]^T.
Thus for the full QR decomposition, we have
$$Q = \frac12\begin{bmatrix} 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & 1 & 1 & 1\\ 1 & -1 & -1 & 1 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} 2 & 1 & 3\\ 0 & 5 & -1\\ 0 & 0 & 3\\ 0 & 0 & 0 \end{bmatrix}.$$
Remark 3.3.13. The QR decomposition can also be used to solve a linear system
of the form Ax = b. By writing A = QR we have QRx = b, and since Q is
orthonormal, we have
$$Rx = Q^TQRx = Q^Tb.$$
Since R is upper triangular, the system Rx = Q^Tb can be backsolved to find x. This
method takes about twice as many operations as Gaussian elimination, but is more
stable. In practice, however, the cases where Gaussian elimination is unstable are
extremely rare, and so Gaussian elimination is usually preferred for solving dense
linear systems.
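The solution procedure described in this remark looks as follows in code. The sketch is our illustration (assuming NumPy; the function name is ours): it computes a reduced QR decomposition with a library call and then backsolves the triangular system.

```python
import numpy as np

def solve_via_qr(A, b):
    # Rx = Q^T b, backsolved since R is upper triangular.
    Q, R = np.linalg.qr(A)           # reduced QR decomposition
    y = Q.T @ b
    n = R.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):   # back substitution
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 4.0])
assert np.allclose(solve_via_qr(A, b), np.linalg.solve(A, b))
```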
Nota Bene 3.4.2. Beware that although v is a vector, v^⊥ is a set (in fact a
whole subspace), not a vector.
Recall that proj_v(x) is the projection of x onto the nonzero vector v, and thus
the residual x − proj_v(x) is orthogonal to v; see Proposition 3.1.24(ii) for details.
This means that we can project an arbitrary vector x ∈ 𝔽ⁿ onto v^⊥ by the map
$$\operatorname{proj}_{v^\perp}(x) = x - \operatorname{proj}_v(x) = x - \frac{v}{\|v\|^2}\langle v, x\rangle = \left(I - \frac{vv^H}{v^Hv}\right)x.$$
Definition 3.4.3. Fix a nonzero vector v ∈ 𝔽ⁿ. For any x ∈ 𝔽ⁿ the Householder
transformation reflecting x across the orthogonal complement v^⊥ of v is given by
$$H_v(x) = x - 2\operatorname{proj}_v(x) = \left(I - 2\frac{vv^H}{v^Hv}\right)x. \tag{3.18}$$
Assuming v = x + ‖x‖e₁ (with x real and x₁ the first component of x), we have
⟨v, x⟩ = ‖x‖² + x₁‖x‖ and ⟨v, v⟩ = 2(‖x‖² + x₁‖x‖), so
$$H_v(x) = x - 2\frac{\langle v, x\rangle}{\langle v, v\rangle}\,v = x - \frac{(\|x\| + x_1)x + \|x\|(x_1 + \|x\|)e_1}{\|x\| + x_1} = -\|x\|e_1. \;\square$$
Lemma 3.4.7. Given x ∈ ℂⁿ, let x₁ denote the first component of x. Recall²¹
that the complex sign of x₁ is sign(x₁) = x₁/|x₁| when x₁ ≠ 0, and sign(0) = 1.
Choosing v = x + sign(x₁)‖x‖e₁ implies that H_v(x) = −sign(x₁)‖x‖e₁ ∈ span{e₁}.
Proof. Let x₁ be the first column of the matrix A. Following Lemma 3.4.7, set
v₁ = x₁ + e^{iθ₁}‖x₁‖e₁, where e^{iθ₁} = sign(x₁), so that the Householder reflection H_{v₁}
takes x₁ to the span of e₁. Therefore, H_{v₁}A has the form
$$H_{v_1}A = \begin{bmatrix} * & * & \cdots & *\\ 0 & * & \cdots & *\\ \vdots & \vdots & & \vdots\\ 0 & * & \cdots & * \end{bmatrix},$$
where * indicates an entry of arbitrary value.
Now let x₂ be the second column of the new matrix H_{v₁}A. We decompose x₂
into x₂ = x₂′ + x₂″, where x₂′ and x₂″ are of the form
$$x_2' = [z_1\;\; 0\;\; \cdots\;\; 0]^T \quad\text{and}\quad x_2'' = [0\;\; z_2\;\; \cdots\;\; z_m]^T.$$
Set v₂ = x₂″ + e^{iθ₂}‖x₂″‖e₂. Since x₂″ and e₂ are both orthogonal to e₁, the vector v₂
is also orthogonal to e₁. Thus, the reflection H_{v₂} acts as the identity on the span
of e₁, and H_{v₂} leaves x₂′ and the first column of H_{v₁}A unchanged.
We now have H_{v₂}x₂ = H_{v₂}x₂′ + H_{v₂}x₂″ = x₂′ + H_{v₂}x₂″. By Lemma 3.4.7 the
vector H_{v₂}x₂″ lies in span{e₂}; so H_{v₂}x₂ ∈ span{e₁, e₂}, and the matrix H_{v₂}H_{v₁}A
has the form
$$H_{v_2}H_{v_1}A = \begin{bmatrix} * & * & * & \cdots & *\\ 0 & * & * & \cdots & *\\ 0 & 0 & * & \cdots & *\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & 0 & * & \cdots & * \end{bmatrix}.$$
Repeat the procedure for each k = 3, 4, …, ℓ, choosing x_k equal to the kth
column of H_{v_{k−1}} ⋯ H_{v₂}H_{v₁}A, and decomposing x_k = x_k′ + x_k″, where x_k′ and x_k″
are of the form
$$x_k' = [z_1\;\cdots\;z_{k-1}\;\; 0\;\cdots\; 0]^T \quad\text{and}\quad x_k'' = [0\;\cdots\; 0\;\; z_k\;\cdots\; z_m]^T.$$
Set v_k = x_k″ + e^{iθ_k}‖x_k″‖e_k. The vectors x_k″ and e_k are both orthogonal to the
subspace spanned by e₁, …, e_{k−1}, so v_k is as well. This implies that the reflection
H_{v_k} acts as the identity on the span of e₁, …, e_{k−1}. In other words, since the
first k − 1 columns of H_{v_{k−1}} ⋯ H_{v₂}H_{v₁}A are in upper-triangular form, they remain
unchanged by H_{v_k}. Moreover, we have H_{v_k}x_k = x_k′ + H_{v_k}x_k″. By Lemma 3.4.7 we
have that H_{v_k}x_k″ ∈ span{e_k}, and so H_{v_k}x_k ∈ span{e₁, e₂, …, e_k}, as desired.
Upon termination, we set R = H_{v_ℓ} ⋯ H_{v₂}H_{v₁}A, noting that it is upper
triangular. The QR decomposition follows by taking Q = H_{v₁}^H H_{v₂}^H ⋯ H_{v_ℓ}^H. By
Proposition 3.4.4 each H_{v_k} is orthonormal, so the product Q is as well. □
Remark 3.4.10. Recall that at the kth reflection, the first k − 1 columns are
unaffected by H_{v_k}. It turns out that the first k − 1 rows are also unaffected by H_{v_k}.
This is harder to see than for columns, but it follows from the fact that the kth
Householder transform is block diagonal with the (1, 1) block being the (k − 1) ×
(k − 1) identity. The numerical algorithms for computing the QR decomposition
take advantage of these facts and skip those parts of the calculations. This speeds
up these algorithms considerably.
To illustrate, we compute the QR decomposition of the matrix from Example 3.3.11
using Householder transformations:
$$A = \begin{bmatrix} 1 & -2 & 3.5\\ 1 & 3 & -0.5\\ 1 & 3 & 2.5\\ 1 & -2 & 0.5 \end{bmatrix}.$$
The first column is x₁ = [1 1 1 1]^T with ‖x₁‖ = 2. Taking v₁ = x₁ − ‖x₁‖e₁ = [−1 1 1 1]^T
gives
$$H_{v_1} = \frac12\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ 1 & -1 & 1 & -1\\ 1 & -1 & -1 & 1 \end{bmatrix},$$
which yields
$$H_{v_1}A = \begin{bmatrix} 2 & 1 & 3\\ 0 & 0 & 0\\ 0 & 0 & 3\\ 0 & -5 & 1 \end{bmatrix}.$$
Let x₂ be the second column of H_{v₁}A, noting that x₂″ = [0 0 0 −5]^T.
Now set v₂ = x₂″ − ‖x₂″‖e₂ = [0 −5 0 −5]^T. This gives
$$H_{v_2} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0 \end{bmatrix},$$
which yields
$$H_{v_2}H_{v_1}A = \begin{bmatrix} 2 & 1 & 3\\ 0 & 5 & -1\\ 0 & 0 & 3\\ 0 & 0 & 0 \end{bmatrix} = R.$$
Therefore,
$$Q = H_{v_1}^TH_{v_2}^T = \frac12\begin{bmatrix} 1 & -1 & 1 & -1\\ 1 & 1 & -1 & -1\\ 1 & 1 & 1 & 1\\ 1 & -1 & -1 & 1 \end{bmatrix},$$
where the second equality comes from the fact that H_{v₁} and H_{v₂} are orthonormal
matrices.
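The proof's recipe, reflect column by column and accumulate the reflections, can be implemented in a few lines. The following Python sketch is our illustration (assuming NumPy; the function name and the sign convention for v are ours), and it reproduces a valid QR factorization of the matrix above.

```python
import numpy as np

def householder_qr(A):
    # Accumulate Householder reflections H_{v_k} as in the proof above.
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = R[k:, k]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue
        s = np.sign(x[0]) if x[0] != 0 else 1.0
        v = x.copy()
        v[0] += s * alpha                 # v = x + sign(x_1) ||x|| e_1
        H = np.eye(m)
        H[k:, k:] -= 2.0 * np.outer(v, v) / (v @ v)
        R = H @ R
        Q = Q @ H.T
    return Q, R

A = np.array([[1.0, -2.0, 3.5],
              [1.0, 3.0, -0.5],
              [1.0, 3.0, 2.5],
              [1.0, -2.0, 0.5]])
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(np.tril(R, -1), 0.0)   # R is upper triangular
```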
Definition 3.5.1. A norm on a vector space V is a map ‖·‖ : V → [0, ∞) satisfying
the following conditions for all x, y ∈ V and all a ∈ 𝔽:
(i) Positivity: ‖x‖ ≥ 0, with equality if and only if x = 0.
(ii) Scale preservation: ‖ax‖ = |a|‖x‖.
(iii) Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
If ‖·‖ satisfies all the conditions above except that ‖x‖ = 0 need not imply that
x = 0, then ‖·‖ is called a seminorm. A vector space with a norm is called a
normed linear space and is often denoted by the pair (V, ‖·‖).
Theorem 3.5.2. Every inner product space is a normed linear space with norm
‖x‖ = √⟨x, x⟩.
Proof. The first two conditions follow immediately from the definition of an inner
product; see Remark 3.1.12. It remains to verify the triangle inequality:
$$\|x + y\|^2 = \|x\|^2 + \langle x, y\rangle + \langle y, x\rangle + \|y\|^2 \le \|x\|^2 + 2|\langle x, y\rangle| + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,$$
where the second inequality is the Cauchy-Schwarz inequality. □
Remark 3.5.3. In most situations, the most natural norm to use on an inner product space is the induced norm $\|x\|^2 = \langle x, x\rangle$. Whenever we are working in an inner product space, we will almost always use the induced norm, unless we explicitly say otherwise.
Example 3.5.4. For any $x = [x_1\ x_2\ \cdots\ x_n]^T \in \mathbb{F}^n$, the following are norms on $\mathbb{F}^n$:

(i) 1-norm.
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|. \qquad (3.19)$$
The 1-norm is sometimes called the taxicab norm or the Manhattan norm because it tracks the distance traversed along streets in a rectilinear grid.

(ii) 2-norm.
$$\|x\|_2 = \left(\sum_{i=1}^{n} |x_i|^2\right)^{1/2}. \qquad (3.20)$$
This is the usual Euclidean norm, and it measures the distance that a crow would fly to get from one point to another.

(iii) $\infty$-norm.
$$\|x\|_\infty = \max_{1\le i\le n} |x_i|. \qquad (3.21)$$
Figure 3.5 shows the sets $\{x \in \mathbb{R}^2 : \|x\| \le 1\}$ for the 1-norm, the 2-norm, and the $\infty$-norm, and Figure 3.6 shows the analogous sets in $\mathbb{R}^3$. As mentioned above, the 2-norm arises from the standard inner product on $\mathbb{F}^n$, but neither the 1-norm nor the $\infty$-norm arises from any inner product.

Figure 3.5. The closed unit ball $\{x \in \mathbb{R}^2 : \|x\| \le 1\}$ for the 1-norm, the 2-norm, and the $\infty$-norm in $\mathbb{R}^2$, as discussed in Example 3.5.4.

Figure 3.6. The closed unit ball $\{x \in \mathbb{R}^3 : \|x\| \le 1\}$ for the 1-norm, the 2-norm, and the $\infty$-norm in $\mathbb{R}^3$, as discussed in Example 3.5.4.
Example 3.5.5. The 1- and 2-norms are special cases of the $p$-norms ($p \ge 1$) given by
$$\|x\|_p = \left(\sum_{j=1}^{n} |x_j|^p\right)^{1/p}. \qquad (3.22)$$
We show in Corollary 3.6.7 that the $p$-norm satisfies the triangle inequality. It is not hard to show that the $\infty$-norm is the limit of $p$-norms as $p \to \infty$; that is,
$$\|x\|_\infty = \lim_{p\to\infty}\|x\|_p.$$
It is common to say that the $\infty$-norm is the $p$-norm for $p = \infty$.
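This limit is easy to check numerically. A short sketch, assuming NumPy (the helper `p_norm` is our own name, not the text's):

```python
import numpy as np

def p_norm(x, p):
    """The p-norm of (3.22); p may be any real number >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0, 1.0])
for p in (1, 2, 10, 100):
    print(p, p_norm(x, p))          # approaches max|x_i| = 4 as p grows
print("inf", np.max(np.abs(x)))     # the infinity norm, the p -> oo limit
```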
Example 3.5.6. The Frobenius norm on matrices is given by
$$\|A\|_F = \sqrt{\operatorname{tr}(A^HA)}.$$
This arises from the Frobenius (or Hilbert--Schmidt) inner product (3.5). It is worth noting that the square of the Frobenius norm is just the sum of the squares of the moduli of the matrix elements. Thus, if you stack the elements of $A$ into a long vector of dimension $mn \times 1$, the Frobenius norm of $A$ is just the usual 2-norm of that stacked vector.

The Frobenius norm holds a prominent place in applied linear algebra, in part because it is easy to compute, but also because it is invariant under orthonormal transformations; that is, $\|UA\|_F = \|A\|_F$ whenever $U$ is an orthonormal matrix.
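Both observations are easy to verify numerically. A small sketch, assuming NumPy (the random matrices are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))

fro = np.sqrt(np.trace(A.T @ A))          # ||A||_F = sqrt(tr(A^H A))
stacked = np.linalg.norm(A.reshape(-1))   # 2-norm of the stacked entries
print(np.isclose(fro, stacked))           # True

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))        # an orthonormal matrix
print(np.isclose(np.linalg.norm(Q @ A, "fro"), fro))    # invariance: True
```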
Example 3.5.7. For $p \in [1,\infty)$ the usual norm on the space $L^p([a,b];\mathbb{F})$ (see Example 1.1.6(iv)) is
$$\|f\|_{L^p} = \left(\int_a^b |f|^p\,dx\right)^{1/p}.$$
Similarly, for $p = \infty$ the usual choice of norm on the space $L^\infty([a,b];\mathbb{F})$ is
$$\|f\|_{L^\infty} = \sup_{x\in[a,b]}|f(x)|.$$
This last example is an especially important one that we use throughout this book. It is sometimes called the sup norm.ᵃ

ᵃ sup is short for supremum.
Definition 3.5.8. For any normed linear space $Y$ and any set $X$, let $L^\infty(X;Y)$ be the set of all bounded functions from $X$ to $Y$, that is, functions $f : X \to Y$ such that the $L^\infty$-norm $\|f\|_{L^\infty} = \sup_{x\in X}\|f(x)\|_Y$ is finite.
Definition 3.5.10. Given two normed linear spaces $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$, let $\mathscr{B}(V;W)$ denote the set of bounded linear transformations, that is, the set of linear maps $T : V \to W$ for which the quantity
$$\|T\|_{V,W} = \sup_{x\neq 0}\frac{\|T(x)\|_W}{\|x\|_V} = \sup_{\|x\|_V = 1}\|T(x)\|_W \qquad (3.23)$$
is finite. The quantity $\|\cdot\|_{V,W}$ is called the induced norm on $\mathscr{B}(V;W)$, that is, the norm induced by $\|\cdot\|_V$ and $\|\cdot\|_W$. For convenience, if $T : V \to V$ is a linear operator, then we usually write $\mathscr{B}(V)$ to denote $\mathscr{B}(V;V)$, and we write $\|\cdot\|_V$ to denote the induced norm $\|\cdot\|_{V,V}$. We call $\mathscr{B}(V)$ the set of bounded linear operators on $V$, and we call the induced norm $\|\cdot\|_V$ on $\mathscr{B}(V)$ the operator norm.
Theorem 3.5.11. The set $\mathscr{B}(V;W)$, with operations of vector addition and scalar multiplication defined pointwise, is a vector subspace of $\mathscr{L}(V;W)$, and the pair $(\mathscr{B}(V;W), \|\cdot\|_{V,W})$ is a normed linear space.

Proof. Scale preservation of the induced norm follows from
$$\|\alpha T\|_{V,W} = \sup_{x\neq 0}\frac{\|\alpha T(x)\|_W}{\|x\|_V} = \sup_{x\neq 0}\frac{|\alpha|\,\|T(x)\|_W}{\|x\|_V} = |\alpha|\sup_{x\neq 0}\frac{\|T(x)\|_W}{\|x\|_V} = |\alpha|\,\|T\|_{V,W}.$$
Positivity and the triangle inequality are verified similarly. □
Remark 3.5.12. Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed linear spaces. The induced norm of $L \in \mathscr{B}(V;W)$ satisfies $\|L(x)\|_W \le \|L\|_{V,W}\|x\|_V$ for all $x \in V$.
Theorem 3.5.14. Let $(V, \|\cdot\|_V)$, $(W, \|\cdot\|_W)$, and $(X, \|\cdot\|_X)$ be normed linear spaces. If $T \in \mathscr{B}(V;W)$ and $S \in \mathscr{B}(W;X)$, then $ST \in \mathscr{B}(V;X)$ and
$$\|ST\|_{V,X} \le \|S\|_{W,X}\,\|T\|_{V,W}.$$
Definition 3.5.15. Any norm $\|\cdot\|$ on the finite-dimensional vector space $M_n(\mathbb{F})$ that satisfies the submultiplicative property $\|AB\| \le \|A\|\,\|B\|$ is called a matrix norm.
Example 3.5.17. Using the $p$-norm on both $\mathbb{F}^m$ and $\mathbb{F}^n$ yields the induced matrix norm on $M_{m\times n}(\mathbb{F})$ defined by
$$\|A\|_p = \sup_{x\neq 0}\frac{\|Ax\|_p}{\|x\|_p}, \qquad 1 \le p \le \infty. \qquad (3.26)$$
Unexample 3.5.18. The Frobenius norm (see Example 3.5.6) is not an induced norm, but, as shown in Exercise 4.28, it does satisfy the submultiplicative property $\|AB\|_F \le \|A\|_F\|B\|_F$.
Example 3.5.19. Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space with the usual norm $\|\cdot\|$. If the linear operator $L : V \to V$ is orthonormal, then $\|L(x)\| = \|x\|$ for every $x \in V$, which implies that the induced norm $\|\cdot\|$ on $\mathscr{B}(V)$ satisfies $\|L\| = 1$. A linear operator that preserves the norm of every vector is called an isometry. This is a much stronger condition than just saying that $\|L\| = 1$; in fact an isometry preserves both lengths and angles.
For $A \in M_{m\times n}(\mathbb{F})$, the induced $1$-norm and $\infty$-norm satisfy
$$\|A\|_1 = \sup_{1\le j\le n}\sum_{i=1}^{m}|a_{ij}| \qquad (3.27)$$
and
$$\|A\|_\infty = \sup_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|. \qquad (3.28)$$
In other words, the 1-norm and the $\infty$-norm are, respectively, the largest column and row sums (after taking the modulus of each entry).
Proof. We prove (3.28) and leave (3.27) to the reader (Exercise 3.27). Note that
$$\|Ax\|_\infty = \sup_{1\le i\le m}\Big|\sum_{j=1}^{n}a_{ij}x_j\Big| \le \sup_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|\,|x_j| \le \Big(\sup_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|\Big)\|x\|_\infty.$$
Hence, for all $x \neq 0$, we have
$$\frac{\|Ax\|_\infty}{\|x\|_\infty} \le \sup_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|.$$
It suffices now to prove the reverse inequality over all $x \in \mathbb{F}^n$. Let $k$ be the row index satisfying
$$\sum_{j=1}^{n}|a_{kj}| = \sup_{1\le i\le m}\sum_{j=1}^{n}|a_{ij}|.$$
Let $x$ be the vector whose $i$th entry is $0$ if $a_{ki} = 0$ and is $\overline{a_{ki}}/|a_{ki}|$ if $a_{ki} \neq 0$. The only way $x$ could be $0$ is if $A = 0$, in which case the theorem is clearly true; so we may assume that at least one of the entries of $x$ is not zero, and thus $\|x\|_\infty = 1$. We have
$$\|Ax\|_\infty \ge \Big|\sum_{j=1}^{n}a_{kj}x_j\Big| = \sum_{j=1}^{n}|a_{kj}|,$$
which completes the proof. □
Nota Bene 3.5.22. One way to remember that the 1-norm is the largest
column sum is to observe that the number one looks like a column. Similarly,
the symbol for infinity is more horizontal in shape, corresponding to the fact
that the infinity norm is the largest row sum.
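A quick numerical check of (3.27) and (3.28), assuming NumPy (np.linalg.norm computes the induced matrix norms for $p = 1$ and $p = \infty$):

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [4.0,  5.0, -6.0]])

col_sums = np.abs(A).sum(axis=0)   # ||A||_1 is the largest column sum (3.27)
row_sums = np.abs(A).sum(axis=1)   # ||A||_oo is the largest row sum (3.28)
print(col_sums.max(), np.linalg.norm(A, 1))       # 9.0 9.0
print(row_sums.max(), np.linalg.norm(A, np.inf))  # 15.0 15.0
```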
In this section, we examine several important norm inequalities that are fundamental in applied analysis. In particular, we prove the Minkowski and Hölder inequalities. Before doing so, however, we prove Young's inequality, which is one of the most widely used inequalities in all of mathematics.
Lemma 3.6.1. If $\frac{1}{p} + \frac{1}{q} = 1$, where $1 < p, q < \infty$ (meaning both $1 < p < \infty$ and $1 < q < \infty$), then for all real $x > 0$ we have
$$1 \le \frac{x}{p} + \frac{x^{1-q}}{q}. \qquad (3.29)$$
Proof. Let $f(x)$ be the right side of (3.29); that is, $f(x) = \frac{x}{p} + \frac{x^{1-q}}{q}$. Note that $f(1) = 1$. It suffices to show that $x = 1$ is the minimum of $f(x)$. Clearly,
$$f'(x) = \frac{1}{p} + \frac{(1-q)x^{-q}}{q},$$
which simplifies to $f'(x) = \frac{1}{p}(1 - x^{-q})$, since $\frac{1}{p} = \frac{q-1}{q}$. This implies that $x = 1$ is the only critical point. Since $f''(x) = \frac{q}{p}x^{-q-1} > 0$ for all $x > 0$, it follows that $f$ attains its global minimum at $x = 1$. □
Young's inequality states that if $\frac{1}{p} + \frac{1}{q} = 1$ with $1 < p, q < \infty$, then for all $a, b \ge 0$ we have
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}. \qquad (3.30)$$

Proof. If $a = 0$ or $b = 0$, the inequality is immediate, so assume $a, b > 0$. Setting $x = a^{p-1}/b$ in (3.29) gives
$$1 \le \frac{1}{p}\left(\frac{a^{p-1}}{b}\right) + \frac{1}{q}\left(\frac{a^{p-1}}{b}\right)^{1-q} = \frac{a^{p-1}}{pb} + \frac{a^{(p-1)(1-q)}b^{q-1}}{q} = \frac{a^{p-1}}{pb} + \frac{b^{q-1}}{qa} = \frac{1}{ab}\left(\frac{a^p}{p} + \frac{b^q}{q}\right), \qquad (3.32)$$
where we use $(p-1)(1-q) = -1$, which follows from $p + q = pq$. Multiplying through by $ab$ gives (3.30). □
Hölder's inequality states that if $\frac{1}{p} + \frac{1}{q} = 1$ with $1 \le p, q \le \infty$, then for all $x, y \in \mathbb{F}^n$,
$$\sum_{k=1}^{n}|x_ky_k| \le \|x\|_p\,\|y\|_q.$$

Proof. When $p = 1$ and $q = \infty$, the result is immediate. Since $|y_k| \le \|y\|_\infty$, it follows that
$$\sum_{k=1}^{n}|x_ky_k| \le \sum_{k=1}^{n}|x_k|\,\|y\|_\infty = \|x\|_1\|y\|_\infty.$$
When $1 < p, q < \infty$, Young's inequality gives
$$\frac{|x_ky_k|}{\|x\|_p\|y\|_q} \le \frac{|x_k|^p}{p\|x\|_p^p} + \frac{|y_k|^q}{q\|y\|_q^q}. \qquad (3.34)$$
Summing over $k$ yields
$$\frac{\sum_{k=1}^{n}|x_ky_k|}{\|x\|_p\|y\|_q} \le \frac{1}{p} + \frac{1}{q} = 1,$$
which is the desired inequality. □
Minkowski's inequality states that for $1 \le p \le \infty$ and all $x, y \in \mathbb{F}^n$,
$$\|x + y\|_p \le \|x\|_p + \|y\|_p.$$
Minkowski's inequality is precisely the triangle inequality for the $p$-norm, thus showing that the $p$-norm is, indeed, a norm on $\mathbb{F}^n$.

Proof. The cases $p = 1$ and $p = \infty$ follow from the triangle inequality in $\mathbb{F}$, so assume $1 < p < \infty$ and let $q$ satisfy $\frac{1}{q} = 1 - \frac{1}{p}$. Writing $|x_k + y_k|^p \le |x_k + y_k|^{p-1}(|x_k| + |y_k|)$ and applying Hölder's inequality to each of the two resulting sums (noting $(p-1)q = p$), we have
$$\sum_{k=1}^{n}|x_k + y_k|^p \le \left(\sum_{k=1}^{n}|x_k + y_k|^p\right)^{1/q}\|x\|_p + \left(\sum_{k=1}^{n}|x_k + y_k|^p\right)^{1/q}\|y\|_p = \|x+y\|_p^{p/q}(\|x\|_p + \|y\|_p) = \|x+y\|_p^{p-1}(\|x\|_p + \|y\|_p). \qquad (3.36)$$
Dividing both sides by $\|x+y\|_p^{p-1}$ (the case $\|x+y\|_p = 0$ being trivial) completes the proof. □
3.7 Adjoints
Let $A$ be an $m \times n$ matrix. For the usual inner product (3.2), we have that
$$\langle x, Ay\rangle = \langle A^Hx, y\rangle$$
for any $x \in \mathbb{F}^m$ and $y \in \mathbb{F}^n$. In this section, we generalize this property of Hermitian conjugates (see Definition C.1.3) to arbitrary linear transformations and arbitrary inner products. We call the resulting map the adjoint.

Before developing the theory of the adjoint, we present the celebrated Riesz representation theorem, which states that for each bounded linear transformation $L : V \to \mathbb{F}$, there exists $w \in V$ such that $L(v) = \langle w, v\rangle$ for all $v \in V$. In fact, there is a one-to-one correspondence between the vectors $w$ and the linear transformations $L$. In this section, we prove the finite-dimensional version of this result. The infinite-dimensional version is beyond the scope of this text and is typically seen in a standard functional analysis course.
Applying this to $V = \mathbb{F}^n$ with the standard basis $S = [e_1, \ldots, e_n]$, if $x = \sum_{i=1}^{n}a_ie_i$, then $f(x) = \sum_{i=1}^{n}f(e_i)a_i = y^Hx$, where $y = \sum_{i=1}^{n}\overline{f(e_i)}\,e_i$. This shows that every linear function $f : \mathbb{F}^n \to \mathbb{F}$ can be written as $f(x) = \langle y, x\rangle$ for some $y \in \mathbb{F}^n$, where $\langle\cdot,\cdot\rangle$ is the usual inner product. Moreover, we have $|f(x)| = |\langle y, x\rangle| \le \|y\|\,\|x\|$, which implies that $\|f\| \le \|y\|$. Also, $|\langle y, x\rangle| = |f(x)| \le \|f\|\,\|x\|$ for all $x$, which implies that $\|y\|^2 \le \|f\|\,\|y\|$, and hence $\|y\| \le \|f\|$. Therefore, we have $\|f\| = \|y\|$. By Corollary 3.3.6 and Remark 3.3.7, these results hold for any finite-dimensional inner product space. We summarize these results in the following theorem.
Remark 3.7.2. It is useful to explicitly find the vector $y$ that the previous theorem promises must exist. Let $[x_1, \ldots, x_n]$ be an orthonormal basis of $V$. If $x \in V$, then we can write $x$ uniquely as the linear combination $x = \sum_{i=1}^{n}a_ix_i$, where each $a_i = \langle x_i, x\rangle$. Hence,
$$f(x) = \sum_{i=1}^{n}a_if(x_i) = \Big\langle \sum_{i=1}^{n}\overline{f(x_i)}\,x_i,\; x\Big\rangle,$$
so $y = \sum_{i=1}^{n}\overline{f(x_i)}\,x_i$.
Vista 3. 7 .3. Although the proof of the Riesz representation theorem given
above is very simple, it relies on the finite-dimensionality of V. If V is infinite
dimensional, then the sum used to define y becomes an infinite sum, and it
is not at all clear that it should converge. Nevertheless, the result can be
generalized to infinite dimensions, provided we restrict ourselves to bounded
linear transformations. The infinite-dimensional Riesz representation theorem
is a famous result in functional analysis that has widespread applications in
differential equations, probability theory, and optimization, but the proof of
the infinite-dimensional case would take us beyond the scope of this book.
3.7.2 Adjoints
Since every linear functional on a finite-dimensional inner product space V is defined
by the inner product with some vector in V, it is natural to ask how such functions
and the corresponding inner products change when a linear transformation acts on
V. Adjoints answer this question.
Definition 3.7.6. Assume that $(V, \langle\cdot,\cdot\rangle_V)$ and $(W, \langle\cdot,\cdot\rangle_W)$ are inner product spaces and that $L : V \to W$ is a linear transformation. The adjoint of $L$ is a linear transformation $L^* : W \to V$ such that
$$\langle L(v), w\rangle_W = \langle v, L^*(w)\rangle_V \quad\text{for all } v \in V \text{ and all } w \in W.$$
Theorem 3.7.10. Assume that $(V, \langle\cdot,\cdot\rangle_V)$ and $(W, \langle\cdot,\cdot\rangle_W)$ are finite-dimensional inner product spaces. If $L : V \to W$ is a linear transformation, then the adjoint $L^*$ of $L$ exists and is unique.
Vista 3. 7 .11. As with the Riesz representation theorem, the previous propo-
sition can be extended to bounded linear transformations of infinite-dimensional
vector spaces. You should expect to be able to prove this generalization after
taking a course in functional analysis.
Example 3.8.2. For a single vector $v \in V$, the set $\{v\}^\perp$ is the hyperplane in $V$ defined by $\langle v, x\rangle = 0$. Thus, if $V = \mathbb{F}^n$ with the standard inner product and $v = (v_1, \ldots, v_n)$, then $\{v\}^\perp = \{(x_1, \ldots, x_n) \mid \overline{v_1}x_1 + \cdots + \overline{v_n}x_n = 0\}$.
Proof. If $y_1, y_2 \in S^\perp$, then for each $x \in S$, we have $\langle x, ay_1 + by_2\rangle = a\langle x, y_1\rangle + b\langle x, y_2\rangle = 0$. Thus, $ay_1 + by_2 \in S^\perp$. □
Remark 3.8.4. For an alternative to the previous proof, recall that the intersection of a collection of subspaces is a subspace (see Proposition 1.2.4). Thus, we have that $S^\perp = \bigcap_{x\in S}\{x\}^\perp$. In other words, each hyperplane $\{x\}^\perp$ is a subspace, and so the intersection of all these subspaces is also a subspace.
Proof. If $x \in W$, then $\langle x, y\rangle = 0$ for all $y \in W^\perp$. This implies that $x \in (W^\perp)^\perp$, and so $W \subset (W^\perp)^\perp$. Suppose now that $x \in (W^\perp)^\perp$. By Theorem 3.8.5, we can write $x$ uniquely as $x = w + w_\perp$, where $w \in W$ and $w_\perp \in W^\perp$. However, we also have that $\langle w_\perp, x\rangle = 0$, which implies that $0 = \langle w_\perp, w + w_\perp\rangle = \langle w_\perp, w\rangle + \langle w_\perp, w_\perp\rangle = \langle w_\perp, w_\perp\rangle = \|w_\perp\|^2$, and hence $w_\perp = 0$. This implies $x \in W$, and thus $(W^\perp)^\perp \subset W$. □
Remark 3.8.7. It is important to note that Theorem 3.8.5 and Lemma 3.8.6 do not hold for all infinite-dimensional subspaces. For example, the space $\mathbb{F}[x]$ of polynomials is a proper subspace of $C([0,1];\mathbb{F})$ with the inner product $\langle f, g\rangle = \int_0^1 \bar{f}g\,dx$, yet it can be shown that the orthogonal complement of $\mathbb{F}[x]$ is the zero subspace $\{0\}$.
Proof. Note that $w \in \mathscr{R}(L)^\perp$ if and only if $\langle L(v), w\rangle = 0$ for all $v \in V$, which holds if and only if $\langle v, L^*w\rangle = 0$ for all $v \in V$. This occurs if and only if $L^*w = 0$, which is equivalent to $w \in \mathscr{N}(L^*)$. The proof of (3.39) follows from (3.38) and Lemma 3.8.6; see Exercise 3.43. □
Figure 3.7. The four fundamental subspaces $\mathscr{N}(L)$, $\mathscr{N}(L^*)$, $\mathscr{R}(L)$, and $\mathscr{R}(L^*)$ of a linear map $L : V \to W$.
Corollary 3.8.11. Let $V$ and $W$ be finite-dimensional vector spaces. The linear transformation $L$ maps the subspace $\mathscr{R}(L^*)$ bijectively to the subspace $\mathscr{R}(L)$, and $L^*$ maps $\mathscr{R}(L)$ bijectively to $\mathscr{R}(L^*)$.
column space of A, and it has dimension equal to the rank of A. Since the adjoint
L * is represented by the matrix AH, it follows that the range of AH is the span of
its columns.
The null space $\mathscr{N}(A)$ of $A$ is the set of $n$-tuples $v = [v_1\ \cdots\ v_n]^T$ in $\mathbb{F}^n$ such that $Av = 0$, or equivalently
$$0 = Av = \begin{bmatrix} b_1^H \\ b_2^H \\ \vdots \\ b_m^H \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} \langle b_1, v\rangle \\ \langle b_2, v\rangle \\ \vdots \\ \langle b_m, v\rangle \end{bmatrix},$$
where $b_i^H$ denotes the $i$th row of $A$. From this we see immediately that $\mathscr{N}(A)$ is orthogonal to the column space $\mathscr{R}(A^H)$, as described in (3.39). The fundamental subspaces theorem also tells us that the direct sum of these two spaces is all of $\mathbb{F}^n$.

Using the same argument for $\mathscr{N}(A^H)$, we get the dual statements that $\mathscr{N}(A^H)$ is orthogonal to the column space $\mathscr{R}(A)$ and that the direct sum of these two spaces is the entire space $\mathbb{F}^m$.
Nota Bene 3.8.13. Corollary 3.8.11 shows that $L : \mathscr{R}(L^*) \to \mathscr{R}(L)$ and $L^* : \mathscr{R}(L) \to \mathscr{R}(L^*)$ are isomorphisms of vector spaces, but it is important to note that $L^*$ restricted to $\mathscr{R}(L)$ is generally not the inverse of $L$ restricted to $\mathscr{R}(L^*)$. Instead, the Moore--Penrose pseudoinverse $L^\dagger : W \to V$ is the inverse of $L$ restricted to $\mathscr{R}(L^*)$ (see Proposition 4.6.2).
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \end{bmatrix}.$$
The range of $A$ is the span of the columns, which is $\mathscr{R}(A) = \operatorname{span}([1\ \ 0]^T)$. This is the $x$-axis in $\mathbb{R}^2$. The kernel of $A$ is the hyperplane
$$\mathscr{N}(A) = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1 + 2x_2 + 3x_3 = 0\}.$$
Figure 3.8. Projecting the vector $b$ onto $\mathscr{R}(A)$ to get $p = \operatorname{proj}_{\mathscr{R}(A)}b$ (blue). The best approximate solution to the overdetermined system $Ax = b$ is the vector $\hat{x}$ that solves the system $A\hat{x} = p$, as described in Section 3.9.1. The error is the norm of the residual $r = b - A\hat{x}$ (red).
3. 9 Least Squares
Many applications involve linear systems that are overdetermined, meaning that
there are more equations than unknowns . In this section, we discuss the best
approximate solution (called the least squares solution) of an overdetermined linear
system. This technique is very powerful and is used in virtually every quantitative
discipline.
Proposition 3.9.1. For any $A \in M_{m\times n}(\mathbb{F})$ the system $Ax = b$ has a least squares solution. It is unique if and only if $A$ is injective.
Theorem 3.9.3. If $A \in M_{m\times n}(\mathbb{F})$ is injective, that is, of rank $n$, then the unique least squares solution of the system $Ax = b$ is given by
$$\hat{x} = (A^HA)^{-1}A^Hb. \qquad (3.42)$$
Remark 3.9.4. If the matrix A is not injective (not of rank n), then the normal
equation (3.41) always has a solution, but the solution is not unique. This gen-
erally only occurs in applications when one has collected too few data points, or
when variables that one has assumed to be linearly independent are in fact linearly
dependent. In situations where A cannot be made injective, there is still a choice
of solution that might be considered "best." We discuss this further when we treat
the singular value decomposition in Sections 4.5 and 4.6.
To fit a line $y = mx + b$ to the data points $(x_1, y_1), \ldots, (x_n, y_n)$, set
$$A = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}, \qquad \hat{x} = \begin{bmatrix} m \\ b \end{bmatrix}, \qquad\text{and}\qquad b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
The least squares solution is found by solving the normal equation (3.41), which takes the form
$$\begin{bmatrix} \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i & n \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} \sum_i x_iy_i \\ \sum_i y_i \end{bmatrix}. \qquad (3.43)$$
The matrix $A^TA$ is invertible as long as the $x_i$ terms are not all equal. Simplifying (3.42) yields
$$m = \frac{n\sum_i x_iy_i - \sum_i x_i\sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}, \qquad b = \frac{\sum_i x_i^2\sum_i y_i - \sum_i x_i\sum_i x_iy_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2}. \qquad (3.44)$$
Example 3.9.5. Consider the points $(3.0, 7.3)$, $(4.0, 8.8)$, $(5.0, 11.1)$, and $(6.0, 12.5)$. Using (3.43) or the explicit solution (3.44), we find the line of best fit to have slope $m = 1.79$ and intercept $b = 1.87$; see Figure 3.9(a).
Figure 3.9. Least squares solutions fitting (a) a line to four data points and (b) an exponential curve to four data points.
To fit an exponential curve $w = ae^{kt}$ to data points $(t_i, w_i)_{i=1}^n$, take logarithms to get $\log w = kt + \log a$, and write
$$A = \begin{bmatrix} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_n & 1 \end{bmatrix}, \qquad \hat{x} = \begin{bmatrix} k \\ \log a \end{bmatrix}, \qquad\text{and}\qquad b = \begin{bmatrix} \log w_1 \\ \log w_2 \\ \vdots \\ \log w_n \end{bmatrix}.$$
If the data points are $(3.0, 7.3)$, $(4.0, 3.5)$, $(5.0, 1.2)$, and $(6.0, 0.8)$, where the first coordinate is measured in years and the second coordinate is measured in grams, we can use (3.44) to find the exponential parameters to be $k = -0.7703$ and $a = 71.2736$. Thus, the half-life in this case is $-(\log 2)/k = 0.8998$ years; see Figure 3.9(b).
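Both fits can be reproduced numerically by solving the normal equation. A sketch assuming NumPy (the variable names are ours):

```python
import numpy as np

x = np.array([3.0, 4.0, 5.0, 6.0])
y = np.array([7.3, 8.8, 11.1, 12.5])

# Line fit: solve the normal equation (3.41), A^T A v = A^T y, for v = [m, b].
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m, b)                      # approximately 1.79 and 1.87

# Exponential fit w = a * exp(k t): linearize with logarithms as in the text.
w = np.array([7.3, 3.5, 1.2, 0.8])
k, log_a = np.linalg.solve(A.T @ A, A.T @ np.log(w))  # same A, since t == x here
print(k, np.exp(log_a))          # approximately -0.7703 and 71.2736
```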
Similarly, to fit a parabola $y = ax^2 + bx + c$ to the data, take
$$A = \begin{bmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{bmatrix}.$$
Again the least squares solution is obtained by solving the normal equation $A^HA\hat{x} = A^Hb$. The solution is unique if and only if the matrix $A$ has rank 3, which occurs if there are at least three distinct values of $x_i$ in the data set.
Vista 3.9.8. A widely used approach for computing least squares solutions of linear systems is to use the QR decomposition introduced in Section 3.3.3. If $A = QR$ is a QR decomposition of $A$, then the least squares solution $\hat{x}$ is found by solving the linear system $R\hat{x} = Q^Hb$ (see Exercise 3.17). Typically using the QR decomposition takes about twice as long as solving the normal equation directly, but it is more stable.
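A minimal sketch of this QR-based approach, assuming NumPy (here np.linalg.solve stands in for back substitution on the triangular system):

```python
import numpy as np

A = np.column_stack([np.array([3.0, 4.0, 5.0, 6.0]), np.ones(4)])
b = np.array([7.3, 8.8, 11.1, 12.5])

Q, R = np.linalg.qr(A)               # reduced QR: A (4x2) = Q (4x2) R (2x2)
x_hat = np.linalg.solve(R, Q.T @ b)  # solve R x = Q^H b
print(x_hat)                         # the same line as before: [1.79, 1.87]
```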
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with .& are especially important and are likely to be used later
in this book and beyond. Those marked with t are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
3.1. Verify the polarization and parallelogram identities on a real inner product space, with the usual norm $\|x\| = \sqrt{\langle x, x\rangle}$ arising from the inner product:
(i) $\langle x, y\rangle = \frac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2\right)$.
(ii) $\|x\|^2 + \|y\|^2 = \frac{1}{2}\left(\|x+y\|^2 + \|x-y\|^2\right)$.
It can be shown that in any normed linear space over $\mathbb{R}$ for which (ii) holds, one can define an inner product by using (i); see [Pro08, Thm. 4.8] for details.
3.2. Verify the polarization identity on a complex inner product space, with the usual norm $\|x\| = \sqrt{\langle x, x\rangle}$ arising from the inner product:
$$\langle x, y\rangle = \frac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2\right) - \frac{i}{4}\left(\|x+iy\|^2 - \|x-iy\|^2\right).$$
3.3. Using (3.8), find the angle $\theta$ between the following pairs of vectors:
(i) $x$ and $x^5$.
(ii) $x^2$ and $x^4$.
3.4. Let $(V, \langle\cdot,\cdot\rangle)$ be a real inner product space. A linear map $T : V \to V$ is angle preserving if for all nonzero $x, y \in V$, we have that
$$\frac{\langle Tx, Ty\rangle}{\|Tx\|\,\|Ty\|} = \frac{\langle x, y\rangle}{\|x\|\,\|y\|}. \qquad (3.45)$$
Prove that $T$ is angle preserving if and only if there exists $a > 0$ such that $\|Tx\| = a\|x\|$ for all $x \in V$. Hint: Use Exercise 3.1(i) for one direction. For the other direction, verify that (3.45) implies that $T$ preserves orthogonality, and then write $y$ as $y = \operatorname{proj}_x y + r$.
3.5. Let $V = C([0,1];\mathbb{R})$ have the inner product $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$. Find the projection of $e^x$ onto the vector $x - 1$. Hint: Is $x - 1$ a unit vector?
3.6. Prove the Cauchy--Schwarz inequality by considering the inequality $0 \le \|x - ty\|^2$, where $t$ is a suitably chosen scalar.
3.7. Prove the Cauchy--Schwarz inequality using Corollary 3.2.10. Hint: Consider the orthonormal (singleton) set $\{x/\|x\|\}$.
3.8. Let $V$ be the inner product space $C([-1,1];\mathbb{R})$ with inner product
$$\langle f, g\rangle = \int_{-1}^{1}\frac{fg}{\sqrt{1-x^2}}\,dx.$$
Hint: Recall the trigonometric identity
$$\cos(m\theta)\cos(n\theta) = \tfrac{1}{2}\big(\cos((m+n)\theta) + \cos((m-n)\theta)\big).$$
3.14. Prove that for any proper subspace $X \subset V$ of a finite-dimensional inner product space the projection $\operatorname{proj}_X : V \to X$ is not an orthonormal transformation.
3.15. Let
A~ r; ~l
(i) Use the Gram--Schmidt method to find the QR decomposition of $A$.
(ii) Let $b = [-1\ \ 6\ \ 5\ \ 7]^T$. Use (i) to solve $A^HA\hat{x} = A^Hb$.
3.23. Let $(V, \|\cdot\|)$ be a normed linear space. Prove that $\big|\|x\| - \|y\|\big| \le \|x - y\|$ for all $x, y \in V$. Hint: Prove $\|x\| - \|y\| \le \|x - y\|$ and $\|y\| - \|x\| \le \|x - y\|$.
3.24. Let $C([a,b];\mathbb{F})$ be the vector space of all continuous functions from $[a,b] \subset \mathbb{R}$ to $\mathbb{F}$. Prove that each of the following is a norm on $C([a,b];\mathbb{F})$:
(i) $\|f\|_{L^1} = \int_a^b |f(t)|\,dt$.
(ii) $\|f\|_{L^2} = \left(\int_a^b |f(t)|^2\,dt\right)^{1/2}$.
(iii) $\|f\|_{L^\infty} = \sup_{x\in[a,b]}|f(x)|$.
3.25. Prove Proposition 3.5.9.
3.26. & Two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ on the vector space $X$ are topologically equivalent if there exist constants $0 < m \le M$ such that
$$m\|x\|_a \le \|x\|_b \le M\|x\|_a \quad\text{for all } x \in X.$$
3.31. Prove that in Young's inequality (3.30), equality holds if and only if $a^p = b^q$.
3.32. Prove that for every $a, b \ge 0$ and every $\varepsilon > 0$, we have
$$ab \le \frac{\varepsilon a^2}{2} + \frac{b^2}{2\varepsilon}. \qquad (3.46)$$
3.33. Prove that if $\theta \neq 0, 1$, then equality holds in the arithmetic-geometric mean inequality if and only if $x = y$.
3.34. Let $(X_1, \|\cdot\|_{X_1}), \ldots, (X_n, \|\cdot\|_{X_n})$ be normed linear spaces, and let $X = X_1 \times \cdots \times X_n$ be the Cartesian product. For any $x = (x_1, \ldots, x_n) \in X$ define
$$\|x\|_p = \begin{cases}\left(\sum_{i=1}^{n}\|x_i\|_{X_i}^p\right)^{1/p} & \text{if } p \in [1,\infty), \\ \sup_i \|x_i\|_{X_i} & \text{if } p = \infty.\end{cases}$$
For every $p \in [1,\infty]$ prove that $\|\cdot\|_p$ is a norm on $X$. Hint: Adapt the proof of Minkowski's inequality.
Note that if $X_i = \mathbb{F}$ for every $i$, then $X = \mathbb{F}^n$ and $\|\cdot\|_p$ is the usual $p$-norm on $\mathbb{F}^n$.
3.35.† Suppose that $x, y \in \mathbb{R}^n$ and $p, q, r \ge 1$ are such that $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$. Prove that $\|xy\|_r \le \|x\|_p\,\|y\|_q$, where $xy$ denotes the entrywise product $(x_1y_1, \ldots, x_ny_n)$.
3.37. Let $V = \mathbb{R}[x;2]$ be the space of polynomials of degree at most two, which is a subspace of the inner product space $L^2([0,1];\mathbb{R})$. Let $L : V \to \mathbb{R}$ be the linear functional given by $L[p] = p'(1)$. Find the unique $q \in V$ such that $L[p] = \langle q, p\rangle$, as guaranteed by the Riesz representation theorem. Hint: Look at the discussion just before Theorem 3.7.1.
3.38. Let $V = \mathbb{F}[x;2]$, which is a subspace of the inner product space $L^2([0,1];\mathbb{F})$. Let $D$ be the derivative operator $D : V \to V$; that is, $D[p](x) = p'(x)$. Write the matrix representation of $D$ with respect to the power basis $[1, x, x^2]$ of $\mathbb{F}[x;2]$. Write the matrix representation of the adjoint of $D$ with respect to this basis.
3.39. & Prove Proposition 3.7.12.
3.40. Let $M_n(\mathbb{F})$ be endowed with the Frobenius inner product (see Example 3.1.7). Any $A \in M_n(\mathbb{F})$ defines a linear operator on $M_n(\mathbb{F})$ by left multiplication: $B \mapsto AB$.
(i) Show that $A^* = A^H$.
(ii) Show that for any $A_1, A_2, A_3 \in M_n(\mathbb{F})$ we have $\langle A_2, A_3A_1\rangle = \langle A_2A_1^H, A_3\rangle$. Hint: Recall $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
(iii) Let $A \in M_n(\mathbb{F})$. Define the linear operator $T_A : M_n(\mathbb{F}) \to M_n(\mathbb{F})$ by $T_A(X) = AX - XA$, and show that $(T_A)^* = T_{A^*}$.
3.41. Let
$$A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 2 & 2 & 0 \end{bmatrix}.$$
3.50. Let $(x_i, y_i)_{i=1}^n$ be a collection of data points that we have reason to believe should lie (roughly) on an ellipse of the form $rx^2 + sy^2 = 1$. We wish to find the least squares approximation for $r$ and $s$. Write $A$, $x$, and $b$ for the corresponding normal equation in terms of the data $x_i$ and $y_i$ and the unknowns $r$ and $s$.
Notes
Sources for the infinite-dimensional cases of the results of this chapter include
[Pro08, Con90, Rud91]. For more on the QR decomposition and the Householder
algorithm, see [TB97, Part II]. For details on the stability of Gaussian elimination,
as discussed in Remark 3.3.13, see [TB97, Sect. 22].
Spectral Theory
Spectral theory describes how to decouple the domain of a linear operator into a
direct sum of minimal components upon which the operator is invariant. Choosing a
basis that respects this direct sum results in a corresponding block-diagonal matrix
representation. This has powerful consequences in applications, as it allows many
problems to be reduced to a series of small individual parts that can be solved
independently and more easily. The key tools for constructing this decomposition
are eigenvalues and eigenvectors, which are widely used in many areas of science and
engineering and have many important physical interpretations and applications.
For example, they are used to describe the normal modes of vibration in
engineered systems such as musical instruments, electrical motors, and even static
structures, like bridges and skyscrapers. In quantum mechanics, they describe the
possible energy states of an electron or particle; in particular, the atomic orbitals
one learns about in chemistry are just eigenvectors of the Hamiltonian operator for
a hydrogen atom. Eigenvalues and eigenvectors are fundamental to control theory
applications ranging from cruise control on a car to the automated control systems
that guide missiles and fly unmanned air vehicles (UAVs) or drones.
Spectral theory is also widely used in the information sciences. For example,
eigenvalues and eigenvectors are the key to Google's PageRank algorithm. They
can be used in data compression, which in turn is essential for reducing both the
complexity and dimensionality in problems like facial recognition, intelligence and
personality testing, and machine learning. Eigenvalues and eigenvectors are also
useful for decomposing a graph into clusters, which has applications ranging from
image segmentation to identifying communities on social media. In short, spectral
theory is essential to applied mathematics.
In this chapter, we restrict ourselves to the spectral theory of linear operators
on finite-dimensional vector spaces. While it is possible to extend spectral theory
to infinite-dimensional spaces, the mathematical sophistication required is beyond
the scope of this text and is more suitable for a course in functional analysis.
Most finite-dimensional linear operators have a set of eigenvectors that spans the space (hereafter called an eigenbasis), and in such a basis the corresponding matrix representation is diagonal. In other words, a change of basis to the eigenbasis shows that such a
matrix operator is similar to a diagonal matrix. That said, not all matrices can be
diagonalized, and some can only be made block diagonal. In the first sections of this
chapter, we develop this theory and expound on when a matrix can be diagonalized
and when we must settle for block diagonalization.
One of the most important and useful results of linear algebra and spectral
theory specifically is Schur's lemma, which states that every matrix operator is
similar to an upper-triangular matrix, and the transition matrix used to perform
the similarity transformation is an orthonormal matrix. Any such upper-triangular
matrix is called the Schur form of the operator. The Schur form is well suited to
numerical computation, and, as we show in this chapter, it also has real theoretical
significance and allows for some very nice proofs of important theorems.
Some matrices have special structure that can make them easier to under-
stand, better behaved, or otherwise more useful than arbitrary matrices. Among
the most important of these special matrices are the normal matrices, which are
characterized by having an orthonormal eigenbasis. This allows a normal matrix to
be diagonalized by an orthonormal transition matrix. Among the normal matrices,
the Hermitian (self-adjoint) matrices are especially important. In this chapter we
define and discuss the properties of these and other special classes of matrices.
Finally, the last part of this chapter is devoted to the celebrated singular value
decomposition (SVD), which allows any matrix to be separated into parts that
describe explicitly how it acts on its fundamental subspaces. This is an essential
and very powerful result that you will use over and over, in many different ways.
Remark 4.1.2. In addition to all the eigenvectors corresponding to $\lambda$, the $\lambda$-eigenspace $\mathscr{E}_\lambda$ always contains the zero vector $0$, which is not an eigenvector.
Remark 4.1.3. The definitions of eigenvalues and eigenvectors given above only apply to finite-dimensional operators on complex vector spaces, that is, for matrices with complex entries. For matrices with real entries, we simply think of the real entries as complex numbers. In other words, we use the obvious inclusion $M_n(\mathbb{R}) \subset M_n(\mathbb{C})$ and define eigenvalues and eigenvectors for the corresponding matrices in $M_n(\mathbb{C})$.
Nota Bene 4.1.5. Proposition 4.1.4 motivates the traditional method of finding eigenvectors as taught in an elementary linear algebra course. Given that $\lambda$ is an eigenvalue of the matrix $A$, you can find the corresponding eigenvectors by solving the linear system $(A - \lambda I)x = 0$. For a worked example see Example 4.1.12.
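The following sketch carries out this method numerically for the matrix of Example 4.1.7 (assuming NumPy; using the SVD to extract a null-space basis is our choice, not the text's):

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, 2.0]])

# Eigenvalues as roots of the characteristic polynomial of a 2x2 matrix:
# p(lambda) = lambda^2 - tr(A) lambda + det(A).
eigvals = np.roots([1.0, -np.trace(A), np.linalg.det(A)])
print(sorted(eigvals))            # [-2.0, 5.0]

# For each eigenvalue, the eigenvectors span the null space of (A - lambda I).
for lam in eigvals:
    _, s, Vt = np.linalg.svd(A - lam * np.eye(2))
    print(lam, Vt[-1])            # right singular vector with zero singular value
```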
$$A[x]_S = \lambda[x]_S.$$
If $T$ is another basis, the representations are related, as always, by the transition matrix $C_{TS}$. Thus, we have $[x]_T = C_{TS}[x]_S$ and $B = C_{TS}A(C_{TS})^{-1}$. This implies that
$$B[x]_T = C_{TS}A(C_{TS})^{-1}C_{TS}[x]_S = C_{TS}A[x]_S = \lambda C_{TS}[x]_S = \lambda[x]_T.$$
Remark 4.1.6. The previous discussion shows that the spectral theory of a linear
operator on a finite-dimensional vector space and the spectral theory of any associ-
ated matrix representation of that operator are essentially the same. But since the
representation of the eigenvectors depends on a choice of basis, we will, from this
point forth, phrase our study in terms of matrices, that is, in terms of a specific
choice of basis.
Unless otherwise stated, we assume throughout the remainder of this chapter
that L is a linear operator on a finite-dimensional vector space V and that A is its
matrix representation for a given basis S.
Example 4.1.7. Let
$$A = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
Then $Ax = [2\ \ -2]^T = -2x$, so $x$ is an eigenvector of $A$ with eigenvalue $\lambda = -2$.
Theorem 4.1.8. Let $A \in M_n(\mathbb{F})$ and $\lambda \in \mathbb{C}$. The following are equivalent:
(i) $\lambda$ is an eigenvalue of $A$.
Example 4.1.12. The matrix $A$ from Example 4.1.7 has the characteristic polynomial
$$p(\lambda) = \det(\lambda I - A) = \begin{vmatrix} \lambda - 1 & -3 \\ -4 & \lambda - 2 \end{vmatrix} = \lambda^2 - 3\lambda - 10 = (\lambda - 5)(\lambda + 2).$$

²⁶Of course, the polynomial $p(z)$ does not necessarily factor into linear terms over $\mathbb{R}$, since $\mathbb{R}$ is not algebraically closed, but it always factors completely over $\mathbb{C}$.
²⁷A polynomial $p$ is monic if the coefficient of the term of highest degree is one. For example, $\lambda^2 - 3\lambda - 10$ is monic, but $2\lambda^2 - 6\lambda - 20$ is not.
If
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},$$
then $p(\lambda) = \lambda^2 + 1$, and so $\sigma(A) = \{i, -i\}$. By solving for the eigenvectors, we have that $\mathscr{E}_{\pm i}(A) = \mathscr{N}(\pm iI - A) = \operatorname{span}\{[1\ \ \mp i]^T\}$. Note again that the geometric and algebraic multiplicities of the eigenvalues of $A$ are the same.
If
$$A = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix},$$
note that $p(\lambda) = (\lambda - 3)^2$, so $\sigma(A) = \{3\}$ and $\mathscr{N}(A - 3I) = \operatorname{span}\{e_1, e_2\}$. Thus, the algebraic and geometric multiplicities of $\lambda = 3$ are equal to two.
Remark 4.1.15. Despite the previous examples, the geometric and algebraic mul-
tiplicities are not always the same, as the next example shows.
If
$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix},$$
note that $\sigma(A) = \{3\}$, yet $\mathscr{N}(A - 3I) = \operatorname{span}\{e_1\}$. Thus, the algebraic multiplicity of $\lambda = 3$ is two, yet the geometric multiplicity is one.
All finite-dimensional operators (over $\mathbb{C}$) have eigenvalues, but not all operators on infinite-dimensional spaces do. In particular, consider the following example.
where $f$ is a given function. One numerical way to find $u(x)$ in this equation involves solving a linear system $Ax = b$, where $A$ is a tridiagonal $n \times n$ matrix of the form
$$A = \begin{bmatrix} b & a & 0 & \cdots & \cdots & 0 \\ c & b & a & \ddots & & \vdots \\ 0 & c & b & a & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & c & b & a \\ 0 & \cdots & \cdots & 0 & c & b \end{bmatrix}. \qquad (4.6)$$
The eigenvalues and eigenvectors of the matrix are also important when solving
this problem.
The eigenvectors of (4.6) are
$$x_k = \begin{bmatrix} \rho\sin\omega_{1,k} & \rho^2\sin\omega_{2,k} & \cdots & \rho^n\sin\omega_{n,k} \end{bmatrix}^T, \quad\text{where } \omega_{j,k} = \frac{jk\pi}{n+1} \text{ and } \rho = \left(\frac{c}{a}\right)^{1/2}.$$
This can be verified by setting $0 = (A - \lambda_k I)x_k$, which gives, for each $j$, that
$$0 = c\rho^{j-1}\sin\omega_{j-1,k} + (b - \lambda_k)\rho^j\sin\omega_{j,k} + a\rho^{j+1}\sin\omega_{j+1,k}.$$
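The eigenvector formula is easy to check numerically. In the sketch below (assuming NumPy) we also use the standard eigenvalue formula $\lambda_k = b + 2\sqrt{ac}\cos(k\pi/(n+1))$, which is not derived in the surviving text and should be treated here as an assumption:

```python
import numpy as np

# Build the tridiagonal matrix (4.6): b on the diagonal, a above, c below.
n, a, b, c = 6, 1.0, 2.0, 3.0
A = b * np.eye(n) + a * np.eye(n, k=1) + c * np.eye(n, k=-1)

rho = np.sqrt(c / a)
k = 2                                    # check the k-th eigenpair
j = np.arange(1, n + 1)
x = rho**j * np.sin(j * k * np.pi / (n + 1))
lam = b + 2 * np.sqrt(a * c) * np.cos(k * np.pi / (n + 1))
print(np.allclose(A @ x, lam * x))       # True
```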
Remark 4.1.20. If $A$ and $B$ are similar matrices, that is, $A = PBP^{-1}$ for some nonsingular $P$, then $A$ and $B$ define the same operator on $\mathbb{F}^n$, but in different bases related by $P$. Since the determinant and eigenvalues are determined only by a linear operator, and not by its matrix representation, the following proposition is immediate.

Proposition 4.1.21. If $A, B \in M_n(\mathbb{F})$ are similar matrices, that is, $B = P^{-1}AP$ for some nonsingular matrix $P$, then the following hold:
(i) $A$ and $B$ have the same characteristic polynomial.
(ii) $A$ and $B$ have the same eigenvalues.
(iii) If $\lambda$ is an eigenvalue of $A$ and $B$, then $P : \mathscr{E}_\lambda(B) \to \mathscr{E}_\lambda(A)$ is an isomorphism, and $\dim\mathscr{E}_\lambda(A) = \dim\mathscr{E}_\lambda(B)$.

Proof. This follows from Remark 4.1.20, but can also be seen algebraically from the following computation. For any scalar $z$ we have
$$(zI - B) = (zI - P^{-1}AP) = P^{-1}(zI - A)P.$$
This implies
$$\det(zI - B) = \det(P^{-1})\det(zI - A)\det(P) = \det(zI - A),$$
which gives (i) and (ii); part (iii) follows by restricting the isomorphism $x \mapsto Px$ to $\mathscr{E}_\lambda(B)$. □
Finally, we conclude with two observations that follow immediately from the
results of this section but are nevertheless very useful in many settings.
Proposition 4.1.22. The diagonal entries of an upper-triangular (or a lower-triangular) matrix are its eigenvalues.
Proposition 4.1.23. A matrix A E Mn(lF) and its transpose AT have the same
characteristic polynomial.
Nota Bene 4.2.2. Beware that when we say "$W$ is $L$-invariant," it does not mean each vector in $W$ is fixed by $L$. Rather it means that $W$, as a space, is mapped to itself. Thus, a vector in $W$ could certainly be mapped to a different vector in $W$, but it will not be sent to a vector outside of $W$.
Remark 4.2.3. For a vector space $V$ and operator $L$ on $V$, it is easy to see that $\{0\}$ and $V$ are invariant. But these are not useful; we are really only interested in proper, nontrivial invariant subspaces.

You should convince yourself (and prove) that when determining whether a subspace is invariant, it is sufficient to check the basis vectors.
Example 4.2.5. The kernel $\mathscr{N}(L)$ of a linear operator $L$ is always $L$-invariant because $L(\mathscr{N}(L)) = \{0\} \subset \mathscr{N}(L)$.
Proof. By Corollary 1.4.5, there exists a set $T = [t_{k+1}, t_{k+2}, \ldots, t_n]$ such that $S' = S \cup T$ is a basis for $V$. Let $A = [a_{ij}]$ be the unique matrix representation of $L$ on $S'$. Since $W$ is invariant, the image of each basis element of $S$ can be uniquely represented as a linear combination of elements of $S$. Specifically, we have that $L(s_j) = \sum_{i=1}^{k}a_{ij}s_i$ for $j = 1, \ldots, k$, so that $a_{ij} = 0$ when $i > k$ and $j \le k$. The image $L(t_j)$ of each vector in $T$ can be expressed as a linear combination of elements of $S'$. Thus, we have $L(t_j) = \sum_{i=1}^{k}a_{ij}s_i + \sum_{i=k+1}^{n}a_{ij}t_i$ for $j = k+1, \ldots, n$. Thus, (4.7) holds, where
$$A_{11} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix}. \qquad\Box$$
Proof. Let $A = [a_{ij}]$ be the matrix representation of $L$ on $S$. Since $W_1$ and $W_2$ are invariant, the image of each basis element of $S$ can be uniquely represented as a linear combination in its respective subspace. Specifically, we have $L(x_j) = \sum_{i=1}^{k}a_{ij}x_i \in W_1$ for $j = 1, \ldots, k$ and $L(x_j) = \sum_{i=k+1}^{n}a_{ij}x_i \in W_2$ for $j = k+1, \ldots, n$. Thus, the other $a_{ij}$ terms are zero, and (4.8) holds, where
$$A_{11} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix} \quad\text{and}\quad A_{22} = \begin{bmatrix} a_{k+1,k+1} & a_{k+1,k+2} & \cdots & a_{k+1,n} \\ a_{k+2,k+1} & a_{k+2,k+2} & \cdots & a_{k+2,n} \\ \vdots & \vdots & & \vdots \\ a_{n,k+1} & a_{n,k+2} & \cdots & a_{n,n} \end{bmatrix}$$
are the unique matrix representations of $L$ restricted to $W_1$ and $W_2$ with respect to the bases $S_1$ and $S_2$, respectively. □
where each Aii is the matrix representation of L restricted to Wi with the basis Si .
4.3 Diagonalization
If a matrix has a set of eigenvectors that forms a basis, then the matrix is similar
to a diagonal matrix. This is one of the most fundamental ideas in linear analysis.
In this section we describe how this works. As before, we assume throughout this
section that L is a linear operator on a finite-dimensional vector space with matrix
representation A, with respect to some given basis.
Theorem 4.3.1. If >.1, ... , >.k are distinct eigenvalues of L with corresponding
eigenvectors x 1 , x 2 , ... , xk , then these eigenvectors are linearly independent.
Proof. Suppose the eigenvectors are linearly dependent, and let $r$ be the largest integer such that $\{x_1, \ldots, x_r\}$ is linearly independent; thus $r < k$ and $x_{r+1} = \sum_{i=1}^{r}a_ix_i$ for some scalars $a_i$. Applying $A - \lambda_{r+1}I$ to both sides gives
$$0 = (A - \lambda_{r+1}I)x_{r+1} = \sum_{i=1}^{r}a_i(\lambda_i - \lambda_{r+1})x_i, \qquad (4.10)$$
but since $\{x_1, x_2, \ldots, x_r\}$ is linearly independent, each $a_i(\lambda_i - \lambda_{r+1})$ is zero. Because the eigenvalues are distinct, this implies $a_i = 0$ for each $i$. Hence, $x_{r+1} = 0$, which is a contradiction (since by definition eigenvectors cannot be $0$). Therefore, $r = k$. □
The matrix
$$A = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}$$
from Examples 4.1.7 and 4.1.12 has distinct eigenvalues, and so the previous theorem shows that the corresponding eigenvectors are linearly independent and thus form an eigenbasis.
Having distinct eigenvalues is a sufficient condition for the existence of an eigenbasis; however, as shown in Example 4.1.14, it is not necessary: there the eigenvalues are not distinct, and yet an eigenbasis can still be found. On the other hand, there are cases where an eigenbasis does not exist for a matrix; see, for example, Example 4.1.16.
Proof. If $A \in M_n(\mathbb{F})$ is simple, then there exist $n$ distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with corresponding eigenvectors $x_1, x_2, \ldots, x_n$. Hence, by Theorem 4.3.1, these form a linearly independent set, which is an eigenbasis. □
Consider again the matrix
$$A = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}$$
from Examples 4.1.7 and 4.1.12. The eigenvalues are $\sigma(A) = \{-2, 5\}$ with corresponding eigenvectors $[-1\ \ 1]^T$ and $[3\ \ 4]^T$, respectively. Setting
$$P = \begin{bmatrix} -1 & 3 \\ 1 & 4 \end{bmatrix}, \qquad D = \begin{bmatrix} -2 & 0 \\ 0 & 5 \end{bmatrix}, \qquad\text{and}\qquad P^{-1} = \frac{1}{7}\begin{bmatrix} -4 & 3 \\ 1 & 1 \end{bmatrix},$$
a direct computation gives $P^{-1}AP = D$, as expected.

The matrix $A$ represents an operator $L : \mathbb{F}^2 \to \mathbb{F}^2$ in the standard basis. The previous computation shows that if we change the basis to the eigenbasis $\{[-1\ \ 1]^T, [3\ \ 4]^T\}$, then the matrix representation of $L$ is $D$, which is diagonal. This "decouples" the action of the operator $L$ into its action on the two eigenspaces $\mathscr{E}_{-2} = \operatorname{span}([-1\ \ 1]^T)$ and $\mathscr{E}_5 = \operatorname{span}([3\ \ 4]^T)$.
Example 4.3.9. Not every square matrix is diagonalizable. For example, the matrix
$$A = \begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$$
in Example 4.1.16 cannot be diagonalized. In this example, $\mathscr{N}(A - 3I) = \operatorname{span}\{e_1\}$, which is not a basis for $\mathbb{F}^2$, and thus $A$ does not have an eigenbasis.
Proof. For $k \in \mathbb{N}$, we have that $A^k = (P^{-1}BP)(P^{-1}BP)\cdots(P^{-1}BP)$, which telescopes to $A^k = P^{-1}B^kP$. □
$$0, 1, 1, 2, 3, 5, 8, \ldots,$$
The eigenvalues of $A$ are
$$\lambda_1 = \frac{1 + \sqrt{5}}{2} \qquad\text{and}\qquad \lambda_2 = \frac{1 - \sqrt{5}}{2},$$
and the corresponding eigenvectors are $[\lambda_1\ \ 1]^T$ and $[\lambda_2\ \ 1]^T$, respectively (the reader should check this!). Since the eigenvectors are linearly independent, they form an eigenbasis, and $A$ can be written as $A = PDP^{-1}$, where
$$P = \begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}, \qquad\text{and}\qquad P^{-1} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix}.$$
The $k$th Fibonacci number is the second entry of $v_{k+1}$, given by
$$v_{k+1} = PD^kP^{-1}v_1 = \frac{1}{\sqrt{5}}\begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$
Multiplying this out gives $F_k = \frac{\lambda_1^k - \lambda_2^k}{\sqrt{5}}$.
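A short numerical sketch of this computation, assuming NumPy (np.linalg.eig returns normalized eigenvectors, which does not affect $PD^kP^{-1}$):

```python
import numpy as np

# Fibonacci via the eigendecomposition A = P D P^{-1} of A = [[1, 1], [1, 0]].
A = np.array([[1.0, 1.0],
              [1.0, 0.0]])
lam, P = np.linalg.eig(A)

def fib(k):
    """k-th Fibonacci number as the second entry of P D^k P^{-1} [1, 0]^T."""
    v = P @ np.diag(lam**k) @ np.linalg.inv(P) @ np.array([1.0, 0.0])
    return v[1]

print([round(fib(k)) for k in range(1, 11)])  # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
```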
Vista 4.3.13. In Section 12.7.1 we show that the semisimple spectral map-
ping theorem actually holds for all matrices, not just semisimple ones, and it
holds for many functions , not just polynomials.
Remark 4.3.15. It is easy to see by taking the transpose that (4.12) is equivalent to $A^Tx = \lambda x$. In other words, given $\lambda$, the row vector $x^T$ is a left eigenvector of $A$ if and only if $x$ is a right eigenvector of $A^T$.

Remark 4.3.16. We define the left eigenspace of $\lambda$ to be the set of row vectors that satisfies (4.12). By appealing to the rank-nullity theorem, we can match the dimensions of the left and right eigenspaces of an eigenvalue $\lambda$, since it is always true that $\dim\mathscr{N}(\lambda I - A) = \dim\mathscr{N}(\lambda I - A^T)$.

Remark 4.3.17. Sometimes, for notational convenience, we denote the right eigenvectors by the letter $r$ and the left eigenvectors by $\ell^T$.
Example 4.3.18. If
-9
B = [ 35
Conjugating by an orthonormal matrix $U$ whose first column is a unit eigenvector of $A$ gives
$$U^HAU = \begin{bmatrix} \lambda_1 & * \\ 0 & A_{22} \end{bmatrix},$$
where $*$ denotes a row of arbitrary entries.
Proof. Let $A$ be an Hermitian matrix. By the first spectral theorem, there exist an orthonormal matrix $U$ and a diagonal $D$ such that $U^HAU = D$. The eigenbasis is then given by the columns of $U$, which are orthonormal. The diagonal elements are real since $D$ is also Hermitian (and the diagonal elements of an Hermitian matrix are real). □
The Hermitian matrix
$$A = \begin{bmatrix} 1 & 2i \\ -2i & -2 \end{bmatrix}$$
has characteristic polynomial $p(\lambda) = (\lambda - 1)(\lambda + 2) - 4 = (\lambda + 3)(\lambda - 2)$. Thus, the spectrum is $\sigma(A) = \{-3, 2\}$, and so both eigenvalues are real. The corresponding eigenvectors, when scaled to be unit vectors, are
$$\frac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ 2i \end{bmatrix} \qquad\text{and}\qquad \frac{1}{\sqrt{5}}\begin{bmatrix} 2i \\ 1 \end{bmatrix},$$
respectively.
Writing out $T^HT$ and $TT^H$ for an upper-triangular $T = [t_{ij}]$ gives
$$T^HT = \begin{bmatrix} \bar{t}_{11} & 0 & \cdots & 0 \\ \bar{t}_{12} & \bar{t}_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ \bar{t}_{1n} & \bar{t}_{2n} & \cdots & \bar{t}_{nn} \end{bmatrix}\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & & t_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}$$
and
$$TT^H = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & & t_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & t_{nn} \end{bmatrix}\begin{bmatrix} \bar{t}_{11} & 0 & \cdots & 0 \\ \bar{t}_{12} & \bar{t}_{22} & & \vdots \\ \vdots & & \ddots & 0 \\ \bar{t}_{1n} & \bar{t}_{2n} & \cdots & \bar{t}_{nn} \end{bmatrix}.$$
Comparing the diagonal entries of $T^HT$ and $TT^H$ shows that if $T$ is normal, then the off-diagonal entries of $T$ must vanish.
Remark 4.5.2. If $A$ is Hermitian, then it is clear that $\langle x, Ax\rangle$ is real valued. In particular, we have that
$$\langle x, Ax\rangle = \langle A^Hx, x\rangle = \langle Ax, x\rangle = \overline{\langle x, Ax\rangle}.$$
A= [~ ~] and B = [~ ~] .
Note that $A$ is not positive definite because it is not Hermitian. To show that $B$ is not positive definite, let $x = [-1\ \ 1]^T$, which gives $x^HBx = -4 < 0$.
Theorem 4.5.5. The Hermitian matrix A E Mn(lF) is positive definite if and only
if its spectrum contains only positive eigenvalues. It is positive semidefinite if and
only if its spectrum contains only nonnegative eigenvalues.
matrix $Q$ such that $Q^HAQ = \operatorname{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)$. The last $n - r$ columns of $Q$ form an orthonormal basis for $\mathscr{N}(A)$, and the first $r$ columns form an orthonormal basis for $\mathscr{N}(A)^\perp$. Letting $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_r)$, we have the following block form:
$$Q^HAQ = \begin{bmatrix} D & 0 \\ 0 & 0 \end{bmatrix}. \qquad (4.13)$$

Proof. Let $x_1, \ldots, x_n$ be an orthonormal eigenbasis for $A$, ordered so that the corresponding eigenvalues satisfy $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0$. The set $x_{r+1}, \ldots, x_n$ is an orthonormal basis for the kernel $\mathscr{N}(A)$, and $x_1, \ldots, x_r$ is an orthonormal basis for $\mathscr{N}(A)^\perp$. If $Q_1$ is defined to be the $n \times r$ matrix with columns equal to $x_1, \ldots, x_r$, and $Q_2$ to be the $n \times (n-r)$ matrix with columns equal to $x_{r+1}, \ldots, x_n$, then (4.13) follows immediately. □
Proof. Write $A = UDU^H$, where $U$ is orthonormal and $D = \operatorname{diag}(d_1, \ldots, d_n) \ge 0$ is diagonal. Let $D^{1/2} = \operatorname{diag}(\sqrt{d_1}, \ldots, \sqrt{d_n}) \ge 0$, so that $D^{1/2}D^{1/2} = D$. Setting $S = D^{1/2}U^H$ gives $S^HS = A$, as required.

If $A$ is positive definite, then each diagonal entry of $D$ is positive, and so is each corresponding diagonal entry of $D^{1/2}$. Thus, $D$ and $S$ are nonsingular. □
Proof. It is clear that $\langle\cdot,\cdot\rangle_A$ is sesquilinear. Since $A > 0$, there exists a nonsingular $S \in M_n(\mathbb{F})$ satisfying $A = S^HS$. Hence, $\langle x, x\rangle_A = x^HAx = x^HS^HSx = \|Sx\|_2^2$, which is positive if and only if $x \neq 0$. □
Proof. Note that $(A^HA)^H = A^HA$ and $\langle x, A^HAx\rangle = \langle Ax, Ax\rangle = \|Ax\|^2 \ge 0$. Thus, $A^HA$ is positive semidefinite. See Exercise 3.46(iii) to show $\operatorname{rank}(A^HA) = r$. □
$$A = U\Sigma V^H, \qquad (4.14)$$
where $U \in M_m(\mathbb{F})$ and $V \in M_n(\mathbb{F})$ are orthonormal, $\Sigma \in M_{m\times n}(\mathbb{R})$ is zero except for the upper-left $r \times r$ diagonal block $\Sigma_1 = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, and $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are all positive real numbers. This is called the singular value decomposition (SVD) of $A$, and the $r$ positive values $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the singular values³⁰ of $A$.
where $D = \operatorname{diag}(d_1, \ldots, d_r)$ and $d_1 \ge d_2 \ge \cdots \ge d_r > 0$. Write each $d_i$ as a square $d_i = \sigma_i^2$, and let $\Sigma_1$ be the diagonal $r \times r$ block $\Sigma_1 = \operatorname{diag}(\sigma_1, \ldots, \sigma_r)$. In block form, we may write
$$V = \begin{bmatrix} V_1 & V_2 \end{bmatrix},$$

³⁰The additional zeros on the diagonal are not considered singular values.
³¹Beware that $V_1V_1^H \neq I$, despite the fact that $V_1^HV_1 = I$ and $VV^H = I$.
Note that $I = VV^H = V_1V_1^H + V_2V_2^H$, and thus $A = AV_1V_1^H + AV_2V_2^H = AV_1V_1^H$. Therefore, $U\Sigma V^H = A$.
Since the singular values are the positive square roots of the nonzero eigenvalues of $A^HA$, they are uniquely determined by $A$, and since they are ordered $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$, the matrix $\Sigma$ is uniquely determined by $A$. □
Remark 4.5.11. The matrix $\Sigma$ is unique in the SVD, whereas the matrices $U$ and $V$ are not necessarily unique.
Remark 4.5.12. The SVD gives orthonormal bases for the four fundamental subspaces in the fundamental subspaces theorem (Theorem 3.8.9). Specifically, the first $r$ columns of $V$ form a basis for $\mathscr{R}(A^H)$; the last $n - r$ columns of $V$ form a basis for $\mathscr{N}(A)$; the first $r$ columns of $U$ form a basis for $\mathscr{R}(A)$; and the last $m - r$ columns of $U$ form a basis for $\mathscr{N}(A^H)$. This is visualized in Figure 4.1.
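A sketch of extracting these four bases from a computed SVD, assuming NumPy (the rank tolerance 1e-12 is our choice):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 0.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))            # numerical rank

range_A   = U[:, :r]                  # orthonormal basis for R(A)
null_A_H  = U[:, r:]                  # basis for N(A^H)
range_A_H = Vt[:r].conj().T           # basis for R(A^H)
null_A    = Vt[r:].conj().T           # basis for N(A)
print(r, null_A.shape)                # rank 1; N(A) is 2-dimensional
```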
$$A^HA = \begin{bmatrix} 80 & 56 & 16 \\ 56 & 68 & 40 \\ 16 & 40 & 32 \end{bmatrix}.$$
Since $\sigma(A^HA) = \{0, 36, 144\}$, the singular values are $6$ and $12$. The right singular vectors, that is, the columns of $V$, are determined by finding a set of orthonormal eigenvectors of $A^HA$. Specifically, let
$$V = \frac{1}{3}\begin{bmatrix} 2 & 2 & 1 \\ 2 & -1 & -2 \\ 1 & -2 & 2 \end{bmatrix}.$$
The left singular vectors, that is, the columns of $U_1$, can be computed by observing that $u_i = \frac{1}{\sigma_i}Av_i$ for $i = 1, 2$. The remaining columns of $U$ are calculated by finding unit vectors orthogonal to $u_1$ and $u_2$.
Remark 4.5.14. From (4.15) we have that $U\Sigma V^H = U_1\Sigma_1V_1^H$. The equation
$$A = U_1\Sigma_1V_1^H \qquad (4.16)$$
is called the compact form of the SVD. The compact form encapsulates all of the necessary information to reconstruct the matrix $A$. Moreover, $A$ can be represented by the outer-product expansion
$$A = \sum_{i=1}^{r}\sigma_iu_iv_i^H, \qquad (4.17)$$
where $u_i$ and $v_i$ are the columns of $U_1$ and $V_1$, respectively, and $\sigma_1, \sigma_2, \ldots, \sigma_r$ are the positive singular values.
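A numerical sketch of the compact form and the outer-product expansion, assuming NumPy (the example matrix is ours):

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, -3.0], [0.0, 0.0]])
U, s, Vt = np.linalg.svd(A)            # full SVD: U (3x3), s (2,), Vt (2x2)
r = int(np.sum(s > 1e-12))

U1, S1, V1t = U[:, :r], np.diag(s[:r]), Vt[:r]   # compact form (4.16)
print(np.allclose(U1 @ S1 @ V1t, A))             # True

# Outer-product expansion (4.17): A = sum_i sigma_i u_i v_i^H.
B = sum(s[i] * np.outer(U[:, i], Vt[i].conj()) for i in range(r))
print(np.allclose(B, A))                         # True
```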
Example 4.5.15. The compact form of the SVD for the matrix A from
Example 4.5.13 is
Proof. From the SVD we have $A = U\Sigma V^H$. Set $Q = UV^H$ and $P = V\Sigma V^H$, so that $QP = UV^HV\Sigma V^H = U\Sigma V^H = A$. Since $\Sigma$ is positive semidefinite, it follows that $P$ is also positive semidefinite. □
If $A$ has full column rank, that is, $\operatorname{rank}A = n$, then $A^HA$ is invertible and $\hat{x} = (A^HA)^{-1}A^Hb$ is the unique least squares solution. If $A$ is not injective, then there are infinitely many least squares solutions. If $\hat{x}$ is a particular solution, then any $\hat{x} + n$ with $n \in \mathscr{N}(A)$ also satisfies (4.18). The SVD allows us to find the unique particular solution that is orthogonal to $\mathscr{N}(A)$.
Proof. If $\hat{x} = A^\dagger b = V_1\Sigma_1^{-1}U_1^Hb$, then $\hat{x} \in \mathscr{R}(V_1) = \mathscr{R}(A^H) = \mathscr{N}(A)^\perp$ and
$$A^HA\hat{x} = V_1\Sigma_1U_1^HU_1\Sigma_1V_1^HV_1\Sigma_1^{-1}U_1^Hb = V_1\Sigma_1U_1^Hb = A^Hb. \qquad\Box$$
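A sketch of this pseudoinverse solution for a rank-deficient system, assuming NumPy (compared against np.linalg.pinv):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])  # rank 1, not injective
b = np.array([1.0, 2.0, 3.0])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12))
x_dagger = Vt[:r].conj().T @ ((U[:, :r].conj().T @ b) / s[:r])  # V1 S1^{-1} U1^H b

print(np.allclose(x_dagger, np.linalg.pinv(A) @ b))  # matches NumPy's pinv
print(A.T @ (A @ x_dagger - b))                      # normal-equation residual ~ 0
```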
In this section, we show that $A_s$ has rank $s$ and is the "best" rank-$s$ approximation of $A$ in the sense that the norm of the difference, $\|A - A_s\|$, is minimized against all other $\|A - B\|$ where $B$ has rank $s$. We make this more precise below.
$$\sigma_{s+1} = \inf_{\operatorname{rank}(B)=s}\|A - B\|_2,$$
with minimizer
$$B = A_s = \sum_{i=1}^{s}\sigma_iu_iv_i^H, \qquad (4.22)$$
where each $\sigma_j$ is the $j$th singular value of $A$ and $u_j$ and $v_j$ are, respectively, the corresponding columns of $U_1$ and $V_1$ in the compact form (4.16) of the singular value decomposition.
Proof. Let $W = [v_1\ \cdots\ v_{s+1}]$ be the matrix whose columns are the first $s+1$ right singular vectors of $A$. For any $B \in M_{m\times n}(\mathbb{C})$ of rank $s$, Exercise 2.14(i) shows that $\operatorname{rank}(BW) \le \operatorname{rank}(B) = s$. Thus, by the rank-nullity theorem, we have $\dim\mathscr{N}(BW) = s + 1 - \operatorname{rank}(BW) \ge 1$. Hence, there exists $x \in \mathscr{N}(BW) \subset \mathbb{F}^{s+1}$ satisfying $\|x\|_2 = 1$. We compute
$$AWx = \sum_{i=1}^{r}\sigma_iu_iv_i^HWx = \sum_{i=1}^{s+1}\sigma_ix_iu_i,$$
³²This theorem and its counterpart for the Frobenius norm are often just called the Eckart--Young theorem, but there seems to be good evidence that Schmidt and Mirsky discovered these results earlier than Eckart and Young (see [Ste98, pg. 77]), so all four names get attached to these theorems.
³³An inequality is sharp if there is at least one case where equality holds. In other words, no stronger inequality could hold.
A version of the previous theorem also holds in the Frobenius norm (as defined
in Example 3.5.6).
$$\left(\sum_{j=s+1}^{r}\sigma_j^2\right)^{1/2} = \inf_{\operatorname{rank}(B)=s}\|A - B\|_F, \qquad (4.23)$$
Proof. Let $A \in M_{m\times n}(\mathbb{F})$ with SVD $A = U\Sigma V^H$. The invertible change of variable $Z = U^HBV$ (combined with Exercise 4.32(i)) gives
$$\inf_{\operatorname{rank}(B)=s}\|A - B\|_F = \inf_{\operatorname{rank}(Z)=s}\|\Sigma - Z\|_F. \qquad (4.24)$$
From Example 3.5.6, we know that the square of the Frobenius norm of a matrix is just the sum of the squares of the entries in the matrix. Hence, if $Z = [z_{ij}]$, then we have
$$\|\Sigma - Z\|_F^2 = \sum_{i=1}^{m}\sum_{j=1}^{n}|\Sigma_{ij} - z_{ij}|^2,$$
and minimizing this over all $Z$ of rank $s$ forces $Z = \operatorname{diag}(\sigma_1, \ldots, \sigma_s, 0, \ldots, 0)$, which gives (4.23). □
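Both optimality statements are easy to verify numerically with a truncated SVD. A sketch assuming NumPy (the helper `best_rank` is our own name):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A)

def best_rank(A, k):
    """Best rank-k approximation A_k from the truncated SVD, as in (4.22)."""
    U, sv, Vt = np.linalg.svd(A)
    return U[:, :k] @ np.diag(sv[:k]) @ Vt[:k]

A2 = best_rank(A, 2)
print(np.linalg.matrix_rank(A2))                            # 2
print(np.isclose(np.linalg.norm(A - A2, 2), s[2]))          # sigma_3
print(np.isclose(np.linalg.norm(A - A2, "fro"),
                 np.sqrt(np.sum(s[2:] ** 2))))              # (4.23)
```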
(Figure: best rank-$s$ approximations for $s = 20$, $s = 10$, and $s = 5$.)
the matrix $\Delta$ has to be sufficiently large in order for $A + \Delta$ to be smaller in rank than $A$.
Example 4.6.6. If $A = 0 \in M_n(\mathbb{F})$ is the zero matrix, then $A + \varepsilon I$ has rank $n$ for any $\varepsilon \neq 0$. Notice that by adding a small perturbation to the zero matrix, the rank goes from $0$ to $n$. Adding even the smallest such matrix to the zero matrix increases the rank of the sum.
Corollary 4.6.7. Let $A \in M_{m\times n}(\mathbb{F})$ have SVD $A = U\Sigma V^H$. If $s < r$, then for any $\Delta \in M_{m\times n}(\mathbb{F})$ satisfying $\operatorname{rank}(A + \Delta) = s$, we have $\|\Delta\|_2 \ge \sigma_{s+1}$.
Theorem 4.6.8. Let $A \in M_{m\times n}(\mathbb{F})$ have SVD $A = U\Sigma V^H$. The infimum of $\|\Delta\|_2$ such that $\operatorname{rank}(I - A\Delta) < m$ is $\sigma_1^{-1}$, attained by
$$\Delta^* = \sigma_1^{-1}v_1u_1^H. \qquad (4.25)$$

Proof. To make sense of the expression $I - A\Delta$, we must have $I \in M_{m}(\mathbb{F})$ and $\Delta \in M_{n\times m}(\mathbb{F})$. If $\operatorname{rank}(I - A\Delta) < m$, then there exists $x \in \mathbb{F}^m$ with $x \neq 0$ such that $A\Delta x = x$. Thus,
$$\|x\|_2 = \|A\Delta x\|_2 \le \sigma_1\|\Delta x\|_2,$$
which implies
$$\sigma_1^{-1} \le \frac{\|\Delta x\|_2}{\|x\|_2} \le \|\Delta\|_2.$$
Since $\|\Delta^*\|_2 = \|\sigma_1^{-1}v_1u_1^H\|_2 = \sigma_1^{-1}$, it suffices to show that $\operatorname{rank}(I - A\Delta^*) < m$. However, this follows immediately from the fact that $(I - A\Delta^*)u_1 = 0$; see Exercise 4.40. □
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *) . We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with & are especially important and are likely to be used later
in this book and beyond. Those marked with t are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
4.1. A matrix $A \in M_n(\mathbb{F})$ is nilpotent if $A^k = 0$ for some $k \in \mathbb{N}$. Show that if $\lambda$ is an eigenvalue of a nilpotent matrix, then $\lambda = 0$. Hint: Show that if $\lambda$ is an eigenvalue of $A$, then $\lambda^k$ is an eigenvalue of $A^k$.
4.2. Let $V = \operatorname{span}(\{1, x, x^2\})$ be a subspace of the inner product space $L^2([0,1];\mathbb{R})$. Let $D$ be the derivative operator $D : V \to V$ given by $D[p](x) = p'(x)$. Find all the eigenvalues and eigenspaces of $D$. What are their algebraic and geometric multiplicities?
4.3. Show that the characteristic polynomial of any $2 \times 2$ matrix $A$ has the form $p(\lambda) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$.
4.8. Let $V$ be the span of the set $S = \{\sin(x), \cos(x), \sin(2x), \cos(2x)\}$ in the vector space $C^\infty(\mathbb{R};\mathbb{R})$.
(i) Prove that $S$ is a basis for $V$.
(ii) Let $D$ be the derivative operator. Write the matrix representation of $D$ in the basis $S$.
(iii) Find two complementary $D$-invariant subspaces in $V$.
4.9. Prove that the right shift operator on $\ell^\infty$ has no one-dimensional invariant subspace (see Remark 4.2.8).
4.10. Assume that $V$ is a vector space and $T$ is a linear operator on $V$. Prove that if $W$ is a $T$-invariant subspace of $V$, then the map $T' : V/W \to V/W$ given by $T'(v + W) = T(v) + W$ is a well-defined linear transformation.
4.11. Let $W_1$ and $W_2$ be complementary subspaces of the vector space $V$. A reflection through $W_1$ along $W_2$ is a linear operator $R : V \to V$ such that $R(w_1 + w_2) = w_1 - w_2$ for $w_1 \in W_1$ and $w_2 \in W_2$. Prove that the following are equivalent:
(i) There exist complementary subspaces $W_1$ and $W_2$ of $V$ such that $R$ is a reflection through $W_1$ along $W_2$.
(ii) $R$ is an involution, that is, $R^2 = I$.
(iii) $V = \mathscr{N}(R - I) \oplus \mathscr{N}(R + I)$. Hint: We have that
$$v = \tfrac{1}{2}(I + R)v + \tfrac{1}{2}(I - R)v.$$
4.12. Let $L$ be the linear operator on $\mathbb{R}^2$ that reflects around the line $y = 3x$.
(i) Find two complementary $L$-invariant subspaces of $V$.
(ii) Choose a basis $T$ for $\mathbb{R}^2$ consisting of one vector from each of the two complementary $L$-invariant subspaces and write the matrix representation of $L$ in that basis.
(iii) Write the transition matrix $C_{ST}$ from $T$ to the standard basis $S$.
(iv) Write the matrix representation of $L$ in the standard basis.
4.13. Let
$$A = \begin{bmatrix} 0.8 & 0.4 \\ 0.2 & 0.6 \end{bmatrix}.$$
where $\langle\cdot,\cdot\rangle$ is the usual inner product on $\mathbb{F}^n$. Show that the Rayleigh quotient can only take on real values for Hermitian matrices and only imaginary values for skew-Hermitian matrices.
4.25. Let $A \in M_n(\mathbb{C})$ be a normal matrix with eigenvalues $(\lambda_1, \ldots, \lambda_n)$ and corresponding orthonormal eigenvectors $[x_1, \ldots, x_n]$.
(i) Show that the identity matrix can be written $I = x_1x_1^H + \cdots + x_nx_n^H$. Hint: What is $(x_1x_1^H + \cdots + x_nx_n^H)x_j$?
(ii) Show that $A$ can be written as $A = \lambda_1x_1x_1^H + \cdots + \lambda_nx_nx_n^H$. This is called an outer product expansion.
4.26.† Let $A, B \in M_n(\mathbb{F})$ be Hermitian, and let $\mathbb{F}^n$ be equipped with the standard inner product. If $[x_1, \ldots, x_n]$ is an orthonormal eigenbasis of $B$, then prove that
$$\operatorname{tr}(AB) = \sum_{i=1}^{n}\langle x_i, ABx_i\rangle.$$
4.27. Assume $A \in M_n(\mathbb{F})$ is positive definite. Prove that all its diagonal entries are real and positive.
4.28. Assume $A, B \in M_n(\mathbb{F})$ are positive semidefinite. Prove that
4.31. & Assume $A \in M_{m\times n}(\mathbb{F})$ and $A$ is not identically zero. Prove that
(i) $\|A\|_2 = \sigma_1$, where $\sigma_1$ is the largest singular value of $A$;
A block matrix $A = \begin{bmatrix} B & C \\ C^H & D \end{bmatrix}$ is positive definite if and only if $B > 0$ and $D - C^HB^{-1}C > 0$. Hint: Find a matrix $P$ of the form $\begin{bmatrix} I & F \\ 0 & I \end{bmatrix}$ that makes $P^HAP$ block diagonal.
4.35. Let $A \in M_n(\mathbb{F})$ be nonsingular. Prove that the modulus of the determinant is the product of the singular values:
$$|\det A| = \prod_{i=1}^{n}\sigma_i.$$
(iii) Use the first parts of this exercise and Theorem 4.6.8 to prove that if $A + \Delta$ has rank $s < r$, then $\|\Delta\|_2 \ge \sigma_r$, without using the Schmidt, Mirsky, Eckart--Young theorems.
Notes
We have focused primarily on spectral theory of finite-dimensional operators because
the infinite-dimensional case is very different and has many subtleties. This is
normally covered in a functional analysis course. Some resources for the infinite-
dimensional case include [Pro08, Con90, Rud91] .
Part II
Nonlinear Analysis I
Metric Space Topology
The angel of topology and the devil of abstract algebra fight for the soul of each
individual mathematical domain.
-Hermann Weyl
point that we can get arbitrarily close to must actually be in the space; there are no holes or gaps in the space. For example, $\mathbb{Q}$ is not complete because we can approximate irrational numbers (not in $\mathbb{Q}$) as closely as we like with rational numbers. Many of the most useful spaces in mathematical analysis are complete; for example, $(\mathbb{F}^n, \|\cdot\|_p)$ is complete for any $n \in \mathbb{N}$ and any $p \in [1,\infty]$.

When normed vector spaces are also complete, they are called Banach spaces. Banach spaces are very important in analysis and applied mathematics, and most of the normed linear spaces used in applied mathematics are Banach spaces. Some important examples of Banach spaces include $\mathbb{F}^n$, the space of matrices $M_{m\times n}(\mathbb{F})$ (which is isomorphic to $\mathbb{F}^{mn}$), the space of continuous functions $(C([a,b];\mathbb{R}), \|\cdot\|_{L^\infty})$, the space of bounded functions, and the spaces $\ell^p$ for $1 \le p \le \infty$.
Although this chapter is a little more abstract than the rest of the book so far, and even though we do not give as many immediate applications of the ideas here, the material in this chapter is fundamental to applied mathematics and provides powerful tools that you can use repeatedly throughout the rest of the book and beyond.
Example 5.1.2. Perhaps the most common metric is the Euclidean metric on $\mathbb{F}^n$, given by the 2-norm, that is, $d(x,y) = \|x - y\|_2$. Unless we specifically say otherwise, we always use this metric on $\mathbb{F}^n$.
Example 5.1.3. The reader should check that each of the examples below satisfies the definition of a metric space.³⁴

³⁴Metric spaces are not necessarily vector spaces. Indeed, there need not be any binary operations defined on a metric space.
(i) We can generalize Example 5.1.2 to more general normed linear spaces. By Definition 3.1.11, any norm $\|\cdot\|$ on a vector space induces a natural metric
$$d(x,y) = \|x - y\|. \qquad (5.1)$$
Unless we specifically say otherwise, we always use this metric on a normed linear space.
(ii) For $f, g \in C([a,b];\mathbb{R})$ and for any $p \in [1,\infty]$, we have the metric
$$d_p(f,g) = \begin{cases}\left(\int_a^b |f(t) - g(t)|^p\,dt\right)^{1/p}, & 1 \le p < \infty, \\ \sup_{t\in[a,b]}|f(t) - g(t)|, & p = \infty.\end{cases} \qquad (5.2)$$
These are also written as $\|f - g\|_{L^p}$ and $\|f - g\|_{L^\infty}$, respectively.
(iii) The discrete metric on $X$ is
$$d(x,y) = \begin{cases} 0 & \text{if } x = y, \\ 1 & \text{if } x \neq y. \end{cases} \qquad (5.3)$$
Thus, no two distinct points are close together; they are always the same distance apart.
Example 5.1.4. Let $(X, d)$ be a metric space. We can create a new metric on $X$:
$$\rho(x,y) = \frac{d(x,y)}{1 + d(x,y)}, \qquad (5.5)$$
where no two points are farther apart than $1$. To show that (5.5) is a metric on $X$, see Exercise 5.4.
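A small sketch of these constructions in Python, assuming NumPy (the function names are ours):

```python
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)                   # metric induced by the 2-norm

def discrete(x, y):
    return 0.0 if np.array_equal(x, y) else 1.0    # the discrete metric (5.3)

def bounded(d):
    """Turn a metric d into the bounded metric of (5.5)."""
    return lambda x, y: d(x, y) / (1.0 + d(x, y))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(euclidean(x, y), discrete(x, y), bounded(euclidean)(x, y))  # 5.0 1.0 0.8333...
```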
Remark 5.1.5. Every norm induces a metric, but not every metric space is a normed space. For example, a metric can be defined on sets that are not vector spaces. And even if the underlying space is a vector space, we can define metrics on it that are not induced by a norm.
Definition 5.1.6. For each point x₀ ∈ X and each r > 0, define the open ball with center x₀ and radius r to be the set B(x₀, r) = {x ∈ X | d(x, x₀) < r}.
Example 5.1.9. Both X and ∅ are open sets. First, X is open since B(x, r) ⊂ X for all x ∈ X and for all r > 0. That ∅ is open follows vacuously: every point in ∅ satisfies the condition because there are no points in ∅.
Example 5.1.10. In the Euclidean norm, B(x₀, r) looks like a ball, which is why we call it the "open ball." However, in other metrics open balls can take on very different shapes. For example, in ℝ³ the open ball in the metric induced by the 1-norm is an octahedron, and the open ball in the metric induced by the ∞-norm is a cube (see Figure 3.6). In the discrete metric (see Example 5.1.3(iii)), open balls of radius one or less are just points (singleton sets); that is, B(x, 1) = {x} for each x ∈ X, while B(x, r) = X for all r > 1.
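To make the different ball shapes concrete, here is a small Python sketch (an editorial addition, with hypothetical helper names) that tests membership of a point in the open unit ball of the 1-norm, the ∞-norm, and the discrete metric.

```python
import numpy as np

def in_ball(x, center, r, p):
    # Membership test for the open p-norm ball B(center, r).
    return np.linalg.norm(np.asarray(x) - np.asarray(center), ord=p) < r

def in_discrete_ball(x, center, r):
    # Open ball in the discrete metric: d(x, c) = 0 if x = c, else 1.
    dist = 0.0 if tuple(x) == tuple(center) else 1.0
    return dist < r

# (0.6, 0.6) lies inside the infinity-norm unit ball (a square) but
# outside the 1-norm unit ball (a diamond), and the discrete unit ball
# around the origin contains only the origin itself.
print(in_ball((0.6, 0.6), (0, 0), 1, np.inf))   # True
print(in_ball((0.6, 0.6), (0, 0), 1, 1))        # False: 0.6 + 0.6 = 1.2
print(in_discrete_ball((0.6, 0.6), (0, 0), 1))  # False
```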
Example 5.1.11. Another important example is the space C([0, 1]; 𝔽) with the metric d(f, g) = ‖f − g‖_{L^∞} determined by the sup norm ‖·‖_{L^∞}. In this space the open ball B(0, 1) around the zero function is infinite dimensional, so it is hard to draw, but it consists of all functions that only take on values with modulus less than 1 on the interval [0, 1]. It contains sin(x) because sup_{x∈[0,1]} |sin(x)| = sin(1) < 1.
We now prove that the balls defined in Definition 5.1.6 are, in fact, open sets, as defined in Definition 5.1.8.

Proof. Let y ∈ B(x, r) and set ε = d(x, y). Assume z ∈ B(y, r − ε). Thus, d(z, y) < r − ε, so d(z, y) + ε < r. This implies that d(z, y) + d(y, x) < r, which by the triangle inequality yields d(z, x) < r, or equivalently z ∈ B(x, r) (see Figure 5.1). □
Example 5.1.13.* Consider the metric space (𝔽ⁿ, d), where d is the Euclidean metric. Let eᵢ be the ith standard basis vector. Note that d(eᵢ, eⱼ) = √2 whenever i ≠ j. We can show that each basis element eᵢ is contained in exactly one ball in 𝒞 = {B(eᵢ, √2)}ᵢ₌₁ⁿ ∪ {B(−eᵢ, √2)}ᵢ₌₁ⁿ and that each element x in the unit ball B(0, 1) is contained in at least one of the 2n balls in 𝒞 (see Figure 5.2 and Exercise 5.7).
[Figure 5.2. The balls B(eᵢ, √2) and B(−eᵢ, √2) in the plane; each basis element lies in exactly one ball of the cover.]
By the definition of an open set, any open set can be written as a union of open
balls. We now show that any union of open sets is open and that the intersection
of a finite number of open sets is open.
Theorem 5.1.14. The union of any collection of open sets is open, and the intersection of any finite collection of open sets is open.
Proof. We first prove the result for unions. Let (G_α)_{α∈J} be a collection of open sets G_α indexed by the set J. If x ∈ ∪_{α∈J} G_α, then x ∈ G_α for some α ∈ J. Hence, there exists ε > 0 such that B(x, ε) ⊂ G_α, and so B(x, ε) ⊂ ∪_{α∈J} G_α. Therefore ∪_{α∈J} G_α is open.
We now prove the result for intersections. Let (G_k)_{k=1}^n be a finite collection of open sets. If x ∈ ∩_{k=1}^n G_k, then for each k there exists ε_k > 0 such that B(x, ε_k) ⊂ G_k. Let ε = min{ε₁, ..., εₙ}, which is positive. Thus, B(x, ε) ⊂ G_k for each k, and so B(x, ε) ⊂ ∩_{k=1}^n G_k. It follows that ∩_{k=1}^n G_k is open. □
Example 5.1.15. Note that an infinite intersection of open sets need not be open. As a simple example, consider the following intersection of open sets in ℝ with the usual metric:
    ∩_{n=1}^∞ (−1/n, 1/n).
The intersection is just the single point {0}, which is not an open set in ℝ.
Proof.
(i) By definition (E°)° ⊂ E°. Conversely, if x ∈ E°, then there exists an open ball B(x, δ) ⊂ E. By Theorem 5.1.12 every point y ∈ B(x, δ) is contained in B(x, δ)° ⊂ E. Hence, B(x, δ) ⊂ E°, which implies x ∈ (E°)°.
(ii) Assume G is an open subset of E. If x ∈ G, then there exists ε > 0 such that B(x, ε) ⊂ G, which implies that B(x, ε) ⊂ E; it follows that x ∈ E°. Thus, G ⊂ E°.
(iv) Let (G_α)_{α∈J} be the collection of all open sets contained in E. By (ii), we have that G_α ⊂ E° for all α ∈ J. Thus, ∪_{α∈J} G_α ⊂ E°. On the other hand, if x ∈ E°, then by definition it is contained in an open ball B(x, ε) ⊂ E, which by Theorem 5.1.12 is an open subset of E. □
|f(x, y) − f(0, 0)| = |x − y| ≤ |x| + |y| ≤ 2‖(x, y)‖₂ = 2d((x, y), 0).
Setting δ = ε/2, we have |f(x, y) − f(0, 0)| < ε whenever ‖(x, y)‖₂ < δ.
(ii) ⚠ If (V, ‖·‖_V) and (W, ‖·‖_W) are normed linear spaces, then every bounded linear transformation T ∈ ℬ(V, W) is continuous at each x₀ ∈ V. Given ε > 0, we set δ = ε/(1 + ‖T‖_{V,W}). Hence, ‖T(x) − T(x₀)‖_W ≤ ‖T‖_{V,W} ‖x − x₀‖_V < ε whenever ‖x − x₀‖_V < δ.
(iii) ⚠ A function f : X → Y is Lipschitz continuous (or just Lipschitz for short) if there exists K > 0 such that ρ(f(x₁), f(x₂)) ≤ K d(x₁, x₂) for all x₁, x₂ ∈ X. Every Lipschitz continuous function is continuous on all of X (set δ = ε/K).
(iv) ⚠ For fixed x₀ ∈ X, the map f : X → ℝ given by f(x) = d(x, x₀) is continuous at each x ∈ X. To see this, for any ε > 0 set δ = ε; the triangle inequality then gives |d(x, x₀) − d(y, x₀)| ≤ d(x, y) < ε whenever d(x, y) < δ.
Nota Bene 5.2.4. Recall that the notation f⁻¹(U) does not mean that f has an inverse. The set f⁻¹(U) = {x ∈ X | f(x) ∈ U} always exists (but may be empty), even if f has no inverse.
Proof. If U ⊂ Z is open, then g⁻¹(U) is open in Y, which implies that f⁻¹(g⁻¹(U)) is open in X. Hence, h⁻¹(U) is open whenever U is open. □
Proof. This follows from the definition of continuity by just writing out the definition of the various open balls. More precisely, x ∈ B(x₀, δ) if and only if d(x, x₀) < δ; and f(x) ∈ B(f(x₀), ε) if and only if ρ(f(x), f(x₀)) < ε. □
Proposition 5.2.7. Consider the space 𝔽ⁿ with the metric d_p induced by the p-norm for some fixed p ∈ [1, ∞]. Let ‖·‖ be any norm on 𝔽ⁿ. The function f : 𝔽ⁿ → ℝ given by f(x) = ‖x‖ is continuous with respect to the metric d_p. In particular, it is continuous with respect to the usual metric (p = 2).

Proof. Let M = max(‖e₁‖, ..., ‖eₙ‖), where eᵢ is the ith standard basis vector. Let
    q = p/(p − 1) if p ∈ (1, ∞], and q = ∞ if p = 1.
Note that 1/p + 1/q = 1, and for any z = (z₁, ..., zₙ) = ∑_{i=1}^n zᵢeᵢ, the triangle inequality together with Hölder's inequality (Corollary 3.6.4) gives
    f(z) = ‖z‖ ≤ ∑_{i=1}^n |zᵢ| ‖eᵢ‖ ≤ M ∑_{i=1}^n |zᵢ| ≤ M n^{1/q} ‖z‖_p.
The function
    f(x, y) = x²y³/(x² + y²)² if (x, y) ≠ (0, 0), and f(0, 0) = 0,
is continuous at zero. To see this, note that |x| ≤ (x² + y²)^{1/2} and |y| ≤ (x² + y²)^{1/2}, and thus |x²y³| ≤ (x² + y²)^{5/2}. This gives |f(x, y) − 0| ≤ (x² + y²)^{1/2}, and thus lim_{(x,y)→(0,0)} f(x, y) = 0.
Proof. We prove the case of the product hf. The proof of the sum is similar and is left to the reader.
Let ε > 0 be given. Since f is continuous at x₀, there exists δ₁ > 0 so that ‖f(x) − f(x₀)‖ < ε/(2(|h(x₀)| + 1)) whenever d(x, x₀) < δ₁. Since h is continuous, choose δ₂ > 0 so that |h(x) − h(x₀)| < min(1, ε/(2(‖f(x₀)‖ + 1))) whenever d(x, x₀) < δ₂. Note that |h(x) − h(x₀)| < 1 implies |h(x)| < |h(x₀)| + 1, so we have
    ‖h(x)f(x) − h(x₀)f(x₀)‖ ≤ |h(x)| ‖f(x) − f(x₀)‖ + |h(x) − h(x₀)| ‖f(x₀)‖ < ε/2 + ε/2 = ε
whenever d(x, x₀) < min(δ₁, δ₂). □
Nota Bene 5.2.13. Despite the misleading name, limit points of a set are
not the same as the limit of a function. In the following section we also define
limits of a sequence, which is yet another concept with almost the same name.
We would prefer to use very distinct names for these very distinct ideas, but
in order to communicate with other analysts, you need to know them by the
standard (and sometimes confusing) names.
Example 5.2.16.
(i) For any subset E of a space with the discrete metric, each point is an
isolated point of E.
(ii) Each element of the set ℤ × ℤ ⊂ ℝ × ℝ is an isolated point of ℤ × ℤ. For any p = (m, n) ∈ ℤ × ℤ, it is easy to see that B(p, 1) \ {p} does not intersect ℤ × ℤ.
Remark 5.2 .20. An immediate consequence of the theorem is that a finite set has
no limit points.
Proof. (⟹) Assume that U is open. For every x ∈ U there exists an ε > 0 such that B(x, ε) ⊂ U, which implies that B(x, ε) ∩ U^c is empty. Thus, no x ∈ U can be a limit point of U^c. In other words, U^c contains all its limit points. Therefore, U^c is closed.
(⟸) Conversely, assume that U^c is closed and x ∈ U. Since x cannot be a limit point of U^c, there exists ε > 0 such that B(x, ε) ⊂ U. Thus, U is open. □
Nota Bene 5.3.4. Open and closed are not "opposite" properties. A set can be both open and closed at the same time. A set can also be neither open nor closed.
For example, we already know that ∅ and X are open, and by the previous theorem both ∅ and X are also closed sets, since X = ∅^c and ∅ = X^c. In the discrete metric every set is both open and closed. The interval [0, 1) ⊂ ℝ in the usual metric is neither open nor closed.
Corollary 5.3.5. The intersection of any collection of closed sets is closed, and
the union of a finite collection of closed sets is closed.
Proof. These follow from Theorems 5.1.14 and 5.3.2, via De Morgan's laws (see Proposition A.1.11). □
Remark 5.3.6. Note that the rules for intersection and union of closed sets in Corollary 5.3.5 are the "opposite" of those for open sets given in Theorem 5.1.14.
Corollary 5.3.7. Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is continuous if and only if for each closed set F ⊂ Y the preimage f⁻¹(F) is closed in X.

Proof. This follows from Theorem 5.2.3 and the fact that f⁻¹(E^c) = f⁻¹(E)^c; see Proposition A.2.4. □
(i) The closed ball centered at x₀ with radius r is the set D(x₀, r) = {x ∈ X | d(x, x₀) ≤ r}. This set is closed by Corollary 5.3.7 because it is the preimage of the closed interval [0, r] under the continuous map f(x) = d(x, x₀); see Example 5.2.2(iv).
(ii) Singleton sets are always closed, as are finite sets by Remark 5.2.20. Note that a singleton set {x} can also be written as the intersection of closed balls ∩_{n=1}^∞ D(x, 1/n) = {x}.
(iii) The unit circle S¹ ⊂ ℝ² is closed. Note that f(x, y) = x² + y² is continuous and the set {1} ⊂ ℝ is closed, so the set f⁻¹({1}) = {(x, y) | x² + y² = 1} is closed. In fact, for any continuous f : X → ℝ, any set of the form f⁻¹({c}) ⊂ X is closed.ᵃ

ᵃThe sets f⁻¹({c}) are called level sets because if we consider the graph {(x, y, f(x, y)) | (x, y) ∈ ℝ²} of a function f : ℝ² → ℝ, then the set f⁻¹({c}) is the set of all points of ℝ² that map to a point of height (level) c on the graph. Contour lines on a topographic map are level sets of the function that sends each point on the surface of the earth to its altitude.
(i) E is closed.
Proof.
(i) It suffices to show that Ē^c is open. If we denote the set of limit points of E by E′, then for any p ∈ Ē^c = (E ∪ E′)^c = E^c ∩ (E′)^c, there exists an ε > 0 such that B(p, ε) \ {p} ⊂ E^c. Combining this with the fact that p ∈ E^c gives B(p, ε) ⊂ E^c. Moreover, if there exists q ∈ E′ ∩ B(p, ε), then B(p, ε) is a neighborhood of q and therefore must contain a point of E, a contradiction. Therefore B(p, ε) ⊂ (E′)^c, which implies that Ē^c is open.
(ii) It suffices to show that E′ ⊂ F′. If p ∈ E′, then for all ε > 0, we have that B(p, ε) ∩ (E \ {p}) ≠ ∅, which implies that B(p, ε) ∩ (F \ {p}) ≠ ∅ for all ε > 0. Thus, p ∈ F′.
(iv) Let F be the intersection of all closed sets containing E. By (ii), we have Ē ⊂ F, since every closed set that contains E also contains Ē. By (i), we have F ⊂ Ē, since Ē is a closed set containing E. Thus, Ē = F. □
Example 5.3.12. Let x ∈ X and r > 0. Since D(x, r) is closed, the closure of B(x, r) is contained in D(x, r). If X = 𝔽ⁿ, then the closure of B(x, r) equals D(x, r). However, in the discrete metric, the closure of B(x, 1) is {x}, whereas D(x, 1) is the entire space.
Definition 5.3.14. We say that x ∈ X is a limit of the sequence (x_k)_{k=0}^∞ if for all ε > 0 there exists N > 0 such that d(x, x_n) < ε whenever n ≥ N. We write x_k → x or lim_{k→∞} x_k = x and say that the sequence converges to x.
Example 5.3.16.
(i) Consider the sequence ((1/n, n/(n+1)))_{n=1}^∞ in the space ℝ². We prove that the sequence converges to the point (0, 1). Given ε > 0, choose N so that √2/N < ε. Thus, whenever n ≥ N, we have
    √(1/n² + (n/(n+1) − 1)²) = √(1/n² + 1/(n+1)²) < √2/n ≤ √2/N < ε.
Proof. Let (x_n)_{n=0}^∞ be a sequence in X. Suppose that x_n → x and also that x_n → y ≠ x. For ε = d(x, y), there exists N > 0 so that d(x_n, x) < ε/2 whenever n ≥ N. Similarly, there exists m > N with d(x_m, y) < ε/2. However, this implies d(x, y) ≤ d(x, x_m) + d(x_m, y) < ε/2 + ε/2 = ε = d(x, y), which is a contradiction. □
Nota Bene 5.3.18. Limits and limit points are fundamentally different things. Sequences have limits, and sets have limit points. The limit of a sequence, if it exists at all, is unique but need not be a limit point of the set {x₁, x₂, ...} of terms in the sequence. Conversely, a limit point of the set {x₁, x₂, ...} of terms in the sequence need not be the limit of the sequence. The examples below illustrate these differences.
(i) The sequence
    x_n = 1/n if n is even, and x_n = 1 − 1/n if n is odd
does not converge to any limit. Both of the points 1 and 0 are limit points of the set {x₀, x₁, x₂, ...}.
(ii) The sequence 2, 2, 2, ... (that is, the sequence x_n = 2 for every n) converges to the limit 2, but the set {x₀, x₁, x₂, ...} = {2} has no limit points.
Proof. (⟹) Assume that f is continuous at x* ∈ X. Thus, given ε > 0 there exists a δ > 0 such that ρ(f(x), f(x*)) < ε whenever d(x, x*) < δ. Since x_k → x*, there exists N > 0 such that d(x_n, x*) < δ whenever n ≥ N. Thus, ρ(f(x_n), f(x*)) < ε whenever n ≥ N, and therefore (f(x_k))_{k=0}^∞ converges to f(x*).
(⟸) If f is not continuous at x* ∈ X, then there exists ε > 0 such that for each δ > 0 there exists x ∈ B(x*, δ) with ρ(f(x), f(x*)) ≥ ε. For each k ∈ ℤ⁺, choose x_k ∈ B(x*, 1/k) with ρ(f(x_k), f(x*)) ≥ ε. This implies that x_k → x*, but (f(x_k))_{k=0}^∞ does not converge to f(x*) because ρ(f(x_k), f(x*)) ≥ ε. □
Example 5.3.20. We can often use continuous functions to prove that a sequence converges. For example, the sequence x_n = sin(1/n) converges to zero, since 1/n → 0 as n → ∞ and the sine function is continuous and equal to zero at zero.
Unexample 5.3.21.
(i) The function f : ℝ² → ℝ given by
    f(x, y) = xy/(x² + y²) if (x, y) ≠ (0, 0), and f(0, 0) = 0
is not continuous at the origin because if x_n = (1/n, 1/n) for every n ∈ ℤ⁺, we have f(x_n) = 1/2 for every n, but f(lim_{n→∞} x_n) = 0.
(ii) ⚠ The derivative map D(f) = f′ on the set 𝔽[x] of polynomials is not continuous in the L^∞-norm on C([0, 1]; 𝔽). For each n ∈ ℤ⁺ let f_n(x) = xⁿ/n. Note that ‖f_n‖_{L^∞} = 1/n; therefore, f_n → 0. And yet ‖D(f_n)‖_{L^∞} = 1, so D(f_n) does not converge to D(0) = 0. Since D does not preserve limits, it is not continuous at the origin. See Exercise 5.21 for a generalization.
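The failure of continuity of D is easy to see numerically. In the Python sketch below (an editorial addition), the sup norms of f_n(x) = xⁿ/n shrink to zero while the sup norms of the derivatives stay fixed at 1.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 10001)
for n in [1, 10, 100, 1000]:
    f_n = xs**n / n        # f_n(x) = x^n / n
    df_n = xs**(n - 1)     # D(f_n)(x) = x^(n-1)
    print(n, np.max(np.abs(f_n)), np.max(np.abs(df_n)))
# ||f_n|| = 1/n -> 0 in the sup norm, yet ||D(f_n)|| = 1 for every n,
# so D does not map this convergent sequence to a convergent one.
```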
Proof. The function f(x) = d(x, y) is continuous (see Example 5.2.2 (iv)), so the
result follows from Theorem 5.3.19. □
Example 5.4.2. The sequence (1/n)_{n=1}^∞ in ℝ is Cauchy: given ε > 0, choose N > 1/ε; then whenever n ≥ m ≥ N we have
    |1/m − 1/n| = |(n − m)/mn| < |n/mn| = |1/m| ≤ 1/N < ε.
Unexample 5.4.3.
(i) Even when the difference x_n − x_{n−1} goes to zero, the sequence (x_n)_{n=1}^∞ need not be Cauchy. For example, the sequence given by x_n = log(n) in ℝ satisfies x_n − x_{n−1} = log(n/(n − 1)) = log(1 + 1/(n − 1)) → 0, but the sequence is not Cauchy because for any m we may take n = km and then |x_n − x_m| = log(n/m) = log(k). This difference can be made arbitrarily large, and thus the sequence does not satisfy the Cauchy criterion.
(ii) The sequence given by f_n(x) = xⁿ is not a Cauchy sequence in the space C([0, 1]; ℝ) of continuous functions on [0, 1] with the metric induced by the sup norm. To see this, we note that at the point x = 2^{−1/n} we have f_n(x) = 1/2 and f_{2n}(x) = 1/4, so ‖f_n − f_{2n}‖_{L^∞} ≥ 1/4 for every n.
whenever m, n ≥ N. □
Nota Bene 5.4.5. Not all Cauchy sequences are convergent. For example, in the space X = ℝ \ {0} with the usual metric d(x, y) = |x − y|, the sequence (1/n)_{n=1}^∞ is Cauchy, and it does converge in ℝ, but it does not converge in X because its limit 0 is not in X.
Proof. Let (x_k)_{k=0}^∞ be a Cauchy sequence, and choose ε = 1. Thus, there exists N > 0 such that d(x_n, x_m) < 1 whenever m, n ≥ N. Hence, for any fixed x ∈ X and any n ≥ N, we have that
    d(x_n, x) ≤ d(x_n, x_N) + d(x_N, x) ≤ 1 + d(x_N, x).
Setting
    M = max{d(x₀, x), ..., d(x_{N−1}, x), 1 + d(x_N, x)}
gives d(x_k, x) ≤ M for all k ∈ ℕ. □
We now need the idea of a subsequence, which is, as the name suggests, a sequence consisting of some, but not necessarily all, the elements of the original sequence. Here is a careful definition.
Nota Bene 5.4.5 gives some examples of Cauchy sequences that have no limit.
In some sense, these indicate a hole or gap in the space, leaving it incomplete. This
motivates the following definition.
Unexample 5.4.12.
(i) The set ℚ of rational numbers with the usual metric d(x, y) = |x − y| is not complete. For example, let (x_n)_{n=0}^∞ be the sequence 1, 1.4, 1.41, 1.414, ... of truncations of the decimal expansion of √2; it is Cauchy, but its limit √2 does not lie in ℚ.
(ii) The space ℝ \ {0} is not complete (see Nota Bene 5.4.5).
(iii) The vector space C([0, 2]; ℝ) with the L¹-norm ‖f‖_{L¹} = ∫₀² |f(t)| dt is not complete. Consider the sequence
    g_n(t) = tⁿ if t ≤ 1, and g_n(t) = 1 if t ≥ 1.
Given ε > 0 let N ≥ 1/ε. Since every g_n is equal to every g_m on the interval [1, 2], if m > n > N, then
    ‖g_n − g_m‖_{L¹} = ∫₀¹ |tⁿ − t^m| dt = 1/(n+1) − 1/(m+1) < 1/N ≤ ε,
so the sequence is Cauchy. But in the L¹-norm it converges to the discontinuous function
    g(t) = 0 if t < 1, and g(t) = 1 if t ≥ 1,
which is not in C([0, 2]; ℝ).
We do not prove this here, but it follows easily from the monotone
convergence theorem (Theorem 8.4.5).
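A numerical sketch of (iii) (an editorial addition): the L¹ distances between the g_n shrink like 1/n, even though the pointwise limit jumps at t = 1.

```python
import numpy as np

t = np.linspace(0, 2, 200001)
dt = t[1] - t[0]

def g(n):
    # g_n(t) = t^n for t <= 1 and 1 for t >= 1, on [0, 2].
    return np.where(t <= 1, t**n, 1.0)

def l1_dist(m, n):
    # Riemann-sum approximation of the L^1 distance on [0, 2].
    return np.sum(np.abs(g(m) - g(n))) * dt

# These distances are about 1/11 - 1/21 and 1/101 - 1/201, respectively,
# matching the exact computation in the text.
print(l1_dist(10, 20))
print(l1_dist(100, 200))
```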
Remark 5.4.13. In Section 9.1.2 we show that every metric space can be uniquely extended to a complete metric space by creating equivalence classes of Cauchy sequences; that is, two Cauchy sequences are equivalent if the distance between them converges to zero. This is also the idea behind one construction of the real numbers ℝ from ℚ. In other words, ℝ can be constructed as the set of all Cauchy sequences in ℚ modulo this equivalence relation (see also Vista 9.1.4).
Theorem 5.4.14. The fields ℝ and ℂ are complete with respect to the usual metric d(x, y) = |x − y|.

Theorem 5.4.16. For every n ∈ ℕ and p ∈ [1, ∞], the linear space 𝔽ⁿ with the norm ‖·‖_p is complete.
Remark 5.4.17. While the previous theorem shows that 𝔽ⁿ is complete in the p-metric, Corollary 5.4.25 shows that 𝔽ⁿ is complete in any metric that is induced by a norm (but not necessarily in metrics that are not induced by a norm, like the discrete metric).
Proof. The proof that bounded linear transformations are uniformly continuous is
Exercise 5.24. Conversely, Exercise 5.21 shows that any continuous linear transfor-
mation is bounded. D
Recall that Cauchy sequences are not preserved under continuity (see
Example 5.4.10). The following theorem says that they are, however, preserved
under uniform continuity.
Proof. Given any finite-dimensional normed linear space (Z, ‖·‖), Corollary 2.3.12 guarantees there is an isomorphism of vector spaces f : Z → 𝔽ⁿ. We can make 𝔽ⁿ into a normed linear space with the Euclidean norm ‖·‖₂. By Remark 3.5.13, every linear transformation of finite-dimensional normed linear spaces is bounded, and thus f and f⁻¹ are both bounded linear transformations. Moreover, Proposition 5.4.22 guarantees that bounded linear transformations are uniformly continuous.
Given any Cauchy sequence (z_k)_{k=0}^∞ in Z, for each k let y_k = f(z_k) ∈ 𝔽ⁿ. By Theorem 5.4.24, the sequence (y_k)_{k=0}^∞ must also be Cauchy, and thus has a limit y ∈ 𝔽ⁿ, since (𝔽ⁿ, ‖·‖₂) is complete. For each k we have f⁻¹(y_k) = z_k, and so lim_{k→∞} z_k = lim_{k→∞} f⁻¹(y_k) = f⁻¹(y) exists, since f⁻¹ is continuous. □
The single-variable case
The completeness of JR relies upon the following fundamental property of the real
numbers called Dedekind's property or the least upper bound property. For a proof
of this property, we refer the reader to [HS75].
Theorem 5.4.27. Every nonempty subset of the real numbers that is bounded above
has a supremum.
Lemma 5.4.28. Every sequence (y_n)_{n=0}^∞ in ℝ that is bounded above and is monotone increasing (that is, y_n ≤ y_{n+1} for every n ∈ ℕ) has a limit.

Proof. Since the set {y_n | n ∈ ℕ} is bounded above, it has a supremum (least upper bound) by Dedekind's property. Let y = sup{y_n | n ∈ ℕ}. If ε > 0, then there exists some m > 0 such that |y_m − y| < ε. If not, then y − ε would also be an upper bound for {y_n | n ∈ ℕ} and y would not be the supremum.
Since the sequence is monotone increasing, we must have y_m ≤ y_n ≤ y for every n > m. Therefore, |y − y_n| ≤ |y − y_m| < ε for all n > m, so y is the desired limit. □
Higher dimensions
Now we prove Theorem 5.4.15, which says that if ((Xᵢ, dᵢ))ᵢ₌₁ⁿ is a finite collection of complete metric spaces, then the Cartesian product X = X₁ × X₂ × ⋯ × Xₙ is complete when endowed with the p-metric (5.4) for 1 ≤ p ≤ ∞.
Proof. Let (x_k)_{k=0}^∞ be a Cauchy sequence in X, and write x_k = (x₁^{(k)}, ..., xₙ^{(k)}). Given ε > 0, for ℓ and m sufficiently large and p ∈ [1, ∞) we have
    ε > d_p(x_ℓ, x_m) = (∑_{i=1}^n dᵢ(xᵢ^{(ℓ)}, xᵢ^{(m)})^p)^{1/p} ≥ (d_j(x_j^{(ℓ)}, x_j^{(m)})^p)^{1/p} = d_j(x_j^{(ℓ)}, x_j^{(m)})
for every j; when p = ∞ the analogous bound ε > d_∞(x_ℓ, x_m) ≥ d_j(x_j^{(ℓ)}, x_j^{(m)}) holds. In either case, each sequence (x_j^{(k)})_{k=0}^∞ ⊂ X_j is Cauchy and converges to some x_j ∈ X_j.
Define x = (x₁, ..., xₙ). We show that (x_k)_{k=0}^∞ converges to x. Given ε > 0, choose N so that
    dᵢ(xᵢ^{(ℓ)}, xᵢ^{(m)}) ≤ d_p(x_ℓ, x_m) < ε/(n + 1)
whenever ℓ, m ≥ N. Letting ℓ → ∞ gives dᵢ(xᵢ, xᵢ^{(m)}) ≤ ε/(n + 1) for each i. When p = ∞, this gives d_∞(x, x_m) = maxᵢ dᵢ(xᵢ, xᵢ^{(m)}) ≤ ε/(n + 1) < ε, and when p ∈ [1, ∞) we have
    d_p(x, x_m) = (∑_{i=1}^n dᵢ(xᵢ, xᵢ^{(m)})^p)^{1/p} ≤ ∑_{i=1}^n dᵢ(xᵢ, xᵢ^{(m)}) ≤ nε/(n + 1) < ε.
The next-to-last inequality follows from the fact that a^p + b^p ≤ (a + b)^p for any nonnegative numbers a, b and any p ∈ [1, ∞).
For all p we now have d_p(x, x_m) < ε whenever m ≥ N. It follows that the Cauchy sequence (x_k)_{k=0}^∞ ⊂ X converges and that X is complete. □
5.5 Compactness
Recall from Corollary 5.3.7 that for a continuous function f : X → Y and for a closed subset Z ⊂ Y, the preimage f⁻¹(Z) is closed. But if W ⊂ X is closed, then the image f(W) is not necessarily closed. In this section we discuss a property of a set called compactness, which seems just slightly stronger than being closed, but which has many powerful consequences. One of these is that continuous functions map compact sets to compact sets, and this guarantees that every real-valued continuous function on a compact set attains both its minimum and its maximum. These lead to many other important consequences that you will use in this book and far beyond.
The definition of a compact set may seem somewhat strange, since it is given in unfamiliar terms involving various collections of open sets, but we prove the Heine–Borel theorem (see Theorem 5.5.4 and also 5.5.12), which gives a simple description of compact sets in ℝⁿ as those that are both closed and bounded.
Throughout this section we assume that (X, d) and (Y, ρ) are metric spaces.

Definition 5.5.1. A collection (G_α)_{α∈J} of open sets is an open cover of the set E if E ⊂ ∪_{α∈J} G_α. A set E is compact if every open cover has a finite subcover; that is, for every open cover (G_α)_{α∈J} there exists a finite subcollection (G_α)_{α∈J′}, where J′ ⊂ J is a finite subset, such that E ⊂ ∪_{α∈J′} G_α.
Proof. Let F be a closed subset of a compact set K, and let 𝒞 = (G_α)_{α∈J} be an open covering of F. Thus, 𝒞 ∪ {F^c} is an open covering of K, which has a finite subcovering {F^c, G_{α₁}, ..., G_{αₙ}}. Hence, (G_{α_k})_{k=1}^n is a finite subcover of F. □
Proof. First we show that every n-cell is compact, that is, every set of the form [a, b] = {x ∈ ℝⁿ | a ≤ x ≤ b} (meaning a_k ≤ x_k ≤ b_k for all k = 1, ..., n).
Suppose (G_α)_{α∈J} is an open cover of I₁ = [a, b] that contains no finite subcover. Let c = (a + b)/2 be the midpoint of a and b, meaning each c_k = (a_k + b_k)/2. The intervals [a_k, c_k] and [c_k, b_k] determine 2ⁿ n-cells, at least one of which, denoted I₂, cannot be covered by a finite subcollection of (G_α)_{α∈J}. Subdivide I₂ and repeat. We have a sequence (I_k)_{k=1}^∞ of n-cells such that I_{n+1} ⊂ I_n, where each I_n is not covered by any finite subcollection of (G_α)_{α∈J} and x, y ∈ I_n implies ‖x − y‖₂ ≤ 2⁻ⁿ‖b − a‖₂.
By choosing x_k ∈ I_k, we have a Cauchy sequence (x_k)_{k=0}^∞ that converges to some x, since ℝⁿ is complete. However, x ∈ G_α for some α, and since G_α is open, it contains an open ball B(x, r) for some r > 0. There exists an N > 0 such that 2⁻ᴺ‖b − a‖₂ < r, and thus I_n ⊂ B(x, r) ⊂ G_α for all n ≥ N. This gives a finite subcover of all these I_n, which is a contradiction. Thus, [a, b] is compact.
Now let E be any closed and bounded subset of ℝⁿ. Because E is bounded, it is contained in some n-cell [a, b]. Since E is closed and [a, b] is compact, Proposition 5.5.2 guarantees that E is also compact. □
Example 5.5.5.* The Heine–Borel theorem does not hold in general (infinite-dimensional) spaces. Consider the vector space 𝔽^∞ ⊂ ℓ² (see Example 1.1.6(iv)) defined to be the set of all infinite sequences (x₁, x₂, ...) with at most a finite number of nonzero entries. The unit sphere in this space (with the Euclidean metric) is closed and bounded, but not compact. Indeed, Example 5.1.13 and Exercise 5.7 give an example of an open cover of the unit ball that has no finite subcover.
Proposition 5.5.6. The continuous image of a compact set is compact; that is, if f : X → Y is continuous and K ⊂ X is compact, then f(K) ⊂ Y is compact.
Proof. The image f(K) is compact, hence closed and bounded in ℝ. Because it is bounded, its supremum and infimum both exist. Let M be the supremum, and let (x_n)_{n=0}^∞ be a sequence such that f(x_n) → M. Since K is compact, there is a subsequence (x_{n_k})_{k=0}^∞ converging to some value x ∈ K. Since it is a subsequence, we must have f(x_{n_k}) → M, but continuity of f implies that f(x_{n_k}) → f(x); see Theorem 5.3.19. Therefore, M = f(x) ∈ f(K). A similar argument shows that the infimum lies in f(K). □
Example 5.5.8. ⚠ Recall from Example 3.5.4 and Figures 3.5 and 3.6 that different metrics on a given space define different open balls. For example, the 1-norm unit sphere S₁ = {x ∈ 𝔽ⁿ | ‖x‖₁ = 1} is really a square when n = 2 and an octahedron when n = 3, whereas the 2-norm unit sphere is really a circle when n = 2 and a sphere when n = 3.
The 1-norm unit sphere S₁ is both closed and bounded (with respect to both the 1-norm and the 2-norm), so it is compact. Consequently, for any continuous function f : 𝔽ⁿ → ℝ, the image f(S₁) contains both its maximum and minimum.
In particular, if ‖·‖ is any norm on 𝔽ⁿ, then by Proposition 5.2.7 the map f(x) = ‖x‖ is continuous (with respect to both the 1-norm and the 2-norm), and hence there exist x_max, x_min ∈ S₁ such that sup_{x∈S₁} ‖x‖ = ‖x_max‖ and inf_{x∈S₁} ‖x‖ = ‖x_min‖.
This example plays an important role in the proof of the remarkable Theorem 5.8.7, which states that the open sets defined by any norm on a finite-dimensional vector space are the same as the open sets defined by any other norm on the same space.
Proof. Let ε > 0 be given. For each x ∈ K, there exists δ_x > 0 such that ρ(f(x), f(y)) < ε/2 whenever d(x, y) < δ_x. Let (B(x, δ_x/2))_{x∈K} be an open cover of K. Since K is compact, there exists a finite subcover (B(x_k, δ_{x_k}/2))_{k=1}^n. So, given any y ∈ K, there exists k ∈ {1, 2, ..., n} such that d(y, x_k) < δ_{x_k}/2.
Let δ = min{δ_{x_k}}_{k=1}^n. If y, z ∈ K with d(y, z) < δ/2, then d(z, x_k) ≤ d(z, y) + d(y, x_k) < δ/2 + δ_{x_k}/2 ≤ δ_{x_k}, so both y and z lie in B(x_k, δ_{x_k}), and therefore
    ρ(f(y), f(z)) ≤ ρ(f(y), f(x_k)) + ρ(f(x_k), f(z)) < ε/2 + ε/2 = ε. □
Definition 5.5.10.
(i) A collection 𝒞 of sets in X has the finite intersection property if every finite subcollection of 𝒞 has a nonempty intersection.
(ii) The space X is sequentially compact if every sequence (x_k)_{k=0}^∞ ⊂ X has a convergent subsequence.
(iii) The space X is totally bounded if for all ε > 0 the cover 𝒞 = (B(x, ε))_{x∈X} has a finite subcover.
(iv) A real number ε₀ is a Lebesgue number of an open cover (G_α)_{α∈J} if for all x ∈ X and for all ε < ε₀ the ball B(x, ε) is contained in some G_α.
Theorem 5.5.11. Let (X, d) be a metric space. The following are equivalent:
(i) X is compact.
(ii) Every collection 𝒞 of closed sets in X with the finite intersection property has a nonempty intersection.
(iii) X is sequentially compact.
(iv) X is totally bounded and every open cover has a positive Lebesgue number (which depends on the cover).
Proof.
(i) ⟹ (ii) Assume X is compact. Let 𝒞 = (F_α)_{α∈J} be a collection of closed sets with the finite intersection property. If ∩_{α∈J} F_α = ∅, then (F_α^c)_{α∈J} is an open cover of X. Hence, there exists a finite subcover (F_{α_k}^c)_{k=1}^n. But this implies that ∩_{k=1}^n F_{α_k} = ∅, which is a contradiction.
    ε(x) = (1/2) sup{δ > 0 | ∃α ∈ J with B(x, δ) ⊂ G_α}.
Since x is an interior point of G_α, we have ε(x) > 0 for all x ∈ X. Define ε* = inf_{x∈X} ε(x). This is clearly a Lebesgue number of the cover, and we must show that ε* > 0.
Since ε* is an infimum, there exists a sequence (x_k)_{k=0}^∞ ⊂ X so that ε(x_k) → ε*. Because X is sequentially compact, there is a convergent subsequence. Replacing the original sequence with the subsequence, we may assume³⁵ that x_k → x for some x.
Let G_β be a member of the open cover such that B(x, ε(x)) ⊂ G_β. Since x_k → x, there exists N ≥ 0 such that d(x, x_n) < ε(x)/2 whenever n ≥ N. If z ∈ B(x_n, ε(x)/2), then d(x, z) ≤ d(x, x_n) + d(x_n, z) < ε(x)/2 + ε(x)/2 = ε(x), which implies that z ∈ B(x, ε(x)) ⊂ G_β. Thus, B(x_n, ε(x)/2) ⊂ G_β, and ε(x_n) ≥ ε(x)/4 > 0 for all n ≥ N. This implies that ε* ≥ ε(x)/4 > 0.
Now suppose X is not totally bounded. There is an ε > 0 such that the cover (B(x, ε))_{x∈X} has no finite subcover. We define a sequence (x_k)_{k=0}^∞ as follows: choose any x₀; since B(x₀, ε) does not cover X, there must be an x₁ ∈ B(x₀, ε)^c. Since the finite collection (B(x₀, ε), B(x₁, ε)) does not cover, there must be an x₂ ∈ X that is not in the union of these two balls. Continuing in this manner, we construct a sequence (x_k)_{k=0}^∞ so that d(x_k, x_ℓ) ≥ ε for all k ≠ ℓ. Thus, (x_k)_{k=0}^∞ has no convergent subsequence, and X is not sequentially compact, which is a contradiction.
³⁵Since ε(x) is not known to be continuous, we cannot assume ε(x_n) → ε(x).
(iv) ⟹ (i) Let 𝒞 = (G_α)_{α∈J} be an open cover of X with Lebesgue number ε*. Since X is totally bounded, it can be covered by a finite collection of balls (B(x_k, ε))_{k=1}^n where ε < ε*. Each ball B(x_k, ε) is contained in some G_{α_k} of the open cover, so the finite collection (G_{α_k})_{k=1}^n is a subcover. Thus, X is compact. □
5.5.4 * Subspaces
If d is a metric on X, then d induces a metric on every Y ⊂ X; that is, restricting d to the set Y × Y defines a function ρ : Y × Y → [0, ∞) that itself satisfies all the conditions for a metric on the space Y. We say that Y inherits the metric ρ from X.
Remark 5.5.15. Theorem 5.1.12 applied to the space Y shows that the ball B_Y(x, r) is open in Y. But it is not necessarily open in X, as we see in the next example.
Proposition 5.5.17. Let (X, d) be a metric space, and let Y ⊂ X be a subset with the induced metric ρ, so that (Y, ρ) is itself a metric space. A subset E of (Y, ρ) is open if and only if there is an open subset U ⊂ X such that E = U ∩ Y. Similarly, a subset F of (Y, ρ) is closed if and only if there is a closed set C ⊂ X such that F = C ∩ Y.

Proof. If E is an open set in (Y, ρ), then around every point x ∈ E, there is a ball B_Y(x, r_x) ⊂ E. Let U = ∪_{x∈E} B_X(x, r_x) ⊂ X. Since U is the union of open balls in X, it is open in X. For each x ∈ E we certainly have x ∈ U, thus E ⊂ U ∩ Y. But we also have B_X(x, r_x) ∩ Y = B_Y(x, r_x) ⊂ E, so U ∩ Y ⊂ E.
Conversely, if U is open in X, then for any x ∈ U ∩ Y, we have some ball B_X(x, r_x) ⊂ U. Therefore, B_Y(x, r_x) = B_X(x, r_x) ∩ Y ⊂ U ∩ Y, so U ∩ Y is open in (Y, ρ).
The statement about closed sets follows from taking the complement of the open sets. □
Example 5.5.19. The integers ℤ as a subspace of ℝ, with the usual metric, have the property that every singleton {n} is an open set. Thus, the family 𝒯 of all open sets is the power set of ℤ, sometimes denoted 2^ℤ. This is the largest topology possible on ℤ, and it coincides with the topology induced by the discrete metric.
Now that we have defined an induced metric on subsets, there are actually two different ways to define compactness of any subset Y ⊂ X. The first is as a subset of X with the usual metric on X (that is, in terms of open sets of X). The second is as a subspace, with the induced metric (that is, in terms of open sets of Y). Fortunately, these are equivalent.
Definition 5.6.1. For any sequence of functions (f_n)_{n=0}^∞ from a set X into a metric space Y, we can evaluate all the functions at a single point x of the domain, which gives a sequence (f_n(x))_{n=0}^∞ ⊂ Y. If for every choice of x ∈ X, the sequence (f_n(x))_{n=0}^∞ converges in Y, then we can define a new function f by setting f(x) = lim_{n→∞} f_n(x). In this case we say that the sequence converges pointwise, or that f is the pointwise limit of (f_n).
for all n. In the next proposition we see that if there were a uniform limit, it
would have to be the same as the pointwise limit.
Example 5.6.7. Most of the normed linear spaces that we work with in applied mathematics are Banach spaces. We showed in Theorem 5.4.16 that (𝔽ⁿ, ‖·‖_p) is a Banach space. Below we show that (C([a, b]; ℝ), ‖·‖_{L^∞}) is also a Banach space. Additional examples include the spaces ℓ^p for 1 ≤ p ≤ ∞ (see Example 1.1.6(vi)).
Remark 5.6.12. If a series converges absolutely in X, then for every ε > 0 there is an N such that the sum ∑_{k=n}^∞ ‖x_k‖ < ε whenever n ≥ N.
Proof. It suffices to show that the sequence of partial sums (s_k)_{k=0}^∞ converges in X. Let ε > 0 be given. If the series ∑_{k=0}^∞ x_k converges absolutely, then there exists N such that ∑_{k=n}^∞ ‖x_k‖ < ε whenever n ≥ N. Thus, we have
    ‖s_n − s_m‖ = ‖∑_{k=m+1}^n x_k‖ ≤ ∑_{k=m+1}^n ‖x_k‖ < ε
whenever n > m ≥ N. This implies that the sequence of partial sums is Cauchy, and hence it converges, since X is complete. □
Example 5.6.14. For each n ∈ ℕ let f_n ∈ C([0, 2]; ℝ) be the function f_n(x) = xⁿ/n!. The sum ∑_{n=0}^∞ f_n converges absolutely because the series
    ∑_{n=0}^∞ ‖f_n‖_{L^∞} = ∑_{n=0}^∞ sup_{x∈[0,2]} |xⁿ|/n! = ∑_{n=0}^∞ 2ⁿ/n! = e²
converges in ℝ.
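The value e² in this example is easy to confirm numerically (an editorial Python sketch):

```python
import math

# Partial sums of sum_n 2^n / n!, which the example shows equals e^2.
total = 0.0
for n in range(30):
    total += 2.0**n / math.factorial(n)
print(total)        # about 7.389056098930650
print(math.exp(2))  # the same to machine precision
```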
Theorem 5.6.15. If a sum ∑_{k=0}^∞ x_k converges absolutely to x ∈ X, then for any rearrangement of the terms, the rearranged sum converges absolutely to x. That is, if f : ℕ → ℕ is a bijection, then ∑_{k=0}^∞ x_{f(k)} also converges absolutely to x.

Proof. For any ε > 0 choose N > 0 such that ∑_{n=N}^∞ ‖x_n‖ < ε/2. Since N is finite, there must be some M ≥ N so that the set {0, 1, ..., N} is a subset of {f(0), ..., f(M)}. For any n > M let E_n = {f(0), ..., f(n)} \ {0, 1, ..., N} ⊂ {N+1, N+2, ...}. We have
    ‖x − ∑_{k=0}^n x_{f(k)}‖ = ‖x − ∑_{k=0}^N x_k + ∑_{k=0}^N x_k − ∑_{k=0}^n x_{f(k)}‖ ≤ ‖x − ∑_{k=0}^N x_k‖ + ∑_{k∈E_n} ‖x_k‖ < ε/2 + ε/2 = ε.
Theorem 5.7.1. Let (X, ‖·‖_X) be a normed linear space, and let (Y, ‖·‖_Y) be a Banach space. The space ℬ(X; Y) of bounded linear transformations from X into Y is a Banach space when endowed with the induced norm ‖·‖_{X,Y}. In particular, the space X* = ℬ(X; 𝔽) of bounded linear functionals is a Banach space.
(i) Define A : X → Y as follows. Given any nonzero x ∈ X and ε > 0, there exists N > 0 such that ‖A_m − A_n‖_{X,Y} < ε/‖x‖_X whenever m, n ≥ N. Define y_k = A_k x for each k ∈ ℕ. Thus,
    ‖y_m − y_n‖_Y = ‖(A_m − A_n)x‖_Y ≤ ‖A_m − A_n‖_{X,Y} ‖x‖_X < ε
whenever m, n ≥ N, so (y_k)_{k=0}^∞ is Cauchy in Y and converges. Define Ax = lim_{k→∞} A_k x.
(ii) We now show that A ∈ ℬ(X; Y). First, we see that A is linear since
    A(αx + βz) = lim_{k→∞} A_k(αx + βz) = α lim_{k→∞} A_k x + β lim_{k→∞} A_k z = αAx + βAz.
To show that A is a bounded linear transformation, again fix some ε > 0 and choose an N such that ‖A_m − A_n‖_{X,Y} < ε when m, n ≥ N. Thus,
    ‖A_m x − A_n x‖_Y ≤ ε ‖x‖_X whenever m, n ≥ N.    (5.7)
It follows, letting m → ∞, that
    ‖Ax − A_n x‖_Y ≤ ε ‖x‖_X whenever n ≥ N.
By Proposition 5.4.7, the sequence (A_k)_{k=0}^∞ is bounded; hence, there exists M > 0 such that ‖A_n‖_{X,Y} ≤ M for each n ∈ ℕ. Thus, ‖Ax‖_Y ≤ (ε + M)‖x‖_X, and
    ‖A‖_{X,Y} = sup_{x≠0} ‖Ax‖_Y / ‖x‖_X ≤ ε + M.
Therefore, A is in ℬ(X; Y).
(iii) We conclude the proof by showing that A_k → A with respect to the norm ‖·‖_{X,Y}. By (5.7) we have ‖Ax − A_n x‖_Y / ‖x‖_X ≤ ε whenever n ≥ N and x ≠ 0. By taking the supremum over all x ≠ 0, we have that ‖A − A_n‖_{X,Y} ≤ ε whenever n ≥ N. □
    ∑_{k=0}^∞ ‖A^k/k!‖ ≤ ∑_{k=0}^∞ ‖A‖^k/k! = e^{‖A‖} < ∞.
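This bound is the reason the series ∑_k A^k/k! converges absolutely for any bounded operator A. A small numerical sketch (an editorial addition using matrices, where the operator norm is the spectral norm):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Accumulate partial sums of sum_k A^k / k!.
term = np.eye(4)            # A^0 / 0!
partial = np.zeros((4, 4))
for k in range(40):
    partial = partial + term
    term = term @ A / (k + 1)   # A^(k+1) / (k+1)!

# The norm of the sum is dominated by the scalar series e^{||A||}.
print(np.linalg.norm(partial, 2), "<=", np.exp(np.linalg.norm(A, 2)))
```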
Theorem 5.7.5. Let (X, ‖·‖_X) be a Banach space. For any set S, the space (L^∞(S; X), ‖·‖_{L^∞}) is a Banach space.

Proof. It suffices to show that the space is complete. Let (f_k)_{k=0}^∞ ⊂ L^∞(S; X) be a Cauchy sequence. For each fixed t ∈ S, the sequence (f_k(t))_{k=0}^∞ is Cauchy and thus converges. Define f(t) = lim_{k→∞} f_k(t).
It suffices to show that ‖f − f_n‖_{L^∞} → 0 as n → ∞ and that ‖f‖_{L^∞} < ∞. As mentioned in Corollary 5.3.22, we can pass the limit through the norm. Specifically, given ε > 0, choose N > 0 such that ‖f_n − f_m‖_{L^∞} < ε/2 whenever m, n ≥ N. Thus, for each s ∈ S, we have
    ‖f(s) − f_n(s)‖_X = lim_{m→∞} ‖f_m(s) − f_n(s)‖_X ≤ ε/2 < ε
whenever n ≥ N.
For z ∈ Z, define T̄(z) = lim_{k→∞} T(s_k). We now show this is well defined. Let (s̃_k)_{k=0}^∞ be any other sequence in S converging to z. For any ε > 0, by uniform continuity of T there is a δ > 0 such that ‖T(a) − T(b)‖_X < ε/2 whenever ‖a − b‖_Z < δ. Choose K > 0 such that ‖s_k − z‖_Z < δ and ‖s̃_k − z‖_Z < δ whenever k ≥ K. Thus, we have
    ‖T(s̃_k) − T̄(z)‖_X ≤ ‖T(s̃_k) − T(s_k)‖_X + ‖T(s_k) − T̄(z)‖_X < ε/2 + ε/2 = ε
whenever k ≥ K. Therefore, T(s̃_k) → T̄(z), and the value of T̄(z) is independent of the choice of (s_k)_{k=0}^∞.
It remains to prove that T̄ is linear, that ‖T̄‖ = ‖T‖, and that T̄ is unique. The linearity of T̄ follows from the linearity of T. If (s_k)_{k=0}^∞ ⊂ S converges to z ∈ Z and (s̃_k)_{k=0}^∞ ⊂ S converges to z̃ ∈ Z, then for a, b ∈ 𝔽 we have
    T̄(az + bz̃) = lim_{k→∞} T(as_k + bs̃_k) = a lim_{k→∞} T(s_k) + b lim_{k→∞} T(s̃_k) = aT̄(z) + bT̄(z̃).
To see the norm of T̄, note first that if (s_k)_{k=0}^∞ ⊂ S converges to z ∈ Z, then
    ‖T̄(z)‖ = ‖lim_{k→∞} T(s_k)‖ = lim_{k→∞} ‖T(s_k)‖ ≤ lim_{k→∞} ‖T‖ ‖s_k‖ = ‖T‖ ‖z‖.
Therefore, we have
    ‖T̄‖ = sup_{z∈Z, z≠0} ‖T̄(z)‖ / ‖z‖ ≤ ‖T‖,
and since T̄ agrees with T on S, the reverse inequality also holds.
The next proposition tells us that these two results hold for bounded linear operators on an arbitrary Banach space, not just for matrices. The difficulty in proving this comes from the fact that we have neither a determinant function nor an analogue of Cramer's rule.
Proposition 5.7.7. Let (X, ‖·‖) be a Banach space and define
    GL(X) = {A ∈ ℬ(X) | A⁻¹ ∈ ℬ(X)}.
The set GL(X) is open in ℬ(X), and the function Inv(A) = A⁻¹ is continuous on GL(X).
Proof. Let A ∈ GL(X). We claim that B(A, r) ⊂ GL(X) for r = ‖A⁻¹‖⁻¹. This shows that GL(X) is open.
To see the claim, choose L ∈ B(A, r) and write L = A(I − A⁻¹(A − L)). Since ‖A − L‖ < r, we have that
    ‖A⁻¹(A − L)‖ ≤ ‖A⁻¹‖ ‖A − L‖ < 1,    (5.8)
so the series ∑_{k=0}^∞ (A⁻¹(A − L))^k converges and inverts I − A⁻¹(A − L); hence L⁻¹ exists and lies in ℬ(X).
Given ε > 0, set δ = min(ε/(2‖A⁻¹‖²), 1/(2‖A⁻¹‖)), so that whenever ‖L − A‖ < δ we have
    1 − ‖A⁻¹(A − L)‖ > 1/2,
and thus
    ‖L⁻¹ − A⁻¹‖ ≤ ‖A⁻¹‖² ‖A − L‖ / (1 − ‖A⁻¹(A − L)‖) < 2‖A⁻¹‖² δ ≤ ε.
The proof of the previous proposition actually gives some explicit bounds on
the norm of inverses and the size of the open ball in GL(X) containing the inverse
of some operator. This is useful for studying pseudospectra in Chapter 14.
Definition 5.8.1. Let X be a set with two metrics d_a and d_b. Each metric induces a collection of open sets, and this collection is called a topology. We denote the set of all d_a-open sets by 𝒯_a and call it the a-topology. Similarly, we denote the set of all d_b-open sets by 𝒯_b and call it the b-topology. We say that d_a and d_b are topologically equivalent if they define the same open sets on X, that is, if 𝒯_a = 𝒯_b.
Example 5.8.2.
(i) Let 𝒯 be the topology induced on a set X by the discrete metric (see Example 5.1.3(iii)). For any point x ∈ X and for any r < 1, we have B(x, r) = {x}. Since arbitrary unions of open sets are open, this means that any set is open with respect to this metric, and the topology on X induced by this metric is the entire power set 2^X of X.
(ii) The 2-norm on 𝔽ⁿ induces the usual Euclidean metric and the Euclidean topology. The open sets in this topology are what most mathematicians usually mean when they say "open" without any other statement about the metric (or the topology). In particular, open sets in ℝ with the Euclidean topology are unions of open intervals (a, b).
Example 5.3.12 shows that the Euclidean topology on 𝔽ⁿ is not topologically equivalent to the discrete topology.
(iii) The space C([0, 1]; ℝ) with the sup norm defines a topology via the metric d_∞(f, g) = ‖f − g‖_{L^∞} = sup_{x∈[0,1]} |f(x) − g(x)|. We may also define the L¹-norm and its corresponding metric as d₁(f, g) = ‖f − g‖_{L¹} = ∫₀¹ |f(t) − g(t)| dt.
Every open set of the L¹-topology is also an open set of the sup-topology because for any f ∈ C([0, 1]; ℝ) and for any ε > 0, the ball B_∞(f, ε/2) ⊂ B₁(f, ε). To see this, observe that for any g ∈ B_∞(f, ε/2) we have
    ‖f − g‖_{L¹} = ∫₀¹ |f − g| dt ≤ ∫₀¹ ε/2 dt = ε/2 < ε.
In Unexample 5.8.6 we show that the L¹-metric is not topologically equivalent to the L^∞-metric.
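Numerically (an editorial Python sketch), the inequality d₁ ≤ d_∞ on C([0, 1]; ℝ) is visible on any sample, while a tall thin spike shows that no reverse inequality can hold, which is the idea behind Unexample 5.8.6.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100001)
dx = x[1] - x[0]

def d_inf(f, g):
    return np.max(np.abs(f - g))

def d_1(f, g):
    # Riemann-sum approximation of the L^1 distance on [0, 1].
    return np.sum(np.abs(f - g)) * dx

f = np.sin(2 * np.pi * x)
zero = np.zeros_like(x)
print(d_1(f, zero), "<=", d_inf(f, zero))   # d_1 <= d_inf on [0, 1]

# A tall thin spike: small in L^1 but of sup norm 1, so no constant can
# bound d_inf by a multiple of d_1.
spike = np.maximum(0.0, 1.0 - 1000.0 * np.abs(x - 0.5))
print(d_1(spike, zero), d_inf(spike, zero))  # about 0.001 versus 1.0
```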
Theorem 5.8.3. Let X be a set with two metrics, d_a and d_b. The metrics d_a and d_b are topologically equivalent if and only if for all x ∈ X and for all ε > 0 there exist δ_a, δ_b > 0 such that
    B_a(x, δ_a) ⊂ B_b(x, ε) and B_b(x, δ_b) ⊂ B_a(x, ε),    (5.10)
where B_a and B_b are the open balls defined by the metrics d_a and d_b, respectively.
Proof. If d_a and d_b are topologically equivalent, then every ball B_a(x, ε) is open with respect to d_b, and by the definition of open set (Definition 5.1.8) there must be a ball B_b(x, δ) contained in B_a(x, ε). Similarly, every ball B_b(x, ε) is open with respect to d_a and there must be a ball B_a(x, γ) contained in B_b(x, ε).
Conversely, let U ⊂ X be an open set in the metric space (X, d_a). For each x ∈ U there exists ε > 0 such that B_a(x, ε) ⊂ U, and by hypothesis there is a δ_b > 0 such that B_b(x, δ_b) ⊂ B_a(x, ε) ⊂ U. Hence, U is open in the metric space (X, d_b). Interchanging the roles of a and b in this argument shows that every set that is open in (X, d_b) is also open in (X, d_a). Hence, the topologies are equivalent. □
Theorem 5.8.4. Let X be a vector space with two norms, ‖·‖_a and ‖·‖_b. The metrics induced by these norms are topologically equivalent if and only if there exist constants 0 < m ≤ M such that
    m‖x‖_a ≤ ‖x‖_b ≤ M‖x‖_a    (5.11)
for all x ∈ X.

Proof. (⟹) If (5.11) holds for some x ∈ X, then it also holds for every scalar multiple of x. Therefore, it suffices to prove that (5.11) holds for every x ∈ B(0, 1). Let d_a and d_b be the metrics on X induced by the norms ‖·‖_a and ‖·‖_b, respectively. If d_a and d_b are topologically equivalent, then by Theorem 5.8.3 there exist 0 < ε′ < 1 and 0 < ε″ < 1 such that B_a(0, ε′) ⊂ B_b(0, 1) and B_b(0, ε″) ⊂ B_a(0, 1), from which the two inequalities of (5.11) follow.
(⟸) If (5.11) holds for all x ∈ X, then it also holds for all (x − y) ∈ X; that is, for all x, y ∈ X we have
    m d_a(x, y) ≤ d_b(x, y) and d_b(x, y) ≤ M d_a(x, y).
Therefore, by Theorem 5.8.3 (taking δ_b = mε and δ_a = ε/M in the two inclusions of (5.10)) the two norms are topologically equivalent. □
Example 5.8.5. Recall the following inequalities on 𝔽ⁿ from Exercise 3.17:
    ‖x‖_∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂ ≤ n ‖x‖_∞.
By Theorem 5.8.4, the 1-, 2-, and ∞-norms are therefore all topologically equivalent on 𝔽ⁿ.
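These inequalities are easy to spot-check on random vectors (an editorial Python sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
for _ in range(1000):
    x = rng.standard_normal(n)
    one = np.linalg.norm(x, 1)
    two = np.linalg.norm(x, 2)
    inf = np.linalg.norm(x, np.inf)
    # ||x||_inf <= ||x||_2 <= ||x||_1 <= sqrt(n)||x||_2 <= n||x||_inf
    assert inf <= two + 1e-12
    assert two <= one + 1e-12
    assert one <= np.sqrt(n) * two + 1e-12
    assert np.sqrt(n) * two <= n * inf + 1e-12
print("inequalities verified on 1000 random vectors")
```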
Theorem 5.8. 7. All norms on a finite -dimensional vector space are topologically
equivalent.
Remark 5.8.8. The previous theorem does not hold for infinite-dimensional spaces, as shown in Unexample 5.8.6.
Example 5.9.2. The map f : (0, 1) → (1, ∞) ⊂ ℝ given by f(t) = 1/t is a homeomorphism: it is clearly bijective and continuous, and its inverse is f⁻¹(s) = 1/s, which is also continuous.
Example 5.9.5. Open sets are preserved under homeomorphism, as are closed
sets. Compactness is defined only in terms of open sets, so it is also preserved
under homeomorphism. Theorem 5.3.19 guarantees that convergence of se-
quences is preserved by continuous functions, and therefore convergence is a
topological property.
Corollary 5.9.7. Let (X, d) be a metric space. If ρ is another metric on X that is topologically equivalent to d, then a sequence (x_n)_{n=0}^∞ in X converges to x in (X, d) if and only if it converges to x in (X, ρ).
When two norms are topologically equivalent, the identity map i : (X, d_a) → (X, d_b) is uniformly continuous (and its inverse is also uniformly continuous), so Cauchy sequences and completeness are preserved by topologically equivalent norms.
Proposition 5.9.8. If (X, ‖·‖_a) is a normed linear space, and if ‖·‖_b is another norm on X that is topologically equivalent to ‖·‖_a, then the identity map i : (X, ‖·‖_a) → (X, ‖·‖_b) is uniformly continuous with a uniformly continuous inverse.

Proof. It is immediate from the definition of topologically equivalent norms that the identity map is a bounded linear transformation, and hence by Proposition 5.4.22 it is uniformly continuous. The same applies to its inverse. □
Remark 5.9.9. It is important to note that the previous result only holds for
normed linear spaces and metrics induced by norms. It does not hold for more
general metrics.
Connectedness
Definition 5.9.10. A metric space X is disconnected if there are disjoint nonempty open subsets U and V such that X = U ∪ V. In this case, we say the subsets U and V disconnect X. If X is not disconnected, then it is connected.

Remark 5.9.11. If X is disconnected, then we can choose disjoint open sets U and V with X = U ∪ V. This implies that U = V^c and V = U^c are also closed. Hence, a space X is connected if and only if the only sets that are both open and closed are X and ∅.
Example 5.9.12.
(i) As we see later in this section, the line ℝ is connected. But the set ℝ \ {0} is disconnected because it is the union of the two disjoint open sets (−∞, 0) and (0, ∞).
(ii) If X is any set with at least two points and d is the discrete metric on X, then (X, d) is disconnected. This is because every set in the discrete topology is both open and closed; see Example 5.8.2(i).
Proof. Suppose Y is not connected. There must be nonempty disjoint open sets U and V satisfying Y = U ∪ V. Since f is continuous and surjective, the sets f⁻¹(U) and f⁻¹(V) are also nonempty disjoint open sets satisfying X = f⁻¹(U) ∪ f⁻¹(V). This is a contradiction. □
Consequences of connectedness
Our first important consequence of connectedness is a corollary of Theorem 5.9.13.
Proof. If no such z exists, then f⁻¹((−∞, c)) and f⁻¹((c, ∞)) are nonempty and disconnect X, which is a contradiction. □
The intermediate value theorem leads to our first example of a fixed-point the-
orem. Fixed-point theorems are used throughout mathematics and are also widely
used in economics. We encounter them again in Chapter 7.
Corollary 5.9.16 (One-Dimensional Brouwer Fixed-Point Theorem). If f : [a, b] → [a, b] is continuous, then there exists x ∈ [a, b] such that f(x) = x.
Proof. We assume that f(a) > a and f(b) < b; otherwise f(a) = a or f(b) = b, and the conclusion follows. Hence, the function g(x) = x − f(x) satisfies g(a) < 0 and g(b) > 0, and is continuous on [a, b]. Thus, by the intermediate value theorem, there exists c ∈ (a, b) such that g(c) = 0, or in other words, f(c) = c. □
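The proof is constructive enough to turn into an algorithm: bisect on the sign of g(x) = x − f(x). A minimal Python sketch (an editorial addition, assuming g(a) < 0 and g(b) > 0 as in the proof):

```python
import math

def fixed_point(f, a, b, tol=1e-12):
    # Locate a fixed point of continuous f: [a, b] -> [a, b] by bisecting
    # g(x) = x - f(x), which changes sign on [a, b] as in the proof.
    g = lambda x: x - f(x)
    if g(a) == 0:
        return a
    if g(b) == 0:
        return b
    lo, hi = a, b                 # g(lo) < 0 < g(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = fixed_point(math.cos, 0.0, 1.0)
print(x, math.cos(x))  # both about 0.7390851332, the fixed point of cos
```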
[Figure: the graph of y = f(x) on [a, b] must cross the line y = x, giving a fixed point.]
Path connectedness
Connectedness is defined in terms of not being able to separate a space, but as
mentioned in the introduction to this section, it may seem intuitive that you should
be able to get from any point in a connected space to any other point by traveling
within the space. This second property is actually stronger than the definition of
connectedness. We call this stronger property path connectedness.
Definition 5.9.19. A subset E of a metric space is path connected if for any x, y ∈ E there is a continuous map γ : [0, 1] → E such that γ(0) = x and γ(1) = y. Such a γ is called a path from x to y. See Figure 5.4.
Proposition 5.9.20. The interval [0, 1] ⊂ ℝ is connected (in the usual topology).

Proof. Suppose that the interval has a separating pair U ∪ V = [0, 1]. Take u ∈ U and v ∈ V. Without loss of generality, assume u < v. The interval [u, v] is a subset of [0, 1], so it is contained in U ∪ V. The set A = [u, v] ∩ U = [u, v] ∩ V^c is compact, because it is a closed subset of the compact interval [u, v]. Therefore A contains its largest element a. Also a < v because v ∉ U.
Similarly, the set [a, v] ∩ V is compact and contains its least element b. Since b ∉ A we must have a < b, which shows that the interval (a, b) ⊂ [u, v] is not empty. But (a, b) ∩ U = (a, b) ∩ V = ∅; hence, not every element of [0, 1] is contained in U ∪ V, a contradiction. □
Proof. Assume that X is path connected but not connected. Hence, there exists a pair of nonempty disjoint open sets U and V such that X = U ∪ V. Choose x ∈ U and y ∈ V and a continuous map γ : [0, 1] → X such that γ(0) = x and γ(1) = y. Thus, the sets γ⁻¹(U) and γ⁻¹(V) are nonempty, disjoint, and disconnect the connected set [0, 1], which is a contradiction. □
³⁷The converse to this theorem is false. There are spaces that are connected but not path connected. For an example of such a space, see [Mun75, Ex. 2, Sect. 25].
Nota Bene 5.10.1. The approach we use to define the integral is unusual. Aside from the books [Bou76, Die60], we know of no textbook that treats the integral this way. But it has many advantages over the more common approaches, including being much simpler. We believe it is also a better preparation for the ideas of Lebesgue integration. The construction of the integral given here is called the regulated integral, and although it is similar to the Riemann integral in many ways, it is not identical to it. Nevertheless, it is straightforward to show that whenever the two constructions are both defined, they must agree.
A step function f : [a, b] → X is a function of the form
    f(t) = (∑_{i=1}^{N−1} xᵢ 𝟙_{[t_{i−1}, t_i)}(t)) + x_N 𝟙_{[t_{N−1}, t_N]}(t),    (5.13)
where a = t₀ < t₁ < ⋯ < t_N = b is a partition of [a, b] and each xᵢ ∈ X. Let S([a, b]; X) denote the set of all step functions mapping [a, b] into X.
Proposition 5.10.3. The set S([a, b]; X) of step functions is a subspace of the normed linear space of bounded functions (L^∞([a, b]; X), ‖·‖_{L^∞}).

Proof. To see that S([a, b]; X) is a subset of L^∞([a, b]; X), note that a step function f of the form (5.13) has finite sup norm since ‖f‖_{L^∞} = max_k ‖x_k‖_X.
It suffices to show that S([a, b]; X) is closed under linear combinations; that is, given α, β ∈ 𝔽 and f, g ∈ S([a, b]; X), we show that αf + βg ∈ S([a, b]; X).
Let a = u₀ < u₁ < ⋯ < u_ℓ = b be a common refinement of the partitions of f and g, and write
    f(t) = (∑_{i=1}^{ℓ−1} xᵢ′ 𝟙_{[u_{i−1}, u_i)}) + x_ℓ′ 𝟙_{[u_{ℓ−1}, u_ℓ]} and g(t) = (∑_{i=1}^{ℓ−1} yᵢ′ 𝟙_{[u_{i−1}, u_i)}) + y_ℓ′ 𝟙_{[u_{ℓ−1}, u_ℓ]},
where each xᵢ′ and yᵢ′ lies in X. Thus, the sum takes the form
    αf + βg = (∑_{i=1}^{ℓ−1} (αxᵢ′ + βyᵢ′) 𝟙_{[u_{i−1}, u_i)}) + (αx_ℓ′ + βy_ℓ′) 𝟙_{[u_{ℓ−1}, u_ℓ]},
which is again a step function. □
Definition 5.10.4. The integral of a step function f ∈ S([a, b]; X) of the form (5.13) is defined to be
    I(f) = ∑_{i=1}^N xᵢ (tᵢ − t_{i−1}).    (5.15)
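Formula (5.15) is directly computable. The Python sketch below (an editorial addition) integrates a step function and uses a fine step-function approximation of f(t) = t², whose integral tends to 1/3, illustrating Definition 5.10.7.

```python
import numpy as np

def integrate_step(breakpoints, values):
    # The integral (5.15): sum of values[i] * (t_i - t_{i-1}), where the
    # step function takes values[i] on [breakpoints[i], breakpoints[i+1]).
    t = np.asarray(breakpoints, dtype=float)
    x = np.asarray(values, dtype=float)
    return np.sum(x * np.diff(t))

# Step-function approximations to f(t) = t^2 on [0, 1]:
for N in [10, 100, 1000]:
    t = np.linspace(0, 1, N + 1)
    vals = t[:-1] ** 2                 # value on each [t_{i-1}, t_i)
    print(N, integrate_step(t, vals))  # -> 1/3 as N grows
```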
Proposition 5.10.5. The integral map I : S([a, b]; X) → X is a bounded linear transformation with induced norm ‖I‖ = (b − a).

Theorem 5.10.6. The map I extends uniquely to a bounded linear transformation on the closure of the step functions,
    Ī : S̄([a, b]; X) → X,
with ‖Ī‖ = ‖I‖ = (b − a).
Proof. This follows immediately from the continuous linear extension theorem (Theorem 5.7.6) by setting S = S([a, b]; X) and Z = S̄([a, b]; X), since bounded linear transformations are always uniformly continuous, by Proposition 5.4.22. □
Definition 5.10.7. For any Banach space X and any function f ∈ S̄([a, b]; X), we write
    ∫_a^b f(t) dt
to denote the unique linear extension Ī(f) of Theorem 5.10.6. In other words,
    ∫_a^b f(t) dt = Ī(f) = lim_{n→∞} I(s_n) = lim_{n→∞} ∫_a^b s_n(t) dt,
where (s_n)_{n=0}^∞ ⊂ S([a, b]; X) is any sequence of step functions converging to f in the L^∞-norm.
Theorem 5.10.8. Continuous functions lie in the closure, with respect to the uniform norm, of the space of step functions; that is, C([a, b]; X) ⊂ S̄([a, b]; X).

Proof. Any f ∈ C([a, b]; X) is uniformly continuous by Theorem 5.5.9. So, given ε > 0, there exists δ > 0 such that ‖f(s) − f(t)‖ < ε whenever |s − t| < δ. Choose n sufficiently large so that (b − a)/n < δ, and define a step function f_s ∈ S([a, b]; X) taking the value f(t_{i−1}) on each subinterval [t_{i−1}, t_i) of a uniform partition of [a, b] into n pieces. Then ‖f − f_s‖_{L^∞} ≤ ε, so f ∈ S̄([a, b]; X). □
Proof.
(i) This is just a restatement of ‖I‖ = (b − a) on the space S([α, β]; X) in Proposition 5.10.5.
(ii) The function t ↦ ‖f(t)‖ is an element of S̄([a, b]; ℝ), so the integral ∫_a^b ‖f(t)‖ dt makes sense. The rest of the proof is Exercise 5.55.
(iii) Let (s_k)_{k=0}^∞ be a sequence of step functions in S([a, b]; X) that converges to f ∈ S̄([a, b]; X) in the sup norm. Let I_{ab} be the integral map on S([a, b]; X) as given in (5.15), and let I_{αβ} be the corresponding integral map on S([α, β]; X). Since the product s_k 𝟙_{[α,β]} vanishes outside of [α, β], applying I_{αβ} to its restriction gives the same value as applying I_{ab}. In other words, for all k ∈ ℕ we have
    ∫_a^b s_k(t) 𝟙_{[α,β]}(t) dt = ∫_α^β s_k(t) dt.

    ‖∫_a^b f(t) dt‖ ≤ ∫_a^b ‖f(t)‖ dt ≤ (b − a) sup_{t∈[a,b]} ‖f(t)‖.
In other words, the result in (ii) is a sharper inequality than that of (i).
Proposition 5.10.14. If f ∈ S̄([a, b]; ℝⁿ) ⊂ L^∞([a, b]; ℝⁿ) is written in coordinates as f(t) = (f₁(t), ..., fₙ(t)), then for each i we have fᵢ ∈ S̄([a, b]; ℝ) and
    ∫_a^b f(t) dt = (∫_a^b f₁(t) dt, ..., ∫_a^b fₙ(t) dt).

Proof. Since Ī : S̄([a, b]; X) → X is continuous, if s_n → f, then I(s_n) → Ī(f) by Theorem 5.2.9. Thus, it suffices to prove the proposition for step functions. But for a step function s(t) = (s₁(t), ..., sₙ(t)) taking the value (x_i^{(1)}, ..., x_i^{(n)}) on the ith subinterval, the proof is straightforward:
    ∫_a^b s(t) dt = ∑_{i=1}^N (x_i^{(1)}, ..., x_i^{(n)})(tᵢ − t_{i−1}) = (∑_{i=1}^N x_i^{(1)}(tᵢ − t_{i−1}), ..., ∑_{i=1}^N x_i^{(n)}(tᵢ − t_{i−1})) = (∫_a^b s₁(t) dt, ..., ∫_a^b sₙ(t) dt). □
Exercises
Note to the student: Each section of this chapter has several corresponding exercises, all collected here at the end of the chapter. The exercises between the first and second line are for Section 1, the exercises between the second and third lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip some of the advanced exercises marked with *). We have carefully selected them, and each is important for your ability to understand subsequent material. Many of the examples and results proved in the exercises are used again later in the text. Exercises marked with ⚠ are especially important and are likely to be used later in this book and beyond. Those marked with † are harder than average, but should still be done.
Although they are gathered together at the end of the chapter, we strongly recommend you do the exercises for each section as soon as you have completed the section, rather than saving them until you have finished the entire chapter. The exercises for each section are separated from those for other sections by a horizontal line.
5.1. If d₁ and d₂ are two given metrics on X, decide which of the following are metrics on X, and prove your answers are correct.
(i) d₁ + d₂.
(ii) d₁ − d₂.
(iii) min(d₁, d₂).
(iv) max(d₁, d₂).
5.2. Let d₂ denote the usual Euclidean metric on ℝ². Let 0 = (0, 0) be the origin. The French railway metric d_SNCF in ℝ² is given by³⁸
    d_SNCF(x, y) = d₂(x, y) if x and y lie on a common line through the origin, and d_SNCF(x, y) = d₂(x, 0) + d₂(0, y) otherwise.
Explain why French railway might be a good name for this metric. (Hint: Think of Paris as the origin.) Prove that d_SNCF really is a metric on ℝ². Describe the open balls B(x, ε) in this metric.
³⁸The official name of the French national railway is Société Nationale des Chemins de Fer, or SNCF for short.
5.9. Let (X, d) be a metric space, and let B ⊂ X be nonempty. For x ∈ X, define d(x, B) = inf{d(x, b) | b ∈ B}.
Let
    f(x, y) = 0 if x = y = 0, and f(x, y) = (x² − y²)/√(x² + y²) otherwise.
Prove that f is continuous at (0, 0) (using the Euclidean metric d(v, w) = ‖v − w‖₂ in ℝ², and the usual metric d(a, b) = |a − b| in ℝ).
5.12. Let
    f(x, y) = 0 if x = y = 0, and f(x, y) = 2x²y/(x⁴ + y²) otherwise.
Define φ(t) = (t, at) and ψ(t) = (t, t²). Show that
(i) lim_{t→0} f(φ(t)) = 0;
(ii) lim_{t→0} f(ψ(t)) = 1.
What does this say about the continuity of f? Explain your results.
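A numerical sketch of this exercise (an editorial addition, not a substitute for the proof): evaluating f along the line (t, at) and along the parabola (t, t²) shows two different limiting behaviors at the origin.

```python
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return 2 * x**2 * y / (x**4 + y**2)

a = 1.0  # any fixed slope for the straight-line path
for t in [0.1, 0.01, 0.001]:
    print(f(t, a * t), f(t, t**2))
# Along (t, at) the values tend to 0, but along (t, t^2) the value
# is identically 1, so f has no limit at the origin.
```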
5.13. Prove the claim made in Example 5.2.14 that {(x, y) ∈ ℝ² | x² + y² ≤ 1} is the set of all limit points of the ball B(0, 1) ⊂ ℝ².
5.14. Prove that for each integer n > 0, the set ℚⁿ is dense in ℝⁿ in the usual (Euclidean) metric.
5.21.† ⚠ Prove that unbounded linear transformations are not continuous at the origin (see Definition 3.5.10 for the definition of a bounded linear transformation). Use this to prove that they are not continuous anywhere. Hint: If T is unbounded, construct a sequence of unit vectors (x_k)_{k=0}^∞, where ‖T(x_k)‖ > k for each k ∈ ℕ. Then modify the sequence so that Theorem 5.3.19 applies.
5.22. Let (X, d) be a (not necessarily complete) metric space and assume that (x_k)_{k=0}^∞ and (y_k)_{k=0}^∞ are Cauchy sequences. Prove that (d(x_k, y_k))_{k=0}^∞ converges.
5.23. Which of the following functions are uniformly continuous on the interval (0, 1) ⊂ ℝ, and which are uniformly continuous on (0, ∞)?
(i) x 3 .
(ii) sin(x)/x.
(iii) x log(x).
Prove that your answer to (i) is correct.
5.24. ⚠ Prove Proposition 5.4.22. Hint: Prove that a bounded linear transformation is Lipschitz.
5.25. ⚠ Prove Lemma 5.4.26.
5.26. Let B ⊂ ℝⁿ be a bounded set.
(i) Let f : B → ℝ be uniformly continuous. Show that the set f(B) is bounded.
(ii) Give an example to show that this does not necessarily follow if f is merely continuous on B.
5.27. Let (X, d) be a metric space such that d(x, y) < 1 for all x, y ∈ X, and let f : X → ℝ be uniformly continuous. Does it follow that f must be bounded (that is, that there is an M such that f(x) < M for all x ∈ X)? Justify your answer with either a proof or a counterexample.
5.28.* Prove Proposition 5.4.31.
5.29. Let X = (0, 1) × (0, 1) ⊂ ℝ², and let d(x, y) be the usual Euclidean metric. Give an example of an open cover of X that has no finite subcover.
5.30. Let K ⊂ ℝⁿ be a compact set and f : K → ℝⁿ be injective and continuous. Prove that f⁻¹ is continuous on f(K).
5.31. If U ⊂ ℝⁿ is open and K ⊂ U is compact, prove that there is a compact set D such that K ⊂ D° and D ⊂ U.
5.32. A function f : ℝ → ℝ is called periodic if there exists a number T > 0 such that f(x + T) = f(x) for all x ∈ ℝ. Show that a continuous periodic function is bounded and uniformly continuous on ℝ.
5.33 . .&. For any metric space (X, p) with CCX and D C X nonempty subsets,
define
d(C, D) = inf{p(c,d) I c E C,d ED}.
5.41. Show that the sum L::=l (- 1)n /n converges, and find a rearrangement of the
terms that diverges. Use this to construct an example of a series in M2(IR)
that converges, but for which a rearrangement of the terms causes the series
to diverge.
5.46. Let (X, d) be a metric space. Prove: If the function f : [O, oo) -+ [O, oo) is
strictly increasing and satisfies f (0) = 0 and f (a + b) :::; f (a) + f (b) for all
a,b E [O,oo), then p(x,y) := f(d(x,y)) is a metric on X. Moreover, if f is
continuous at 0, then (X, p) is topologically equivalent to (X, d).
5.47. Prove that the discrete metric on Fn is not topologically equivalent to the
Euclidean metric. Use this to prove that the discrete metric on Fn is not
induced by any norm.
5.48. Prove the claim in the proof of Theorem 5.8.7 that, given any norm 11 · 11
on a vector space X and any isomorphism f : X -+ Y of vector spaces, the
function 11 · ll t on Y defined by ll Yll t = ll J- 1 (y)ll is a norm on Y.
5.49. Let X be a set, and let .4 be the set of all metrics on X. Prove that
topological equivalence is an equivalence relation on .4.
5.50. Let II · Ila and II · ll b be equivalent norms on a vector space X , and assume
that there exists 0 < c < 1 such that Bb(O, c) C Ba(O, 1). Prove that
llxlla :S ll x ll b
c
for all x E Ba(O, 1). Hint: If there exists x E Ba(O, 1) such that ll x lla >
ll x llb/c, then choose a scalar a so that y = ax satisfies llYllb/c < 1 < ll Yll a·
Use this to get a contradiction to the assumption Bb(O,c) C Ba(O, 1).
238 Chapter 5 Metric Space Topology
5.55 . Prove Proposition 5.10.12(ii). Hint : First prove the result for step functions,
then for the general case.
5.56. Prove Proposition 5.10.12(iv).
5.57. Prove Proposition 5.10.12(v). Hint: Use the c-6 definition of continuity.
5.58. Define a function f(x): [-1, 1] -t IR by
1 x = 0,
f(:c) = '
{ 0, xi= 0.
It can be shown that this function is Riemann integrable and has Riemann
integral equal to 0. Prove that f(x) is not in the space S([-1, 1]; IR), so the
integral of Definition 5.10. 7 is not defined for this function.
5.59. You cannot always assume that interchanging limits will give the expected
results. For example, Exercise 5.45 shows that differentiation and infinite
summation do not always commute. In this exercise you show a special case
where integration and infinite summation do commute.
(iii) Let d C C([a, b]; IR) be the set of absolutely convergent power series on
[a, b]. Prove that dis a vector subspace of C([a, b]; IR).
(iv) Prove that the series L~o ck(bk+l -ak+ 1 )/(k+1) is absolutely conver-
gent if the series L ~o CkXk is absolutely convergent on [a, b].
(vi) Use the continuous linear extension theorem (Theorem 5.7.6) to prove
that I = Ton d. In other words, prove that if f(x) = L::%°=ockxk Ed,
then
I(f) = l b
f(x) dx = t;
<Xl
Notes
Some sources for topology include the texts [Mun75, Morl 7]. For a brief compari-
son of the regulated, Riemann, and Lebesgue integrals (for real-valued functions),
see (Ber79] .
Differentiation
In the fall of 1972 President Nixon announced that the rate of increase of inflation
was decreasing. This was the first time a sitting president used the third derivative
to advance his case for re-election.
-Hugo Rossi
241
242 Chapter 6. Different iation
The limit is called the derivative off at x and is denoted by f'(x) . If f(x) is
differentiable at every point in (a, b), we say that f is differentiable on (a, b).
Remark 6.1.2. The derivative at a point xo is important because it defines the best
possible linear approximation to f near xo. Specifically, the linear transformation
L : JR ---+ JR given by L(h) = f'(x 0 )h provides the best linear approximation of
f(xo + h) - f(x 0 ) . For every c; > 0 there is a neighborhood B(O, 8), such that
the curve y = f(x 0 + h) - f(x 0 ) lies between the linear functions L(h) - c:h and
L(h) + c:h, whenever his in B(O, 8) .
We can easily generalize this definition to a parametrized curve.
Definition 6.1.3. A curve "( : (a, b) ---+ ]Rn is differentiable at to E (a, b) if the
following limit exists:
. 1(to+h)-1(to)
1lm . (6.2)
h-+0 h
Here the limit is taken with respect to the usual metrics on JR and JRn (see
Definition 5.2.8). If it exists, this limit is called the derivative of 1(t) at to and
is denoted 1 ' (to) . If "! is differentiable at every point of (a, b) , we say that "f is
differentiable on (a, b) .
Proposition 6.1.5. A curve "! : (a, b) ---+ ]Rn written in standard coordinates as
1(t) = ["11 ( t) "fn (t)] T is differentiable at to if and only if "ti (t) is differentiable
at to for every i . In this case, we ha·ve
Proof. All norms are topologically equivalent in JRn, so we can use any norm to
compute the limit. We use the oo-norm, which is convenient for this particular
proof.
If the derivatives of each coordinate exist, then for any c; > 0 and for each i
there exists a oi such that whenever 0 < !hi < oi we have
This shows that 1' (to) exists and is equal to [rWo) I~ (to)] T.
Conversely, if 1' (to) = [Y1 Yn] T exists, then for any i and for any c; >0
there exists a o> 0 such that if 0 < lhl < o, then
ri(to + h) _
h
,i(to) _ ·I< ll'(to +
Yi -
h) - 1(to) _ [
h Y1
l
1(t) + 1'(t)
/ 1' (t )
Figure 6.1. The derivative 1' (t) of a parametrized curve r : [a, b] --+ JRn
points in the direction of the line tangent to the curve at 1(t) . Note that the tangent
vector1'(t) itself {see Definition 6.1. 7) is a vector {blue) based at the origin, whereas
the line segment (red) from 1(t) to 1(t) +1'(t) is what is often informally called the
''tangent" to the curve.
Definition 6.1. 7. Given a differentiable curve I : (a, b) --+ JRn, the tangent vector
to the curve r at to is defined to be 1' (to). See Figure 6.1 for an illustration.
Example 6.1.8. The curve 1(t) = [cost sin t] T traces out a circle of radius
one, centered at the origin. The tangent curve is 1' (t) = [-sin t cost] T . The
acceleration vector is 1" (t) = [- cost - sin t] T and satisfies 1" (t) = -1( t).
Using the product and chain rules for single-variable calculus, we can easily
derive the following rules.
Proposition 6.1.9. If the maps f, g : JR --+ ]Rn and <p : JR --+ JR are differentiable
and (-, ·) is the standard inner product on ]Rn, then
244 Chapter 6. Differentiation
(i) (f+g)'=f'+g' ;
(ii) (t.p !)' = + t.p f';
t.p' f
Proof. The proof follows easily from Proposition 6.1.5 and the standard differen-
tiation rules from single-variable calculus. See Exercise 6.2. D
Definition 6.1.10. Let f : JRn -+ JRm. Given x, v E JRn , the directional derivative
off at x with respect to v is the limit
Remark 6 .1.11. In the next section we prove Theorem 6.2 .15, which says that
for any fixed x, the function ¢(v) = Dvf(x) is a linear transformation in v, so
Dv 1 +v 2 f(x) = Dv,f(x) + Dv 2 f( x). This is an important property of directional
derivatives.
Thus, the partial derivatives are zero despite the function's failure to be con-
tinuous there.
Remark 6.1.15. In the previous definition the ith coordinate is the only one that
varies in the limit. Thus, for real-valued functions (that is, when m = 1), we can
think of this as a single-variable derivative of a scalar function with the only variable
being the ith coordinate; the other coordinates can be treated as constants. For
vector-valued functions (when m > 1), we may compute Dd using the usual rules
for differentiating if we treat Xj as a constant for j =/:- i.
f (x + h) - f (x) --
12
~
ol~ h
[
lim
h~O llhll
39
Depending on the students' background in multivariable calculus and the amount of class time
available, the instructor may wish t o skip directly to the general Frechet derivative in Section 6.3.
6.2. The Frechet Derivative in JR.n 247
It turns out that any norm gives the same answer for the limit, since all
norms are topologically equivalent on JR.n. Remark 6.3.2 gives more details
about different norms and their effect on the derivative.
-~] ,
2y
3x 2
Df(x)v =
[6
is linear in v. But
Df(ax) =
[
3a2 x2
ag
2ay
Ol
ax i- a [3xy
0
2
~i = aDf(x),
2y
Division by h makes sense because h is a scalar. But this limit is zero if and
only if
r 1(x+h)-1(x) '( ) - 0
h~ h -1 x - ,
which holds if and only if
Remark 6.2.9. The total derivative D f (x o), if it exists, defines a linear function
that approximates f in a neighborhood of xo. More precisely, if L(h) = D f (xo)h,
then for any c: > 0 there is a 5 > 0 such that the function f (x 0 + h)- f (x 0 ) is within
c: ll h ll of L(h) (that is, llf(xo + h) - f(xo) - L(h) ll :::; c:llhll) whenever h E B(O, 5).
Proof. Let L1 and L2 be two linear transformations satisfying (6.3). For any
nonzero v E JRn, as t ---+ 0 we have that
ll L1v - L2v ll I (f(x + tv) - J(x) - L2tv) - (f(x + tv) - f(x) - L1tv) 11
ll v ll lt lll v ll
Theorem 6.2.11. Let U C IRn be an open set, and let f : U ---+ JRm be given by
f = (!1, ... , fm)· If f is differentiable on U, then the partial derivatives Dkfi(x)
exist for each x E U, and the matrix representation of the linear map D f (x) in the
standard basis is the Jacobian matrix
_ .
1
I ! (x + h) - f (x) - D f (x)hll
O - h~ llhll
= lim II! (x + he1) f (x) - hD f (x)e1 II
-
h~O lhlll e1 ll
= lim llf(x1, · · ·, Xj + h, · · ·, Xn) - f(x1, ... , Xn) - hJj ll.
h~ O lhl
Thus, each component also goes to zero as h ---+ 0, and so
which implies
Remark 6.2.12. We often denote the Jacobian matrix as D f(x), even though the
Jacobian is really only the matrix representation of D f (x) in the standard basis.
250 Chapter 6. Differentiation
T heorem 6 .2.14. Let U C JRn be an open set, and let f : U -t JRm be given by
f = (Ji, . .. , f m) . If each Djfi(x) exists and is continuous on U, then D f (x) exists
and is given in standard coordinates by (6.4).
Proof. Let x E U, and let J(x) be the linear operator with matrix representation
in the standard basis given by (6.4). To show that J(x) is D f (x), it suffices to
show that (6.3) holds. Since all norms are topologically equivalent, we may use the
oo-norm. Therefore, it suffices to show that for every c: > 0 there is a o > 0 such
that for every i we have
whenever 0 < fix - yff < o. Here [J(x)(x - y)]i denotes the ith entry of the vector
J(x)(x -y) .
For any o > 0 such that B (x, o) c U, consider y E B (x, o) with y =f. x .
Note that
Continuing in this manner, we have f.i,j E [x 1, y1] (or in [y1, x 1] if y1 < x 1) such that
fi(Y) - fi(x) = Difi(~i,1, Y2, · · ·, Yn)(Y1 - X1) + D2fi(x1, ~i,2, y3, . . . , Yn)(Y2 - x2)
+ · · · + Dnfi(X1, X2, · · ·, Xn-1, ~i,n)(Yn - Xn)
n
L D1fi(x1,. . ., Xn)(Yj - Xj),
j=l
6.2. The Frechet Derivative in JRn 251
Since each Djfi(x) term is continuous, we can choose 6 small enough that
whenever 0 < ll x - Yll < 6. Hence, J(x) satisfies the definition of the derivative,
and D f (x) = J (x) exists. D
The following theorem shows that the total derivative may be used to compute
directional derivatives.
But ll tv ll < 6 if and only if lt l < 6ll vll- 1 ; so when ltl < 6ll v ll - 1, we know a(t) <
cl lv ll· Thus, limt-+o a(t) = 0 and (6.5) holds. D
__.!_] 1
D(' ')f(x,y)=[y 2 +3x 2 y
72'72
2xy+x 3 ]
[~
V2
= M(y 2 +3x 2 y+2xy+x3 ),
v2
Remark 6.3.2. Topologically equivalent norms (on either X or Y) give the same
derivative. That is, if II-Ila and 11 · llb are two topologically equivalent norms on X and
if II · llr and II · lls are two topologically equivalent norms on Y, then Theorem 5.8.4
guarantees that there is an M and an m such that
ll f(x + h) - f(x) - D f(x)hllr ~ Mllf(x + h) - f(x) - D f(x)hlls
and llhlla :'.'.'. mllhllb · Thus, we have
O < llf (x + h) - f (x) - D f(x)hllr < Mllf(x + h) - f(x) - D f(x)hlls
- llhlla mllhllb
So if f has derivative D f (x) at x with respect to the norms II · lls and II - lib, it must
also have the same derivative with respect to II · llr and II - Ila -
6.3. The General Frechet Derivative 253
L(f) = 1 1
tf (t) dt
is linear inf, so by Example 6.3.3 we have DL(f)(g) = L(g) for every g EX.
Example 6.3.5. If X = C([O, l] ; JR) with the £=-norm, then the function
Q: X-+ JR given by
Q(f) = 1 1
tf 3 (t) dt
is not linear. We show that for each f E X the derivative DQ(f) is the linear
1
transformation B : X -+ JR defined by B(g) = 0 3tf 2 (t)g(t) dt. To see this , J
compute
= lim 11 1
t((f + h) 3 (t) -
3 2
f (t) - 3f (t)h(t)) dtl
h -+0 [[h[[L 00
11
1
2 3
= lim t(3f(t)h (t) +h (t)) dtl
h-+0 [[h[[L 00
< lim
[[h[[Eoo 1l 0
[t(3f(t) + h(t))[ dt
= 0.
- h-+0 [[h[[L 00
Proof. Leth= x-x 0 and c = 1. Choose o> 0 so that whenever 0 < llx - xo[lx < o,
we have
llf(xo + h) - f(xo) - D f(xo)hl[Y < 1,
I -- xo IIx
Ix
or, alternatively,
gives
where the case of x = Xo gives equality. Thus, in the ball B( xo , o) the function f
is locally Lipschitz at xo with constant L = llDf(xo)llx,Y + 1. D
Proof. Let u be a unit vector. For any c > 0 choose osuch that
llf (x + h) - f (x) - D f (x)hllY
6
llhllx <
40 T his
should not be confused with the property of being locally Lipschitz on U, which is a stronger
condition requiring that llf(x) - f(z) llY :::; Lllx - zllx for every x , z E B(xo, 8).
6.3. The General Frechet Derivative 255
DJ( ) II = ll D f (x)h ll Y
ll x u Y ll h ll x
< llf(x + h) - f (x) - D f(x)hllY + llf(x + h) - f(x) llY
- ll h ll x
II! (x + h) - f (x) - D f (x)hllY Lllhllx
::::; ll h ll x + llh ll x
::::; c+L.
Since c was arbitrary, we have the desired result. D
Proof. The proof is similar to that for Proposition 6.1.5 and is Exercise 6.15. D
Definition 6.3.12. Let ((Xi, II · lli ))~ 1 be a collection of Banach spaces. Let
f : X 1 x X2 x · · · x Xn -t Y , where (Y, II · llY) is a Banach space. The ith partial
derivative at (x 1, x2, . .. , Xn) E X 1 x X2 x · · · x Xn is the derivative of the function
g: Xi -t Y defined by g(xi) = f(x1, . . .,Xi-1,Xi,Xi+l,. · .,xn) and is denoted
Dd(x1, X2 , .. . , Xn) ·
Example 6.3.13. In the special case that each X i is JR and Y = JRm, the
definition of partial derivative in Definition 6.3.12 is the same as we gave
before in Definition 6.1.13.
Conversely, if all the partial derivatives of f exist and are continuous on the set
UC X 1 x X 2 x · · · x Xn, then f is continuously differentiable on U.
Proof. When~= 0 the relation (6.9) is automatically true. When ~ of. 0, dividing
by ll ~ llx shows that this is equivalent to t he limit in the definition of the derivative,
except that this inequality is not strict, where the definition of limit has a strict
o
inequality. But this is remedied by choosing a such that (6.9) holds with c:/2
instead of t:. D
6.4.1 Linearity
We have already seen that the derivative D f (x)v is not linear in x , but by definition
it is linear in v . The following theorem shows that it is also linear in f.
Proof. Choose t: > 0. Since f and g are differentiable at x , there exists o> 0 such
that B(x, o) c U and such that whenever ll ~llx <owe have
6.4. Properties of Derivatives 257
and
Thus,
Remark 6.4.3. Among other things, the previous proposition tells us that the set
C 1 (U; Y) of continuously differentiable functions on U is a vector space.
Choose c > 0. Since f and g are differentiable at x , they are locally Lipschitz
at x (see Proposition 6.3.7), so there exists B(x, 8x) C U and a constant L > 0
such that
lf(x + ~ ) - f(x)I :=:: Lll~ll, (6.11)
c ll ~ ll
lf(x + ~) - f(x) - D f(x)~ I :=:: 3(ll g(x) ll + l) , (6.12)
and
cll~ll
lg(x + ~) - g(x) - Dg(x)~ I ::; 3( ll f(x) ll + L )' (6.13)
whenever 11 ~ 11 < 8x. If we set 8 =min{ 8x, 3 L(ll Dg(x) ll +l), 1}, then whenever 11 ~ 1 1 < 8,
we have that
258 Chapter 6. Differentiation
(iii) Let w(x) = (w 1(x), . .. , wm(x)) T be a differentiable function from !Rn to !Rm,
and let
bn(x) b12(x) ... b1m(x)l
b21 (x) b22(x) ... b2m(x)
B(x ) = .
.
. .
[ . .. ..
bki(x) bk2(x) bkm(x)
be a differentiable function from !Rn to Mkm(lF). If H : !Rn --+ JRk is given by
H(x) = Bw, then
w T (x)DbT (x)l
DH(x) = B(x)Dw(x) + ; ,
[
w T(x)Dbk (x)
where bi is the i th row of B.
Theorem 6.4. 7 (Chain Rule). Assume that (X, II ·II x), (Y, II ·II y), and (Z , I ·II z)
are Banach spaces, that U and V are open neighborhoods of X and Y, respectively,
and that f: U---+ Y and g: V---+ Z with f(U) CV. If f is differentiable on U and
g is differentiable on V, then the composite map h = g o f is also differentiable on
U and Dh(x) = Dg(f(x))Df(x) for each x EU.
< cl l ~llx
iif(x + ~) - f(x) - Df(x)~i iY _ 2( il Dg(y) iiY,Z + l)
and
llf(x+~)-f(x) llY :S L ll ~ llx·
Since g is differentiable at y, there exists B(y,by) CV such that whenever ll 77 llY <
by we have
llg(y + 77) - g(y) - Dg(y)77 llz :S cl~Y ·
Note that
where 77(~) := f(x + ~) - f(x). Thus, whenever ll ~ llx < min{ bx, by/ L}, we have
that ll77( ~) 11 Y :S Lll~llx <by. It follows that
ax ax]
D f (x, y) = [ ~; ~~]
Dg(p, q) = [~~op ~~aq and ,
260 Chapter 6. Differentiation
ah
[ap ah]
aq -- [£1ax £1]
ay
ax
ap
[ !21!.
ap
~~i
[}:,'J.
aq
-
-
[af
ax ax+
ap £1!21!.
ay ap
qj_ax
ax aq
+ af!2Ji.]
ay EJq .
In other words, the directional derivative Dvf(x) is the product of the deriva-
tive Df(x ) and the tangent vector v. This is an alternative proof of
Theorem 6.2.15.
and bin the set Uthe line segment £(a, b) is also contained in U, then we say that
U is convex. See Figure 6.2 for an illustration. Convexity is a property that is used
widely in applications. We treat convexity in much more depth in Volume 2.
Figure 6.2. In this figure the set V is not convex because the line segment
£(a, b) between a and b does not lie inside of V. But the set U is convex because
for every pair of points a, b E U the line segment £( a, b) between a and b lies in U;
see Remark 6. 5. 2.
Proof. Let a., (3 E (a, b) with a.< (3 . Given c > 0 and t E (a., (3), there exists bt > 0
such that llf(t + h) - f(t)llx :s; clhl whenever lhl < bt. Since [a., (3] is compact, we
can cover it with a finite set of overlapping intervals {(ti - 8i,ti + 8i)}~ 1 , where,
without loss of generality, we can assume a. < ti < t2 < · · · < tn < (3. Choose
points Xo, x 1 , .. . , Xn so that
i=l
= c({3 - a.).
Since c > O is arbitrary, as are a., (3 E (a, b), it follows that f is constant on (a, b).
Since f is continuous on [a, b], it must also be constant on [a, b]. D
262 Chapter 6. Differentiation
d
dt la
rt f(s) ds = f(t) . (6.15)
Proof.
(i) Let c; > 0 be given. There exists o> 0 such that llf(t + h) - f(t)llx < c
whenever lhl < o. Thus,
Corollary 6.5 .5 (Integral Mean Value Theorem). Let f E C 1 (U; Y). If the
line segment £( x *, x) = {(1 - t )x* + tx I t E [O, 1]} is contained in U, then
1
f(x) - f(x*) = fo D f(tx + (1 - t)x*)(x - x*) dt.
Moreover, we have
i
d lg(d)
f(g(s))g'(s) ds = f(T) dT. (6.19)
c g(c)
lim
n ---+ oo
lb
a
f n dt = lb (
a
lim f n) dt.
n---+oo
l
1
-
- -
l/n--n ·
11 dx f llJllL 00 - JJgn llL 00
Nevertheless, we can prove the following important result about uniform con-
vergence of derivatives in a finite-dimensional space.
Theorem 6 .5.11. Let X be finite -dimensional Banach space. Fix an open ball
U = Bx(x*, r) c X with a sequence Un)r:'=o c C 1 (U; Y) such that Un(x*))r:::'=o c
Y converges. If (D f n)r:'=o converges uniformly on compact subsets to g E C(U;
88(X; Y)), then the sequence Un)r:'=o converges uniformly on compact subsets to a
function f E C 1 (U; Y), and g =DJ .
The idea of the proof is fairly simple-define the limit function using t he
integral mean value theorem, and then use the fact that uniform limits commute
with integration. The result can also be extended to any path-connected U .
Because fn(x*) -+ z, there is an N > 0 such that llfn(x*) - zllY < c/2 if n 2: N,
and since D f n -+ g uniformly on K , there is an M > N such that
c
sup ll Dfn(c) - g(c)llx,Y < -
cEK 2r
whenever n > M. Combined with the fact that llhllx < r, Equation (6.21) gives
c c c c
llfn(x) - f(x) llY < 2 + r llhllx < 2 + r r = c
2 2
whenever n 2: M. Since the choice of N and M was independent of x, we have
llfn - f ll L= :::; con K. Therefore, fn-+ f uniformly on all closed balls B(x*, p) C U ,
and therefore on all compact subsets of U.
Finally, we must show that D f(x) = g(x) for any x in U. For any c > 0 we
must find a osuch that llf (x + h) - f(x) - g(x)hllY :::; cllhllx , whenever llhllx < o.
By the integral mean value theorem, we have
fn(x + h) - fn(x) = 1 1
Dfn(x + th)hdt.
1
llf(x + h) - f(x) - g(x)hl lY = 111 g(x + th)hdt - g(x)ht
= 111
1
(g(x +th) - g(x)) h dtt.
Continuity of g on the compact set B(x, a) implies g is uniformly continuous there,
o o
and hence there exists a with 0 < < a such that llg(x +th) - g(x)l lx,Y < c
whenever llhllx < o. Therefore,
llf(x + h) - f(x) - g(x)h llY :::; c ll h ll x ,
as required. D
Example 6.6.3. .&.If f : !Rn --+ IR is differentiable, then for each x E !Rn we
have Df(x) E @(!Rn;IR) . By the Riesz representation theorem
(Theorem 3.7.1), we have @(!Rn; IR) ~ !Rn, where a vector u E !Rn corre-
sponds to the function v f--t (u, v) of @(!Rn; IR). In the standard basis on !Rn,
it is convenient to write an element of @(!Rn; IR) as a row vector u T, so that
the corresponding linear transformation is just given by matrix multiplication
(to indicate this we write @ 1 (1Rn; IR)~ (!Rn)T). This is what is meant when
Theorem 6.2 .11 says that (in the standard basis) Df(x) can be written as a
row vector D f (x) = [Dif (x) · · · Dnf(x)].
If D f : !Rn --+ @(!Rn; IR) 9:~ (!Rn) T is also differentiable at each x E
!Rn, then D 2 f(x) E @(!Rn;@ 1 (1Rn; IR)) ~ @(!Rn; (!Rn)T). Theorem 6.2.11
still applies, but since D f (x) is a row vector, the second derivative in the u
direction D 2 f(x)(u) E @1 (1Rn;IR) is also a row vector. In the standard basis
we have D 2 f(x)(u) = uTHT E @1 (1Rn;IR) ~ (!Rn)T and D 2 f(x)(u)(v) =
u T HT v, where
H = (D [Dif(x)
[
D1Dif(x)
D1D2f(x)
D1Dnf(x)
Dnf(x)] T)
DnDif(x)l
DnD2f(x)
DnDnf(x)
2
B":C .
8 f
ax.,.,ax.,.,
l
The matrix His called the Hessian off. The next proposition shows that H
is symmetric.
If Y = lF and Xi = lF for every i, then U C lFn, and in this case we often write
Di 1 Di 2 • · • Dik f as
Proposition 6.6.5. Let f E C 2 (U; Y), where Y is finite dimensional. For any
x E U and any v, w in X, we have
D 2 f(x)(v, w) = D 2 f(x)(w, v). (6.22)
If X = X1 x · · · x Xn , then this says
(6.23)
for all i and j. In the case that X = lF x · · · x lF = wn and Y = lFm with f : X --+ Y
given by f = (/1 , ... , fn), then this is equivalent to
0 2 fk 0 2 fk
(6.24)
Proof. Since Y is finite dimensional, we may assume that Y ~ lFm and that
f = (/1, ... , fm)· The usual norm on cm is the same as the usual norm on JR 2m,
via the standard map (x1 + iy1, ... , Xm + iym) M (x1, Y1, ... , Xm, Ym)i therefore,
we may assume that lF = JR. Moreover, it suffices to prove the theorem for each fk
individually, so we may assume that Y = JR 1.
For each x E U let 9t(x) = f(x+tv) - f(x), and let Ss,t(x) = 9t(x+sw) - gt(x).
By the single-variable mean value theorem, there exists a <Ts,t E (0 , s) such that
Ss,t(x) = Dgt(x + <T 8 ,tw)(sw).
But we have
Dgt(x + <T 8 ,tw)(sw) = D f(x + <T8 ,tW + tv)(sw) - D f(x + <T 8 ,tw)(sw),
so we may apply the mean value theorem to get Ts,t E (0, t) such that
Ss ,t(x) = D 2 f(x + <T 8 ,tW + Ts,tv)(sw, tv).
Swapping the roles of tv and sw in the previous argument gives T~ ,t E (0, t) and
<T~ ,t E (0, s) such that
Remark 6.6. 7. Proposition 6.6.5 shows that (6.25) has repeated terms. If we
combine these, we can reexpress (6.2~i) as
kl . . . . .2 .
k
Dvf = L ·
. vJ 1 vJ 2
1·1 .. . - 1 1 2
. +·J2 +··· +·Jn= k]l -)2·
Jl Jn·
• • • vJ,n .,_ DJ'
1
DJ2 • • • DJ,.,_f
n' (6.26)
where the sum is taken over all nonnegative integers j 1 , . .. ,Jn summing to k.
Pi
20
p4 Ps
20
15
10
- 3 - 2 -1 0 1 2 3 - 4 -3 - 2 - 1 0 1 2 3 -4 -3 - 2 - 1 0 1 2 3
Figure 6.3. Plots of the Taylor polynomials Po, ... ,p5 for f(x) = ex. The
function f is plotted in red, and the Taylor polynomials are plotted in blue. Each
polynomial is a good approximation off in a small neighborhood of x = 0, but
farther away from 0 the approximation is generally poor. As the degree increases,
the neighborhood where the approximation is good increases in size, and the quality
of the approximation improves. See Theorem 6. 6. 8 for the one-dimensional Taylor
theorem over~ and Theorem 6.6.9 for the general case.
Theorem 6.6.9. Let f E Ck(U; Y). If x E U and h E X are such that the line
segment e(x, x + h) is contained in U, then
D2 f(x)h(2) Dk-1 f(x)h(k-1)
f(x + h) = f(x) + D f(x)h + ! + .. · + (k _ l)! + Rk, (6.28)
2
where the remainder is given by
where
R
k-1
= ( {1 (1 - t)k-2 Dk-1f(x +th)
lo (k - 2)!
dt) h(k-1).
Note that
- vk-1 f (x)h(k-1)
Rk-1 - (k _ l)! +
( fl . k
D f(x +uh)
(
lo
r1 (1(k- _t)k-2
lu )! dt
)
du
)
h
(k)
2
- vk-1 f (x)h(k-1) ( fl _(l - u)k-1 k ) (k)
- (k _ l)! + lo (k _ l)! D f(x +uh) du h .
Thus, (6.28) holds, and the proof is complete. D
Remark 6.6.10. For each k ~ 0, neglecting the remainder term gives the degree-k
Taylor polynomial approximation of f. If f is smooth and the remainder Rk can
be shown to go to zero as k ~ oo, then the Taylor series converges to f.
where f (0) = 1,
It follows that
Corollary 6.6.14. If II Dk f (x +th) II < M for all t E [O, 1], then the remainder
Rk is bounded by
272 Chapter 6. Differentiation
(6.31)
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
firs t and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with .& are especially important and are likely to be used later
in this book and beyond. Those marked with tare harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
x3y
(x , y) -/:- (0, 0),
f(x,y) = x6+y2'
{ 0, (x, y) = (0, 0).
Show that the partial derivatives of f exist at (0, 0) but are discontinuous
there. Hint : To show the discontinuity, consider a sequence of the form
(a/n, b/n 3 ) .
6. 2. Prove Proposition 6.1.9.
6.3. Let A c IR 2 be an open, path-connected set, and let f : A -+ IR. If for each
a E A we have Dif(a) = D2f(a) = 0, prove that f is constant on A. Hint:
Consider first the case where A= B(x, r) is an open ball.
6.4. Define f: IR 2 -+ IR by f(x,y) = xyex+y .
(i) Find the directional derivative off in the direction u = (~, t) at the
point (1, -1) .
(ii) Find the direction in which f is increasing the fastest at (1, -1).
2
6.5.t Let A c IR be an open set and f : A-+ IR. If t he partial derivatives off
exist and are bounded on A, then prove that f is continuous on A.
Exercises 273
xy x2 + =I- 0
f (x, y) = x2 + y , y ,
{ 0, x 2 +y = 0.
Show that the partial derivatives exist at (0, 0) but that f is discontinuous
at (0, 0) , so f cannot be differentiable there.
6.8. Let f : JR 2 -+ JR be defined by
x[y[ (x ) -L (0 0)
f (x,y ) = J x 2 + y2' 'Y r ' '
{ 0, (x,y) = (0,0) .
Show that f has a directional derivative in every direction at (0, 0) but is not
differentiable there.
6.9. Let f : JR 2 -+ JR be defined by
f(x, y) = { (x2 + y2) sin ( Jx2l+ y2) , (x, y) =f. (0, 0),
0, (x, y) = (0, 0).
Show that f is differentiable at (0, 0). Then show that the partial derivatives
are bounded near (0, 0) but are discontinuous there.
6.10. Use the definition of the derivative to prove the claims made in Remark 6.2.6:
(i) Using row vectors to represent (lRm)*, if A E Mnxm(lR) defines a func-
tion g : ]Rn -+ (JRm)* given by g(x) = x TA , then Dg : ]Rn -+ (JRm)* is
the linear transformation v e-+ v TA.
(ii) Using column vectors to represent (JRm)*, prove that the function g is
given by g(x) = AT x and that Dg is the linear transformation v e-+ AT v .
These are both special cases of Example 6.3.3, which says that for any linear
transformation L: U-+ Y, the derivative DL(x) is equal to L for every x.
6.12. Let X = C([O, 1]; JR) with the sup norm, and let the function T : X -+ JR
be given by T(f) = f0 t 2 f2(t) dt. Let L : X -+ JR be the function L(g) =
1
of the Frechet derivative to prove that DT(f) (g) = L(g) for all f E X.
6.13. Let X be a Banach space and A E @(X) be a bounded linear operator on X .
Let At : JR-+ @(X) be scalar multiplication of A by t, and let E : JR-+ @(X)
be the exponential of At, given by E(t) = eAt (see Example 5.7.2). Use the
definition of the Frechet derivative to prove that
DE(t)(s) = AeAts
for every t, s ER You may assume without proof that E(t + s) = E(t)E(s).
6.14. Prove Corollary 6.3.8.
6.15. Prove Proposition 6.3.11.
6.16. &.
Prove Proposition 6.4.6.
6.17. Let 1(t) be a differentiable curve in JRn. If there is some differentiable function
F: ]Rn-+ JR with F('Y(t)) = C constant, show that DF("f(t))T is orthogonal
to the tangent vector 1' (t).
6.18. Define f : JR 2 -+ JR 2 and g : JR 2 -t JR 2 by
f (x) = II Ax - bll§,
Conclude by Lemma 6.4.l that the Frechet derivative of Sis given by (6.32).
6.22 . Prove Corollary 6.5.5. Hint: Consider the function g(t) = f(ty + (1 - t)x).
6.23. Prove Corollary 6.5.6. Hint: Consider the function F(t) = J;(c) j(T) dT.
(i) First prove the result assuming the chain rule holds even at points where
g(s) = c or g(s) = d.
(ii)* Prove that the chain rule holds for the extension of D(Fog) to points
where g(s) = c or g(s) = d.
6.24. Prove the claims in Remark 6.5.8: Given an open subset U of a finite-
dimensional Banach space and a sequence Un)':=o in C(U; Y), prove the
following:
(i) The sequence Un)':=o is uniformly convergent on compact subsets in U
if and only if the restriction of Un)':=o to the ball B(x, r) is uniformly
convergent for every closed ball B(x, r) in U.
(ii) If U = B(xo, R), then prove that Un)':=o is uniformly convergent on
compact subsets if and only if for every 0 < r < R the restriction of
Un)':=o to the closed ball B(x0 , r) centered at xo converges uniformly.
Hint: For any compact subset K, show that d(K, uc) > 0, and use this
fact to construct a closed ball B(x0 , r) c U containing K.
(iii)* The sequence Un)':=o is Cauchy on U (see Definition 6.5.7) if and only
ifthe restriction of Un)':=o to the ball B(x, r) is Cauchy for every closed
ball B(x,r) in U.
(iv)* If U = B(xo, R), then prove that Un)':=o is Cauchy on U if and only
if for every 0 < r < R the restriction of Un)':=o to B(xo, r) is Cauchy.
6.25. For each integer n ::::= 1, let fn: [-1, 1]--+ ~be fn(x) = J.,& + x2.
( i) Prove that each f n is differentiable on (-1 , 1).
(ii) Prove that Un)':=o converges uniformly to f(x) = !xi.
(iii) Prove that f is not differentiable at 0.
(iv) Explain why this does not contradict Theorem 6.5.11.
276 Chapter 6. Differentiation
6.26. For any a> 0, for x in the interval (a,oo), and for any n EN, show that
.
hm -
dn ( 1 -
-
e-Nx) = (-1) n -n!- .
N-'too dxn X xn+l
6.27 . .&. Let U c X be the open ball B(xo, r). Assume that Un)r::'=i is a se-
quence in ( C 1 (U; Y), 11 · llL=) such that the series 2:.':=o D f n converges abso-
lutely (using the sup norm) on all compact subsets of U . Assume also that
2:.':=o fn(xo) converges (as a series in Y). Prove that the sum 2:.':=o fn con-
verges uniformly on compact subsets in U and that the derivative commutes
with the sum: C:Xl 00
Dl:fn = LDfn·
n=O n=O
There are fixed points through time where things must always stay the way they are.
This is not one of them. This is an opportunity.
-Dr. Who
Fixed-point theorems are among the most powerful tools in mathematical anal-
ysis. They are found in nearly every area of pure and applied mathematics. In
Corollary 5.9.16, we saw a very simple example of a fixed-point theorem, namely,
if a map f : [a, b] -+ [a, b] is continuous, then it has a fixed point. This result
generalizes to higher dimensions in what is called the Brouwer fixed-point theorem,
which states that a continuous map on the closed unit ball D(O, 1) C lFn into itself
has a fixed point. The Brouwer fixed-point theorem can be generalized further to
infinite dimensions by the Leray- Schauder fixed-point theorem, which is important
in both functional analysis and partial differential equations.
Most fixed-point theorems only say when a fixed point exists, and do not give
any additional information on how to find a fixed point or how many there are. In
this chapter, we study the contraction mapping principle, which gives conditions
guaranteeing both the existence and uniqueness of a fixed point. Moreover, it also
provides a way to actually compute the fixed point using the method of successive
approximations. As a result , the contraction mapping principle is widely used in
pure and applied mathematics and is the basis of many computational algorithms.
Chief among these are the Newton family of algorithms, which includes Newton's
method for finding zeros of a function, and several close cousins called quasi-Newton
methods. These numerical methods are ubiquitous in applications , particularly in
optimization problems and inverse problems.
The contraction mapping principle also gives two very important theorems:
the implicit function theorem and the inverse function theorem. Given a sys-
tem of equations, these two theorems give straightforward criteria that guarantee
the existence of a function (an implicit function) solving the system. They also
allow us to differentiate these implicit functions without ever explicitly writing the
functions down. These two theorems are essential tools in differential geometry, op-
timization, and differential equations, as well as in applications like economics and
physics.
277
278 Chapter 7. Contraction Mappings and Applications
Our first theorem tells us that a class of functions called contraction mappings
always have a unique fixed point.
D efinition 7 .1.3. Assume Dis a subset of a normed linear space (X, II· II) . The
function f : D -+ D is a contraction mapping if there exists 0 ::; k < 1 such that
Remark 7.1.4. It is easy to see that contraction mappings are continuous. In fact,
they are Lipschitz continuous with constant k (see Example 5.2.2(iii)).
Ex ample 7 .1.6. Letµ be a positive real number, and let T: [O, 1] -t [O, 1]
be defined by T(x) = 4µx(l - x). This function is important in population
dynamics. For any x, y E [O, 1], we have -1::; (1 - x -y)::; 1, and therefore
Unexample 7.1.7. Let f: (0, 1)-+ (0, 1) be given by f(x) = x 2 . For any x
we have lf(x) - f(O)I = lx 21<Ix - OI; but f is not a contraction mapping on
(0, 1) because for any distinct x, y E (1/2, 1) we have lf(x) - f(y)I = lx 2-y 21=
Ix - YI Ix +YI > Ix - YI·
(7.2)
We first prove that the sequence is Cauchy. Since f is a contraction on D, say with
constant k, it follows that
Hence,
Given E > 0, we can choose N such that l~k l x1 - xo l < E. It follows that
l xn - xm l < E whenever n > m 2 N. Therefore, the sequence (xn);;:'°=o is Cauchy.
Since X is complete and D is closed, the sequence converges to some x E D.
To prove that f(x) = x, let E > 0 and choose N > 0 so that llx - xn l < c/2
whenever n 2 N. Thus,
Remark 7.1.9. The contraction mapping principle can be proved for complete
metric spaces instead of just Banach spaces without any extra work. Simply change
each occurrence of l x -y l to d(x,y) in the proof above.
Remark 7.1.10. The proof of the contraction mapping principle above gives an
algorit hm for finding the unique fixed point, given by (7.2). This is called the
method of successive approximations. To use it, pick any initial guess x o E D, and
the sequence f(x 0 ), j2(x0 ), .. . converges to the fixed point. Taking the limit of
(7. 3) as n -t oo shows that the error of the mth approximation fm( x o) is at most
km
i-k l x1 - Xo II·
b -b I
lf(x)-f(y)I= -1 Ix-y+--
2 x y
:::; ~2 Ix - YI i1 - ~
xy
I ·
So when the domain of f is restricted to [y'bj2, oo) we have
1 x -yl I1 - b- I = -1 lx-yl
lf(x)-f(y)l<-l
- 2 b/ 2 2 .
Example 7.1.13. Recall that (C([a, b]; IF) , II · llL=) is a Banach space. Con-
sider the operator L: C([a,b];IF)---+ C([a,b];IF), given by
Thus, if i>- 1 < Md-a), then L is a contraction on C([a, b]; IF), and there exists
a unique function f(x) E C([a, b]; JR) such that
Definition 7.2.1. If Dis a nonempty subset of a normed linear space (X, II· llx)
and B is some arbitrary set, then the function f : D x B --+ D is called a uniform
contraction mapping if there exists 0 ::; >. < 1 such that
llf(x2, b) - f(x1 , b)llx :S >-llx2 - xillx (7.4)
for all x1,x2 ED and all b EB.
1
lf(x1, b) - f(x2, b)I = -
2
lx1 - x21 [1 - _ b_
X1X2
l
1
::; 2 lx1 - x2I ·
Therefore, f is a uniform contraction mapping on D x B.
Remark 7.2.8. The previous lemma shows that for each y E V, there exists a
unique fixed point Z(y) E a6'(Y; X) satisfying
Proof. If we knew that Dg existed, t hen for each y E V we would have Dg(y) E
a6'(Y; X), and applying the chain rule to (7.5) would give
That means that Dg(y) would be a fixed point of the function ¢ : a6'(Y; X) x
V ---+ a6'(Y; X) defined in Lemma 7.2.7. By Lemma 7.2.7, the map ¢ is a uniform
contraction mapping, and so there exists a function Z : V ---+ a6'(Y; X) , defined by
setting Z(y) equal to the unique fixed point of¢(., y). By Lemma 7.2.6 the map
Z is continuous; therefore, all that remains is to show that Z(y) = Dg(y) for each
y . That is, for every € > 0 we must show there exists B(y, c5) C V such that if
llkllY < c5, then
llg(y + k) - g(y) - Z(y)kllx ~ EllkllY · (7.9)
We now prove (7.9). For any h EX and k E Y, let 6(h, k) be given by
If j is C 1 , then for all 'T/ > 0, there exists B(y, c5o) c V such that
requirement g(y) +h(k) E U for (7.11) to hold. Since g is continuous, the function h
is continuous, and thus there exists B(y, 5) C B(y, 80) C V such that llh(k) llx ::; 80
whenever ll k ll Y < 8. Moreover, we have
which yields
Choose 7/::; 1
2'\ and recall that llDif(g(y),y)ll ::; .>-. Combining these, we find
2
Setting M = ll D2f(gi~{)ll+l-.>- and simplifying (7.11) yields
algorithms that are not obviously built from Newton's method really have Newton's
method at their core, once you look carefully [HRW12, Tap09] .
The idea of Newton's method is simple: if finding a zero of a function is
difficult, replace the function with a simpler approximation whose zeros are eas-
ier to find. Zeros of linear functions: are easy to find, so the obvious choice is a
linear approximation. Given a differentiable function f : X ---+ X, the best linear
approximation to f at Xn is the function
7.3.1 Convergence
Before treating Newton's method and its variants, we need to make a brief digression
into convergence rates. These are discussed much more in Volume 2.
An iterative process produces a sequence (xn)~=O of approximations. If we
are approximating x, we expect the sequence to converge to x. Better algorithms
produce a sequence that converges more rapidly.
for each n EN. The sequence is said to converge quadratically, when there exists a
constant k ;::: 0 (not necessarily less than 1) such that
En+l :S ks;,
for all n EN.
If a sequence of real numbers converges linearly with rate µ, then with each
iteration the approximation adds about - log 10 µ digits of accuracy. Quadratic
convergence means that with each iteration the approximation roughly doubles the
number of digits of accuracy. This is much better than linear convergence.
The quasi-Newton method of Section 7. 3.3 converges linearly. In Theorems
7.3.4 and 7.3.12 we show that Newton's method converges quadratically.
7.3. Newton's Method 287
Lemma 7.3.2. Let f : [a, b] -+ JR be C 2 and assume that for some x E (a, b) we
have f(x) = 0 and f'(x) -:/= 0 . Under these hypotheses there exists > 0 such that o
the map
f (x)
¢(x) = x - f'(x)
By the mean value theorem, for any [x, y] c [x - o, x + o] there exists c E [x, y] such
that
f (xn)
Xn+l = Xn - f'(xn) for all n E N (7.14)
Proof. Since f is C 2 the derivative f' is locally Lipschitz at the point x with some
constant L (see Proposition 6.3.7), so there exists a 81 such that lf'(x+c:)- f'(x)I <
L lc: I whenever lc:I < 81 . Leto< 81 be chosen as in the previous lemma. Choose any
initial x 0 E [x - o, x + o] and iterate. By the lemma, the sequence must converge
to x.
Let En = Xn - x for each n EN. By the mean value theorem f(x + En - 1) =
f(x) + f'(x + 1JC:n- 1)En-1 for some 17 E [O, 1] (convince yourself that this still holds
288 Chapter 7. Contraction Mappings and Applications
Figure 7.1. Newton's method takes the tangent line (red) to the curve
y = f (x) at the point (Xn , f (Xn)) and defines Xn+ 1 to be the x -intercept of that line.
Details are given in Theorem 7.3.4.
f (1; + cn-1) I
lcn J = Jc n-1 - j'('-
.r + cn-1 )
= I f'(x + cn-1)cn- 1 - f(x + cn_i) I
+ cn- 1)
f'(x
< I f'(x + cn-i) - f'(x + r]cn-1) J lc _ I
- f'l'-X + cn-1 ) n 1
L(l-17)Jcn-1IJ J
::::; lf'(x + cn_i)J cn- 1
2
:S: MJcn-1J ,
(7.15)
X1 = 2.500000000000000,
X2 = 2.0fiOOOOOOOOOOOOO,
7.3. Newton's Method 289
X3 = 2.000609756097561,
X4 = 2.000000092922295,
X5 = 2.000000000000002.
Notice how quickly this sequence converges to 2.
Remark 7.3.6. If the derivative off vanishes at x, then Newton's method is not
necessarily quadratic and may not even converge at all! If f'(x) = 0, we say that f
has a multiple zero at x. For example, if f is a polynomial and f'(x) = 0, then f
has a factor (over <C) of the form (x - x) 2 .
a - x~_ 1
Xn = Xn-l + 2 (7.16)
3xn-l
X1 = 12.002384259259259,
X2 = 12.002383785691737,
X3 = 12.002383785691718.
In Richard Feynman's book Surely You 're Joking, Mr. Feynman! [FLH85],
he tells a story of an abacus master who challenges him to a race to solve
various arithmetic problems. The man was using his abacus, and Feynman was
using pen and paper. After easily beating Feynman in various multiplication
problems, the abacus master challenged him to a division problem, which
turned out a tie. Frustrated that he had not won the division contest, the
abacus master challenged Feynman to find the cube root of 1729.03. This
was a mistake, because computing cube roots on an abacus is hard work,
but Feynman was an expert in algorithms, and Newton's method was in his
arsenal. He also knew that 123 = 1728 since there are 1728 cubic inches in a
cubic foot. Then using (7.16), he carried out the following estimate
1729 03 1728
-V1729.03 R:j 12 + · -2 = 12 + l.0 3 R:j 12.002. (7.17)
3. 12 432
Feynman won this last contest easily, finding the answer to three decimal
places before the abacus master could find one.
The algorithm (7.16) would have allowed Feynman to compute the cube
root to 5 decimals of accuracy in a single iteration had he computed the
290 Chapter 7 Contraction Mappings and Applications
Remark 7.3.8. It is essential in Newton's method (7.14) that the initial point xo
be sufficiently close to the zero. If it is not close enough, it is possible that the
sequence will bounce around and never converge, or even go to infinity.
Theorem 7.3.10. Let (X, II · II) be a Banach space and assume f : X --+ X is C 1
on an open neighborhood U of the point x. If f(x) = 0 and D f(x) E 86'(X) has a
o
bounded inverse, then there exists > 0 such that
D f(x) - 1 , which we can then use in the contraction mapping above to produce an
iterative algorithm.
The following lemma provides a useful tool for approximating D f (x)- 1 and
is also important for proving convergence of Newton's method.
Lemma 7.3.11. Let (X, II · II) be a Banach space and assume that g: X-+ .9.e(X)
o
is a continuous map. If g(x) has a bounded inverse, then there exists > 0 such
that llg(x) - 1 11<2llg(x)- 1 ll whenever ll x - xii < o.
whenever ll x - x ii < o. D
(7.19)
Proof. Choose o > 0 as in Lemma 7.3.11, and such that B(x, o) c U. Let
x 0 E B(x, o), and define Xn for n > 0 using (7.19). We begin by writing the integral
remainder of the first-order Taylor expansion
1
f (xn) - f (x) = fo D f (x + t(xn - x) )(xn - x) dt
1
= Df(x)(xn - x) + fo (Df(x + t(xn - x)) - Df(x))(xn - x) dt.
292 Chapter 7. Contraction Mappings and Applications
Assume that k is the Lipschitz constant for D f on U. By the previous line, we have
1
II! (xn) - f (x) - D f(x)(xn - x)I :::; fo l Df (x + t(xn - x)) - D f(x) 1 l xn - xii dt
r1
:::; J kllx + t(xn - x) - xii llxn - xii dt
. 0
1 k
Remark 7.3.13. In the previous theorem, we proved that when the initial point
xo is "sufficiently close" to the zero x, then Newton's method converges. But
to be useful, we also need to know whether a given starting point will converge.
This is answered by the Newton-Kantorovich theorem, which is a generalization of
Lemma 7.3.2 to vector-valued functions. It says that the initial value x 0 produces
a convergent sequence if
(7.20)
Here K is the Lipschitz constant for the map D f : U -+ ,qg( X) . The proof of the
Newton-Kantorovich theorem is not beyond the scope of the text, but it is tedious,
so we do not reproduce it here.
Observe that
Df (x) = [12x2 + 4y2 8xy ]
8xy 4x 2 + 12y 2 '
7.4. The Implicit and Inverse Function Theorems 293
Df(x) - l = 1
12(x2 + y2)2
[x -2xy
+ 3y
2 2
- 2xy ]
3x 2 + y2 ·
3] = ~[a].
2 3
x1 = xo-Df(xo) -1 f(xo) = [a]
a - 1a 4 [ _4a
48 2a 2 4a- 8a 3 2a/3 3 a
It is clear that in general Xn = (~) n [ ~], which converges quickly to the zero
at (0, 0).
Example 7.4.1. Consider the level set {F(x, y) = 9} of the function F(x, y) =
x 2 + y 2 . In a neighborhood of the point (xo , Yo) = (0, 3) we can define y as a
function of x, namely, y = .Jg - x 2 . However, we cannot define y as a function
of x in a neighborhood around the point (3, 0) since in any neighborhood of
(3, 0) there are two points of the level set of the form (x, ±.J9 - x 2 ) with the
same x-coordinate. This is depicted in Figure 7.2
The implicit function theorem tells us when we can implicitly define one
or more of the variables (in our example the variable y) as functions of other
variables (in our example the variable x).
(0, 3)
/
/
' \ /
/
' \
I \ I
I \ I
I I
(3,0)
for many problems just knowing it exists and knowing its derivative is enough. The
implicit function theorem not only tells us when the function exists, but also how
to compute its derivative without computing the function itself; see (7.22).
To prove the implicit function theorem we construct a uniform contraction
mapping using a generalization of the quasi-Newton method (7.18) and then apply
the uniform contraction mapping principle (Theorem 7.2.4).
whenever x E U0 . Applying the triangle inequality and the inequality (6.18) from
the integral mean value theorem (Corollary 6.5.5), we have
whenever (x, y) E Uo x Vo, and thus G: Uo x Vo --+ Vo. Moreover, for x E Uo and
y 1 ,y2 E Vo, we apply the mean value inequality (6.18) again to get
This implies that G(x, ·) is a uniform contraction, so for each x there is a unique y
satisfying G(x, y) = y. By Theorem 7.2.4, this defines a Ck function f : U0 --+ V0
o
satisfying G(x, f(x)) = f(x) for all x E Uo. Since JIG(x, y) - Yoll < on Uo x Vo,
we can restrict the codomain to Vo and simply write f : Uo --+ Vo. It follows
that F(x, f(x)) = 0 for all x E U0 , which, together with uniqueness, gives (7.21).
Differentiating and solving for D f (x ) gives (7.22). D
Example 7.4.3. The level set in Example 7.4.1 is a circle of radius 3, centered
at the origin. By the implicit function theorem, as long as D2F(xo, Yo) =
2yo i=- 0, there exists a unique C 1 function f(x) in a neighborhood of the
point (xo,Yo) satisfying F(x , f(x)) = 0.
Setting y = f(x) and differentiating the equation F(x , y) = 0 with re-
spect to x gives
Remark 7.4.4. The previous example is a special case of a claim often seen in
a multivariable calculus class. For any function F(x, y) of two variables, if the
equation F(x, y) = 0 defines y implicitly as a function of x, then the derivative
dy / dx is given by
8F
dy ax
- 8F. (7.23)
dx ay
The implicit function theorem tells us that y is a function of x when ~~ =/=- 0, and
(7.23) is a special case of the formula (7.22).
Given (xo, yo, zo) = (1, -1, 2) ES, we compute D3F(xo, Yo, zo) = -5 =/=- 0. By
the implicit function theorem, the surface S can be written explicitly as the
graph of a function z = z(x, y) in a neighborhood of (xo, y0 , z0 ).
Furthermore, we can find the partial derivatives of z by differentiating
F(x, y, z(x, y)) = 0, which gives
Substituting xo, yo, zo and solving for the partial derivatives of z, we get
xu 2 + yzv + x 2 z = 3,
xyv 3 + 2zu - u 2 v2 = 2.
2 2
F x = [ xu + yzv + x z - 3 ]
( 'y) xyv + 2zu - u 2 v2 - 2 ·
3
D 2F(x ) - [ 2xu
o, Yo - 2z - 2uv 2
yz
3xyv 2 - 2u 2v
]I
x o,Yo
[~ i] '
which is nonsingular (and thus has a bounded inverse) . Therefore, by the
implicit function theorem, we have that y(x) = (u(x), v(x)) is a C 1 function
in an open neighborhood of x 0 satisfying F(x,y(x)) = 0.
This is called the Jacobian determinant of the functions Ji, h , ... , fn·
Consider the system
f (x, y, z) = 0,
g(x,y , z) = 0,
Dif (x, y(x), z(x)) + D2f(x , y(x), z(x))y'(x) + D3f(x, y(x) , z(x))z'(x) = 0,
D 1 g(x, y(x), z(x)) + D2g(x, y(x), z(x))y'(x) + D3g(x , y(x), z(x))z'(x) = 0.
298 Chapter 7. Contraction Mappings and Applications
Moreover, from Cramer's rule (Corollary 2.9.24) we can solve for the deriva-
tives y'(x) and z'(x) to get
Theorem 7.4.8 (Inverse Function Theorem). Assume that (X, II · llx) and
(Y, II · llY) are Banach spaces, that U and V are open neighborhoods of xo E X
and y 0 E Y, respectively, and that f : U --t V is a Ck map for some k E z+
satisfying f(xo) = Yo · If D f(xo) E B&(X; Y) has a bounded inverse, then there
exist open neighborhoods Uo C U of xo and Vo C V of Yo, and a unique Ck function
g : Vo --t Uo that is inverse to f . In other words, f (g(y)) = y for all y E Vo and
g(f(x)) = x for all x E Uo. Moreover, for all x E Uo, we have
Proof. Define F(x,y) = f(x) -y. Since D1F(xo,Yo) = Df(xo) has a bounded
inverse, the implicit function theorem guarantees the existence of a neighborhood
U1 x Vo c U x X of the point (xo, Yo) and a Ck function g : Vo --t U1 such that
f(g(y)) = y for ally E Vo, which implies that g is injective (see Theorem A.2.19).
By restricting the codomain of g to Uo = g(Vo), we have that g is bijective. By
Corollary A.2.20, this implies that f : Uo --t Vo and g : Vo --t Uo are inverses of each
other. Note that U0 = U1 n f- 1(V0 ), which implies that U0 is open. Finally, (7.25)
follows by differentiating f(g(y)) = y. D
Example 7 .4. 9. The function f : ~ --t ~ given by f (t) = cos( t) has D f (t) =
- sin(t), which is nonzero whenever t -=f. k7r for all k E Z. The inverse function
theorem guarantees that for any point t -=f. kn there is a neighborhood U0 c ~
oft, a neighborhood Vo c ~ of cos(t), and an inverse function g : Vo --t U0 .
There cannot be a global inverse function g : ~ -+ ~ because f is not injective,
7.4. The Implicit and Inverse Function Theorems 299
and the image of f lies in [-1, 1]; so the inverse function can only be defined
on neighbor hoods in ( -1, 1).
As an example, fort E (0, 7r) we can take neighborhoods Uo = (0, 7r) and
Vo = (-1, 1) and let g(x) = Arccos(x). If x = cos(t), then the inverse function
theorem guarantees that the derivative of g is
1 -1 -1
Dg(x) = Df(t)- 1 = . ()
-sm t Jl - cos 2 (t) ,11 - x2 ·
Nota Bene 7.4.10. Whenever you use the inverse function theorem, there
are likely to be a lot of negative exponents flying around. Some of these
denote function inverses, and some of them denote matrix inverses, including
the reciprocals of a scalars (which can be thought of as 1 x 1 matrices) .
If the inverse function is g = l- 1 , then the derivative Dg(y) = D(f- 1 ) (y)
of the inverse function is the matrix inverse of the derivative:
where l- 1 on the left means the inverse function g, but the exponent on the
right means the inverse of the matrix D l(x).
Of course the inverse of a matrix is the matrix representing the inverse
of the corresponding linear operator, so these exponents really are denoting
the inverse function in both cases- the real problem is that many people
confuse the linear operator D l (x) : X ---+ Y (the function we want to take the
inverse of) with the nonlinear function D l : U ---+ ~(X, Y), which often has
no inverse at all.
You can verify that this agrees with the result of finding g explicit ly and then
differentiating.
Theorem 7.4.12. The inverse and implicit function theorems are equivalent.
Proof. Since we used the implicit function theorem to prove the inverse func-
tion theorem, it suffices to prove the implicit function theorem, using the inverse
function theorem. Let G : U x V -+ X x Z be given by G(x, y) = (x , F(x, y)) .
Note that
and
Ji,= .6.tioc,
and we let t = [ti t4r and x = [x y z e] T. Finally, let
F(t,x) = 0.
We treat this as a system of four equations with four unknowns (x , y, z, and£).
Suppose we wish to determine the change in x if we perturb t, or, conversely,
suppose we want x to be determined with a certain degree of precision. How much
error cant have? The implicit function theorem is perfectly suited to give this kind
of information. It states that x is a function of t if D2F(t, x) is invertible, and
in that case we must have Dx(t) = - D 2F(t,x(t)) - 1D 1F(t,x(t)). Written more
explicitly, we have
[a'
8t1
.§JI...
8t1
oz
8t1
ae
8t1
ox
8t2
.§JI...
8t2
oz
8t2
ae
8t2
ox
ot3
.§JI...
ot3
oz
ot3
ae
ot3
ax
ot4
.§JI...
ot4
oz
ot4
ae
at.
=-
[ilox
8F2
ax ay
QEi
ox
0F4
ax ay
Qft
ay
8F2
QEi
f)y
0F4
Qft
oz
8F2
8Z
QEi
oz
8F4
8Z
illrae
8F2
Eff
f)p,
Eff
8F4
Eff
8F1
at;
fJ.EJ.
8t1
0F3
at;
8F4
at;
8F1
at;
fJ.EJ.
8t2
8F3
8t2
8F4
at;
8F1
ot3
fJ.EJ.
ot3
ap,
at;
8F4
at;
at.
®J
fJ.EJ.
ot4
QEi
at.
{!_fl
ot4
.
Elementary calculations show that if the times are perturbed, the change in x is
approximately
OX OX OX ox
.6.x;:::;::: 8.6.t1
ti
+ 8.6.t2
t2
+ 8.6.t3
t3
+ 8.6.t4.
t4
Moreover, one can show that if all of the .6.ti values are correct to within a nano-
second (typical for the clocks used on satellites), then the coordinates will be correct
to within 3 meters. For further information, see [N JN98] .
7.5 Conditioning
If the answer is highly sensitive to perturbations, you have probably asked the wrong
question.
-Nick Trefethen
302 Chapter 7. Contraction Mappings and Applications
Example 7.5.1. Consider the function y = x/(1 - x). For values of x close
to 1, a small change in x produces a large change in y. For example, if the
correct input is x = 1.001, and if that is approximated by x = 1.002, the
actual output of ii = 1/ (1 - 1.002) = -500 is very different from the desired
output of y = 1/(1 - 1.001) = -1000. So this problem is ill conditioned near
x = 1. Note that this error has nothing to do with round-off errors in the
algorithm for computing the values- it is entirely a property of the problem
itself.
But if the desired input is x = 35, then the correct output is y = -1.0294,
and even a bad approximation to the input like x = 36 gives a good approxi-
mate output ii= -1.0286. So this problem is well conditioned near 35.
'() .
x = 1im
r;, sup llf(x+h)-f(x)ll .
8 -+ 0 + llhll<8 llhll
7.5. Conditioning 303
Proposition 7.5.3. Let X and Y be Banach spaces, and let Uc X be an open set
containing x. If f : U ---+ Y is differentiable at x, then
Proof. By Lemma 6.4.1, for every E > 0, if llhll is sufficiently small, then
In most settings, relative error is more useful than absolute error. An error
of 1 is tiny if the true answer is 10 20 , but it is huge if the true answer is 10- 20 .
Relative error accounts for this difference. Since the condition number is really
about the size of errors in the output, the relative condition number is usually a
better measure of conditioning than the absolute condition number.
Definition 7.5.4. Let X and Y be normed linear spaces, and let f: X---+ Y be a
function. The relative condition number off at x E X is
This leads to a general rule of thumb that, without any error in the algorithm
itself, we should expect to lose k digits of accuracy if the relative condition
number is lOk.
304 Chapter 7. Contraction Mappings and Applications
llDf(x)ll (7.28)
11:(x) = l f(x) ll/llxll
Ex ample 7.5.8.
(i) Consider the function f(x) = 1 _".'x of Example 7.5 .1. For this function
Df(x) = (1 - x)- 2 , and hence by (7.28) we have
llDf(x)ll l~I I 1 I
11: = llf(x)ll/llxll = Ii.".'x I / lxl = 1 - x .
This problem is well conditioned when x is far from 1, and poorly con-
ditioned when ll - xi is small.
Proposition 7.5.9. Define $P : \mathbb{F}^{n+1} \times \mathbb{F} \to \mathbb{F}$ by $P(a, x) = \sum_{i=0}^n a_i x^i$. For any given $b \in \mathbb{F}^{n+1}$ and any simple root $z$ of the polynomial $p(x) = P(b, x)$, there is a neighborhood $U$ of $b$ in $\mathbb{F}^{n+1}$ and a continuously differentiable function $r : U \to \mathbb{F}$ with $r(b) = z$ such that $P(a, r(a)) = 0$ for all $a \in U$. Moreover, the relative condition number of $r$ as a function of the $i$th coefficient $a_i$, at the point $(b, z)$, is
$$\kappa = \frac{|b_i z^{i-1}|}{|p'(z)|}. \tag{7.29}$$
Proof. A root $z$ of a polynomial $p$ is simple if and only if $p'(z) \ne 0$. Differentiating $P$ at $(b, z)$ with respect to $x$ gives $D_xP(b, z) = \sum_{i=1}^n i b_i z^{i-1} = p'(z)$. Because $p'(z)$ is invertible, the implicit function theorem guarantees the existence of a neighborhood $U$ of $b$ and a unique continuously differentiable function $r : U \to \mathbb{F}$ such that $r(b) = z$ and such that $P(a, r(a)) = 0$ for all $a \in U$. Moreover, differentiating this identity with respect to $a_i$ gives
$$\frac{\partial r}{\partial a_i}(b) = -D_xP(b,z)^{-1}\,\frac{\partial P}{\partial a_i}(b,z) = -\frac{z^i}{p'(z)}.$$
Combining this with (7.28) shows that the relative condition number of $r$ as a function of the $i$th coefficient $a_i$ is given by (7.29). $\Box$
Example 7.5.10. Let $w(x) = \prod_{k=1}^{20}(x - k)$ be the Wilkinson polynomial, whose $x^{19}$-coefficient is $b_{19} = -210$. By (7.29), the relative condition number of the root $z = 15$ with respect to $b_{19}$ is
$$\kappa = \left|\frac{15^{18}(-210)}{p'(15)}\right| \approx 3.0 \times 10^{10}.$$
(i) Given $A \in M_n(\mathbb{F})$, what is the relative condition number of $f(x) = Ax$?
(ii) Given $x \in \mathbb{F}^n$, what is the relative condition number of $g(A) = Ax$?
Figure 7.3. The blue dots are the roots of the Wilkinson polynomial $w(x)$ plotted in the complex plane. The red crosses are the roots of the polynomial perturbed by $10^{-7}$ in the $x^{19}$-coefficient. As described in Example 7.5.10, the roots are very sensitive to tiny variations in this coefficient because the relative condition number is very large.
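The perturbation shown in Figure 7.3 is easy to reproduce. The sketch below (an illustration, not part of the text) builds the Wilkinson polynomial with NumPy and perturbs its $x^{19}$-coefficient by $10^{-7}$.

```python
import numpy as np

w = np.poly(np.arange(1, 21, dtype=float))  # coefficients of (x-1)...(x-20)
w_pert = w.copy()
w_pert[1] += 1e-7                           # w[1] is the x^19 coefficient, -210
print(np.roots(w))       # roots near 1, 2, ..., 20
print(np.roots(w_pert))  # several roots move far off the real axis
```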
Theorem 7.5.11.
(i) Given a nonsingular $A \in M_n(\mathbb{F})$, the relative condition number of $f(x) = Ax$ satisfies
$$\kappa = \frac{\|A\|\,\|x\|}{\|Ax\|} \le \|A\|\,\|A^{-1}\|. \tag{7.30}$$
Moreover, if the norm $\|\cdot\|$ is the 2-norm, then equality holds when $x$ is a right singular vector of $A$ corresponding to the minimal singular value.
(ii) Given $x \in \mathbb{F}^n$, the relative condition number of $g(A) = Ax$ satisfies
$$\kappa = \frac{\|A\|\,\|x\|}{\|Ax\|} \le \|A\|\,\|A^{-1}\|. \tag{7.31}$$
(iii) Given a nonsingular $A \in M_n(\mathbb{F})$, the relative condition number of the solution map $h(b) = A^{-1}b$ satisfies
$$\kappa = \frac{\|A^{-1}\|\,\|b\|}{\|A^{-1}b\|} \le \|A\|\,\|A^{-1}\|. \tag{7.32}$$
Moreover, for the 2-norm, equality holds when $b$ is a left singular vector of $A$ corresponding to the maximal singular value.

Definition 7.5.12. For a nonsingular matrix $A \in M_n(\mathbb{F})$, the quantity $\kappa(A) = \|A\|\,\|A^{-1}\|$ is called the condition number of the matrix $A$.
Nota Bene 7.5.13. Although $\kappa(A)$ is called the condition number of the matrix $A$, it is not the condition number (as given in Definition 7.5.4) of most problems associated to $A$. Rather, it is the supremum of the condition numbers of each of the various problems in Theorem 7.5.11; in other words, it is a sharp uniform bound for each of those condition numbers. Also, the problem of finding eigenvalues of $A$ has an entirely different condition number (see Section 7.5.4).
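In the 2-norm, $\kappa(A) = \|A\|_2\|A^{-1}\|_2 = \sigma_{\max}/\sigma_{\min}$, which is what `numpy.linalg.cond` computes. The example matrix below is an arbitrary illustration, not from the text.

```python
import numpy as np

A = np.array([[1.0, 1000.0],
              [0.0,    1.0]])
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa, np.linalg.cond(A, 2))  # the two values agree
```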
Proposition 7.5.14. Let $\lambda$ be a simple eigenvalue (that is, with algebraic multiplicity 1) of $A$ with right eigenvector $x$ and left eigenvector $y^H$, both of norm 1. Fix $E \in M_n(\mathbb{C})$ with $\|E\| = 1$. Then there is a differentiable function $\mu(t)$, defined in a neighborhood of $t = 0$, such that $\mu(0) = \lambda$ and $\mu(t)$ is an eigenvalue of $A + tE$, and the absolute condition number of $\mu$ at $t = 0$ satisfies
$$\hat{\kappa} = |\mu'(0)| = \frac{|y^H E x|}{|y^H x|} \le \frac{1}{|y^H x|}. \tag{7.33}$$

Proof. Consider the function $F(t, \xi, \mu) = ((A + tE)\xi - \mu\xi,\ \xi^H\xi - 1)$. Differentiating with respect to $(\xi, \mu)$ gives
$$D_{\xi,\mu}F = \begin{bmatrix} A + tE - \mu I & -\xi\\ 2\xi^H & 0\end{bmatrix}.$$
Exercise 7.31 shows that $D_{\xi,\mu}F(0, x, \lambda)$ is invertible. The implicit function theorem now applies to guarantee the existence of a differentiable function $f(t) = (\xi(t), \mu(t)) \in \mathbb{F}^n \times \mathbb{C}$, defined in a neighborhood $U$ of $t = 0$, such that $F(t, \xi(t), \mu(t)) = 0$ for all $t \in U$. Differentiating this identity at $t = 0$ gives
$$\begin{bmatrix} (E - \mu'(0)I)x + (A - \lambda I)\xi'(0)\\ 2x^H\xi'(0)\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}.$$
Multiplying the first row on the left by $y^H$ and using $y^H(A - \lambda I) = 0$ gives
$$\hat{\kappa} = |\mu'(0)| = \frac{|y^H E x|}{|y^H x|} \le \frac{1}{|y^H x|},$$
where the last inequality follows from the fact that $\|E\| = 1$. In the special case that $E = yx^H$, it is immediate that the inequality is an equality. $\Box$
Proof. If $A$ is normal, then the second spectral theorem (or rather the analogue of Corollary 4.4.9 for normal matrices) guarantees that there is an orthonormal eigenbasis of $A$. Since $\lambda$ is simple, one of the basis elements $x$ corresponds to $\lambda$, and by Remark 4.3.19 $x^H$ is a corresponding left eigenvector. Thus, $y^Hx = x^Hx = 1$, and $\hat{\kappa} \le 1$. $\Box$
For example, consider the matrix
$$A = \begin{bmatrix} 1 & 1000\\ 0.001 & 1\end{bmatrix},$$
which has eigenvalues $\{0, 2\}$. Take as right eigenvector $x = [1\ \ {-0.001}]^T$ and left eigenvector $y^H = [-0.001\ \ 1]$, and note that $\|x\|_\infty = \|y\|_\infty = 1$. Setting
$$E = \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix},$$
we have by Proposition 7.5.14 that there is a continuous function $\mu(t)$ such that $\mu(0) = 0$ and $\mu(t)$ is an eigenvalue of
$$A + tE = \begin{bmatrix} 1 & 1000\\ 0.001 + t & 1\end{bmatrix}$$
near $t = 0$. Moreover, (7.33) gives the absolute condition number $\hat{\kappa}$ for $\mu$ at $t = 0$ as
$$\hat{\kappa} = \left|\frac{y^H E x}{y^H x}\right| = \frac{1}{2\times 10^{-3}} = 500.$$
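The prediction $\hat\kappa = 500$ can be tested numerically: the eigenvalue near 0 of $A + tE$ should move by roughly $500t$ for small $t$. A minimal sketch (an illustration, not part of the text):

```python
import numpy as np

A = np.array([[1.0, 1000.0], [0.001, 1.0]])
E = np.array([[0.0, 0.0], [1.0, 0.0]])
for t in [1e-6, 1e-5, 1e-4]:
    mu = min(np.linalg.eigvals(A + t * E), key=abs)  # eigenvalue near 0
    print(t, abs(mu) / t)  # ratios are close to kappa-hat = 500
```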
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with .& are especially important and are likely to be used later
in this book and beyond. Those marked with † are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
7.1. Consider the closed, complete metric space $X = [0, \infty)$ with the usual metric $d(x, y) = |x - y|$. Show that the map
$$f(x) = \frac{x + \sqrt{x^2+1}}{2}$$
satisfies $d(f(x), f(y)) < d(x, y)$ for all $x \ne y \in X$ and yet has no fixed point. Why doesn't this violate the contraction mapping principle?
7.2. Consider the sequence $(x_k)_{k=0}^\infty$ defined by the recursion $x_n = \sqrt{a + x_{n-1}}$, with $a > 1$ and $x_0 = 1$. Prove that the sequence converges, and find its limit.
Use this to prove that a contraction mapping can have at most one fixed point.
Use this to prove that $(f^n(x))_{n=0}^\infty$ is a Cauchy sequence. Finally, prove that if $\bar{x}$ is the limit of this sequence, then $f(\bar{x}) = \bar{x}$, and for any integer $n > 0$
$$d(f^n(x), \bar{x}) \le \frac{K^n}{1-K}\,d(x, f(x)).$$
7.6. Prove Theorem 7.1.14. Hint: This follows the same ideas used to prove Theorem 7.1.8; define a sequence via the method of successive approximations and prove convergence.
7.7. Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be defined by $f(x, y) = \cos(\cos(x)) + y$. Show that $f(x, y)$ is a $C^\infty$ uniform contraction mapping. Hint: Consider using the mean value theorem and the fact that $|\sin(x)| \le \sin(1) < 1$ for all $x \in [-1, 1]$.
7.8. Let $C_b([0,\infty);\mathbb{R}) = \{g \in C([0,\infty);\mathbb{R}) : \|g\|_{L^\infty} < \infty\}$ be the set of continuous functions with bounded sup norm. Given $\mu > 0$ let $T_\mu : C_b([0,\infty);\mathbb{R}) \times C_b([0,\infty);\mathbb{R}) \to C([0,\infty);\mathbb{R})$ be given by
Essentially the same argument as in Exercise 7.8(i) shows that $(C_a, \|\cdot\|_a)$ is a Banach space.
(i) Fix $z \in C_a$, and define a map $T_z : C_a \times C_a \to C([0,\infty);\mathbb{R})$ by
$$e^{(a-1)s}x(s)\,ds.$$
Prove that for each $f \in C_a$ the map $T_{z,f}$ is an operator on $C_a$.
(ii) Fix $k < 1$ and let $B = \{f \in C_a : \|f\|_a < k\}$. Prove that $T_z : C_a \times B \to C_a$ is a uniform contraction.
7.11. Suppose $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ are Banach spaces, $U \subset X$ and $V \subset Y$ are open, and the function $f : U \times V \to U$ is a uniform contraction with constant $0 \le \lambda < 1$. In addition, suppose there exists a $C$ such that
7.13. (i) Using Newton's method, derive the square root formula (7.15) in Example 7.3.5.
(ii) Derive the cube root formula (7.16) in Example 7.3.7. Then compute the long division in (7.17) by hand to get 5 decimals of accuracy.
7.14. The generic proof of convergence in Newton's method requires a sufficiently close initial guess $x_0$. Prove that the square root solver in (7.15) converges as long as $x_0 \ne 0$.
7.15. Although Newton's method is very fast in most cases, there are situations where it converges very slowly. Suppose that
$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0,\\ 0 & \text{if } x = 0.\end{cases}$$
The function $f$ can be shown to be $C^\infty$, and 0 is the only solution of $f(x) = 0$. Show that if $x_0 = 0.0001$, it takes more than one hundred million iterations of the Newton method to get below 0.00005. Prove, moreover, that the closer $x_n$ is to 0, the slower the convergence. Why doesn't this violate Theorem 7.3.4?
7.16. Let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be given as
$$F(x, y) = \begin{bmatrix} x^2 - y^2 + 8 + \cos y\\ y - x + 9 + 2\cos x\end{bmatrix}.$$
Using the initial guess $x_0 = (\pi, \frac{\pi}{2})$, compute the first iteration of Newton's method by hand.
7.17. Determine whether the previous sequence converges by checking the Newton-Kantorovich bound given in (7.20). Hint: To compute the operator norms $\|Df(x_0)^{-1}\|_2$ and $\|Df(x) - Df(y)\|_2$ you may wish to use the results of Exercise 3.28.
7.18. Consider the points $(x, y) \in \mathbb{R}^2$ on the curve $\cos(xy) = 1 + \tan(y)$. Find conditions on $x$ and $y$ that guarantee $x$ is locally a function of $y$, and find $dx/dy$.
7.19. Find the total Frechet derivative of $z$ as a function of $x$ and $y$ on the surface $\{(x, y, z) \in \mathbb{R}^3 \mid x^2 + xyz + 4x^5z^3 + 6z + y = 0\}$ at the origin.
7.20. Show that the equations
$$\sin(x + z) + \ln(yz^2) = 0,\qquad e^{x+z} + yz = 0$$

$$f(x, y)^3 + xg(x, y) - y = 0,\qquad g(x, y)^3 + yf(x, y) - x = 0.$$
7.22. The principal inverse secant function $f(x) = \mathrm{Arcsec}(x)$ has domain $(-\infty, -1) \cup (1, \infty)$ and range $(0, \pi/2) \cup (\pi/2, \pi)$. Using only the derivative of the secant, basic trigonometric properties, and the inverse function theorem, prove that the derivative of $f$ is
$$\frac{df}{dx} = \frac{1}{|x|\sqrt{x^2 - 1}}.$$
7.23. Let $S : M_2(\mathbb{R}) \to M_2(\mathbb{R})$ be given by $S(A) = A^2$. Does $S$ have a local inverse in a neighborhood of the identity matrix? Justify your answer.
7.24.* Denote the functions $f : \mathbb{R}^2 \to \mathbb{R}^2$ and $g : \mathbb{R}^2 \to \mathbb{R}^2$ in the standard bases as
$$f(x) = (f_1(x_1, x_2),\ f_2(x_1, x_2)),\qquad g(x) = (g_1(x_1, x_2),\ g_2(x_1, x_2)),$$
where $x = (x_1, x_2)$. Prove: If $f$ and $g$ are $C^1$ and satisfy $f(g(y)) = y$ for all $y$, then
7.26. Find the relative condition number at each $x_0$ in the domain of the following functions:
(i) $e^x$.
(ii) $\ln(x)$.
(iii) $\cos(x)$.
(iv) $\tan(x)$.
7.27. Finish the proof of Theorem 7.5.11 by showing that in the 2-norm, if $x$ is a right singular vector of $A$ associated to the minimal singular value, then equality holds in (7.30).
7.28. Given $(x, y) \in \mathbb{C}^2$, consider the problem of finding $z$ such that $x^2 + y^2 - z^3 + z = 0$. Find all the points $(x, y)$ for which $z$ is locally a function of $(x, y)$. For a fixed value of $y$, find the relative condition number of $z$ as a function of $x$. What is this relative condition number near the point $(x, y) = (0, 0)$?
7.29. Give an example of a matrix $A$ with condition number $\kappa(A) > 1000$ (assuming the 2-norm). Give an example of a matrix $B$ with condition number $\kappa(B) = 1$. Are there any matrices with condition number less than 1? If so, give an example. If not, prove it.
7.30. Proposition 7.5.9 gives sufficient conditions for the roots of a polynomial to be a continuous function of the coefficients.
(i) Consider the roots of the polynomial $f(a, x) = x^2 + a$, where $a, x \in \mathbb{R}$. If $a = 0$, then $x_0 = 0$ is a root, but if $a > 0$, then $f$ has no roots in $\mathbb{R}$. Why doesn't this contradict Proposition 7.5.9?
(ii) The quadratic formula gives an explicit formula for all the roots of a quadratic polynomial as a function of the coefficients. There is a similar but much more complicated formula for roots of cubic and quartic polynomials, but Abel's theorem guarantees that there is no general algebraic formula for the roots of a polynomial of degree 5 or greater. Why doesn't this contradict Proposition 7.5.9?
7.31.* Prove that the derivative $D_{\xi,\mu}F(0, x, \lambda)$ of
$$F(t, \xi, \mu) = \begin{bmatrix} (A + tE)\xi - \mu\xi\\ \xi^H\xi - 1\end{bmatrix}$$
is invertible, where $A$, $E$, $x$, and $\lambda$ are as in Proposition 7.5.14.
Notes
For more on the uniform contraction mapping principle and further generalizations, see [Chi06, Sect. 1.11].
A readable description of the Newton-Kantorovich bound is given in [Ort68]. Kantorovich actually proved the result in two different ways, and an English translation of his two proofs is given in [Kan52, KA82].
Our treatment of conditioning is inspired by [TB97, Dem97, GVL13]. For more on conditioning in general, see [TB97, Dem97]. For more on conditioning of the eigenvalue problem, see [GVL13, Sect. 7.2.2].
Exercises 7.4-7.5 are based on the paper [Pal07]. Exercise 7.15 comes from [Ans06].
Part III
Nonlinear Analysis II

Integration I
But unfortunately the function $\lim_{n\to\infty} f_n$ need not be Riemann integrable, even if the individual terms $f_n$ are all Riemann integrable.
Instead of limits in the $L^\infty$-norm (uniform limits), we often need to consider limits in the $L^1$-norm (given by $\|f\|_{L^1} = \int \|f\|$). The space of Riemann-integrable functions is not complete in the $L^1$-norm, so there are $L^1$-Cauchy sequences that do not converge in this space. To remedy this, we need to extend the space to a larger one that is complete with respect to the $L^1$-norm, and then we can use the continuous linear extension theorem (Theorem 5.7.6) to extend the integral to this larger space. The result of all this is a much more general theory of integration.⁴²
If the functions being integrated take values in $X = \mathbb{R}$, then this construction is known as the Daniell integral. We will usually call it the Lebesgue integral, however, because it is equivalent to the Lebesgue integral, and that name is more familiar to most mathematicians. If $X$ is a more general Banach space, then this construction is called (or, rather, is equivalent to) the Bochner integral. For simplicity, we restrict ourselves to the case of $X = \mathbb{R}$ for most of this chapter and the next, but much of what we do works just as well when $X$ is a general Banach space.
It is important to keep in mind that on a bounded set all Riemann-integrable
and regulated-integrable functions are also Lebesgue integrable, and for these func-
tions the Riemann and regulated integrals are the same as the Lebesgue integral.
Thus, to compute the Lebesgue integral of a continuous function on a compact set,
for example, we can just use the usual techniques for finding the regulated integral
of that function.
In this chapter we begin by extending the definition of the regulated integral
to multivariable functions, and then by giving an overview of the main ideas and
theorems of Lebesgue integration (in the next chapter we give the details and com-
plete the proofs). The majority of the chapter is devoted to some of the main tools
of integration. These include three important convergence theorems (the monotone
convergence theorem, Fatou's lemma, and the dominated convergence theorem),
Fubini's theorem, and a generalized change of variables formula. The three conver-
gence theorems give useful conditions for when limits commute with integration.
Fubini's theorem gives a way to convert multivariable integrals into several single-
variable integrals (iterated integrals), which can then be evaluated in the usual ways.
Fubini also shows us how to change the order in which those iterated single-variable
integrals are evaluated, and in many situations changing the order of integration can
greatly simplify the problem. Finally, the multivariable change of variables formula
is analogous to the single-variable version (Corollary 6.5.6) and is also very useful.
⁴² The approach to integration in these next two chapters is inspired by, but is perhaps even more unusual than, what we did with integration in Chapter 5 (see Nota Bene 5.10.1).
Definition 8.1.1. Let $a, b \in \mathbb{R}^n$, with $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$. We denote by $[a, b]$ the closed n-interval (or box)
$$[a, b] = \{x \in \mathbb{R}^n \mid a_i \le x_i \le b_i \text{ for } i = 1, \dots, n\}.$$

[Figure 8.1: a partition of a two-dimensional interval $[a, b]$ into subintervals $R_{i,j}$, determined by partition points $t_1^{(0)}, \dots, t_1^{(4)}$ on the first coordinate axis and a similar partition of the second.]

$$[a, b] = \bigcup_{I \in \mathscr{P}} R_I.$$
To simplify notation we write $I \in \mathscr{P}$ to denote that the index $I$ lies in the product of the indices of $\mathscr{P}$, that is, $I \in \{1, \dots, k_1\} \times \cdots \times \{1, \dots, k_n\}$. We use this notation repeatedly when discussing step functions.
Throughout the rest of this section, assume that $(X, \|\cdot\|)$ is a Banach space.

Definition 8.1.4. For any set $E \subset \mathbb{R}^n$ the indicator function $\mathbb{1}_E$ of $E$ is the function
$$\mathbb{1}_E(z) = \begin{cases} 1, & z \in E,\\ 0, & z \notin E.\end{cases}$$

Here $x_I \in X$ for each $I \in \mathscr{P}$, and the subintervals $R_I \subset [a, b]$ are determined by the index and the partition, as described in Definition 8.1.3. More generally, for any $E \subset \mathbb{R}^n$ we consider a function $s : E \to X$ to be a step function if it is zero outside of an interval $[a, b]$ and the restriction $s|_{[a,b]}$ to $[a, b]$ is a step function on $[a, b]$.
Proposition 8.1.5. The set $S([a, b]; X)$ of step functions is a subspace of the normed linear space of bounded functions $(L^\infty([a, b]; X), \|\cdot\|_{L^\infty})$.

$$\mathscr{I}(s) = \int_{[a,b]} s = \sum_{I \in \mathscr{P}} x_I\,\lambda(R_I).$$

The proof of the next proposition is very similar to the single-variable case (Proposition 5.10.5).

Proposition 8.1.8. For any compact interval $[a, b] \subset \mathbb{R}^n$, the integral operator $\mathscr{I} : S([a, b]; X) \to X$ is a bounded linear transformation with norm $\|\mathscr{I}\| = \lambda([a, b])$.
Definition 8.1.10. For any $[a, b] \subset \mathbb{R}^n$ we denote the closure (in the $L^\infty$-norm) of $S([a, b]; X)$ by $\mathscr{R}([a, b]; X)$. The functions in $\mathscr{R}([a, b]; X)$ are called regulated-integrable functions. For any $f \in \mathscr{R}([a, b]; X)$, we call the value $\mathscr{I}(f)$ of the linear transformation in Theorem 8.1.9 the integral of $f$, and we usually denote it by
$$\int_{[a,b]} f = \mathscr{I}(f).$$
(iii) Let $\|f\|$ denote the function $t \mapsto \|f(t)\|$ from $[a, b]$ to $\mathbb{R}$. We have
$$\left\|\int_{[a,b]} f\right\| \le \int_{[a,b]} \|f\|.$$
(iv) If $\|f(t)\| \le \|g(t)\|$ for every $t \in [a, b]$, then $\int_{[a,b]} \|f\| \le \int_{[a,b]} \|g\|$.
$$\int_{[a,b]} s_k \ge -\varepsilon, \qquad\text{and hence}\qquad \int_{[a,b]} h = \lim_{k\to\infty}\int_{[a,b]} s_k \ge 0.$$
Remark 8.1.12. As in the single-variable case, one can easily check that the Riemann construction of the integral defines a bounded linear transformation on $\mathscr{R}([a, b]; X)$ that agrees with our definition on step functions, and hence by the uniqueness part of the continuous linear extension theorem must agree with our construction on all of $\mathscr{R}([a, b]; X)$. The Riemann construction does work for a slightly larger space of functions than $\mathscr{R}([a, b]; X)$; however, we need the integral to be defined on yet more functions, because in applications we must often move limits past integrals, but many limits of Riemann-integrable functions are not Riemann integrable. This is discussed in more depth in Section 8.2.
Definition 8.1.13. For any function $f : E \to X$, the extension of $f$ by zero is the function
$$f\mathbb{1}_E(z) = \begin{cases} f(z), & z \in E,\\ 0, & z \notin E.\end{cases}$$
Unexample 8.1.14. Even the set consisting of a single point $p \in [a, b) \subset \mathbb{R}$ has an indicator function $\mathbb{1}_p$ that is not in $\mathscr{R}([a, b]; \mathbb{R})$. To see this, note first that step functions on any interval $[a, b] \subset \mathbb{R}$ are all right continuous, meaning that
$$\lim_{t\to t_0^+} s(t) = s(t_0)$$
for every $t_0 \in [a, b)$. Moreover, by Exercise 8.4 the uniform limit of a sequence of right-continuous functions is right continuous, and thus every function in $\mathscr{R}([a, b]; \mathbb{R})$ is right continuous. But $\mathbb{1}_p$ is not right continuous, so $\mathbb{1}_p \notin \mathscr{R}([a, b]; \mathbb{R})$.
There are many important sets $E$ and functions $f$ for which we would expect to be able to define the integral $\int_E f$, but for which the regulated integral (and also the more traditional Riemann construction) does not work. In the next section we discuss this problem and its solution in more depth.
$$\lim_{n\to\infty}\int_{[a,b]} f_n = \int_{[a,b]}\lim_{n\to\infty} f_n = \int_{[a,b]} f.$$

$$\|f\|_{L^1} = \int_{[a,b]} \|f\|.$$
Nota Bene 8.2.3. Although we previously defined the integral for Banach-valued functions, and although most of the results of this chapter hold for general Banach-valued functions, for the rest of the chapter we restrict ourselves to $\mathbb{R}$-valued functions (that is, to the case of $X = \mathbb{R}$) just to keep things simple.
Note that the case of real-valued functions can easily be used to describe the case where $f$ takes values in any finite-dimensional Banach space. For a complex-valued function $f = u + iv$, simply define the integral to be the sum of two real-valued integrals
$$\int_{[a,b]} f = \int_{[a,b]} u + i\int_{[a,b]} v,$$
and for a function $f = (f_1, \dots, f_n)$ taking values in $\mathbb{R}^n$, define
$$\int_{[a,b]} f = \left(\int_{[a,b]} f_1, \dots, \int_{[a,b]} f_n\right).$$
(i) How to construct a vector space $L^1([a, b]; \mathbb{R})$ containing both $S([a, b]; \mathbb{R})$ and $\mathscr{R}([a, b]; \mathbb{R})$ as subspaces.
(ii) How to define integration and the $L^1$-norm on $L^1([a, b]; \mathbb{R})$ in such a way that they agree with the integral and norm already defined on $S([a, b]; \mathbb{R})$ and $\mathscr{R}([a, b]; \mathbb{R})$.
The full proofs of these constructions and their properties are given in Chapter 9.
Nota Bene 8.2.4. If $X = \mathbb{R}$, then the integral, as we have defined it here, is often called the Daniell integral. We will usually call it the Lebesgue integral, however, because it is equivalent to the Lebesgue integral, and that name is more familiar to most mathematicians.
It is also common to use the phrase Lebesgue integration when talking about integration of functions in $L^1([a, b]; \mathbb{R})$, but it is important to note that there is really just one "integration" going on here. In particular, $\mathscr{R}([a, b]; \mathbb{R}) \subset L^1([a, b]; \mathbb{R})$, and Lebesgue integration restricted to $\mathscr{R}([a, b]; \mathbb{R})$ is just what you have always called integration since your first introduction to calculus. That is, Lebesgue integration is essentially just a way to extend regular old Riemann integration to a much larger collection of functions.
Unfortunately, $\|\cdot\|_{S'}$ is not a norm, because there are nonzero elements $(s_n)_{n=0}^\infty$ of $S'$ that have $\|(s_n)_{n=0}^\infty\|_{S'} = 0$. But the set of such null sequences is a vector subspace of $S'$, and by Proposition 9.1.1 the seminorm descends to a norm on the quotient.
We define the space $L^1([a, b]; \mathbb{R})$ to be the completion of $S([a, b]; \mathbb{R})$ in the $L^1$-norm; that is,
$$L^1([a, b]; \mathbb{R}) = \overline{S([a, b]; \mathbb{R})}.$$
This is guaranteed to be complete, but we have two wrinkles to iron out. The first wrinkle arises from the way the completion is defined. We want $L^1([a, b]; \mathbb{R}) = \overline{S([a, b]; \mathbb{R})}$ to consist of functions, not equivalence classes of sequences of functions. So, for each element of $L^1([a, b]; \mathbb{R})$, that is, for each equivalence class of $L^1$-Cauchy sequences, we must give a well-defined function. To do this, we take the pointwise limit of a Cauchy sequence in the equivalence class. We prove in Section 9.3 that for each equivalence class there is at least one sequence that converges pointwise to a function. So we have an associated function arising from each element of $L^1([a, b]; \mathbb{R})$.
But the second wrinkle is that two different sequences in the same equivalence class can converge pointwise to different functions (see Unexample 8.2.5). Thus, unfortunately, there is not a well-defined function for each element of $L^1([a, b]; \mathbb{R})$. One instance of this is given in Unexample 8.2.5.
Unexample 8.2.5. Let $s_n \in S([0, 1]; \mathbb{R})$ be the characteristic function of the box $[0, 2^{-n})$, and let $(t_n)_{n=0}^\infty$ be the zero sequence ($t_n = 0$ for all $n \in \mathbb{N}$). The sequence $(s_n)_{n=0}^\infty$ converges pointwise to the characteristic function of the singleton set $\{0\}$, but the sequence of $L^1$-norms $\|s_n\|_{L^1} = 2^{-n}$ converges to 0. Integrating gives $\|s_n - t_n\|_{L^1} \to 0$, and so these two Cauchy sequences are in the same equivalence class in $L^1([0, 1]; \mathbb{R})$, but their pointwise limits are not the same.
a bounded linear operator on all of $L^1([a, b]; \mathbb{R})$. This is the Daniell or Lebesgue integral.
Nota Bene 8.2.7. Beware that although it is equivalent to the more traditional definition of $L^1([a, b]; \mathbb{R})$, our definition of $L^1([a, b]; \mathbb{R})$ is very different from the definition that you would see in a standard course on integration. In most treatments of integration, $L^1([a, b]; \mathbb{R})$ is defined to be the set of (equivalence classes a.e. of) measurable functions on $[a, b]$ for which $\|f\|_{L^1}$ is finite. But for us, $L^1([a, b]; \mathbb{R})$ is the set of (equivalence classes a.e. of) functions that are almost everywhere equal to the pointwise limit of an $L^1$-Cauchy sequence of step functions on $[a, b]$.
Finally, the following proposition and its corollaries show that every sequence that is Cauchy with respect to the $L^\infty$-norm is also Cauchy with respect to the $L^1$-norm; so $\mathscr{R}([a, b]; \mathbb{R})$ is a subspace of $L^1([a, b]; \mathbb{R})$, and the new definition of integration, when restricted to $\mathscr{R}([a, b]; \mathbb{R})$, agrees with our earlier definition of integration.

Proof. We have
$$\|T\|_{L^\infty,X} = \sup \frac{\|Tf\|_X}{\|f\|_{L^\infty}} \le \lambda([a, b]) \sup \frac{\|Tf\|_X}{\|f\|_{L^1}} = \lambda([a, b])\,\|T\|_{L^1,X},$$
where the suprema are both taken over all nonzero $f \in \mathscr{R}([a, b]; \mathbb{R})$, and where we have used $\|f\|_{L^1} \le \lambda([a, b])\|f\|_{L^\infty}$. Thus, if $\|T\|_{L^1,X} < \infty$, then we also have $\|T\|_{L^\infty,X} < \infty$. $\Box$
Corollary 8.2.11. Let $T : \mathscr{R}([a, b]; \mathbb{R}) \to \mathbb{R}$ be a linear transformation that, when restricted to step functions $S([a, b]; \mathbb{R})$, agrees with the integral $\mathscr{I} : S([a, b]; \mathbb{R}) \to \mathbb{R}$. If $T$ is a bounded transformation with respect to the $L^1$-norm, then $T$ must be equal to the integral $\mathscr{I} : \mathscr{R}([a, b]; \mathbb{R}) \to \mathbb{R}$ defined in Theorem 8.1.9.
This suggests that for any set $A$ of measure zero, any subset $B \subset A$ should also have measure zero, and if $(C_k)_{k=0}^\infty$ is a sequence of sets whose measure goes to zero ($\lambda(C_k) \to 0$ as $k \to \infty$), then $\bigcap_{k\in\mathbb{N}} C_k$ should have measure zero.
The second property we expect is that the measure of a union of sets should be no bigger than the sum of the measures of the individual pieces:
$$\lambda\Big(\bigcup_{k\in\mathbb{N}} A_k\Big) \le \sum_{k\in\mathbb{N}} \lambda(A_k).$$

Definition 8.3.1. A set $A \subset \mathbb{R}^n$ has measure zero if for any $\varepsilon > 0$ there is a countable collection of n-intervals $(I_k)_{k=0}^\infty$ such that $A \subset \bigcup_{k=0}^\infty I_k$ and $\sum_{k=0}^\infty \lambda(I_k) < \varepsilon$.
Proof. Item (i) follows immediately from the definition. The proof of (ii) is Exercise 8.11.
For (iii) assume that $(C_k)_{k\in\mathbb{N}}$ is a countable collection of sets of measure zero. Assume that $\varepsilon > 0$ is given. For each $k \in \mathbb{N}$ there exists a collection $(I_{j,k})_{j\in\mathbb{N}}$ of intervals covering $C_k$ such that $\sum_{j\in\mathbb{N}} \lambda(I_{j,k}) < \varepsilon/2^{k+1}$. Allowing $k$ also to vary, the collection $(I_{j,k})_{j\in\mathbb{N},k\in\mathbb{N}}$ of all these intervals is a countable union of countable sets, and hence is countable. Moreover, we have $\bigcup_{k\in\mathbb{N}} C_k \subset \bigcup_{j,k\in\mathbb{N}} I_{j,k}$ and
$$\sum_{j,k\in\mathbb{N}} \lambda(I_{j,k}) < \sum_{k\in\mathbb{N}} \frac{\varepsilon}{2^{k+1}} = \varepsilon. \qquad\Box$$
Example 8.3.3. The Cantor ternary set is constructed by starting with the closed interval $C_0 = [0, 1] \subset \mathbb{R}$ and removing the (open) middle third to get $C_1 = [0, 1/3] \cup [2/3, 1]$. Repeating the process, removing the middle third of each of the preceding collection of intervals, gives $C_2 = [0, 1/9] \cup [2/9, 1/3] \cup [2/3, 7/9] \cup [8/9, 1]$. Continuing this process gives a sequence of sets $(C_k)_{k=0}^\infty$ such that each $C_k$ consists of $2^k$ closed intervals of total length $(2/3)^k$.
Define the Cantor ternary set to be the intersection $C_\infty = \bigcap_{k\in\mathbb{N}} C_k$. Since $[0, 1]$ is compact and each $C_k$ is closed, $C_\infty$ is nonempty (see Theorem 5.5.11). One can actually show that this set is uncountable (see, for example, [Cha95, Chap. 1, Sect. 4.4]).
To see that $C_\infty$ has measure zero, note that each $C_k$ contains $C_\infty$, and $C_k$ is a finite union of closed intervals of total length $(2/3)^k$. Since $(2/3)^k$ can be made arbitrarily small by choosing large enough values of $k$, the set $C_\infty$ satisfies the conditions of Definition 8.3.1.
Definition 8.3.4. We say that functions $f$ and $g$ are equal almost everywhere and write $f = g$ a.e. if the set $\{t \mid f(t) \ne g(t)\}$ has measure zero in $\mathbb{R}^n$.

Example 8.3.5. Integration gives the same result for any two functions that are equal almost everywhere, so when we need to integrate a function that is messy or otherwise difficult to work with, but all the "bad" parts of the function are supported on a set of measure zero, we can replace it with a function that is equal to the first function almost everywhere, but is (hopefully) easier to integrate.
For example, the Dirichlet function defined on $\mathbb{R}$ by
$$f(t) = \begin{cases} 1 & \text{if } t \text{ is rational},\\ 0 & \text{if } t \text{ is irrational}\end{cases}$$
is equal to zero almost everywhere, because the set where $f(t) \ne 0$ is countable, and hence has measure zero. Since $f = 0$ a.e., for any interval $[a, b] \subset \mathbb{R}$ we have $\int_a^b f = \int_a^b 0 = 0$.
Proposition 8.3.6. The relation $=$ a.e. defines an equivalence relation on the set of all functions from $[a, b]$ to $\mathbb{R}$.

$$f_n = n\,\mathbb{1}_{[0,\frac{1}{n}]}.$$
The sequence $(f_n(0))_{n=0}^\infty$ does not converge, but for all $x \ne 0$ the sequence $(f_n(x))_{n=0}^\infty$ converges to zero. Hence, $f_n(x) \to 0$ a.e.
8.3.2 Measurability
We often wish to integrate functions over a set that is not a compact interval. Unfortunately, we cannot do this consistently for all sets and all functions. The functions for which we can even think of defining a sensible integral are called measurable functions, and the sets where we can sensibly talk about the possibility of integrating functions are called measurable sets.

$$\int_A f = \int_{[a,b]} f\mathbb{1}_A.$$
Define $L^1(A; \mathbb{R})$ to be the set of functions $f : A \to \mathbb{R}$ such that $f\mathbb{1}_A \in L^1([a, b]; \mathbb{R})$.
The next proposition and its corollary show that the integral over $A$ is well defined; that is, it is independent of the choice of interval $[a, b]$.

Proposition 8.3.11. Suppose $A \subset [a, b] \cap [c, d]$. Then $f\mathbb{1}_A \in L^1([a, b]; \mathbb{R})$ if and only if $f\mathbb{1}_A \in L^1([c, d]; \mathbb{R})$, and in that case
$$\int_{[a,b]} f\mathbb{1}_A = \int_{[c,d]} f\mathbb{1}_A. \tag{8.2}$$
Proof. ($\Longrightarrow$) If $f\mathbb{1}_A \in L^1([a, b]; \mathbb{R})$, then there is a sequence $(s_n)_{n=0}^\infty$ of step functions on $[a, b]$ such that $(s_n)_{n=0}^\infty$ is $L^1$-Cauchy on $[a, b]$ and $s_n \to f\mathbb{1}_A$ a.e. Extending each of these step functions by zero outside of $[a, b]$ gives step functions $t_n$ on $[c, d]$, and $t_n \to f\mathbb{1}_A$ a.e.
From the definition of the integral of a step function, we have
$$\int_{[c,d]} |t_n - t_m| = \int_{[a,b]} |s_n - s_m| \qquad\text{and}\qquad \int_{[c,d]} t_n = \int_{[a,b]} s_n$$
for all $n, m \in \mathbb{N}$. Hence, $(t_n)_{n=0}^\infty$ is $L^1$-Cauchy on $[c, d]$, the function $f\mathbb{1}_A \in L^1([c, d]; \mathbb{R})$, and (8.2) holds.
($\Longleftarrow$) If $f\mathbb{1}_A \in L^1([c, d]; \mathbb{R})$, then there is a sequence $(t_n)_{n=0}^\infty$ of step functions on $[c, d]$ such that $(t_n)_{n=0}^\infty$ is $L^1$-Cauchy on $[c, d]$ and $t_n \to f\mathbb{1}_A$ a.e. Multiplying each of these by $\mathbb{1}_{[a,b]}$ gives step functions $s_n = t_n\mathbb{1}_{[a,b]}$.
Now we show that $(t_n\mathbb{1}_{[a,b]})_{n=0}^\infty$ is $L^1$-Cauchy on $[a, b]$. Given $\varepsilon > 0$, choose $N > 0$ such that $\|t_n - t_m\|_{L^1} < \varepsilon$ (on $[c, d]$) whenever $n, m > N$. The $L^1$-norm of $s_n - s_m$ on $[a, b]$ is
$$\int_{[a,b]} |s_n - s_m| = \int_{[a,b]} |t_n - t_m|\mathbb{1}_{[a,b]} \le \int_{[c,d]} |t_n - t_m| < \varepsilon.$$
Thus, $(s_n)_{n=0}^\infty$ is $L^1$-Cauchy on $[a, b]$, and hence $f\mathbb{1}_A \in L^1([a, b]; \mathbb{R})$.
Finally, $(s_n)_{n=0}^\infty$ also defines an $L^1$-Cauchy sequence on $[c, d]$ converging to $f\mathbb{1}_A$ a.e. with $\int_{[a,b]} s_n = \int_{[c,d]} s_n$ for every $n$. Therefore $(s_n)_{n=0}^\infty$ and $(t_n)_{n=0}^\infty$ define the same element of $L^1([c, d]; \mathbb{R})$, and they have the same integral. Hence, (8.2) holds. $\Box$
Corollary 8.3.12. If $A \subset [a, b] \cap [a', b']$ is measurable, then $f\mathbb{1}_A \in L^1([a, b]; \mathbb{R})$ if and only if $f\mathbb{1}_A \in L^1([a', b']; \mathbb{R})$. Moreover,
$$\int_{[a,b]} f\mathbb{1}_A = \int_{[a',b']} f\mathbb{1}_A. \tag{8.3}$$
Nota Bene 8.3.13. Most sets that you are likely to encounter in applied mathematics are measurable, including all open and closed sets and any countable unions and intersections of open or closed sets. But not every subset of $\mathbb{R}^n$ is measurable. We do not provide an example here, but you can find examples in [Van08, RF10].
$$\lim_{k\to\infty}\int_{[a,b]} f_k = \int_{[a,b]}\lim_{k\to\infty} f_k.$$
But a sequence of functions that converges pointwise does not necessarily converge in the $L^1$-norm (see Unexample 8.4.3), and its pointwise limit is not always integrable. The three convergence theorems give us conditions for identifying when a pointwise-convergent sequence is actually $L^1$-Cauchy.
After discussing some basic integral properties, we state and prove the monotone convergence theorem. We conclude the section with an important consequence of the monotone convergence theorem, namely, integration on unbounded domains.
Definition 8.4.1. For any set $A$ and any function $f : A \to \mathbb{R}$, define
$$f^+(a) = \begin{cases} f(a) & \text{if } f(a) \ge 0,\\ 0 & \text{if } f(a) \le 0,\end{cases} \qquad\text{and}\qquad f^-(a) = \begin{cases} -f(a) & \text{if } f(a) \le 0,\\ 0 & \text{if } f(a) \ge 0.\end{cases}$$
Proposition 8.4.2. For any $f, g \in L^1([a, b]; \mathbb{R})$ we have the following:
(iv) If $h : \mathbb{R}^n \to \mathbb{R}$ is a measurable function (see Definition 8.3.9), and if $|h| \in L^1([a, b]; \mathbb{R})$, then $h \in L^1([a, b]; \mathbb{R})$.
(v) If $\|g\|_{L^\infty} \le M < \infty$, then $fg \in L^1([a, b]; \mathbb{R})$ and $\|fg\|_{L^1} \le M\|f\|_{L^1}$.
This sequence converges pointwise to the zero function, but $\int_{[0,1]} f_k = 1$ for all $k \in \mathbb{N}$, so
$$\lim_{k\to\infty}\int_{[0,1]} f_k = 1 \ne 0 = \int_{[0,1]}\lim_{k\to\infty} f_k.$$
In the rest of this section we discuss the monotone convergence theorem, which guarantees that if the integrals of a monotone sequence are bounded, then the sequence must be $L^1$-Cauchy.

Theorem 8.4.5 (monotone convergence theorem). If $(f_k)_{k=0}^\infty \subset L^1([a, b]; \mathbb{R})$ is almost everywhere monotone increasing and there exists $M \in \mathbb{R}$ such that
$$\int_{[a,b]} f_k \le M \tag{8.4}$$
for all $k \in \mathbb{N}$, then $(f_k)_{k=0}^\infty$ is $L^1$-Cauchy, and hence there exists a function $f \in L^1([a, b]; \mathbb{R})$ such that $f_k \to f$ a.e. and
$$\int_{[a,b]} f = \lim_{k\to\infty}\int_{[a,b]} f_k.$$
The same conclusion holds if $(f_k)_{k=0}^\infty \subset L^1([a, b]; \mathbb{R})$ is almost everywhere monotone decreasing and there exists $M \in \mathbb{R}$ such that
$$\int_{[a,b]} f_k \ge M. \tag{8.5}$$
$$= \Big(L - \int_{[a,b]} f_m\Big) - \Big(L - \int_{[a,b]} f_\ell\Big) < \varepsilon.$$
Thus $(f_k)_{k=0}^\infty$ is $L^1$-Cauchy, as required. The monotone decreasing case follows immediately from the previous result by replacing each $f_k$ by $-f_k$ and replacing $M$ by $-M$. $\Box$
Remark 8.4.6. Notice that the sequence in Unexample 8.4.3 is not monotone in-
creasing, so the theorem does not apply to that sequence.
If $\int_{E_k} f \le M$ for all $k \in \mathbb{N}$, then we say that $f$ is integrable on $A$ and we define
$$\int_A f = \lim_{k\to\infty}\int_{E_k} f.$$
We write $L^1(A; \mathbb{R})$ to denote the set of equivalence classes of integrable functions on $A$ (modulo equality almost everywhere).

Nota Bene 8.4.8. Exercise 8.18 shows that $L^1(A; \mathbb{R})$ is a normed linear space with the $L^1$-norm and the very important property that a function $g$ is integrable on $A$ if and only if $|g|$ is integrable on $A$.
$$\int_0^\infty e^{-x}\,dx = \lim_{n\to\infty}\int_0^n e^{-x}\,dx = \lim_{n\to\infty}(1 - e^{-n}) = 1.$$
Consider the function $f(x) = (1 + x)/(1 + x^2)$ on $\mathbb{R}$. The function
$$f^+(x) = \begin{cases} f(x), & x \ge -1,\\ 0, & x \le -1,\end{cases}$$
has an integral that is unbounded on the intervals $[-n, n]$; that is,
$$\int_{[-n,n]} f^+ = \int_{-1}^n \frac{1+x}{1+x^2}\,dx \to \infty \text{ as } n \to \infty.$$
Nevertheless,
$$\int_{[-n,n]} f = \int_{-n}^n \frac{1+x}{1+x^2}\,dx = 2\,\mathrm{Arctan}(n) \to \pi \text{ as } n \to \infty,$$
while
$$\int_{[-n,n^2]} f\,dx = \mathrm{Arctan}(n^2) + \mathrm{Arctan}(n) + \frac{1}{2}\log\left(\frac{1+n^4}{1+n^2}\right) \to \infty \text{ as } n \to \infty.$$
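The dependence on the truncation can also be seen numerically. The following sketch (an illustration, not part of the text) compares symmetric and asymmetric truncations of the integral of $f(x) = (1+x)/(1+x^2)$:

```python
from scipy.integrate import quad

f = lambda x: (1 + x) / (1 + x ** 2)
for n in [10, 50, 100]:
    sym, _ = quad(f, -n, n, limit=200)        # tends to pi
    skew, _ = quad(f, -n, n ** 2, limit=200)  # grows without bound
    print(n, sym, skew)
```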
Below we show that the definition of integrable and the value of the integral on unbounded domains do not depend on the choice of the sequence $(E_k)_{k=0}^\infty$ of measurable subsets; that is, they are well defined. Specifically, suppose that $(E_k)_{k=0}^\infty$ and $(E'_k)_{k=0}^\infty$ are two nested sequences of bounded measurable sets with
$$A = \bigcup_{k=0}^\infty E_k = \bigcup_{k=0}^\infty E'_k.$$
If there exists an $M \in \mathbb{R}$ such that for all $k$ we have $\int_{E_k} f \le M$, then we also have $\int_{E'_k} f \le M$ for all $k$, and
$$\lim_{k\to\infty}\int_{E_k} f = \lim_{k\to\infty}\int_{E'_k} f.$$
Proof. Since each $E_k$ is bounded and measurable, there exists a compact interval $[a, b]$ with $E_k \subset [a, b]$, and such that $\mathbb{1}_{E_k} \in L^1([a, b]; \mathbb{R})$. Similarly, for every $m$ we may choose $[a', b']$ containing $E'_m$ and such that $\mathbb{1}_{E'_m} \in L^1([a', b']; \mathbb{R})$. Let $[c, d]$ be a compact interval containing both $[a, b]$ and $[a', b']$. By Proposition 8.3.11 we have that $\mathbb{1}_{E_k}$ and $\mathbb{1}_{E'_m}$ are in $L^1([c, d]; \mathbb{R})$, and by Proposition 8.4.2(v) the products $\mathbb{1}_{E_k}\mathbb{1}_{E'_m}$ and $\mathbb{1}_{E_k}\mathbb{1}_{E'_m} f$ are in $L^1([c, d]; \mathbb{R})$. Therefore, the restrictions of $\mathbb{1}_{E_k}$ and $\mathbb{1}_{E_k} f$ to $E'_m$ both lie in $L^1(E'_m; \mathbb{R})$. Trading the roles of $E'_m$ and $E_k$ in the previous argument implies that $\mathbb{1}_{E'_k}$ and $\mathbb{1}_{E'_k} f$ are in $L^1(E_m; \mathbb{R})$.
If $\int_{E_k} f \le M$ for all $k$, then
$$\int_{E'_k} f\mathbb{1}_{E_m} = \int_{E_m} f\mathbb{1}_{E'_k} \le \int_{E_m} f \le M,$$
and hence by the monotone convergence theorem
$$\int_{E'_k} f = \int_{E'_k}\lim_{m\to\infty} f\mathbb{1}_{E_m} = \lim_{m\to\infty}\int_{E'_k} f\mathbb{1}_{E_m} \le M. \tag{8.6}$$
Assume now that L = limk-+oo fek f and L' = limk-+oo f e~ f. Since the sequences
Uek f)'t'=o and Ue~ n~o are nondecreasing, they satisfy f e. f :::; L for all k and
340 Chapter 8. Integration I
JE' f
k
::; L'
for all k. Taking M = L and taking the limit of (8.6) as k ~ oo gives
L' ::; L. Similarly, interchanging t he roles of Ek and E~ and setting M = L'
gives L::; L'. Thus L = L', as required . D
Nota Bene 8.5.1. Being $L^1$-Cauchy only guarantees convergence in the space $L^1([a, b]; \mathbb{R})$. In general an $L^1$-Cauchy sequence of functions in $\mathscr{R}([a, b]; \mathbb{R})$ does not converge in the subspace $\mathscr{R}([a, b]; \mathbb{R})$ or in the space of Riemann-integrable functions. That is, the monotone convergence theorem, Fatou's lemma, and the dominated convergence theorem are usually only useful if we work in $L^1([a, b]; \mathbb{R})$.
These always exist (although they may be infinite), even if the limit does not.
Fatou's lemma tells us when the lim inf of a sequence (not necessarily convergent) of nonnegative integrable functions is integrable, and it tells us how the integral of the lim inf is related to the lim inf of the integrals. Fatou's lemma is also the key tool we use to prove the dominated convergence theorem.

Theorem 8.5.2 (Fatou's lemma). If $(f_k)_{k=0}^\infty \subset L^1([a, b]; \mathbb{R})$ is a sequence of almost-everywhere-nonnegative functions such that
$$\liminf_{k\to\infty}\int_{[a,b]} f_k < \infty,$$
then
(i) $(\liminf_{k\to\infty} f_k) \in L^1([a, b]; \mathbb{R})$ and
(ii) $\int_{[a,b]} \liminf_{k\to\infty} f_k \le \liminf_{k\to\infty}\int_{[a,b]} f_k$.
Proof. First we show that the infimum of any sequence $(f_\ell)_{\ell=1}^\infty$ of almost-everywhere-nonnegative integrable functions must also be integrable. For each $k \in \mathbb{N}$ let $g_k$ be the function defined by $g_k(t) = \min\{f_1(t), f_2(t), \dots, f_k(t)\}$. The sequence $(g_k)_{k=0}^\infty$ is a monotone decreasing sequence of almost-everywhere-nonnegative functions with $\lim_{k\to\infty} g_k = g = \inf_{m\in\mathbb{N}} f_m$. Since every $g_k$ is almost everywhere nonnegative, we have $\int_{[a,b]} g_k \ge 0$. By the monotone convergence theorem (Theorem 8.4.5), we have $\inf_{m\in\mathbb{N}} f_m = \lim_{k\to\infty} g_k = g \in L^1([a, b]; \mathbb{R})$.
Now for each $k \in \mathbb{N}$ let $h_k = \inf_{\ell\ge k} f_\ell \in L^1([a, b]; \mathbb{R})$. Each $h_k$ is almost everywhere nonnegative, and the sequence is monotone increasing, with $\lim_{k\to\infty} h_k = \liminf_{k\to\infty} f_k < \infty$. Moreover, for each $n \in \mathbb{N}$ we have $h_n \le f_n$, so $\int_{[a,b]} h_n \le \int_{[a,b]} f_n$ and, taking limits, we have
$$\int_{[a,b]} h_n \le \lim_{k\to\infty}\int_{[a,b]} h_k = \liminf_{k\to\infty}\int_{[a,b]} h_k \le \liminf_{k\to\infty}\int_{[a,b]} f_k < \infty.$$
By the monotone convergence theorem, $(h_k)_{k=0}^\infty$ converges almost everywhere to an integrable function, which equals $\liminf_{k\to\infty} f_k$ a.e., and $\int_{[a,b]} \liminf_{k\to\infty} f_k = \lim_{k\to\infty}\int_{[a,b]} h_k \le \liminf_{k\to\infty}\int_{[a,b]} f_k$. $\Box$
Remark 8.5.3. The inequality in Fatou's lemma should not be surprising. For intuition about this, consider the situation where the sequence $(f_k)$ consists of only two nonnegative functions $f_0$ and $f_1$. This is depicted in Figure 8.4. In this case
$$\inf(f_0, f_1) = \min(f_0, f_1) \le f_0 \qquad\text{and}\qquad \inf(f_0, f_1) = \min(f_0, f_1) \le f_1,$$
so by Proposition 8.4.2(i), we have
$$\int_{[a,b]}\inf(f_0, f_1) \le \int_{[a,b]} f_0 \qquad\text{and}\qquad \int_{[a,b]}\inf(f_0, f_1) \le \int_{[a,b]} f_1.$$
This implies
$$\int_{[a,b]} g - \int_{[a,b]}\lim_{k\to\infty} f_k = \int_{[a,b]}\lim_{k\to\infty} h_k = \int_{[a,b]}\liminf_{k\to\infty} h_k \le \liminf_{k\to\infty}\int_{[a,b]} h_k = \int_{[a,b]} g - \limsup_{k\to\infty}\int_{[a,b]} f_k,$$
and hence $\limsup_{k\to\infty}\int_{[a,b]} f_k \le \int_{[a,b]}\lim_{k\to\infty} f_k$. Repeating the previous argument with $h_k = g + f_k$ gives the other direction, which gives (8.7). $\Box$
Remark 8.5.6. The dominated convergence theorem guarantees that we can interchange limits if we can find an integrable function that dominates the sequence almost everywhere. Note that in Unexample 8.4.3 no integrable function dominates all the terms $f_k$.

Example 8.5.7. For each $n \in \mathbb{N}$, let $f_n : \mathbb{R} \to \mathbb{R}$ be given by
We wish to evaluate
Proposition 8.5.8. If $(f_k)_{k=0}^\infty$ is any sequence of functions in $L^1([a, b]; \mathbb{R})$ such that
$$\sum_{k=0}^\infty\int_{[a,b]} |f_k| < \infty,$$
then $\sum_{k=0}^\infty f_k \in L^1([a, b]; \mathbb{R})$ and
$$\int_{[a,b]}\sum_{k=0}^\infty f_k = \sum_{k=0}^\infty\int_{[a,b]} f_k.$$
Proof. By Exercise 8.16 we have that for almost every $x \in [a, b]$ the series $\sum_{k=0}^\infty |f_k(x)|$ converges, and the resulting function $\sum_{k=0}^\infty |f_k|$ is integrable. Therefore, for almost every $x \in [a, b]$ the series $\sum_{k=0}^\infty f_k(x)$ converges absolutely, and hence it converges, by Proposition 5.6.13.
The partial sums of $\sum_{k=0}^\infty f_k$ are all dominated by the integrable function $\sum_{k=0}^\infty |f_k|$, so by the dominated convergence theorem the series $\sum_{k=0}^\infty f_k$ is integrable and
$$\int_{[a,b]}\sum_{k=0}^\infty f_k = \sum_{k=0}^\infty\int_{[a,b]} f_k. \qquad\Box$$
is integrable on X.
Remark 8.6.2. We often write $\int_X \left(\int_Y f_x(y)\,dy\right)dx$ instead of $\int_X F(x)\,dx$. We call this an iterated integral.

Nota Bene 8.6.3. The reader should beware that the obvious converse of Fubini is not true. Integrability of $f_x$ and of $F = \int_Y f_x(y)\,dy$ is not sufficient to guarantee that $f$ is integrable on $X \times Y$. However, with some additional conditions one can sometimes still deduce the integrability of $f$. For more details see [Cha95, Sect. IV.5.11].
We prove Fubini's theorem in Section 9.4. In the rest of this section we focus
on its implications and how to use it.
Fubini's theorem allows us to reduce higher-dimensional integrals down to a repeated application of one-dimensional integrals, and these can often be computed by standard techniques, such as the fundamental theorem of calculus (Theorem 6.5.4).
$$\int_{[0,2]\times[0,1]} \cos(x)y^2\,dx\,dy = \int_0^2\left(\int_0^1 \cos(x)y^2\,dy\right)dx$$

$$\int_{X\times Y} f(x, y)\,dx\,dy = \int_{Y\times X} f(y, x)\,dy\,dx.$$
Corollary 8.6.6 is useful because changing the order of integration can often
simplify a problem substantially.
$$\int_{-1}^1\left(\int_1^2 \tan(3x^2 - x + 2)\sin(y)\,dx\right)dy = \int_1^2 \tan(3x^2 - x + 2)\left(\int_{-1}^1 \sin(y)\,dy\right)dx = 0,$$
since the inner integral of the odd function $\sin(y)$ over a symmetric interval vanishes.
Consider the integral $\int_S e^{y^2}\,dA$, where $S$ is the triangle obtained by cutting the unit square $(0, 1) \times (0, 1)$ in half, diagonally. Interchanging the order of integration, using Corollary 8.6.6, allows us to integrate first with respect to $x$. For each value of $y$, we have $(x, y) \in S$ if and only if $0 \le x \le y$, so the integral becomes
$$\int_0^1\int_0^y e^{y^2}\,dx\,dy,$$
which is easily evaluated to $\int_0^1 y e^{y^2}\,dy = \frac{1}{2}(e - 1)$.
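The computation can be verified with a numerical quadrature. The sketch below (an illustration, not part of the text) integrates over the triangle $0 \le x \le y \le 1$ and compares with $\frac{1}{2}(e - 1)$:

```python
import numpy as np
from scipy.integrate import dblquad

# Outer variable y in [0, 1]; inner variable x in [0, y].
val, _ = dblquad(lambda x, y: np.exp(y ** 2), 0, 1, 0, lambda y: y)
print(val, (np.e - 1) / 2)  # the two values agree
```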
Proof. Fix some $x_0 \in X$. Using the second form of the fundamental theorem of calculus (6.16) and Fubini's theorem, we have
$$\psi(x) - \psi(x_0) = \int_{x_0}^x g(z)\,dz. \tag{8.9}$$
Using the first form of the fundamental theorem of calculus (6.15) to differentiate (8.9) with respect to $x$ gives
$$\frac{d}{dx}\psi(x) = g(x). \qquad\Box$$
Example 8.6.10. Consider the problem of finding the derivative of the map $F(x) = \int_0^1 (x^2 + t)^3\,dt$. We can solve this without Leibniz's rule by first using standard antidifferentiation techniques:
$$F(x) = \int_0^1 (x^2 + t)^3\,dt = \frac{(x^2+t)^4}{4}\bigg|_0^1 = \frac{(x^2+1)^4}{4} - \frac{x^8}{4},$$
and then differentiating. Alternatively, differentiating under the integral sign gives
$$F'(x) = \int_0^1 \frac{\partial}{\partial x}(x^2+t)^3\,dt = \int_0^1 3(x^2+t)^2(2x)\,dt = (x^2+t)^3(2x)\Big|_0^1 = 2x(x^2+1)^3 - 2x^7.$$
Remark 8.6.11. In the previous example it is not hard to verify that the two
answers agree, but in many situations it is much easier to differentiate under the
integral sign than it is to integrate first and differentiate afterward.
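Differentiation under the integral sign is easy to sanity-check numerically. The sketch below (an illustration, not part of the text) compares Leibniz's rule with a central difference of $F$ and with the closed form found above:

```python
from scipy.integrate import quad

F = lambda x: quad(lambda t: (x ** 2 + t) ** 3, 0, 1)[0]
dF = lambda x: quad(lambda t: 6 * x * (x ** 2 + t) ** 2, 0, 1)[0]  # d/dx inside
x, h = 1.5, 1e-5
print(dF(x),
      (F(x + h) - F(x - h)) / (2 * h),         # central difference
      2 * x * (x ** 2 + 1) ** 3 - 2 * x ** 7)  # closed form from the example
```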
We can use Theorem 8.6.9 and the chain rule to prove a more general result. This generalized Leibniz formula is important in the proof of Green's theorem (Theorem 10.5.15).

Corollary 8.6.12. Let $X$ and $A$ be open intervals in $\mathbb{R}$, and let $f : X \times A \to \mathbb{R}$ be continuous with continuous partial derivative $\frac{\partial f}{\partial x}$ at each point of $X \times A$. If $a, b : X \to A$ are differentiable functions and $\psi(x) = \int_{a(x)}^{b(x)} f(x, t)\,dt$, then $\psi(x)$ is differentiable and
$$\frac{d}{dx}\psi(x) = \int_{a(x)}^{b(x)} \frac{\partial f(x, t)}{\partial x}\,dt - a'(x)f(x, a(x)) + b'(x)f(x, b(x)). \tag{8.10}$$
$$F'(x) = \frac{d}{dx}\int_{\sin(x)}^{\cos(x)} \mathrm{Arctan}(x+t)\,dt = \int_{\sin(x)}^{\cos(x)} \frac{1}{1+(x+t)^2}\,dt - \cos(x)\,\mathrm{Arctan}(x + \sin(x)) - \sin(x)\,\mathrm{Arctan}(x + \cos(x))$$
$$= (1 - \sin(x))\,\mathrm{Arctan}(x + \cos(x)) - (1 + \cos(x))\,\mathrm{Arctan}(x + \sin(x)).$$
$$G'(x) = \int_{a(x)}^{b(x)} r\,g'(rx + st)\,dt - a'(x)\,g(rx + s\,a(x)) + b'(x)\,g(rx + s\,b(x)).$$

$$\int_c^d f(g(s))\,g'(s)\,ds = \int_{g(c)}^{g(d)} f(u)\,du.$$
8.7.1 Diffeomorphisms
The type of function we use for a change of variables is called a diffeomorphism.

Definition 8.7.1. Let $U$ and $V$ be open subsets of $\mathbb{R}^n$. We say that $\Psi : U \to V$ is a diffeomorphism if $\Psi$ is a $C^1$ bijection such that $\Psi^{-1}$ is also $C^1$.

Example 8.7.3.
$$DI(x, y) = \begin{bmatrix} -\sin(x) & 0\\ 0 & \cos(y)\end{bmatrix}$$
has a nonzero determinant for all $(x, y) \in (0, \pi/2) \times (0, \pi/2)$, so the inverse function theorem guarantees that $I^{-1}$ is $C^1$.
Unexample 8.7.4.
(i) The map $g : (-1, 1) \to (-1, 1)$ given by $g(x) = x^3$ is not a diffeomorphism. Although it is bijective with inverse $g^{-1}(y) = y^{1/3}$, the inverse function is not $C^1$ at 0.
(ii) Let $U$ be the punctured plane $U = \mathbb{R}^2 \smallsetminus \{0\}$, and let $h : (0, \infty) \times \mathbb{R} \to U$ be given by $h(r, t) = (r\cos(t), r\sin(t))$. It is straightforward to check that $h$ is surjective. The derivative is
$$Dh = \begin{bmatrix} \cos(t) & -r\sin(t)\\ \sin(t) & r\cos(t)\end{bmatrix}.$$
If $g'(s) < 0$ for all $s \in [c, d]$, then $g([c, d]) = [g(d), g(c)]$, and
$$\int_{g([c,d])} f = \int_{g(d)}^{g(c)} f(\tau)\,d\tau = -\int_{g(c)}^{g(d)} f(\tau)\,d\tau = -\int_c^d f(g(s))\,g'(s)\,ds = \int_{[c,d]} f(g(s))\,|g'(s)|\,ds,$$
while if $g'(s) > 0$ for all $s \in [c, d]$, then $g([c, d]) = [g(c), g(d)]$, and
$$\int_{g([c,d])} f = \int_{g(c)}^{g(d)} f(\tau)\,d\tau = \int_c^d f(g(s))\,g'(s)\,ds = \int_{[c,d]} f(g(s))\,|g'(s)|\,ds.$$
In either case we have
$$\int_{g([c,d])} f = \int_{[c,d]} (f\circ g)\,|g'|. \tag{8.11}$$
This is essentially the form of the change of variables formula in higher dimensions. The main theorem of this section is the following.

Theorem 8.7.5 (change of variables). Let $U$ and $V$ be open subsets of $\mathbb{R}^n$ and let $\Psi : U \to V$ be a diffeomorphism. If $X \subset U$ is compact and $f : \Psi(X) \to \mathbb{R}$ is integrable, then $(f\circ\Psi)\,|\det(D\Psi)|$ is integrable on $X$ and
$$\int_{\Psi(X)} f = \int_X (f\circ\Psi)\,|\det(D\Psi)|. \tag{8.12}$$
Remark 8.7.6. In the special case that $\Psi : U \to V$ is a linear transformation, then $D\Psi = \Psi$ and the change of variables formula says that $\int_Y f = |\det(\Psi)|\int_X (f\circ\Psi)$.
An especially important special case of this is when $X$ is an interval $[a, b] \subset \mathbb{R}^n$ and $f = 1$. In this case (8.12) says that the volume of $Y = \Psi([a, b])$ is exactly the determinant of $\Psi$ times the volume of $[a, b]$. This should not be surprising: the singular value decomposition says that $\Psi$ can be written as $U\Sigma V^T$, where $U$ and $V$ are orthonormal. Orthonormal matrices are products of rigid rotations and reflections, so they should not change the volume at all, and $\Sigma$ is diagonal, so it scales the $i$th standard basis vector by $\sigma_i$. This changes the volume by the product of these $\sigma_i$, that is, by the determinant of $\Sigma$, which is the absolute value of the determinant of $\Psi$.
We prove the change of variables theorem in Section 9.5. For the rest of this
section we discuss some implications and examples.
Example 8.7.7. Usually either the geometry of the region or the structure of the integrand gives a hint about which diffeomorphism to use for change of variables. Consider the integral
$$\int_R \sin\left(\frac{y-x}{y+x}\right)dx\,dy,$$
where $R$ is the trapezoidal region with vertices $(1, 0)$, $(3, 0)$, $(0, 3)$, and $(0, 1)$. Without a change of variables, this is not so easy to compute. But the presence of the terms $x + y$ and $x - y$ in the integrand and the fact that two of the sides of the trapezoid are segments of the lines $x + y = 1$ and $x + y = 3$ suggest the change of variables $u = y + x$ and $v = y - x$. Writing $x$ and $y$ in terms of $u$ and $v$ gives $\Psi(u, v) = (\frac{1}{2}(u - v), \frac{1}{2}(u + v))$ with
$$D\Psi = \begin{bmatrix} 1/2 & -1/2\\ 1/2 & 1/2\end{bmatrix} \qquad\text{and}\qquad |\det(D\Psi)| = \frac{1}{2}.$$
The change of variables formula now gives
$$\frac{1}{2}\int_1^3\int_{-u}^u \sin\left(\frac{v}{u}\right)dv\,du = 0,$$
since for each $u$ the inner integrand is an odd function of $v$.
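The transformed integral can be checked by quadrature. The sketch below (an illustration, not part of the text) evaluates the $(u, v)$-integral, which vanishes up to quadrature error:

```python
import numpy as np
from scipy.integrate import dblquad

# (1/2) * int_1^3 int_{-u}^{u} sin(v/u) dv du; inner variable v, outer u.
val, _ = dblquad(lambda v, u: 0.5 * np.sin(v / u), 1, 3,
                 lambda u: -u, lambda u: u)
print(val)  # approximately 0
```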
$$\int_B f(x, y)\,dx\,dy = \int_A (f\circ\Psi)\,r = \int_A f(r\cos(\theta), r\sin(\theta))\,r\,dr\,d\theta. \tag{8.13}$$
Moreover, we can extend this relation to $[0, 2\pi] \times [0, \infty)$ because the rays defined by $\theta = 0$ and $\theta = 2\pi$ have measure zero and hence contribute nothing to the integral.
Example 8.7.9. Let $A$ be the region $\{(r, \theta) \mid 0 \le \theta \le \pi/3,\ 0 \le r \le \sqrt{\sin(3\theta)}\}$, and let $B = \Psi(A)$ be the corresponding region in rectangular coordinates, as in Figure 8.5.
The area of $B$ is
$$\int_B 1 = \int_A r\,dr\,d\theta = \int_0^{\pi/3}\int_0^{\sqrt{\sin(3\theta)}} r\,dr\,d\theta = \int_0^{\pi/3} \frac{1}{2}\sin(3\theta)\,d\theta = \frac{1}{3}.$$
Example 8.7.10. The integral $I = \int_{-\infty}^\infty e^{-x^2}\,dx$ plays an important role in probability and statistics, but it is not easy to integrate using traditional one-variable techniques. We can convert it to a two-dimensional integral that is easy to compute using polar coordinates, as follows:
$$I^2 = \left(\int_{-\infty}^\infty e^{-x^2}\,dx\right)\left(\int_{-\infty}^\infty e^{-y^2}\,dy\right) = \int_{-\infty}^\infty\int_{-\infty}^\infty e^{-x^2-y^2}\,dx\,dy = \int_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dA = \int_0^{2\pi}\int_0^\infty e^{-r^2} r\,dr\,d\theta = \int_0^{2\pi} \frac{1}{2}\,d\theta = \pi.$$
Thus $I = \sqrt{\pi}$.
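The polar-coordinates value is easy to confirm numerically. A minimal sketch (an illustration, not part of the text):

```python
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: np.exp(-x ** 2), -np.inf, np.inf)
print(val, np.sqrt(np.pi))  # both are approximately 1.7724539
```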
where $E$ is the region bounded by the ellipse $16x^2 + 4y^2 = 64$. Since circles are generally easier to work with than ellipses, it is natural to make the substitution $\Psi_1(u, v) = (u, 2v)$, so that
$$D\Psi_1 = \begin{bmatrix} 1 & 0\\ 0 & 2\end{bmatrix} \qquad\text{and}\qquad |\det(D\Psi_1)| = 2.$$
Definition 8.7.12. Let $U = (0, 2\pi) \times (0, \pi) \times (0, \infty)$, and define spherical coordinates $S : U \to \mathbb{R}^3$ by $S(\theta, \phi, r) = (r\sin(\phi)\cos(\theta),\ r\sin(\phi)\sin(\theta),\ r\cos(\phi))$.

We have
$$DS = \begin{bmatrix} -r\sin(\phi)\sin(\theta) & r\cos(\phi)\cos(\theta) & \sin(\phi)\cos(\theta)\\ r\sin(\phi)\cos(\theta) & r\cos(\phi)\sin(\theta) & \sin(\phi)\sin(\theta)\\ 0 & -r\sin(\phi) & \cos(\phi)\end{bmatrix}$$
and
$$|\det(DS)| = r^2\sin(\phi).$$
It is straightforward to check that $S$ is $C^1$ and bijective onto $S(U) = V$, and hence has an inverse. Since $\det(DS)$ never vanishes, the inverse function theorem guarantees that the inverse is $C^1$, and thus $S$ is a diffeomorphism. The change of variables formula (8.12) gives
$$\int_{S(A)} f = \int_A (f\circ S)\,r^2\sin(\phi)\,d\theta\,d\phi\,dr. \tag{8.14}$$
Example 8.7.13. Consider the region $D = [0, 2\pi] \times [0, \pi/6] \times [0, R]$, which when mapped by $S$ gives an ice-cream-cone-shaped solid $C \subset \mathbb{R}^3$ as in Figure 8.7. As with polar coordinates, spherical coordinates are not bijective if we include the boundary, but the boundary has measure zero, so it contributes nothing to the integral.
Using (8.14) we see the volume of $C$ is given by
$$\int_C 1 = \int_0^R\int_0^{\pi/6}\int_0^{2\pi} r^2\sin(\phi)\,d\theta\,d\phi\,dr = \frac{(2 - \sqrt{3})\pi R^3}{3}.$$
As with polar and spherical coordinates, one can check that $\Psi$ is bijective to its image, and since $\det(D\Psi)$ does not vanish on its domain, the inverse function theorem guarantees that the inverse is $C^1$, so $\Psi$ is a diffeomorphism to its image. We use hyperspherical coordinates to find the volume of the unit ball in $\mathbb{R}^n$ (see Exercise 8.36).
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with &. are especially important and are likely to be used later
in this book and beyond. Those marked with † are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
8.1. Give an example of an interval $[a, b] \subset \mathbb{R}$ and a function $f : [a, b] \to \mathbb{R}$ that is not in $\mathscr{R}([a, b]; \mathbb{R})$, but whose absolute value $|f|$ is in $\mathscr{R}([a, b]; \mathbb{R})$.
8.2. Prove Proposition 8.1.8.
8.15. For any $b > a > 0$, find (with proof) the value of $\lim_{n\to\infty}\int_a^b \log(x)e^{-nx}\,dx$.
8.16. Prove that if $(f_k)_{k=0}^\infty$ is any sequence of nonnegative functions in $L^1([a, b]; \mathbb{R})$ such that
$$\sum_{k=0}^\infty\int_{[a,b]} f_k < \infty,$$
then
$$\int_{[a,b]}\sum_{k=0}^\infty f_k = \sum_{k=0}^\infty\int_{[a,b]} f_k.$$

$$\lim_{k\to\infty}\int_{[a,b]} f_k = 0.$$
8.21. Prove the reverse Fatou lemma: If $(f_k)_{k=0}^\infty$ is a sequence in $L^1([a, b]; \mathbb{R})$ and if there exists some nonnegative $g \in L^1([a, b]; \mathbb{R})$ with $f_k \le g$ a.e. for all $k \in \mathbb{N}$, then
$$\limsup_{k\to\infty}\int_{[a,b]} f_k \le \int_{[a,b]}\limsup_{k\to\infty} f_k.$$
8.22. Prove that in Fatou's lemma the condition that each $f_k$ be nonnegative can be replaced with the condition that there exists some function $g \in L^1([a, b]; \mathbb{R})$ such that for all $k \in \mathbb{N}$ we have $f_k \ge g$ a.e.
8.23. Let
$$g(t) = \int_0^1 \frac{e^{-t^2(1+y^2)}}{1+y^2}\,dy.$$
$$\int_a^b \lim_{n\to\infty} f_n(x)\,dx = 0 < \frac{\pi}{2} \le \lim_{n\to\infty}\int_a^b f_n(x)\,dx$$
when $a \le 0$.
$$\int_0^\infty e^{-tx}\,dt = \frac{1}{x}. \tag{8.15}$$
(iii) Use the results of Exercise 6.26 to show for any $x > 0$ and for any $n \in \mathbb{N}$ that
$$\int_0^\infty t^n e^{-tx}\,dt = \frac{n!}{x^{n+1}}.$$
(iv) Evaluate at $x = 1$ to conclude that $\int_0^\infty t^n e^{-t}\,dt = n!$.
8.32. Let $D \subset \mathbb{R}^2$ be the square with vertices $(2, 2)$, $(3, 3)$, $(2, 4)$, $(1, 3)$. Compute the integral
$$\int_D \ln(y^2 - x^2)\,dx\,dy.$$
8.35. Let $Q$ be the open first quadrant $Q = \{(x, y) \mid x > 0,\ y > 0\}$, and let $H$ be the upper half plane $H = \{(s, t) \mid t > 0\}$. Define $\Phi : Q \to H$ by $\Phi(x, y) = (x^2 - y^2, xy)$ for $(x, y)$ in $Q$. For a point $(x, y)$ in $Q$, the pair of numbers $(x^2 - y^2, xy) = \Phi(x, y)$ are called hyperbolic coordinates for $(x, y)$.
(i) Show that $\Phi : Q \to H$ is a diffeomorphism.
(ii) Now define $D = \{(x, y) \mid x > 0,\ y > 0,\ 1 \le x^2 - y^2 \le 9,\ 2 \le xy \le 4\}$. Use hyperbolic coordinates to show that
$$\int_D (x^2 + y^2)\,dx\,dy = 8.$$
8.36. Let $R$ denote the box $[0, \pi] \times \cdots \times [0, \pi] \times [0, 2\pi)$.
(i) Use hyperspherical coordinates to express the volume of the unit n-ball in terms of the integral $I = \int_R \sin(\phi_1)\sin^2(\phi_2)\cdots\sin^{n-2}(\phi_{n-2})$.
(ii) Use hyperspherical coordinates to express the integral $\int_{\mathbb{R}^n} e^{-\|x\|^2}\,dx$ in terms of $I$ and the Gamma function $\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt$.
(iii) Use these results, combined with Example 8.7.10, to give a formula for the volume of the unit n-ball.
(iv) Combine this with Exercise 8.28 to give a formula for the volume of any n-ball of radius $r$. (Alternatively, you could slightly generalize your computation in (i) to achieve the same result.)
Notes
Much of our treatment of integration in this chapter and the next is inspired by Soo Bong Chae's beautiful book [Cha95], which develops Lebesgue integration using an approach due to Riesz. Another source for the Riesz approach is [Soh14]. The Riesz approach has some significant similarities to our approach, but at heart they are still very different ways of looking at integration. Sources on the Daniell integral include [BM14], [AB66], and [Roy63] (the first edition of [RF10]). The Bochner integral is described in [Mik14]. Sources for a more standard approach to Lebesgue integration and measure theory include [Bre08, Jon93, RF10, Rud87].
Exercise 8.30 comes from Keith Conrad's "blurb" on differentiation under the integral sign [Con16]. Exercise 8.31 comes from Luc Rey-Bellet [Rey06]. Exercise 8.35 comes from Fitzpatrick [Fit06].
*Integration II
In this chapter we give the details and remaining proofs of the development of the
Daniell-Lebesgue integral, as outlined in the previous chapter.
Proposition 9.1.1. If $V$ is a vector space and $\|\cdot\|$ is a seminorm on $V$, then the set $K = \{v \in V \mid \|v\| = 0\}$ forms a vector subspace of $V$. Moreover, $\|\cdot\| : V/K \to \mathbb{R}$, defined by the rule $\|v + K\| = \|v\|$, forms a norm on the quotient space $V/K$.
Theorem 9.1.2. For any normed linear space $(X, \|\cdot\|)$, there exists a Banach space $(\widehat{X}, \|\cdot\|_{\widehat{X}})$ and an injective linear map $\phi : X \to \widehat{X}$ such that for every $x \in X$ we have $\|\phi(x)\|_{\widehat{X}} = \|x\|$ (we call such a map an isometric embedding) and such that $\phi(X)$ is dense in $\widehat{X}$. Moreover, this embedding is unique, in the sense that if $\widetilde{X}$ is another Banach space with an isometric embedding $\psi : X \to \widetilde{X}$ such that $\psi(X)$ is dense in $\widetilde{X}$, then there exists a unique isomorphism of Banach spaces $g : \widehat{X} \to \widetilde{X}$ such that $g \circ \phi = \psi$.
Remark 9.1.3. This theorem also holds for general metric spaces; that is, every metric space can be embedded as a dense subset in a complete metric space. But we need the additional linear structure in all of our applications, and our proofs are simplified by assuming $X$ is a normed vector space.

Proof. Let $X'$ be the set of all Cauchy sequences in $X$. The space $X$ maps injectively into $X'$ by sending $x \in X$ to the constant sequence $(x)_{k=0}^\infty$. For any $\alpha, \beta \in \mathbb{F}$ and any two Cauchy sequences $(x_k)_{k=0}^\infty$ and $(y_k)_{k=0}^\infty$, let $\alpha(x_k)_{k=0}^\infty + \beta(y_k)_{k=0}^\infty$ be defined to be the sequence $(\alpha x_k + \beta y_k)_{k=0}^\infty$. It is straightforward to check that this is again a Cauchy sequence, that $X'$ is a vector space, and that $X$ is a vector subspace.
For each Cauchy sequence $(x_k)_{k=0}^\infty$, the sequence of norms $(\|x_k\|)_{k=0}^\infty$ is a Cauchy sequence in $\mathbb{R}$, since for any $\varepsilon > 0$ there is an $N$ such that
$$\big|\,\|x_n\| - \|x_m\|\,\big| \le \|x_n - x_m\| < \varepsilon$$
whenever $n, m > N$. Thus, $(\|x_k\|)_{k=0}^\infty$ has a limit. Define $\|(x_k)_{k=0}^\infty\|_{\widehat{X}} = \lim_{k\to\infty}\|x_k\|$. Again, it is straightforward to check that this is a seminorm, but there are many sequences $(x_k)_{k=0}^\infty$ such that $\|(x_k)_{k=0}^\infty\| = 0$ but $(x_k)_{k=0}^\infty \ne (0)_{k=0}^\infty$ (the zero element of $X'$), so it is not a norm; see Exercise 9.2.
Let $K \subset X'$ be the set of Cauchy sequences $(x_k)_{k=0}^\infty$ such that $\|(x_k)_{k=0}^\infty\|_{\widehat{X}} = 0$.
Vista 9.1.4. A similar argument can be used to construct the Banach space $\mathbb{R}$ as the completion of the metric space $\mathbb{Q}$. Beware, however, that this method of completing $\mathbb{Q}$ does not fit into the hypotheses of Theorem 9.1.2, because $\mathbb{Q}$ is not a vector space over $\mathbb{R}$ or $\mathbb{C}$. The proof of the theorem above also uses the fact that $\mathbb{R}$ is complete in order to construct the seminorm on $\widehat{X}$, so to complete $\mathbb{Q}$ in this way needs some additional steps.
Thus, the integral and the $L^1$-norm both commute with limits of $L^1$-Cauchy sequences.

Definition 9.2.1. A set $E \subset [a, b] \subset \mathbb{R}^n$ has measure zero if there is an $L^1$-Cauchy sequence $(s_k)_{k=0}^\infty \subset S([a, b]; \mathbb{R})$ of step functions such that $\lim_{k\to\infty} |s_k(y)| = \infty$ for each $y \in E$.

For the rest of this section, when we say measure zero, we mean it in the sense of Definition 9.2.1, unless otherwise specified. Near the end of this section we prove that the two definitions are equivalent. Unless otherwise specified, we work on a fixed compact interval $[a, b] \subset \mathbb{R}^n$, so all functions are assumed to be defined on $[a, b]$ and all integration is done on $[a, b]$.
For each $t$, the sequence $(s_k(t))_{k=0}^\infty$ is nondecreasing, so the only way it can fail to converge is if $s_k(t) \to \infty$. Let $E$ be the set of points $t$ where $s_k(t)$ diverges. Using $(s_k)_{k=0}^\infty$ itself as the $L^1$-Cauchy sequence in Definition 9.2.1 shows that $E$ has measure zero, and hence $s_k \to f$ a.e.
The case where $(s_k)_{k=0}^\infty$ is monotone decreasing follows from a similar argument; see Exercise 9.7. $\Box$
Proof. For each $n \in \mathbb{N}$ let $F_n$ be the partial sum $F_n = \sum_{k=0}^n s_k$, and let $T_n = \sum_{k=0}^n |s_k|$. Each $T_n$ is a nonnegative step function, and the sequence $(T_n)_{n=0}^\infty$ is monotone increasing. For each $n$ we have
$$\|T_n\|_{L^1} \le \sum_{k=0}^\infty \|s_k\|_{L^1} < \infty.$$
Thus, the monotone increasing sequence $(\|T_n\|_{L^1})_{n=0}^\infty$ converges to some value $M$. Given $\varepsilon > 0$, choose $N > 0$ such that $0 \le (M - \|T_n\|_{L^1}) < \varepsilon$ for all $n > N$. We have
$$\|T_n - T_m\|_{L^1} = \int_{[a,b]} |T_n - T_m| = \int_{[a,b]} (T_n - T_m) = \|T_n\|_{L^1} - \|T_m\|_{L^1} = (M - \|T_m\|_{L^1}) - (M - \|T_n\|_{L^1}) < \varepsilon$$
whenever $N < m < n$. Therefore, $(T_n)_{n=0}^\infty$ is $L^1$-Cauchy. By Lemma 9.2.2 it converges a.e., and thus $(F_n)_{n=0}^\infty$ converges a.e., since pointwise absolute convergence implies pointwise convergence. The desired function $F$ is given by setting $F(t) = \sum_{k=0}^\infty s_k(t)$ when the sum converges and $F(t) = 0$ when it does not converge. $\Box$
Proof. For each $\ell \in \mathbb{N}$ let $k_\ell$ be chosen such that $\|s_n - s_m\|_{L^1} < 2^{-\ell}$ for all $m, n \ge k_\ell$. Let $g_0 = s_{k_0}$ and for each integer $\ell > 0$ let $g_\ell = s_{k_\ell} - s_{k_{\ell-1}}$. This gives $\|g_\ell\|_{L^1} < 2^{1-\ell}$ for all $\ell > 0$. We have
$$\sum_{\ell=0}^\infty \|g_\ell\|_{L^1} \le \|g_0\|_{L^1} + \sum_{\ell=1}^\infty 2^{1-\ell} < \infty.$$
By Lemma 9.2.3 the sequence of partial sums $s_{k_N} = \sum_{\ell=0}^N g_\ell$ converges to a function $f$ almost everywhere.
Let $\phi_m = \sum_{\ell=0}^m g_\ell^+$ and $\psi_m = \sum_{\ell=0}^m g_\ell^-$, so $s_{k_m} = \phi_m - \psi_m$. For each $\ell \in \mathbb{N}$ we have $g_\ell^+ \ge 0$ and $g_\ell^- \ge 0$, so the sequences $(\phi_m)_{m=0}^\infty$ and $(\psi_m)_{m=0}^\infty$ are both monotone increasing. They are also $L^1$-Cauchy because for any $\varepsilon > 0$, and any sufficiently large $n < m$,
$$\|\phi_m - \phi_n\|_{L^1} = \sum_{\ell=n+1}^m \|g_\ell^+\|_{L^1} \le \sum_{\ell=n+1}^m \|g_\ell^+ + g_\ell^-\|_{L^1} = \sum_{\ell=n+1}^m \|g_\ell\|_{L^1} < \sum_{\ell=n+1}^m 2^{1-\ell} < 2^{1-n} < \varepsilon.$$
Proof.
(i)$\Longrightarrow$(ii) Let $(s_k)_{k=0}^\infty$ be an $L^1$-Cauchy sequence of step functions with $s_k(t) \to \infty$ for every $t \in E$. By Proposition 9.2.4 there is a pair of monotone increasing, $L^1$-Cauchy sequences $(\phi_\ell)_{\ell=1}^\infty$ and $(\psi_\ell)_{\ell=1}^\infty$ such that $(\phi_\ell - \psi_\ell)_{\ell=1}^\infty$ is a subsequence of $(s_k)_{k=0}^\infty$. Since $s_k(t) \to \infty$ for every $t \in E$, we must also have $\phi_k(t) \to \infty$ for every $t \in E$.
(ii)$\Longrightarrow$(i) For each $m \in \mathbb{N}$ choose a countable collection $(I_{\ell,m})_{\ell\in\mathbb{N}}$ of intervals covering $E$ with $\sum_{\ell\in\mathbb{N}} \lambda(I_{\ell,m}) < 2^{-m}$, and define
$$\bar{s}_k = \sum_{m\le k}\sum_{\ell\le k} \mathbb{1}_{I_{\ell,m}}.$$
Each $\bar{s}_k$ is a step function, the sequence $(\bar{s}_k)_{k=0}^\infty$ is monotone increasing, and hence (by essentially the same argument as given in the proof of Lemma 9.2.3) the sequence $(\bar{s}_k)_{k=0}^\infty$ is $L^1$-Cauchy. But for any $t \in E$, we have
$$\bar{s}_k(t) = \sum_{m\le k}\sum_{\ell\le k} \mathbb{1}_{I_{\ell,m}}(t) \to \infty. \qquad\Box$$
Corollary 9.2. 7. The boundary of any interval (and hence the set of discontinuous
points of any step function) has measure zero .
Proof. The boundary of any interval in !Rn is contained in a finite union of sets of
the form {a} x [c, d], where [c, d] C JRn- 1 , and where a E JR is a single point (hence
of measure zero). D
Definition 9.3.1. Fix a compact n -interval [a, b] C ]Rn. We say that a function
f : [a, b] ---+ JR is Lebesgue integrable or just integrable on [a, b] if there exists an
L 1 -Cauchy sequence of step functions (sk)k:, 0 such that Sk ---+ f a.e . We denote the
vector space of integrable functions on [a, b] by 2 (for this section only) .
The results of Section 9.2.2 show that for any L 1 -Cauchy sequence (sk)k=O of
step functions, we can construct a function f such that some subsequence converges
almost everywhere to f. We use this to define a map qi : L 1 ([a, b];JR)---+ 2/2o by
sending any L 1-Cauchy sequence (sk)k=Oof step functions to the integrable function
f guaranteed to exist by Proposition 9.2.4. But it is not yet clear that this map is
well defined. To see this, we must show that the equivalence class f +20 is uniquely
determined.
Proposition 9.3.2. If (sk)k=O and (sk)k:, 0 are L 1 -Cauchy sequences of step func -
tions that are equivalent in L 1 ([a, b]; JR), and if (sk)k=O ---+ f a.e. and (sk)k'=o ---+
g a.e., then f = g a.e.
368 Chapter 9. *Integration II
This proposition shows that the function <I>: L 1 ([a,b];JR)---+ .!L'/.!/0 is well
defined, because if (sk)~ 0 is any £ 1 -Cauchy sequence with a subsequence (skt )b 1
converging almost everywhere to f (so that we should have <I>((sk)) = f) , and if
(sk)k=O is an equivalent £ 1-Cauchy sequence with a subsequence (sj,J;;'= 1 con-
verging almost everywhere to g (so that we should have <I>((sk)) = g), then the
subsequences are equivalent in £ 1 ([a, b]; JR), so f = g a.e.
The map <I> is surjective by the definition of.£', and it is straightforward to
check that <I> is also linear. We now show that <I> is injective. To do this, we must
show that if f = g a .e., and if (sk)k=O and (s~)~ 0 are two £ 1 -Cauchy sequences of
step functions such that Sk ---+ f a .e. ands~ ---+ g a.e., then (sk)k=O and (sk)~ 0 are
equivalent in L 1 ([a, b]; JR). To do this we first need two lemmata.
Since (sk)k=O is monotone decreasing, if sm, (t') < /, then sc(t') < 'Y for every
£ 2: M . Therefore, we have 0 :::; sc(t') < / for all t' in LJ{= 1 Ut,.
Putting this all together gives
llscllu = 1[a,b]
sc <BL
k=l
N
Lemma 9.3.4. Let (¢k)k'=o and ('lf!k)k=O be monotone increasing sequences of step
functions such that cPk ---+ f a.e. and 'lf!k ---+ g a.e. If f:::; g a.e., then
lim r
k--t= .f[a,b]
cPk :::; lim r
k--t= .f[a,b]
'lf!k .
Proof. For each m E N the sequence (¢m - 'lf!k)k=O is monotone decreasing, and
lim <Pm - 'lf!k = <Pm - lim 'lf!k = <Pm - g :::; f - g :::; 0 a.e.
k--tCXJ k--tCXJ
Thus, the sequence ((¢m - 'lf!k)+)k=O is nonnegative and monotone decreasing and
must converge to zero almost everywhere.
By Lemma 9.3.3 we have
which gives
r
.f[a,b]
<Pm :::; lim (
k--t=
r
J[a,b]
(<Pm - 'lf!k) + + r
J[a,b]
'lf!k) = lim r
k--t= J[a,b]
'lf!k·
Theorem 9.3.5. Let (sk)k'=o and (sk)k'=o be L1 -Cauchy sequences of step functions
(not necessarily monotone) such that Sk ---+ f a.e. ands~ ---+ g a.e. If f :::; g a.e .,
then
lim r
k--t= J[a,b]
s k < lim r
- k--t= J[a,b]
s~.
370 Chapter 9. *Integration II
The sequences (¢k + f3k)k=O and (O!k + 7/Jk)k=O are monotone increasing, so by
Lemma 9.3.4 we have
lim r
k--+oo J[a,b]
(¢k+f3k):s; lim
k--+oo J[a,b]
r
(o:k+7/Jk) ·
Since integration on step functions is linear, and addition (and subtraction) are
continuous, we have
lim r
k--+oo J[a,b]
(¢k - 7/Jk):::; lim
k--+oo J[a,b]
r
(o:k - f3k )·
lim r
k--+oo J[a,b]
Sk = lim r
k--+oo J[a,b]
(¢k - 7/Jk) :::; lim
k--+oo J[a,b]
r(o:k - f3k) = lim
k--+oo J[a,b]
r s~. D
Proposition 9.3.6. If f g a.e ., and if (sk)'Go and (sk)'G 0 are two L 1 -Cauchy
=
sequences of step functions such that Sk -+ f a.e. ands~ -+ g a.e., then (sk)k=O
and (sk)f=o are equivalent in L 1 ([a, b]; JR) .
Proof. Proposition 9.1.7 shows that the sequences (iskl)'G 0 and (iski)'G 0 are L 1 -
Cauchy, with !ski-+ Iii a.e. and ls~I -+ lgl a.e. By Theorem 9.3.5 we have
Substituting (sk - sk) for sk and 0 for s~ in (9.2) gives the desired result. D
We have shown that cl> is an isomorphism of vector spaces. We may use cl> to
define the L 1 -norm on 2 /2o by llfllv = ll(sk)'Gollv = limk--+oo llskllv whenever
f = <I>((sk)'G0). Similarly, we may define the integral f[a ,b] f = limk--+oo f[a,b] Sk
whenever f = <I>((sk)'G0).
From now on we usually describe elements of L 1 ([a, b]; IR) as integrable func-
tions (or equivalence classes of integrable functions) rather than as equivalence
classes of L 1 -Cauchy sequences of step functions- we have showed that the two
formulations are equivalent, but functions are usually more natural to work with.
9.3 . Lebesgue-Integrable Functions 371
Proposition 8.4.2. For any f, g E L 1 ([a, b]; IR) we have the following:
(i) If f::; g a.e., then fra,b] f::; fra,b] g.
(iii) The functions max(!, g), min(!, g), J+, 1- , and If I are all integrable.
(iv) If h : !Rn -t IR is a measurable function (see Definition 8.3.9 ), and if lhl E
L 1 ([a, b];IR) , then h E L 1 ([a, b];IR).
(v) If llgllL= ::; M < oo, then Jg E L 1 ([a, b]; IR) and llfgllu ::; Mllfllu.
lhl if Sk 2: lhl,
<Pk = mid(-lh l, Sk, lhl) = max(-lhl, min(sk, lhl)) = Sk if - lhl ::; Sk '.S lhl,
{
-lhl if Sk ::; - lhl.
Proposition 9.3.7. Let A c !Rn be a measurable set. For any c E !Rn and any
f E L 1 (A;IR), let Jc be the function on A - c = {t E !Rn I t + c EA} given by
fc(t) = f(t+c). WehavefcE L 1 (A-c;IR) and
lA-c
r fc = r f.
jA
is integrable on X.
The first step in the proof of Fubini's theorem is to check that it holds for step
functions .
r
J[a,b] x [c,d]
s(x,y)dxdy= r
J[a,b]
S(x)dx= r ((
J[a,b] J[c,d]
Sx(y)dy)dx.
each k E N and each x E X, let <I>k(x) = fy ¢k,x(Y) dy, where ¢k,x(Y) = ¢k(x, y).
Because of Fubini's theorem for step functions (Proposition 9.4.1) we have
JXxY
= lim { ¢k(x,y) dxdy
k--too
= { f (x, y) dxdy.
lxxY
Therefore, the sequence Ux
<l>k(x) dx)C:,:=0 is bounded, and by the monotone con-
vergence theorem (Theorem 8.4.5) we have <l>k ---+ <I> a .e. for some <I> E L 1 (X; JR)
with
{ <I>(x) dx = { f(x, y) dx dy.
Jx lxxY
We must now show that cl> = F a.e. and that f x is integrable for almost all
x E X. Let E be the measure-zero subset of X x Y where (¢k(x, y))'k=o fails to
converge. By Lemma 9.4. 2 the set Ex = {y E Y I (x , y) E E} has measure zero
for almost all x E X. For any x E X such that Ex has measure zero and such
that <I>k(x) ---+ <I>(x) we have that <I>k(x) = fy <!>k,x(Y) dy converges, and so by the
monotone convergence theorem fx = limk--too ¢k,x a.e. is integrable on Y , and
Definition 9.5.1. For each k E N let Qk C lRn be the set of points a E lRn whose
coordinates ai are all rational of the form ai = c/2k for c E Z. Let ek be the set of
compact intervals {n -cubes} of the form [a, a+ (2-k, ... , 2-k)], where a E Qk.
Corollary 9 .5.3. Every interval I C lRn, whether open, partially open, or closed,
is measurable. If A, B C lRn are open with I C B, and if f: A -7 B is continuous,
then f - 1 (I) is also measurable.
We are now ready to start the proof of the change of variables formula
(Theorem 8.7.5), which we restate here for the reader's convenience.
as required. D
Proof. Here we prove the first case. The second is similar. We proceed by induction
on n. The base case of n = 1 follows from Lemma 9.5.4.
Given
X = [a, b] = [a1, b1] x · · · x [ai, bi] x · · · x [an, bnJ,
=
lsr (Jr'lit(W) f(t, z) dz) dt, (9.5)
where the last equality follows from the induction hypothesis. Since X is compact,
there exists a compact interval Z C R_n-l such that W(X) CS x Z. Thus, we have
=Is (l Ilw(x)(t,z)f(t,z)dz) dt
= [ f. (9.6)
Lemma 9.5. 7. Assume that U, V, W C Rn are open and W : U ---+ V and <I> : V ---+
W are diffeomorphisms . Let X C U be measurable, with Y = W(X) and Z = <I>(Y).
If the change of variables formula with respect to W holds for all g E L 1 (Y; R) and
the change of variables formula with -respect to <I> holds for some f E L 1 (Z; R), then
the formula holds with respect to <I> o 1¥ for f.
9.5. Proof of the Change of Variables Theorem 377
Proof. First we show that there is an open neighborhood W of x where the diffeo-
morphism W is a composition of diffeomorphisms of the form W(t) = W(t 1, ... , tn) =
(ti, '1'2(t) , ... , Wn(t)) for some i, or of the form w(t) = ('11 1, ... , Wn_ 1(t), ti) · For
each i let C1,i be the (1, i) cofactor of DW(x), as given in Definition 2.9 .11. Using
the cofactor expansion in the first row (Theorem 2.9.16), we have
n OW
0 =!= det(Dw) = L ox 1 (x)C1,j,
j=l J
so there must be some i such that C 1,i =/= 0. Choose one such i and define'¢ : U---+ !Rn
by '¢(t) = (ti, '112, ... , Wn)· Note that Idet(D'lj;(x))I = IC1,il =/= 0, so by the inverse
function theorem (Theorem 7.4.8), there exists an open neighborhood W' c U of x
such that'¢ has a C 1 inverse on '¢(W') c V. In particular, '¢ is a diffeomorphism
on W' of the required form.
Let <I> : '¢(W') ---+ V be the diffeomorphism <I>= W o '¢- 1, so that W = <I> o '¢.
Letting z = (z1, .. . , Zn)= '¢(t1, . .. , tn) =(ti , W2(t) , . .. , Wn(t)) , we have
Now we can prove the full version of the change of variables formula.
Proof of Theorem 8. 7.5. It suffices to prove the theorem for the case of X =U
and f E L 1(V;IR). For each£ E z+ let
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth .
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *) . We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with & are especially important and are likely to be used later
in this book and beyond. Those marked with t are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
e
9.5. For each E N let Qe C Rn be the lattice of points with coordinates all
lying in 2-ez. Show that every element of L 1 ([a, b]; X) is equivalent to an
L 1 -Cauchy sequence (sk)k=O of step functions where all the corners of all the
intervals involved in the step function are points of Qe for some€; that is, for
each k E N there exists an ek E N such that each interval Rr appearing in
the sum sk = LIE.9xrliR1 is of the form Rr = [ar, hr] with ar, br E Qek·
9.6. Define a generalized step function to be a function f of the form f = "L::i ciliR,
such that each Ri is any interval (possibly all open, all closed, all partially
open, or any combination of these).
Prove that in Definition 9.2.1 we may use generalized step functions with all
open intervals. That is, prove that a set E has measure zero in the sense
of Definition 9.2.l if and only if there is an L 1 -Cauchy sequence (fk)k=O of
generalized step functions of the form fk = "L::\ Ck ,i]. Rk,; such that each
Rk,i is an open interval and lfk(t)I --+ oo for every t EE. What changes if
we use generalized step functions with all closed intervals?
9.7. Prove Lemma 9.2.2 for a monotone decreasing sequence of functions .
9.8. Prove Proposition 9.2.6.
9.9. Prove that if I c Rn is an interval and f E !ft(!; R), then the graph r f =
{(x, f(x)) CI x R} C Rn+l has measure zero.
9.10. Find a sequence (fk)'f:: 0 of functions in f!/t([O, l]; R) such that fk--+ 0 a.e. but
1
limk--+oo f 0 fk(t) dt-/:- 0.
9.11. Let f be continuous on [a, b] C R Describe in detail how to construct a
monotone increasing sequence (sk)k=O of step functions such that Sk--+ f a.e.
and limk--+oo Jia,b] Sk < oo.
9.12. Prove Proposition 8.4.2(iii).
9.13. Give an example of a function on R that is measurable but is not integrable.
9.14. Let f and g be measurable. Prove the following:
9.19. Suppose that X C JR.n and Y C ]Rm and both f: X ---+ JR and g: Y ---+ JR
are integrable. Prove that the function h: Xx Y---+ R, given by h(x,y) =
f(x)g(y), is integrable and
r h-rlx 1 }yr 9
lxxY
9.20. Prove Proposition 8.6.5 .
Notes
As mentioned in the notes at the end of the previous chapter, much of our devel-
opment of integration is inspired by [Cha95]. Other references are given at the end
of the previous chapter. The proof of Proposition 9.2.5 is modeled after [Cha95,
Sect. II.2 .4], and the proof of Proposition 9.3.3 is from [Cha95, Sect. II.2.3] . The
proof of Proposition 9.3.4 is from [Cha95, Sect. II.3.3]. The proof of the change of
variables formula is based on [Mun91, Sect. 19] and [Dri04, Sect. 20.2].
Calculus on Manifolds
381
382 Chapter 10. Calculus on Manifolds
Remark 10.1.2. With some work it is possible to show the results of this chapter
also hold for curves O" : I ---+ X on a closed interval I that are not differentiable at
the endpoints of I, but that are continuous on I and 0 1 on the interior of I.
Unexample 10.1.3.
We define the tangent to the curve O" at time t 1 to be the vector 0" 1 (ti). If a
curve in !Rn is thought of as the trajectory of a particle, then 0" 1 (ti) is its velocity
at t ime ti . The line in X that is tangent to the curve at time t1 is defined by the
parametrization L(t) = fo'(ti) + O"(ti).
Definition 10.1.4. Two smooth parametrized curves 0" 1 : I ---+ X and 0"2 : J ---+ X
are equivalent if there exists a bijective 0 1 map¢: I---+ J, such that ¢'(t) > 0 for
all t E I and 0"2 o ¢ = 0"1. In this case, we say that 0"2 is a reparametrization of 0" 1 .
Each equivalence class of parametrizations is called a smooth, oriented curve .
If we replace the condition that ¢' (t) > 0 by the condition that ¢' (t) -=f. 0 for all
t, we get a larger equivalence class that includes orientation-reversing reparametriza-
tions. Each of these larger equivalence classes is called a smooth, unoriented curve
or just a smooth curve.
Remark 10.1.5. The tangent vector O"'(ti) to the curve O" at the point O"(t 1 ) is not
independent of parametrization. A reparametrization ¢ : I ---+ J scales the tangent
vector by ¢'. However, we can define the unit tangent T to be
and the unit tangent at each point depends only on the orientation of the curve.
See Figure 10.l.
Definition 10.1.6. A finite collection of smooth parametrized curves 0" 1 : [a 1 , b1] ---+
X, ... , O"k : [ak, bk] ---+ X is called a piecewise-smooth parametrized curve if we have
O"i(bi) = O"i+1(aH1) for each i E {1,. .. ,k - 1}. Such a curve is often denoted
0"1 + ... + O"k.
10.1. Curves and Arclength 383
CJ(t) + T(t)
CJ(a)
/ T (t)
Figure 10.1. A smooth parametrized curve CJ with starting point CJ(a) and
ending point CJ( b), as given in Definition 10.1.1. For each t E (a, b), the vector T( t)
is the unit tangent to the curve at time t . See Remark 10.1 .5 for the definition
of T(t).
10.1.2 Arclength
A rclength measures the length of an oriented curve in a way that is independent
of parametrization. It is essential for understanding the geometry of a curve. It
should seem intuitive that arclength would be the integral of speed, where speed is
the norm 11 CJ 1 ( u) 11 of the velocity.
2
la 2
7r llCl' (u) II du = la 7r r ll( - sin( u) , cos( u)) I du = 27rr,
which agrees with the classical definition of arclength of the circle.
Example 10.1.11. The graph ofa function f E 0 1 ([a, b]; JR) defines a smooth
parametrized curve Cl : [a, b] -+ JR~ 2 by Cl(t) = (t, f(t)). We have Cl 1 (t) =
(1, f'(t)) so that
Proof. Since ¢'(t) i= 0 for all t E [c, d] the intermediate value theorem guarantees
that either ¢'(t) > 0 for all tor ¢'(t) < 0 for all t. We prove the case of ¢'(t) < 0.
The case of ¢' (t) > 0 is similar (and slightly easier) .
Since ¢ is bijective, it must map some point to to b. If to > c, then the mean
value theorem guarantees that for some~ E [c, to] we have
Definition 10.1.13. Given Cl: [a, b] --7 X, define the arclength functions: [a, b] -+
E [a, b] to the length of Cl restricted to the subinterval [a, t]
JR by assigning t
s(t) = len(Cll[a,tJ)·
If a= 0 and llCl' (u) II = 1 for all u in [O, b], then s(t) = t, and we say that Cl is
parametrized by arclength.
So a( T) is parametrized by arclength.
Proposition 10.1.17. For any smooth curve with parametrization er the paramet-
rization 'Y = er o p of the curve is, in fact, a parametrization by arclength; that is,
Proof. This follows immediately from the fact that s'(t) = ll er'(t) ll combined with
the chain rule. D
1 C
f ds = ;·b f(<T(t))lla'(t)llx dt,
a
Remark 10.2.3. Since every oriented curve has only one parametrization by arc-
length, the line integral depends only on the oriented curve class of C .
where 0 :::; t :::; 7r. The line integral Jc xyz ds can be evaluated as
- lo"' 4tsin(t) cos(t)lla'(t)ll dt = -v'5 lo"' 4tsin(t) cos(t) dt
0
Example 10.2.5.
If C is parametrized by a : [a, b] --+!Rn and the force F : C--+ !Rn is not necessarily
constant, then over a small interval 6.a of the curve containing a( t), the work done is
approximately F(a(t)) · 6.a. Summing these pieces and taking the limit as 6.a--+ 0
yields an integral giving the work done by the force F to move a particle along the
curve C. This motivates the following definition.
Definition 10.2.6. Given a curve C C !Fn with parametrization a : [a, b] --+ C and
a 0 1 vector field F: C--+ wn, if F(a(t)) · a'(t) is integrable on [a,b], then we define
the line integral of the vector field F over C to be
where T is the unit tangent to C . The · in the left-hand integral is a formal symbol
defined by (1 O.1) , while the · in the second and third integrals is the usual dot-
product: x · y = (x, y) = xHy.
If we write a and F in terms of the standard basis as
and F(x) = (F1 (x), ... , Fn(x)), then it is traditional to write dxi = x~(t) dt, and in
this notation we have
Remark 10 .2. 7. Using this new notation, our previous discussion tells us that the
line integral
i F · da
is the work done by a force F: C-+ ]Rn moving a particle along the curve C.
Propos ition 10.2.8. The line integral of a vector field F over a smooth curve
does not depend on the parametrization of the curve. That is, given two equivalent
parametrizations a 1 : [a, b] -+ C and a2 : [c, d] -+ C, we have
87r3
= 27r+ 3 ·
In this case the integral depends only on the value of the potential at the
endpoints-it is independent of the path C and of CJ. This is an important
phenomenon that we revisit several times.
Example 10.3.3.
(1·1·) vor
r, every n E 71
!LJ + t he i"d entity
· map I : min
1£',. -t min
1£',. is
· · d
a parametnze
n-manifold.
(iii) For any U c "!Rm, the graph of any C 1 function f: U -t "JR gives a
parametrized m-manifold in "JR.m+l by a(u) = (u,f(u)) E JF'm+l. (Check
that Da has rank m.)
(iii) a2 o ¢ = a1.
In this case, we say that a2 is a reparametrization of a1. Each equivalence class of
parametrizations is called an oriented m-manifold or, if m is understood, it is just
called an oriented manifold.
390 Chapter 10. Calculus on Manifolds
If we drop the condition det(Dcp) > 0, then we get a larger equivalence class
of manifolds. Since the derivative Dcp is continuous and nonvanishing, it is either
always positive or always negative. ~f det(D¢(u)) < 0 for all u E U1, then we say
that¢ is orientation reversing. Each of these larger equivalence classes is called an
unoriented m-manifold or just an m--manifold .
Example 10.3.5. We may parametrize the upper half of the unit sphere
8 2 c JR 3 by a : U ---+ JR3 , where U = B(O , I ) is the unit disk in the plane
and a(u, v) = (u , v, JI - u 2 - v2 ) .. But we may also use t he parametrization
cr(u, v) = (v, u, JI - u 2 - v2 ). These are equivalent by t he map G : U---+ U
given by G(u, v) = (v, u). Moreover, we have
Thus, ifv1, ... ,vm is any basis for lRm, then the vectors Da(u)vi, ... ,Da(u)vm
form a basis for TpM.
l
We have
01 ,
-v
vl - u 2 -v 2
2
so the standard basis vectors e 1 , e 2 E IR are mapped by Da(l/2, 0) to
[l 0 - l/v'3f and [o 1 O)T, respectively. A similar calculation shows
that for the parametrization () the standard basis vectors are mapped by
Ddl/2,0) to [O 1 O)T and [l 0 -l/J3)T), respectively. And for the
parametrization f3 we have
e~
- sin(¢)
D/3 = cos(B) cos(¢) - sin( sin(¢)] ,
[ sin( B) cos(¢) cos(B) sin(¢)
and thus the standard basis vectors are mapped by D/3(1T /3, 1T /2) to
[-v'3/2 0 l / 2)T and [o -v'3/2 o)T , respectively. It is straightforward
to check that each of these pairs of vectors spans the same subspace TpS 2 of JR 3 .
and so
x = Da(u)v = Df3(w) ((D/3)- 1 (p)Da(u)v) E tl (Df3(w)).
In the special case of a surface in IR 3 we can use the cross product of tangent
vectors to define a normal to the surface. This cross product depends strongly
on the choice of parametrization, but if we rescale the normal to have length one,
then the resulting unit normal depends only on the oriented equivalence class of
the surface.
392 Chapter 10. Calculus on Manifolds
Figure 10.2. The tangent space TpM (red) to the manifold M at the point
p is a vector space, and hence passes through the origin. The traditional picture
that is often drawn instead is the tran.slate p + TpM (blue) . This translate is not a
vector space, but there is an obvious bijection from it to the tangent space given by
vr--+v-p.
Proposition 10.3.11. The unit normal depends only on the orientation of the
surface (that is, the orientation-preserving equivalence class). If the orientation is
reversed, the unit normal changes sign: N H - N.
~
Da(u)e1
a(u)
Da(u)ei = D(3(¢(u))D¢(u)ei.
By Proposition C.3.2, the cross product D(3(¢(u)) D¢(u)e 1 x D (3(¢(u)) D¢(u)e2 is
equal to det(D¢(u))(D(3(¢(u))e1 x D(3(¢(u))e2), so we have
Vista 10.3.12. The previous construction of the unit normal depends on the
fact that we have a two-dimensional tangent space embedded in JR 3 . Although
the cross product is only defined for a pair of vectors in JR 3 , there is a very
beautiful and powerful generalization of this idea to manifolds of arbitrary
dimension in X using the exterior algebra of differential forms. Unfortunately
we cannot treat this topic here, but the interested reader may find a complete
treatment in many books on vector calculus or differential geometry, including
several listed in the notes at the end of this chapter.
Let Q be the unit interval (cube) [O, JI.] c JRk, where JI. = I:~=l ei. This
is the parallelepiped in JRk defined by the standard basis vectors e1, ... ,ek· If
x 1, ... , x k is a collection of any k vectors in JR k, we can construct a linear operator
394 Chapter 10. Calculu s on Manifolds
Figure 10.4. The parallelepiped defined by three vectors x1, x2, and X3.
Each of the planes spanned by any two of these three vectors contains a face of the
parallelepiped, and every edge is a translate of one of these three vectors.
L(x 1 , ... ,xk) : JR.k -+ JR.k by sending each ei to xi. In terms of the standard
basis, the matrix representation of L(x 1 , . .. ,xk) has its ith column equal to the
representation of xi. The operator L(x 1 , ... , xk) maps Q to the parallelepiped
defined by x 1 , . .. , Xk· Therefore, the volume of this parallelepiped is equal to the
modulus of the determinant of L (see Remark 8.7.6.) We can also rewrite this as
where L : JR.k -+ JR.n is the linear transformation mapping each ei E JRk to xi E ]Rn.
Remark 10.4. 7. For a surface Sin JR 3 , you may have seen surface integrals defined
differently- not with J det( Da TDa), but rather as
Is f dS = l f(a) ll D1a x D2all =fl f(a(u, v))l lau x a vll dudv. (10.2)
This is equivalent to Definition 10.4.4, as can be seen by verifying that the area of
the parallelogram defined by two vectors v 1, v 2 in JR 3 is also equal to the length
ll v1 x v2 II of the cross product. For more on properties of cross products, see
Appendix C.3.
Remark 10.4.9. Combining the definition of the unit normal with the relation
(10.2), we can also write the surface integral of a vector field as
Example 10.5.2.
(i) The set JR 1 "' { 0} consists of two connected components, namely, the sets
( - oo, 0) and (0, oo).
(ii) The set ]Rn has only one connected component, namely, itself.
(iii) The set {(x , y) E JR 2 [ xy -=/. O} has four connected components, namely,
the four quadrants of the plane.
components. One of these is bounded and one is unbounded. The curve 'Y is the
topological boundary of each component.
Remark 10.5.4. We do not prove this theorem because it would take us far from
our goals for this text. A careful proof is actually quite difficult.
Definition 10.5.5. We call the bounded component of the complement of the curve
'Y the interior of 'Y and the unbounded component of the complement the exterior of
T If a point zo lies in the interior of "f, we say that zo is enclosed by 'Y or that it
lies within 'Y.
Remark 10.5.6. Because <C is homeomorphic to the plane JR. 2 (see Definition 5.9.1),
the Jordan curve theorem also applies to simple closed curves in C.
Definition 10.5.7. We say that an open set Uc <C or in JR. 2 is simply connected
if for any simple closed curve 'Y that lies inside U, every point in the interior of 'Y
is also contained in U. See Figure 10.5 for an example.
(a) (b)
Figure 10.5. A region (a) in the plane that is simply connected, and one
(b) that is not simply connected; see Definition 10.5. 7.
Example 10.5.9.
Unexample 10.5.10.
(i) The annulus {(x, y) E IR 2 I 0 < ll(x, y)ll < l} is not simply connected.
(ii) The set S = <C "\ B(O , r) is not simply connected, because for any c > 0
the circle {z E <C I llzll = r + c} lies in S, and the origin lies in the
interior of t he circle, but the origin is not in S.
398 Chapter 10. Calculus on Manifolds
Proposition 10.5.11. The interior of any simple closed curve in the plane is
simply connected.
Proof. Let "! be a simple closed curve, and let JR2 "- "! be the disjoint union of
connected components U and B, with U unbounded and B bounded. For any simple
closed curve a- c B we take JR 2 "-a- = v U (3 with v unbounded and (3 bounded.
Since U is connected and misses a-, we must have U C v or U C (3. Since U is
unbounded, it cannot lie in (3, so we have UC v, and hence (3 C vc C uc =BU ry.
But"! n (3 = 0 so (3 c B . D
Definition 10.5.12. Let "! : [a, b] ---+ JR 2 be a simple closed curve with interior
8 c JR 2 . Writing"!= (x(t),y(t)), we define the left-pointing normal vector n(t) at
t to be n(t) = (-y'(t), x'(t)) .
We say that"! is positively oriented if for any t E [a, b] there is a 6 > 0 such
that 1(t) + hn(t) E 8 whenever 0 < h < 6.
Remark 10.5.13. Said informally, 'Y has positive orientation if 8 always lies to the
left of the tangent vector 'Y' (t), or, equivalently, if"! is traversed in a counterclockwise
direction. See Figure 10.6 for a depiction of this.
y y
f g
b f----~---~~---
f
a f----~~--__. _ _ __
~+---~------~--- x - 1 - - - - -- - - - - - ---+ X
a b
h_ 8
.n
8Q
x
- a
8P =
Y
1
'Y
(P,Q) · d1 = 1
'Y
Pdx+Qdy .
Proof. It suffices to prove the theorem in the case that n is simple. We show that
In~~ = I'Y Q dy in the case that n is x-simple. The proof that - In~~ = I'Y P dx
when n is y-simple is essentially the same.
If n is x-simple, we can write TI = {(x, y) I x E [a, b], f(x) ::::; y ::::; g(x)}.
We may take / to be the sum of four curves 1 1 , 1 2, /3, /4, traversed consecutively,
where 11(t),13(t) : [a,b] ---+ JR. 2 are given by 11(t) = (t,f(t)) (the graph off,
traversed from left to right) and 1 3 (t) = (b +a - t ,g(b +a - t)) (the graph of
g traversed from right to left), respectively, and 1 2 , "(4 : [O, l] ---+ JR. 2 are given by
400 Chapter 10. Calculus on Manifolds
a b
Figure 10.9. The x -simple region D used in the proof of Green's theorem
(Th eorem 10.5.15). Here /'i is the graph off, traversed from left to right, while ')'3
is the graph of g, traversed from right to left. The path /'2 is the right-hand vertical
line, traversed from bottom to top, and ')'4 is the left-hand vertical line, traversed
from top to bottom.
= (b, (1 - t)f(b) + tg(b)) (the right vertical line, traversed from bottom to
')'2 (t)
top) and 1'4 (t) = (a, (1 - t)g(a) + tf(a)) (the left vertical line, traversed from top
to bottom), respectively. See Figure 10.9 for a depiction of D and the four curves
that make up ')'.
By Fubini's theorem, the general Leibniz integral rule (8.10), and the funda-
mental theorem of calculus, we have
k lb
IT OX
oQ
- =
a
!g(x)
f(x)
oQ
- dydx
OX
g(b) !g(a)
= Q(b, y) dy - Q(a, y) dy
! f(b) f(a)
= i2 Q dy + i4 Q dy + i, Q dy + i3 Q dy
= i Qdy. D
rx
laD
2
dx + 2xy dy
we may apply Green's theorem with P(x , y) = x2 and Q(x , y) = 2xy:
laD x 2
dx + 2xydy = .fl~~ - ~:=.fl 2ydxdy
Remark 10.5.17. Although the statement of the theorem only mentions simple
closed curves (and hence simply connected regions) , we can easily extend it to more
general cases. For example, if n is the region shown in Figure 10.10, we can cut
n into two pieces, n1 u n2, each of which is bounded by a simple closed piecewise-
smooth curve.
Since each of the new cuts is traversed twice, once in each direction, the
contribution of each cut to the line integral is zero. Also each cut has measure zero
in the plane, so it also contributes zero to the integral ffo; ~~ - ~~, so we have
re
liri
oQ _ oP
ox oy
= t Jiriire
i= l
oQ _ oP
ox oy
=L
2
i=l
1'Yi
(P, Q) . d1
a
Figure 10.10. Two cross cuts 0' 1 and 0'2 subdivide the annulus n into two
simply connected regions, n1 and n2. The boundary of f21 is /1 - 0'2 + T1 - 0'1
and the boundary of n2 is /2 + 0'1 + T2 + 0'2. When integrating along the boundary
of both 0 1 and f22 , the contributions from ±0'1 and ±0'2 cancel, giving the same
result as integrating along the boundary of n, that is, integrating over / + T. See
Remark 10. 5.17 for details and Example 10. 5.18 for a specific case.
402 Chapter 10. Calculu s on Manifolds
1 TUr
y 3 dx -x3 dy
cut the annulus along the x-axis and add new paths CT1 : [1, 2] -+ JR 2 and
CT 2 : [1,2]-+ JR2, given by CT 1(t) = (t,O) and CT2(t) = (-2+t,0).
Green's theorem applies to each half of the annulus to give
1
T+r
3
y 3 dx - x dy = (1T1 +rl - <Tt - <T2
3
y dx - x dy)
3
+ (1 r2+12+<r1U+<r2
3 3
y dx - x dy)
= r
ln1
-3(::r; 2 + y 2) dx dy + r
ln2
-3(x 2 + y2) dx dy
=In -3(x + 2
y
2
) dxdy
{21' /,2
= lo (-3r )r
2
dr dB = -457r /2.
1
Remark 10.5.19. Green's theorem,. when combined with the chain rule, can be
applied to surfaces in JR 3 to give a special case of Stokes' theorem. Assume 'Y is a
simple closed curve in the plane with interior n, and a : u -+ JR3 is a C 1 surface
n
with c u. Let s = a(n) with boundary c (parametrized by a("!) = CT).
For F: S-+ JR 3 with F = (P, Q', R), the curl of Fis
curl(F) = V x F= (aj~
ay
_aQ, aP _ aR, aQ _ aP).
az az ax ax ay (10.3)
The curl operator describes the infinitesimal rotation of F at each point. Stokes'
theorem states that
l F · dCT = Is curl(F) · dS.
Vista 10.5.20. Both Stokes' and Green's theorems are special cases of a
great generalization of the fundamental theorem of calculus. Roughly speak-
ing, when integrating over a parametrized manifold M , there is a natural
way to differentiate integrands called exterior differentiation that turns a k-
dimensional integrand 'T} into a k + 1-dimensional integrand d'T].
Exercises 403
r
laM
ry = r dry ,
jM
(10.4)
r
} B[a,b]
f = f(b) - f(a) =lb J'(t)dt =
a
r
J[a ,b]
df.
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with & are especially important and are likely to be used later
in this book and beyond. Those marked with tare harder than average, but should
still be done .
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
10.1. Let H be the curve with parametrization 1J : [O, 27r] ---t JR3 given by !J(t) =
(r cos( t), r sin( t), ct) for some positive constants r, c E R Find the arclength
of H.
10.2. Prove Proposition 10.1.15.
404 Chapter 10. Calculus on Manifolds
10.3. Consider a wheel of radius 1 roUing along the x-axis in JR 2 without slipping.
Fix a point on the outside of the wheel, and let C be the curve swept out by
the point.
(i) Find a parametrization a- : JR---+ JR 2 of this curve.
(ii) Find the distance traveled by the point when the wheel makes one full
rotation.
10.4. Let X = M 2 (JR) be the Banach space of 2 x 2 real matrices with the 1-norm,
and let C be the curve with parametrization a-: [O, 1] ---+ C, given by
(iv) Find an open subset U of JR 2 such that the point p = (1, 3, 7) E S lies
in a(U) and such that a restricted to U is a parametrized 2-manifold.
(v) Give a basis for the tangent space TpS of Sat p.
(vi) Give the unit normal N of Sat p with the parametrization a.
10.11. Let¢: JR 2 -+ JR 2 be given by (t, u) H (u, t). Let S, p, U, and a be as in the
previous problem.
(i) Find a basis of TpS using the parametrization a o ¢. Prove that the
span of this basis is the same as the span of the basis computed with
the parametrization a.
(ii) Compute the unit normal at p for the parametrization ao¢. Verify that
this is the same as the unit normal computed for a, except that the sign
has changed.
10.12. For each of the parametrizations a, (3, and <J of Examples 10.3.5 and 10.3.7,
compute the unit normal at the point (1/2, 0, v'3/2). Verify that it is the
same for a and (3, but has the opposite sign for <J.
10.13. Describe the tangent space TpM at each point of each of the following
manifolds:
(i) A vector subspace of ]Rn .
(ii) The graph of a smooth function f : JRk -+ R
(iii) The cylinder parametrized by u,v H (u,cos(v) ,sin(v)).
10.20. Evaluate the integral f 0 (x-y 3 ) dx+x 3 dy, where C is the unit circle x 2 +y 2 =
1 traversed once counterclockwise.
10.21. Use Green's theorem to give another proof of the result in Example 10.2.10
that if u : JR 2 -+JR is 0 2 and F = Du, then for any simple closed curve C in
the plane satisfying the hypothesis of Green's theorem and parametrized by
2
<J : [a, b] -+ JR we have
l F · d<J = 0.
406 Chapter 10. Calculus on Manifolds
10.22. Let D = {z EC : lzl :::; 1} be the unit disk, and let C be the boundary circle
{ (x, y) I x 2 + y 2 = 1}, traversed once counterclockwise. Let
{ 8Q
lv ax
8P
8y
and i Pdx+Qdy
l \7 · F = i F · dn.
10.24.* Using the same assumptions as the previous exercise, let \7 x F = curl(F)
be as in (10.3), and let e 3 = (0, 0, 1) be the third standard basis vector. Prove
that
l (\7 x F) · e3 = i F · Tds .
Notes
The interested reader can find proofs of t he Jordan curve theorem in [BroOO, Hal07a,
Hal07b, Mae84) . For more on generalizations of the cross product, the exterior
algebra, differential forms, and the generalized fundamental theorem of calculus,
including its relation to Green's and Stokes' theorems, we recommend the books
[HH99) and [Spi65).
Complex Analysis
Between two truths of the real domain, the easiest and shortest path quite often
passes through the complex domain.
-Paul Painleve
407
408 Chapter 11. Complex Analysis
total real derivative off is represented by the 2 x 2 Jacobian matrix, provided the
partial derivatives exist and are continuous.
But if we consider the domain C as a one-dimensional complex vector space,
then the total derivative at each point is a bounded C-linear transformation from
C to X. In this case the total complex derivative at a point z0 is given by
Definition 11.1.1. Let f: U-+ X, where U C C is open. We say that the function
f is holomorphic on U if it is differentiable {see Definition 6. 3.1) at every point of
U, considering U as an open subset of the one-dimensional complex vector space C;
that is, f is holomorphic on U if the limit
exists for every zo E U, where the limits are taken with z and h in C (not just in
IR). If U is not specified, then saying f is holomorphic means that f is holomorphic
on all of C. If f is holomorphic on all of C, it is sometimes said to be entire.
. f(zo
f '( zo ) = 1im + h) - f(zo)
h--tO
h
. (zo + reie)2 - z20
= hm .
r--tO reie
. 2zoreie + (reie)2
= r--tO
hm .
re'e
= lim 2zo + reie = 2zo.
r--tO
The reason that real differentiability does not imply complex differentiability is
that in the complex case the linear transformation D f (zo) = f' (z0 ) is given by
complex scalar multiplication w H wf'(z0), whereas most real linear transforma-
tions from JR 2 to X are not given by complex scalar multiplication. Of course
any C-linear map is JR-linear, but the converse is certainly not true , as the next
example shows.
Df(x,y) = [~ ~1 ] .
For this function to be holomorphic, the limit (11.1) must exist regardless of
the direction in which z approaches zo. Suppose zo = 0, and we let h = t
where t E R If the derivative f'(zo) = Iimh-+o(f(zo + h) - f(zo))/h exists,
then it must be equal to
On the other hand , if h =it with t E JR, then whenever the derivative f'(zo) =
limh_,o(f(zo + h) - f(zo)) / h exists, it must be equal to
X + iy H xb1 + yb2 .
Thus, the JR-linear map defined by b 1 , b2 is C-linear if and only if b2 = ib1.
In terms of derivatives of a function f: C ---+ X, this means that the total
real derivative [of / ax of jay] can only be C-linear, and hence can only define a
complex derivative f'(z) = of /ax + i8f jay, if it satisfies
(11.2)
410 Chapter 11. Complex Analysis
This important equation is called the Cauchy- Riemann equation. In the special
case that X = C, we can write f(:x, y) = u(x, y) + iv(x, y), so that af /ax
au/ax+ i8v/ax and aj /ay = au/ay + iav/ay. Expanding (11.2) gives
au av au av
and - - - - (11.3)
ax ay ay - ax'
which, together, are also often called the Cauchy- Riemann equations. These equa-
tions are an important way to determine whether a function is holomorphic.
Proof. Assume that f'(zo) exists for zo E U. Let zo = xo + iyo and z = x + iyo
where (xo, yo) E R. 2 and x ER As x ---+ xo we have
Since f' (zo) exists, these two real limits must exist and be equal, which gives the
Cauchy-Riemann equation (11.2) . If X = C, we can write f(x, y) = u(x, y) +
iv(x, y), expand (11.2), and match the real parts to real parts, and imaginary parts
to imaginary parts to get ( 11. 3) .
Conversely, if the real partials of f exist and are continuous, then the total
real derivative exists and is given by Df(x,y) = [¥x%fJ.
By Lemma 6.4.1 (the
alternative characterization of the Frechet derivative) applied to the function f on
U c R. 2 at any z 0 = x 0 + iyo E U, for every E > 0 there exists a 6 > 0 such that if
(a, b) E R. 2satisfies ll(a, b)ll2 < 6, then we have
Proof. This follows immediately from the previous theorem and the fact that
continuous partials imply differentiability (see Theorem 6.2.14). D
au yz _ x2 av x2 - y2
but
ax (x2 + y2)2 ay (x2 + y2)2 ·
These are only equal if x 2 = y2. Similarly
au -2xy av 2xy
and
ay (x2 + y2)2 ax (x2 + y2)2 ·
These are equal only if xy = 0. The only point where x 2 = y 2 and xy = 0 is
x = y = 0, so the origin is the only point of C where the Cauchy-Riemann
equations could hold; but f is not even defined at the origin. Since the Cauchy-
Riemann equations do not hold on any open set, f is not holomorphic on any
open set .
f'g - Jg'
g2
Proof. For any z 1, z2 EU, let g: [O, l ] -+ Ube a smooth path satisfying g(O) = z1
and g(l) = z 2. From the fundamental theorem of calculus (6.16), we have
1
f(z2) - f(z1) = f(g(l)) - f(g(O)) = fo J'(g(t))g'(t) dt = 0.
Thus, we have f(z1) = f(z2). Since the choice of z 1 and z2 is arbitrary, the function
f is constant. D
11.2. Properties and Examples 413
for every n E N, then for any positive r < R the series :Z:::%°=o ak(z - zo)k and the
series :Z:::%°=o kak(z - z 0 )k- l both converge uniformly and absolutely on the closed
ball B(zo, r) c <C.
Proof. Let D denot e the closed ball B (z 0 ,r) and set p = r / R. For every z E D
we have
Thus, on D we have
00 00
Thus, on D we have
The series M L:%°=opk and the series M / R L:%°=okpk- l both converge absolutely
by the ratio test because p < l. Hence :Z:::%°=oak(z - zo)k and :Z:::kENkak(z - zo)k- l
414 Chapter 11. Complex Analysis
both converge absolutely on every B(zo, r) c B(zo, R), and therefore they both
converge uniformly on B(zo, R). D
Corollary 11.2.6. If a series I:r=o ak(z - zo)k does not converge at some point
z1, then the series does not converge at any point z E C with iz - zo I > lz1 - zo I-
Definition 11.2. 7. Given a power series I:r=o ak ( z - zo )k, the largest real number
R such that the series converges uniformly on compact subsets in B(zo, R) is called
the radius of convergence of the series. If the series converges for every R > 0,
then we say the radius of convergence is oo .
Theorem 11.2.8. Given any power series f(z) = I:r=o ak(z - z 0 )k that is uni-
formly convergent on compact subsets in an open ball B(zo, R), the function f is
holomorphic on B(z0 , R). The power series g(z) = I:r=l kak(z - zo)k- l also con-
verges uniformly and absolutely on compact subsets in B(zo, R), and g(z) is the
derivative of f(z) on B(zo, R).
Remark 11.2.10. Theorem 11.2.8 says that if f(z) is analytic on U, then f(z)
is holomorphic on U. This gives a large collection of holomorphic functions to work
with. This theorem also guarantees that the derivative of any complex analytic func-
t ion is also analytic, and hence holomorphic. Thus, every complex analytic function
is infinitely differentiable.
Example 11.2.11.
. oo (_1) n z2n+ 1
sm(z) = ~ (2n + 1)!
00 lzl2n+1
This is absolutely convergent everywhere because the sum l:n=O (2n+l ) !
is convergent everywhere (again by the ratio test). This shows that
sin(z) is holomorphic on all of C.
00 ( l )n Z2n
(iii) Define the function cos(z) by the power series cos(z) = l:n=O - (2n)! .
Again, this is absolutely convergent everywhere, and hence cos(z) is
holomorphic on all of C.
The three series of the previous example are related by Euler 's formula.
Proof.
OU
-
av ex cosy and -
OU av
= - - = - e smy.
x ·
ax = -oy = oy ax
Thus, the Cauchy-Riemann equations hold.
oo (A )n
f(z) = ezA = 'E -~- .
n= 0
n.
An argument similar to that of Example 11.2.ll(i) shows that the series con-
verges absolutely on the entire plane, and hence f (z) is holomorphic on all of
C. We can use Theorem 11.2.8 to find f'(z): set zo = 0, and ak = 1~. This
gives
"fj(t) = Xj(t) + iyj(t) for each j E {1, ... , n}. The contour integral fr f(z) dz off
over r = 2=7= "(j
1 is the sum of the line integrals
l f(z) dz = t, ,l 1
f(z) dz= t, 1: 1
f('Yj(t))'Yj(t) dt (11.5)
= t 1b
j= l a;
1
f('Yj(t)) (dx + idy) = t jb
j= l a1
1
f('Yj(t))(xj(t) + iyj(t)) dt.
Remark 11.3.3. A contour integral can be computed in two equivalent ways: ei-
ther as f--r f(z) dz, which is a line integral in <C, or as f--r J('Y(t)) (dx + idy) , which is
a line integral in JR 2 •
Remark 11.3.4. A contour integral does not usually represent an area or a vol-
ume, but it often represents a physical quantity like work or energy. Even when
the contour integral is not obviously being used to model a physical or geometric
quantity, it provides an important tool for working with holomorphic functions. We
use contour integrals repeatedly throughout this and the following chapters. One
of the important themes of Chapter 12 is that contour integrals are very useful for
understanding and manipulating linear operators.
Lemma 11.3.5. Let 'Y : [O, 27T] --t <C be the circle centered at z0 of radius r, given
by 'Y( B) = zo + rei 8 . For any n E Z we have
j(
'Y
Z - Zo )nd Z = {27Ti,
0
n=
otherwise.
- 1,
.l f(z)dz =
2 2
1 -r f('Y(B))'Y'(B)dB = 1 -r (zo + reie - zo)n (irei 8 ) dB
Since /j(bj) = /J+ 1(aj+1) , the sum (11.5) collapses to give F(rn(bn)) - F(r1(a1)) .
Since zo = 11(ai) and z1 = /n(bn), we have the desired result. D
Remark 11.3.8. The theorem above implies that the value of the contour integral
fr F' (z) dz depends only on the endpoints of r and is independent of the path itself,
provided r lies entirely in u.
fr .F'(z)dz = 0.
Ir f(z)dz =0 (11.6)
Remark 11.3.13. Let f: U---+ X. If we can write f = u +iv, where u and v are
C 1 functions from U c IR 2 to X (taken as a Banach space over IR) , then since U is
simply connected, we could use Green's theorem as follows. If R is the region that
has r as its boundary, then
Remark 11.3.15. The Cauchy- Goursat theorem implies that contour integrals of
any holomorphic function are path independent on a simply connected domain.
That is to say, if 1 1 and 12 are two contours in a simply connected domain U with
the same starting and ending points, then the contour 1 1 - 1 2 is a closed contour
in U, so
0 = r
ln-~
f(z) dz== 1
n
f(z) dz - 1~
f(z) dz.
Hence
r f(z) dz= 1 f(z) dz.
J
'Yl )'2
J
Example 11.3.16. Lemma 11.3.5 shows that 'YO (z - zo)- 1 dz = 27ri, where
'Yo is a circle around z0 , traversed once in a counterclockwise direction. Con-
sider now a different path 1 1 that also wraps once around the point z0 , as in
Figure 11.1. Adding a segment er (a "cut") from the starting (and ending) point
of 1 1 to the starting (and ending) point of lo, the contour r = 1 1 +er - 1 0 - er
encloses a simply connected region; hence the integral fr(z-zo)- 1 dz vanishes,
J J
and we have 'YO (z - zo)- 1 dz= 'Y l (z - zo)- 1 dz, since the contribution from
er cancels.
/1
l
In this case we have
f(z)dz = 0. (11.7)
Note that the conditions of the lemma hold for every closed contour consisting
of a finite sum r =/'I+· +!'n, where each l'i is linear. We call such paths polygonal.
Q if W = Zi,
gi(w) = f(w) - f(z ,)
{ ~w~-~ z,~ -
J'( Zi ) 'f -r-
1 W
__;_ Zi.
since any oriented segment of a side of Ri that is not part of r is also contained in
exactly one other R1 , but with the opposite orientation in R1 , and thus they cancel
out in the sum.
422 Chapter 11 Compl ex Analysis
By definition of gi we have
But by Corollary 11.3.9 we have fci dw = 0 and fci wdw = 0, and thus
fCi
f(w) dw == f Ci
(w - zi)gi(w) dw .
Summing over all i E {1, .. . , n} and using the fact that both lw-zil < v'2/2m and
llgi(w)ll < c for all w E Ri, we get
11.l f(w) dwll < crmv'2(4nrm + len(r)) = c.J2(4A +rm len(f)), (11.9)
where A is the total area of all the squares R 1 U · · · U Rn· Since IT is compact and
measurable, the area A approaches the (finite) measure of IT as m --+ oo. Since c
is arbitrary, we may make the right side of (11.9) as small as we please, and hence
the left side must vanish. D
To extend the result to a generall closed contour we first prove one more lemma.
Lemma 11.3.18. Letry: [a, b] --+Ube C 1 , and let f be holomorphic on U. For any
c > 0 there is a polygonal path u = I=~=l O"k c U with u(a) = 7(a) and u(b) = 7(b)
such that
Iii f(z) dz - l f(z) dzll < c.
Proof. Since I' is compact, the distance p = d(!', uc) to uc must be positive (see
Exercise 5.33). Let Z be the compact set Z = {z E U I d(z, !') :<:; p/2} C U. Since
f is holomorphic, it is uniformly continuous on Z.
Since I' is C 1 it has a well-defined arclength L = J'Y ldzl . For every c > 0,
choose 8 > 0 so that for all z, w E Z we have
n !-y(tk)
<-2:
3L k=l
€
-r(t•-1)
€
[dz[ = - .
3
(11.11)
By the mean value theorem (Theorem 6.5.1), for each k there exists a tk, E [tk-1, tk]
such that
n n
s =I: J('Y(tk))b(tk) - ,,(tk_i)J = I: J('Y(tk)h'(t'k)[tk - tk-ll · (11.12)
k=l k= l
Combining (11.10), (11.11), (11.12), and (11.13) with the triangle inequality gives
II;: f (z) dz - if (z) dzll '.O II;: J(z) dz - ~ f(1(t,) )>' (t.)(t. - ,,_,)II
+II~ f('Y(tk)h'(tk)(tk-tk - 1)- sll + JJs- 1f(z) dzJJ
< i +II~ f('Y(tk))b'(tk) - ,,'(tk)](tk - tk-1) 11 + i
424 Chapter 11. Complex Analysis
2
:::; ; +Fi=b'(tk)-1'(tk)i(tk - tk- i)
k =,1
Proof of Theorem 11.3.11. Let r c Ube a closed contour, with r = L:: 1 I'm,
where /'1 : [a1, bi] --+ <C, /'2 : [a2, b2] --+ <C, . . . , I'm : [am, bm] --+ <C are all C 1. For any
c: > 0, Lemma 11.3.18 guarantees there are polygonal paths cr1, . . . , CTm such that
cri(ai )= l'i(ai) and cri(bi) = l'i(bi) for every i and such that II J,, f(z) dz- fa-, f(z) dz[[
< c:/m. Letting er be the polygonal path er= L:;: 1 cri gives
But er is a closed polygonal contour, and therefore Lemma 11.3.17 applies to give
[ f(z) dz = O;
hence ,
Proof. If O' is any circle centered at z0 and lying entirely in the interior of"'(, then
using the Cauchy-Goursat theorem with the same argument as in Example 11.3.16
and Figure 11.1 , we have that
Therefore, it suffices to prove the result in the case that "'( is some circle of sufficiently
small radius r.
By Lemma 11.3.5 we have
which gives
I ~ 1 JJ!l dz - f(zo)ll
271'2 } , Z - Zo
= 11 ~ 1
27ri } ,
f(z) - f(zo) dz [[
Z - Zo
2~ fo
2
= [[ 7r f (zo +reit) - f (z0 ) dt[[
< sup ll f(z) - f(zo)ll ·
lz-zol =r
Since f is holomorphic in U, it is continuous at zo; so, for any E. > 0 there is a choice
of 6 > 0 such that llJ(z) - f(zo)ll < E. whenever iz - zol ::; 6. Therefore, choosing
0 < r < 6, we have
I~ 1 JJ!l
2n}, z - zo
dz - f(zo)ll < c..
Since this holds for every positive c., the result follows. D
Example 11.4.2.
(i) Let r be any simple closed contour around 0. Since e 2 is holomorphic
everywhere, we compute
426 Chapter 11. Complex Ana lys is
where r is a circle of radius strictly less than one around the origin. The
function f(z) = cos(z)/(l+z 3 ) is holomorphic away from the three roots
of 1 + z 3 = 0, which all have norm l. So inside of the open ball B(O, 1)
the function f(z) is holomorphic, and r lies inside of B(O, 1). We now
compute
1 cos(zl dz= J, f(z) dz= 21fif(O) = 27fi.
Jrz+z Jr z
P roof. Parametrize C in the usual way, ')'(t) = z 0 + reit for t E [O, 27f], and apply
the Cauchy integral formula:
f(zo) = ~ 9~ f(z) dz
27ft, C Z - Zo
R emark 11.4.4. Gauss's mean value theorem says that the average (mean) value
of a holomorphic function on a circle is the value of the function at the center of
the circle. This is much more precise than the other mean value theorems we have
encountered, which just say that the average value over a set is achieved somewhere
in the set, but do not specify where t hat mean value is attained.
is holomorphic at all w in the complement of /' (both the interior and exterior
components), and its derivative satis.fies
F~(w) = n Fn+1(w).
11.4. Cauchy's Integral Formula 427
Proof. The result follows immediately from Leibniz's integral rule (Theorem 8.6.9),
which allows us to pass the derivative through the integral. For every n we have
W
d
-d Fn(w) = -d
d
W
:k'Y
(
f(z)
Z-W n
) dz
= J:. ~ f(z) dz
.' Gaw (z - w)n
= J:. nf (z) dz
}, (z - w)n+I
= nFn+I(w). 0
_!},__ f(k-l)(w)
dw
=~ 1
f(z) dz
27Ti , (z - w)k+l '
Remark 11.4.9. This last corollary is yet another demonstration of the fact that
holomorphic functions are very special, because there are many functions from IB. 2
to IB. 2 that are differentiable but not twice differentiable, whereas any function
from C to C that is differentiable as a complex function (holomorphic) is also
infinitely differentiable.
428 Chapter 11. Complex Analysis
Example 11.4.10.
j cos(z) dz
Jc 2z 3 '
where C is any simple closed contour enclosing 0, use the Cauchy differ-
entiation formula with f(z) == cos(z)/2 to get
.rcj cos(z)
2z 3
dz= 27ri J"(O)
2!
= _ 7ri .
2
~
sin(z) d
(
Jr z
3
+ z 5 z,
where r is a circle of radius strictly less than one around the origin. Note
that f (z) = (~:~}) is a holomorphic function inside the circle r because
it can be written as a product of two holomorphic functions sin(z) and
1/(1 + z 2). Therefore, we have
i
1
1
(z 2 - 1) 2 (z - 1)
dz= i 1
1
(z - 1)3(z + 1) 2
dz ,
J __ l _ dz=j 1 dz
".rr (z - 1)3(z + 1)2 ".rr +12
1
(z - 1)3(z + 1)2
1 1
= J
".rr (z-1)3(z+1)2
dz +j
".rr (z-1)3(z+1)2
dz .
1 2
11.5. Consequences of Cauchy's Integral Formula 429
J, 1 J, h(z) . II 37fi
J,2 (z2 - 1)2(z - 1) dz= .'fr2 (z - 1)3 dz= 11if2 (1) = s ·
Therefore the integral is equal to 0.
Proof. Let ME JR be such that llf(z)ll < M for all z EC. Choose any zo EC and
let 'Y be a circle of radius R centered at zo. Then
Example 11.5.2. The functions sin(z) and cos(z) are bounded if z E JR, but
since they are not constant, Liouville's theorem guarantees they cannot be
bounded on all of C. That is, for every M > 0, there must be a z E C such
that Isin(z)I > M , and similarly for cos(z).
P roof. If Iii > E, then ll/ JI < 1/c. Moreover, since f is entire and nonzero, 1/f
is also entire. Hence, by Liouville's theorem, 1/ f is constant, which implies f is
constant. D
Proof. Let f be any nonconstant polynomial of degree k > 0, let c be the coefficient
of the degree-k term, and let p = f / c, so
It suffices to show that p(z) has a root. Let a= max{ lak - 11, lak- 21, ... , laol}. If
a= 0, then p(z) = zk and z = 0 is a root of p; thus, we may assume a> 0. Suppose
t hat p(z) has no roots, and let R = max{(k + l)a, 1}. If lzl 2 R, then
Thus, p(z) is uniformly bounded away from zero on the exterior of B(O, R). Since no
roots exist for lzl :::; R , compactness and continuity imply that p(z) is also uniformly
bounded away from zero on the interior of B(O, R). By Corollary 11 .5.3, we must
have that p(z) is constant, which is a contradiction. 0
Proof. For any r > 0 with B(z0 , r) C U, Gauss's mean value theorem
(Corollary 11.4.3) implies
2~ ifo
2
lf(zo) I = r. f(zo + reit) dtl
:::; ~ f
2
7r lf(zo + reit)I dt.
27r lo
But since If I attains its maximum at zo we have
1 1271"
lf(zo)I 2: - lf(zo + reit)I dt,
27!" . 0
and hence equality holds. And thus we see
Proof. Since lfl is continuous on a compact set, it must attain its maximum
somewhere on D. If it is constant, lf l attains its maximum at every point of D. If
it is not constant, then the maximum modulus principle guarantees the maximum
cannot be attained on the interior of D; hence it must be on the boundary. D
The expansion (11.18) is called the Taylor expansion or the Taylor series off at zo.
Proof. For any 0 < E: < r consider the closed ball D = B(zo , r - c) C B(zo, r) C U.
We need only show that the series in (11 .18) converges uniformly on D. Let/ be
the circle {w E <C I r - E:/ 2 = lw - zol} with the usual, counterclockwise orientation.
Note that 1 lies entirely within B(z0 , r) and completely encloses D.
For any z ED and w E /, expand w ~ z as a power series in zo to get
f(z) = ~ J,
27fi
f(w) dw =
.G w - z
~ J,
27ri
f
'Ki k=O
f(w) (z - zo)k dw.
(w - zo)k+l
f(z) = ~ ~
1
7fi k=O
~ *
and hence with uniformly convergent sums. Thus, we have
'Y
(z - zo)k ~ f(k)(zo)
f(w) ( _ )k+l dw = ~
W Zo k=O
kl
.
k
(z - zo) ,
where the last line follows from Cauchy's differentiation formula (11.15).
This shows that the Taylor series for f (z) converges uniformly on any closed
ball B(z 0 , r - c:) contained in any open ball B(z0 , r) C U . D
Remark 11.6.2. We have already seen (see Theorem 11.2.8) that analytic func-
tions are holomorphic. The previous theorem guarantees that the converse holds,
so a complex function is holomorphic if and only if it is analytic. Because of this
theorem, many people use the terms holomorphic and analytic interchangeably.
Let Zk+1 be the first point in the sequence for which the power series expansion
of f around zk+1 is not identically zero. Since f is identically zero on the ball
B(zk,€), and since Zk+i lies in B(zk,€), all the derivatives off vanish at Zk+i, so
the power series expansion off (zk+d around the point Zk+l is also identically 0-a
contradiction. D
Proof. It is clear from the Taylor expansion that f factors as (z - z 0 )ng(z) with
g analytic. Since g is continuous near z0 , there is a neighborhood V of z0 where
g(z) =/= 0 if z E V. Since the polynomial (z - zor vanishes only at z = zo, the
product (z - z 0 )ng(z) = f(z) cannot vanish on V "'z0 . D
Proof. The convergent sequence (zk)k=O of zeros must intersect every neigh-
borhood of w; hence a neighborhood with no additional zeros, as described in
Proposition 11 .6.6, cannot exist. Thus, f must be identically zero on a neighbor-
hood of w. By Corollary 11.6.4 f must be zero on all of U. D
f(z) = 2o
oo
Cn(z - zo)n
oo
+ ~ C-n
( 1
z _ zo
)n , (11.21)
and both of these series converge uniformly and absolutely on every closed annulus
of the form Dp,12 = {z E c Ip:::; lz - zol :::; e} for r < p < e < R . Furthermore, if 'Y
is any circle around z 0 with radius r + c < R, for some c > 0, then for each integer
n the coefficients are given by
,R
•Z
f(z) = ~
27fZ
f f(w) dw = ~ j
1' _,, W -
2 1
Z
f(w) dw - ~ j
27rz ':G2 W - Z
f(w) dw.
27rz ':Gr W - Z
(11 .23)
1 ~ f(w)
- · --
27rZ . 1'2 W - z
=~ 1
6 - .
2
k=O 1fZ
i1'2
(
f(w)
W- Z
k
)k+l (z - zo) dw = 6~ ck(z - z0 ) k ,
k=O
1 ~ (w - zo)k
z - w = 6 (z - z0 )k+ 1 '
k=O
which converges uniformly and absolutely as a function in w and z on 1 1 x Dp,e·
Integrating the second half of (11.23) term by term gives
_ _l j f(w) dw =~ C-k .
27ri 'K,1'1 w - z 6 (z - z0 )k
k=O
To see that this second series converges uniformly and absolutely on D p, 12 substitute
t = l/(z - z0 ) and use the previous results for power series int. D
Proposition 11.6.9. For a given choice of A = {z E C I r < lz - zol < R}, the
Laurent expansion (11.20) off is unique.
Proof. The proof is similar to that for Taylor series, but instead of differentiating
term by term, write the expansion of (z!z~z/k+r and integrate term by term around
a contour I centered at zo with radius r + E in A, using Lemma 11.3.5. The details
are Exercise 11.28. D
oo (-l)kz2k-3 1 1 z
cos(z)/z3 = L ~~ = z3 - 2!z + 4! + ....
k=O
1/2 =
z+i
1/2
2i+(z-i)
= l/4i
1--i(z-i)/2
= ~
4i
f (i/ 2)k(z _ i)k.
k=O
This converges when [(i/2)(z - i) [ < 1 and diverges at (i/2)(z - i) = 1, so the
radius of convergence is l. Thus we have
z 1/2 1 ~ . k . k
- - = -. + --: L )i/2) (z - i) .
z 2
+1 Z- i 4i k=O
(i) The series l:::k~-oo ck(z - z0 )k is the principal part of the Laurent series for
f at zo .
11.7. The Residue Theorem 439
(ii) If the coefficients Ck all vanish for k < 0 (if the principal part is zero), then the
singularity is called removable . In this case f can be extended to a holomorphic
function on B(zo, r) using the power series f(z) = I: ~=O ck(z - zo)k.
(iii) If the principal part has only a finite number of nonzero terms, so that f =
L~-Nck (z - zo)k, with c_N f:. 0, then z 0 is called a pole off of order N.
(iv) A pole of first order, meaning that Ck = 0 for all k < - 1, and c_ 1 f:. 0, is
called a simple pole.
(v) If the principal part has an infinite number of nonzero terms, then z 0 is an
essential singularity off .
Example 11.7.2.
si: z = ~ ( z _ ~: + ~~ _ .. .) = 1_ ~~ + ~: _ ....
Example 11.7.3. Assume that f and g are both holomorphic and C-valued
in a neighborhood of zo, that f has a zero at zo of order k, and that g has a zero
of order eat z 0 . We can write f(z) = (z - z 0 )k F(z) and g(z) = (z - zo)eG(z),
where F and Gare both holomorphic and do not vanish at zo.
and that agrees with p/ q at all points of <C except at the zeros of q. It is stan-
dard practice to replace p/ q by a/ b everywhere but still write p/ q to denote
the function a/b. We will also do this everywhere without further comment.
~ J.
27ri 1-i f(z)dz
is called the residue off at zo and is denoted Res(!, zo)
Res(.f, zo) = c_ 1 ·
Proof. This follows immediately from the fact that the Laurent expansion converges
uniformly on compact subsets, so integration commutes with the sums and we get
Res(!, zo) = -2
1.
7rt
i'Y
f (z) dz= -2
1.
1ft
i
'Y n= -
L
00
oo
en(z - zor dz
= -1 . ~ ~
~ Cn ( z -- zo )n dz = -1 . ~ -C-1- dz = c_ 1 . D
27rt n =- oo. 'Y 27rt , 'Y z - zo
Proposition 11. 7.8. If f has an isolated singularity at zo, then the following hold:
(ii) If limz__, z0 (z - zo)k f(z) exists (is finite) for some k 2: 0, then the singularity
is removable or is a pole of order less than or equal to k .
Proof. This follows from the Laurent expansion; see Exercise 11.29. 0
If the contour 'Y is closed but not simple, or if it does not contain z0 , then the
integral off around "( depends not only on the coefficient c_ 1 (the residue) , but
also on the contour itself.
If "( is a simple closed curve traversed once in a counterclockwise direction
with zo in the interior of"(, then the integral of l /(z - z0 ) around 'Y is 27fi. A little
thought shows that if 'Y circles around z 0 a total of k times in a counterclockwise
direction, then the integral is 27fki. If 'Y does not enclose the point z0 at all, there
exists a simply connected region containing 'Y but not zo where z! zo is holomorphic.
When the function z!zo is holomorphic on a simply connected region containing
"(, the Cauchy-Goursat theorem guarantees that the integral over this contour is 0.
These observations motivate the following definition.
Definition 11. 7.9. Let 'Y be a closed contour on C and z 0 E C a point not on the
contour 'Y . The winding number of 'Y with respect to z 0 is
2m
1
I('Y , zo) = -. i --.
1
dz
z - zo
(11.26)
Nota Bene 11.7.10. The winding number essentially counts the total num-
ber of times a closed curve travels counterclockwise around a given point.
(i) Lemma 11.3.5 and Example 11.3.16 show that for any simple closed
contour er t he winding number I(cr, 0) is 1 if zo is contained in er and is
zero otherwise.
Lemma 11. 7 .12. Let U be a simply connected open set, and let 'Y be a closed
contour in U. If N(z) = 2:::%': 0 (z!; 0 )k is uniformly convergent on compact subsets
in the punctured set U '\ { z0 }, then we have
1
- . J.. N(z) dz= Res(N, zo)I('Y, zo).
27fZ),
i I
f(z)dz = 2'lfi t
i=l
Res(!, zi)I('Y, zi)· (11.27)
Remark 11.7.14. The idea behind this theorem should be fairly clear: use the
usual change-of-path method to replace the contour ry with a sum of little circles
L;~=l "fj, one around each isolated singularity Zj, traversed I('Y,zj) times. To com-
pute the integral :fo,j f(z) dz, just use the Laurent expansion and integrate term by
term to get the desired result.
We need to be more careful, however, since there is some sloppiness in the
claim about winding numbers matching the number of times the path happens to
encircle a given singularity. For the careful proof, it turns out to be easier to use a
slightly different approach.
Each NJ converges uniformly on compact subsets of <C "\ { ZJ} and hence is holo-
morphic on U "\ {ZJ}.
Let g(z) be the function obtained by subtracting from f the sum of all the
principal parts:
n
g(z) = f(z) - L NJ(z).
j= l
Near ZJ the positive part of the Laurent expansion off is of the form I::=o~) (z -
Zj )m, which converges uniformly on compact subsets and defines a holomorphic
function on a neighborhood BJ of Zj· In Bj, the principal parts Nc(z) are all
holomorphic if C # j, so the function G(z) = I::=ocm(z - ZJ)m - L t#] Ne is
holomorphic on BJ· Since g(z) = G(z) at every point of Bj "\ {zJ}, the function
g(z) has a removable singularity at z = Zj·
Since this holds for every j E {1, ... , n }, the Cauchy-Goursat theorem gives
Thus, we see
n
= 27fi L Res(!, ZJ)I(r, ZJ),
j= l
The residue theorem is very powerful, especially when combined with another
method for calculating the residues easily. The next proposition gives one method
for doing this.
Proposition 11. 7.15. Assume h is a <C-valued function and g and h are both
holomorphic in a neighborhood of zo. If g(zo) # 0, h(zo) = 0, and h'(zo) # 0, then
the function g(z)/h(z) has a simple pole at zo and
g(z) ) g(zo)
(11 .28)
Res ( h(z) 'zo = h'(zo) .
Hence
. Z - Zo 1
1im - - = - -
z -+zo h(z) h'(zo)
444 Chapter 11. Complex Analysis
•
e37ri/4
•
e7ri/4
-R I R
•
e57ri/ 4
•
e77ri/4
g(zo) . g(z) (g )
h'(zo) = }!.,n; (z -- zo) h(z)
0
= Res h' zo ,
~
1 1
f
.
1
z2 - 1
= 2ni [Res (--
z2 - 1 , 1) +Res ( -z2 - 1 , -1)]
Example 11. 7.17. Contour-integral techniques are useful for computing real
integrals that would otherwise be extremely difficult to evaluate. For example,
consider the integral J~00 xff~ 1 . It is not difficult to verify that this improper
integral converges and t hat it can be computed using the symmetric limit
To compute this integral, consider the contour integral over the following
D-shaped contour. Let C be the upper half circle of radius R > 2 centered at
the origin, as shown in Figure 11. 6, let "( be the line segment from ( - R, 0) to
(R, 0), and let D = C +I be their sum. We have
11.8. *The Argument Principle and Its Consequences 445
j dz 1 dz
.'f°r> z 4 + 1 = c z 4 + 1 +
j'Y
dx
x4 + 1
1
= c z4 +
dz
1+
JR-R
dx
x4 + 1·
j
Jv
~
z4 + 1
= 27fi
1
z + 1'
1
[Res ( - 4 - er.i/ 4) +Res ( - 4 - e 37fi/ 4)]
z + 1'
= 27ri. [ 4(ei/
1
4)3
+ (e3.,,.i/
1
4)3
] = - 2~• [e- 3i.,,./ 4 + e-9.,,.i/ 4]
4
7f
Now consider the integral just over C. Parametrizing C by Reit with t E [O, 7r]
gives
Therefore we have
1 ~=
00
_ 00 x4 +1
lim
R--too
(JR~+
-R x 4 +1
~) 1
c z 4 +1
= lim j
R--t ooJv z 4
~ = _!!__
+1 J2'
27r I (CJ, 0) is the total change in angle (argument) for the contour CJ. If u = f o /,
then 27rl (f o /, 0) is the total change in argument for f o /, at least if f (w) =I- 0 for
every w E f.
(11.29)
Since g is holomorphic and nonvanishing at zo, the term ~g? is also holomorphic
near z0 . This shows that the function F(z) has residue k at zo. The residue theorem
now gives (11.29), as desired. D
Remark 11.8.2. This theorem says the integral (11.29) is always an integer mul-
tiple of 27ri. That means we can compute it exactly by just using an approximation
that is good enough to identify which multiple of 27ri it must be. If f is holomorphic,
a rough numerical approximation that is just good enough to identify the nearest
integer multiple of 27ri tells the exact number of zeros off (with multiplicity) that
lie inside of I· If f is meromorphic, then such a numerical approximation tells the
exact difference between the number of zeros and poles inside of I·
with f(z) =f w for every z E ry. If N is the number of solutions of f(z) = w (with
multiplicities) that lie within ry, then
-1 j f'(z) dz = N.
27fi 'Yr f(z) - w
Proof. Let g =f - w and apply the zero and pole counting formula (11.29) tog
on ry. D
I(f o ry, 0) = Z - P.
Proof. Since ry is a simple closed contour, the winding number I("(, zo) is 1 for
every zo inside of ry and 0 otherwise. We have
where the last line follows from the zero and pole counting formula (11.29). D
(z - 7) 3 z 2
f(z) = (z - 6)3(z + 2) 5 (z - 1) 2 ·
Let's evaluate the integral of f' / f around the circle of radius 4, centered at
zero, traversed once, oriented counterclockwise. By Theorem 11.8.1 we have
Proof. Consider the function F(z) = g(z)/ f(z). The difference between the
number Z of zeros and the number P of poles of F is precisely the difference
between the number of zeros of g and the number of zeros off. We show that this
difference is zero.
If for g has a zero at some z E "(,then the hypothesis lf(z)I > lf(z) - g(z)I
could not hold. Therefore, the function F has no poles or zeros on T
For all z E 'Y we have
Therefore, the distance from the contour F o 'Y to 1 is always less than 1, and hence
0 does not lie within the contour F o 'Y and I(F o "(, 0) = 0. This gives
O=l(Fo 'y,O)=Z -P. D
Example 11.8.8.
(i) To find the number of zeros of z 5 +8z+10 inside the unit circle lzl = 1,
choose g(z) = z5 + 8z + 10 and f (z) = 10. On the unit circle we have
lg(z)- f (z) I = lz 5 + 8zl :S 9 < If (z) I = 10, so Rouche's theorem says the
number of zeros of g inside the unit circle is the same as f, that is, none.
11.8. *The Argument Principle and Its Consequences 449
Figure 11.7. Rouche's dog walking. Your path (blue) corresponds to the
contour f( z) and the dog's path (brown) corresponds to the contour g(z). At the
origin is a lamppost. If the leash is not long enough to extend to the lamppost from
any point of your path, then you and the dog must circle the lamppost the same
number of times; see Remark 11 .8. 'l.
Nota Bene 11.8.9. To u e Rouc:he 's t heorem, you need to find a suitable
function f (z). A good rule of thumb is to consider a function that shares
part of t he function g( z ) but whose zeros are easier to find . For example, in
part (i) of t he previous example, both f(z) = z 5 and f (z) = 10 are functions
for which it is easy to find the zeros. But you can check that f (z) = z5 does
not satisfy the hypothesis of the theorem, whereas f (z) = 10 does.
If you happen to choose the wrong function f(z) t he first time, just
try another.
Remark 11.8.11. The holomorphic open mapping theorem shows, among other
things, that no holomorphic function can map an open set to the real line, since the
real line is not open in C.
In particular, the maps ~(z), c;J(z), and I· I cannot be holomorphic, since they
all map C to the real line. This also shows that z cannot be holomorphic, since
2~(z) = z+z.
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with & are especially important and are likely to be used later
in this book and beyond. Those marked with t are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
11.1. Identify (and prove) the largest open subset of C where the following functions
are holomorphic:
(i) g(z) = lzl.
(ii) f(z) = zsin(z).
2
(iii) h(z) = elzl .
11.6. Prove Theorem 11.2.2. (Hint: For (iv) first show that any monomial azn
is holomorphic by induction on n . Then show I:~=O ajzj is holomorphic by
induction on k.)
11.7. Can a power series 2::%"=oak (z -- 2)k converge at z = 0 but diverge at z = 3?
Prove or disprove.
11.8. Consider the series (1 + z)- 1 = 2:%"=0 (-z)k.
(i) What is the radius of convergence R?
(ii) Find a point on the boundary lzl = R where the series diverges.
(iii) Find a convergent power series expansion for f(z) = log(l + z) around
z = 0 by integrating the series for (1 + z)- 1 term by term.
(iv) What is the radius of convergence of this new series?
11.9. Let f be holomorphic on an open, connected subset A of C. Define g(z) =
f (z) on the set
A= {z I z EA}.
Prove that g is holomorphic on A, and that g'(z) = f'(z) .
11.10. Show that the length of a contour"(: [a, b] -4 <C is given by f,, ldzl.
11.11. Consider the contour integral fr lzl dz, where r is the upper half of the circle
of radius r centered at the origin oriented clockwise. Compute the integral
in two different ways:
(i) Parametrize r and calculate the integral from the definition.
(ii) Note that fr lzl dz= fr r dz . Use the Cauchy-Goursat theorem to show
that fr r dz = f~r r dt and compute this last integral.
11 .12. Evaluate the integral :j}0 (z - z) dz around the circle C = {z E <C: lzl = 1}.
11.13. The function f(z) = z 2 ~ 1 fails to be holomorphic at z = ± i. Let 'Y: [O, 27r] --*
<C be given by 'Y(t) = reit for some r > l. Prove that :}}; f(z) dz = 0 by
. I'
followmg these steps:
(i) Use the Cauchy-Goursat theorem to show that the answer is indepen-
dent of r > l.
(ii) Bound the norm of f(z) on 'Yin terms of r.
(iii) Use this bound to show that the absolute value of the integral is less
than c, if r is large enough.. Beware: It generally does not make sense
to talk about inequalities of the form f,, Iii dz ::::; f,, lgl dz, since neither
term is necessarily real.
11.14. Let 'Y be the curve 1 + 2eit fort E [O, 27r]. Compute the following:
(i) J (z+l)e"
I' z
dz .
(ii) J ----za-
(z+l)e" d
I' z.
(iii) J cos(z)
I'z2+2 z.
d
(iv) J (z2+
I'
cos(z) d
2 )2 Z.
Exercises 453
(vi) J sin(z) dz
"!~ .
11.15. Reevaluate the integral of Exercise 11.12 using the Cauchy integral formula.
Hint: On C we have 1 = lzl 2 = zz, which implies that z = ~ on C.
11.16. Reevaluate t he integral of Exercise 11.13 using the Cauchy integral for-
mula and changing the contour to two small circles, one around i and one
around -i.
11 .17. The nth Legendre polynomial Pn(z) is defined to be
Prove that
1 ):, (w2 - l)n
Pn(z) = 27ri ~ 2n(w - z)n+I dw ,
where / is any simple closed curve containing z.
11.18. Let f( z) be holomorphic such that lf(z)I :::; M for all z with lz - zo l :::; r .
Prove that
lf(nl (zo) I :::; Mn!r - n.
11.19. Prove that any function f that is holomorphic on a punctured ball B(z 0 , r) "
{zo} and is continuous at zo must, in fact, be holomorphic on the whole ball
B (zo, r), following these steps:
(i) Choose a circle/ around zo of radius less than r, and define a function
F(z) = -217it. ':YI'
A:. f(w)
w- z
dw. Prove that F(z) is holomorphic inside of/.
(ii) Prove t hat f(z) = F(z) on the interior of/, except possibly at z0 .
(iii) Show that f(zo) = F(z 0 ) , so F = f on the interior of/.
(ii) Find a sequence w 1, W2, . .. such that Icos( wn) I --* oo.
(iii) Consider the function f (z) = z 3 ~ 1 . Prove that If I --* 0 as lzl --* oo.
This function is not constant. Why isn't that a counterexample to
Liouville's theorem?
11.21. A common form of the fundamental theorem of algebra states that any poly-
nomial with coefficients in C of degree n has exactly n roots in C, counted
with multiplicity. Use Theorem 11.5.4 to prove this alternative form of the
fundamental theorem of algebra.
11.22. The minimum modulus principle states that if f is holomorphic and not
constant on a pat h-connected, open set U, and if If ( z) I of. 0 for every z E U,
then lfl has no minimum on U .
(i) Prove the minimum modulus principle.
454 Chapter 11. Complex Analysis
11.24. For each of the following functions, find the Laurent expansion on an annulus
of the form {z I 0 < lz - zol < R} around the specified point zo, and find the
greatest R for which the expansion converges
(i) z/(z - 1) around zo = 0.
(ii) ez /(z - 1) around zo = l.
(iii) sin(z)/z around z 0 = 0.
3
(iv) ez around zo = 0.
11.25. Prove that if f = L~o ak(z -- zo)k and g = L~o bj(z - zo)j are C-valued
and both have radius of convergence at least r, then the product
00 00 00
L: ak(z-zo)kL)j(z-zo)j = L L
akbj(z-zo)n
k=O j=O n=O k+j=n
has radius of convergence at least r. Hint: Use Taylor's theorem and induc-
tion to show that the nth coefficient of the power series expansion of f g is
L k+j=n akbj·
11.26. Find the Laurent expansion of the following functions in the indicated region:
(i) z/(z + 1) in the region 0 < lzl < 1.
2
(ii) ez /z in the region 0 < lzl < oo.
(iii) z(z-\)(:_ 3) 2 in the region 0 < lz - 31 < l.
Exercises 455
11.27. Uniqueness of the Laurent expansion depends on the choice of region. That
is, different regions for the Laurent expansion give different expansions. Show
this by computing the Laurent expansion for f (z) = z(z1:._ l) in the following
two regions:
(i) 0 < !z! < l.
(ii) 1 < !z! < oo.
11.28. Prove Proposition 11.6.9.
= !R
1
x2 x2
- - dx = lim - -4 dx
-= 1 + x 4 R--+=. - R 1+x
2
by considering the contour integral ef>7 l~z 4 dz, where "( is the contour con-
sisting of the union of the upper half of the circle of radius R and the real
interval [- R, R] (that is, "( = {z E C I R = !z!, 'S(z) 2'.: O} U [-R, R]). Hint:
Bound 1 ~: 4 on the upper half circle and show that this part of the integral
goes to 0 as R ~ oo.
456 Chapter 11. Complex Analysis
11.34.* For each function p(z), find the number of zeros inside of the region A
without explicitly solving for them.
(i) p(z) = + 4z 2 -1 and A is the disk JzJ < l.
z6
(ii) p(z) = z 3 + 9z + 27 and A is the disk lzJ < 2.
(iii) p(z) = z 6 - 5z 2 + 10 and A is the annulus 1 < lzJ < 2.
(iv) p(z) = z 4 - z + 5 and A is the first quadrant {z E C I ~(z) > 0,
~(z) > O}.
11.35.* Show that ez = 5z 3 - 1 has exactly three solutions in the unit ball B(O, 1).
11.36.* Prove the following extension of the zero and pole counting formula.
Let f be a C-valued meromorphic function on a simply connected open set
U with zeros z1 , . .. , Zm of multiplicities a 1 , ... , am, respectively, and poles
w 1 , ... , Wn of multiplicities b1 , . . . , bn, respectively. If none of the zeros or
poles of f lie on a closed contour 'Y, and if h is holomorphic on U, then
11.37.* Let 'Y be the circle lzl = R of radius R (traversed once, counterclockwise).
Let f(z) = zn + an_ 1 zn-l + · · · + a 1 z + ao, where the coefficients ak are all
complex numbers (and notice that the leading coefficient is 1). Show that
.
hm - .
1 :k -f(f'(z)) zdz = -an-l
27rZ
R-too 'Y Z
. 1
hm ;:,.---:
:k -f(f'(z)) zdz LAJ·
=
n
R-too L,7ri ~ Z
' j=l
11.38.* Use the holomorphic open mapping theorem to give a new proof of the
maximum modulus principle.
Notes
Much of our treatment of complex analysis is inspired by the books [MH99] and
[Der72] . Sections 11 .6- 11.8 are especially indebted to [MH99]. Other references
include [Ahl78, CB09, SS02, Tayll]. Our proof of the Cauchy- Goursat theorem is
modeled after [CB09, Sect. 47] and [Cos12]. Exercise 11.3 is from [SS02].
Part IV
Linear Analysis II
Spectral Calculus
Trying to make a model of an atom by studying its spectrum is like trying to make a
model of a grand piano by listening to the noise it makes when thrown downstairs.
-British Journal of Radiology
In this chapter, we return to the study of spectral theory for linear operators on
finite-dimensional vector spaces, or, equivalently, spectral theory of matrices. In
Chapter 4 we proved several key results about spectral theory, but largely restricted
our study to semisimple (and hence diagonalizable) matrices. The tools of complex
analysis developed in the previous chapter give more powerful techniques for study-
ing the spectral properties of matrices. In particular, we can now generalize the
results of Chapter 4 to general matrix operators.
The main innovation in this chapter is the use of a matrix-valued function
called the resolvent of an operator. Given A E Mn(IF), the resolvent is the function
R(z) = (zI -A) - 1 . The first main result is the spectral resolution formula, which
says that if f is any holomorphic function whose power series converges on a disk
containing the entire spectrum of A, then the operator f (A) can be computed using
the Cauchy integral formula
1
f(A) =- . J f(z)R(z)dz,
27r2 Jr
where r is a suitably chosen simple curve. As an immediate corollary of the spectral
resolution formula, we get the famous Cayley-Hamilton theorem, which says that
the characteristic polynomial p(z) of A satisfies p(A) = 0.
The next main result is the spectral decomposition of an operator, which says
that the space !Fn can be decomposed as a direct sum of generalized eigenspaces
of the operator, and that the operator can be decomposed in terms of how it acts
on those eigenspaces. The spectral decomposition leads to easy proofs of several
key results, including the power method for computing the dominant eigenvalue-
eigenvector pair and the spectral mapping theorem, which says that when an operator
A is mapped by a holomorphic function, the eigenvalues of f (A) are just the im-
ages f (>..), where >.. E o-(A). It also gives a nice way to write out the spectral
decomposition off (A) in terms of the spectral decomposition of A.
459
460 Chapter 12. Spectral Calculus
12.1 Projections
A projection is a special type of operator whose domain can be decomposed into
two complementary subspaces: one which maps identically to itself, and one which
maps to zero. A standard example of a projection is the operator p : JR. 3 ---+ JR. 3
given by P(x, y, z) = (x, y, 0). Intuitively, the image of an object in JR. 3 under this
projection is the shadow in the plane that the object would cast if the sun were
directly overhead (on the z-axis), hence the name "projection." We have already
seen many examples of this type of prnjection in Section 3.1. 3, namely, orthogonal
projections. In this section, we consider more general projections that may or may
not be orthogonal. 44
Throughout this section assume that V is a vector space.
12.1.1 Projections
44 Nonorthogonal projections are sometimes called oblique projections to distinguish them from
orthogonal projections.
12.1. Projections 461
(ii) JV (P) = a (I - P) .
Proof.
(ii) Note that x E JV (P) if and only if Px = 0, and this holds if and only if
(I - P)x = x. This is equivalent to x Ea (I - P), using (i). D
[~ ~] '
where I is the k x k identity matrix (on a (P)), and the zero blocks are submatrices
of the appropriate sizes.
Proof. By Theorem 12.l.4 the basis SU Tis a basis of V. By Theorem 4.2.9 the
matrix representation in this basis is of the form
A 11 0 ]
[ 0 A22 '
Proposition 12.1.10. Let A E Mn(lF) have distinct eigenvalues >-.1, >-.2, . .. , An · Let
. .. , rn be a corresponding basis of {right) eigenvectors, and let S be the matrix
r 1,
whose columns are these eigenvectors, so the i th column of S is r i . Let £{ , .. . , £~
be the rows of s- 1 ~he left eigenvectors of A, by Remark 4.3.19), and define the
rank-I map Pi = r iii . For all i , j we have
(i) tT rj = J ij ,
Proof.
(iv) 2:~ 1 pi = L:~=l rilT' which is the outer-product expansion of ss- 1 =I.
(v) This follows by combining (iii) and (iv). D
The proposition says that any simple operator L on an n-dimensional space can
be decomposed into the sum of n rank-1 eigenprojections, and we can decompose its
domain into the direct sum of one-dimensional invariant subspaces. Representing L
464 Chapter 12. Spectral Calculus
0 0 0
0 0 0
Pi= eie{ = ,
0 1 0
0 0
where the only nonzero entry is P ii = 1. In this setting it is clear that the matrix
representation of L is
.A1 0 0 0
0 .A2 0 0
A= 0 0 .A3 0 = .A1P1 + · · · + AnPn.
0 0 0 An
The previous proposition says how to express these projections in terms of any
basis-not just an eigenbasis. In other words, the eigenprojections give a "basis-
free" description of spectral theory for simple operators (or matrices). In the rest
of the chapter we will develop a similar basis-free version of spectral theory for
matrices that are not necessarily even semisimple.
A = [! ~] .
We saw in Example 4.1.12 that the eigenvalues are .A 1 = -2 and >.. 2 = 5, with
corresponding eigenvectors
r1 = [ ~1 ] and r2 = [~] .
It is not hard to check that
~ ~3] [~ ~] .
1
p1 = [4 and P2 = -
7 -4 7
12.2. Generalized Eigenvectors 465
These are both rank 1, and we can check the various properties in the lemma:
(ii) We have
-3] [3 3] = [O OJ
3 4 4 0 0 )
Pi
1[-44 -3]
+ p2 = 7 1[3 3]
3 +7 4 4 [~ ~] = I.
(v) Finally,
-3] ~ [3 3]
3 +7 4 4 [~ ~] =A.
(12 .1)
466 Chapter 12. Spectral Calculus
In the ascending chain (12.1) it might seem possible for there to be some
proper (strict) inclusions that could be followed by an equality and then some more
proper inclusions before the terminal subspace. The following theorem shows that
this is not possible-once we have an equality, the rest are all equalities.
Theorem 12.2.3. The index of any B E Mn (IF) is well defined and finite. Further-
more, if k = ind(B), then JV (Bm) ==JV (Bk) for all m 2 k, and every inclusion
JV (B 1- 1 ) c JV (B 1) for l = 1, 2, ... , k is proper.
o - 1
B = 3, -2 ll
-41 5
-1 2 .
[21 -1
-1 -3 3
The rank-nullity theorem says t hat the dimension of the range and the kernel
of any operator on IFn must add to n, but it is possible that these spaces may have a
nontrivial intersection. The next theorem shows that when we take k large enough,
the range and kernel of Bk have a trivial intersection, and so together they span
the entire space.
Theorem 12.2.5. Let BE Mn(IF). If k : :'.'.: ind(B), then JV (Bk) n ~(Bk) = {O}
and wn =JV (Bk) E!7 ~(Bk).
12. 2. Generalized Eigenvectors 467
Proof. Suppose x E JV (Bk) n a' (Bk). Thus, Bkx = 0, and there exists y E lFn
such that x = Bky. Hence, B kBky = B 2 ky = 0, and soy E JV (B 2 k) = JV (Bk).
Therefore x = Bky = 0. It follows that JV (Bk) n a' (Bk) = {O}.
To prove that lFn = JV (Bk) + a' (Bk), use the rank-nullity theorem, which
states that n = dimJV (Bk)+ rank(Bk) . By Corollary 2.3.19, we have wn =
JV (Bk) +a' (Bk). Therefore, lFn = JV (Bk) tfJ a' (Bk). D
Corollary 12.2.6. If B E Mn(lF) and k = ind(B), then a' (Bm) = a' (Bk) for all
m~ k.
Proof. By Exercise 2.8 we also have a' (Bm) c a' (Bk), so it suffices to show
dimBm = dim Bk. But by Theorem 12.2.3 we have JV (Bm) = JV (Bk); so,
Theorem 12.2.5 implies that dim Bm = dim Bk. D
We conclude this section with one final important observation about repeated
powers of an operator B on a vector x .
Proposition 12.2.7. Assume BE Mn(lF) . If Bmx = 0 and Bm- l x-=/:- 0 for some
m E z+, then the set {x, Bx, ... , Bm- 1 x} is linearly independent.
If k = ind(>..! - A), then the sequence stabilizes at JV ((AI -A)k). For each
eigenvalue>.., the space JV((>..! - A)k) is a good generalization of the eigenspace.
This generalization allows us to put a nondiagonalizable operator L into block-
diagonal form. This is an important approach for understanding the spectral theory
of finite-dimensional operators.
468 Chapter 12. Spectral Calculus
The next four lemmata provide the key tools we need to prove the main
theorem of this section (Theorem 12.2.14), which says !Fn decomposes into a direct
sum of its generalized eigenspaces.
Proof. Any subspace is invariant under the operator AI. Since A= .XI - (.XI -A),
it suffices to show that <ff>.. is (AI - A)-invariant. If x E <ff>.. and y = (AI - A)x, then
(.XI - A)ky = (AI - A)k+lx = 0, which implies that y E <ff>._. D
l 1
A= 0 1
[0 0
This matrix has two eigenvalues, .A, = 1 and >. = 3, but is not semisimple. A
straightforward calculation shows that the eigenvectors associated with >. =
1 and >. = 3 are [1 0 0] T and [1 2 2] T, respectively. An additional
calculation shows that ind(3I - A.) = 1 and ind(ll - A) = 2. So to span
the generalized eigenspace <ff1, we need an eigenvector corresponding to A = 1
and one additional generalized eigenvector v 3 E JY ((ll - A) 2 ) with v 3 ~
JY (ll - A). The vector v 3 = [O 1 OJ T satisfies this condition. Thus the
generalized eigenspaces are <ff1 = span( { [1 2 2] T , [0 1 0r}) and IC3 =
span({[l 0 O]T}).
Proof. Let k>.. = ind (AI - A) and kµ =ind (µI - A). The previous lemma shows
that <ff>.. is (AI - A)-invariant. We claim that µI - A restricted to <ff>.. has a trivial
kernel, and thus (by iterating) the kernel of (µI - A)k" on <ff>.. is also trivial. This
implies that <ff>.. n <ffµ = {O}.
To prove the claim, suppose that x E JY (µI - A) n <ff>.., so that Ax= µx and
(AI - A)k"'x = 0. Using the binomial theorem to expand (.XI - A)k"' shows that
(.A - µ)k-'x = O; see Exercise 12.9 for details. This implies that x = 0. D
12.2. Generalized Eigenvectors 469
Lemma 12.2.12. Assume that W1 and W2 are A-invariant subspaces of IFn with
W1 n W2 = {O}. If A is an eigenvalue of A with generalized eigenspace <ff>.., such
that <ff>. n Wi = {O} for each i, then
Therefore, for each i we have Xi E JV ((Al - A)k) =<ff>,. But we also have Xi E wi
by definition, and thus xi = 0. This implies that x = 0. D
Proof. The proof follows that of Theorem 4.4.5. By Schur's lemma we can assume
that A is upper triangular and of the form
where the block Tu is upper triangular with all diagonal values equal to A, and
the block T22 is upper triangular with all diagonal values being different than A.
The block Al - Tu is strictly upper triangular, and the block Al - T22 is upper
triangular but has all nonzero diagonals and is thus nonsingular. It follows that
(AI -Tur" = 0 and (AI -T22)k is nonsingular for all k EN. Therefore, dim (<ff>.) =
dim JV ((AI - A)m") =dim JV ((AI - Tu)m") = m>,. D
Theorem 12.2.14. Given A E Mn(IF), we can decompose IFn into a direct sum of
A -invariant generalized eigenspaces
IFn = <ff>., EB g>-2 EB ... EB gAr' (12 .3)
Proof. We claim first that for any A E cr(A) and for any subset M = {µ1, . .. , µe} C
cr(A) of distinct eigenvalues with A tf_ M, we have
To show this we use Lemmata 12.2.11 , 12.2.9, and 12.2.12 and induct on the size of
the set M .
470 Chapter 12_ Spectral Calculus
If [M[ = 1, the claim holds by Lemma 12.2.11. If the claim holds for [M[ = m,
then for any M' with [M'[ = m + 1, write M' = M U {v }. Set Wo = !:>.., set
W1 = <f:v, and set W2 = ffiµEM I:µ- The conditions of Lemma 12.2.12 hold, and so
!:>.. n ffiµE M' I:µ= {0}, proving the claim.
This shows we can define the subspace W = l:>.. 1 EB l:>.. 2 EB · · · EB l:>..r of lFn .
So, it suffices to show that W = lFn. By Lemma 12.2.13, the dimensions of the
generalized eigenspaces are equal to the algebraic multiplicities, which add up ton.
This implies that dim W = n, which implies W = lFn by Corollary 1.4.7. 0
45 Although we define and study the resolvent only for matrices (i.e., finite-dimensional operators),
many of the results in this chapter work in the much more general case of closed linear operators
(see, for example, [Sch12].)
12.3. The Resolvent 471
A= [~ ~]. (12.5)
Note that R(z) has poles precisely at a(A) = {3, -2}, so the resolvent set is
p(A) = C "- {3, -2}.
Remark 12.3.3. Cramer's rule (Corollary 2.9.24) shows that the resolvent takes
the form of the rational function
A~r~~~~j
Using (12.7) we compute the resolvent as R(z) = B(z)/p(z), where
B(z) =
(z - l)~(z
O
- 7) 3(z - l)(z - 7) 9(z - 7)
(z - 1) 2 (z - 7) 3(z - l)(z - 7) 9(z27- 1)
0 (z - 1) 2 (z - 7) 3(z - 1) 2
j
r 0 0 0 (z - 1) 3
and p(z) = (z -1) 3 (z - 7) .
(12.11)
Proof.
(i) Write (zi - z 2)I = (ziI - A) - (z2I - A), and then left and right multiply by
R(z2) and R(zi), respectively.
(ii) Write A 2 - Ai = (zI - Ai) - (zI - Az), and then left and right multiply by
R(Ai , z) and R(A2, z), respectively.
(iii) From (12.4), we have (zI - A)R(z) = R(z)(zI -A ), which simplifies to (12.10) .
(iv) For z 1 = z2 , (12.11) follows trivially. Otherwise, using (12 .8) and relabeling
indices, we have
Theorem 12.3.6. The set p(A) is open, and R(z) is holomorphic on p(A) with
the following convergent power series at zo E p(A) for lz - zol < llR(zo)ll - i:
00
Proof. We use the fact (from Proposition 5.7. 4) that when llB ll < 1 the Neumann
series ""£~ 0 Bk converges to (I - Bt-i . From (12.8) we have
R(zo) = R(z) + (z - zo)R(zo)R(z) =[I+ (z - zo)R(zo)]R(z) .
Setting B = -( z - z0 )R(z0 ) in the Neumann series, we have
00
and this series converges in the open neighborhood {z EC I lz - zol < llR(zo)ll-i}
of zo. Therefore p(A) is an open set, and R(z) is holomorphic on p(A) . D
46 Recall from Definition 3.5.15 that a matrix norm is a norm on Mn(lF) that satisfies the submul-
tiplicative property !IABll ~ llAllllB!I.
12.3. The Resolvent 473
Remark 12.3.7. Comparing (12.12) with the Taylor series (11.18) for R(z) reveals
a relationship between powers of R(z) and its derivatives:
(12.13)
Theorem 12.3.8. If lzl > l All, then R(z) exists and is given by
(12.14)
Corollary 12.3.9.
lim llR(z) JI = 0. (12.15)
JzJ-+oo
Moreover, R(z) is holomorphic in a neighborhood of z = oo.
Remark 12.3.10. When we say R(z) is holomorphic in a neighborhood of oo, we
simply mean that if we make the substitution w = l / z, then R(l / w) is holomorphic
in a neighborhood of w = 0.
l R(z) ll < ~ ~1
- L..., lzlk+
k =O
=
lzl
(1- ~)
_.!._
lzl
-
1
1
lzl - l All .
This shows lim JzJ -+oo l R(z)ll = 0. D
Remark 12.3.11. Remark 12.3.3 shows that the spectrum is the set of roots of the
characteristic polynomial. By the fundamental theorem of algebra (Theorem 11.5.4),
this is nonempty and finite; see also Exercise 11.21.
The previous theorem gives an alternative way to see that the spectrum is
nonempty, which we give in the following corollary.
Proof. Suppose that O"(A) = 0, so that p(A) = <C and R(z) is entire. By Corollary
12.3.9, R(z) is uniformly bounded on <C. Hence, R(z) is constant by Liouville's
theorem (Theorem 11.5.1). Thus, R(z) = limz--+oo R(z) = 0. This is a contradiction,
since I= (zl - A)R(z). D
Lemma 12.3.13. For any matrix norm II ·II and any A E Mn(lF), the limit
Proof. For 1 :Sm< k fixed, we have the inequality llAkl l :S IJAk-mllllAmll· Let
ak = log llAkJI . The inequality implies that ak :::; am+ ak-m· By the division
algorithm, there exists a unique q and r such that k = qm + r, where 0 :::; r < m,
or, alternatively, we have q = lk/mj. This gives
It follows that
ak , q 1
k '."., kam + kar .
Leaving m fixed and letting k > m grow gives
. q
1imsup -k
.
= 1imsup -
lk/mJ
k- = - .
1
k k m
It follows that
. ak .
hmksup k :S hmksup
(qkam + kar
1 ) am
= --;;;-· (12.17)
Theorem 12.3.14. The regions of convergence in Theorems 12.3.6 and 12.3.8 can
be increased to
1
(i) lz - zol < [r(R(z0 ))r and
(ii) lzl > r(A), respectively.
Proof. In each part below, let r denote r(R(zo)) and r(A) , respectively.
(i) Let z satisfy lz-zol < r- 1 . There exists an c > 0 such that Jz - zol < (r+2t:)- 1 .
Moreover, there exists N such that llR(z0 )kll 1/k < r + c for all k > N. Thus
lz - zolkllR(zo)kll :S (;.:;e)k < 1, which implies that (12.12) converges.
12.4. Spectral Resolution 475
(ii) Let z satisfy iz l > r. There exists an c. > 0 such that iz l > r + 2c.. Moreover
there exists N such that ll Ak ll 1 / k < r + c. for all k > N. Thus lzl-kllAk l <
(;.:;c)< 1, which implies that (12 .14) converges. 0
Remark 12.3.15. Theorem 12.3.14 implies that R(z) is holomorphic on lz l >
r(A), which implies
r (A) ~ O"M = sup l>- 1.
AEa(A)
In the next section we show in the finite-dimensional case that r(A) = O"M · Not
only does t his justify the name spectral radius for r(A), but it also implies that the
value of r(A) is independent of the operator norm used in (12.16).
Definition 12.4.1. Let >. E u(A) . Let r be a positively oriented simple closed
curve containing>. E u(A) but no other points of u(A) . The spectral projection (or
eigenprojection) of A associated with >. is given by
1
P>, = Res(R(z), >.) = - . J, R(z)dz. (12.18)
2'Tri Jr
Theorem 12.4.2. For any A E Mn(C) and any>. E u(A), the following properties
of P>, hold:
(i) Idempotence: Pf= P>, for all>. E u(A).
(ii) Independence: P>,PN = P>-'P>- = 0, whenever>. , NE u(A) with>. =f. >.'.
(iii) A -invariance: AP>, = P>,A for all>. E u(A).
(iv) Completeness: 2=>-Ea(A) P>, =I.
Proof.
(i) Let r and r' be two positively oriented simple closed contours in p(A) sur-
rounding>. and no other points of u(A). Assume also that r is interior to I''
as in Figure 12.1. For each z E r we have
dz'
1~ = 0.
ir'
- - = 27Ti
z' - z
and
.'fr z' - z
By Hilbert's identity we have
476 Chapter 12. Spectral Calculus
,\
•
(~ ) 11
2
P'f. =
2m Jr }r, R(z)R(z')dz' dz
= (~)2 l, l, R(z),- R(z') dz' dz
2?Ti 1r .'fr, z -- z
1
= - . 1 R(z)dz = P>-..
2?Ti 1r
(ii) Let rand f' be two positively oriented disjoint simple closed contours in p(A)
surrounding,\ and A', respectively, and such that no other points of a(A) are
interior to either curve, as in Figure 12.2. Note that
i dz'
--=0
. rr z' - z
and 1~=0.
.'fr z' - z
,\
• ,\'
•
Figure 12.2. Diagram to assist in the proof of Theorem 12.4 .2(ii). In this
case z is not interior tor', nor is z' ·interior tor, and no points of a(A) "'-{A,,\'}
are interior to either curve.
12.4. Spectral Resolution 477
P;_P;_, = (
2 ~i ri fr,, R(z)R(z')dz' dz
= ( ~ ) 2 l, l, R(z) ,- R(z') dz' dz
27ri Jr }r, z - z
[1Jr R(z) (1Jr, ~)dz - Jr,1 R(z') (1,fr -{!!-)dz']
2
1
= ( -27rZ.) Z - Z Z - Z
= o.
(iii) This follows directly from (12.10), that is,
(iv) Let r be a positively oriented circle centered at z = 0 having radius R > r(A).
Theorems 12.3.8 and 12.3.14 guarantee that the Laurent expansion R(z) =
L:~o Akz-(k+l) holds along r, so the residue theorem (Theorem 11.7.13)
applied to this Laurent series gives
00
-1. ck
R(z)dz = - 1 . :/; Ak L0
k+l dz = A =I.
27rZ • r 27rZ r k=O z
where each I';_ is a positively oriented simple closed contour containing A and
no other points of o-(A). D
Remark 12.4.3. The double integrals in the previous proof require Fubini's
theorem (Theorem 8.6.1) to change the order of integration. Although we only
proved Fubini's theorem for real-valued integrals, it is straightforward to extend it
to the complex-valued case.
R(z) = z -1 3 (1[3
5 3 2])
2 + z +12( 15[-23 -2])
3 .
478 Chapter 12. Spect ra l Calculus
P3 =
1[3 2]
5 3 2
o o
P1 =
[
0 0
0 0 0 1/2 ~ ~~~1 and
0
0
1
-1/81
-1/4
-1/2 .
0 0 0 1 0 0
f(A) = -~ 1 (12.19)
27ri 1r J(z)R(z)dz.
Proof. Express f(z) as a power series f(z) = L,':=o akzk . W ithout loss of gener-
ality, let r be a positively oriented circle centered at z = 0 having radius bo, where
r(A) < bo < b. Thus, we have
-1. ck f(z)R(z)dz = - 1 .
2 7ri . r 2Ki r
i f(z)z- 1
00
L kAk
z
k= O
dz.
Proof. By Theorem 12.3.14, we know that r(A) 2: O'Af . For equality, it suffices to
show that r(A) :::; O'M +c for all c > 0. Let r be a positively oriented circle centered
at z = 0 of radius O'Af + c. By the spectral resolution formula, we have
1
An = - . J.. znR(z)dz . (12.20)
27rt Jr
Hence,
where
K = sup ll R(z)l loo ·
r
This gives
Proof. Let r be a simple closed contour containing O'(A). By the spectral resolution
formula (12.19) and Cramer's rule (Corollary 2.9.24) , we have
p(A) = - . lcf
27rt . r
det(zl - A)(zl - A) - 1 dz = - .
27ri. r
lcf.
adJ(zI - A)dz = 0,
Nota Bene 12.4.9. It might seem tempting to try to prove the Cayley-
Hamilton theorem by simply substituting A in for z in the expression
det(zJ - A) . Unfortunately it doesn't even make sense to substitute a matrix
A for the scalar z in the scalar multiplication zI. Contrast this with substi-
tuting A for z into p(z) , which is a polynomial in one variable, so p(A) is a
well-defined matrix in Mn(C) .
1 J. R(z)
Ak = 27fi .'fr (z - >.)k+I dz, (12.22)
where r is a positively oriented simple closed contour containing >. and no other
points of O'(A); see Section 11 .6.3 for a review of Laurent expansions.
The main goal of these two sections is to establish the spectral decomposi-
tion formula for any operator A E Mn (IF). The spectral decomposition formula
is the generalization of the formula A = 2=>. >.P>. for semisimple matrices (see
Proposition 12.1.lO(v)) to general, not necessarily semisimple, matrices. We then
show in Section 12.7 that the spectral decomposition is unique. Writing out the
spectral decomposition explicitly in terms of a specific choice of basis gives the
popular Jordan normal form of the operator, but our development is a basis-free
description that works in any basis.
Nata Bene 12.5.1. Do not confuse the coefficient Ak of the Laurent expan-
sion (12.21) of the resolvent of A with the power Ak of A. Both can be
computed by a contour integral: Ak can be computed by (12.22) and Ak can
be computed by (12.20) (an application of the spectral resolution formula).
But despite the apparent similarity, they are very different things.
Lemma 12.5.2. Assume that>. E a-(A). Let r and f' be two positively oriented
simple closed contours in p(A) surrounding>. and no other points of O'(A) . Assume
also that r is interior to r'' that z' is a point on r'' and that z is a point on r' as
depicted in Figure 12.1. Let
1, n2'.0,
fin= {
0, n < 0.
(ii)
(12.24)
12. 5. Spectral Decomposition I 481
Proof.
(i) Since z' is outside ofr, the function (z' - z) - 1 is holomorphic within. Expand
(z' - z)- 1 in terms of z - ;\to get
1 1 1 oo (z - >.)k
z' - z z' ->- 1-
z - ;\) =
--
L (z' -
k=O
;\)k+l ·
( z' - >-
Inserting into (12.23) and shrinking r to a small circler>- around;\ with every
point of r>- nearer to;\ than z' (see Figure 12.3) gives
1
27ri
j
Jr,.. (z - >-)
-m - 1 [~ (z - >.)k
-f;:o (z' - ;\)k+l
ldz
(ii) In this case, both ;\ and z lie inside r. Split the contour into r >- and r z as in
Figure 12.3, so the left side of (12.24) becomes
~ j (z' - ;\) - n- 1 (z 1
- z) - 1 dz' +~ j (z' - >.) - n- 1 (z 1 - z)- 1 dz'
27ri Jr,.. 2m Jr"
= (1 - TJn)(z - ;\)-n-1.
The first integral follows the same idea as (i), except with a minus sign. The
second integral is an application of Cauchy's integral formula (11.14). D
Lemma 12.5.3. The coefficients of the Laurent expansion of R(z) at;\ E O"(A)
satisfy the identity
(12 .25)
r'
Figure 12.3. Diagram of the paths and points in Lemma 12.5.2. The
contour integral over r' (red) is the same as the sum of the integrals over the blue
circles r z and r >-.
482 Chapter 12. Spectral Calculus
Proof. Let r and r' be positively oriented simple closed contours surrounding .A
and no other points of O"(A). Assume also that r is interior to r' as depicted in
Figure 12.l. We have
(~) J J (z -
2
Remark 12.5.4. Note that P>. = A_ 1 , so Lemma 12.5.3 gives another proof that
P>. is a projection, since
Lemma 12.5.5. Fix A E O"(A) and define D>. = A_2 and S>. = Ao. The following
hold:
(i) For n ::'.: 2, we have A_n = D~- 1 .
(ii) For n:::: 1, we have An = (-1)ns~+ 1 .
R( z ) -_ z P>.
- .A + '"'
L..... (z - D>.
.A) k+ 1 + '"'
L.) -1) k( z - .A) k S >.k+1 . (12.27)
k=l k=O
(v) We have
- - 0
o 1
0
o1 -1/41
- 1/ 2
and A_3
2
o o
= D i = 9 00 00 0 ~ -~21
A_2 - Di - 3 0 0 0 0 0 .
r0 0 0 0 r0 0 0 0
Note that the holomorphic part of the Laurent expansion is a geometric series,
which sums nicely to give the final expression
Dr P1
R (z) =
(z - 1)3
+ (z Di
-1 )2
+ - Pi
z- 1
- + - -
z-7·
(12. 29)
(12.30)
Let f>. be a positively oriented circle around>. containing no other points of O'(A).
By definition, we have R(z)(zI - A)== I, so zR(z) = AR(z) +I. This gives
AP>.=~ j AR(z)dz
2ni Jr,,
= ~ 1 AR(z) + Idz (I is holomorphic)
2ni Jr,,
= ~ 1 zR(z)dz
2ni Jr,,
=
1
- . 1 >.R(z)dz + -21 . 1 (z - >.)R(z)dz
2n i .fr,, ni Jr,,
= >-.P>. +D>. .
To prove r(D>.) = 0 parametrize f>. by z(t) = >. + peit for any sufficiently
small choice of p > 0. By Lemma 12.5.5(i) we have
2~ lifo
2
= 7r leikt R(z(t))pieit dtll
~1
1 3 0
0 1 3
A=
0 0 1
0 0 0
[~ ~rn
D1 = (A - J)P1 =
3
0
0
3
0
1
0
0 -1/81
-1/4 [o 1
= 3 0 0 1 -1/41
0
-1/2
0 0 0 1 -1/2 0 0 0 0 '
0 0 0 0 0 0 0 0 0
12.6. Spectral Decomposition II 485
which agrees with our earlier computation in Example 12.5.6 and clearly has
all its eigenvalues equal to 0. Moreover
0 01[0 0 0 1/81
[~ ~1
3 0 0
~(A - ~ r-6~ -6 3 0 0 0 0 1/4 0 0
D, 7I)P, 0 -6 3 0 0 0 1/2 0 0
0 0 0 0 0 0 1 0 0
Remark 12.6.4. Recall that the order of a nilpotent operator B E Mn(lF) is the
smallest m such that Bm = 0. Since the order m of a nilpotent operator is the same
as its index, we have from Exercise 12.6 that m ::::; n .
Proposition 12.6.5. For each>. E O"(A), the order mA of the nilpotent operator
DA satisfies mA ::::; dim&? (PA).
Remark 12.6.6. The proposition implies that R(z) is meromorphic, that is, it has
no essential singularities. More precisely, (12.27) becomes
PA m" - 1 Dk oo
~ A ~ k k k+l
R(z) = z->. + ~ (z - >.)k+l + ~ ( - 1) (z - >.) SA . (12.31)
k= l k=O
Therefore the principal part (12.28) becomes
PA m"-l D~
PAR(z) = R(z)PA = z - A+ L
(z - >.)k+l. (12.32)
k= l
Lemma 12.6.7. LetA E O"(A) andy EV. If(>.I-A)y E &?(PA), theny E &?(PA).
Proof. Assume y -:/- O; otherwise the result is trivial. Let v = (AI - A)y. If
v E &? (PA) , then v = PA v . Independence of the projections (Theorem 12.4.2(ii))
implies that Pµ v = 0 whenever µ E O"(A) and µ-:/- >.. Combining this with the fact
that PµA = µPµ + Dµ (12.30) and the fact that Pµ and Dµ commute (12 .26) gives
0 = Pµ(>.I - A)y = >.Pµy - µPµy - Dµy,
486 Chapter 12. Spectral Calculus
which implies
DµPµy = Dµy = (>. - µ)PµY·
Since Dµ is nilpotent, it follows that r(Dµ) = 0, which implies >. = µ (which is
false) or Pµy = 0 . The fact that I= °L::µEa(A) Pµ (see Theorem 12.4.2(iv)) gives
y = l= Pµy = P>,y,
µEa(A)
Remark 12 .6.8. The proof of the previous lemma did not use the definition of
the projections Pµ and the nilpotents Dµ, but rather only the fact that the Pµ are
projections satisfying the basic properties listed in Theorem 12.4.2, and that the
Dµ are nilpotents satisfying commutativity of Pµ with Dµ and satisfying PµA =
µPµ + Dw Thus, the lemma holds for any collection of projections and nilpotents
indexed by the elements of u(A) and satisfying these properties.
Theorem 12.6.9. For each >. E u(A), the generalized eigenspace <ff>, is equal to
!% (P>J·
Proof. We first show that <ff>, c .~ (P>-.). Recall that IC>, C JV ( (>.I - A )k>-),
where k>, = ind(AI - A). Choose x E IC>, so that (>.I - A)k>-- 1 x -=f. 0 . The set
{x, (AI -A)x , ... , (AI -A)k>-- 1 x} is a basis for If>-. by Proposition 12.2.7. It suffices
to show that each basis vector is in ~~ ( P>-.) .
If y = (AI - A)k>-- 1 x, then (M - A)y = 0 E !% (P>-.), which from Lemma
12.6.7 implies y E !% (P>,). Similarly, y E !% (P>-.) implies (AI - A)k>-- 2 x E !% (P>-.) ·
Repeating gives (AI - A)lx E !% (P>-.) for each£. Thus, <ff>, c !% (P>-.) ·
Finally, we note that lFn = E9 !% ( P>,) and P = E9 IC>,. Since IC>, C !% ( P>-.) for
each>. E u(A), it follows that!% (P>,) =IC>,. D
Remark 12.6.10. The previous theorem holds for any collection of projections
and nilpotents satisfying the properties listed in Remark 12.6.8. This is important
for the proof that the spectral decomposition is unique (Theorem 12.7.5).
o o 1 -1;2]
D2 = g 0 0 0 0 and D 31 -- o·'
1
[0 0 0 0
0 0 0 0
hence, Di has order 3. On the other hand, P1 has rank 1 and D1 has order 1,
that is, D 7 = 0, as we have already seen.
12.6. Spectral Decomposition II 487
R(z) = L [z -
>.Ea (A)
p>.
), +
m"-
L
k=l
1
D~ l
(z - >.)k+l ' (12.33)
A= L >-P>.+D>. . (12.34)
>.Ea(A)
R(z) = R(z) L
>.Ea(A)
P>. = L
>.Ea(A)
R(z)P>. = L
>.Ea(A) k=l
1
[z ~\ - mf (z -~~)k+l l·
Similarly Lemma 12.6.1 yields
(12.35)
488 Chapter 12. Spectral Calculus
Proof. For each),. E O"(A) , let f;, be a small circle around),. that contains no other
points of er( A) and which lies inside of the open, simply connected set U containing
cr(A). For convenience of notation, set ao,>. = f(>..) . By the spectral resolution
formula (Theorem 12.4.6) we have
f(A) =
2 ~i L
>-Ea-(A)
ir>-
f(z)R(z) dz
In the next section we show (Theorem 12.7.6) that (12.35) is not just a way
to compute f(A), but that it is actually the spectral decomposition of f(A) .
Ex ample 12.6.15. For each),. E cr(A), let m;, be the order of D;, .
(12.37)
(12 .38)
(ii) Given t E IR, we compute f (A) = eAt. The Taylor expansion of ezt
around).. is
(12.39)
12.7. Spectral Mapping Theorem 489
f(z) - f(>.)
g(z) = z - >. '
z i= >. ,
{ f'(>.), z = >..
Note that g(z) is holomorphic in a punctured disk around >. and it is continuous
at >., so by Exercise 11.19 it must be holomorphic in a neighborhood of>.. Also,
g(z)(z - >.) = f(z) - µ, and hence g(A)(A - AI) = f(A) - µI.
If x is an eigenvector of A associated to >., then
from Examples 12.4.5 and 12.6.13, with a(A) = {1 , 7}. The spectral mapping
theorem allows us to easily determine the eigenvalues of the following matrices:
(i) Let B = Ak for k E z. Since f(z) = zk is holomorphic on an open,
simply connected set containing a(A) , it follows by the spectral mapping
theorem that a(B ) = a(Ak) = {1, 7k} .
(ii) Let B = eAt for t E JR. Since f (z) = ezt is holomorphic on an open,
simply connected set containing O"(A) , it follows by the spectral mapping
theorem that
Remark 12.7.4. For a less-complicated matrix, like the one found in Example
12.3.4, the spectral mapping theorem may not seem like a big deal, but for more
general matrices the spectral mapping theorem can be extremely helpful. The
spectral mapping theorem also plays an important role in the proof of the power
method for finding eigenvectors and in the proof of the Perron-Frobenius theorem,
presented in the next section.
Theorem 12.7.5. Given A E Mn(IB'), assume that for every.XE O"(A) there is a
projection Q>. E Mn(IF) and a nilpotent C>. E Mn(IF) satisfying
12.7. Spectral Mapping Theorem 491
(i) Q~ = Q;.. ,
(ii) Q>.Qµ = 0 for allµ E o-(A) withµ#- .\
given in Corollary 12.6.14 is the spectral decomposition of f(A) . That is, for each
v E O"(j(A)), the eigenprojection P 11 for f(A) is given by
L L: ak,µD~. (12.42)
µEo-(A) k = l
f(µ)=r.1
(12.43)
Thus, if A is the matrix from Examples 12.4.5 and 12.6.13, then we have that
( 12.44)
-~71
- 21 63
7 -21
0 7 -3 .
0 0 1
Ak = ~ j zk R(z)dz,
2ni Jr
where r is a positively oriented circular contour centered at zero having radius
greater than 1. Using the residue theorem (Theorem 11.7.13) , we have that
(12.45)
where r 'r/ is a positively oriented circular contour centered at zero and having radius
rJ, and f 1 is a small positively oriented circular contour centered at 1 and having
radius 1 - rJ.
From (12 .37), we have that
The last equality follows because the eigenvalue ,\ = 1 is semisimple, and thus
A _ 1 = P and A-e-i = 0 for each C 2 1. Combining this with (12.45) we have
Ak - P =~ j zkR(z)dz.
2ni Jr.,,
For any given operator norm II · II, following the proof of Corollary 12.4.7 gives
1/5 2/5
2/5]
A= 3/5 1/5 2/5 .
[ 1/5 2/5 1/5
It is easy to show that o-(A) = {1, -1/5} , with the second eigenvalue having
algebraic multiplicity two. Therefore, the eigenvalue 1 is simple. For any xo
with Px 0 -=/= 0, semisimplicity of the eigenvalue >. = 1 means that Pxo is an
eigenvector of A with eigenvalue L
The iterative map Xk = Axk--1 gives
Remark 12.8.2. Sometimes it can also be useful to use the notation B ~A when
B - A ~ 0, and B >- A when B - A >- 0.
Remark 12.8.3. If A >- 0, then Ak >- 0 for all k E ;z;+, simply because every entry
is a sum of products of strictly positive numbers. This implies that A is not nilpo-
tent, and therefore, by Lemma 12.6.~:, the spectral radius r(A) must be positive.
12.8. The Perron-Frobenius Theorem 495
1 1 1 2
R(z) = - I+ - 2 A+ - A
z z z3
+ .. · (12.46)
1 1 1 2
R(.A) = - I+ - A+ - A
.A ,x.2 A3
+ .. ·
R(z)Dm;.-1
>..
= ~
~
~+ ~
[z - µ
mj - 1
~
Dk
µ,
(z - µ)k+l
l Dm;.-1
.A
=
Dm;. - 1
----">..' ------
z - .A '
(12.47)
µ,Eu(A ) k=l
where ID>.. is the order of the eigennilpotent D.x (define D';:;.-l = P.x if ID>.. = 1).
But R(z) = (zl - A)- 1 , so (12.47) implies that
which gives
(12.48)
Hence, the nonzero columns of D';:;. - l are eigenvectors of .A. By (12.31) we also have
But for lzl > .A = r(A), the resolvent R(z) can be written (see Theorem 12.3.8) as
R(z) = :Z:::::~o J:,. Since A~ 0 we also have limz-+.A+ (z - .A)m>- R(z) ~ 0. D
The next theorem shows that for positive matrices the real eigenvalue equal
to its spectral radius is simple and the eigenvector is componentwise positive. We
will need t he following simple lemma in the proof.
Proof. Let B, C E Mn(IR) with B, C t: 0. Assume there exists some k such that
bkk> 0 and Ckk > 0. The (k, k) entry of BC is
n
L bkiCik = bkkCkk +L bkiCik> (12.49)
i=l if.k
but bkkCkk is strictly positive, and the remaining terms are all nonnegative, so the
(k, k) entry of BC is strictly positive.
If bkk > 0, then by induction, using (12.49) with C = Bm- 1, the (k, k) entry
of Bm = B B m - l is also strictly positive for all m E z+; hence Bm -=f. 0. D
Remark 12 .8. 7. For any nonnegative matrix A, if a is the smallest diagonal en-
try of A, then t he end of the previous proof (taking € = a) shows that u(A) C
B(a, r(A) - a). Thus, in the case of a nonnegative matrix with positive diagonal,
we still have the conclusion that the eigenvalue ,\ = r(A) (often called the Perron
root or Perron-Probenius eigenvalue) is t he only eigenvalue on the circle [z[ = r(A).
1/5
A= 3/5
[ 1/5
12.8. The Perron-Frobenius Theorem 497
>. = r(A)
Proof. We know from Theorem 12.8.4 that A has an eigenvalue >.. equal to its
spectral radius r(A) and that >.. has a nonnegative eigenvector. What remains
to be shown is that >.. is simple and the nonnegative eigenvector is positive. By
Proposition 12.8.10 the matrix (J + A) is primitive. We define B = (J + A)k,
where k E ;z;+ is chosen to be large enough so that B >- 0. It follows from the
spectral mapping theorem that >.. E oc(A) if and only if (1 + >..)k E a-(B), and thus
the algebraic multiplicity of >.. is equal to the algebraic multiplicity of (1 + >..)k.
Observe that
r(B) = max J(l +>..)kl= max J(l + >..)ik = { max 1(1 + >..)i}k = (1 + r(A))k
>.Ea(A) >.Ea(A) >.Ea(A)
because when the disk izl ::; r(A) is translated one unit to the right, the point of
maximum modulus is z = 1 + r(A). Since B is a positive matrix, the algebraic
multiplicity of the eigenvalue at r(B) is one, and so the algebraic multiplicity of>..
must also be one.
Finally, let v be the nonnegative eigenvector of A corresponding to>..= r(A) .
Since v = [v1 , ... , Vn] T is not identically zero, there exists an i with vi i= 0. For
each j E {1, .. . , n} there exists a k such that the (i,j) entry of Ak is positive (since
A is irreducible). This implies that the ith entry of Akv is positive. But we have
Akv = >..kv, and thus vi > 0. Since this holds for every j, we have v >- 0. D
Assume that at each time step a user moves to a new page by clicking on a link
(selected with equal likelihood). Let A= [%] E M 4 (1R) , where aiJ is the probability
that a user at page j will click on the link for page i. At the first page a user will
click on pages two or four with equal probability; hence, a2 1 = a 41 = 0.5. A user
at page two will click on page one with probability one; hence, a 12 = 1. The third
page has two links, going to pages two and four . Thus, a23 = a43 = 0.5.
47 This algorithm is named after Larry Page, one of Google's founders.
12.8. The Perron - Frobenius Theorem 499
The fourth page presents a problem, because it has no outbound links. Instead
of setting all the corresponding entries to 0, we assume the user will randomly
''teleport" to another page (all with equal probability), so ai 4 = 0.25 for each i.
Putting this all together, we have
0 1 0 0.25]
A = 0.5 0 0.5 0.25
0 0 0 0.25 .
[
0.5 0 0.5 0.25
If ek E JR 4 is the kth standard basis vector, then Aek is the kth column of A-the
vector describing the probability that a user starting at page k will move to each of
the other pages. If the kth entry of x E JR 4 is the percentage of all users currently
on page k, then the product Ax describes the expected percentage of users that will
be on each of the pages after the next step. Repeating the process, A 2 x describes
the percentage of traffic that will be at each page after two steps, and Akx the
percentage of traffic at each page after k steps.
Notice that A is nonnegative and every column sums to 1, so A has a left
eigenvector l = ll. T of all ones, with corresponding eigenvalue l. But we also have
ll A ll 1 = 1, so Lemma 12.3.13 implies that r(A):::; 1; hence r(A) = l.
A right eigenvector r corresponding to the eigenvalue 1 satisfies Ar = r. If
r is scaled so that its entries are nonnegative and sum to 1, then it represents the
distribution of traffic in a steady state, where the overall distribution of traffic at
the next time step Ar is the same as the distribution r at the current time. If the
eigenvalue 1 has a one-dimensional eigenspace, then there is a unique nonnegative
choice of r whose entries sum to 1, and the percentage rk of traffic at the kth page
is a reasonable indicator of how important that page is.
The same logic applies to an arbitrary number n of pages with any config-
uration of links. Again construct a matrix A E Mn (IF) corresponding to the link
probabilities, following the approach described above. Since A is nonnegative and
its columns sum to 1, the previous argument shows that r(A) = 1, and if the
eigenvalue 1 is simple, then there is a unique nonnegative right eigenvector r whose
entries sum to l. This eigenvector gives the desired ranking of pages.
The one remaining problem is that the eigenvalue 1 is not necessarily simple.
To address this , the PageRank algorithm assumes that at each time step a percent-
age c < 1 of users follow links in A and the remaining percentage 1 - c > 0 teleport
to a random web page. If all web pages are equally likely for those who teleport,
the new matrix of page-hit probabilities is given by
1- c
B=cA+ - - E, (12.50)
n
where E E Mn (JR) is the matrix of all ones, that is, E = ll.ll. T. All of the columns
of B sum to one, and applying the same argument used above gives r(B) = 1.
Depending on the choice of c, the matrix B might be a more realistic model
for Internet traffic than A is, since some users really do just move to a new page
without following a link. But another important advantage of B over A is that
B >- 0, so Perron's theorem (Theorem 12.8.6) applies, guaranteeing the eigenvalue
1 is simple. Thus, there is a unique positive right eigenvector r whose entries
sum to 1, and we can use r to rank the importance of web pages. Moreover,
500 Chapter 12. Spectral Calculus
since the Perron root 1 is simple, the power method (see Theorem 12.7.8) guaran-
tees that for every nontrivial initial choice x 0 ~ 0, the sequence xo, Axo, A2 xo, .. .
converges to r.
Vista 12.8.12. Probability matrices like those considered above are examples
of Markov chains, which have widespread applications. Both Perron's theorem
and the power method are key tools in the study of Markov chains.
The projections Po and P* commute with A, and so~ (Po) and~ (P*) are
both A-invariant. We can write lFn =~(Po) EB~ (P*), and since I= Po+ P* we
have
A = APo + AP* = Do + AP*. (12 .52)
Definition 12.9.2. Let A E Mn(lF) have spectral decomposition (12.51), and let
P* be as in Definition 12.9.1 . Let C denote the restriction of A to &l (P*). The
operator C has no zero eigenvalues, so it is invertible. Define the Drazin inverse
AD of A to be the operator
Remark 12.9.4. Since lFn = &l (P*) EB JY (P.), we may choose a basis such that
the operator A can be written in block form as
where S is the change of basis matrix that block diagonalizes A into complemen-
tary subspaces, N is a nilpotent block of order k, and the block M is the matrix
representation of C in this basis. Thus, we have
AP =
*
s- 1 [M0 OJ S '
0
D0 = 3 - 1 [O
0 N
OJ S '
The spectral decomposition of the Drazin inverse is just like the spectral de-
composition for the usual inverse given in (12.43), except that the terms that would
have corresponded to the eigenvalue 0 are missing, as shown in the next theorem.
Theorem 12.9.5. For A E Mn(lF) with spectral decomposition (12.51), the Drazin
inverse AD can be written as
(12.54)
where each m;.. denotes the algebraic multiplicity of the eigenvalue >- E u(A) .
C = L >-Pc,>. +De,>.,
>.Eu(C)
and observe that u(C) = u(A) "'{O}. Using (12.43) gives the spectral decomposi-
tion of c- 1 as
By the definition of C, we have that Pc,>-. o P* = P>-. and De,>-. o P* = D>-. for every
>. E lT(C) , and this implies that
10 01 00 -927] 3
0 0 3
o 0 -27]
9
and Di=
[0 0 0 0
Pi= 0 0 1 3
[0 0 0
0 0 0
0
0
.
AD=
1
0
[0
0
-3
1
0
0
9
-3
1
0
-18
81
0
3 .
l
It is straightforward to verify that AD A = AAD = P 1 and AD AA D = AD.
Example 12 .9.7. For the matrix A from Example 12.9.6 we can use the
Wedderburn decomposition to compute A 0 and show that we get the same
answer as we got from the spectral decomposition in that example. For details,
see Exercise 12.43.
12.9. The Drazin Inverse 503
(12 .55)
Proof. Note that the function f(z) = l/z has Taylor series around A =f. 0 equal to
Proposition 12.9.9. For A E Mn(lF) with index ind(A) = k, the Drazin inverse
AD of A satisfies
(i) AAD =AD A ,
(ii) Ak+ 1 AD = Ak , and
(iii) AD AAD =AD.
Similarly, we have
(iii) BAB= B .
Since I= P* +Po, it suffices to show that BP* =ADP* and BP0 =AD Po= 0.
From (i) and the fact that Po and P* commute with A , we have
and thus
AP*BP0 Ak-·l = P*BP0 Ak = 0.
Since A is invertible on fl (P*), this gives
P*BPoAk-i = 0.
Using the argument again, we have
which yields
and
By (ii) we have
Let P* = _AD A = A.AD be the projection of Definition 12. 9.1 for the matrix A, and
let Po and Do be the 0-eigenprojection and 0-eigennilpotent, respectively, of A. We
show below that the general solution is x(t) = eAD Bt P*q, where q E !Fn is arbitrary.
Taking the derivative of the proposed solution, we have that x' (t) = AD Bx( t).
Plugging it into (12.56) gives P.Bx(t) = Bx(t). Since B commutes with P., it
suffices to show that Pox(t) = 0.
Recall that B = I - >.A. Multiplying (12.57) by Po gives
Definition 12.10.1. For A E Mn(JF) and b E lFn, the kth Krylov subspace of A
generated by b is
J£k(A, b) = span{b, Ab, A2 b, .. . , Ak-lb } .
If dim(J£k(A, b)) = k, we call {b, Ab, A2 b, . . . , Ak-lb} the Krylov basis of J£k(A, b).
For each .A E O"(A) let g>-. = !Ji! (P>-.) be the generalized eigenspace of A associated
with .A and let m>-. = ind(D>-.)· For d 1 , . . . , de as in the previous proposition, there
is a basis for g>-. in which .AP>-.+ D>-. is represented by an m>-. x m>-. matrix that can
be written in block-matrix form as
0
0
0
.A 1 0 0
0 .A 1 0
0 0 0 1
0 0 0 .A
Remark 12.10.6 (Finding the Jordan normal form). Here we summarize the
algorithm for finding the Jordan form for a given matrix A E Mn(lF).
(i) Find the spectrum {.Ai, ... , Ar} of A and the algebraic multiplicity mj of each
Aj. (When working by hand, on very small matrices, this could be done by
computing the characteristic polynomial PA(z) = det(zl - A) and factoring
the polynomial as fl;=i (z - Aj )m;, but for most matrices, other algorithms,
such as those of Section 13.4, are both faster and more stable.)
(ii) For each Aj, compute the dimension of JV ((.\jl - A)k) fork= 1, 2, ... until
dim JV ((.\jl - A)k) = mj· Setting dJk = dimJV ((.\jl -A)k), this gives a
sequence 0 = dj 0 < d)i < · · · < djr = mj.
(iii) The sequence (djJ gives information about the sizes of the corresponding
blocks. The sequence is interpreted as follows: dj 1 - djo = dli tells how many
Jordan blocks of size at least one corresponding to eigenvalue Aj there are.
Similarly d12 - dj 1 tells how many Jordan blocks are at least size 2, and so
forth.
(iv) Construct the matrix J using the information from step (iii).
11 - A ~ [ ~: ~; ~;]
and di = dim JV ((ll -A)i) = 2. Since (ll - A) 2 = 0, we have d2 =
dim JV ((ll -A) 2 ) = 3, which is equal to the multiplicity of.\.
[H I]
510 Chapter 12. Spectral Calculus
=
o3 -1 1lj
- 2 -4 5
A
[21 -1
-1
-1
-3
2 .
3
To find t he Jordan normal form, we first verify that CT(A) = {O} . We already
saw that A 2 = 0, and that dim A' (A) = 2, so the geometric multiplicity
of the eigenvalue >. = 0 is 2. This shows that 6"o can be decomposed into
two complementary two-dimensional subspaces. The 0-eigenspace is A' (A) =
span{x1,X3}, spanned by X1 = [2 1 1 O] T and X3 = [-1 1 0 l]T .
We seek an x2 so that {(A-- >.I)x2,x2} = {x 1,x2} and an x4 so that
{(A- U)x4,x 4} = {x3,x4}. So we must solve (A- OJ)x2 = Ax2 = x1 and
Ax4 = X3. We find X2 = [-1 -2 0 o]T and X4 = [1 1 0 of. Verify
that the set {x 1,x2,x3 , x4} = {Ax 2, x2,Ax4, x 4} is a basis and that this basis
gives the desired normal form; that is, verify that we have
2 -1
O 1 0 OJ 1 -2
J = s- 1AS = 0 0 0 0 where S =
[00 00 00 01 ,
[
1
0 0
0
First verify that CT( A) = {1} and A'(I-A) = span{z 1,z2} , where z 1 =
[o 1 o]T and z 2 = [-2 o 1]1".
Since (I -A) 2 = 0, we have two blocks that are (I -A)-invariant: a 1x1
block and a (not semisim ple) 2 x 2 block. To find the 2 x 2 block we must find an
x that is not in A' ((A - >.I)). This will guarantee that { (A - >.I) x , x} has two
elements. In our case a simple choice is x2 = [1 0 O] T. Let (A->.I) x2 = x 1.
Since (A - !) 2 = 0, we have (A - I)x 1 = 0, which gives Ax 1 = x 1 and
Ax2 = X1 +x2 .
Verify that z1 is not in span(x1, x2) , so {x 1, x 2, z i} gives a basis of JF 3.
Verify that letting P = [x1 x2 z1] puts p-l AP in Jordan form . The
verification of the preceding details is Exercise 12.52.
Exercises 511
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with .&. are especially important and are likely to be used later
in this book and beyond. Those marked with t are harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
12.l. Verify the claim of Example 12.l.2; that is, show that (I - P) 2 = I - P.
12.2. Show that the matrix
2
1 [2 cos B sin 2B ]
2 sin 28 2 sin 2 e
is a projection for all B ER
12.3. Let P be a projection on a finite-dimensional vector space V. Prove that
rank(P) = tr (P). Hint: Use Corollary 12.l.5 and Exercise 2.25.
12.4. Let F E Mn(lF) be the n x n matrix that reverses the indexing of x, that
is, if x = [x1 x2 xnr, then Fx = [xn Xn-1 . . . x1]T. Define
E E Mn(lF) such that Ex = (x + Fx)/2. Prove that E is a projection.
Determine whether it is an orthogonal projection or an oblique projection.
What are the entries of E?
12.5. Consider the matrix
A= [~ ~].
Find the eigenvalues and eigenvectors of A and use those to write out the
eigenprojection matrices P 1 and P 2 of Proposition 12.1.10. Verify that each
of the five properties listed in that proposition holds for these matrices.
12.11. Assume NE Mn (IF) is nilpotent (see Exercise 4.1), that is, Nk = 0 for some
k EN.
(i) Use the Neumann series of I + N to prove that I+ N is invertible.
(ii) Write (J + N)- 1 in terms of powers of N.
(iii) Generalize your results to show that >.I+ N is invertible for any nonzero
>. E C.
(iv) Write (>.I+ N)- 1 in terms of powers of N.
12.12. Let
A= rn ~].
(i) Write the resolvent R(z) of A in the form R(z) = B(z)/p(z), where B(z)
is a matrix of polynomials in z and p(z) is a monic polynomial in z .
(ii) Factor p(z) into the form p(z) = (z - a)(z - b) and write the partial
fraction decomposition of R(z) in the form R(z) = C/(z-a)+D/(z -b),
where C and D are constant matrices.
12.13. Let
A.=[~~~]
0 0 7
·
(i) Write the resolvent R(z) of A in the form R(z) = B(z)/p(z), where B(z)
is a matrix of polynomials in z and p( z) is a monic polynomial in z .
(ii) Factor p(z) and write the partial fraction decomposition of R(z).
12.14. Let
A= [~ ~] .
Compute the spectral radius r(A) using the 2-norm (see Exercise 4.16), using
the 1-norm 11 · Iii (see Theorem 3.5.20) , and using the infinity norm I · !loo·
12.15. Let I · I be a matrix norm. Prove that if A E Mn(IF) satisfies r(A) < 1, then
for any 0 < € < 1 - r(A) there exists m E z+ such that
12.16. For A E Mn(lF), let IAI denote componentwise modulus, so if aij is the (i,j)
entry of A, then laijl is the (i,j) entry of IAI. Let~ denote componentwise
inequality (see also Definition 12.8.1) .
(i) Prove that r(A):::; r(IAI).
(ii) Prove that if 0 ::SA ::SB, then r(A) :::; r(B) .
12.17. Using the matrix A in Exercise 12.12, compute its spectral projections and
show that the four properties in Theorem 12.4.2 are satisfied.
12.18. Using the matrix A in Exercise 12.13, compute its spectral projections and
show that the four properties in Theorem 12.4.2 are satisfied.
12.19. Verify the Cayley-Hamilton theorem for the matrix A in Exercise 12.12.
12.20. Compute the eigenvalues, resolvent, and spectral projections of the matrix
12.21. Let
o
A = [3i
3i]
6 .
12.30. Given A E Mn(lF), use the spectral decomposition (12.33) and Exercise 12.3
to prove that tr R(z) = p'(z)/p(z), where pis the characteristic polynomial
of A. Hint: Write the characteristic polynomial in factored form (4.5), and
show that
Bxk
(12.61)
Xk+i = IJBxkll'
12.35. Give an example of a matrix A E M2(lF) that has two distinct eigenvalues of
modulus 1 and such that the iterative map Xk = Axk-l fails to provide an
eigenvector for any nonzero xo that is not already an eigenvector.
12.36. Let
A ~ [!
1
0
2 !]
(i) Show that A is irreducible.
(ii) Find the Perron root of A.
12.37. For A E Mn(lF), prove that A>- 0 if and only if Ax>- 0 for every x ~ 0 with
x=f. O.
12.38. Prove Proposition 12.8.10.
Exerci ses 515
12.43. Given the matrix A from Example 12.9.6, use the Wedderburn decomposition
to compute AD. In particular, let
-~71
1 0 0 1 0 0
s= 0 1 0
an d S- 1 = 0 1
0 0
0
1
271
-9
3 .
[
0 0 1 -3 [
0 0 0 1 0 0 0 1
Verify that your answer is the same as that given in Example 12.9.6.
12.44. Given A E Mn(lF) , let Po and Do denote, respectively, the eigenprojection
and the eigennilpotent of the eigenvalue ,\ = 0. Show that
A=
a 1
0 a 2 ,
ol
[0 0 a
Notes
Two great sources on resolvent methods are [Kat95] and [Cha12]. Exercise 12.4
comes from [TB97].
Exerc ises 517
A short history and further discussion of Perron's theorem and the Perron-
Frobenius theorem can be found in [MacOO]. For more details about Google's
PageRank algorithm, see [LM06] and [BL06].
For more about the Drazin inverse, see [CM09] and [Wil79]. The proof of
Proposition 12.9.10 is modeled after the proof of the same result in [Wil79] .
The proof of Proposition 12.10.3 was inspired by [Wi107].
Iterative Methods
519
520 Chapter 13. Iterative Methods
approximation, until it is sufficiently close that the algorithm can terminate. For
example, Newton's method, described in Section 7.3, is an iterative method for
finding zeros.
We begin the chapter with a result about convergence of sequences of the form
Xk+1 = Bxk + c, k E N,
for B E Mn (IF) and x 0 , c E !Fn. This one result gives three fundamental iterative
methods for solving linear systems: the Jacobi method, the Gauss- Seidel method,
and the method of successive overrelaxation (SOR).
We then develop some iterative methods for solving numerical linear alge-
bra problems based on Krylov subspaces. Krylov methods rank among the fastest
general methods available for solving large, sparse linear systems and eigenvalue
problems. One of the most important of these methods is the GMRES algorithm.
We conclude the chapter with some powerful iterative methods for computing
eigenvalues, including QR iteration and the Arnoldi method.
where B E Mn(IF) and c , x 0 E !Fn are given. Iterative processes of this form show
up in applications such as Markov cha.ins, analysis of equilibria in economics, 48 and
control theory.
In this section we first show that sequences of the form (13.1) converge when-
ever the spectral radius r(B) satisfies r(B) < 1, regardless of the initial starting
point, and that the limit x E !Fn satisfies the equation x =Bx+ c, or, equivalently,
x = (I - B)- 1 c. We then use this fact to describe three iterative methods for
approximating solutions to linear systems of the form Ax = b .
converge to x = (I - B) - 1 c whenever l Bll < 1. The next theorem shows that this
convergence result holds whenever r(B) < 1, regardless of the size of l Bll·
Theorem 13.1.1. Let B E Mn(lF) and c E lFn. If r(B) < 1, then sequences
generated by (13.1) converge to x =(I - B)- 1 c, regardless of the initial vector x 0 .
Proof. Let 11 • 11 be the induced norm on Mn(lF). If r(B) < 1, then Exercise 12.15
shows that for any 0 < c < 1 - r(B), there exists m E z+ such that l Bkll <
(r(B) + c)k :::; r(B) + c < 1 for all k 2: m. Set f (x) =Bx + c and observe that the
kth composition satisfies
k- 1
fk(x) = Bkx+ LBJc.
j =O
Remark 13.1.2. The proof of the previous theorem also shows that the sequence
(13 .1) converges linearly (see Definition 7.3.1), and it gives a bound on the
approximat ion error E:k = l xk - xo l , namely,
0 0
0
A= *
* *
Here * indicates an arbitrary entry-not necessarily zero or nonzero.
522 Chapter 13. Iterative Methods
Jacobi's Method
By writing Ax= bas Dx = -(L + U)x + b , we have
x = -D- 1 (L + U)x + D- 1 b, (13 .2)
Decomposing A, we have
3
D= 0 2 0 ,
o ol
[0 0 2
Thus,
Starting with xo = [0 1 1] T and iterating via ( 13. l), we find that Xk ap-
proximates the correct answer [- 7 9 6] T to four digits of accuracy once
k ~ 99. We note that r(BJac) == y!5f6 ~ 0.9129, which implies that the
approximation error satisfies [[x - xk[[ :::; 0.9130k · [[x - x 0 [[ fork sufficiently
large.
(13.3)
Hence for Bes = -(D + L)- 1 U and c = (D + L)- 1 b, we have that the iteration
(13.1) converges to x as long as r(Bcs) < 1. Note that although the inverse
( D + L )- 1 is fairly easy to compute (because D + L is lower triangular), it is
generally both faster and more stable to write the Gauss-Seidel iteration as
(D + L)xk+l = -Uxk +b
and then solve for Xk+l by back substitution.
13.1. Methods for Lin ear Systems 523
Nota Bene 13.1.5. Just because one method converges faster than another
does not necessarily mean that it is a faster algorithm. One must also look at
the computational cost of each iteration.
Successive Overrelaxation
The Gauss-Seidel method converges faster than the Jacobi method because the
spectral radius of Bes is smaller than the spectral radius of BJac· We can often
improve convergence even more by splitting up A in a way that has a free parameter
and then tuning the decomposition to further reduce the value of r(B) . More
precisely, by writing Ax= bas (D+wL) x = ((1 - w)D - wU)x+wb, where w > 0,
we have
x = (D + wL)- 1 ((1 - w)D - wU)x + w(D + wL) - 1 b. (13.4)
Hence for Bw = (D+wL)- 1 ((1-w)D-wU) and Cw = w(D+wL) - 1 b, the iteration
(13.1) converges to x as long as r(Bw) < l. We note of course that when w = 1
we have the Gauss-Seidel case. Choosing w to make r(Bw) as small as possible will
give the fastest convergence.
Again, we note that rather than multiply by the inverse matrix to construct
Bw, it is generally both faster and more stable to write the iteration as
Example 13.1.6. Consider again the system from Examples 13.l.3 and 13.1.4.
Since w = 1 corresponds to the Gauss- Seidel case, we already know that
r(B 1 ) = 5/6 ~ 0.8333 ....
524 Chapter 13. Iterative Methods
(13.5)
where j3 = r(BJac); see [Ise09, Gre97] for details. In the case of this example,
r(BJac) = .J576, which gives w* = ~(6 - v'6) ~ 1.4202 and r( Bw.) = w* -
1 ~ 0.4202. Therefore, the approximation error is bounded by ll x - Xk II :S:
0.4202k · llx -xoll, so this method converges much faster than the Gauss- Seidel
method.
For example, starting with x 0 = [O 1 l]T, the term X k approximates
the correct answer [-7 9 6] T to four digits of accuracy once k ::::: 18. Com-
pare this to k ::::: 50 for Gauss- Seidel and k ::::: 99 for Jacobi.
which implies that llBJaclloo < 1. The result follows from Lemma 12.3.13. D
Proof. Define
(13 .7)
13.1. Methods for Linear Systems 525
j> i j<i
j<i j>i
for every i. Choosing i such that IYil = llYllcxi, we have
which implies
ll Bcsxllcxi = ll Yllcxi < :LJ>i 1%1 <
llxllcxi llxllcxi - laiil - :LJ<i laijl - 'Y·
Since this holds for all x =/= 0, we have that ll Bcs llcxi :S: "( < 1. D
If>. E <J(Bw), then there exists a nonzero x E lFn such that Bwx = >.x, and some
straightforward algebraic manipulations give
1 xH(w- 1 D+L)x
1- >. xHAx
Since w E (0, 2) and A > 0, we have that ~ > 1, and by Exercise 4.27 we have
xHDx > 0. Moreover, since A> 0 we have xHAx > 0. Therefore,
~
2 < ~
(- 1
1- >.
) =
~
(- 1
1-
. 1-
>. 1 - X
x) = ~(
1-
1-
2~(>.) +
x )
l>-1 2
= 1- u
1 - 2u + u 2 + v 2
.
Proof. Since q(A) = 0, the degree of q is greater than or equal to the degree
of p. Using polynomial division, 49 we can write q = mp+ r, where m, r E IF[x]
and deg(r) < deg(p). Thus, r(A) = q(A) - m(A)p(A) = 0, which contradicts the
minimality of p, unless r = 0. D
Proposition 13.2.5. Given A E Mn(lF), let A = L>.E<T(A) >..P.x +D.x be the spectral
decomposition. For each distinct eigenvalue>.. , let m.x denote the order of nilpotency
of D.x. In this case, the minimal polynomial of A is
This implies that t he solution x lies in the d-dimensional subspace spanned by the
vectors b , Ab, A2 b , . . . , Ad- 1b.
Definition 13.2.7. For A E Mn(lF) and b E lFn, the kth Krylov subspace of A
generated by b is
Remark 13.2.8. From (13.10) it follows that the solution of the invertible linear
system Ax= b lies in the Krylov subspace Jed(A, b) .
528 Chapter 13. Iterative Methods
Remark 13.2.10. It's clear that J[k(A, b) C Jt;i(A, b) whenever k :::; d, but this
also holds for k > d, in particular, given a polynomial q E lF[x] of any order, we
can write it in the form q = mp+ r, where m, r E lF[x] and deg(r) < deg(p). Thus,
q(A)b = m(A)p(A)b + r(A)b = r(A) b E Jt;i(A, b) .
Definition 13.2.12. Let A E Mn(lF) and b E lFn . The linear system Ax= bis said
to have a Krylov solution if for some positive integer k there exists x E J[k (A, b)
such that Ax = b .
13.2. Minimal Polynomials and Krylov Subspaces 529
and hence
(13.11)
The matrix in parentheses is nonsingular because it can be written in the form
I + M, where M = -aoN - a1N 2 - · · · - am- 2Nm- l is nilpotent; see
Exercise 13.ll(i). Hence b = 0, which is a contradiction D
Proof. The nonsingular case is immediate. Assume now that A is singular. Let
A = L>.Ecr (A) >..P;., + D;., be the spectral decomposition of A and let P* = L>.#O P;.. .
If xis a Krylov solution, then by Remark 13.2.13, it follows that x E Jtd(A, b),
where d is the degree of the minimal polynomial of A. Thus, we can write x as a
linear combination x =I:~:~ ajAJb. Left multiplying by P* and Po gives
d-1 d- 1
P*x = LajAJP*b and Pox= L::ajD6Pob , (13.13)
j=O j=O
and hence P*x E Jtd(A, P*b) and Pox E Jtd(Do, Pob).
Using the spectral decomposition of A, it is straightforward to verify that
P0 A = D 0 , and thus since Ax = b, then Pox is a solution of the nilpotent linear
system D 0 P 0 x = P0 b, which implies P0 b = 0 by the lemma. Therefore, b = P*b ,
which implies that b E &; (P*) = &;(Am) and x = A Db by Exercise 12.48.
530 Chapter 13. Iterative Methods
There are several linear solvers that use Arnoldi iteration. GMRES is fairly
general and serves as a good example. For many systems, it will perform better than
Gaussian elimination (see Section 2.7), or, more precisely, the LU decomposition
(see Application 2.7.15). It is also often better than the methods described in
Section 13.l. Generally speaking, GMRES performs well when the matrix A is
sparse and when the eigenvalues of A are not clustered around the origin.
[~ : :] E Ms(F) and
[0O** **0* **]** E M4x3(lF),
where * indicates an arbitrary entry- not necessarily zero or nonzero.
The Arnoldi process begins with an initial vector b -:/- 0. Since the basis being
constructed is orthonormal, the first step is to normalize b to get q1 = b/llbll·
This is the Arnoldi basis for the initial Krylov subspace Jfi(A, b) . Proceeding
inductively, if the orthonormal basis q 1 , .. . , qj for Jtj(A, b) has been computed,
then construct the next basis element by subtracting from A% E Jtj+l (A, b) its
projection onto Jtj (A, b). This goes as follows . First, set
Now set
j
If hJ+l,j = 0, then lliiJ+ill = 0, which implies Aqj E Jtj(A, b), and the algorithm
terminates. Otherwise set
- qj+l
qj+l - -h- . (13.15)
j+l,j
This algorithm produces an orthonormal basis for each Jtk(A, b) fork = 1, 2, ... , m,
where mis the smallest integer such that Xm(A, b) = Xm+1(A, b).
For the ambient space JFn, define for each k = 1, 2, ... , m the n x k matrix
Qk whose jth column is the jth Arnoldi basis vector % · Let ilk be the (k + 1)
x k Hessenberg matrix whose (i,j) entry is hij· Note that when i > j + 1,
532 Chapter 13. Iterative Methods
we haven't yet defined hi,j , so we just set it to O; this is why the matrix fh
is Hessenberg.
and
Example 13.3.4. Consider the linear system Ax= b from Example 13.2.11 ,
that is,
A~ r~ ~ ~ ~1 Md b ~ rJ
Applying the Arnoldi method yields the following sequence of orthonormal
and Hessenberg matrices computed to four digits of accuracy:
0.51
0.5 8.75]
l
0.5 ' [5.403 '
r0.5
0.5 o. 7635 8.75 -5 .472]
0.5 0.1157
5.403 -1.495 '
0.5 -0.3471 ' [ 0
r0.5 -0.5322
0. 7238
l l
13.3. The Arnoldi Iteration and GM RES Methods 533
0.7635 - 5.472
04025 [ 8 75 2230
[05
0.5 0.1157 -0.7759 - 1.495 -0.1341
Q3=
0.5 -0.3471 - 0.1016 '
H3 = 5.r3
0.7238 0.6715 '
l
0.5 -0.5322 0.4750 0 0.1634
and
0.7635 0.4025
0.0710
[05
0.5 0.1157 -0 .7759 -0.3668
=
l
Q4 0.7869 ,
0.5 -0 .3471 -0.1016
0.5 -0.5322 0.4750 - 0.4911
-5.472 2.230
[ 8 75 - 1.495 -0.1341
-0.3009
1331
H4 = 5.r3
0.7238 0.6715 - 0.2531
0 0.1634 1.074
Once Jtk(A, b) = Jtk+ 1 (A , b), we have that x E Jtk(A, b), and so the least squares
solution is achieved; that is, xk = x.
Using the Arnoldi basis q1, . .. , qk of Jtk(A,b) described above, any y E
Jtk(A, b) can be written as y = Qkz for some z E IFk, where Qk is the matrix of
column vectors q 1 , ... , qk. It follows that
(13.18)
and define the kth approximate solution of the linear system as xk = Qkzk·
Once the Arnoldi basis for step k is known, the solution of the minimization
problem is relatively fast , both because the special structure of Hk allows us to
use previous solutions to help find the next solution and because k is usually much
smaller than n .
534 Chapter 13. Iterative Methods
Zk = argmin
zE JFk
llfikz - f3e1 II 2
= argmin
zEJFk
l Akz - (3fl~e1 I .
2
(13.20)
Since Rk is upper triangular, the linear system Rkz = (3fl~ e 1 can be solved quickly
by back substitution. In fact, since Rk is a (k + 1) x k upper-triangular matrix, its
bottom row is all zeros. Hence, we write
where Rk is a k x k upper-triangular matrix. We can also split off the last entry of
the right -hand side (3fl~e1. Setting
gives the solution of (13.18) as that of the linear system Rkzk = gk, which can be
computed via back substitution.
Now compute the least squares solution for the next iterate. Note that
(13.22)
where hk+l = [h1,k+ 1 h2,k+i hk+I ,k+I]T. Left multiplying (13.22) by the
Hermitian matrix
yields
(13.23)
If u = 0, then the rightmost matrix of the above expression is Rk+I, which is upper
triangular , and the solution of the linear system Rkz = (3fl ~e 1 can be found quickly
with back substitution.
13.3. The Arnoldi Iteration and GM RES Methods 535
(13.24)
where
and
Thus,
[~
J
where rk+l ,k+l = p2 + ~ 2 . This gives the next iterate of (13.19) without having
to compute the QR decomposition again. In particular, we have Fh+1 = nk+1Rk+1,
when~ nk+l is orthonormal (since it is the product of two orthonormal matrices)
and Rk+l is upper triangular.
We can also split up the next iterate of (13.21). Note that
It follows t hat
(13.25)
to solve wit h back substitution (since it's upper triangular) , we solve it here with
GMRES for illust rative reasons. In what follows we provide four digits of accuracy.
Since each fh is given in Example 13.3.4, we can solve each iteration fairly
easily. Fork = 1, it follows from (13.20) that
Thus,
The least squares error is 1.051. For the next iteration, we have that
z2= argmin
z EJF2
l H2z - (3e II2 =
1 argmin
z EJF2
l A2z - (3n~e1 II2 .
0.4~16] ,
0
0.9113
-0.4116 0.9113
- = [n1
!°h ~] G~ =
[0.8509
0.5254
-0.5254
0.8509
~rn
0
0.9113 -0~116 l
l
0
0 0 0.4116 0.9113
[0.8509 - 0.4788 0. 2163
0.5254 0.7754 - 0.3502 '
0 0.4116 0.9113
and
R2 = 02
[R1
~ rb2 ''] [!
0
0.9113
- 0.4116
0.4116
0
0.9113
l [10.28
0
0
-5.441]
1.603
0.7238
n28 - 5441]
1. ~58 .
l
Moreover , we have that
fiil~e1 ~ [-sn1
c~~I ]
[ 0.9113(-1.051)
1702
-0.4116( - 1.051)
[ -0.9576l
1 702 .
0.4325
13.3. The Arnoldi Iteration and GM RES Methods 537
Thus, the least squares solution satisfies the linear system (13.25)
Z2 = [- 0.1227 - 0.5446] T ,
l
which has a least squares error of 0.4325. Repeating for k = 3, we have H3 = D3 R3 ,
where
l 0
0 1 0 0
0 0 0.9899 0.1417 '
r 0 0 -0.1417 0.9899
0.8509 - 0.4788 0.2163 - 0.0307]
0.5254 0.7754 -0.3502 0.04965
0 0.4116 0.9021 - 0.1292 '
r 0 0 0.1417 0.9899
and
-5.441
1.758 - 1.8271
0.8953
0 1.153 .
0 0
Moreover, we have that
-H
(3D3 e1 = [
g2
c212 ]
- S 2 ')'2
=
r
- 1.702
0.9576
0.4281
- 0.06131
.
l
l [ l
Thus, the least squares solution is satisfied by the linear system (13.25)
which has a least squares error of 0.06131. Finally, repeating for k = 4, we have
H4 = D4R4, where G4 = h ,
~l'
0.8509 -0.4788 0.2163 -0.0307
0.5254 0.7754 - 0.3502 0.04965
D3 = 0 0.4116 0.9021 -0.1292
0 0 0.1417 0.9899
0 0 0 0
538 Chapter 13. Iterative Methods
and
-5.441 1.827
-097461
1.758 -0.8953 0.7665
R3 =
ITS 0
0
0
1.153
0
0
-0 .4654 .
1.1513
0
Moreover, we have that
I
1.7021
,Bs1~e1 = [ c~n2
g2 -0.9576
] = 0.4281 .
-82"(2 -0.06131
0
Thus, the least squares solution is satisfied by the linear system (13.25)
[
10.28
0
0
0
-5.441
1.758
0
0
1.827
-0.8953
1.153
0
-0.9746]
0.7665
-0.4654 z 2
1.1513
=
ll 1.702
-0.9576
0.4281 ·
-.0613
if rii =f. 0,
if Tii = 0,
where Tii is the ith diagonal element of R . It is straightforward to check that A is
orthonormal, and R' = AR is upper triangular with only nonnegative entries on its
diagonal. Therefore, Q' = QA H is orthonormal, and A = Q' R' is a QR decomposi-
tion of A such that every diagonal element of R is nonnegative. Thus, throughout
this section and the next , we assume the convention that the QR decomposition
has only nonnegative diagonals. If A is nonsingular, the QR decomposition may be
assumed to have only positive diagonals.
Proof. Let A= Q1R1 and A= Q2R2, where Q1 and Q2 are orthonormal and R1
and R 2 are upper triangular with all positive elements on the diagonals. Note that
AHA=R~R1 = R~R2,
which implies that each a; = bi, and since each ai and bi are positive, we must have
1
ai = bi for each i. Thus, R 1H2 = I, which implies R1 = R2· Since Q1 = AR!
1
1
and Q2 = AR.;- , it follows that Q1 = Q2. D
We then show that in the limit as k --+ oo, the subdiagonals converge to zero and
the diagonals converge to the eigenvalues of A.
(13.28)
We have
Ak+l = UkTk and Ak+i = ur AUk · (13.29)
Moreover, if A is nonsingular, then so is each Tk, and thus
and
(1·1·) ror
L'
eac h i,. we h ave 1.imk-+oo aii(k) = \
-"i·
we have that
i < j,
i = J, (13.32)
i > j as k -t oo,
Tk = AkRkRDkU -t AkRDkU,
Uk = QQkAJ; 1 -t QAJ; 1 ,
which holds by uniqueness of the QR decomposition. It follows that
A = ~ [~ ~].
By computing the QR iteration described above, we find that
.
}~Ak =
[20 1/3]
1 .
From this, we see that <J(A) = {1, 2}. In this special case the upper-triangular
portion converges, but this doesn't always happen.
542 Chapter 13. Iterative Methods
A== [~ !1] .
By computing the QR iteration described above, we find that
(-l)k]
1 .
From this, we see that a(A) = {-2, 1}. Note that the upper-triangular entry
does not converge and in fact oscillates between positive and negative one.
l
5 1
-4 12 -5
-4 01
-3
A= 10 -8 -4 3 .
-2 13 -8 -2
We can show that the eigenvalues of A are a(A) = {2, 1, 4 + 3i, 4 - 3i}. By
computing the QR iteration described above, we find that
l
9.8035 3.6791 1.6490 -18.30221
-11.6006 -1.8035 8.5913 0.7464
A 10000 = 0 0 2.0000 -1.2127 .
0 0 0 1.0000
13.5. *Computing Eigen valu es II 543
Thus we can see instantly that two of the eigenvalues are 2 and 1, but we have
to decompose the first block to get the other two eigenvalues. Since A 10000
is block upper triangular, the eigenvalues of each block are the eigenvalues of
the matrix (see Exercise 2.50), which is similar to A. Note that the block in
question is
B = [ 9.8035 3.6791 ]
-11.6006 -1.8035 '
which satisfies tr(B) = 8 and det(B) = 25. From here it is easy to see that the
eigenvalues of B are 4 ± 3i, and thus so are the eigenvalues of A 10000 and A.
Remark 13.4.10. Anytime we have a real matrix with a conjugate pair of eigen-
values, which is often, the condition (13.31) is not satisfied. In this case, instead of
the lower-triangular part of Ak converging, it will converge below the subdiagonal,
and the 2 x 2 block corresponding to the conjugate pair will oscillate. One way to
determine the eigenvalues is to compute the trace and determinant, as we did in
the previous example (see Exercise 4.3). The challenge arises when there are many
eigenvalues of the same modulus producing a large block. In the following section,
we show how to remedy this problem.
Remark 13.4.11. In practice this isn't a very good method for computing the
eigenvalues; it's too expensive computationally! The QR decomposition takes too
many operations for this method to be practical for large matrices. The number
of operations grows as a cubic polynomial in the dimension n of the matrix; thus,
when n is large, doubling the matrix increases the number of operations by a factor
of 8. And this is for each iteration! Since the number of QR iterations is likely to be
much larger than n (especially ifthere are two eigenvalues whose moduli are close),
the number of operations required by this method will have a leading order of more
than n 4 . In the following section we show how to vastly improve the computation
time and make QR iteration practical.
where au E JF, where a12 E JFn-l, and where A22 E Mn-1 (JF) . Let H1 E Mn-1 (JF)
denote the Householder matrix satisfying H 1a2 1 = v1e 1 E JFn- 1 , and let H 1 be the
block-diagonal matrix H 1 = diag(l, H1 ). Note that
Therefore, all entries below the subdiagonal in the first column of H 1 AH1 are zero.
For general k > 1, assume by the inductive hypothesis that we have
Here all the Hi are Hermitian and orthonormal, and Au E Mkxk-l(JF) is upper
Hessenberg. Also ak+l ,k E JFn-k, al,k E JFk , Aa,k+l E Mkxn-k, and Ak+ l,k+l E
Mn-k(JF) . Choose flk E Mn-k to be the Householder transformation such that
flkak+l,k = vke1 E JFn-k. Setting Hk = diag(h , Hk) gives
Continuing for all k < n shows that for Q = H 1H 2 · · · Hn-l the matrix QH AQ is
upper Hessenberg. D
Proof. The matrix A = [aij] satisfies aij = 0 whenever i + 1 > j , and the matrix
B = [bij] satisfies bij = 0 whenever i > j. The (i, k) entry of the product AB is
given by
n
(AB)ik = L Uijbjk· (13.33)
j=l
Assume i + 1 > k and consider aijbjk · If j < i + 1, then aij = 0, which implies
aijbjk = 0. If j 2:: i + 1, then since j>k, we have that bjk=O, which implies aijbjk = O.
In other words, aijbjk = 0 for all j, which implies that the sum (13.33) is zero. The
proof that the product BA is Hessenberg is Exercise 13.19. D
Definition 13.5.4. Assume that i < j. A Givens rotation of() radians between
coordinates i and j is a matrix operator G(i , j, ()) E Mn(IF) of the form
or, equivalently,
G(i,j,B) =
Ji-1
0
0
0
0
0
cos()
0
sin()
0
0
0
Ij - i- 1
0
0
0
- sin()
0
cos()
0
0
0
Ino-j
I . (13.35)
Remark 13.5.5. When a Givens rotation acts on a vector, it only modifies two
components. Specifically, we have
546 Chapter 13. Iterative Methods
Xi - l
Xi COS 8 - Xj sin 8
Xi+l
G(i,j, B)
Xj-l Xj-l
Xj XisinB+xjcose
Xj+l Xj+l
Xn Xn
(iii) G( i, j, B) is orthonormal.
Proof.
(ii) This follows by taking the transpose of (13.35) and realizing that the cosine
is an even function and the sine is an odd function.
k = 1, 2, .. . , n - 1,
Bk = Arctan
( -W-
hk~l,k) .
hk,k
(13.36)
(k+l)
h k+l,k T GTH
= ek+l k kek
= (eI+i - sinlheI - (1 - cos8k)ek+i) Hkek
= -eIHkeksin8k +eI+iHkekcosek
= -hi:k sin ek + hi~l,k cos ek
= 0,
Example 13 .5.9. Here we use the Arnoldi method to calculate the Ritz
eigenvalues of the matrix in Example 13.2.11. The square Hessenberg ma-
trices (Hk) and the Ritz eigenvalues corresponding to Arnoldi step k are as
follows:
Hk Ritz Eigenvalues
[8.75] 8.75
[ 8 75
5 . ~03
-5.472
-1.495
0.724
223?
-O.U4
0.672
l 3.416, 2.555 ± i0.977
r5.r3
8 75
-5.472
-1.495
2.230
-0.l~\4
-13311
0.301
3.000, 2.000, 2.000, 2.000
0.724 0.672 -0.253
0 0.163 1.074
Rem ark 13.5.10. When A E Mn(lF) is Hermitian, the Hessenberg matrices formed
by Arnoldi iteration are tridiagonal and symmetric. In this case, the Arnoldi method
is better understood theoretically, and is actually called the Lanczos method. In
addit ion to t he fact that A has orthogonal eigenspaces, the implementations are
able to st ore fewer numbers by taking advantage of the symmetry of A; and this
greatly reduces t he computation time.
Exercises
N ote to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with &. are especially important and are likely to be used later
in this book and beyond. Those marked with tare harder than average, but should
still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
Exercises 549
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
13.1. Let A E Mn(lF) have components A = [aiJ]· For each i, let Ri = Lj#i laiJ I
and let D(aii , Ri) be the closed disk centered at aii with radius Ri· Prove
that every eigenvalue A of A lies within at least one of the disks D(aii, Ri) ·
Hint : Since the eigenvalue equation can be written as L:7=l aijXj = .\xi , split
up the sum so that L #i aijXj = (.A - aii)Xi·
13.2. Assume A E Mn(lF) is strictly diagonally dominant. Prove A is nonsingular.
Hint: Use the previous exercise.
13.3. A matrix CE Mn(lF) is called an approximate inverse of A if r(I - CA) < 1.
Show that if C is an approximate inverse of A , then both A and C are
invertible, and for any x 0 E lFn and b E lFn, the map
~<a+c:<l.
b - b+c:
cn-
1
and Bos= [~ -D - B ]
D-
2
1 1
1 B ·
13.8. As an alternative proof that the matrix in (13.11) is nonsingular, show that
all of its eigenvalues are one. Hint: Use the spectral mapping theorem (see
Theorem 12.7.1) and the fact that all of the eigenvalues of a nilpotent matrix
are zero (see Exercise 4 .1) .
13.9. Assuming A E Mn(lF) is invertible, prove (13.9).
13.11 . Let
1 0 C1]
A= 0 1 C2 .
[0 0 1
Prove for any x 0 and b that GMRES converges to the exact solution after
two steps .
13.12. Prove that if rk is the kth residual in the GMRES algorithm for the system
Ax = b, then there exists a polynomial q E lF[z] of degree no more than k
such that rk = q(A)b.
13.13. Prove that if A = UHu - 1 , where H is square and properly Hessenberg
(meaning that the subdiagonal has no zero entries) , and where U E Mn(F )
has columns u 1, ... , Un, then span u1, . . . , Uj = Jtj(A, u1) for any j E
{l, . . .,n}.
by using QR iteration.
[~ Hl
13.15. What happens when QR iteration is applied to an orthonormal matrix? How
does this relate to Theorem 13.4.4?
13.16. Prove Lemma 13.4.2.
13.17. One way to speed convergence of QR iteration is to use shifting. Instead
of factoring Ak = QkRk and then setting Ak+l = RkQk, factor Ak - O"kf,
where the shift O"k is close to an eigenvalue of A. Show that for any O" E lF,
if QR = A = O"f is the QR decomposition of A - O"f, then RQ + O"f is
orthonormally similar to A.
2
1
2
4
ol1
[3 1 0
Exercises 551
13.19. Prove that the product BA of a Hessenberg matrix A E Mn(IF) and an upper
triangular matrix B E Mn (IF) is a Hessenberg matrix.
13.20. Prove Theorem 13.5.3.
13.21. Prove that Givens rotations G(i, j, e) satisfy the identity G(i, j, O)G(i, j , ¢) =
G(i,j,e + ¢).
13.22. Prove that Hk+l as defined in Theorem 13.5.7 is Hessenberg.
13.23. In the proof of Theorem 13.5.7, show that h~~i,~) = 0 for i = 1, 2, ... , k - l.
Notes
Our treatment of Krylov methods is based in part on [TB97] and [IM98].
Exercise 13.15 is from [TB97].
For details on the stability and computational complexity of the methods
described in this chapter, see [GVL13, CF13].
Spectra and
Pseudospectra
Recall from Section 7.5.4 that the eigenvalue problem can be ill conditioned; that
is, a very small change in a matrix can produce a relatively large change in its
eigenvalues. This happens when the eigenspaces are nearly parallel. By contrast
we saw in Corollary 7.5.15 that the eigenvalue problem for normal matrices is well
conditioned. In other words, when matrices have orthogonal or nearly orthogonal
eigenspaces, the eigenvalue problem is well conditioned, and when the eigenspaces
are far from orthogonal, that is, very nearly parallel, the eigenvalue problem is ill
conditioned.
When a problem is ill conditioned, two nearly indistinguishable inputs can
have very different outputs, thus calling into question the reliability of the solution
in the presence of any kind of uncertainty, including that arising from finite-precision
arithmetic. For example, if two nearly indistinguishable matrices have very different
eigenvalues and therefore have wildly different behaviors, then the results of most
computations involving those matrices probably cannot be trusted.
An important example of this occurs with the iterative methods described in
Chapter 13 for solving linear systems. In Section 13.l we described three iterative
methods for solving linear systems by taking powers of matrices. If the eigenvalues
of the matrices are contained in the open unit disk, then the method converges
(see Theorem 13.1.1). However, even when all the eigenvalues are contained in
the open unit disk , a problem can arise if one or more of the eigenvalues are very
nearly unit length and the corresponding eigenspace is nearly parallel to another
of its eigenspaces. In this case, it is possible that these matrices will be essen-
tially indistinguishable, numerically, from those that have eigenvalues larger than
one. Because of ill conditioning, these iterative methods can fail to converge in
practice, even when they satisfy the necessary and sufficient conditions for conver-
gence. Moreover, even when the iterative methods do converge, ill conditioning can
drastically slow their convergence.
This chapter is about pseudospectral theory, which gives tools for analyzing
how conditioning impacts results and methods that depend on the eigenvalues of
553
554 Chapter 14. Spectra and Pseudospectra
matrices. In the first section we define the pseudospectrum and provide a few
equivalent definitions. One of these definitions describes the pseudospectrum in
terms of the spectra of nearby operators, which gives us a framework for connecting
convergence to conditioning. Another equivalent definition uses the resolvent.
Recall that the poles of the resolvent correspond to the eigenvalues. The pseu-
dospectrum corresponds to the regions of the complex plane where the norm of
the resolvent is large but not necessarily infinite, indicating eigenvalues are nearby.
These regions give a lot of information about the behavior of these matrices in
computations.
In the second section, we discuss the transient behavior of matrix powers.
Consider the sequence (llAkll)kEN, generated by a matrix A E Mn(C). The sequence
converges to zero as k --+ oo if and only if the spectral radius r(A) of A is less than
one, but before it goes to zero, it may actually grow first. And it need not go to
zero very quickly. Since convergence of many iterative methods depends upon these
matrix powers approaching zero, it becomes important to understand not just what
happens for large values of k, but also what happens for small and intermediate
values of k. When does the sequence converge monotonically? And when does it
grow first before decaying?
In the second section we address these questions and provide upper and lower
bounds for the sequence of powers via the Kreiss matrix theorem. We also discuss
preconditioning, which is a way of transforming the matrices used in iterative meth-
ods into new matrices that have better behaved sequences with faster convergence.
These are useful when dealing with a poorly conditioned matrix with eigenvalues
close to the unit circle. The pseudospectrum gives insights into how to choose a
good preconditioner.
In the final section we prove the Kreiss matrix theorem by appealing to a key
lemma by Spijker. The proof of Spijker's lemma is the longest part of this proof.
In some sense the pseudospectra represent the possible eigenvalues of the ma-
trix when you throw in a little dirt. In mathematics we call these perturbations.
Ideally when we solve problems, we want the solutions to be robust to small pertur-
bations. Indeed, if your answer depends on infinite precision and no error, then it
probably has no connection to real-world phenomena and is therefore of little use.
Remark 14.1.3. We always have O'(A) C O'c:(A) for all c: > 0, since E = 0 trivially
satisfies llEll < c:.
A= [o.~Ol 1000]
1 '
which has eigenvalues {O, 2}. In Example 7.5 .16 we saw that if A was perturbed
by a matrix
E = [-o~OOl ~] '
the eigenvalues of A+ E became a double eigenvalue at l. Figure 14.l(a)
depicts the various c:-pseudospectra of A for c: = 10- 2 , 10- 2 · 5 , and 10- 3 . That
is, the figure depicts eigenvalues of matrices of the form A+ E, where E is a
matrix with 2-norm c:.
Theorem 14.1.6. Given A E Mn(C) and c: > 0, the following sets are equal:
for some v E lFn with JJvJJ = l. The vectors v are the c:-pseudoeigenvectors
of A corresponding to the c:-pseudoeigenvalues z E <C.
556 Chapter 14. Spectra and Pseudospectra
31 3i
2i 21
-i -1
-2i -2i
-3i'-'--~-~-~~-~~-~-~
-3 -2 -1 3 5 -3 -2 -1
(a) (b)
(14.3)
where R(A, z) is the resolvent of A evaluated at z.
Proof. We establish equivalence by showing that each set is contained in the other.
(i)c(ii) . Suppose (A+E)v = zv for some [[E[[ < c and some unit vector v. In this
case, [[(zl -A)v[[ =[[Ev[[::; [[E[[[[ v[[ < c.
(ii)c(iii). Assume (zl - A)v = su for some unit vectors v, u E en and 0 < s < c.
In this case, (zl - A)- 1 u = s- 1 v, which implies I (zJ - A)- 1 [[ 2 s - 1 > c 1 .
(iii)C(i) . If [[(zl-A)- 1 [[ > c 1 , then by definition of the norm of an operator, there
exists a unit vector u E en such that II (zl -A)- 1 u [[ > c 1 . Thus, there exists
a unit vector v E en and a positives< c such that (zl - A)- 1 u = s- 1v . In
this case, we have that su = (zl -A)v, which implies that (A+suvH)v = zv.
Setting E = suvH, it suffices to show that [[E[[ = s < c. But Exercise 4.3l(ii)
implies that
[[E[[ 2 = s 2 [[(vuH)(uvH)[[::; s 2 [[vvH[[ ::; s 2 ,
where the last inequality follows from the fact that t he largest singular value of
the matrix vvH is l. Thus, z is an eigenvalue of A + E for some
[[ E [[ <c. D
Remark 14.1. 7. Each of these equivalent definitions of the pseudospectrum has its
advantages. The definition (i) is well motivated, but perhaps more difficult to visual-
ize than t he more traditional definition (iii) in terms of resolvents, which shows that
a"e:(A) is the open subset of e bounded by the c- 1 level set of the normed resolvent.
14.1. The Pseudospectrum 557
Remark 14.1.8. Recall that the spectrum of an operator A is the locus where
the resolvent ll (zl - A) - 1 11 is infinite. The fact that definitions (i) and (iii) are
equivalent means that the locus where II (zl - A)- 1 I is large, but not necessarily
infinite, gives information about how the spectrum will change when the operator
is slightly perturbed.
A=
1+i
-i
[ 0.3i
0~5 ~
0.5 0.7
l·
Figure 14.2(a) shows a plot of the norm of the resolvent of A as a function
of z E C. The poles occur at the eigenvalues of A, and the points in the
plane where the plot is greater than c: - 1 form the c-pseudospectrum of A.
Figure 14.2(b) shows the level curves of llR(A, z)ll for the matrix A, cor-
responding to the points where llR(A,z)ll = c 1 for various choices of c.
For a given choice of c, the interior of the region bounded by the curve
llR(A, z)ll = C 1 is the c-pseudospectrum.
We can also give another equivalent form of the c-pseudospectrum, but unlike
the previous three definitions, this does not generalize to infinite-dimensional spaces
or to norms other than the 2-norm.
2i
3i/2
1.5
E
g
E 1
g: i/2
~
"'
.:::..g 0.5
"'
0
- i/2
-i
(a) (b)
0 0 0 0 0 0 0 0 0 3628800
1 0 0 0 0 0 0 0 0 -10628640
0 1 0 0 0 0 0 0 0 12753576
0 0 1 0 0 0 0 0 0 -8409500
0 0 0 1 0 0 0 0 0 3416930
A= 0 0 0 0 1 0 0 0 0 -902055
E M10(1R).
0 0 0 0 0 1 0 0 0 157773
0 0 0 0 0 0 1 0 0 -18150
0 0 0 0 0 0 0 1 0 1320
0 0 0 0 0 0 0 0 1 -55
-si~_~2~~~~~-~-1~0~12~~1•
-2 10 12 14
(a) (b)
(i)
1 1
ll( zI - A)- 11 > - -- -
- dist(z, er( A))
(14.5)
-1 1
ll(zI-A) II = dist(z,cr(A)) (14. 7)
Proof. If z E cr(A), then both ll (zI -A)- 1 11 = oo and dist(z,cr(A)) = 0 and all the
relations hold trivially. Thus, we assume below that z ¢:. cr(A).
(i) If Av = ,\v for some unit vector v E en, then (zI - A)v = (z - ,\)v. Thus,
(zI - A)- 1 v = (z - ,\)- 1 v, which implies that
1 1
ll(zI - A) - 111 > max = .
- >.EC7(A) dist(z, ,\) dist(z, cr(A))
560 Chapter 14. Spectra and Pseudospectra
(ii) Since II (zl - A) - 1 11 = llV(zl -D)- 1 v- 1 11 ::; llVll llV- 1 1111 (zl -D)- 1 11, it follows
that
1 K(V)
ll(zl - A) - 1 11::; K(V)ll(zl - D)- 1 11 :=:: K(V) max - -,
.XEa(A) 1Z - /\ 1 dist(z, cr(A)) ·
Remark 14.1.13. The condition number K(V) depends on the choice for the diag-
onalizing transformation V, which is not unique. Indeed, if we rescale the column
vectors of V (that is, the eigenvectors) with VA where A is a nonsingular diagonal
matrix, then VA still diagonalizes A, that is, (VA)D(VA)- 1 = (VA)DA- 1 v- 1 =
v vv- 1 = A, yet the condition number K(V A) is different and can be drastically so
in some cases. In fact, by making the eigenvectors in V sufficiently large or small,
we can make the condition number as large as we want to.
where K = K(V) is the condition number for V. In the special case that A is normal,
we have that
crc:(A) = cr(A) + B(O,c) = {z EC: lz - .Al < c for at least one>. E cr(A)}.
Remark 14.1.16. The Bauer- Fike theorem shows that the (absolute) condition
number of the eigenvalue problem for a diagonalizable A is bounded by the condi-
tion number K(V) for every V diagonalizing A, and therefore it is bounded by the
spectral condition number Ka(A).
Remark 14.1.17. The Bauer-Fike theorem says that the pseudospectrum is not
especially useful when A is normal because it can be constructed entirely from the
information about the spectrum alone.
14.2. Asymptotic and Transient Behavior 561
Proof.
(i) This follows from Exercise 12.15.
(ii) If r(A) > 1, then there exists .A E D"(A) such that I.A l > 1. Since l.A lk :::; llAkll
we have that llAkll --+ oo ask--+ oo.
(iii) See Exercises 14.9 and 14.10. D
Definition 14.2.2. For a matrix A E Mn(C) and any c: > 0, the c:-pseudospectral
radius re-(A) of A is given by
562 Chapter 14. Spectra and Pseudospectra
Proof. Fix c > 0. Choose zo so that lzol = rc(A). By continuity of the resolvent
and the norm, we have that ll(zol -A)- 1 11 2: c- 1 . Thus,
r (A) - 1
sup(lzl - l)ll(zl - A)- 1 11 2: c .
zEIC c
Since the supremum will never occur when lzl :::;; 1, we can restrict the domain and
write
r (A) -1
sup (lzl - l)ll(zl - A)- 1 11 2: c .
[z[>l c
Since this holds for all c > 0, we have
1 r (A) - 1
sup (lzl - l)ll(zl - A) - 11 2: sup c = K(A) .
[z[>l c>O c
To establish the other direction, fix lzl > l. Define c 1 = ll(zl -A)- 1 11 - Thus, z is
the boundary of uco· Hence, rc 0 (A) 2~ lzi. This yields
Remark 14.2.4. The Kreiss constant is only useful when r(A) :::; 1 because K(A) =
oo whenever r(A) > 1, as can be seen from (14.10).
K(A) = sup
ro:(A) - 1 r(A) +E - 1
= sup :S 1.
o:>O E o:>O E
Moreover,
. r(A)
1lm
+E - 1
= 1,
c-+oo E
which implies that K(A) = 1. D
We are now prepared to give a lower bound on the transient behavior of llAkll·
Lemma 14.2.6. If A E Mn(IF) , then
Proof. Let M = supk l Ak l · If M = oo, the result follows trivially. We now assume
Mis finite, and thus llAll :S 1. Choose z E C such that lzl > 1. By Theorem 12.3.8,
we can write
M
lzl -1 ·
Thus,
(lzl - l)ll(zI - A)- 1 11 :SM.
Since z was arbitrary, we may take the supremum of the left side to get the desired
result. D
We are now prepared to state the Kreiss matrix theorem. The proof is given
in Section 14.3.
Remark 14.2.8. The Kreiss matrix theorem gives both upper and lower bounds
on the transient behavior of l Ak II · When the Kreiss constant is greater than one,
it means that the sequence (llAkll)~ 0 grows before decaying back to zero. If the
Kreiss constant is large, then the transient phase is nontrivial and it will take many
iterations before Ak converges to zero. This means that iterative methods may take
a while to converge. By contrast, if the Kreiss constant is close to one, then the
transient phase should be relatively brief by comparison and convergence should be
relatively fast.
564 Chapter 14. Spectra and Pseudospectra
Remark 14.2.9. The original statement of the Kreiss theorem did not actually
look much like the theorem above. The original version of the right-hand inequality
in (14.12), proven by Kreiss in 1962, was supkEN IJAkJI ::; CK(A), where C,....., cnn.
Over time the bound has been sharpened through a series of improvements.
The current bound is sharp in the following sense. Although there may not be
matrices for which equality is actually attained, the inequality is the best possible
in the sense that if (JJAkJl)kEN::; Cn°'K(A) for all A, then a can be no smaller than
one. Similarly the factor e is the best possible, since if (llA~llhE N ::; Cn°'K(A) and
a = 1, then C can be no smaller thane. For a good historical survey and a more
complete treatment of the Kreiss matrix theorem, see [TE05].
vn+l = G(t:.tL1:::.t)vn ,
14.2.3 Preconditioning
In Section 13.1 we describe three iterative methods for solving linear systems; that
is, Ax = b, where A E Mn(lF) and b E lFn are given. The iterative methods are
of the form Xk+l = Bxk + c for some B E Mn(lF) and c E lFn . A necessary and
sufficient condition for convergence is that the eigenvalues of B be contained in the
open unit disk (see Theorem 13.1.1) . If for a given xo the sequence converges to
some x00 , the limit sat isfies x00 = Bx00 + c, which means that the error terms
ek = Xk - x 00 satisfy
or, equivalently, ek+l = Bk(xo - x00 ). In other words, llek+ill :S llBkllllxo - xoo ll ·
However, if l Bk ll exhibits transient behavior, the convergence may take a long time
to be realized , thus rendering the iterative method of little or no use . Therefore,
for this method to be useful, not just the eigenvalues of B must be contained in the
unit disk, but also the pseudospectrum must be sufficiently well behaved.
But even if the pseudospectrum of A is not well behaved, not all hope is lost. An important technique is to precondition the system Ax = b by multiplying both sides by a nonsingular matrix M⁻¹ so that the resulting linear system M⁻¹Ax = M⁻¹b has better spectral and transient behavior for its corresponding iterative method.

Of course M = A is the ultimate preconditioner: the resulting linear system is x = A⁻¹b, which is the solution. However, if we knew A⁻¹, we wouldn't be having this conversation. Also, the amount of work involved in computing A⁻¹ is more than the amount of work required to solve the system. So our goal is to find a preconditioner M⁻¹ that's close to the inverse of A but is easy to compute and leaves the resulting iterative method with good pseudospectral properties.

If we write A = M − N, then M⁻¹A = I − M⁻¹N, and the iterative method of Section 13.1 then becomes

$$x_{k+1} = M^{-1}Nx_k + M^{-1}b.$$
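As a concrete sketch of this splitting iteration (the Jacobi choice M = diag(A) and the test matrix below are illustrative assumptions of ours, not prescribed by the text), note that x_{k+1} = M⁻¹Nx_k + M⁻¹b can be rewritten in residual-update form as x_{k+1} = x_k + M⁻¹(b − Ax_k):

```python
import numpy as np

def split_iteration(A, b, M, x0=None, maxiter=200, tol=1e-12):
    """Run x_{k+1} = x_k + M^{-1}(b - A x_k), the splitting iteration
    for A = M - N written in residual-update form."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(maxiter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        x = x + np.linalg.solve(M, r)
    return x, maxiter

# Jacobi preconditioning: M = diag(A) is trivial to invert.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x, iters = split_iteration(A, b, M=np.diag(np.diag(A)))
assert np.allclose(A @ x, b)
```

The quality of the preconditioner shows up directly in the iteration matrix B = I − M⁻¹A: the closer M⁻¹ is to A⁻¹, the smaller the spectral radius of B and, by the preceding discussion, the better behaved its powers should be.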
Lemma 14.3.2. Given A ∈ M_n(ℂ), let r(z) = u^H R(z)v for some unit vectors u and v, where R(z) is the resolvent of A. If Γ = {z ∈ ℂ : |z| = 1 + (k+1)⁻¹} for some k ∈ ℕ, then

$$\sup_{z \in \Gamma} |r(z)| \le (k+1)K(A). \tag{14.14}$$

Proof. Using Exercise 4.33, we note that |r(z)| ≤ ‖R(z)‖. Hence,

$$K(A) = \sup_{|z|>1} (|z|-1)\|R(z)\| \ge \sup_{|z|>1} (|z|-1)|r(z)| \ge \sup_{z \in \Gamma} (|z|-1)|r(z)| = \sup_{z \in \Gamma} (k+1)^{-1}|r(z)|. \qquad ∎$$
Now we can state and prove a lemma that will give the right-hand side of (14.12).

Lemma 14.3.3. If A ∈ M_n(ℂ), then

$$\sup_{k \in \mathbb{N}} \|A^k\| \le enK(A). \tag{14.15}$$

Proof. Fix k ∈ ℕ, and let Γ = {z ∈ ℂ : |z| = 1 + (k+1)⁻¹}. Writing A^k as a contour integral of the resolvent, we have

$$A^k = \frac{1}{2\pi i}\oint_\Gamma z^k R(z)\,dz,$$

and integrating by parts gives

$$A^k = \frac{-1}{2\pi i(k+1)}\oint_\Gamma z^{k+1} R'(z)\,dz.$$

On Γ we have

$$|z|^{k+1} = \Bigl(1 + \frac{1}{1+k}\Bigr)^{k+1} < e.$$

Hence, for unit vectors u and v, with r(z) = u^H R(z)v,

$$|u^H A^k v| \le \frac{1}{2\pi(k+1)}\oint_\Gamma |z|^{k+1}|r'(z)|\,|dz| \le \frac{e}{2\pi(k+1)}\oint_\Gamma |r'(z)|\,|dz| \le \frac{en}{k+1}\sup_{z\in\Gamma}|r(z)| \le enK(A),$$

where the third inequality is Spijker's lemma (proved below) and the last is Lemma 14.3.2. Using Exercise 4.33, this gives ‖A^k‖ ≤ enK(A). Since this holds for all k ∈ ℕ, we have (14.15). ∎
Proof. Let p(z) = a_nz^n + a_{n−1}z^{n−1} + ⋯ + a₁z + a₀, and assume z(t) = ρe^{it}. Note that \overline{z(t)} = ρe^{−it} = ρ²z(t)⁻¹. Thus,

$$\overline{p(z(t))} = \sum_{i=0}^{n} \overline{a_i}\,\rho^{2i} z(t)^{-i}.$$

It follows that z(t)^n\overline{p(z(t))} agrees on S with a polynomial in z(t) of degree at most n. ∎

Lemma 14.3.5. Let f(t) = r(z(t)), where z(t) = ρe^{it} parametrizes the circle S. Then

$$\oint_S |r'(z)|\,|dz| = \int_0^{2\pi} |f'(t)|\,dt. \tag{14.16}$$
Proof. Since f′(t) = ρie^{it}r′(ρe^{it}), we have |f′(t)| = ρ|r′(ρe^{it})|, and since z = ρe^{it} we have that |dz| = ρ dt. Thus,

$$\oint_S |r'(z)|\,|dz| = \int_0^{2\pi} \frac{|f'(t)|}{\rho}\,\rho\,dt = \int_0^{2\pi} |f'(t)|\,dt. \qquad ∎$$
Lemma 14.3.6. Let g′(t) = |f′(t)|cos ω(t) and h′(t) = |f′(t)|sin ω(t) be the real and imaginary parts of f′(t), respectively. For each fixed t, the following equality holds:

$$\int_0^{2\pi} \bigl|g'(t)\cos\theta + h'(t)\sin\theta\bigr|\,d\theta = 4|f'(t)|. \tag{14.17}$$

Proof. For fixed t we have g′(t)cos θ + h′(t)sin θ = |f′(t)|cos(θ − ω(t)), and integrating |cos(θ − ω(t))| over a full period gives 4, so the integral equals 4|f′(t)|. ∎
Lemma 14.3.7. For fixed θ ∈ [0, 2π], define F_θ(t) = g(t)cos θ + h(t)sin θ. We have

$$\int_0^{2\pi} |F_\theta'(t)|\,dt \le 4n \sup_{t \in [0,2\pi]} |F_\theta(t)|. \tag{14.18}$$
Proof. Since F_θ′(t) = g′(t)cos θ + h′(t)sin θ is derived from a nontrivial rational function, it has finitely many distinct roots 0 ≤ t₀ < t₁ < ⋯ < t_{k−1} < 2π. Write t_k = 2π. Thus, |F_θ′(t)| > 0 on each interval (t_{j−1}, t_j) for j = 1, 2, ..., k. If F_θ′(t) > 0 on (t_{j−1}, t_j), then

$$\int_{t_{j-1}}^{t_j} |F_\theta'(t)|\,dt = \int_{t_{j-1}}^{t_j} F_\theta'(t)\,dt = F_\theta(t_j) - F_\theta(t_{j-1}) = |F_\theta(t_j) - F_\theta(t_{j-1})|,$$

and similarly if F_θ′(t) < 0 on (t_{j−1}, t_j). Thus,

$$\int_0^{2\pi} |F_\theta'(t)|\,dt = \sum_{j=1}^{k} \Bigl|\int_{t_{j-1}}^{t_j} F_\theta'(t)\,dt\Bigr| = \sum_{j=1}^{k} |F_\theta(t_j) - F_\theta(t_{j-1})| \le 2k \sup_{t \in [0,2\pi]} |F_\theta(t)|.$$

Recall that z(t) = ρe^{it}. Multiplying both sides of the equation F_θ′(t) = 0 by z^n q(z)\overline{q(z)} turns it into a polynomial equation in z. Since q(z) is nonzero on S and, by Lemma 14.3.4, the right-hand side is a polynomial on S of degree at most 2n, we can conclude that F_θ′(t) has at most 2n roots. Therefore k ≤ 2n. ∎
By the Cauchy–Schwarz inequality,

$$|F_\theta(t)| = \left|\begin{bmatrix} g(t) & h(t) \end{bmatrix}\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}\right| \le \sqrt{g(t)^2 + h(t)^2}\,\sqrt{\cos^2\theta + \sin^2\theta} = |f(t)|.$$
Now we have all the pieces needed for an easy proof of Spijker's lemma.
$$\begin{aligned}
\oint_S |r'(z)|\,|dz| &= \int_0^{2\pi} |f'(t)|\,dt \\
&= \frac{1}{4}\int_0^{2\pi}\Bigl(\int_0^{2\pi} |g'(t)\cos\theta + h'(t)\sin\theta|\,d\theta\Bigr)dt \\
&= \frac{1}{4}\int_0^{2\pi}\Bigl(\int_0^{2\pi} |F_\theta'(t)|\,dt\Bigr)d\theta \\
&\le n\int_0^{2\pi}\Bigl(\sup_{t\in[0,2\pi]} |F_\theta(t)|\Bigr)d\theta \\
&\le 2\pi n \sup_{z\in S} |r(z)|. \qquad ∎
\end{aligned}$$
Exercises
Note to the student: Each section of this chapter has several corresponding
exercises, all collected here at the end of the chapter. The exercises between the
first and second line are for Section 1, the exercises between the second and third
lines are for Section 2, and so forth.
You should work every exercise (your instructor may choose to let you skip
some of the advanced exercises marked with *). We have carefully selected them,
and each is important for your ability to understand subsequent material. Many of
the examples and results proved in the exercises are used again later in the text.
Exercises marked with ▲ are especially important and are likely to be used later in this book and beyond. Those marked with † are harder than average, but should still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
14.1. Given A ∈ M_n(𝔽), show that σ_{ε₁}(A) ⊂ σ_{ε₂}(A) whenever 0 < ε₁ < ε₂.
14.2. Prove the following: Given A ∈ M_n(ℂ), there exist γ > 0 and M ≥ 1 such that

$$\|A^k\| \le M\gamma^k \quad\text{for all } k \in \mathbb{N}.$$

Hint: Given ε > 0, consider (r(A) + ε)⁻¹A and use Exercise 12.15.
14.3. Given A ∈ M_n(𝔽), show that for any z ∈ ℂ and ε > 0, we have

14.4. Prove Proposition 14.1.10; that is, show the equivalence of (14.4) and (14.2).
14.5. For any A ∈ M_n(ℂ), any nonzero c ∈ ℂ, and any ε > 0, prove the following:

(i) σ_ε(A^H) = \overline{σ_ε(A)} = {z̄ | z ∈ σ_ε(A)}.

(ii) σ_ε(A + cI) = σ_ε(A) + c.
Let

$$A = \begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}.$$

(i) Use induction to prove that

$$A^k = \begin{bmatrix} 1 & ck \\ 0 & 1 \end{bmatrix}.$$

(ii) Use Exercise 3.28 to show that ‖A^k‖₂ → ∞ as k → ∞. Hint: It is trivial to compute the 1-norm and ∞-norm of A^k.
14.9. Given A ∈ M_n(ℂ), assume that r(A) = 1, but that the eigennilpotent D_λ is zero for all λ such that |λ| = 1. Prove one direction of Theorem 14.2.1(iii) by completing the following steps:

(i) Use the spectral decomposition to write

$$A = \sum_{|\lambda|=1} \lambda P_\lambda + B,$$

where r(B) < 1 and P_λB = BP_λ = 0 for all λ ∈ σ(A) satisfying |λ| = 1.

(ii) Show that A^k = Σ_{|λ|=1} λ^kP_λ + B^k.

(iii) Use the triangle inequality to show that ‖A^k‖ is bounded in k.

$$\|A^k\| \le k\Bigl(1 + \frac{1}{k}\Bigr)^k K(A) \le keK(A).$$

Since this grows without bound, this isn't a very useful bound. It's for this reason that the integration-by-parts step in Lemma 14.3.3 is so critical.
$$\|A^k\| \le \frac{2en^2}{(k+1)(k+2)} \sup_{z \in \Gamma} |r(z)|.$$
Notes
Much of our treatment of pseudospectra was inspired by [Tre92, Tre97, TE05], and
Exercise 14.5 is from [TE05]. All figures showing contour plots of pseudospectra
were created using Eigtool [Wri02].
For more about the spectral condition number and the choice of V in (14.6)
and in the Bauer-Fike theorem (Corollary 14.1.15), see [Dem83, Dem82, JL97].
Finally, for a fascinating description of the history and applications of Spijker's
lemma, we strongly recommend the article [WT94].
Rings and Polynomials

One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
-Sauron
Ring theory has many applications in computing, counting, cryptography, and communications. Rings are sets with two operations, "+" and "·" (usually called addition and multiplication, respectively). In many ways rings are like vector spaces, and much of our treatment of the theory of rings will mirror our treatment of vector spaces at the beginning of this book. The prototypical ring is the set ℤ with addition and multiplication as the two operations. But there are many other interesting and useful examples of rings, including 𝔽[x], M_n(𝔽), C(U; 𝔽), and ℬ(X) for any normed linear space X. Ring theory shows that many results that are true for the integers also apply to these rings.
We begin with a survey of the basic properties of rings, with a special focus
on the similarities between rings and vector spaces. We then narrow our focus to
a special class of rings called Euclidean domains, and to quotients of Euclidean
domains. These include rings of polynomials in one variable and the ring of all
the matrices that can be formed by applying polynomials to one given matrix. A
Euclidean domain is a ring in which the Euclidean algorithm holds, and we show
in the third section that this also implies the fundamental theorem of arithmetic
(unique prime factorization) holds in these rings. Applying this to the ring of
polynomials, we get an alternative form of the fundamental theorem of algebra.
In the fourth section we treat homomorphisms, which are the ring-theoretic analogue of linear transformations. We show there are strong parallels between maps of rings (homomorphisms) and maps of vector spaces (linear transformations). Again the kernel plays a fundamental role, and the first and second isomorphism theorems for rings both hold.⁵¹

⁵¹There is also a third isomorphism theorem both for rings and for vector spaces. We do not cover that theorem here, but you can find it in almost any standard text on ring theory.
One of the most important results of this chapter is the Chinese remainder
theorem (CRT), which we prove in Section 15.6. It is a powerful tool with many
applications in both pure and applied mathematics. In the remainder of the chapter
we focus primarily on the important implications of the CRT for partial fraction
decomposition, polynomial interpolation, and spectral decomposition of operators.
We conclude the chapter by describing a remarkable connection between Lagrange
interpolants and the spectral decomposition of a matrix (see Section 15.7.3).
Remark 15.1.2. A ring is like a vector space in many ways. The main differences are in the nature of the multiplication. In a vector space we multiply by scalars that come from a field, outside the vector space. But in a ring we multiply by elements inside the ring.
Example 15.1.3. Even if you have seen rings before, you should familiarize
yourself with the following examples, since most of them are very common in
mathematics and will arise repeatedly in many different contexts.
(ii) The rationals ℚ, the reals ℝ, and the complex numbers ℂ each form a ring (with the usual operations of addition and multiplication).
(iii) Fix a positive integer n. The set ℤ_n = {[[0]], ..., [[n − 1]]} of equivalence classes mod n, as described in Example A.1.18(i), forms a ring with operations ⊕ and ⊙, as described in Examples A.2.16(i) and A.2.16(ii).

(iv) The set C^∞((a, b); 𝔽) of smooth functions from (a, b) to 𝔽 forms a ring, where + is pointwise addition (f + g)(x) = f(x) + g(x) and · is pointwise multiplication (f · g)(x) = f(x)g(x).
(v) Given a vector space X, the set ℬ(X) of bounded linear operators on X is a ring, where + is the usual addition of operators and · is composition of operators. In particular, the set of n × n matrices M_n(𝔽) is a ring, with the usual matrix addition and matrix multiplication.

(vi) Given a set S, the power set 𝒫(S) forms a ring, where + is the symmetric difference and · is intersection:

$$A + B = (A \cup B) \smallsetminus (A \cap B) \quad\text{and}\quad A \cdot B = A \cap B.$$
(vii) The set {True, False} of Boolean truth values forms a ring, where + is the operation of exclusive OR (XOR); that is, a + b = True if and only if exactly one of a and b is True. Multiplication · is the operation AND. The additive identity 0 is the element False, and the additive inverse of any element is itself: −a = a.

Again, you should be aware that many people use + to denote inclusive OR, but inclusive OR will not satisfy the axioms of a ring.
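A quick machine check of this structure can be reassuring. The following minimal sketch (Python here purely for illustration; the helper names are ours) models + as XOR and · as AND and spot-checks one of the ring axioms over all triples:

```python
from itertools import product

add = lambda a, b: a != b   # exclusive OR plays the role of +
mul = lambda a, b: a and b  # AND plays the role of .

# Distributivity a*(b + c) == a*b + a*c over all eight Boolean triples.
assert all(
    mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
    for a, b, c in product([False, True], repeat=3)
)
# Every element is its own additive inverse: a + a == False (the zero).
assert all(add(a, a) is False for a in [False, True])
```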
Remark 15.1.4. Note that the definition of a ring requires addition to be commutative but does not require (or forbid) that multiplication be commutative. Perhaps the most common example of a ring with noncommutative multiplication is the ring M_n(𝔽).

Definition 15.1.5. A ring R is commutative if ab = ba for all a, b ∈ R.
Example 15.1.6.
Since R[x] is itself a ring, we may repeat the process with a new variable
y to get that R[x, y] = R[x][y] is also a ring, as is R[x, y, z], and so forth.
Nota Bene 15.1.7. As with vector spaces, a subtle point that is sometimes missed, because it is not a separate item in the numbered list of axioms, is that the definition of an operation requires that addition and multiplication take their values in R. That is, R must be closed under addition and multiplication.
Unexample 15.1.8.

(i) The natural numbers ℕ with the usual operations of + and · do not form a ring because every nonzero element fails to have an additive inverse.

(ii) The odd integers 𝕆 with the usual operations + and · do not form a ring because the operation + takes two odd integers and returns an even integer. That is, + is not an operation on odd integers, since an operation on 𝕆 is a function 𝕆 × 𝕆 → 𝕆, but the range of + is not contained in 𝕆. Instead of saying that + is not an operation on 𝕆, many people say the set 𝕆 is not closed under addition.
(iii) 0 · x = 0.
Proof. The proof is identical to the case of vector spaces (Proposition 1.1.7), except for (iv), and even in that case it is similar. To see (iv), note that for each x, y ∈ R, we have 0 = 0 · x = (y + (−y)) · x = y · x + (−y) · x. Hence, (ii) implies that (−y) · x = −(y · x). A similar argument shows that −(y · x) = y · (−x). ∎
Definition 15.1.11. Let u in a ring R have the property that for every r ∈ R we have ur = ru = r. In this case u is called unity and is usually denoted 1.

Proof. Suppose that R contains two elements u and u′ such that ur = ru = r and u′r = ru′ = r for every r ∈ R. We have u = uu′ by the unity property of u′, but also uu′ = u′ by the unity property of u. Hence u = u′. ∎
Multiplicative inverses are also not required in a ring, but when they exist, they are very useful.

Proof. If there exist elements b, b′ ∈ R such that ab = 1 and b′a = 1, then we have b′ = b′ · 1 = b′(ab) = (b′a)b = 1 · b = b. ∎

Example 15.1.15. In the ring ℤ, the only invertible elements are 1 and −1, but in the ring ℚ every nonzero element is invertible.
15.1.2 Ideals

Roughly speaking, an ideal is to a ring what a vector subspace is to a vector space. But this analogy is incomplete because an ideal is not just a subring (a subset that is also a ring); it must satisfy some additional conditions that are important for building quotients of rings.⁵²

Remark 15.1.17. Of course the first condition is just saying that I is closed under addition and subtraction, but the second condition is new to us: I must be closed under multiplication by any ring element. Roughly speaking, this is analogous to the condition that vector subspaces should be closed under scalar multiplication, but here the analogues of scalars are elements of the ring R.

⁵²It is common to call an invertible element of a ring a unit, but this term is easily confused with unity, so we prefer not to use it.
Proof. From the definition of ideal, we have that I is closed under addition and multiplication, and hence those operations are operations on I. The properties of associativity for + and ·, as well as commutativity of + and distributivity, all follow immediately from the fact that they hold in the larger ring R.

All that remains is to check that the additive identity 0 is in I and that every element of I has an additive inverse in I. But these both follow by closure under subtraction. First, for any x ∈ I we have 0 = x − x ∈ I. Now since 0 ∈ I, we have −x = 0 − x ∈ I. ∎
Remark 15.1.21. Example 15.1.19 shows that the converse to the previous propo-
sition is false: not every subring is an ideal.
The next proposition is immediate, and in fact it may even seem more difficult
than checking the definition directly, but it makes many proofs much cleaner; it also
makes the similarity to vector subspaces clearer.
Example 15.1.24.

(iii) For any p ∈ ℝ, the set m_p = {f ∈ C^∞(ℝ; ℝ) | f(p) = 0} of all functions that vanish at p is an ideal in the ring C^∞(ℝ; ℝ).
Unexample 15.1.25.

(i) The set 𝕆 of odd integers is not an ideal in ℤ because it is not closed under subtraction or addition.

(ii) The only ideals of ℚ are {0} and ℚ itself. Any ideal I ⊂ ℚ that is not {0} must contain a nonzero element x ∈ I. But since x ≠ 0, we also have 1/x ∈ ℚ. Let s be any element of ℚ, and take r = s/x ∈ ℚ. Closure under multiplication by elements of ℚ implies that s = (s/x)x = rx ∈ I. Hence I = ℚ.
$$(S) = \Bigl\{\textstyle\sum_{i=1}^{m} c_i x_i \;\Big|\; m \in \mathbb{N},\; c_i \in R,\; x_i \in S\Bigr\}. \tag{15.1}$$
Proof. If x, y ∈ (S), then there exists a finite subset {x₁, ..., x_m} ⊂ S such that x = Σ_{i=1}^m c_ix_i and y = Σ_{i=1}^m d_ix_i for some coefficients c₁, ..., c_m and d₁, ..., d_m (some possibly zero). Since ax + by = Σ_{i=1}^m (ac_i + bd_i)x_i is an R-linear combination of elements of S, it follows that ax + by is contained in (S). Since R is commutative, we also have xa + yb = ax + by ∈ (S); hence, (S) is an ideal of R. ∎
Corollary 15.1.28. Let R be a commutative ring. Let {I_j}_{j=1}^n be a finite set of ideals in R. The sum I = I₁ + ⋯ + I_n = {Σ_{j=1}^n a_j | a_j ∈ I_j} = (I₁ ∪ I₂ ∪ ⋯ ∪ I_n) is an ideal of R.
Example 15.1.30.

(i) The ideal (2) in the ring ℤ is the set (2) = {..., −4, −2, 0, 2, 4, ...} of even integers. More generally, for any d ∈ ℤ the ideal (d) is the set (d) = {..., −2d, −d, 0, d, 2d, ...} of all multiples of d.

(ii) For any ring R and x ∈ R, the ideal (x) is the set of all multiples of x.

(iii) The ideal (6, 9) in ℤ is the set of all ℤ-linear combinations of 6 and 9. Since 3 = 9 − 6, we have 3 ∈ (6, 9), which implies that (3) ⊂ (6, 9). But since 6 and 9 are both elements of (3), we also have (6, 9) ⊂ (3), and hence (6, 9) = (3).

(iv) In the polynomial ring ℂ[x, y], the ideal (x, y) is the set of all polynomials whose constant term is zero.
(v) In the polynomial ring ℂ[x, y] the ideal I = (x² − y + 1, 2y − 2) contains the element x² − y + 1 + ½(2y − 2) = x², and also the element ½(2y − 2) = y − 1, so (x², y − 1) ⊂ I; but also x² − y + 1 = x² − (y − 1) and 2y − 2 = 2(y − 1), so we have I = (x², y − 1).
Example 15.1.32. In a commutative ring R with unity 1, for any invertible element u ∈ R, the ideal (u) generated by u is the entire ring; that is, (u) = R. To see this, note that for any element r ∈ R, we have r = (ru⁻¹)u ∈ (u).
Nota Bene 15.1.33. When talking about elements of a ring, parentheses are still sometimes used to specify order of operations, so you will often encounter confusing notation like (x² − (y − 1)). If (x² − (y − 1)) is supposed to mean an element of R, then (x² − (y − 1)) = x² − y + 1, but if it is supposed to be an ideal, then (x² − (y − 1)) = (x² − y + 1) means the ideal generated by x² − y + 1, that is, the set of all elements of R that can be written as a multiple of x² − y + 1. The meaning should be clear from the context, but it takes some getting used to, so you'll need to pay careful attention to identify the intended meaning. Although this notation is confusing and is certainly suboptimal, it is the traditional notation, so you should become comfortable with it.
The proofs of the remaining results in this section are all essentially identical to their vector space counterparts in Section 1.5. The details are left as exercises.

Proposition 15.1.35. Supersets of generating sets are also generating sets; that is, if R is a commutative ring, if (S) = I, and if S ⊂ S′ ⊂ I, then (S′) = I.
Proposition 15.1.37. If I₁, ..., I_n are ideals in a commutative ring R, then the product of these ideals, defined by

$$\prod_{i=1}^{n} I_i = \Bigl\{\sum_{j=1}^{m} \Bigl(\prod_{i=1}^{n} a_{ji}\Bigr) \;\Big|\; m \in \mathbb{N},\; a_{ji} \in I_i\Bigr\},$$

is an ideal of R and a subset of ∩_{i=1}^n I_i.
The canonical example of a Euclidean domain is the integers ℤ with the absolute value v(x) = |x| as the valuation, but another important example is the ring of polynomials in one variable over a field.⁵³

Definition 15.2.2. Define a valuation on the ring 𝔽[x] of polynomials with coefficients in 𝔽 by v(p(x)) = deg p(x), where the degree deg p(x) of p(x) = Σ_{i=0}^n a_ix^i is the greatest integer i such that a_i is not zero. For convenience, also define deg(0) = −∞.

⁵³Almost everything we do in this chapter with the ring 𝔽[x] will work just as well with F[x], where F is any field, not just ℝ and ℂ. For a review of fields see Appendix B.2.
Proof. First, observe that 𝔽 has no zero divisors; that is, if α, β ∈ 𝔽 are both nonzero, then the product αβ is also nonzero. To see this, assume that αβ = 0. Since α ≠ 0, it has an inverse, so β = (α⁻¹α)β = α⁻¹(αβ) = α⁻¹ · 0 = 0, a contradiction. So the product of any two nonzero elements is not zero.

For (i), if a and b are nonzero, then writing out the polynomials a = a₀ + a₁x + ⋯ + a_mx^m and b = b₀ + b₁x + ⋯ + b_nx^n, with b_n and a_m both nonzero, so that deg(a) = m and deg(b) = n, we have

$$ab = a_0b_0 + (a_0b_1 + a_1b_0)x + \cdots + a_mb_nx^{m+n},$$

and since a_mb_n ≠ 0, it follows that deg(ab) = deg(a) + deg(b) = m + n.
Theorem 15.2.4. The ring 𝔽[x] is a Euclidean domain with valuation given by the degree of the polynomial, v(p(x)) = deg p(x).

Proof. First observe that 𝔽[x] has no zero divisors because if a, b ∈ 𝔽[x] are both nonzero, then deg(ab) = deg(a) + deg(b) > −∞ = deg(0).

Given a ∈ 𝔽[x] and any nonzero b ∈ 𝔽[x], let S = {a − bq | q ∈ 𝔽[x]}. If 0 ∈ S, then the proof is done, since there is a q such that a = bq. If 0 ∉ S, let D = {deg(a − bq) | (a − bq) ∈ S} ⊂ ℕ be the set of degrees of elements of S. By the well-ordering axiom of the natural numbers (Axiom A.3.3), D has a least element d. Let r be some element of S with deg(r) = d, so r = a − bq for some q ∈ 𝔽[x].

We claim d = deg(r) < deg(b). If not, then b = b₀ + b₁x + ⋯ + b_nx^n and r = r₀ + r₁x + ⋯ + r_mx^m with m ≥ n. But now let r′ = r − (r_m/b_n)x^{m−n}b ∈ S, so that the degree-m term of r′ cancels. We have deg(r′) < deg(r), a contradiction. ∎
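The division property proved here is exactly what numpy's polynomial division computes. A small sketch (the particular polynomials are arbitrary examples of ours):

```python
import numpy as np

# Divide a = x^3 - 2x + 5 by b = x + 1 in Q[x]: a = b*q + r with deg(r) < deg(b).
a = [1.0, 0.0, -2.0, 5.0]
b = [1.0, 1.0]
q, r = np.polydiv(a, b)                      # q = x^2 - x - 1, r = 6
assert np.allclose(np.polyadd(np.polymul(b, q), r), a)
```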
The degree function also gives a good way to characterize the invertible elements of 𝔽[x].

Proof. For any invertible f ∈ 𝔽[x], we have 0 = deg(1) = deg(ff⁻¹) = deg(f) + deg(f⁻¹), which implies that deg(f) = 0. Conversely, if deg(f) = 0, then f = a₀ ∈ 𝔽 and f ≠ 0, so f⁻¹ = a₀⁻¹ ∈ 𝔽 ⊂ 𝔽[x]. ∎
Proof. Let S = {n ∈ ℕ | ∃ i ∈ I ∖ {0}, n = v(i)} be the image of the valuation map v : I ∖ {0} → ℕ. By the well-ordering axiom of the integers, the set S must have a least element u, and there must be some element d ∈ I ∖ {0} such that v(d) = u.

Let (d) ⊂ R be the ideal generated by d. Since d ∈ I, we have (d) ⊂ I. Given any i ∈ I, apply the division property (Definition 15.2.1(iv)(a)) to get i = qd + r for some r with v(r) < v(d) = u. Since d, i ∈ I, we must have r ∈ I, but v(r) < u contradicts the minimality of u unless r = 0. Therefore, i = qd and i ∈ (d). This proves that I = (d). ∎
(ii) The element d divides both a and b, and any element d′ with d′|a and d′|b must satisfy d′|d.

Moreover, any element that satisfies one of these properties must satisfy the other property, and if elements d, e satisfy these properties, then d = ue, where u is an invertible element of R.
Proof. By the previous proposition, the ideal (a, b) is generated by a single element d, and because d ∈ (a, b), we have d = ax + by. In fact, an element c can be written as c = ax′ + by′ for some x′, y′ ∈ R if and only if c ∈ (a, b). Moreover, any c ∈ (a, b) must be divisible by d; hence, by the multiplicative property of valuations (Definition 15.2.1(iv)(b)), we have v(c) ≥ v(d); so v(d) is least in the set of all nonnegative integers of the form v(ax′ + by′).
Since (d) = (a, b) we have d|a and d|b; conversely, given any d′ with d′|a and d′|b, we immediately have d′|(ax + by) = d. Now given any element d ∈ R of the form ax + by such that v(d) is least among the nonnegative integers of the form v(ax′ + by′), the previous proposition shows that (d) = (a, b); hence the second property must hold.

Conversely, if d is an element such that the second property holds, then by Proposition 15.2.7 we have (a, b) = (e) for some e ∈ (a, b), and the second property must also hold for e. Thus we have d|e and e|d, so e = ud and d = u′e. Therefore e = u(u′e) = (uu′)e, so e(1 − uu′) = 0. Since e is not zero and since R has no zero divisors, we must have 1 − uu′ = 0, or 1 = uu′. So u is an invertible element of R. Moreover, (d) = (e) = (a, b), so the first property must also hold. ∎
Remark 15.2.11. The previous discussion shows that two elements a and b in a
Euclidean domain R are relatively prime if and only if the identity element 1 can
be written 1 = as + bt for some s, t E R .
$$b = r_0q_1 + r_1.$$

Repeating the process, eventually the remainder will be zero:

$$\begin{aligned}
a &= bq_0 + r_0,\\
b &= r_0q_1 + r_1,\\
r_0 &= r_1q_2 + r_2,\\
r_1 &= r_2q_3 + r_3,\\
&\;\;\vdots
\end{aligned}$$
Proof. The algorithm terminates in no more than v(b) + 1 steps, because at each stage we have that 0 ≤ v(r_k) < v(r_{k−1}), so we have a sequence v(b) > v(r₀) > v(r₁) > ⋯ ≥ 0 that decreases at each step until it reaches 0.

Since r_n divides r_{n−1}, and r_{n−2} = r_{n−1}q_n + r_n, we have that r_n divides r_{n−2}. Repeating the argument for the previous stages shows that r_n divides r_{n−3} and each r_k for k = (n − 3), (n − 4), ..., 1, 0. Hence r_n divides both b and a. Conversely, given any c that divides a and b, the first equation a = bq₀ + r₀ shows that c|r₀. Repeating for each step gives c|r_k for all k. Hence c|r_n. This implies that r_n = gcd(a, b). ∎
Example 15.2.13. In the final step of the algorithm the remainder is zero:

$$0 = r_0 - \Bigl(\frac{x}{2} + 3\Bigr)\underbrace{\bigl(2x^3 - 12x^2 + 22x - 12\bigr)}_{r_1},$$

so that gcd(a, b) = r₁ = 2x³ − 12x² + 22x − 12.
Example 15.2.15. Applying the EEA to the results of Example 15.2.13 gives r₁ = b − q₁r₀ = b − q₁(a − q₀b) = (1 + q₀q₁)b − q₁a, so s = −q₁ and t = 1 + q₀q₁ gives r₁ = as + bt.
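The back-substitution of this example is easy to mechanize. Here is a minimal iterative sketch of the EEA for integers (essentially what Exercise 15.17 below asks for; the function name is ours), which carries each remainder along as an integer combination of a and b:

```python
def extended_euclidean(a, b):
    """Return (g, s, t) with g = gcd(a, b) and a*s + b*t == g."""
    r0, s0, t0 = a, 1, 0        # a = 1*a + 0*b
    r1, s1, t1 = b, 0, 1        # b = 0*a + 1*b
    while r1 != 0:
        q = r0 // r1            # one division step of the Euclidean algorithm
        r0, s0, t0, r1, s1, t1 = r1, s1, t1, r0 - q*r1, s0 - q*s1, t0 - q*t1
    return r0, s0, t0

# gcd(204, 323) = 17 (these are the integers of Exercise 15.11).
g, s, t = extended_euclidean(204, 323)
assert g == 17 and 204*s + 323*t == g
```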
Proof. If ab = ac, then ab − ac = 0 and so a(b − c) = 0. Since a ≠ 0 and since R has no zero divisors, we must have (b − c) = 0, and hence b = c. ∎

Proof. Since a|bc, we have bc = ad for some d ∈ R. Since a, b are relatively prime, we have gcd(a, b) = 1, so 1 = ax + by for some x, y ∈ R. Thus c = axc + byc = axc + yad = a(xc + yd), which implies that a|c. ∎
we have that x = m_nm_{n−1}z is divisible by m′_{n−1} = m_nm_{n−1}. Now make a new list m₁, ..., m_{n−2}, m′_{n−1}. This list has length n − 1, the elements in it are pairwise relatively prime, and each divides x. By the induction hypothesis x is divisible by m₁m₂⋯m_{n−2}m′_{n−1} = Π_{i=1}^n m_i. ∎
Corollary 15.3.6. If I₁, I₂, ..., I_n are ideals in a Euclidean domain such that I_j = (m_j) and the generators m_j are pairwise relatively prime, then the ideal (m₁m₂⋯m_n) generated by the product m₁m₂⋯m_n satisfies

$$(m_1m_2\cdots m_n) = \bigcap_{i=1}^{n} I_i = \prod_{i=1}^{n} I_i.$$
Proof. If a ∈ R is prime, and if a = bc, then a divides bc, so a|b or a|c. Without loss of generality, assume a|b. We have b = ax for some x ∈ R, and a · 1 = bc = axc. The cancellation property (Proposition 15.3.1) gives 1 = xc, which implies that both x and c are invertible.

Conversely, assume an irreducible element a divides bc for some b, c ∈ R. If gcd(a, b) = 1, then Proposition 15.3.4 gives a|c, as required.

If gcd(a, b) = d is not invertible, then a = dx for some x ∈ R. By irreducibility of a, the element x is invertible in R. Thus, the element ax⁻¹ = d divides b, and hence a divides b. ∎
Remark 15.3.9. Not every prime element in 𝔽[x] has degree 1. In particular, the element x² + 1 ∈ ℝ[x] is irreducible in ℝ[x] but has degree 2.
Before proving the main result of this section, we need one more result about
the multiplicative property of the valuation in a Euclidean domain.
Proof. If t is invertible, then v(s) ≤ v(st) ≤ v(stt⁻¹) = v(s), so v(s) = v(st). Conversely, if v(s) = v(st), then s = (st)q + r for some r ∈ R with v(r) < v(st). But r = s − (st)q = s(1 − tq); so v(s) ≤ v(s(1 − tq)) = v(r) unless 1 − tq = 0. Hence t is invertible. ∎
Proof. Let S be the set of all nonzero, noninvertible elements of R that cannot be written as an invertible element times a product of primes. The set V = {v(s) | s ∈ S} is a subset of ℕ and hence, if nonempty, has a smallest element v₀. Let s ∈ S be an element with v(s) = v₀. Since s is not a product of primes, it is not prime, and hence not irreducible. Therefore, there exist a, b ∈ R such that ab = s and a and b are not invertible. Proposition 15.3.10 implies v(s) > v(a) and v(s) > v(b), and hence a and b can be written in the desired form. But s = ab implies that s can also be written in the desired form. Therefore, V and S must both be empty.

To prove uniqueness, assume that a = αp₁p₂⋯p_n and a = βq₁q₂⋯q_m are two decompositions that do not satisfy the conclusion of the theorem; that is, either n ≠ m, or n = m but there is no rearrangement of the q_j such that every q_i = u_ip_i for invertible u_i. Assume further that n is the smallest integer for which such a counterexample exists.

If n = 0, then a = α = βq₁⋯q_m, so a is invertible. Thus, q_i|a implies that q_i is also invertible for every i, and hence q_i is not a prime. So we may assume n > 0.

Since p_n is prime, it must divide q_i for some i ∈ {1, ..., m}. Rearrange the q_j so that p_n divides q_m. Thus p_nu_n = q_m for some u_n. But since q_m is prime, it is irreducible. Since p_n is not invertible, u_n must be invertible.

Now divide p_n out of both sides of the equation αp₁p₂⋯p_n = βq₁q₂⋯q_m to get αp₁p₂⋯p_{n−1} = βq₁q₂⋯q_{m−1}u_n. Redefining q_{m−1} to be q_{m−1}u_n gives two new decompositions into primes αp₁p₂⋯p_{n−1} = βq₁q₂⋯q_{m−1}, and the minimality assumption on the original counterexample gives m − 1 = n − 1 and (after reordering) p_i = u_iq_i, where each u_i is an invertible element in R. But this proves that the supposed counterexample also satisfies the conclusion of the theorem. ∎
Example 15.3.12.

(i) The integer 1728 can be written as 3³2⁶ or (−2)³2³(−3)³ and in many other ways, but the fundamental theorem of arithmetic says that, after rearranging, every prime factorization of 1728 must be of the form

$$\pm 3 \cdot \pm 3 \cdot \pm 3 \cdot \underbrace{\pm 2 \cdots \pm 2}_{6}.$$
$$(x^2 + 1)^2(x - 7)^3(x^2 + x + 1),$$
Remark 15.3.14. This theorem says that every element is almost unique as a product of primes. In both cases R = 𝔽[x] and R = ℤ we can make the uniqueness more explicit.

If R = ℤ, then the only invertible elements are 1 and −1; if a > 0 we may require all primes to be positive. We have u_i = 1 for all i, and the decomposition is completely unique (up to reordering).

If R = 𝔽[x], then the invertible elements are precisely the nonzero elements of 𝔽 (corresponding to degree-zero polynomials). If both a and all the primes have their leading (top-degree) coefficient equal to 1, then we can again assume that all the u_i are 1, and the decomposition is completely unique (up to reordering).
Proof. Assume, by way of contradiction, that p(x) is prime in ℂ[x] and has degree n > 1. By the fundamental theorem of algebra (Theorem 11.5.4), p(x) has at least one root, which we denote λ. Dividing p(x) by (x − λ) gives

$$p(x) = q(x)(x - \lambda) + r(x),$$

where q(x) has degree n − 1, and where r(x) has degree less than deg(x − λ); hence r is constant. Moreover, 0 = p(λ) = 0 + r, and hence r = 0. Therefore, (x − λ) divides p(x), and p is not prime. ∎
15.4 Homomorphisms
A linear transformation is the right sort of map for vector spaces because it preserves
all the key properties of a vector space- vector addition and scalar multiplication.
Similarly, a ring homomorphism is the right sort of map for rings because it preserves
the key properties of a ring-addition and multiplication.
Just as with vector spaces, kernels and ranges (images) are the key to understanding ring homomorphisms, and invertible homomorphisms (isomorphisms) allow us to identify which rings are "the same."
15.4.1 Homomorphisms
The next proposition is immediate, and in fact it may even seem more difficult
than checking the definition directly, but it makes many proofs much cleaner; it also
makes the similarity to linear transformations clearer.
Example 15.4.3.

(i) For any n ∈ ℤ the map ℤ → ℤ_n given by x ↦ [[x]]_n is a ring homomorphism.

(ii) For any interval [a, b] ⊂ ℝ, the map 𝔽[x] → C([a, b]; 𝔽) given by sending a polynomial f(x) ∈ 𝔽[x] to the function on [a, b] defined by f is a homomorphism of rings.
(iii) For any p ∈ 𝔽, the evaluation map e_p : 𝔽[x] → 𝔽 defined by f(x) ↦ f(p) is a homomorphism of rings.

(iv) For any A ∈ M_n(𝔽), the evaluation map e_A : 𝔽[x] → 𝔽[A] ⊂ M_n(𝔽) defined by f(x) ↦ f(A) is a homomorphism of rings.
Unexample 15.4.4.

(i) For any n ∈ ℤ⁺, the map Cⁿ((a, b); 𝔽) → Cⁿ⁻¹((a, b); 𝔽) defined by f(x) ↦ df/dx is not a homomorphism. It preserves addition,

$$\frac{d(f+g)}{dx} = \frac{df}{dx} + \frac{dg}{dx},$$

but it does not preserve multiplication, since by the product rule d(fg)/dx = f(dg/dx) + (df/dx)g, which in general is not equal to (df/dx)(dg/dx).
Remark 15.4.7. Unlike its vector space analogue, a homomorphism does not necessarily map ideals into ideals. But if the homomorphism is surjective, then it does preserve ideals.
Proof. Note that 𝒩(f) and Im f are both nonempty since f(0) = 0.
Example 15.4.11. ▲ For any A ∈ M_n(𝔽), let e_A : 𝔽[x] → 𝔽[A] ⊂ M_n(𝔽) be the evaluation homomorphism defined by f(x) ↦ f(A). The kernel 𝒩(e_A) consists of precisely those polynomials p(x) such that p(A) = 0. In particular, 𝒩(e_A) contains the characteristic polynomial and the minimal polynomial. Since every ideal in 𝔽[x] is generated by any element of least degree (see Proposition 15.2.7), and since the minimal polynomial is defined to have the least degree of any element in 𝔽[x] that annihilates A, the ideal 𝒩(e_A) must be generated by the minimal polynomial.
$$f^{-1}(y_1 + y_2) = f^{-1}\bigl(f(x_1) + f(x_2)\bigr) = f^{-1}\bigl(f(x_1 + x_2)\bigr) = x_1 + x_2 = f^{-1}(y_1) + f^{-1}(y_2),$$

and the argument for multiplication is similar. Therefore f⁻¹ is a homomorphism. ∎
Example 15.4.17.

(i) Let R = {True, False} be the Boolean ring of Example 15.1.3(vii), and let S be the ring ℤ₂. The map False ↦ 0 and True ↦ 1 is an isomorphism.

(ii) Let S ⊂ M₂(ℝ) be the set of all 2 × 2 real matrices of the form

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix}$$

with a, b ∈ ℝ.
Proof. The proof is identical to its counterpart for invertible linear transformations (Theorem 2.2.12). ∎
Proposition 15.4.20. Let {R₁, R₂, ..., R_n} be a collection of rings. The Cartesian product

$$R = \prod_{i=1}^{n} R_i = R_1 \times R_2 \times \cdots \times R_n = \{(a_1, a_2, \ldots, a_n) \mid a_i \in R_i\}$$

forms a ring with additive identity (0, 0, ..., 0) and with componentwise addition and multiplication. That is, addition and multiplication are given by

(i) (a₁, a₂, ..., a_n) + (b₁, b₂, ..., b_n) = (a₁ + b₁, a₂ + b₂, ..., a_n + b_n),

(ii) (a₁, a₂, ..., a_n) · (b₁, b₂, ..., b_n) = (a₁ · b₁, a₂ · b₂, ..., a_n · b_n)

for all (a₁, a₂, ..., a_n), (b₁, b₂, ..., b_n) ∈ R.
Proposition 15.4.22. Given a collection of rings R₁, ..., R_n, and given any i ∈ {1, ..., n}, the canonical projection p_i : Π_{j=1}^n R_j → R_i, given by (x₁, ..., x_n) ↦ x_i, is a homomorphism of rings.

Proof. We check p_i((x₁, ..., x_n) + (y₁, ..., y_n)) = p_i((x₁ + y₁, ..., x_n + y_n)) = x_i + y_i = p_i(x₁, ..., x_n) + p_i(y₁, ..., y_n). The check for multiplication is similar. ∎
Remark 15.4.23. It is straightforward to check that all the results proved so far
about finite Cartesian products also hold for infinite Cartesian products.
$$[[y]] = y + I.$$

We call these equivalence classes cosets of I and write them as either y + I or [[y]]_I. If the ideal I is clear from context, we often write [[y]] without the I.

Because they are equivalence classes, any two cosets are either identical or disjoint. That is, if (x + I) ∩ (y + I) = [[x]] ∩ [[y]] ≠ ∅, then x + I = [[x]] = [[y]] = y + I.
Proof. Given any [[a]] ∈ R/I, the division property gives a = bq + r with v(r) < v(b), so r ∈ S. Define a map φ : R/I → S by [[a]]_I ↦ r. First we show this map φ is well defined. Given any other a′ with [[a′]] = [[a]], we have a′ − a ∈ I = (b); so there is some c ∈ R with a′ = a + cb = b(q + c) + r.
Remark 15.5.6. The previous proposition means that we can write the set ℤ/(n) as the set {[[0]], [[1]], [[2]], ..., [[n − 1]]}, and, similarly, any coset in 𝔽[x]/(f) can be written uniquely as [[r]] for some r of degree less than deg(f).
The set of all cosets of I has a natural addition and multiplication that makes it into a ring.

Lemma 15.5.7. Let I be an ideal of R. The operations ⊕ : R/I × R/I → R/I and ⊙ : R/I × R/I → R/I given by

$$[[x]] \oplus [[y]] = [[x + y]] \quad\text{and}\quad [[x]] \odot [[y]] = [[xy]]$$

are well defined.
Example 15.5.10.

$$[[ax + b]] \odot [[cx + d]] = [[acx^2 + (ad + bc)x + bd]] = [[(ad + bc)x + (bd - ac)]].$$
$$\begin{array}{ccc}
R & \xrightarrow{\;f\;} & S \\
\pi \downarrow & & \uparrow i \\
R/\mathcal{N}(f) & \xrightarrow{\;\bar{f}\;} & \operatorname{Im} f,
\end{array}$$

where the right-hand vertical map i is just the obvious inclusion of the image of f into S, and the bottom horizontal map f̄ is an isomorphism.
Example 15.5.17.

(i) If R is a ring, then R/R ≅ {0}. Define the homomorphism f : R → {0} by f(x) = 0 for all x ∈ R. The kernel of f is all of R, and the first isomorphism theorem gives the isomorphism.

(ii) For any p ∈ [a, b] ⊂ ℝ let m_p = {f ∈ C([a, b]; 𝔽) | f(p) = 0}. We claim that C([a, b]; 𝔽)/m_p ≅ 𝔽. To see this, use the homomorphism e_p : C([a, b]; 𝔽) → 𝔽 given by e_p(f) = f(p). The map e_p is clearly surjective, and its kernel is m_p, so the claim follows from the first isomorphism theorem.
(iii) Example 15.5.10(ii) shows that the ring of complex numbers ℂ is isomorphic to ℝ[x]/(x² + 1). The first isomorphism theorem gives another way to see this: define a map ℝ[x] → ℂ by f(x) ↦ f(i), where i ∈ ℂ is the usual square root of −1. It is easy to see that this map is a homomorphism. Its kernel is the set of all polynomials f ∈ ℝ[x] such that f(i) = 0, which one can verify is equal to (x² + 1). Therefore, we have ℝ[x]/(x² + 1) = ℝ[x]/𝒩(f) ≅ ℂ.
Example 15.5.18. ▲ Example 15.4.12 shows that for any p ∈ 𝔽 the kernel of the evaluation map e_p : 𝔽[x] → 𝔽 is given by 𝒩(e_p) = (x − p). The evaluation map is surjective because for any a ∈ 𝔽 the constant function a ∈ 𝔽[x] satisfies e_p(a) = a. The first isomorphism theorem implies that 𝔽[x]/(x − p) ≅ 𝔽.

Example 15.5.19. ▲ Example 15.4.11 shows that for any A ∈ M_n(ℂ) the kernel of the evaluation map e_A : ℂ[x] → ℂ[A] is given by 𝒩(e_A) = (p(z)), where p(z) is the minimal polynomial of A. But the definition of e_A shows that it is surjective onto ℂ[A], so the first isomorphism theorem gives ℂ[x]/(p(z)) ≅ ℂ[A].
$$\begin{aligned}
x &\equiv a_1 \pmod{m_1},\\
x &\equiv a_2 \pmod{m_2},\\
&\;\;\vdots\\
x &\equiv a_n \pmod{m_n}
\end{aligned} \tag{15.3}$$
has a unique solution in R/(m₁m₂⋯m_n). We sometimes call the system (15.3) the Chinese remainder problem, or the CR problem. It is easy to show that if the solution exists, it is unique. In the case that R = ℤ, it is also easy to give a nonconstructive proof that there is a solution, but often we want to actually solve the CR problem, not just know that it has a solution. Also, in a more general ring, the nonconstructive proof usually doesn't work.

The way to deal with both issues is to construct a solution to the CR problem. To do this we need either the Lagrange decomposition or the Newton decomposition. These give an explicit algorithm for solving the CR problem, and as a nice side benefit they also give a proof that every rational function has a partial fraction decomposition.
$$f \equiv f_1 + \cdots + f_n \pmod{M},$$

with each f_i ≡ 0 (mod m_j) for every j ≠ i. Alternatively, we can also write

$$f = f_1 + \cdots + f_n + H,$$

with each f_i ≡ 0 (mod m_j) for every j ≠ i and 0 ≤ v(f_i) < v(M), and with H ≡ 0 (mod M).
Proof of the Lagrange Decomposition Algorithm. For each i ∈ {1, ..., n} let π_i = Π_{j≠i} m_j. Since all the m_k are pairwise relatively prime, each π_i is relatively prime to m_i, and hence there exist elements s_i, t_i ∈ R such that π_is_i + m_it_i = 1 (these can be found by the EEA). Let

$$L_i = \pi_i s_i.$$

By definition, L_i ≡ 1 (mod m_i) and L_i ≡ 0 (mod m_j) for every j ≠ i. Let L = Σ_{i=1}^n L_i, so that L ≡ 1 (mod m_i) for every i ∈ {1, ..., n}. This means that 1 − L ∈ (m_i) for every i, and by Corollary 15.3.5 the product M must divide 1 − L.

For all i ∈ {1, ..., n} let f̂_i = f · L_i, and let Ĥ = f · (1 − L), so that we have f = Σ_{i=1}^n f̂_i + Ĥ with f̂_i = f · L_i ≡ 0 (mod m_j) for every j ≠ i and Ĥ ≡ 0 (mod M).

Now using the division property, write each f̂_i uniquely as q_iM + f_i for some q_i and some f_i with 0 ≤ v(f_i) < v(m₁m₂⋯m_n). This gives f = f₁ + ⋯ + f_n + H, where H = Ĥ + (q₁ + ⋯ + q_n)M ≡ 0 (mod M). ∎
Remark 15.6.3. The Lagrange decomposition is often used in the special case where R = 𝔽[x] and m_i = (x − p_i) for some collection of distinct points p₁, ..., p_n ∈ 𝔽. In this case, we do not need the EEA to compute the L_i because there is a simple formula for them:

$$L_i = \prod_{j \ne i} \frac{x - p_j}{p_i - p_j}. \tag{15.4}$$

Indeed, here π_i = Π_{j≠i}(x − p_j), and π_is_i + m_it_i = 1 means precisely that π_is_i ≡ 1 (mod m_i); thus, s_i is the multiplicative inverse of π_i after reducing modulo m_i = (x − p_i). As shown in Example 15.5.18, the first isomorphism theorem says that reducing modulo the ideal (x − p_i) is exactly equivalent to evaluating at the point p_i; thus π_i ≡ Π_{j≠i}(p_i − p_j) (mod (x − p_i)) and s_i = Π_{j≠i} 1/(p_i − p_j).
Proof. Use the division property to write f = m₁q₁ + b₀ with v(b₀) < v(m₁), and then again q₁ = m₂q₂ + b₁ with v(b₁) < v(m₂), and so forth until q_{n−1} = m_nq_n + b_{n−1}, and set b_n = q_n. This gives

$$f = b_0 + b_1m_1 + b_2m_1m_2 + \cdots + b_nm_1m_2\cdots m_n.$$

Corollary 15.6.5. For any field 𝔽, any f, g ∈ 𝔽[x], and any positive integer n, we may write f = f₀ + f₁g + f₂g² + ⋯ + f_ng^n such that for every i < n we have deg(f_i) < deg(g).
Remark 15.6.7. Here is an equivalent way to state the conclusion of the CRT: Given pairwise relatively prime elements m₁, ..., m_n in a Euclidean domain R, for any choice ([[a₁]]_{m₁}, ..., [[a_n]]_{m_n}) ∈ R/(m₁) × ⋯ × R/(m_n), there exists a unique [[x]]_m ∈ R/(m) such that [[x]]_{m_i} = [[a_i]]_{m_i} for all i ∈ {1, ..., n}. That is, the system of n equations

$$[[x]]_{m_i} = [[a_i]]_{m_i}, \qquad i \in \{1, \ldots, n\}, \tag{15.6}$$

has a unique solution.
Set b₀ = a₁. Since m₁ is relatively prime to m₂, the EEA gives s₁, s₂ such that s₁m₁ + s₂m₂ = 1, so that s₁m₁ ≡ 1 (mod m₂). Set b₁ = s₁(a₂ − b₀). For each i ∈ {1, ..., n − 1}, since m_{i+1} is relatively prime to m₁m₂⋯m_i, we can find s_i such that s_im₁m₂⋯m_i ≡ 1 (mod m_{i+1}). Setting

$$b_i \equiv s_i\bigl(a_{i+1} - (b_0 + b_1m_1 + \cdots + b_{i-1}m_1m_2\cdots m_{i-1})\bigr) \pmod{m_{i+1}}, \tag{15.7}$$

the element x = b₀ + b₁m₁ + b₂m₁m₂ + ⋯ + b_{n−1}m₁m₂⋯m_{n−1} satisfies x ≡ a_i (mod m_i) for every i ∈ {1, ..., n}.
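A minimal sketch of this recursion for R = ℤ (the function name is ours; Python's built-in pow(·, −1, m), available in Python 3.8+, supplies the modular inverse that the EEA provides):

```python
def garner(residues, moduli):
    """Solve x = a_i (mod m_i) for pairwise relatively prime moduli,
    building x = b_0 + b_1*m_1 + b_2*m_1*m_2 + ... as in (15.7)."""
    x, prod = 0, 1
    for a, m in zip(residues, moduli):
        s = pow(prod, -1, m)        # inverse of m_1*...*m_i modulo m_{i+1}
        b = (s * (a - x)) % m
        x += b * prod
        prod *= m
    return x                        # the unique solution modulo prod

# Sunzi's classical instance: x = 2 (mod 3), 3 (mod 5), 2 (mod 7).
assert garner([2, 3, 2], [3, 5, 7]) == 23
```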
Example 15.6.11. The Lagrange decomposition and the CRT can simplify the process of dividing polynomials, especially when the divisor has no repeated prime factors. For example, consider the problem of computing the remainder of a polynomial f when dividing by g = (x)(x − 1)(x + 1). If we have already computed the Lagrange interpolants L_i for the prime factors m₁ = x, m₂ = x − 1, and m₃ = x + 1 of g, then division by g is fairly quick, using the CRT.
$$L_3 = \frac{(x - 1)(x - 0)}{(-1 - 1)(-1 - 0)} = \frac{1}{2}(x^2 - x). \tag{15.10}$$
The CRT says that the map

$$\phi : \mathbb{R}[x]/\bigl((x)(x-1)(x+1)\bigr) \to \mathbb{R}[x]/(x) \times \mathbb{R}[x]/(x-1) \times \mathbb{R}[x]/(x+1)$$

is an isomorphism, and that the inverse map

$$\psi : \mathbb{R}[x]/(x) \times \mathbb{R}[x]/(x-1) \times \mathbb{R}[x]/(x+1) \to \mathbb{R}[x]/\bigl((x)(x-1)(x+1)\bigr)$$
$$\begin{aligned}
&= (x-1)^4 + \binom{4}{1}(x-1)^3 + \binom{4}{2}(x-1)^2 + \binom{4}{3}(x-1) + 1 + 5(x-1) + 7\\
&\equiv 9(x-1) + 8 \pmod{(x-1)^2}.
\end{aligned}$$
$$\frac{f}{G} = h + \frac{s_1}{g_1} + \cdots + \frac{s_n}{g_n},$$

with h, s₁, ..., s_n ∈ 𝔽[x] and deg(s_i) < deg(g_i) for every i ∈ {1, ..., n}.
Proof. The elements g_i satisfy the hypothesis of the Lagrange decomposition theorem (Theorem 15.6.1), so we may write f = f₁ + ⋯ + f_n + H, with f_i ≡ 0 (mod g_j) and v(f_i) < v(G) for all j ≠ i, and H ≡ 0 (mod G).

The relation f_i ≡ 0 (mod g_j) for each j ≠ i is equivalent to g_j | f_i for each j ≠ i. Since the elements g_i are relatively prime, we must have (Π_{j≠i} g_j) | f_i. Hence f_i = (Π_{j≠i} g_j)s_i for some s_i ∈ 𝔽[x]. This gives

$$\frac{f}{G} = \frac{s_1}{g_1} + \cdots + \frac{s_n}{g_n} + \frac{H}{G}.$$

Since G divides H, the quotient h = H/G lies in 𝔽[x], and we have

$$\frac{f}{G} = \frac{s_1}{g_1} + \cdots + \frac{s_n}{g_n} + h.$$
Finally, we have

$$\frac{f}{G} = h + \Bigl(\frac{s_{11}}{g_1^{r_1}} + \frac{s_{12}}{g_1^{r_1-1}} + \cdots + \frac{s_{1r_1}}{g_1}\Bigr) + \cdots + \Bigl(\frac{s_{n1}}{g_n^{r_n}} + \frac{s_{n2}}{g_n^{r_n-1}} + \cdots + \frac{s_{nr_n}}{g_n}\Bigr),$$

with h ∈ 𝔽[x] and s_ij ∈ 𝔽[x] such that deg(s_ij) < deg(g_i) for all i and j.
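For rational functions over ℚ, this decomposition can be checked with a computer algebra system; for instance, sympy's apart performs exactly this computation (the particular f and G below are arbitrary examples of ours):

```python
from sympy import apart, symbols

x = symbols('x')
f = x**4 + 1
G = x * (x - 1)**2
print(apart(f / G, x))
# x + 2 + 1/x + 2/(x - 1) + 2/(x - 1)**2  (up to term order)
```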
Theorem 15.7.1 (Lagrange Interpolation). Given points (x₁, y₁), ..., (x_n, y_n) ∈ 𝔽² with all the x_i distinct, let

$$L_i(x) = \prod_{j \ne i} \frac{x - x_j}{x_i - x_j}. \tag{15.11}$$

The polynomial

$$f(x) = \sum_{i=1}^{n} y_iL_i(x) \tag{15.12}$$

is the unique polynomial of degree less than n such that f(x_i) = y_i for every i ∈ {1, ..., n}.
Remark 15.7.2. Naively computing the polynomials L_i in the form above and using (15.12) is not very efficient: the number of operations required to compute the Lagrange interpolants this way grows quadratically in n. Rewriting the L_i in barycentric form makes them much better suited for computation. We write

$$L_i(x) = p(x)\,\frac{w_i}{x - x_i},$$

where

$$p(x) = \prod_{j=1}^{n}(x - x_j) \quad\text{and}\quad w_i = \prod_{j \ne i}\frac{1}{x_i - x_j},$$

so that

$$f(x) = p(x)\sum_{j=1}^{n}\frac{w_jy_j}{x - x_j},$$

which is known as the first barycentric form. Moreover, noting that the identity element (the polynomial 1) satisfies

$$1 = p(x)\sum_{j=1}^{n}\frac{w_j}{x - x_j},$$

we may divide the first barycentric form by this identity to get

$$f(x) = \frac{\displaystyle\sum_{j=1}^{n}\frac{w_j}{x - x_j}\,y_j}{\displaystyle\sum_{j=1}^{n}\frac{w_j}{x - x_j}}. \tag{15.13}$$

We call (15.13) the second barycentric form. It is not hard to see that the number of computations needed to evaluate (15.13) grows linearly in n once the weights w₁, ..., w_n are known. Efficient algorithms exist for adding new points as well; see [BT04] for details.
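A minimal sketch of the second barycentric form (the helper names and the test data are ours):

```python
import numpy as np

def bary_weights(x):
    """w_i = 1 / prod_{j != i} (x_i - x_j); an O(n^2) one-time setup cost."""
    x = np.asarray(x, dtype=float)
    return np.array([1.0 / np.prod(np.delete(x[i] - x, i))
                     for i in range(len(x))])

def bary_eval(t, x, y, w):
    """Evaluate the interpolant at t via (15.13); O(n) per evaluation point."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    d = t - x
    if np.any(d == 0):                 # t coincides with an interpolation node
        return y[np.argmin(np.abs(d))]
    q = w / d
    return np.sum(q * y) / np.sum(q)

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 5.0, 10.0]              # samples of x^2 + 1
w = bary_weights(x)
assert abs(bary_eval(1.5, x, y, w) - 3.25) < 1e-12
```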
Theorem 15.7.3 (Newton Interpolation). Given points (x₁, y₁), ..., (x_n, y_n) ∈ 𝔽² with all the x_i distinct, for each j ∈ {0, ..., n − 1} let

$$N_0(x) = 1, \qquad N_j(x) = \prod_{i \le j}(x - x_i),$$

so that

$$N_j(x_k) = \prod_{i \le j}(x_k - x_i),$$

and define coefficients β_j ∈ 𝔽 recursively, by an analogue of the Garner formula (15.7):

$$\beta_0 = y_1 \qquad\text{and}\qquad \beta_j = \frac{y_{j+1} - \bigl(\beta_0 + \beta_1N_1(x_{j+1}) + \cdots + \beta_{j-1}N_{j-1}(x_{j+1})\bigr)}{N_j(x_{j+1})}. \tag{15.14}$$

The polynomial

$$f(x) = \sum_{i=0}^{n-1} \beta_iN_i(x)$$

is the unique polynomial of degree less than n satisfying the conditions f(x_i) = y_i for every i ∈ {1, ..., n}.
Remark 15.7.4. Regardless of whether one uses Newton or Lagrange, the final interpolating polynomial is the same (provided one uses exact arithmetic). The difference between them is just that they use different bases for the vector space of polynomials of degree less than n. Each of these bases has its own advantages and disadvantages. When computing in floating-point arithmetic the two approaches do not give exactly the same answers, and the Newton approach can have some stability problems, depending on the order of the points (x₁, y₁), ..., (x_n, y_n). Traditionally, Newton interpolation was preferred in most settings, but more recently it has become clear that barycentric Lagrange interpolation is the better algorithm for many applications (see [BT04]).
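A sketch of the recursion (15.14) together with Horner-style evaluation in the Newton basis (the function names are ours; the test points are those of Exercise 15.47 below):

```python
def newton_coeffs(x, y):
    """Compute beta_0, ..., beta_{n-1} by the recursion (15.14)."""
    beta = [float(y[0])]
    for j in range(1, len(x)):
        val, basis = 0.0, 1.0
        for i in range(j):
            val += beta[i] * basis      # accumulate beta_i * N_i(x_j)
            basis *= x[j] - x[i]        # after the loop, basis == N_j(x_j)
        beta.append((y[j] - val) / basis)
    return beta

def newton_eval(t, x, beta):
    """Evaluate sum_j beta_j N_j(t) by nested (Horner-like) multiplication."""
    result = beta[-1]
    for j in range(len(beta) - 2, -1, -1):
        result = result * (t - x[j]) + beta[j]
    return result

x, y = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 7.0]    # cf. Exercise 15.47
beta = newton_coeffs(x, y)
assert all(abs(newton_eval(xi, x, beta) - yi) < 1e-12
           for xi, yi in zip(x, y))
```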
$$\begin{array}{ccc}
\mathbb{C}[x] & \xrightarrow{\;e_A\;} & \mathbb{C}[A] \\
{\scriptstyle\pi} \searrow & & \swarrow {\scriptstyle\psi} \\
& \displaystyle\prod_{\lambda \in \sigma(A)} \mathbb{C}[x]/\bigl((x - \lambda)^{m_\lambda}\bigr) &
\end{array} \tag{15.15}$$

Here the map π takes f(x) to the tuple (f mod (x − λ₁)^{m_{λ₁}}, ..., f mod (x − λ_k)^{m_{λ_k}}), where σ(A) = {λ₁, ..., λ_k}. Also, ψ is the map that takes an element a₀I + a₁A + ⋯ + a_ℓA^ℓ ∈ ℂ[A] and sends it to π(a₀ + a₁x + ⋯ + a_ℓx^ℓ). In the case that A is semisimple (m_λ = 1 for every λ ∈ σ(A)), the map π simplifies to π(f) = (f(λ₁), ..., f(λ_k)).
The Lagrange decomposition (Theorem 15.6.1) guarantees the existence of polynomials L_λ ∈ ℂ[x] for each λ ∈ σ(A) such that Σ_{λ∈σ(A)} L_λ ≡ 1 mod p(x) and L_λ ≡ 0 mod (x − µ)^{m_µ} for every eigenvalue µ ≠ λ. This is equivalent to saying that π(L_λ) = (0, ..., 0, 1, 0, ..., 0), where the 1 occurs in the position corresponding to λ. In the case that m_λ = 1 for every λ ∈ σ(A), the L_λ are just given by Theorem 15.7.1,

$$L_\lambda(x) = \prod_{\substack{\mu \in \sigma(A)\\ \mu \ne \lambda}} \frac{x - \mu}{\lambda - \mu},$$

but if A is not semisimple, the multiplicities m_λ are not all 1 and the formula is more complicated (see (15.18) below).
$$A = \begin{bmatrix} 3 & 1 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix},$$

which has minimal polynomial p(x) = (x − 3)²(x − 5). We have

$$L_3 = -\frac{(x-3)(x-5)}{4} - \frac{x-5}{2} = -(x - 5)\Bigl(\frac{x-3}{4} + \frac{1}{2}\Bigr)$$

and

$$L_5 = \frac{(x-3)^2}{4}.$$
To verify that L₃ has the desired properties, write x − 5 = (x − 3) − 2, so that

$$L_3 = -\bigl((x-3) - 2\bigr)\Bigl(\frac{x-3}{4} + \frac{1}{2}\Bigr) = -\frac{(x-3)^2}{4} + 1 \equiv 1 \;\operatorname{mod}\; (x-3)^2, \qquad L_3 \equiv 0 \;\operatorname{mod}\; (x-5).$$

Applying the evaluation map gives

$$P_3 = e_A(L_3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Similarly,

$$P_5 = e_A(L_5) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Note also that e_A((x − 3)L₃) = (A − 3I)P₃ = D₃ is the eigennilpotent associated to λ = 3.
The next theorem shows that the appearance of the eigenprojections and eigennilpotents in the previous example is not an accident.

Theorem 15.7.7. Given the setup described above for a matrix A ∈ M_n(ℂ), the eigenprojection P_λ is precisely the image of L_λ under the map e_A : ℂ[x] → ℂ[A] ⊂ M_n(ℂ). That is, for each λ ∈ σ(A) we have

$$P_\lambda = e_A(L_\lambda), \tag{15.16}$$

and the corresponding eigennilpotent is

$$D_\lambda = e_A\bigl((x - \lambda)L_\lambda\bigr). \tag{15.17}$$

Proof. Let Q_λ = e_A(L_λ). It suffices to show that the Q_λ have the defining properties of the eigenprojections:

(i) Q_λ² = Q_λ.

(ii) Q_λQ_µ = 0 when λ ≠ µ.

(iii) Σ_{λ∈σ(A)} Q_λ = I.

(iv) AQ_λ = Q_λA for every λ ∈ σ(A).

(v) Σ_{λ∈σ(A)} (λQ_λ + e_A((x − λ)L_λ)) = A.

Since ψ is an isomorphism (see (15.15)), it suffices to verify these properties for ψ(Q_λ) = π(L_λ). For (i) we have

$$\pi(L_\lambda)\pi(L_\lambda) = (0, \ldots, 0, 1, 0, \ldots, 0)(0, \ldots, 0, 1, 0, \ldots, 0) = \pi(L_\lambda),$$

and a similar argument gives (ii). Item (iii) follows from the fact that Σ_λ L_λ ≡ 1 mod p(x), where p(x) is the minimal polynomial of A, and (iv) follows from the fact that Q_λ ∈ ℂ[A] and the fact that ℂ[A] is commutative. Finally, for (v) observe that

$$\psi\Bigl(\sum_{\lambda\in\sigma(A)} \bigl(\lambda Q_\lambda + e_A((x-\lambda)L_\lambda)\bigr)\Bigr) = \sum_{\lambda\in\sigma(A)} \pi(xL_\lambda) = \pi(x)\sum_{\lambda\in\sigma(A)} \pi(L_\lambda) = \pi(x) = \psi(A). \qquad ∎$$
$$B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix},$$

with minimal polynomial x² − 5x. The CRT gives an isomorphism φ : ℂ[B] ≅ ℂ[x]/(x) × ℂ[x]/(x − 5) ≅ ℂ × ℂ. The Lagrange interpolants are L₀ = −(x − 5)/5 and L₅ = x/5, and again we have φ(L₀) = (1, 0) and φ(L₅) = (0, 1). Applying the evaluation map gives

$$P_0 = e_B(L_0) = -\frac{1}{5}(B - 5I) = -\frac{1}{5}\begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix}$$

and

$$P_5 = e_B(L_5) = \frac{1}{5}B.$$

It is easy to see that P₀² = P₀, that (⅕B)² = ⅕B, and that BP₀ = 0, as expected for the eigenprojections.
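These identities are quick to verify numerically; a minimal sketch (numpy here purely for illustration):

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
I = np.eye(2)

P0 = -(B - 5 * I) / 5     # e_B(L_0) with L_0 = -(x - 5)/5
P5 = B / 5                # e_B(L_5) with L_5 = x/5

assert np.allclose(P0 @ P0, P0) and np.allclose(P5 @ P5, P5)  # idempotent
assert np.allclose(P0 @ P5, 0 * I)                            # P_0 P_5 = 0
assert np.allclose(P0 + P5, I)                                # sum to identity
assert np.allclose(0 * P0 + 5 * P5, B)                        # recover B
```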
Proposition 15.7.9. Given f ∈ 𝔽[x], given p ∈ 𝔽, and given a positive integer n, if f ≡ 0 (mod (x − p)ⁿ), then for every nonnegative integer j < n, we have

$$f^{(j)} \equiv 0 \pmod{(x - p)^{n-j}}.$$

Proof. The statement f ≡ 0 (mod (x − p)ⁿ) means that f = (x − p)ⁿg for some g ∈ 𝔽[x]. Repeated application of the product rule gives

$$f^{(j)} = \sum_{k=0}^{j} \binom{j}{k} \frac{n!}{(n-k)!}(x - p)^{n-k}\, g^{(j-k)},$$

from which we immediately see that (x − p)^{n−j} divides f^{(j)}, which means that f^{(j)} ≡ 0 (mod (x − p)^{n−j}). ∎
Proposition 15.7.10. Given f ∈ 𝔽[x] and p ∈ 𝔽 and a positive integer n, we have

$$f \equiv \sum_{i=0}^{n-1} \frac{a_i}{i!}(x - p)^i \pmod{(x - p)^n}$$

if and only if for every nonnegative integer j < n we have f^{(j)}(p) = a_j.

Proof. If f ≡ Σ_{i=0}^{n−1} (a_i/i!)(x − p)^i mod (x − p)ⁿ, then the previous proposition gives

$$\frac{d^j}{dx^j}\Bigl(f - \sum_{i=0}^{n-1} \frac{a_i}{i!}(x - p)^i\Bigr) \equiv 0 \pmod{(x - p)^{n-j}},$$

and hence

$$f^{(j)} - \sum_{k=0}^{n-j-1} \frac{a_{j+k}}{k!}(x - p)^k \equiv 0 \pmod{(x - p)^{n-j}}.$$

Evaluating at x = p gives f^{(j)}(p) = a_j. Reversing the argument gives the converse. ∎
Theorem 15.7.11 (Hermite Interpolation). Given distinct points x₁, ..., x_n in 𝔽 and values y_i^{(j)} ∈ 𝔽 for each i ∈ {1, ..., n} and each j ∈ {0, ..., m_i}, let M = Σ_{j=1}^n (m_j + 1), and let P(x) = Π_{i=1}^n (x − x_i)^{m_i+1}. Use the partial fraction decomposition (Corollary 15.6.15) to write

$$\frac{1}{P} = \sum_{i=1}^{n} \frac{s_i}{(x - x_i)^{m_i+1}}.$$

For each i ∈ {1, ..., n} define L_i to be P times the i-th term of the partial fraction decomposition,

$$L_i = s_i \prod_{j \ne i} (x - x_j)^{m_j+1}, \tag{15.18}$$

and define

$$f_i = \sum_{j=0}^{m_i} \frac{y_i^{(j)}}{j!}(x - x_i)^j.$$

The function

$$f = \sum_{i=1}^{n} f_iL_i,$$

reduced modulo P, is the unique polynomial in 𝔽[x] of degree less than M such that f^{(j)}(x_i) = y_i^{(j)} for every i ∈ {1, ..., n} and every j ∈ {0, ..., m_i}.
Proof. By Proposition 15.7.10 the condition that for each i ≤ n and each j ≤ m_i we have f^{(j)}(x_i) = y_i^{(j)} is equivalent to the condition that f ≡ f_i (mod (x − x_i)^{m_i+1}) for each i ≤ n.

By construction, we have 1 = Σ_{i=1}^n L_i with L_i ≡ 0 (mod (x − x_j)^{m_j+1}) for any j ≠ i. Thus, we also have L_i ≡ 1 (mod (x − x_i)^{m_i+1}) and f ≡ f_i (mod (x − x_i)^{m_i+1}) for any i ∈ {1, ..., n}. Therefore, the derivatives of f take on the required values.

To prove uniqueness, note that the ideals ((x − x_i)^{m_i+1}) are all pairwise relatively prime and the intersection of these ideals is exactly (P); hence the CRT guarantees there is a unique equivalence class [[g]]_{(P)} in 𝔽[x]/(P) such that for every i we have [[g]]_{((x−x_i)^{m_i+1})} = [[f_i]]_{((x−x_i)^{m_i+1})}. Finally, by Proposition 15.5.5, there is a unique f ∈ 𝔽[x] with deg(f) < deg(P) such that f ∈ [[g]]_{(P)}. Hence the solution for f is unique. ∎
Moreover, we have

$$f_1 = \frac{1}{0!}(x - 1)^0 + \frac{2}{1!}(x - 1) = 2x - 1, \qquad f_2 = \frac{3}{0!}(x - 2)^0 = 3,$$

and hence f = f₁L₁ + f₂L₂, reduced modulo P.
Exercises

Note to the student: Each section of this chapter has several corresponding exercises, all collected here at the end of the chapter. The exercises between the first and second line are for Section 1, the exercises between the second and third lines are for Section 2, and so forth.

You should work every exercise (your instructor may choose to let you skip some of the advanced exercises marked with *). We have carefully selected them, and each is important for your ability to understand subsequent material. Many of the examples and results proved in the exercises are used again later in the text. Exercises marked with ▲ are especially important and are likely to be used later in this book and beyond. Those marked with † are harder than average, but should still be done.
Although they are gathered together at the end of the chapter, we strongly
recommend you do the exercises for each section as soon as you have completed
the section, rather than saving them until you have finished the entire chapter. The
exercises for each section are separated from those for other sections by a
horizontal line.
15.1. Prove that for any positive integer n the set ℤ_n described in Example 15.1.3(iii) satisfies all the axioms of a ring. Prove, also, that multiplication is commutative and that there exists a multiplicative identity element (unity).
15.2. Fill in all the details in the proof of Proposition 15.1.9.
15.3. Prove that for any ring R, and for any two elements x, y ∈ R, we have (−x)(−y) = xy.
15.4. In the ring Z, identify which of the following sets is an ideal, and justify
your answer:
(i) The odd integers.
(ii) The even integers.
(iii) The set 3Z of all multiples of 3.
(iv) The set of divisors of 24.
(v) The set {n ∈ ℤ | n = 3k or n = 5j for some j, k ∈ ℤ} of all multiples of either 3 or 5.
15.5. Provide an example showing that the union of two ideals need not be an ideal.
15.6. Prove the following:

(i) The ideal (3, 5) in ℤ generated by 3 and 5 is all of ℤ. Hint: Show 1 ∈ (3, 5).

(ii) The ideal (x², x² + x, x + 1) in 𝔽[x] is all of 𝔽[x].

(iii)* In the ring ℂ[x, y] the ideal (x² − y³, x − y) is a proper subset of the ideal (x + y, x − 2y).
15.7.* Prove Proposition 15.1.34.
15.8.* Prove Proposition 15.1.35.
15.9.* Prove Theorem 15.1.36.
15.10.* Prove Proposition 15.1.37.
15.11. Let a = 204 and b = 323. Use the extended Euclidean algorithm (EEA) to find gcd(a, b) in ℤ as well as integers s, t such that as + bt = gcd(a, b). Do the same thing for a = x³ − 3x² − x + 3 and b = x³ − 3x² − 2x + 6 in ℚ[x].

15.12. Prove that if p is prime, then every nonzero element a ∈ ℤ_p has a multiplicative inverse. That is, there exists x ∈ ℤ_p such that ax ≡ 1 (mod p). Hint: What is gcd(a, p)?

15.13. Find the only integer 0 < a < 72 satisfying 35a ≡ 1 (mod 72). Now find the only integer 0 < b < 72 satisfying 35b ≡ 67 (mod 72). Hint: Solve 35a + 72x = 1 with the EEA.

15.14. Find a polynomial q of degree 1 or less such that (x + 1)q ≡ x + 2 (mod x² + 3).
15.15. Prove that for any composite (nonprime) integer n, the ring ℤ_n is not a Euclidean domain. If p is prime, prove that ℤ_p is a Euclidean domain. Hint: Use Exercise 15.12.

15.16. Prove that 𝔽[x, y] has no zero divisors but is not a Euclidean domain. Hint: What can you say about the ideal (x, y) in 𝔽[x, y]?
15.17.* Implement the EEA for integers in Python (or your favorite computer language) from scratch (without importing any additional libraries or methods). Your code should accept two integers x and y and return gcd(x, y), as well as a and b such that ax + by = gcd(x, y).
15.32. Let I = (3) ⊂ ℤ₁₂ be the multiples of 3 in ℤ₁₂. Write out all the elements of ℤ₁₂/I and write out the addition and multiplication tables for ℤ₁₂/I. Do the same for I = (8) ⊂ ℤ₁₂.
15.33. Prove that the ring 𝔽[x]/(x − a) is isomorphic to 𝔽 for any a ∈ 𝔽.

15.34. Prove that the ring 𝔽[x]/((x − 1)³) is isomorphic to the set {a₀ + a₁(x − 1) + a₂(x − 1)² | a_i ∈ 𝔽} with addition given coefficientwise. More generally, for any λ ∈ 𝔽 show that the ring 𝔽[x]/((x − λ)ⁿ) can be written as {Σ_{k=0}^{n−1} a_k(x − λ)^k | a_k ∈ 𝔽} with the addition given coefficientwise.

15.35. If λ ∈ 𝔽 and if π : 𝔽[x] → 𝔽[x]/((x − λ)ⁿ) is the canonical epimorphism, then for any k ∈ ℕ, write π(x^k) in the form Σ_{j=0}^{n−1} a_j(x − λ)^j.
15.36. Recall that an idempotent in a ring is an element a such that a² = a. The element 0 is always idempotent, as is 1, if it exists in the ring. Find at least one more idempotent (not 0 and not 1) in the ring ℝ[x]/(x⁴ + x²). Also find at least one nonzero nilpotent in this ring.
15.37. Prove Lemma 15.5.14.
15.38. Prove Theorem 15.5.15.
15.39.* Prove that in any commutative ring R the set N of all nilpotents in R
forms an ideal of R. Prove that the quotient ring R/ N has no nonzero
nilpotent elements.
15.40.* Prove the second isomorphism theorem (compare this to Corollary 2.3.18, which is the vector space analogue of this theorem): Given two ideals I, J of a ring R, the intersection I ∩ J is an ideal of the ring I, and J is an ideal of the ring I + J. We have

$$I/(I \cap J) \cong (I + J)/J.$$
15.41. In each of the following cases, compute the Lagrange and the Newton decompositions for f ∈ R relative to the elements m₁, ..., m_n:

(i) f = 11 in R = ℤ, relative to m₁ = 2, m₂ = 3, m₃ = 5.

(ii) f = x⁴ − 2x + 7 in R = ℝ[x], relative to m₁ = (x − 1), m₂ = (x − 2), m₃ = (x − 3). Hint: Consider using the method of Example 15.6.11 to reduce the amount of dividing you have to do.
15.42. A gang of 19 thieves has a pile containing fewer than 8,000 coins. They try
to divide the pile evenly, but there are 9 coins left over. As a result, a fight
breaks out and one of the thieves is killed. They try to divide the pile again,
and now they have 8 coins left over. Again they fight, and again one of the
thieves dies. Once more they try to divide the pile, but now they have 3 coins
left.
(i) How many coins are in the pile?
(ii) If they continue this process of fighting, losing one thief, and redividing,
how many thieves will be left when the pile is finally divided evenly with
no remainder?
15.43. Let A ∈ M_n(ℂ) be a square matrix with minimal polynomial p(x) and eigenvalues λ₁, ..., λ_k with p(x) = (x − λ₁)^{m₁} ⋯ (x − λ_k)^{m_k}. Prove that the ring ℂ[A] is isomorphic to the product of quotient rings ℂ[x]/((x − λ₁)^{m₁}) × ⋯ × ℂ[x]/((x − λ_k)^{m_k}). Hint: Use the result of Example 15.5.19.
15.44. Fix a positive integer $N$ and let $\omega = e^{-2\pi i/N}$. Prove that the map $\mathscr{F} : \mathbb{C}[t] \to \mathbb{C} \times \cdots \times \mathbb{C}$, defined by $\mathscr{F}(p(t)) = (p(\omega^0), p(\omega^1), \ldots, p(\omega^{N-1}))$, induces an isomorphism of rings from $\mathbb{C}[t]/(t^N - 1)$ to the ring $\mathbb{C}^N = \mathbb{C} \times \cdots \times \mathbb{C}$. This isomorphism is called the discrete Fourier transform, and it plays an important role in signal processing and many other applications. Hint: Use the results of Exercise 15.33.
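To make the ring isomorphism concrete, here is a small NumPy check (our own illustration, not part of the exercise): multiplication in $\mathbb{C}[t]/(t^N - 1)$ is cyclic convolution of coefficient vectors, and under the evaluation map it becomes pointwise multiplication in $\mathbb{C}^N$.

```python
import numpy as np

N = 8
w = np.exp(-2j * np.pi / N)
# Evaluation map: send a coefficient vector p to (p(w^0), ..., p(w^(N-1))).
F = np.vander(w ** np.arange(N), N, increasing=True)  # F[j, k] = w^(j*k)

rng = np.random.default_rng(0)
p, q = rng.random(N), rng.random(N)

# Multiplication in C[t]/(t^N - 1) is cyclic convolution of coefficients,
pq = np.array([sum(p[i] * q[(k - i) % N] for i in range(N)) for k in range(N)])
# and it corresponds to pointwise multiplication of the evaluations.
print(np.allclose(F @ pq, (F @ p) * (F @ q)))  # True
```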
15.45.* Implement the Lagrange decomposition algorithm for integers in Python (or your favorite computer language) using only your previous implementation of the EEA (Exercise 15.17), without importing any other libraries or methods. Your code should accept an integer $x$ and a tuple $(m_1, \ldots, m_n)$ of pairwise-relatively-prime integers and return a tuple $(x_1, \ldots, x_n)$ such that $x \equiv \sum_{i=1}^n x_i \pmod{\prod_{i=1}^n m_i}$ and such that $x_i \equiv 0 \pmod{m_j}$ whenever $i \neq j$.
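One possible sketch, built on the extended_euclid helper suggested after Exercise 15.17 (all names are our own, and positive moduli are assumed):

```python
def lagrange_decomposition(x, moduli):
    """Split x into pieces x_i with x_i ≡ 0 (mod m_j) for j != i
    and sum(x_i) ≡ x (mod prod(m_i))."""
    M = 1
    for m in moduli:
        M *= m
    parts = []
    for m in moduli:
        Mi = M // m
        g, a, b = extended_euclid(Mi, m)   # a*Mi ≡ 1 (mod m) since gcd = 1
        parts.append((x * a * Mi) % M)
    return tuple(parts)

parts = lagrange_decomposition(11, (2, 3, 5))
print(parts, sum(parts) % 30)   # the pieces, and 11
```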
15.46.* Implement the Newton–Garner algorithm for solving the CR problem for integers in Python (or your favorite computer language) using only your previous implementation of the EEA (Exercise 15.17), without importing any other libraries or methods. Your code should accept a tuple $(a_1, \ldots, a_n)$ of integers and a tuple $(m_1, \ldots, m_n)$ of pairwise-relatively-prime integers and return an integer $0 \le x < \prod_{i=1}^n |m_i|$ such that $x \equiv a_i \pmod{m_i}$ for every $i$.
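A hedged sketch of the mixed-radix (Garner-style) approach, again assuming the extended_euclid helper and positive pairwise-coprime moduli:

```python
def garner(residues, moduli):
    """Return the unique 0 <= x < prod(moduli) with x ≡ a_i (mod m_i)."""
    x, prod = 0, 1
    for a, m in zip(residues, moduli):
        g, inv, _ = extended_euclid(prod % m, m)  # inv * prod ≡ 1 (mod m)
        v = ((a - x) * inv) % m      # next mixed-radix digit, 0 <= v < m
        x += v * prod
        prod *= m
    return x

print(garner((9, 8, 3), (19, 18, 17)))  # also answers Exercise 15.42(i)
```

Each step fixes the answer modulo one more modulus while leaving the earlier congruences intact, since the correction term v * prod is divisible by all previous moduli.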
15.47. Find a polynomial $f(x) \in \mathbb{Q}[x]$ such that $f(1) = 2$, $f(2) = 3$, $f(3) = 5$, and $f(4) = 7$ using
(i) Lagrange interpolation and
(ii) Newton interpolation.
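For checking an answer to part (i), a small exact-arithmetic Lagrange interpolator (our own helper, not the book's algorithm) might look like this:

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def lagrange(points):
    """Coefficients of the interpolating polynomial through the given points."""
    n = len(points)
    coeffs = [Fraction(0)] * n
    for i, (xi, yi) in enumerate(points):
        basis, denom = [Fraction(1)], Fraction(1)
        for j, (xj, _) in enumerate(points):
            if j != i:
                basis = poly_mul(basis, [Fraction(-xj), Fraction(1)])  # (x - xj)
                denom *= xi - xj
        for k, c in enumerate(basis):
            coeffs[k] += Fraction(yi) * c / denom
    return coeffs

print(lagrange([(1, 2), (2, 3), (3, 5), (4, 7)]))
```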
15.48. Let
A= 0
7 1
7 0 .
ol
[0 0 2
B ~ [H i]
(i) Find the minimal polynomial of $B$. Hint: Do not compute the resolvent.
(ii) Write a ring of the form $\prod_{\lambda \in \sigma(B)} \mathbb{C}[x]/((x - \lambda)^{m_\lambda})$ that is isomorphic to $\mathbb{C}[B]$.
(iii) Compute the Lagrange–Hermite interpolants $L_\lambda$ for each $\lambda \in \sigma(B)$. Verify that $L_\lambda \equiv 0 \pmod{(x - \mu)^{m_\mu}}$ for every $\mu \neq \lambda$, and that $L_\lambda \equiv 1 \pmod{(x - \lambda)^{m_\lambda}}$.
(iv) Compute the eigenprojections $P_\lambda = \operatorname{ev}_B(L_\lambda) \in M_n(\mathbb{C})$ by evaluating the Lagrange–Hermite interpolants at $B$.
(v) Compute the eigennilpotents $D_\lambda$ in a similar fashion; see (15.17).
15.50. Given $A \in M_n(\mathbb{C})$, with spectrum $\sigma(A)$ and minimal polynomial $p(x) = \prod_{\lambda \in \sigma(A)} (x - \lambda)^{m_\lambda}$, for each $\lambda \in \sigma(A)$, let $L_\lambda$ be the corresponding polynomials in the Lagrange decomposition with $L_\lambda \equiv 0 \pmod{(x - \mu)^{m_\mu}}$ for every eigenvalue $\mu \neq \lambda$, and with $\sum_{\lambda \in \sigma(A)} L_\lambda \equiv 1 \pmod{p(x)}$. Use the results of this section to show that the Drazin inverse $A^D$ lies in $\mathbb{C}[A] \subset M_n(\mathbb{C})$, and that
$$A^D = \operatorname{ev}_A\left( \sum_{0 \neq \lambda \in \sigma(A)} L_\lambda \sum_{k=0}^{m_\lambda - 1} \frac{(-1)^k (x - \lambda)^k}{\lambda^{k+1}} \right).$$
15.51. Use the techniques of this section (polynomial interpolation) to give a formula for the inverse of the discrete Fourier transform of Exercise 15.44.
Notes
Some references for the material in this chapter include [Art91, Her96, Alu09]. Several of the applications in this chapter are from [Wik14].
Variations of Exercise 15.42 date back to Qin Jiushao's book Shushu Jiuzhang (Nine Sections of Mathematics), written in 1247 (see [Lib73]), but we first learned of this problem from [Ste09].
Part V
Appendices
Foundations of Abstract Mathematics
It has long been an axiom of mine that the little things are infinitely the most
important.
-Sir Arthur Conan Doyle
Example A.1.2. Here are some sets that we use often in this book:
(i) The set with no elements is the empty set and is denoted $\emptyset$. The empty set is unique; that is, any set with no elements must be equal to $\emptyset$.
(viii) The set $\mathbb{R}^2$ of all pairs $(a, b)$, where $a$ and $b$ are any elements of $\mathbb{R}$.
Example A.1.3.
(i) Sets may have elements which are themselves sets. The set
$$A = \{\{1, 2, 3\}, \{r, s, t, u\}\}$$
has two elements, each of which is itself a set.
(ii) The empty set may be an element of another set. The set $T = \{\emptyset, \{\emptyset\}\}$ has two elements; the first is the empty set and the second is a set $B = \{\emptyset\}$ whose only element is the empty set. If this is confusing, it may be helpful to think of the empty set as a bag with nothing in it, and the set $T$ as a bag containing two items: one empty bag and one other bag $B$ with an empty bag inside of $B$.
Example A.1.5.
(i) The integers are a subset of the rational numbers, which are a subset of the real numbers, which are a subset of the complex numbers:
$$\mathbb{Z} \subset \mathbb{Q} \subset \mathbb{R} \subset \mathbb{C}.$$
(iii) Given a set $S$, the power set of $S$ is the set of all subsets of $S$. It is sometimes denoted $\mathscr{P}(S)$ or $2^S$. For example, if $S = \{a, b, c\}$, then the power set is
$$\mathscr{P}(S) = \{\emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\}\}.$$
The fact that two sets are equal if and only if they have the same elements
leads to the following elementary, but useful, way to prove that two sets are equal.
Proposition A.1.6. Sets $A$ and $B$ are equal if and only if both $A \subset B$ and $B \subset A$.
Definition A.1.7. We often use the following convenient shorthand for writing a set in terms of the properties its elements must satisfy. If $P$ is some property or formula with a free variable $x$, then we write
$$\{x \in S \mid P(x)\}$$
for the set of all $x \in S$ for which $P(x)$ holds.
Example A.1.8.
Remark A.1.9. In many cases it is useful to use set comprehensions that do not specify the superset from which the elements are selected, for example, $\{(r, s, t) \mid f(r, s, t) = 0\}$. This can be very handy in some situations, but it does have some serious potential pitfalls. First, there is a possibility for misunderstanding. But even when the meaning seems completely clear, this notation can lead to logical paradoxes. The most famous of these is Russell's paradox, which concerns the comprehension $R = \{A \mid A \text{ is a set and } A \notin A\}$. If $R$ itself is a set, then we have a paradox in the question of whether $R$ contains itself or not.
A proper treatment of these issues is beyond the scope of this appendix and does not arise in most applications. The interested reader is encouraged to consult one of the many standard references on set theory and logic, for example, [Hal74].
Definition A.1.10. There are several standard operations on sets for building new sets from old.
(i) The union of two sets $A$ and $B$ is $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$.
(ii) The intersection of two sets $A$ and $B$ is $A \cap B = \{x \mid x \in A \text{ and } x \in B\}$.
(iv) If $\mathscr{A}$ is a set of sets, the intersection of all the sets in $\mathscr{A}$ is
$$\bigcap_{A \in \mathscr{A}} A = \{x \mid x \in A \text{ for every } A \in \mathscr{A}\}.$$
Note that writing $A^c$ only makes sense if the superset $S$ is already given.
(vii) The Cartesian product (or simply the product) $A \times B$ of two sets $A$ and $B$ is the set of all ordered pairs $(a, b)$, where $a \in A$ and $b \in B$:
$$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}.$$
(viii) If $\mathscr{A} = [A_1, \ldots, A_n]$ is a finite (ordered) list of sets, the product of all the sets in $\mathscr{A}$ is
$$\prod_{A_i \in \mathscr{A}} A_i = \prod_{i=1}^n A_i = \{(x_1, \ldots, x_n) \mid x_i \in A_i \text{ for each } i \in \{1, 2, \ldots, n\}\}.$$
$$(A \cup B)^c = A^c \cap B^c \tag{A.1}$$
and
$$(A \cap B)^c = A^c \cup B^c.$$
More generally, if $\mathscr{A}$ is a set of sets, then
$$\Big(\bigcup_{A \in \mathscr{A}} A\Big)^c = \bigcap_{A \in \mathscr{A}} A^c \quad \text{and} \quad \Big(\bigcap_{A \in \mathscr{A}} A\Big)^c = \bigcup_{A \in \mathscr{A}} A^c.$$
Proof. We will prove (A.1). The proofs of the rest of the laws are similar. By definition, $x \in (A \cup B)^c$ if and only if $x \in S$ and $x \notin A \cup B$, which holds if and only if $x \notin A$ and $x \notin B$. But this is the definition of $x \in A^c \cap B^c$. □
Example A.1.13.
(i) The less-than symbol defines a relation on $\mathbb{R}$, namely, the subset $L = \{(x, y) \in \mathbb{R}^2 \mid x < y\} \subset \mathbb{R}^2$. So we have $x < y$ if $(x, y) \in L$.
$\{(a, b) \mid a \text{ divides } b\} \subset \mathbb{Z} \times \mathbb{Z}$.
(v) Let $F$ be the set of formal symbols of the form $a/b$, where $a, b \in \mathbb{Z}$ and $b \neq 0$. More precisely, we have $F = \mathbb{Z} \times (\mathbb{Z} \setminus \{0\})$. Define a relation $\sim$ on $F$ by $a/b \sim c/d$ if $ad = bc$. This is the usual relation for equality of fractions.
Unexample A.1.15.
Example A.1.16.
Example A.1.18.
(i) Fix any integer $n$, and let $\equiv$ be the equivalence relation $\equiv \pmod n$ of Example A.1.13(iv). The equivalence classes are
$$[[0]], [[1]], \ldots, [[n-1]].$$
That is, for each $i \in \mathbb{Z}$, we have $[[i]] = \{x \in \mathbb{Z} : n \mid (x - i)\}$, and there are precisely $n$ distinct equivalence classes. We denote this set of $n$ equivalence classes by $\mathbb{Z}_n$.
As a special case, if we take $n = 2$, then the two equivalence classes of $\mathbb{Z}_2$ are the even integers and the odd integers.
(ii) If the equivalence relation on $S$ is equality ($=$), then the equivalence classes are just the singleton sets $[[x]] = \{x\}$.
(iii) Consider the equivalence relation on $\mathbb{C}$ given by $x \sim y$ if $|x| = |y|$. For each nonnegative real number $r$ there is exactly one equivalence class $[[r]]$, consisting of the circle $[[r]] = \{z \in \mathbb{C} : |z| = r\}$. Every complex number $z$ lies in one of the equivalence classes $[[r]]$, and no two distinct nonnegative real numbers lie in the same equivalence class.
(iv) For the relation of Example A.1.13(v) the equivalence class of a given element $a/b$ consists of all equivalent fractions. For example, we have $[[1/2]] = \{1/2,\ 2/4,\ 3/6,\ -1/{-2},\ 57/114, \ldots\}$.
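The equivalence classes mod $n$ in item (i) are easy to list with a short Python snippet (our own illustration; Python's % operator returns the representative in $\{0, \ldots, n-1\}$):

```python
n = 3
classes = {}
for x in range(-10, 11):
    classes.setdefault(x % n, []).append(x)   # x % n picks the class of x
for i, members in sorted(classes.items()):
    print(f"[[{i}]] = {members}")
```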
Proof. Items (i) and (ii) follow immediately from the definition of an equivalence relation. To prove item (iii), note that for any $a \in [[x]]$ we have $a \sim x$ and $y \sim x$, so by symmetry and transitivity $a \sim y$. But because $y \in [[z]]$, we also have $y \sim z$ and hence $a \sim z$, which shows that $a \in [[z]]$. This shows that $[[x]] \subset [[z]]$. An essentially identical argument shows that $[[z]] \subset [[x]]$, and hence that $[[x]] = [[z]]$. □
Example A.1.21.
(i) The integers have a partition into the even and odd integers. The partition is $\mathscr{A} = \{E, O\}$ with $E = \{\ldots, -4, -2, 0, 2, 4, \ldots\}$ and $O = \{\ldots, -3, -1, 1, 3, 5, \ldots\}$. This is a partition because $\mathbb{Z} = E \cup O$ and $E \cap O = \emptyset$.
(ii) Taking $\mathscr{A} = \{S\}$ gives a (not very interesting) partition of $S$ into just one set, namely $S$ itself.
(iii) If $\sim$ is any equivalence relation on a set $S$, then the equivalence classes of $\sim$ define a partition of $S$ by $\mathscr{A} = \{[[x]] \mid x \in S\}$. The first condition, that $\bigcup_{A \in \mathscr{A}} A = S$, follows immediately from Proposition A.1.19(i). The fact that the equivalence classes are disjoint is Proposition A.1.19(iii).
that $y$ is an element of the $\sim$ equivalence class $[[x]]_\sim$ of $x$, but $y$ is not in the $\equiv$ equivalence class $[[x]]_\equiv$ of $x$. Thus the corresponding partitions are not equal. □
A.2 Functions
Functions are a fundamental notion in mathematics. The formal definition of a
function is given in terms of what is often called the graph of the function.
The following proposition is immediate from the definition but still very useful.
Proposition A.2.3. Two functions $f, g$ are equal if and only if they have the same domain $D$ and for each $x \in D$ we have $f(x) = g(x)$.
[Figures: a diagram of the composition $(g \circ f)(x)$ for maps between sets $X$, $Y$, and $Z$, and three schematic pictures of maps from $X$ to $Y$.]
Nota Bene A.2.8. Many people use the phrase one-to-one to mean injective and onto to mean surjective. But the phrase one-to-one is misleading and tends to make students mistakenly think of the uniqueness condition for a function rather than the correct meaning of injective. Moreover, most students who have encountered the phrase one-to-one before have heard various intuitive definitions for this phrase that are too sloppy to actually use in proofs and that tend to get them into logical trouble. We urge readers to carefully avoid the phrases one-to-one and onto, as well as all the intuitive definitions you may have heard for those phrases. Instead, we recommend using only the terms injective and surjective and their formal definitions.
Proof.
(i) If $x, x' \in X$ and $g(f(x)) = (g \circ f)(x) = (g \circ f)(x') = g(f(x'))$, then since $g$ is injective, we have that $f(x) = f(x')$, but since $f$ is injective, we must have $x = x'$.
(iv) If $x, x' \in X$ and $f(x) = f(x')$, then $(g \circ f)(x) = g(f(x)) = g(f(x')) = (g \circ f)(x')$. Since $g \circ f$ is injective, we must have that $x = x'$.
[Diagrams: a square with maps $f : A \to B$, $f' : C \to D$, $g$, and $g'$, and a triangle with maps $u : R \to S$, $v : R \to T$, and $w : S \to T$, which commutes if $w \circ u = v$.]
Definition A.2.11. Given any set $I$ (called the index set), and given any list of sets $\mathscr{A} = [A_\alpha \mid \alpha \in I]$ indexed by $I$, we define the Cartesian product of the sets in $\mathscr{A}$ to be the set
$$\prod_{\alpha \in I} A_\alpha = \Big\{ f : I \to \bigcup_{\alpha \in I} A_\alpha \ \Big|\ f(\alpha) \in A_\alpha \text{ for each } \alpha \in I \Big\}.$$
Remark A.2.12. Note that the list $\mathscr{A}$ is not necessarily a set itself, because we permit duplicate elements. That is, we wish to allow constructions like $S \times S$, which corresponds to the list $[S, S]$, whereas as a set this list would have only one element: $\{S, S\} = \{S\}$.
Example A.2.13.
(ii) Following the outline of the previous example, one can show that if the
index set is finite, then the new definition of a product is equivalent to
our earlier definition for a Cartesian product of a finite number of sets.
Cartesian products have natural maps called projections mapping out of them.
[Diagram: a map $f : T \to \prod_{j \in I} A_j$ followed by the projection onto $A_i$.] Note that the same map $f$ makes the above diagram commute for all $i \in I$.
Proof. For each $t \in T$, let $f(t) \in \prod_{j \in I} A_j$ be given by $f(t)(j) = q_j(t) \in A_j$. Clearly $f(t) \in \prod_{j \in I} A_j$. Moreover, $(p_i \circ f)(t) = f(t)(i) = q_i(t)$, as desired. Finally, to show uniqueness, note that if there is another $g : T \to \prod_{j \in I} A_j$ with $(p_i \circ g)(t) = q_i(t)$ for every $i \in I$ and every $t \in T$, then for each $t \in T$ and for every $i \in I$, we have $g(t)(i) = q_i(t) = f(t)(i)$; so $g(t) = f(t)$ for every $t \in T$. Hence $g = f$. □
In this case $f$ is well defined, because any time $[[a]] = [[a']]$ we have $a' \equiv a \pmod 4$, which implies that $4 \mid (a' - a)$ and hence $2 \mid (a' - a)$, so the equivalence classes (mod 2) are the same: $[[a]]_2 = [[a']]_2$.
Whenever we wish to define a function f whose domain is a set of equivalence
classes, if the definition is given in terms of specific representatives of the equivalence
classes, then we must check that the function is well defined. That means we must
check that using two different representatives a and a' of the same equivalence class
[[a]] = [[a']] gives f ([[a]]) = f ([[a']]).
Example A.2.16.
(i) For any $n \in \mathbb{Z}$, define a function $\oplus : \mathbb{Z}_n \times \mathbb{Z}_n \to \mathbb{Z}_n$ by $[[a]] \oplus [[b]] = [[a + b]]$. To check that it is well defined, we must check that if $[[a]] = [[a']]$ and $[[b]] = [[b']]$, then $[[a + b]] = [[a' + b']]$. We have $a' = a + kn$ and $b' = b + \ell n$ for some $k, \ell \in \mathbb{Z}$, so $a' + b' = a + b + kn + \ell n$, and thus $[[a' + b']] = [[a + b]]$, as required.
A.2.5 Inverses
Definition A.2.18. Let $f : X \to Y$ be a function.
(i) The function $f$ is right invertible if there exists a right inverse, that is, if there exists $g : Y \to X$ such that $f \circ g = 1_Y$, where $1_Y : Y \to Y$ denotes the identity map on $Y$.
(ii) The function $f$ is left invertible if there exists a left inverse, that is, if there exists $g : Y \to X$ such that $g \circ f = 1_X$, where $1_X : X \to X$ denotes the identity map on $X$.
Proof.
(i) (⇒) Let $y \in Y$, and let $g : Y \to X$ be a right inverse of $f$. Define $x = g(y)$. This implies $f(x) = f(g(y)) = y$. Hence, $f$ is surjective.
(⇐) Since $f$ is surjective, each set in $\mathscr{B} = \{f^{-1}(y)\}_{y \in Y}$ is nonempty. By the axiom of choice, there exists $\phi : \mathscr{B} \to \bigcup_{y \in Y} f^{-1}(y) = X$ such that $\phi(f^{-1}(y)) \in f^{-1}(y)$ for each $y \in Y$. Define $g : Y \to X$ as $g(y) = \phi(f^{-1}(y))$, so that $f(g(y)) = y$ for each $y \in Y$.
(ii) (⇒) If $f(x_1) = f(x_2)$, then $g(f(x_1)) = g(f(x_2))$, which implies that $x_1 = x_2$. Hence, $f$ is injective.
(⇐) Choose $x_0 \in X$. Define $g : Y \to X$ by
$$g(y) = \begin{cases} x & \text{if } y = f(x) \text{ for some } x \in X, \\ x_0 & \text{otherwise.} \end{cases}$$
This is well defined since $f$ is injective ($f(x_1) = f(x_2)$ implies $x_1 = x_2$). Thus, $g(f(x)) = x$ for each $x \in X$.
(iii) (⇒) Since $f$ is bijective, there exists both a left inverse and a right inverse. Let $f_L$ be any left inverse, and let $f_R$ be any right inverse. For all $y \in Y$
Proof.
(i) Since $g$ is a bijection, we have that $g^{-1} : X \to Y$ exists, and thus $g^{-1}(g(y)) = y$ for all $y \in Y$. Thus, for all $x \in X$, we have that $g^{-1}(g(f(x))) = f(x)$ and $g^{-1}(g(f(x))) = g^{-1}(x)$.
(ii) Since $g$ is a bijection, we have that $g^{-1} : X \to Y$ exists, and thus $g(g^{-1}(x)) = x$ for all $x \in X$. Thus, for all $x \in X$, we have that $f(g(g^{-1}(x))) = f(x)$ and $f(g(g^{-1}(x))) = g^{-1}(x)$. □
Example A.2.22. The fundamental theorem of calculus says that the map $\operatorname{Int} : C^{n-1}([a, b]; \mathbb{F}) \to C^n([a, b]; \mathbb{F})$ given by $(\operatorname{Int} f)(x) = \int_a^x f(t)\,dt$ is a right inverse of the map $D = \frac{d}{dx} : C^n([a, b]; \mathbb{F}) \to C^{n-1}([a, b]; \mathbb{F})$ because $D \circ \operatorname{Int} = \operatorname{Id}_{C^{n-1}}$ is the identity. This shows that $D$ is surjective and $\operatorname{Int}$ is injective. Because $D$ is not injective, it does not have a left inverse. But $D$ has infinitely many right inverses: for any constant $C \in \mathbb{F}$ the map $\operatorname{Int} + C$ is also a right inverse to $D$.
A.3 Orderings
A.3.1 Total Orderings
Definition A.3.1. A relation ($<$) on the set $S$ is a total ordering (and the set is called totally ordered) if for all $x, y, z \in S$, we have the following:
(i) Trichotomy: If $x, y \in S$, then exactly one of the following relations holds: $x < y$, $x = y$, or $y < x$.
The relation $x \le y$ means that $x < y$ or $x = y$ and is the negation of $y < x$. A total ordering is also often called a linear ordering.
Definition A.3.2. A set $S$ with a total ordering $<$ is well ordered if every nonempty subset $X \subset S$ has a least element; that is, an element $x \in X$ such that $x \le y$ for every $y \in X$.
By the WOA, if $S$ is not empty, there is a least element $s \in S$. If $s$ is prime, then $s$ divides itself, and the claim is satisfied. If $s$ is not prime, then it must have a divisor $d$ with $1 < d < s$. Since $d \notin S$, there is a prime $p$ that divides $d$, but now $p$ also divides $s$, since $p$ divides a divisor of $s$. This is a contradiction, so $S$ must be empty.
Remark A.3.5. Note that the set $\mathbb{Z}$ of all integers is not well ordered, since it does not have a least element. However, for any $n \in \mathbb{Z}$, the set $\mathbb{Z}_{>n} = \{x \in \mathbb{Z} \mid x > n\}$ is well ordered, since we can define a bijective map $f : \mathbb{Z}_{>n} \to \mathbb{N}$ to the natural numbers, $f(x) = x - n - 1$, which preserves ordering. In particular, the positive integers $\mathbb{Z}_{>0} = \mathbb{Z}^+$ are well ordered.
A.3.2 Induction
The principle of mathematical induction provides an extremely powerful way to
prove many theorems. It follows from the WOA of the natural numbers.
The first step (P holds for 1) is called the base case, and the second step is called the inductive step. The statement that $P$ holds for the number $k$ is called the induction hypothesis.
Proof. Assume that both the base case and the inductive step have been verified for the property $P$. Let $S = \{x \in \mathbb{Z}^+ \mid P \text{ does not hold for } x\}$ be the set of all positive integers for which the property $P$ does not hold. If $S$ is not empty, then by the WOA it must have a least element $s \in S$. By the base case of the induction, we know that $s > 1$. Let $k = s - 1$. Since $k < s$, we have that $P$ holds for $k$. Since $k > 0$, the inductive step implies that $P$ also holds for $k + 1 = s$, a contradiction. Therefore $S$ is empty, and $P$ holds for all $n \in \mathbb{Z}^+$. □
Example A.3.7.
(ii) We use induction to prove for any integer $n > 0$ that $\sum_{i=1}^{n} i = \frac{(n+1)n}{2}$. The base case $\sum_{i=1}^{1} i = 1$ is immediate. Now, given the inductive hypothesis that $\sum_{i=1}^{k} i = \frac{(k+1)k}{2}$, we have $\sum_{i=1}^{k+1} i = (k+1) + \sum_{i=1}^{k} i = (k+1) + \frac{(k+1)k}{2} = \frac{(k+1)(k+2)}{2}$, as required. So the claim holds for all positive integers $n$.
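The closed form is easy to spot-check numerically (our own one-liner):

```python
n = 100
assert sum(range(1, n + 1)) == (n + 1) * n // 2   # checks the formula for n = 100
```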
Corollary A.3.8. The principle of induction can be applied to any set of the form
$$\mathbb{N} + a = \{x \in \mathbb{Z} \mid x \ge a\}$$
by starting the base case at $a$. That is, a property $P$ holds for all of $\mathbb{N} + a$ if
(i) $P$ holds for $a$;
(ii) for every $k \in \mathbb{N} + a$, if $P$ holds for $k$, then $P$ holds for $k + 1$.
Proof. Let $P'(n)$ be the property $P(n - 1 + a)$. Application of the usual induction principle to $P'$ is equivalent to our new induction principle for $P$ on $\mathbb{N} + a$. □
Definition A.3.10. An ordered set is said to have the least upper bound property,
or l.u.b. property, if every nonempty subset that is bounded above has a least upper
bound (supremum).
Example A.3.11.
(i) The set $\mathbb{Z}$ of integers has the l.u.b. property. To see this, let $S \subset \mathbb{Z}$ be any subset that is bounded above, and let $T = \{x \in \mathbb{Z} \mid x \ge s \text{ for all } s \in S\}$ be the set of all upper bounds for $S$. Because $T$ is not empty ($S$ is bounded), the WOA guarantees that $T$ has a least element $t$. The element $t$ is an l.u.b. for $S$.
(ii) The set $\mathbb{Q}$ of rational numbers does not have the l.u.b. property; for example, the set $E = \{x \in \mathbb{Q} \mid x^2 < 2\} \subset \mathbb{Q}$ does not have an l.u.b. in $\mathbb{Q}$, but the l.u.b. for $E$ does exist in $\mathbb{R}$, namely, $\sqrt{2}$.
(iii) The real numbers $\mathbb{R}$ have the l.u.b. property. We take this as an axiom of the real numbers.
Theorem A.3.12. Let $S$ be an ordered set with the l.u.b. property. If a nonempty set $E \subset S$ is bounded below, then its infimum exists in $S$.
Proof. Assume that $E \subset S$ is nonempty and bounded below. Let $L$ denote the set of all lower bounds of $E$, which is nonempty by hypothesis. Since $E$ is nonempty, any given $x \in E$ is an upper bound for $L$, and thus $\alpha = \sup L$ exists in $S$. We claim that $\alpha = \inf E$. Suppose not. Then either $\alpha$ is not a lower bound of $E$ or $\alpha$ is a lower bound, but not the greatest lower bound. If the former is true, then there exists $x \in E$ such that $x < \alpha$, but this contradicts $\alpha$ being the l.u.b. of $L$, since $x$ would also be an upper bound of $L$. If the latter is true, then there exists $c \in S$ such that $\alpha < c$ and $c$ is a lower bound of $E$. This is a contradiction because $c \in L$, and thus $\alpha$ cannot be the l.u.b. of $L$. Thus, $\alpha = \inf E$. □
Theorem A.4.1 (Axiom of Choice). Let $I$ be a set, and let $\mathscr{A} = \{X_\alpha\}_{\alpha \in I}$ be a set of nonempty sets indexed by $I$. There is a function $f : I \to \bigcup_{\alpha \in I} X_\alpha$ such that $f(\alpha) \in X_\alpha$ for all $\alpha \in I$.
The axiom of choice has two other useful equivalent formulations: the well-ordering principle and Zorn's lemma.
Remark A.4.3. It is important to note that the choice of ordering may well differ from the natural order on the set. For example, $\mathbb{R}$ has a natural choice of total order on it, but it is not true that $\mathbb{R}$ with the usual ordering is well ordered; the subset $(0, 1)$ has no least element. The ordering that makes $\mathbb{R}$ well ordered is entirely different, and it is not at all obvious what that ordering is.
Theorem A.4.4 (Zorn's Lemma). Let $(X, \le)$ be a partially ordered set. If every chain in $X$ has an upper bound in $X$, then $X$ contains a maximal element.
We do not prove this theorem here, as it would take us too far afield. The
interested reader may consult [Hal74].
Definition A.4.6. Let $(X, \le)$ be a well-ordered set. For each $y \in X$, define $s(y) = \{x \in X \mid x < y\}$, and define $\bar{s}(y)$ to be $s(y) \cup \{y\}$. We say that a subset $S \subseteq X$ is a segment of $X$ if either $S = X$ or there exists $y \in X$ such that $S = s(y)$. If the complement $s(y)^c = X \setminus s(y)$ is nonempty, we call the least element of $s(y)^c$ the immediate successor of $y$.
Example A.4.7.
A.5 Cardinality
It is often useful to think about the size of a set; we call this the cardinality of the set. In the case that the set is finite, the cardinality is simply the number of elements in the set. In the case that the set is infinite, cardinality is more subtle.
Definition A.5.1. We say that two sets A and B have the same cardinality if
there exists a bijection f : A -+ B.
Example A.5.3.
(i) The cardinality of the even integers $2\mathbb{Z}$ is the same as that of $\mathbb{Z}$. This follows from the bijection $g : \mathbb{Z} \to 2\mathbb{Z}$ given by $g(n) = 2n$, which is bijective because it has a (left and right) inverse $g^{-1} : 2\mathbb{Z} \to \mathbb{Z}$ given by $g^{-1}(m) = \frac{m}{2}$.
(ii) The cardinality of the set $\mathbb{Z}^+$ of positive integers is the same as the cardinality of the set of integers (it has cardinality $\aleph_0$). To see this, use the function $h : \mathbb{Z}^+ \to \mathbb{Z}$ given by $h(2n) = n$ and $h(2n + 1) = -n$. This function is clearly bijective, so the cardinality of $\mathbb{Z}^+$ is $\aleph_0$.
[Figure: the grid of positive fractions $a/b$ (numerators across, denominators down), traversed along successive diagonals to enumerate the positive rationals.]
(v) The real numbers in the interval $(0, 1)$ are not countable, as can be seen from the following sketch of Cantor's diagonal argument. If the set $(0, 1)$ were countable, there would have to exist a bijection $c : \mathbb{Z}^+ \to (0, 1)$. Listing the decimal expansion of each real number, in order, we have
We claim that $\psi$ is a bijective map onto $\{1, 2, \ldots, n-1\}$. First, for any $x \in (B \setminus \{b\})$, if $\phi(x) \neq n$, then $\psi(x) \neq n$, while if $\phi(x) = n$, then since $x \neq b$, we have $\psi(x) = \phi(b) \neq \phi(x) = n$. So $\psi$ is a well-defined map $\psi : (B \setminus \{b\}) \to \{1, 2, \ldots, n-1\}$.
For injectivity, consider $x, y \in (B \setminus \{b\})$ with $\psi(x) = \psi(y)$. If $\phi(x) \neq n$ and $\phi(y) \neq n$, then $\phi(x) = \psi(x) = \psi(y) = \phi(y)$, so by injectivity of $\phi$, we have $x = y$. If $\phi(x) \neq n$ but $\phi(y) = n$, then $\phi(x) = \psi(x) = \psi(y) = \phi(b)$, so $x = b$, a contradiction. Finally, if $\phi(x) = n = \phi(y)$, then by injectivity of $\phi$ we have $x = y$.
For surjectivity, consider any $k \in \{1, 2, \ldots, n-1\}$. By surjectivity of $\phi$ there exists an $x \in B$ such that $\phi(x) = k$. If $x \neq b$, then since $k \neq n$, we have $\psi(x) = \phi(x) = k$. But if $x = b$, then there exists also a $y \in B \setminus \{b\}$ such that $\phi(y) = n$, so $\psi(y) = \phi(b) = k$. □
Since any injective map gives a bijection onto its image, we have, as an im-
mediate consequence, the following corollary.
Corollary A.5.6. If $A$ and $B$ are two finite sets and $f : A \to B$ is injective, then $|A| \le |B|$.
The previous corollary inspires the following definition for comparing cardi-
nality of arbitrary sets.
Definition A.5.7. For any two sets $A$ and $B$, we say $|A| \le |B|$ if there exists an injection $f : A \to B$, and we say $|A| < |B|$ if there is an injection $f : A \to B$, but there is no surjection from $A$ to $B$.
Example A.5.8.
(i) $\aleph_0 < |\mathbb{R}|$ because there is an obvious injection $\mathbb{Z} \to \mathbb{R}$, but by Example A.5.3(v), there is no surjection $\mathbb{Z}^+ \to (0, 1)$ and hence no surjection $\mathbb{Z} \to \mathbb{R}$.
(ii) For every nonnegative integer $n$ we have $n < \aleph_0$ because there is an obvious injection $\{1, 2, \ldots, n\} \to \mathbb{Z}$, but there is no surjection $\{1, 2, \ldots, n\} \to \mathbb{Z}$.
We conclude with two corollaries that follow immediately from our previous
discussion, but which are extremely useful.
Corollary A.5.10. If $A$ and $B$ are finite sets of the same cardinality, then any map $f : A \to B$ is injective if and only if it is also surjective.
Remark A.5.11. The previous corollary is not true for infinite sets, as can be seen from the case of $2\mathbb{Z} \subset \mathbb{Z}$ and also $\mathbb{Z} \subset \mathbb{Q}$. The inclusions are injective maps that are not surjective, despite the fact that these sets all have the same cardinality.
Corollary A.5.12 (Pigeonhole Principle). If $|A| > |B|$, then given any function $f : A \to B$, there must be at least two elements $a, a' \in A$ such that $f(a) = f(a')$.
This last corollary gets its name from the example where A is a set of pigeons
and B is a set of pigeonholes. If there are more pigeons than pigeonholes, then
at least two pigeons must share a pigeonhole. This "principle" is used in many
counting arguments.
The Complex Numbers and Other Fields
For every complex problem there is an answer that is clear, simple, and wrong.
-H. L. Mencken
In this appendix we briefly review the fundamental properties of the field of complex
numbers and also general fields.
$$z^{-1} = \bar{z}\,|z|^{-2} = \frac{a - bi}{a^2 + b^2}.$$
$$e^z = \sum_{k=0}^{\infty} \frac{z^k}{k!}.$$
[Figures B.1 and B.2: the complex plane with real and imaginary axes, and the sum $z + w$ as the far corner of the parallelogram with corners $0$, $z$, and $w$.]
One of the most important identities for complex numbers is Euler's formula (see Proposition 11.2.12):
$$e^{i\theta} = \cos(\theta) + i\sin(\theta).$$
[Figure B.3: multiplying $z = re^{it}$ and $w = \rho e^{is}$ gives $zw = (r\rho)e^{i(s+t)}$.]
Graphical Representation
The complex numbers have a very useful graphical representation as points in the plane, where we associate the complex number $z = a + bi$ with the point $(a, b) \in \mathbb{R}^2$. See Figure B.1 for an illustration. In this representation real numbers lie along the $x$-axis and imaginary numbers lie along the $y$-axis. The modulus $|z|$ of $z$ is the distance from the origin to $z$ in the plane, and the complex conjugate $\bar{z}$ is the image of $z$ under a reflection through the $x$-axis.
Addition of complex numbers is just the same as vector addition in the plane; so, geometrically, the complex number $z + w$ is the point in the plane corresponding to the far corner of the parallelogram whose other corners are $0$, $z$, and $w$. See Figure B.2.
We can represent any point in the plane in polar form as $z = r(\cos(\theta) + i\sin(\theta))$ for some $\theta \in [0, 2\pi)$ and some $r \in \mathbb{R}$ with $r \ge 0$. Combining this with Euler's formula means that we can write every complex number in the form $z = re^{i\theta}$. In this form we have
$$zw = (re^{it})(\rho e^{is}) = (r\rho)e^{i(s+t)}.$$
Multiplication of two complex numbers in polar form multiplies the moduli and adds the angles; see Figure B.3.
Similarly, $z^{-1} = \bar{z}\,|z|^{-2} = re^{-it}r^{-2} = r^{-1}e^{-it}$, so the multiplicative inverse changes the sign of the angle ($t \mapsto -t$) and inverts the modulus ($r \mapsto r^{-1}$). But the complex conjugate leaves the modulus unchanged and changes the sign of the angle; see Figure B.4.
(a) (b)
Figure B.4. Graphical representation of multiplicative inverse (a) and
complex conjugate (b). The multiplicative inverse of a complex number changes the
sign of the polar angle and inverts the modulus. The complex conjugate also changes
the sign of the polar angle, but leaves the modulus unchanged.
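These polar-form facts are easy to experiment with using Python's cmath module (a small illustration of ours):

```python
import cmath

z = 2 * cmath.exp(0.5j)            # z = r e^{it} with r = 2, t = 0.5
w = 3 * cmath.exp(1.2j)            # w = rho e^{is} with rho = 3, s = 1.2

print(cmath.polar(z * w))          # ≈ (6.0, 1.7): moduli multiply, angles add
print(cmath.polar(1 / z))          # ≈ (0.5, -0.5): modulus inverted, angle negated
print(cmath.polar(z.conjugate()))  # ≈ (2.0, -0.5): modulus unchanged, angle negated
```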
The $n$th roots of unity are uniformly distributed around the unit circle, so their average is 0. The next proposition makes that precise.
Figure B.5. Plots of all the 3rd (on the left, with $\omega_3 = e^{2\pi i/3}$ marked) and 10th (on the right) roots of unity. The roots are uniformly distributed around the unit circle, so their sum is 0.
Proof. The sum $\sum_{\ell=0}^{n-1} (\omega_n^k)^\ell$ is a geometric series, so if $k \not\equiv 0 \pmod n$, we have
$$\frac{1}{n}\sum_{\ell=0}^{n-1} \omega_n^{k\ell} = \frac{1}{n} \cdot \frac{(\omega_n^k)^n - 1}{\omega_n^k - 1} = \frac{1}{n} \cdot \frac{(\omega_n^n)^k - 1}{\omega_n^k - 1} = 0.$$
If $k \equiv 0 \pmod n$, then $\omega_n^{k\ell} = 1$ for every $\ell$, and we have
$$\frac{1}{n}\sum_{\ell=0}^{n-1} \omega_n^{k\ell} = \frac{1}{n}\sum_{\ell=0}^{n-1} 1 = 1. \qquad \Box$$
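A quick numerical check of this fact (our own illustration):

```python
import numpy as np

n = 10
w = np.exp(2j * np.pi / n)           # a primitive nth root of unity
for k in (0, 3, n):
    avg = sum(w ** (k * l) for l in range(n)) / n
    print(k, round(abs(avg), 12))    # 1.0 when k ≡ 0 (mod n), otherwise 0.0
```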
We conclude this section with a simple observation that turns out to be very
powerful. The proof is immediate.
(B.5)
Vista B .1.8. The relation (B.5) is the foundation of the fast Fourier trans-
form (FFT). We discuss the FFT in Volume 2.
B.2 Fields
B.2.1 Axioms and Basic Properties of Fields
The real numbers $\mathbb{R}$ and the complex numbers $\mathbb{C}$ are two examples of an important structure called a field. Although we only defined vector spaces with the scalars in the fields $\mathbb{R}$ or $\mathbb{C}$, you can make the same definitions for scalars from any field, and all of the results about vector spaces, linear transformations, and matrices in Chapters 1 and 2 still hold (but not necessarily the results of Chapters 3 and 4).
Definition B.2.1. A field $F$ is a set with two binary operations, addition $(a, b) \mapsto a + b$ and multiplication $(a, b) \mapsto ab$, satisfying the following properties for all $a, b, c \in F$:
Example B.2.2. The most important examples of fields for our purposes are the real numbers $\mathbb{R}$ and the complex numbers $\mathbb{C}$. The fact that $\mathbb{C}$ is a field is the substance of Proposition B.1.4. It is easy to verify that the rational numbers $\mathbb{Q}$ also form a field.
Unexample B.2.3. The integers $\mathbb{Z}$ do not form a field, because most nonzero elements of $\mathbb{Z}$ have no multiplicative inverse. For example, the multiplicative inverse $2^{-1}$ of 2 is not in $\mathbb{Z}$.
The nonnegative real numbers $[0, \infty)$ do not form a field because not all elements of $[0, \infty)$ have an additive inverse in $[0, \infty)$. For example, the additive inverse $-1$ of $1$ is not in $[0, \infty)$.
Proposition B.2.4. The additive and multiplicative identities of a field $F$ are unique, as are the additive and multiplicative inverses for a given element $a \in F$. Moreover, for any $x, y, z \in F$ the following properties hold:
(i) $0 \cdot x = 0 = x \cdot 0$.
Unexample B.2.5. The set $\mathbb{Z}_4 = \{0, 1, 2, 3\}$ with modular addition and multiplication is not a field because the element 2 has no multiplicative inverse. To see this, assume by way of contradiction that some element $a$ is the multiplicative inverse of 2. In this case we have
$$2 = (a \cdot 2) \cdot 2 = a \cdot (2 \cdot 2) = a \cdot 0 = 0,$$
a contradiction.
In fact, if there is a $d \not\equiv 0, 1 \pmod n$ which divides $n$, a very similar argument shows that $\mathbb{Z}_n$ is not a field, because $d \not\equiv 0 \pmod n$ has no multiplicative inverse.
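A short search (our own illustration) makes this concrete by listing which elements of $\mathbb{Z}_n$ have multiplicative inverses:

```python
def units(n):
    """Elements of Z_n that have a multiplicative inverse mod n."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))]

print(units(4))   # [1, 3]: 2 has no inverse, so Z_4 is not a field
print(units(5))   # [1, 2, 3, 4]: every nonzero element is a unit, and Z_5 is a field
```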
Topics in Matrix Analysis
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$
with $m$ rows and $n$ columns, where each entry $a_{ij}$ is an element of $\mathbb{F}$. We also write $A = [a_{ij}]$ to indicate that the $(i, j)$ entry of $A$ is $a_{ij}$. We denote the set of $m \times n$ matrices over $\mathbb{F}$ by $M_{m \times n}(\mathbb{F})$.
Matrices are important because they give us a nice way to write down linear transformations explicitly (see Section 2.4). Composition of linear transformations corresponds to matrix multiplication (see Section 2.5.1).
Proof. If $x = e_i$ is the $i$th standard basis vector, then $Ae_i$ is the $i$th column of $A$. Since this is zero for every $i$, the entire matrix $A$ must be zero. □
C.1.3 Inverses
Proof. We have
and
$$A = \begin{bmatrix} B & C \\ D & E \end{bmatrix}.$$
Here the number of rows in $B$ and $C$ are equal, the number of rows in $D$ and $E$ are equal, the number of columns in $B$ and $D$ are equal, and the number of columns in $C$ and $E$ are equal. There is no requirement that $A$, $B$, $C$, $D$, or $E$ be square.
If for every pair $(A_{ik}, B_{kj})$ the number of columns of $A_{ik}$ equals the number of rows of $B_{kj}$, then the product of the matrices is formed in a manner similar to that of regular matrix multiplication. In fact, the $(i, j)$ block of the product is equal to $\sum_k A_{ik} B_{kj}$.
and $B = \begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}$.
Block matrix multiplication is especially useful when there are patterns (usually involving zeros or the identity) in the matrices to be multiplied, as in the check below.
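For instance, the following NumPy check (block sizes are our own arbitrary choices) confirms that the blockwise formula reproduces ordinary matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)
A11, A12 = rng.random((2, 3)), rng.random((2, 4))
A21, A22 = rng.random((5, 3)), rng.random((5, 4))
B11, B12 = rng.random((3, 6)), rng.random((3, 1))
B21, B22 = rng.random((4, 6)), rng.random((4, 1))

A = np.block([[A11, A12], [A21, A22]])
B = np.block([[B11, B12], [B21, B22]])

# The (i, j) block of AB is sum_k A_ik @ B_kj.
C = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
              [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
print(np.allclose(A @ B, C))  # True
```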
and
$$M^{-1} = \begin{bmatrix} (A - BD^{-1}C)^{-1} & -(A - BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - BD^{-1}C)^{-1}BD^{-1} \end{bmatrix}. \tag{C.5}$$
The matrices $A - BD^{-1}C$ and $D - CA^{-1}B$ are called the Schur complements of $A$ and $D$, respectively.
Proof. These identities can be verified by multiplying out (C.3), (C.4), and (C.5), respectively. □
Proof. This follows by equating the upper left blocks of (C.4) and (C.5). □
so we have
where $\det(A)$ is the determinant of $A$; see Sections 2.8 and 2.9. Geometrically, $|\det(x, y, z)|$ is the volume of the parallelepiped having $x$, $y$, $z$ as three of the sides; see Remark 8.7.6.
(v) $\|x \times y\|$ is the area of the parallelogram having $x$ and $y$ as two of the sides.
(vi) The cross product of $x = (x_1, x_2, x_3)$ with $y = (y_1, y_2, y_3)$ can be computed as
$$x \times y = (x_2 y_3 - x_3 y_2,\ x_3 y_1 - x_1 y_3,\ x_1 y_2 - x_2 y_1).$$
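For a quick numerical illustration of properties (v) and (vi) (our own snippet, using NumPy's built-in routines):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.array([1.0, 0.0, 1.0])

print(np.cross(x, y))                    # the component formula gives [-3.  6. -3.]
print(np.linalg.norm(np.cross(x, y)))    # area of the parallelogram spanned by x, y
print(abs(np.linalg.det(np.column_stack([x, y, z]))))  # volume of the parallelepiped
```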
Proof. Since we may factor $\Phi$ into elementary matrices, it suffices to show that (C.6) holds when $\Phi$ is elementary (see Section 2.7.1). If $\Phi$ is type I, then the determinant is $-1$, and we have $x' = y$ and $y' = x$, so the desired result follows from (i) in Proposition C.3.1. If $\Phi$ is type II, corresponding to multiplication of the first row by $a$, then $\det(\Phi) = a$, and it is easy to see that $x' \times y' = a(x \times y)$. Finally, consider the case where $\Phi$ is type III, corresponding to adding a scalar multiple of one row to the other row. Let us assume first that $\Phi = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}$. We have $x' = x + ay$ and $y' = y$. We have $\det(\Phi) = 1$ and
Bibliography
[AB66] Edgar Asplund and Lutz Bungart. A First Course in Integration. Holt,
Rinehart and Winston, New York, Toronto, London, 1966. [360]
[Art91] Michael Artin. Algebra. Prentice-Hall, Englewood Cliffs, NJ, 1991. [624]
[BL06] Kurt Bryan and Tanya Leise. The $25,000,000,000 eigenvector: The linear algebra behind Google. SIAM Rev., 48(3):569–581, 2006. [517]
[Bro00] Andrew Browder. Topology in the complex plane. Amer. Math. Monthly, 107(5):393–401, 2000. [406]
[CB09] Ruel V. Churchill and James Ward Brown. Complex Variables and Applications. McGraw-Hill, New York, eighth edition, 2009. [456]
[Con16] Keith Conrad. Differentiation under the integral sign. http://www.math.uconn.edu/~kconrad/blurbs/analysis/diffunderint.pdf, 2016. Last accessed 5 April 2017. [360]
[Dem97] James W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997. [315]
[Dri04] Bruce K. Driver. Analysis tools with examples. http://www.math.ucsd.edu/~bdriver/DRIVER/Book/anal.pdf, 2004. Last accessed 5 April 2017. [380]
[DS10] Doug Smith, Maurice Eggen, and Richard St. Andre. A Transition to Advanced Mathematics. Brooks Cole, seventh edition, 2010. [627]
[Gre97] Anne Greenbaum. Iterative Methods for Solving Linear Systems. Volume 17 of Frontiers in Applied Mathematics. Society for Industrial and Applied Mathematics, Philadelphia, PA, 1997. [524]
[GVL13] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns
Hopkins Studies in the Mathematical Sciences. Johns Hopkins University
Press, Baltimore, MD, fourth edition, 2013. [315, 551]
[Hal07a] Thomas C. Hales. The Jordan curve theorem, formally and informally. Amer. Math. Monthly, 114(10):882–894, 2007. [406]
[Hal07b] Thomas C. Hales. Jordan's proof of the Jordan curve theorem. Studies
in Logic, Grammar, and Rhetoric, 10(23):45-60, 2007. [406]
[HH99] John Hamal Hubbard and Barbara Burke Hubbard. Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach. Prentice-Hall, Upper Saddle River, NJ, 1999. [406]
[HRW12] Jeffrey Humpherys, Preston Redd, and Jeremy West. A fresh look at the Kalman filter. SIAM Rev., 54(4):801–823, 2012. [286]
[HS75] Edwin Hewitt and Karl Stromberg. Real and Abstract Analysis. A Modern Treatment of the Theory of Functions of a Real Variable. Volume 25 of Graduate Texts in Mathematics, Springer-Verlag, New York–Heidelberg, 1975. [201]
[IM98] Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov methods. Amer. Math. Monthly, 105(10):889–899, 1998. [551]
[JL97] Erxiong Jiang and Peter C. B. Lam. An upper bound for the spectral condition number of a diagonalizable matrix. Linear Algebra Appl., 262:165–178, 1997. [572]
[Jon93] Frank Jones. Lebesgue Integration on Euclidean Space. Jones and Bartlett Publishers, Boston, MA, 1993. [360]
[Kat95] Tosio Kato. Perturbation Theory for Linear Operators. Classics in Math-
ematics. Reprint of the 1980 edition. Springer-Verlag, Berlin, 1995. [516]
[Lay02] D.C. Lay. Linear Algebra and Its Applications. Pearson Education, 2002.
[30, 85]
[LM06] Amy N. Langville and Carl D. Meyer. Google's PageRank and Beyond:
The Science of Search Engine Rankings. Princeton University Press,
Princeton, NJ, 2006. [517]
[Mac00] C. R. MacCluer. The many proofs and applications of Perron's theorem. SIAM Rev., 42(3):487–498, 2000. [517]
[Mae84] Ryuji Maehara. The Jordan curve theorem via the Brouwer fixed point
theorem. Amer. Math. Monthly, 91(10):641-643, 1984. [406]
[Mik14] P. Mikusinski. Integrals with values in Banach spaces and locally convex spaces. ArXiv e-prints, March 2014. [360]
[NJN98] Gail Nord, David Jabon, and John Nord. The global positioning system
and the implicit function theorem. SIAM Rev., 40(3):692-696, 1998. [301]
[OS06] Peter J. Olver and Chehrzad Shakiban. Applied Linear Algebra. Pearson
Prentice-Hall, Upper Saddle River, NJ, 2006. [30, 85]
[RT92] Satish C. Reddy and Lloyd N. Trefethen. Stability of the method of lines. Numer. Math., 62(2):235–267, 1992. [565]
[Rud87] Walter Rudin. Real and Complex Analysis. McGraw-Hill, New York,
third edition, 1987. [360]
[Rud91] Walter Rudin. Functional Analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, New York, second edition, 1991. [137, 176]
[Sch12] Konrad Schmüdgen. Unbounded Self-adjoint Operators on Hilbert Space. Volume 265 of Graduate Texts in Mathematics. Springer, Dordrecht, 2012. [470]
[Soh14] Houshang H. Sohrab. Basic Real Analysis. Birkhäuser/Springer, New York, second edition, 2014. [360]
[Str80] Gilbert Strang. Linear Algebra and Its Applications. Academic Press [Harcourt Brace Jovanovich, Publishers], New York, London, second edition, 1980. [30, 85]
[Str93] Gilbert Strang. The fundamental theorem of linear algebra. Amer. Math. Monthly, 100(9):848–855, 1993. [163]
Index
  in a basis, 47
  polar, 352
  spherical, 354
coset, 23, 598
  operations, 24, 599
countable, 648
cover
  open, 203
Cramer's rule, 77, 217
cross product, 667
CRT, see Chinese remainder theorem
curl, 402
curve
  differentiable, 242
  fitting, 129
  piecewise-smooth, 382
  positively oriented, 398
  simple closed, 381
  smooth, 382
  smooth parametrized, 381
  smooth, oriented, 382
  smooth, unoriented, 382
cutoff phenomenon, 564
Daniell integral, 320, 328
data compression, 168
De Morgan's Laws, 630
decay matrix, 564
decomposition
  LU, 62
  polar, 165
  QR, 103
  singular value, 162
  Wedderburn, 500
dense, 190
derivative, 242
  directional, 244
  higher, 266
  linearity, 256
  of a complex function, 408
  of a parametrized curve, 242
  second, 266
determinant, 65
De Moivre's formula, 655
diagonal matrix, 162
diagonalizable, 151
  orthonormally, 157
diffeomorphism, 349
differentiable
  complex function, 408
  continuously, 253
  function, 241, 246, 252
dimension, 18
  formula, 46
direct sum, 14, 16
Dirichlet function, 332
divides, 585
division property, 583
domain
  Euclidean, 583
  of a function, 635
dominated convergence theorem, 342
dot product, 89
Drazin inverse, 500, 501
dual space, 248
Eckart–Young, Schmidt, Mirsky
  theorem, 167
EEA, see extended Euclidean algorithm, 587
eigenbasis, 151
eigennilpotent, 483
eigenprojection, 463, 475
eigenspace, 140
  generalized, 465, 468, 486
eigenvalue, 140
  semisimple, 492
  simple, 307, 496
eigenvector
  generalized, 468
elementary
  matrix, 59
  product, 68
empty set, 627
equivalence
  modulo a subspace, 22
  class, 22, 598, 632
  modulo n, 21, 631, 633
  modulo an ideal, 598
  relation, 598, 631
Euclidean
  algorithm, 586
  domain, 573, 583
  extended algorithm, 587
Euler's formula, 411, 415, 655
extension by zero, 325
orthogonal, 91
  complement, 105, 123
  projection, 93, 96
orthonormal, 87
  matrix, 98, 132
  set, 95
  transformation, 97
orthonormally
  diagonalizable, 157, 158
  similar, 155
outer product expansion, 164, 174
overdetermined, 127
PageRank algorithm, 498
parallelogram identity, 131
parametrized
  contour, 416
  curve
    equivalent, 382
    smooth, 381
  manifold, 389
    equivalent, 389
    measure of, 394
    oriented, 389
    tangent space, 390
    unoriented, 390
  surface, 389
parametrized manifold, 381
partial
  derivative, 245
    kth-order, 266
    in a Banach space, 255
  ordering, 646
  sums, 212
partially ordered set, 646
partition, 22, 598, 633
path, 416
  connected, 226
  independent, 420
periodic function, 236
permutation, 66
  even, 67
  inversion, 67
  odd, 67
  sign, 67
  transposition, 67
Perron
  root, 496
  theorem, 496
Perron–Frobenius
  eigenvalue, 496
  theorem, 497
piecewise smooth, 382, 416
pigeonhole principle, 652
polar decomposition, 165
polarization identity, 131
pole of a function, 439
polygonal path, 421
polynomial
  monic, 143, 586
  ring, 575
poset, see partially ordered set
positive
  definite, 159
  semidefinite, 159
potential function, 388
power
  method, 492
  set, 628
preimage, 186, 635
preimage of a function, 187
prime, 588
primitive
  matrix, 497
primitive root of unity, 657
principal ideal domain
  Euclidean domain is a, 585
projection, 136, 460, 639
  canonical, 597
  complementary, 460
  map, 33
  spectral, 475
pseudoeigenvalue, 554
pseudoeigenvector, 555
pseudoinverse
  Drazin, 500, 501
  Moore–Penrose, 166
pseudospectral
  radius, 561
pseudospectrum, 554
Pythagorean
  law, 92
  theorem, 97
QR
  decomposition, 102, 103
  decomposition, reduced, 103
  iteration, 538
quadratic convergence, 286
quasi-Newton method, 290
quotient
  of a ring by an ideal, 598
  of a vector space by a subspace, 23
R-linear combination, 581
radius
  pseudospectral, 561
radius of convergence, 414
range, 35
rank, 43
rank-nullity theorem, 43
Rayleigh quotient, 174
reduced row echelon form, 61
REF, see row echelon form
reflection, 172
regulated
  integrable functions, 324
  integral
    multivariable, Banach valued, 324
    single variable, Banach valued, 228
relation, 631
relative condition number, 303
relatively prime, 586
reparametrization, 382
replacement theorem, 17
residual vector, 93
residue
  of a function, 440
  theorem, 442
resolvent, 470
  of an operator, 470
  set, 141, 470
reverse Fatou lemma, 358
Riemann integral, 325
Riemann's theorem, 426
Riesz representation theorem, 120
right
  eigenvector, 154
  invertible, 642
ring, 574
  commutative, 575, 576
  homomorphism, 592
  of polynomials in a matrix, 576
Ritz eigenvalues, 547
root
  of unity, 657
  simple, 304
rotation map, 34
Rouché's theorem, 448
row
  echelon form, 61
  echelon form, reduced, 61
  operations, 58
  reduction, 58
RREF, see row echelon form, reduced
Russell's paradox, 629
scalar, 4
Schmidt, Mirsky, Eckart–Young
  theorem, 167, 168
Schur
  complements, 666
  form, 140, 156
  lemma, 155
second
  barycentric form, 611
  derivative, 266
  isomorphism theorem, 45
segment, 648
self-adjoint, 155
seminorm, 111, 361
semisimple
  eigenvalue, 492
  matrix, 151
  spectral mapping theorem, 153
separated
  spectrally, 540
sequence, 193
  Cauchy, 195, 263
  convergent, 193
  uniformly convergent, 210
sequentially compact, 206
series, 212
sesquilinear, 89
set, 627
  complement, 630
  difference, 630