IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. AC-16, NO. 6, DECEMBER 1971        689
estimate x̂ of the corresponding value of the random vector X. The corresponding minimum value of (2) is then

    E{||X - x̂||² | Y = y} = E{||X||² | Y = y} - ||x̂||².    (4)

To see this, note that for any n-vector z

    E{||X - z||² | Y = y} = E{X'X - 2z'X + z'z | Y = y}
                          = E{||X||² | Y = y} - 2z'E{X | Y = y} + ||z||².    (5)

Completing the square in z, we then have

    E{||X - z||² | Y = y} = ||z - E{X | Y = y}||² + E{||X||² | Y = y} - ||E{X | Y = y}||²    (6)

which is minimized over all n-vectors z uniquely by the choice

    x̂ = E{X | Y = y}.    (7)

Substituting (7) into (6) gives the minimum value (4). The corresponding unconditioned mean-square error can then be evaluated as

    E{||X - x̂(Y)||²} = ∫ [∫ ||x - x̂(y)||² f_{X|Y}(x|y) dx] f_Y(y) dy.
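The minimizing property (7) is easy to exercise numerically: for a small discrete joint distribution, the conditional mean beats every other candidate z in conditional mean-square error. The following sketch is illustrative only; the joint pmf values are hypothetical, not from the paper.

```python
# Sketch: for a discrete joint pmf p(x, y), the conditional mean
# E{X | Y = y} minimizes z -> E{(X - z)^2 | Y = y}.
# The pmf values below are made up for illustration.
pmf = {(0, 0): 0.2, (1, 0): 0.1, (2, 0): 0.1,
       (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.3}

def cond_mse(z, y):
    """E{(X - z)^2 | Y = y} for the pmf above."""
    p_y = sum(p for (x, yy), p in pmf.items() if yy == y)
    return sum(p * (x - z) ** 2 for (x, yy), p in pmf.items() if yy == y) / p_y

def cond_mean(y):
    """E{X | Y = y} for the pmf above."""
    p_y = sum(p for (x, yy), p in pmf.items() if yy == y)
    return sum(p * x for (x, yy), p in pmf.items() if yy == y) / p_y

y = 1
m = cond_mean(y)                       # E{X | Y = 1} = 4/3 here
candidates = [m + d / 10 for d in range(-20, 21)]
best = min(candidates, key=lambda z: cond_mse(z, y))
print(m, best)
```

Because the criterion is quadratic in z with minimum at the conditional mean, the grid search lands exactly on m, in agreement with (7).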
is the conditional expectation x̂ = E{X|Y} of X given Y, and the corresponding minimum mean-square error is the conditional variance E{||X - E{X|Y}||²}.

3) For any nonnegative matrix F, x̂ = E{X|Y} minimizes E{[X - g(Y)]'F[X - g(Y)]} over all functions g: Rᵐ → Rⁿ, and x̂ = E{X|Y = y} minimizes E{[X - z]'F[X - z] | Y = y} over all n-vectors z.

It is important to note that in Problem 1 we seek an estimate, viz., the vector x̂ that minimizes over all n-vectors z the conditional expectation E{||X - z||² | Y = y} given that Y has value y, whereas in Problem 1' we seek an estimator, viz., the function x̂ that minimizes the (unconditioned) expectation E{||X - g(Y)||²} over all functions g mapping Rᵐ into Rⁿ. As long as the functions g are unconstrained in any way (such as being continuous or linear), we can construct the solution to Problem 1' by solving Problem 1 for each y with f_Y(y) > 0, as we have done above. If, on the other hand, the functions g over which we seek a solution to Problem 1' are constrained to be, for example, linear or continuous, then this approach is no longer valid (unless, of course, the unconstrained least squares estimator turns out to have the desired property of linearity or continuity), and an alternative approach must be adopted. Depending on the joint probability density function involved, it may well turn out in the constrained case that there are some functions g and values y for which the constrained estimator does worse; but x̂ is still the best "on the average" in the sense that (7) holds for all functions g satisfying the required constraints.

Proof: The proofs of 1 and 2 are included in the statements of these properties. Property 3 follows by a direct modification of the proof of Proposition 1. If F is positive definite the proof is unchanged if ||q||² is interpreted to mean q'Fq and z'q is replaced by z'Fq; if F is nonnegative definite the same identifications may be made, but ||q|| = [q'Fq]^(1/2) is in this case only a seminorm, and while x̂ = E{X|Y = y} minimizes the first term on the right side of (6), it does not do so uniquely.

The following property of the least squares estimator is somewhat less trivial and, like the above three properties, has far-reaching implications in estimation theory.

Proposition 3: The estimation error X̃ ≜ X - x̂ in the least squares estimator x̂ = E{X|Y} is uncorrelated with any function g of the random vector Y, i.e.,

    E{g(Y)X̃'} = 0.    (8)

Proof: For each value y of Y,

    E{g(Y)[X - x̂(Y)]' | Y = y} = g(y)[E{X | Y = y} - E{X | Y = y}]' = 0

and (8) follows on taking the expectation over Y.
IV. LEAST SQUARES ESTIMATION OF GAUSSIAN RANDOM VECTORS

Gaussian random vectors play a major role in probability theory and system theory. Their importance stems largely from two facts: first, they possess many distinctive mathematical properties and, second, the Gaussian distribution bears close resemblance to the probability laws observed in many physical situations.
Let Z be an r-dimensional Gaussian random vector with mean m and nonsingular covariance Σ, so that its probability density function is

    f_Z(z) = (2π)^(-r/2) |Σ|^(-1/2) exp[-½(z - m)'Σ⁻¹(z - m)]    (11)

where |Σ| is the determinant of Σ. The corresponding characteristic function of Z is

    φ_Z(u) = E{exp(ju'Z)} = exp[ju'm - ½u'Σu].    (12)

Using the well-known [1], [4] properties of the characteristic function to compute the moments of Z we have, in particular,

    E{Z} = m,    cov[Z, Z] = (1/j²)(∂²φ_Z/∂u∂u')(0) - mm' = Σ.    (13a)

Partitioning Z as Z = [X', Y']', the mean and covariance partition conformably as

    m = [m_x', m_y']'

    Σ = | Σ_xx  Σ_xy |  =  | cov[X, X]  cov[X, Y] |
        | Σ_yx  Σ_yy |     | cov[Y, X]  cov[Y, Y] |.    (13b)

Then the following properties hold.

Property 1: If W = AZ where A is any nonrandom q × r matrix then, from (12),

    φ_W(u) = E{exp(ju'W)} = E{exp[j(u'A)Z]} = φ_Z(A'u) = exp[ju'Am - ½u'AΣA'u]

so that W is Gaussian with mean Am and covariance AΣA'.
Property 2: The conditional distribution of X given Y = y is itself Gaussian. For zero means (the general case follows on replacing x and y by x - m_x and y - m_y), write S = Σ⁻¹ in the partitioned form conformable with (13b), so that the joint density (11) takes the form

    f_{X,Y}(x, y) = K exp{-½[x'S_xx x + x'S_xy y + y'S_yx x + y'S_yy y]}.

Expanding SΣ = I, we have

    S_xx Σ_xx + S_xy Σ_yx = I    (15a)
    S_xx Σ_xy + S_xy Σ_yy = 0    (15b)
    S_yx Σ_xx + S_yy Σ_yx = 0    (15c)
    S_yx Σ_xy + S_yy Σ_yy = I.   (15d)

Completing the square in x then gives

    f_{X|Y}(x|y) = K' exp{-½||x + S_xx⁻¹S_xy y||²_{S_xx}}

where ||q||²_{S_xx} denotes q'S_xx q. From (15b), -S_xx⁻¹S_xy = Σ_xy Σ_yy⁻¹, and substituting this into (15a) yields

    S_xx⁻¹ = Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx.

A similar argument from (15d) and (15c) yields

    S_yy⁻¹ = Σ_yy - Σ_yx Σ_xx⁻¹ Σ_xy.
Thus the conditional distribution of X given Y = y is Gaussian with mean

    E{X | Y = y} = m_x + Σ_xy Σ_yy⁻¹(y - m_y)    (17)

and covariance

    Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx.    (18)

Note that the least squares estimate (17) is a linear function of y and that the error covariance (18) does not depend on y.

V. THE LINEAR LEAST SQUARES ESTIMATOR

In this section we restrict attention to the special case of Problem 1' in which the estimator x̂ is restricted to being a linear function of the random vector Y. The problem of interest is therefore the following.

Problem 2: Consider two jointly distributed random vectors X and Y whose means and covariances are assumed known,² i.e., it is assumed that the mean and covariance of the composite random vector Z defined by Z = [X', Y']' are given by (13). Find the linear estimator x̂ = A⁰Y + b⁰ of X in terms of Y that is best in the sense that x̂ minimizes

    J(A, b) ≜ E{[X - AY - b]'[X - AY - b]}.
For zero means (the nonzero-mean case is handled by the translation b → b - m_x + Am_y), the criterion may be written

    J(A, b) = tr[Σ_xx - AΣ_yx - Σ_xy A' + AΣ_yy A' + bb'].    (19)

Setting the derivatives with respect to b and A equal to zero,

    (∂/∂b) E{||X - AY - b||²} = 2b⁰ = 0    (20)

and

    (∂/∂A) E{||X - AY - b||²} = 0  ⟹  A⁰Σ_yy = Σ_xy,  i.e.,  A⁰ = Σ_xy Σ_yy⁻¹    (21)

which, with the means restored, yields

    x̂ = m_x + Σ_xy Σ_yy⁻¹(Y - m_y)    (23)

with corresponding estimation error X̃ = X - x̂ of covariance

    cov[X̃, X̃] = Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx.    (22)

When X and Y have zero means, we denote the best linear estimator by

    E*{X|Y} = x̂ = Σ_xy Σ_yy⁻¹ Y.    (24)
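In the scalar case (23) and (22) reduce to x̂ = m_x + (σ_xy/σ_yy)(y - m_y) and σ_x̃² = σ_xx - σ_xy²/σ_yy, which can be exercised directly. The moment values in this sketch are hypothetical.

```python
# Scalar instance of the linear least squares estimator (23) and
# its error variance (22); all moment values are made up.
m_x, m_y = 1.0, 2.0
s_xx, s_xy, s_yy = 4.0, 1.5, 3.0   # variances and covariance of (X, Y)

def estimate(y):
    """x_hat = m_x + (s_xy / s_yy) * (y - m_y), cf. (23)."""
    return m_x + (s_xy / s_yy) * (y - m_y)

err_var = s_xx - s_xy ** 2 / s_yy   # cf. (22)

print(estimate(3.5), err_var)
```

Note that, exactly as the text observes for (18), the error variance does not depend on the observed value y; only the estimate itself does.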
process which plays a major role in later sections of the paper.

As before, we separate the trivial properties, which we give first, from those that are somewhat less trivial.

Proposition 4a (Properties of the Best Linear Estimator): Let X, Y, and Z be jointly distributed random vectors and let x̂ = E*{X|Y} be the best linear estimator of X in terms of Y in the sense defined by Problem 2. Then we have the following properties.

Property 1: The best linear estimator (24) and the covariance (22) of the corresponding estimation error X̃ = X - x̂ depend only on the first and second moments of the random vectors X and Y and not on their entire probability density functions. Thus jointly distributed random vectors with the same means and covariances but different probability density functions have the same estimator x̂ = E*{X|Y}.

Property 2: When X and Y are jointly Gaussian the (unconstrained) least squares estimator E{X|Y} is linear in Y and coincides with the linear least squares estimator E*{X|Y}.

Property 3: If X and Y are uncorrelated then E*{X|Y} = E{X}.

Property 4: E{x̂} = E{X} or, equivalently, E{X - x̂} = 0; that is, the best linear estimator is unbiased.

Property 5: For any nonrandom matrix M and vector c, E*{MX + c|Y} = ME*{X|Y} + c.

Property 6: x̂ = E*{X|Y} minimizes E{[X - g(Y)]'F[X - g(Y)]} over all linear functions g(Y) = AY + b of Y, for any nonnegative matrix F.

Proof: Properties 1 and 2 are trivial observations included for completeness. Properties 3 and 4 follow immediately on setting Σ_xy = cov[X, Y] = 0 in (24) and taking the expectation of both sides of (24) [or, more generally, (23)], respectively. Properties 5 and 6 are established by direct modification of the proof of Proposition 3.

Proposition 4b (Further Properties of the Best Linear Estimator):

Property 1: The linear least squares estimator x̂ = E*{X|Y} is characterized by the condition that the estimation error X̃ = X - x̂ is uncorrelated with Y, i.e., cov[X - x̂, Y] = 0. In fact, X̃ is uncorrelated with any linear function of Y and, in particular, cov[X̃, x̂] = 0.

Property 2: If Y and Z are uncorrelated then the best linear estimator of X in terms of both Y and Z (i.e., in terms of the composite vector [Y', Z']') may be written

    E*{X|Y, Z} = E*{X|Y} + E*{X|Z}    (25)

and the corresponding estimation error has covariance Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx - Σ_xz Σ_zz⁻¹ Σ_zx = Σ_x̃x̃ - Σ_xz Σ_zz⁻¹ Σ_zx, where Σ_x̃x̃ = Σ_xx - Σ_xy Σ_yy⁻¹ Σ_yx is the covariance of X̃ = X - E*{X|Y}.

Property 3: If Y and Z are correlated, then the best linear estimator of X in terms of both Y and Z may be written, with convenience for later applications in mind, as

    E*{X|Y, Z} = E*{X|Y} + E*{X̃|Z̃}    (26)

where Z̃ = Z - E*{Z|Y} is the error in the best linear estimate of Z in terms of Y and

    E*{X̃|Z̃} = cov[X̃, Z̃][cov[Z̃, Z̃]]⁻¹ Z̃.    (27)

The covariance of the corresponding estimation error is

    Σ_x̃x̃ - Σ_x̃z̃ Σ_z̃z̃⁻¹ Σ_z̃x̃    (29)

where Σ_x̃x̃ is as above and Σ_z̃z̃ = cov[Z̃, Z̃].

Proof of Property 1: For any matrix A we have

    cov[X - AY, Y] = Σ_xy - AΣ_yy

which (assuming Y is nondegenerate so that Σ_yy is positive definite) is zero if and only if A = Σ_xy Σ_yy⁻¹, which, in turn, uniquely defines the best linear estimator (21). Thus cov[X - AY, Y] = 0 if and only if AY = x̂ is the best linear estimator. Furthermore, for any matrix M, cov[X̃, MY] = E{X̃Y'}M' = 0.
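Property 1 of Proposition 4b can be checked with sample moments: if the gain is computed from the sample covariances of a zero-mean data set, the sample covariance of the residual X - AY with Y vanishes identically, not just approximately. A scalar sketch with made-up data:

```python
# Orthogonality of the linear least squares residual (Proposition 4b,
# Property 1), checked with sample covariances; the zero-mean scalar
# data below are made up for illustration.
xs = [1.0, -2.0, 0.5, 3.0, -2.5]
ys = [0.5, -1.0, 1.0, 2.0, -2.5]

n = len(xs)
s_xy = sum(x * y for x, y in zip(xs, ys)) / n   # sample Sigma_xy
s_yy = sum(y * y for y in ys) / n               # sample Sigma_yy

a = s_xy / s_yy                                 # sample Sigma_xy * Sigma_yy^-1
resid = [x - a * y for x, y in zip(xs, ys)]     # X - AY
s_ry = sum(r * y for r, y in zip(resid, ys)) / n  # sample cov[X - AY, Y]
print(s_ry)
```

The residual covariance is Σ_xy - AΣ_yy = 0 by construction of A, so the printed value is zero up to floating-point rounding.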
state is a random vector. More specifically, we first direct attention to the following prediction problem formulated as Problem 3. In contrast to our earlier practice of distinguishing between random vectors and their sample values by denoting the former with capital letters and the latter with lower case letters, we henceforth follow standard practice of writing both random vectors and their sample values as lower case letters. No confusion should arise if, unless specifically indicated to the contrary, lower case letters are interpreted as random vectors.

For the first part of this section we depart temporarily from our earlier standing assumption that all vectors have zero mean, and include in the description of the dynamic system a deterministic control input and a nonzero-mean initial state. Retaining these terms in the inductive proof is no more difficult than omitting them, and it is perhaps constructive to clearly exhibit the role they play by including them in the proof.

Problem 3: Consider the discrete-time n-dimensional linear dynamic system whose state x_k at time k is generated by the difference equation

    x_{k+1} = A_k x_k + B_k u_k + D_k ξ_k    (32)

with m-dimensional output

    z_k = C_k x_k + θ_k.    (33)

The initial state x_0 is a random vector with known mean and covariance

    E{x_0} = m_0,    cov[x_0, x_0] = Σ_0.    (34)

The disturbances {ξ_k} and {θ_k} are zero-mean white-noise sequences, uncorrelated with each other, i.e., for all k, j = 0, 1, 2, ...,

    E{ξ_k} = 0,    E{θ_k} = 0
    cov[ξ_k, ξ_j] = Ξ_k δ_{kj}
    cov[θ_k, θ_j] = Θ_k δ_{kj},    Θ_k positive definite
    cov[ξ_k, θ_j] = 0    (35)

where δ_{kj} = 1 for k = j and δ_{kj} = 0 for k ≠ j. Furthermore, it is assumed that the initial state x_0 is uncorrelated with the disturbances ξ_k and θ_k for all k = 0, 1, 2, ..., i.e.,

    cov[ξ_k, x_0] = 0,    cov[θ_k, x_0] = 0.    (36)

The matrices A_k, B_k, C_k, and D_k are assumed to be known and nonrandom for all k and to have the appropriate dimensions. For each k = 0, 1, 2, ..., denote by Z_k the sequence of (random) output vectors up to and including time k, i.e.,

    Z_k = {z_0, z_1, ..., z_k}.    (37)

Find, for each k = 0, 1, 2, ..., the best linear estimator of the system state at time k + 1 in terms of the output sequence up to and including time k, i.e., find the best linear estimator x̂_{k+1|k} ≜ E*{x_{k+1}|Z_k} of the random vector x_{k+1} in terms of the sequence Z_k of random vectors z_0, z_1, ..., z_k.

Proposition 5 (the Kalman Predictor): The best linear estimator x̂_{k+1|k} is generated recursively by

    x̂_{k+1|k} = A_k x̂_{k|k-1} + B_k u_k + L_k[z_k - C_k x̂_{k|k-1}]    (38)

with initial condition

    x̂_{0|-1} = E{x_0} = m_0    (39)

where the gain matrix is

    L_k = A_k Σ_k C_k'[C_k Σ_k C_k' + Θ_k]⁻¹    (40)

and Σ_k ≜ cov[x̃_{k|k-1}, x̃_{k|k-1}], the covariance of the estimation error x̃_{k|k-1} = x_k - x̂_{k|k-1}, is generated by the difference equation

    Σ_{k+1} = A_k Σ_k A_k' - A_k Σ_k C_k'[C_k Σ_k C_k' + Θ_k]⁻¹ C_k Σ_k A_k' + D_k Ξ_k D_k'    (43)

with initial condition

    Σ_0 = cov[x_0, x_0].    (44)
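Equations (38) through (44) translate directly into a recursion. The following minimal scalar sketch is illustrative only: the system, noise, and measurement values are hypothetical, and the random disturbances are replaced by a fixed record of measurements.

```python
# Scalar Kalman predictor implementing (38)-(44):
#   x_{k+1} = a*x_k + b*u_k + noise,  z_k = c*x_k + noise.
# All numerical values below are made up for illustration.
a, b, c = 0.9, 1.0, 1.0
xi_var, th_var = 0.04, 0.25        # Xi_k and Theta_k (scalar, constant)
m0, sig0 = 0.0, 1.0                # initial conditions (39) and (44)

def predict(zs, us):
    """Return the one-step predictions x_hat_{k+1|k} with covariances."""
    x_hat, sig = m0, sig0
    out = []
    for z, u in zip(zs, us):
        gain = a * sig * c / (c * sig * c + th_var)         # gain (40)
        x_hat = a * x_hat + b * u + gain * (z - c * x_hat)  # recursion (38)
        sig = a * sig * a - gain * c * sig * a + xi_var     # covariance (43)
        out.append((x_hat, sig))
    return out

steps = predict([0.5, 0.1, -0.2, 0.4], [0.0, 0.0, 0.0, 0.0])
print(steps)
```

As the text emphasizes, the covariance recursion (43) and the gain (40) do not involve the data z_k at all, so they could be precomputed off line.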
[Figure: block diagram of the Kalman predictor as a model of the system: the innovation z_k - C_k x̂_{k|k-1} is weighted by the gain L_k, and a unit delay produces x̂_{k|k-1} from x̂_{k+1|k}.]

Since the measurement noise θ_k is uncorrelated with Z_{k-1}, the best linear estimate of the output z_k in terms of Z_{k-1} is

    E*{z_k|Z_{k-1}} = C_k E*{x_k|Z_{k-1}} = C_k x̂_{k|k-1}    (45)

and the corresponding estimation error, the innovation, is

    z̃_{k|k-1} ≜ z_k - ẑ_{k|k-1} = z_k - C_k x̂_{k|k-1} = C_k x̃_{k|k-1} + θ_k    (46)

with covariance

    cov[z̃_{k|k-1}, z̃_{k|k-1}] = C_k Σ_k C_k' + Θ_k.    (47)

Defining the transition matrix

    Φ_{k,j} = A_{k-1} A_{k-2} ··· A_j    (49)

we have similarly, for k > j,

    cov[x̃_{k|j-1}, z̃_{j|j-1}] = cov[x_k - x̂_{k|j-1}, z̃_{j|j-1}] = Φ_{k,j} Σ_j C_j'.    (50)

Applying Property 3 of Proposition 4b with Y = Z_{k-1} and Z = z_k, and noting that z̃_{k|k-1} is the error in the best linear estimate of z_k in terms of Z_{k-1}, we obtain

    x̂_{k+1|k} ≜ E*{x_{k+1}|Z_k} = E*{x_{k+1}|Z_{k-1}, z_k} = x̂_{k+1|k-1} + E*{x̃_{k+1|k-1}|z̃_{k|k-1}}.    (51)
From (29), the covariance of x̃_{k+1|k} = x_{k+1} - x̂_{k+1|k} can be written

    cov[x̃_{k+1|k}, x̃_{k+1|k}] = cov[x̃_{k+1|k-1}, x̃_{k+1|k-1}] - A_k Σ_k C_k'[C_k Σ_k C_k' + Θ_k]⁻¹ C_k Σ_k A_k'    (53)

in which the subtracted term follows from (47) and (50). Here

    x̂_{k+1|k-1} ≜ E*{x_{k+1}|Z_{k-1}} = E*{A_k x_k + B_k u_k + D_k ξ_k|Z_{k-1}} = A_k x̂_{k|k-1} + B_k u_k    (54)

where E*{ξ_k|Z_{k-1}} vanishes because Z_{k-1} depends only (and linearly) on x_0, {ξ_j}_{j=0}^{k-2} and {θ_j}_{j=0}^{k-1}, all of which are, by assumption, uncorrelated with ξ_k. Furthermore, a similar argument shows that x̃_{k|k-1} and θ_k are uncorrelated, and since u_k is deterministic, the covariance of the corresponding estimation error is easily found to be

    cov[x̃_{k+1|k-1}, x̃_{k+1|k-1}] = A_k cov[x̃_{k|k-1}, x̃_{k|k-1}]A_k' + D_k cov[ξ_k, ξ_k]D_k' = A_k Σ_k A_k' + D_k Ξ_k D_k'    (55)

using either the formula in Property 5 of Proposition 4a or direct calculation after subtracting (54) from (32). Substitution of (54) and (55) into (51) and (53) then yields the iterative expressions (38) and (43).

The inductive proof is completed by noting that, by choice of the boundary conditions (39) and (44), Proposition 5 holds trivially for k = 0, since, in the absence of any output data in terms of which to estimate x_0, E{x_0} is the correct best linear estimator of x_0 and Σ_0 = cov[x_0, x_0] the covariance of the corresponding estimation error. Q.E.D.

For the alternative proof of Proposition 5 we return to our earlier standing assumption that all random vectors have zero mean and take u_k = 0 and m_0 = 0.

Alternative Proof of Proposition 5: In view of the equivalence of information (insofar as linear estimation is concerned) conveyed by the innovations sequence {z̃_{j|j-1}}_{j≤k-1} and the output process {z_j}_{j≤k-1}, x̂_{k|k-1} may be written

    x̂_{k|k-1} = Σ_{j=0}^{k-1} G_{k,j} z̃_{j|j-1}.    (56)

The gain matrices G_{k,j} can be determined using the characterization of x̂_{k|k-1} as the linear estimator whose estimation error is uncorrelated with all past data, i.e.,

    cov[x_k - x̂_{k|k-1}, z̃_{j|j-1}] = 0,    j ≤ k - 1.

Substituting from (56) and noting that {z̃_{j|j-1}} is a white-noise process, so that the summation reduces to a single term, we have

    G_{k,j} = cov[x_k, z̃_{j|j-1}][cov[z̃_{j|j-1}, z̃_{j|j-1}]]⁻¹ = Φ_{k,j} Σ_j C_j'[C_j Σ_j C_j' + Θ_j]⁻¹    (57)

using (47) and (50); in particular,

    G_{k+1,j} = A_k G_{k,j}    for j ≤ k - 1.

Hence

    x̂_{k+1|k} = Σ_{j=0}^{k} G_{k+1,j} z̃_{j|j-1} = A_k Σ_{j=0}^{k-1} G_{k,j} z̃_{j|j-1} + G_{k+1,k} z̃_{k|k-1} = A_k x̂_{k|k-1} + L_k[z_k - C_k x̂_{k|k-1}]

where

    L_k = G_{k+1,k} = cov[x̃_{k+1|k-1}, z̃_{k|k-1}][cov[z̃_{k|k-1}, z̃_{k|k-1}]]⁻¹

which, using (47) and (50), becomes (40). This is (38) with u_k = 0, and the initial condition follows as before. Subtraction of (38) from (32) yields

    x̃_{k+1|k} = (A_k - L_k C_k)x̃_{k|k-1} - L_k θ_k + D_k ξ_k

from which the difference equation (43) and boundary condition (44) for the error covariance follow by direct calculation. Thus the alternative proof is complete. Q.E.D.

Remark 1 (The Gaussian Case): When the initial state x_0 and the random processes {ξ_k} and {θ_k} are jointly Gaussian, the system state x_{k+1} and the output sequence Z_k (or the innovations sequence {z̃_{j|j-1}}_{j≤k}) are jointly Gaussian for all k, since linear transformations and sums of jointly Gaussian vectors are themselves jointly Gaussian. As noted earlier, under these circumstances the unconstrained least squares estimator is linear and thus coincides with the best linear estimator: thus the Kalman predictor defined by Proposition 5 is in this case the unconstrained least squares estimator E{x_{k+1}|Z_k} of x_{k+1} in terms of Z_k.

Remark 2 (Prediction Beyond Time k + 1): For any integer i ≥ 1, the best linear estimator x̂_{k+i|k} = E*{x_{k+i}|Z_k} of x_{k+i} in terms of Z_k may be obtained from x̂_{k+1|k} using the formula

    x̂_{k+i|k} = Φ_{k+i,k+1} x̂_{k+1|k} + Σ_{j=k+1}^{k+i-1} Φ_{k+i,j+1} B_j u_j

where Φ_{k,j} is defined by (49). This result follows by applying the linearity of the best linear estimator to the solution of (32) at time k + i and recalling that, as noted earlier, ξ_j is uncorrelated with Z_k for j ≥ k.

Remark 3 (Filtering): If A_k is invertible, then x̂_{k|k} ≜ E*{x_k|Z_k} may be obtained from x̂_{k+1|k} using the relation

    x̂_{k|k} = A_k⁻¹[x̂_{k+1|k} - B_k u_k].
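Remark 3 can be checked against the predictor recursion directly: with scalar a ≠ 0, applying x̂_{k|k} = a⁻¹(x̂_{k+1|k} - b u_k) to one step of (38) reproduces the familiar measurement-update form x̂_{k|k-1} + K(z_k - c x̂_{k|k-1}) with K = gain/a. The numbers in this sketch are hypothetical.

```python
# Scalar check of Remark 3: x_filt = (x_pred_next - b*u) / a, where
# x_pred_next comes from one step of (38). All values are made up.
a, b, c = 0.8, 0.5, 1.0
th_var = 0.2
sig = 1.0                        # current prediction error variance Sigma_k
x_pred, z, u = 0.3, 0.9, 0.1     # x_hat_{k|k-1}, z_k, u_k

gain = a * sig * c / (c * sig * c + th_var)                # gain (40)
x_pred_next = a * x_pred + b * u + gain * (z - c * x_pred) # step of (38)
x_filt = (x_pred_next - b * u) / a                         # Remark 3

# Same result from the usual measurement update, with K = gain / a.
x_filt_direct = x_pred + (gain / a) * (z - c * x_pred)
print(x_filt, x_filt_direct)
```

The two expressions agree identically, since dividing (38) by a cancels the A_k factor in both the propagated estimate and the gain.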
More generally, the filtered estimate satisfies the recursion

    x̂_{k+1|k+1} = A_k x̂_{k|k} + B_k u_k + M_{k+1}[z_{k+1} - C_{k+1}(A_k x̂_{k|k} + B_k u_k)]

with boundary condition [recall (34)]

    x̂_{0|0} = m_0 + M_0[z_0 - C_0 m_0]

where

    M_k = Σ_k C_k'[C_k Σ_k C_k' + Θ_k]⁻¹

and Σ_k satisfies (43) with boundary condition (44). The proof of this assertion follows from a simple modification of the proof of Proposition 5.

Remark 4 (Smoothing): Even if the matrix Φ_{k,j} = A_{k-1} ··· A_j is invertible for all j < k, the best linear estimator (smoother) x̂_{j|k} ≜ E*{x_j|Z_k} is not in general given by

    x̂_{j|k} = Φ_{k+1,j}⁻¹[x̂_{k+1|k} - Σ_{i=j}^{k} Φ_{k+1,i+1} B_i u_i]    (not true)

if j is strictly less than k. Briefly, this is because ξ_i is clearly correlated with z_{i+1} and all future outputs z_{i+2}, z_{i+3}, ...; thus E*{ξ_i|Z_k} is not in general the zero vector for i < k. The discrete-time smoothing problem can be attacked by methods analogous to those discussed later in connection with the continuous-time state estimation problem.

Remark 5 (Correlated Noises): If the noise sequences {ξ_k} and {θ_k} are white, zero-mean, and uncorrelated with x_0 but correlated with each other, so that

    cov[ξ_k, θ_j] = Γ_k δ_{kj}

the only modification that is necessary in the inductive proof is in the prior calculation of cov[x̃_{k+1|k-1}, z̃_{k|k-1}] given by (50): the correlation between ξ_k and θ_k leads to an additional term in (50), which in this case becomes

    cov[x̃_{k+1|k-1}, z̃_{k|k-1}] = A_k Σ_k C_k' + D_k Γ_k.

The gain (40) accordingly becomes

    L_k = [A_k Σ_k C_k' + D_k Γ_k][C_k Σ_k C_k' + Θ_k]⁻¹

and the covariance equation (43) becomes

    Σ_{k+1} = A_k Σ_k A_k' - [A_k Σ_k C_k' + D_k Γ_k][C_k Σ_k C_k' + Θ_k]⁻¹[C_k Σ_k A_k' + Γ_k' D_k'] + D_k Ξ_k D_k'.

In this section we consider the continuous-time counterpart of the discrete-time state estimation problem examined in the preceding section and derive the corresponding continuous-time Kalman-Bucy filter using the continuous-time analog of the alternative (innovations) proof given in Section VII. This derivation is formal to the extent that it involves some familiar formal manipulations with white noise and omits the step of rigorously proving the equivalence of the output process and the innovations process insofar as linear estimation is concerned. A more precise treatment would require the theory of stochastic integrals and stochastic differential equations (see, e.g., [15]-[18]) and would include a rigorous proof that the linear transformation that generates the innovations process from the output process is causally invertible. The innovations approach to estimation and detection problems is due to Kailath and his collaborators, to whose works the reader is referred for more detailed treatment and extensions, including nonlinear problems [7]-[12].

Problem 4: Consider the smooth n-dimensional linear dynamic system with state equation

    ẋ(t) = A(t)x(t) + B(t)u(t) + D(t)ξ(t)    (58)

and m-dimensional output

    z(t) = C(t)x(t) + θ(t)    (59)

with

    E{x(t_0)} = m_0,    cov[x(t_0), x(t_0)] = Σ_0.    (60)

The control input u(t) is assumed known and nonrandom for all t, while the disturbances ξ(·) and θ(·) are assumed to be white zero-mean stochastic processes that are uncorrelated with each other and with x_0, and have known covariances, i.e., the q- and m-dimensional random vectors ξ(t) and θ(t) have the following second-order statistical properties for all t, s:

    E{ξ(t)} = 0,    E{θ(t)} = 0
    cov[ξ(t), ξ(s)] = Ξ(t)δ(t - s)
    cov[θ(t), θ(s)] = Θ(t)δ(t - s),    Θ(t) positive definite
    cov[ξ(t), θ(s)] = 0,    cov[ξ(t), x(t_0)] = 0,    cov[θ(t), x(t_0)] = 0.    (61)
Exactly as in the discrete-time case, the best linear estimate x̂ = ∫_{t_0}^{T} H_0(τ)y(τ) dτ of a random vector x in terms of a (zero-mean) process y(·) observed over [t_0, T] is characterized by the orthogonality condition

    cov[x - x̂, y(σ)] = cov[x, y(σ)] - ∫_{t_0}^{T} H_0(τ) cov[y(τ), y(σ)] dτ = 0,    t_0 ≤ σ ≤ T.    (64)

To see this, let H(·) be arbitrary and set H̃(t) ≜ H(t) - H_0(t). Then

    E{||x - ∫_{t_0}^{T} H(τ)y(τ) dτ||²} = E{||x - ∫_{t_0}^{T} H_0(τ)y(τ) dτ||²} + E{||∫_{t_0}^{T} H̃(τ)y(τ) dτ||²}
        - 2E{[x - ∫_{t_0}^{T} H_0(τ)y(τ) dτ]' ∫_{t_0}^{T} H̃(σ)y(σ) dσ}    (65)

where H_0(·) is defined implicitly by (64). Now note that the last term on the right side of (65) vanishes because it can be written

    -2 ∫_{t_0}^{T} tr{H̃(σ) cov[y(σ), x - ∫_{t_0}^{T} H_0(τ)y(τ) dτ]} dσ

the integrand of which is, by (64), identically zero. The first term on the right side of (65) is independent of H(·) and the second is nonnegative, so the choice H = H_0 is optimal.

Applied to the state estimation problem, with x̂(t|t) denoting the best linear estimate of x(t) in terms of {z(σ), t_0 ≤ σ ≤ t}, the innovation process

    z̃(t|t) ≜ z(t) - C(t)x̂(t|t) = C(t)[x(t) - x̂(t|t)] + θ(t) = C(t)x̃(t|t) + θ(t)    (68)

plays the role of the discrete-time innovations sequence. In particular, for t ≥ τ,

    cov[x(t), z̃(τ|τ)] = Φ(t, τ)Σ(τ)C'(τ)

where Φ(·,·) is the transition matrix associated with the differential equation ẋ(t) = A(t)x(t).
For zero mean (m_0 = 0) and zero control, the best linear estimate may be written as a linear operation on the innovations,

    x̂(t|t) = ∫_{t_0}^{t} G(t, τ)z̃(τ|τ) dτ    (71)

where the matrix-valued gain function G(t, ·) can be calculated using the characterization (64) of x̂(t|t), viz.,

    G(t, σ) = Φ(t, σ)Σ(σ)C'(σ)Θ⁻¹(σ),    t_0 ≤ σ ≤ t    (72)

with Σ(t) ≜ cov[x̃(t|t), x̃(t|t)]. Since Φ(t, τ)G(τ, τ) = Φ(t, τ)Σ(τ)C'(τ)Θ⁻¹(τ) = G(t, τ), differentiating (71) with respect to t gives

    dx̂(t|t)/dt = A(t)x̂(t|t) + G(t, t)z̃(t|t) = A(t)x̂(t|t) + Σ(t)C'(t)Θ⁻¹(t)[z(t) - C(t)x̂(t|t)].

For a nonzero mean and a deterministic control input, linearity gives the additional term

    E{x(t)} = Φ(t, t_0)m_0 + ∫_{t_0}^{t} Φ(t, τ)B(τ)u(τ) dτ    (73)

and this sum must be added to the right side of (71). This leads to an additional term B(t)u(t) on the right side of the filter equation and changes its initial condition from zero to m_0. The differential equations for x̃(t|t) and Σ(t) remain unaffected.

In summary, we have the following solution to Problem 4: the estimate x̂(t|t) is generated by the Kalman-Bucy filter

    dx̂(t|t)/dt = A(t)x̂(t|t) + B(t)u(t) + Σ(t)C'(t)Θ⁻¹(t)[z(t) - C(t)x̂(t|t)]    (74)

with initial condition

    x̂(t_0|t_0) = m_0 = E{x(t_0)}.    (75)

The estimation error satisfies

    dx̃(t|t)/dt = [A(t) - Σ(t)C'(t)Θ⁻¹(t)C(t)]x̃(t|t) + D(t)ξ(t) - Σ(t)C'(t)Θ⁻¹(t)θ(t)    (76)

so that, with Ψ(·,·) the transition matrix associated with [A(t) - Σ(t)C'(t)Θ⁻¹(t)C(t)], the error covariance can be written

    Σ(t) = Ψ(t, t_0)Σ_0Ψ'(t, t_0) + ∫_{t_0}^{t} Ψ(t, τ)[D(τ)Ξ(τ)D'(τ) + Σ(τ)C'(τ)Θ⁻¹(τ)C(τ)Σ(τ)]Ψ'(t, τ) dτ    (77)

or, equivalently, Σ(t) satisfies the matrix Riccati differential equation

    dΣ(t)/dt = A(t)Σ(t) + Σ(t)A'(t) - Σ(t)C'(t)Θ⁻¹(t)C(t)Σ(t) + D(t)Ξ(t)D'(t)    (78)

with initial condition Σ(t_0) = Σ_0.
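The scalar version of the variance equation (78), dσ/dt = 2aσ - σ²c²/θ + d²ξ, can be integrated by Euler steps to watch σ(t) approach its steady-state value, the positive root of the corresponding quadratic (cf. (101) below). The coefficients here are hypothetical.

```python
import math

# Euler integration of the scalar Riccati equation (78):
#   sigma' = 2*a*sigma - sigma^2 * c^2 / theta + d^2 * xi
# Coefficient values are made up for illustration.
a, c, d = -1.0, 1.0, 1.0
xi, theta = 1.0, 0.5

def riccati_path(sigma0, dt=1e-3, steps=20000):
    """Integrate the scalar Riccati equation forward from sigma0."""
    sigma = sigma0
    for _ in range(steps):
        sigma += dt * (2 * a * sigma - sigma ** 2 * c ** 2 / theta + d ** 2 * xi)
    return sigma

sigma_inf = riccati_path(2.0)

# Steady state: positive root of 0 = 2*a*s - s^2*c^2/theta + d^2*xi.
s_bar = theta * (a + math.sqrt(a ** 2 + c ** 2 * d ** 2 * xi / theta)) / c ** 2
print(sigma_inf, s_bar)
```

After integrating well past the transient, the numerically propagated variance agrees with the algebraic root to within the discretization error.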
[Figure: block diagram of the Kalman-Bucy filter as a model of the system, driven by the control u(t) and by the innovation z(t) - C(t)x̂(t|t) through the gain Σ(t)C'(t)Θ⁻¹(t).]

If the disturbances ξ(·) and θ(·) are correlated, with

    cov[ξ(t), θ(s)] = Γ(t)δ(t - s)

the variance equation (78) becomes

    dΣ(t)/dt = A(t)Σ(t) + Σ(t)A'(t) - [Σ(t)C'(t) + D(t)Γ(t)]Θ⁻¹(t)[C(t)Σ(t) + Γ'(t)D'(t)] + D(t)Ξ(t)D'(t).

For prediction, the best linear estimate of x(t + τ), τ > 0, in terms of {z(s), s ≤ t} follows immediately by applying the linearity of the best linear estimator to the solution of (58) and recalling that, as noted earlier, ξ(s) is uncorrelated with z(σ), σ ≤ t, for s ≥ t. A direct calculation shows that the covariance of the corresponding estimation error is given by

    cov[x̃(t + τ|t), x̃(t + τ|t)] = Φ(t + τ, t)Σ(t)Φ'(t + τ, t) + ∫_{t}^{t+τ} Φ(t + τ, s)D(s)Ξ(s)D'(s)Φ'(t + τ, s) ds.

Consider now measurement noise n(·) that is not white but is generated as the output of a finite-dimensional linear dynamic system of the form

    ζ̇(t) = F(t)ζ(t) + G(t)w(t),    n(t) = H(t)ζ(t) + J(t)v(t)    (82)

driven by white zero-mean processes w(·) and v(·) with

    cov[w(t), w(τ)] = W(t)δ(t - τ),    cov[v(t), v(τ)] = V(t)δ(t - τ)

and, in addition,

    J(t)V(t)J'(t) > 0,    ∀t.    (81)
We note that the requirement (81) ensures that the additive white noise in the measurements (82) is nondegenerate in the sense that it has positive definite covariance matrix, which conforms with our earlier standing assumption (61) on the additive white measurement noise.

If we restrict attention to white-noise processes v(·) and w(·) that are uncorrelated, a necessary and sufficient condition for the combined process n(·) defined by (79) to be representable as the output of a finite-dimensional linear dynamic system of the form (82) is that its autocorrelation function R(t, τ) = E{n(t)n'(τ)} can be expressed for all t and τ as

    R(t, τ) = P(t)Q(t ∧ τ)P'(τ) + S(t)δ(t - τ)

for some continuous matrix-valued functions P(·) and S(·) and some continuously differentiable, symmetric, nonnegative definite, matrix-valued function Q(·) whose derivative Q̇(·) also has nonnegative definite values. The notation t ∧ τ denotes the minimum of t and τ [20]-[22].

If the combined process n(·) is wide-sense stationary, so that R(t, τ) = R(t - τ, 0) = R'(τ - t, 0), an equivalent characterization can be given in the frequency domain in terms of the Laplace transform of R(·, 0): a necessary and sufficient condition for the wide-sense stationary process n(·) to be representable as the output of a stable, constant, finite-dimensional linear dynamic system of the form (82) is that its spectrum Λ(·) = L[E{n(t)n'(0)}] is a rational function.

Turning to the smoothing problem, we seek for t < T the estimate

    x̂(t|T) = ∫_{t_0}^{T} H(t, T; τ)z̃(τ|τ) dτ

in terms of the (white) innovations process. The orthogonality characterization gives

    x̂(t|T) = x̂(t|t) + ∫_{t}^{T} cov[x(t), z̃(τ|τ)]Θ⁻¹(τ)z̃(τ|τ) dτ.    (86)

Now for t < τ we have that

    cov[x(t), z̃(τ|τ)] = cov[x(t), x̃(τ|τ)]C'(τ)
                      = cov[x̃(t|t) + x̂(t|t), x̃(τ|τ)]C'(τ)
                      = cov[x̃(t|t), x̃(τ|τ)]C'(τ) + 0
                      = P(t, τ)C'(τ)    (87)

where P(t, τ) ≜ cov[x̃(t|t), x̃(τ|τ)] and the term cov[x̂(t|t), x̃(τ|τ)] vanishes because the error x̃(τ|τ) is uncorrelated with the data on which x̂(t|t) depends. Moreover

    cov[x̃(t|t), x̃(τ|τ)] = Σ(t)Ψ'(τ, t),    t ≤ τ    (88a)

where, as before, Ψ(τ, t) is the transition matrix associated with ẇ(t) = [A(t) - Σ(t)C'(t)Θ⁻¹(t)C(t)]w(t). We note in passing that, for t ≥ τ, we have

    cov[x̃(t|t), x̃(τ|τ)] = Ψ(t, τ)Σ(τ),    t ≥ τ.    (88b)

Thus

    x̂(t|T) = x̂(t|t) + Σ(t) ∫_{t}^{T} Ψ'(τ, t)C'(τ)Θ⁻¹(τ)[z(τ) - C(τ)x̂(τ|τ)] dτ    (89)

and subtracting each side of this equation from x(t) yields the corresponding expression for the smoothing error.
The covariance of the smoothing error is then found to be

    cov[x̃(t|T), x̃(t|T)] = Σ(t) - Σ(t)[∫_{t}^{T} Ψ'(τ, t)C'(τ)Θ⁻¹(τ)C(τ)Ψ(τ, t) dτ]Σ(t).

IX. THE DUALITY BETWEEN LEAST SQUARES ESTIMATION AND LEAST SQUARES CONTROL

Consider the least squares regulator problem involving the linear dynamic system

    ṗ(t) = -A'(t)p(t) - C'(t)w(t)    (92a)
    v(t) = D'(t)p(t)    (92b)

running backward in time from the boundary condition

    p(t_1) = p_1    (93)

with cost functional

    J[w] = p'(t_0)Σ_0 p(t_0) + ∫_{t_0}^{t_1} [w'(t)Θ(t)w(t) + v'(t)Ξ(t)v(t)] dt
         = p'(t_0)Σ_0 p(t_0) + ∫_{t_0}^{t_1} [w'(t)Θ(t)w(t) + p'(t)D(t)Ξ(t)D'(t)p(t)] dt.

This control problem is dual to the estimation problem for the system

    ẋ(t) = A(t)x(t) + D(t)ξ(t)    (98a)
    z(t) = C(t)x(t) + θ(t)    (98b)

in the following sense: for an arbitrary gain L(·), the error of the estimator

    dx̂(t|t)/dt = [A(t) - L(t)C(t)]x̂(t|t) + L(t)z(t)    (99)

propagates through the matrix A(t) - L(t)C(t), while the corresponding adjoint system

    ṗ(t) = -[A(t) - L(t)C(t)]'p(t)    (100a)
    w(t) = -L'(t)p(t)    (100b)

is exactly the system (92) under the feedback law w = -L'p.
TABLE I
THE DUALITY BETWEEN LEAST SQUARES ESTIMATION AND LEAST SQUARES CONTROL

    Least Squares Estimation                            Least Squares Control
    Problem Σ                                           Problem Σ*
    Solution:                                           Solution:
      L = KC'Θ⁻¹                                          w⁰ = -L'p⁰,  ṗ⁰ = -(A - LC)'p⁰
      K̇ = AK + KA' - KC'Θ⁻¹CK + DΞD'
      K(t_0) = Σ_0
    Transition matrix Φ(t, s) = e^{A(t-s)}              Transition matrix Φ'(s, t) = e^{A'(s-t)}
    Controllability matrix [D, AD, ..., A^{n-1}D]       Observability matrix [D, AD, ..., A^{n-1}D]
      Σ completely controllable                           Σ* completely observable
    Observability matrix [C', A'C', ..., (A')^{n-1}C']  Controllability matrix [C', A'C', ..., (A')^{n-1}C']
      Σ completely observable                             Σ* completely controllable

For constant matrices A, C, D, Ξ, and Θ, the steady-state error covariance Σ̄ satisfies the algebraic Riccati equation

    0 = AΣ̄ + Σ̄A' - Σ̄C'Θ⁻¹CΣ̄ + DΞD'    (101)

and the corresponding constant-gain (steady-state) Kalman filter is

    dx̂(t|t)/dt = Ax̂(t|t) + Σ̄C'Θ⁻¹[z(t) - Cx̂(t|t)].    (102)

A simple sufficient condition for (101) to have a unique nonnegative definite
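In the scalar case the constant gain in (102) is Σ̄c/θ, and the filter dynamics a - Σ̄c²/θ are stable whenever Σ̄ is the nonnegative root of (101), even when the plant itself is unstable. A quick numerical check, with hypothetical coefficients:

```python
import math

# Scalar steady-state gain for (102): gain = s_bar * c / theta, where
# s_bar solves 0 = 2*a*s - s^2*c^2/theta + d^2*xi, cf. (101).
# Coefficient values are made up; note a > 0 (unstable plant).
a, c, d, xi, theta = 0.5, 1.0, 1.0, 1.0, 1.0

s_bar = theta * (a + math.sqrt(a ** 2 + c ** 2 * d ** 2 * xi / theta)) / c ** 2
gain = s_bar * c / theta
closed_loop = a - gain * c        # scalar analog of A - L*C
print(s_bar, gain, closed_loop)
```

Algebraically, a - Σ̄c²/θ = -sqrt(a² + c²d²ξ/θ), which is strictly negative, so the steady-state filter is stable regardless of the sign of a.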
solution is that all the eigenvalues of A have strictly negative real parts.

Under these circumstances the problem becomes identical to the classical Wiener filtering problem, and the steady-state Kalman filter (102) is the optimum realizable Wiener filter for this problem.

Perhaps of more practical importance, however, is that the steady-state Kalman filter (102) is trivially the solution to the finite-time estimation problem of Section VIII if the covariance at the finite initial time t_0 is taken to be cov[x(t_0), x(t_0)] = Σ̄. Also of importance is the result that, even if cov[x(t_0), x(t_0)] is not Σ̄ but some other value Σ_0, the error in using the steady-state Kalman filter (102) instead of the correct time-varying filter approaches zero as t → ∞ [25], [26].

ACKNOWLEDGMENT

The comments and suggestions of D. L. Snyder, T. Kailath, and A. S. Gilman are gratefully acknowledged.
REFERENCES

[1] E. Parzen, Modern Probability Theory and Its Applications. New York: Wiley, 1960.
[2] R. W. Brockett, Finite Dimensional Linear Systems. New York: Wiley, 1970.
[3] W. M. Wonham, On the separation theorem of stochastic control, SIAM J. Contr., vol. 6, no. 2, pp. 312-326, 1968.
[4] H. Cramer, Mathematical Methods of Statistics. Princeton, N.J.: Princeton University Press, 1946.
[5] R. Bellman, Introduction to Matrix Analysis, 2nd ed. New York: McGraw-Hill, 1970.
[6] M. Athans, The matrix minimum principle, Inform. Contr., vol. 11, pp. 592-606, Nov./Dec. 1967.
[7] T. Kailath, The innovations approach to detection and estimation theory, Proc. IEEE, vol. 58, pp. 680-695, May 1970.
[8] T. Kailath, An innovations approach to least-squares estimation, Part I: Linear filtering in additive white noise, IEEE Trans. Automat. Contr., vol. AC-13, pp. 646-655, Dec. 1968.
[9] T. Kailath and P. Frost, An innovations approach to least-squares estimation, Part II: Linear smoothing in additive white noise, IEEE Trans. Automat. Contr., vol. AC-13, pp. 655-660, Dec. 1968.
[10] P. A. Frost and T. Kailath, An innovations approach to least-squares estimation, Part III: Nonlinear estimation in white Gaussian noise, IEEE Trans. Automat. Contr., vol. AC-16, pp. 217-226, June 1971.
[11] T. Kailath and R. Geesey, An innovations approach to least-squares estimation, Part IV: Recursive estimation given lumped covariance functions, this issue, pp. 720-727.
[12] P. Frost, Nonlinear estimation in continuous-time systems, Ph.D. dissertation, Dep. Elec. Eng., Stanford Univ., Stanford, Calif., June 1968.
[13] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
[14] R. E. Kalman, A new approach to linear filtering and predic-