
11 Certainty-equivalent control

We show that the addition of noise to a linear system with quadratic costs does not change the optimal control, as a function of the state. Consider the realised stochastic controllable dynamical system $(G, (\varepsilon_n)_{n \geq 1})$, where
$$G(x, a, \varepsilon) = Ax + Ba + \varepsilon, \qquad x \in \mathbb{R}^d, \quad a \in \mathbb{R}^m,$$
and where $(\varepsilon_n)_{n \geq 1}$ are independent $\mathbb{R}^d$-valued random variables, with mean $E(\varepsilon) = 0$ and variance $E(\varepsilon \varepsilon^T) = N$. Thus the controlled process, for a given starting point $X_0 = x$, is given by
$$X_{n+1} = A X_n + B U_n + \varepsilon_{n+1}$$
where $U_n = u_n(X_0, \ldots, X_n)$ is the control. We study the $n$-horizon problem with non-negative quadratic instantaneous cost $c(x, a)$ and final cost $\bar{c}(x)$, as in the preceding section. Thus
$$c(x, a) = x^T R x + x^T S^T a + a^T S x + a^T Q a, \qquad \bar{c}(x) = x^T \Pi_0 x.$$
Set
$$V_n^u(x) = E_x^u\left[\sum_{k=0}^{n-1} c(X_k, U_k) + \bar{c}(X_n)\right], \qquad V_n(x) = \inf_u V_n^u(x).$$
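
As an illustration of the $n$-horizon cost just defined, the following sketch estimates $V_n^u(x)$ by Monte Carlo simulation of the noisy dynamics under a fixed (and in general suboptimal) linear feedback policy. All matrices, dimensions and the Gaussian noise model are invented for the example; the section itself only assumes mean $0$ and variance $N$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative system (made up): d = 2 states, m = 1 control.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
N = 0.1 * np.eye(2)                      # noise covariance E[eps eps^T]

# Quadratic costs: c(x, a) = x'Rx + 2 a'Sx + a'Qa, final cost x'Pi0 x.
R = np.eye(2)
S = np.zeros((1, 2))
Q = np.array([[1.0]])
Pi0 = np.eye(2)

def cost(x, a):
    return float(x @ R @ x + 2 * a @ S @ x + a @ Q @ a)

def estimate_V(x0, policy, n, samples=20000):
    """Monte Carlo estimate of V_n^u(x0) for a feedback policy a = policy(k, x)."""
    total = 0.0
    for _ in range(samples):
        x, running = x0.copy(), 0.0
        for k in range(n):
            a = policy(k, x)
            running += cost(x, a)
            x = A @ x + B @ a + rng.multivariate_normal(np.zeros(2), N)
        total += running + float(x @ Pi0 @ x)   # add the final cost
    return total / samples

# An arbitrary stationary linear policy U_k = F X_k (not the optimal one).
F = np.array([[-0.5, -1.0]])
print(estimate_V(np.array([1.0, 0.0]), lambda k, x: F @ x, n=5))
```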

Suppose inductively that $V_n(x) = x^T \Pi_n x + \gamma_n$. This is true for $n = 0$ if we take $\gamma_0 = 0$. By a straightforward generalization$^{25}$ of Proposition 2.1, $V_{n+1}$ is given by the optimality equation
$$V_{n+1}(x) = \inf_a \{ c(x, a) + E(V_n(Ax + Ba + \varepsilon)) \}.$$

We have
$$E(V_n(Ax + Ba + \varepsilon)) = E((Ax + Ba + \varepsilon)^T \Pi_n (Ax + Ba + \varepsilon)) + \gamma_n = (Ax + Ba)^T \Pi_n (Ax + Ba) + E(\varepsilon^T \Pi_n \varepsilon) + \gamma_n,$$
the cross terms vanishing since $E(\varepsilon) = 0$, and we showed in the preceding section that
$$\inf_a \{ c(x, a) + (Ax + Ba)^T \Pi_n (Ax + Ba) \} = x^T r(\Pi_n) x,$$
with minimizing action $a = K(\Pi_n) x$. Also
$$E(\varepsilon^T \Pi_n \varepsilon) = \sum_{i,j} E(\varepsilon_i (\Pi_n)_{ij} \varepsilon_j) = \sum_{i,j} N_{ij} (\Pi_n)_{ij} = \operatorname{trace}(N \Pi_n).$$
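
The identity $E(\varepsilon^T \Pi_n \varepsilon) = \operatorname{trace}(N \Pi_n)$ is easy to verify numerically; a minimal check, with a made-up covariance $N$ and symmetric matrix $\Pi$, might look as follows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up covariance N and symmetric matrix Pi for the check.
N = np.array([[0.3, 0.1], [0.1, 0.2]])
Pi = np.array([[2.0, 0.5], [0.5, 1.0]])

# Monte Carlo average of eps' Pi eps over many draws of eps with mean 0, variance N.
eps = rng.multivariate_normal(np.zeros(2), N, size=200000)
mc = np.mean(np.einsum('ki,ij,kj->k', eps, Pi, eps))

print(mc, np.trace(N @ Pi))   # the two values should agree closely
```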

So $V_{n+1}(x) = x^T \Pi_{n+1} x + \gamma_{n+1}$, where $\Pi_{n+1} = r(\Pi_n)$ and $\gamma_{n+1} = \gamma_n + \operatorname{trace}(N \Pi_n)$. By induction, we have proved the following result.
$^{25}$We have moved out of the setting of a countable state space used in Section 2. For a function $F$ on $S \times A$, instead of writing $PF$ as a sum, we can use the formula $PF(x, a) = E(F(G(x, a, \varepsilon)))$.


Proposition 11.1. For the linear system $X_{n+1} = A X_n + B U_n + \varepsilon_{n+1}$, with independent perturbations $(\varepsilon_n)_{n \geq 1}$ having mean $0$ and variance $N$, and with non-negative quadratic costs as above, the infimal cost function is given by $V_n(x) = x^T \Pi_n x + \gamma_n$ and the $n$-horizon optimal control is $U_k = K(\Pi_{n-1-k}) X_k$. This is certainty-equivalent control, as the optimal control is the same as for $\varepsilon = 0$.
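
A minimal sketch of the resulting controller follows. Since the preceding section is not reproduced here, the sketch assumes the standard Riccati forms $K(\Pi) = -(Q + B^T \Pi B)^{-1}(S + B^T \Pi A)$ and $r(\Pi) = R + A^T \Pi A - (S^T + A^T \Pi B)(Q + B^T \Pi B)^{-1}(S + B^T \Pi A)$, and all matrices are invented for illustration. Note that the gains $K(\Pi_{n-1-k})$ are computed without reference to $N$, which enters only through the constant $\gamma_n$: this is the certainty-equivalence property.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative matrices (made up): d = 2, m = 1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
N = 0.1 * np.eye(2)
R, S, Q = np.eye(2), np.zeros((1, 2)), np.array([[1.0]])
Pi0 = np.eye(2)

def K(Pi):
    # Optimal gain; assumed standard form from the preceding section.
    return -np.linalg.solve(Q + B.T @ Pi @ B, S + B.T @ Pi @ A)

def r(Pi):
    # Riccati update; assumed standard form from the preceding section.
    M = S + B.T @ Pi @ A
    return R + A.T @ Pi @ A - M.T @ np.linalg.solve(Q + B.T @ Pi @ B, M)

def riccati(n):
    # Compute Pi_0, ..., Pi_n and gamma_0, ..., gamma_n; gamma_{k+1} adds
    # trace(N Pi_k), so the noise never affects the gains.
    Pis, gammas = [Pi0], [0.0]
    for _ in range(n):
        gammas.append(gammas[-1] + np.trace(N @ Pis[-1]))
        Pis.append(r(Pis[-1]))
    return Pis, gammas

def run(x0, n):
    # Simulate the n-horizon problem under the control U_k = K(Pi_{n-1-k}) X_k.
    Pis, gammas = riccati(n)
    x, total = x0.copy(), 0.0
    for k in range(n):
        a = K(Pis[n - 1 - k]) @ x
        total += float(x @ R @ x + 2 * a @ S @ x + a @ Q @ a)
        x = A @ x + B @ a + rng.multivariate_normal(np.zeros(2), N)
    return total + float(x @ Pi0 @ x), Pis, gammas

x0, n = np.array([1.0, 0.0]), 5
realised, Pis, gammas = run(x0, n)
# Compare one realised cost against the infimal expected cost x0' Pi_n x0 + gamma_n.
print(realised, float(x0 @ Pis[n] @ x0) + gammas[n])
```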
