7. One Dimensional
Unconstrained Optimization
Definition of Optimization
Given a function $f(x)$, optimization is the process of finding $x^*$ to
minimize or maximize the function.
The function $f(x)$ is called the objective function or cost function.
The point $x^*$ is called a minimizer (denoted as $x^* = \arg\min_x f(x)$) or a
maximizer (denoted as $x^* = \arg\max_x f(x)$) of the objective function.
The value $f(x^*)$ is called a minimum or maximum of the objective
function, i.e., $f(x^*) = \min_x f(x)$ or $f(x^*) = \max_x f(x)$. Or we can
simply say $f(x^*)$ is an optimum of the objective function.
Convex functions
From Note 1, we know that a local optimum may not be a global
optimum. However, for a special class of functions called convex
functions, a local optimum must also be a global optimum.
Figure 3 shows a convex 1D function. Intuitively, a function is convex
if, when we connect any two points on its curve, the segment always
lies above (or on) the curve. More formally,
$$\forall x_1, x_2,\ \forall \lambda \in [0,1],\quad f(\lambda x_1 + (1-\lambda) x_2) \le \lambda f(x_1) + (1-\lambda) f(x_2).$$
This property is called the convexity property.
If $-f(x)$ is a convex function, then $f(x)$ is called a concave function.
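As a quick check, we can verify the convexity property for the simple example $f(x) = x^2$:
$$\lambda x_1^2 + (1-\lambda)x_2^2 - \big(\lambda x_1 + (1-\lambda)x_2\big)^2 = \lambda(1-\lambda)(x_1 - x_2)^2 \ge 0,$$
so the chord between any two points on the parabola indeed lies above the curve.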
Golden Ratio
Before discussing a numerical method for optimization, we first
deviate a bit from the optimization topic and discuss the Golden ratio.
In mathematics, two quantities are in the golden ratio if their ratio is
the same as the ratio of their sum to the larger of the two quantities.
Figure 5 illustrates the geometric relationship. Expressed algebraically, for two
quantities $a > b > 0$,
$$\frac{a+b}{a} = \frac{a}{b} = \varphi.$$
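From this definition, the value of the Golden ratio $\varphi$ follows directly:
$$\varphi = \frac{a+b}{a} = 1 + \frac{b}{a} = 1 + \frac{1}{\varphi} \;\Longrightarrow\; \varphi^2 - \varphi - 1 = 0 \;\Longrightarrow\; \varphi = \frac{1+\sqrt{5}}{2} \approx 1.618,$$
and consequently $1/\varphi = \varphi - 1 \approx 0.618$.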
Figure 5: Geometric Relationship of Golden ratio. The point A is called the Golden Ratio Point of the
segment.
In Figure 5, the point $A$ splits the segment into two sub-segments
with lengths $a$ and $b$ satisfying the Golden ratio relationship; the
point $A$ is called the Golden ratio point of the segment. By
symmetry, there are two Golden ratio points on the segment $AB$: in
Figure 6, we denote them as point $C$ and point $D$ respectively. In
addition, we note one important and interesting property: the point
$C$, which is a Golden ratio point of the segment $AB$, is also a
Golden ratio point of the sub-segment $BD$. This is due to the
relationship $\frac{CB}{BD} = \frac{BD}{AB} = \frac{1}{\varphi}$. Similarly,
the point $D$, which is a Golden ratio point of the segment $AB$, is also a
Golden ratio point of the sub-segment $AC$. We will see that this
property plays an important role in the numerical algorithm
discussed below.
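To see this property concretely, normalize $AB = 1$, so that $D$ lies at distance $1 - 1/\varphi \approx 0.382$ and $C$ at $1/\varphi \approx 0.618$ from $A$. Then
$$\frac{CB}{BD} = \frac{0.382}{0.618} \approx 0.618 = \frac{1}{\varphi}, \qquad \frac{AD}{AC} = \frac{0.382}{0.618} \approx 0.618 = \frac{1}{\varphi},$$
so $C$ is indeed a Golden ratio point of the sub-segment $BD$, and $D$ of the sub-segment $AC$.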
Figure 6: Two Golden ratio points C and D on the segment AB. Note that the point C is also the Golden ratio point of the sub-segment BD. Similarly, the point D is also the Golden ratio point of the sub-segment AC.
Iterative search
Iterative search is a technique for finding the minimum or maximum
of a strictly unimodal function by successively narrowing the range of
values inside which the minimum or maximum is known to exist.
Unlike root finding, where two function evaluations with opposite
signs are sufficient to bracket a root, when searching for a minimum
or maximum, three values are necessary.
Definition: A function $f(x)$ is a unimodal function if it belongs to one of the
following two cases. Case 1: for some value $x^*$, $f(x)$ is monotonically
increasing for $x \le x^*$ and monotonically decreasing for $x \ge x^*$; Case
2: for some value $x^*$, $f(x)$ is monotonically decreasing for $x \le x^*$ and
monotonically increasing for $x \ge x^*$. In Case 1, the maximum value of
$f(x)$ is $f(x^*)$ and there are no other local maxima; in Case 2, the
minimum value of $f(x)$ is $f(x^*)$ and there are no other local minima.
As a result, we call Case 1 the unimodal maximum case, and Case 2
the unimodal minimum case.
Figure 8: Two examples of functions that are unimodal but not convex or concave.
Next, we can look into the details of the iterative search. Let's first
consider the unimodal maximum case (i.e., Case 1). Suppose we want
to find the maximum of a unimodal function within the interval
$[x_l^{(i)}, x_u^{(i)}]$; how can we successively narrow this search range?
One solution is as follows: first, we choose two points $x_1^{(i)}$ and $x_2^{(i)}$
inside the segment $[x_l^{(i)}, x_u^{(i)}]$ with $x_1^{(i)} < x_2^{(i)}$. These two points
divide the segment $[x_l^{(i)}, x_u^{(i)}]$ into three sub-segments I, II, and III, as
shown in Figure 9.
Then we evaluate the function values $f(x_1^{(i)})$ and $f(x_2^{(i)})$ at points
$x_1^{(i)}$ and $x_2^{(i)}$. According to the relative magnitude of these two
values, there are two different cases.
Case 1 is when $f(x_1^{(i)}) \le f(x_2^{(i)})$, as shown in Figure 9. In which
regions among I, II, and III can the maximizer $x^*$ of a unimodal
function appear?
Figure 9: Case 1 for a unimodal maximum function, where $f(x_1^{(i)}) \le f(x_2^{(i)})$.
Can the maximizer $x^*$ be located in regions II and III? Yes, it can; two
sample unimodal functions are shown in the first two sub-figures of
Figure 10, corresponding to the situations where the maximizer is
located in region II and in region III, respectively.
Can the maximizer $x^*$ be located in region I? No, it is impossible;
otherwise the function could not be a unimodal function, as shown in
the last sub-figure of Figure 10. Why? Because $x^* < x_1^{(i)} < x_2^{(i)}$, but
$f(x_1^{(i)}) \le f(x_2^{(i)})$, and thus to the right of $x^*$ the function is not
monotonically decreasing. This contradicts the property of
the unimodal maximum function and is therefore not possible.
Figure 10: How the function would look if its maximizer were located in region II, III, and I, respectively.
Since the maximizer can only be located in regions II and III, we can
narrow the search in the next iteration. In particular, in the next
iteration, the lower bound should be updated from $x_l^{(i)}$ to
$x_l^{(i+1)} = x_1^{(i)}$, while the upper bound is unchanged, i.e., $x_u^{(i+1)} = x_u^{(i)}$; and we
need to generate two new points $x_1^{(i+1)}$ and $x_2^{(i+1)}$ within the interval
$[x_l^{(i+1)}, x_u^{(i+1)}]$.
Case 2 is when $f(x_1^{(i)}) \ge f(x_2^{(i)})$, and the discussion is similar.
We can conclude that the maximizer can only be located in regions I
and II. As a result, in the next iteration, the lower bound is
unchanged, i.e., $x_l^{(i+1)} = x_l^{(i)}$, and the upper bound should be updated
from $x_u^{(i)}$ to $x_u^{(i+1)} = x_2^{(i)}$. We also need to generate two new points
$x_1^{(i+1)}$ and $x_2^{(i+1)}$ within the interval $[x_l^{(i+1)}, x_u^{(i+1)}]$.
Okay, we have finished discussing the main part of the algorithm for
the unimodal maximum function, which can be briefly summarized as
follows. We initialize with four points $x_l^{(1)} < x_1^{(1)} < x_2^{(1)} < x_u^{(1)}$ and
then go into iterations. In the $i$-th step, we first evaluate the values of
$f(x_1^{(i)})$ and $f(x_2^{(i)})$. If $f(x_1^{(i)}) \le f(x_2^{(i)})$ (case 1), we perform the
update:
$$x_l^{(i+1)} \leftarrow x_1^{(i)} \quad\text{and}\quad x_u^{(i+1)} \leftarrow x_u^{(i)};$$
If $f(x_1^{(i)}) \ge f(x_2^{(i)})$ (case 2), we perform the update:
$$x_l^{(i+1)} \leftarrow x_l^{(i)} \quad\text{and}\quad x_u^{(i+1)} \leftarrow x_2^{(i)}.$$
Then we generate two points $x_1^{(i+1)} < x_2^{(i+1)}$ in the narrowed
segment $[x_l^{(i+1)}, x_u^{(i+1)}]$. With the new set of four points $x_l^{(i+1)} <
x_1^{(i+1)} < x_2^{(i+1)} < x_u^{(i+1)}$, we can start the next step of the iteration.
The iteration stops when the stopping criterion is satisfied, which we
will discuss later.
For the unimodal minimum function, there are some small differences
from the algorithm presented above for the unimodal maximum
function. The details of these differences are left as exercises. [Hint:
case 1 should now become $f(x_1^{(i)}) \ge f(x_2^{(i)})$, where $x^*$ is in regions
II and III, and case 2 should become $f(x_1^{(i)}) \le f(x_2^{(i)})$, where $x^*$ is in
regions I and II.]
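To make the procedure concrete, here is a minimal Python sketch of the iterative search for the unimodal maximum case. The placement of the two interior points at one third and two thirds of the bracket is an arbitrary illustrative choice, and the test function is also just an example:

```python
import math

def iterative_search_max(f, xl, xu, n_iter=30):
    """Iterative search for the maximizer of a unimodal-maximum f on [xl, xu]."""
    for _ in range(n_iter):
        x1 = xl + (xu - xl) / 3.0        # first interior point
        x2 = xl + 2.0 * (xu - xl) / 3.0  # second interior point
        if f(x1) <= f(x2):               # case 1: x* lies in regions II or III
            xl = x1                      # raise the lower bound
        else:                            # case 2: x* lies in regions I or II
            xu = x2                      # lower the upper bound
    return 0.5 * (xl + xu)               # midpoint of the final bracket

# Illustrative test: maximize f(x) = 2 sin(x) - x^2/10 on [0, 4]
print(iterative_search_max(lambda x: 2 * math.sin(x) - x**2 / 10, 0.0, 4.0))
# -> approximately 1.4276
```

Note that each pass of the loop evaluates the function twice, which is exactly the cost that the Golden-section search below removes.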
Golden-Section search
In the iterative search algorithm presented above, we need to
perform two function evaluations (i.e., evaluating the values of
$f(x_1^{(i)})$ and $f(x_2^{(i)})$) in each step of the iteration. Suppose we run $n$
iterations; then we need to perform $2n$ function evaluations. If the
function is expensive to evaluate (e.g., $f(x) = \sin(\cdots)$ with a complicated
argument), then the computation can be slow. Can we reduce the number of function
evaluations that are necessary in the iterative search?
The answer is yes, we can. The key point lies in the property we
discussed before in Figure 6: suppose the segment $AB$ is the
segment $[x_l^{(i)}, x_u^{(i)}]$, and we choose the point $x_1^{(i)}$ as the Golden ratio
point $D$ and the point $x_2^{(i)}$ as the Golden ratio point $C$. Suppose we
meet case 1, and the narrowed segment becomes $[x_l^{(i+1)}, x_u^{(i+1)}] = [x_1^{(i)}, x_u^{(i)}]$,
i.e., the sub-segment $DB$. We further need to generate $x_1^{(i+1)}$ and $x_2^{(i+1)}$ as the Golden ratio
points on $DB$. But according to our discussion before, the point $C$
(that is, $x_2^{(i)}$) is also a Golden ratio point of $DB$, i.e., $x_1^{(i+1)} = x_2^{(i)}$. We
only need to generate one new Golden ratio point $x_2^{(i+1)}$. In this way,
we only need to evaluate the function value $f(x_2^{(i+1)})$ in the next
iteration, since the other function value $f(x_1^{(i+1)}) = f(x_2^{(i)})$ has
already been evaluated in the $i$-th iteration. Similarly, for case 2, we
have $x_2^{(i+1)} = x_1^{(i)}$, and we only need to generate a new Golden ratio
point $x_1^{(i+1)}$ and evaluate its value.
By using the Golden ratio points as $x_1^{(i)}$ and $x_2^{(i)}$, given $n$ iterations, we
only need to perform one function evaluation in each iteration,
except the first one, where we need to evaluate both $f(x_1^{(1)})$ and
$f(x_2^{(1)})$. As a result, we only need to take $2 + (n-1) = n + 1$
function evaluations in total. Compared to the general iterative
search, the Golden-section search takes only about 50% of the
computation, which is a great improvement.
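Following the discussion above, here is a minimal Python sketch of the Golden-section search for the unimodal maximum case; after the first iteration only one new function evaluation is performed per step, because one interior point and its function value are reused. The test function is again illustrative:

```python
import math

def golden_section_max(f, xl, xu, n_iter=30):
    """Golden-section search for the maximizer of a unimodal f on [xl, xu]."""
    R = (math.sqrt(5) - 1) / 2       # 1/phi, approximately 0.618
    x1 = xu - R * (xu - xl)          # lower Golden ratio point (point D)
    x2 = xl + R * (xu - xl)          # upper Golden ratio point (point C)
    f1, f2 = f(x1), f(x2)            # the only step needing two evaluations
    for _ in range(n_iter - 1):
        if f1 <= f2:                 # case 1: keep [x1, xu]
            xl, x1, f1 = x1, x2, f2  # the old x2 becomes the new x1 (reused)
            x2 = xl + R * (xu - xl)  # one new point ...
            f2 = f(x2)               # ... and only one new evaluation
        else:                        # case 2: keep [xl, x2]
            xu, x2, f2 = x2, x1, f1  # the old x1 becomes the new x2 (reused)
            x1 = xu - R * (xu - xl)
            f1 = f(x1)
    return x1 if f1 > f2 else x2     # return the better interior point

# Same illustrative test function as before; maximum near x ~ 1.4276
print(golden_section_max(lambda x: 2 * math.sin(x) - x**2 / 10, 0.0, 4.0))
```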
When the iteration stops, we return the better of the two interior points, $\hat{x}$,
as the approximate solution to $x^*$, with the error bound
$$|x^* - \hat{x}| \le \max\left(x_1^{(i)} - x_l^{(i)},\ x_2^{(i)} - x_1^{(i)},\ x_u^{(i)} - x_2^{(i)}\right).$$
For the Golden ratio case, we know that $x_1^{(i)} - x_l^{(i)} = x_u^{(i)} - x_2^{(i)} =
(2 - \varphi)(x_u^{(i)} - x_l^{(i)}) \approx 0.382\,(x_u^{(i)} - x_l^{(i)})$, and $x_2^{(i)} - x_1^{(i)} = (\sqrt{5} -
2)(x_u^{(i)} - x_l^{(i)}) \approx 0.236\,(x_u^{(i)} - x_l^{(i)})$. As a result, the error is always
bounded by $(2 - \varphi)(x_u^{(i)} - x_l^{(i)}) \approx 0.382\,(x_u^{(i)} - x_l^{(i)})$.
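Since each iteration shrinks the bracket by the factor $1/\varphi \approx 0.618$, we can also estimate in advance how many iterations are needed to reach a prescribed tolerance $\varepsilon$ (a short added derivation):
$$0.382 \cdot 0.618^{\,n-1}\big(x_u^{(1)} - x_l^{(1)}\big) \le \varepsilon \;\Longrightarrow\; n \ge 1 + \frac{\ln\!\big(\varepsilon \,/\, 0.382\,(x_u^{(1)} - x_l^{(1)})\big)}{\ln 0.618}.$$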
Figure 11: Iterative search may not find the global optimum for a non-unimodal function.
Newton's method
The Newton-Raphson method is an approach for finding a root of a
function $g(x)$ such that $g(x) = 0$. It is an iterative method, and at the $i$-th
step there is
$$x_{i+1} = x_i - \frac{g(x_i)}{g'(x_i)}.$$
To find an optimum of $f(x)$, we can apply the same iteration to the
derivative $f'(x)$, since the optimum is a root of $f'(x) = 0$:
$$x_{i+1} = x_i - \frac{f'(x_i)}{f''(x_i)}.$$
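A minimal Python sketch of this idea on the same illustrative test function as before, with the derivatives supplied analytically (the helper name is our own):

```python
import math

def newton_optimize(df, d2f, x0, n_iter=10):
    """Newton's method for 1D optimization: iterate toward a root of f'(x)."""
    x = x0
    for _ in range(n_iter):
        x = x - df(x) / d2f(x)  # Newton-Raphson step applied to f'
    return x

# Illustrative test: maximize f(x) = 2 sin(x) - x^2/10 starting from x0 = 2.5
df = lambda x: 2 * math.cos(x) - x / 5    # f'(x)
d2f = lambda x: -2 * math.sin(x) - 0.2    # f''(x)
print(newton_optimize(df, d2f, 2.5))      # -> approximately 1.4276
```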
8. Multi-Dimensional
Unconstrained Optimization
Analytical method
Given a multi-dimensional differentiable function $f(\mathbf{x})$, a necessary
condition for the point $\mathbf{x}^* = [x_1^*, \dots, x_n^*]^T$ to be a minimizer or
maximizer is $\nabla f(\mathbf{x}^*) = \mathbf{0}$, where $\nabla f(\mathbf{x}^*)$ is the gradient evaluated at the
point $\mathbf{x}^*$:
$$\nabla f(\mathbf{x}) = \left[\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right]^T.$$
Around a point $\mathbf{x}_0$, the function can be expanded using the Taylor series:
$$f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T H(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + O(\|\mathbf{x} - \mathbf{x}_0\|^3).$$
Keeping only the first-order terms,
$$f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)^T (\mathbf{x} - \mathbf{x}_0) + O(\|\mathbf{x} - \mathbf{x}_0\|^2).$$
This first-order expansion shows that, locally, $f$ increases fastest along the
gradient direction $\nabla f(\mathbf{x}_0)$, which motivates the steepest ascent and
descent methods below.
In the steepest ascent method (for maximization), we move from $\mathbf{x}_i$ along the
direction $\mathbf{d}_i = \nabla f(\mathbf{x}_i)$ with the step size
$$h_i = \operatorname*{argmax}_h f(\mathbf{x}_i + h\,\mathbf{d}_i);$$
while in the steepest descent method (for minimization), we move along
$\mathbf{d}_i = -\nabla f(\mathbf{x}_i)$ with the step size
$$h_i = \operatorname*{argmin}_h f(\mathbf{x}_i + h\,\mathbf{d}_i).$$
In both cases the next iterate is $\mathbf{x}_{i+1} = \mathbf{x}_i + h_i \mathbf{d}_i$.
Example: Use the steepest ascent method to find the maximum
point of $f(x, y) = 2xy + 2x - x^2 - 2y^2$ with the initial guess $\mathbf{x}_0 =
(x_0, y_0) = (-1, 1)$.
Solution: $\nabla f = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]^T = [2y + 2 - 2x,\ 2x - 4y]^T$, so $\nabla f(-1, 1) = [6, -6]^T$.
Then $h_0 = \operatorname*{argmax}_h f(\mathbf{x}_0 + h\,\nabla f(\mathbf{x}_0))$, where
$$f(\mathbf{x}_0 + h\,\nabla f(\mathbf{x}_0)) = f(-1 + 6h,\ 1 - 6h) = -180h^2 + 72h - 7 = g(h).$$
The optimal $h$ needs to satisfy $g'(h) = 0$, and thus $h_0 = 0.2$. Therefore
$\mathbf{x}_1 = \mathbf{x}_0 + h_0 \nabla f(\mathbf{x}_0) = (0.2, -0.2)$.
After several steps, we can get the solution sequence as shown in
Figure 14.
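Below is a short Python sketch of this example. Since $f$ is quadratic with a constant Hessian $H$, the exact line-search step $h = \mathbf{d}^T\mathbf{d} / (\mathbf{d}^T(-H)\,\mathbf{d})$ is available in closed form; this closed-form step is an added convenience, not part of the worked example above:

```python
import numpy as np

H = np.array([[-2.0, 2.0],      # Hessian of f(x, y) = 2xy + 2x - x^2 - 2y^2
              [2.0, -4.0]])     # (constant, since f is quadratic)

def steepest_ascent(p0, n_iter=15):
    """Steepest ascent with an exact line search for the quadratic example."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        d = np.array([2*p[1] + 2 - 2*p[0],   # gradient of f at p
                      2*p[0] - 4*p[1]])
        if np.allclose(d, 0.0):              # stationary point reached
            break
        h = (d @ d) / (d @ (-H) @ d)         # maximizer of g(h) = f(p + h d)
        p = p + h * d                        # first step: h = 0.2, p = (0.2, -0.2)
    return p

print(steepest_ascent((-1.0, 1.0)))          # -> approximately (2, 1)
```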
From the above example, we can observe that the search direction $\mathbf{d}_{i+1}$
seems to be perpendicular to $\mathbf{d}_i$. Is this just a coincidence?
No, actually this is always true for the steepest descent method.
Theorem: In the steepest descent method, the descent direction $\mathbf{d}_{i+1}$ is
perpendicular to $\mathbf{d}_i$, the descent direction in the last step.
Proof: Let $g(h) = f(\mathbf{x}_i + h\,\mathbf{d}_i)$. According to the selection of the step
size in steepest descent, there is $g'(h_i) = 0$, i.e., $\nabla f(\mathbf{x}_i + h_i \mathbf{d}_i)^T \mathbf{d}_i = 0$.
Since $\mathbf{d}_{i+1} = -\nabla f(\mathbf{x}_{i+1}) = -\nabla f(\mathbf{x}_i + h_i \mathbf{d}_i)$, this actually
means $\mathbf{d}_{i+1}^T \mathbf{d}_i = 0$. $\square$
This property actually implies one important limitation of the
steepest descent method: if the initial guess is not good, the solution
sequence will zigzag and converge very slowly, as shown in Figure
15.
Figure 15: Comparison of different initial guesses for the steepest descent algorithm while solving an optimization problem.
Newton's method can also be extended to the multi-dimensional case.
Around the current point $\mathbf{x}_i$,
$$f(\mathbf{x}) = f(\mathbf{x}_i) + \nabla f(\mathbf{x}_i)^T (\mathbf{x} - \mathbf{x}_i) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_i)^T H_i (\mathbf{x} - \mathbf{x}_i) + O(\|\mathbf{x} - \mathbf{x}_i\|^3),$$
where $H_i$ is the Hessian matrix evaluated at the point $\mathbf{x}_i$. At the optimal
point $\nabla f(\mathbf{x}^*) = \mathbf{0}$, and thus $\nabla f(\mathbf{x}_i) + H_i(\mathbf{x}^* - \mathbf{x}_i) \approx \mathbf{0}$. As
a result, we can choose the next point as $\mathbf{x}_{i+1} = \mathbf{x}_i - H_i^{-1} \nabla f(\mathbf{x}_i)$.
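A minimal Python sketch applying this update to the earlier quadratic example (an added illustration); because the objective is quadratic, a single Newton step lands exactly on the stationary point $(2, 1)$:

```python
import numpy as np

def newton_step(p):
    """One Newton update x_{i+1} = x_i - H^{-1} grad f(x_i) for the example."""
    p = np.asarray(p, dtype=float)
    g = np.array([2*p[1] + 2 - 2*p[0],        # gradient of f at p
                  2*p[0] - 4*p[1]])
    H = np.array([[-2.0, 2.0],                # Hessian (constant)
                  [2.0, -4.0]])
    return p - np.linalg.solve(H, g)          # solve H z = g rather than invert H

print(newton_step((-1.0, 1.0)))               # -> (2, 1) in a single step
```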
Newton's method is more complex than the steepest descent
method, but it usually converges much faster than the steepest
descent method, as shown in Figure 16.
Figure 16: Comparison between the steepest descent method and Newton's method while maximizing $f(x, y) = \cdots + 5\log(1 + \cdots)$, where the black curve is the steepest descent and the blue curve is Newton's method.
Figure 17: Polynomials of different orders $M$, shown as red curves, fitted to the data set shown as blue dots.
To fit such a data set $\{(x_i, y_i)\}_{i=1}^N$, we can use a polynomial model of order $M$:
$$f(x, \mathbf{a}) = a_0 + a_1 x + a_2 x^2 + \dots + a_M x^M.$$
We measure the fitting error with the sum of squared residuals:
$$E(\mathbf{a}) = \sum_{i=1}^{N} \big(f(x_i, \mathbf{a}) - y_i\big)^2.$$
Let
$$X = \begin{bmatrix} 1 & x_1 & \cdots & x_1^M \\ 1 & x_2 & \cdots & x_2^M \\ \vdots & \vdots & & \vdots \\ 1 & x_N & \cdots & x_N^M \end{bmatrix}, \qquad \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix};$$
the above function can be reformulated as $E(\mathbf{a}) = (X\mathbf{a} - \mathbf{y})^T (X\mathbf{a} - \mathbf{y})$. The optimal $\mathbf{a}^*$
needs to satisfy $\nabla E(\mathbf{a}^*) = \mathbf{0}$, which results in $\mathbf{a}^* = (X^T X)^{-1} X^T \mathbf{y}$.
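A small Python sketch of this normal-equations solution; the synthetic data set and the order $M = 3$ are illustrative assumptions, not from the notes:

```python
import numpy as np

# Illustrative data: noisy samples of a smooth function
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

M = 3                                       # polynomial order (illustrative)
X = np.vander(x, M + 1, increasing=True)    # columns: 1, x, x^2, ..., x^M
a = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations: a* = (X^T X)^{-1} X^T y

print(a)                                    # fitted coefficients a_0, ..., a_M
```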
Similarly, if we instead fit the data with sinusoidal basis functions, the
optimal coefficients take the same form $\mathbf{a}^* = (X^T X)^{-1} X^T \mathbf{y}$, where the
columns of $X$ now contain values of $\sin(\cdot)$ evaluated at the data points $x_i$.
Try by yourself: $f(x) = \begin{cases} 1, & -T/2 < x < 0 \\ 0, & 0 < x < T/2 \end{cases}$, where $T = 24$. What is …