Objective function
Minimize
g0(x), x = {x1, x2, ..., xn}
Subject to
gi(x) ≤ Gi, i = 1..m
[Figure: feasible region in the (x1, x2) plane, bounded by the constraint curves g1(x) = G1 and g2(x) = G2.]
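As a concrete illustration (my own sketch, not from the slides), a small two-variable problem in this standard form could be set up as follows; the objective, constraints, and bounds Gi are made up for the example.

import numpy as np

# Hypothetical example in the standard form:
#   minimize    g0(x) = (x1 - 3)^2 + (x2 - 2)^2
#   subject to  g1(x) = x1 + x2    <= G1 = 4
#               g2(x) = x1^2 - x2  <= G2 = 2

def g0(x):
    return (x[0] - 3.0)**2 + (x[1] - 2.0)**2

def g1(x):
    return x[0] + x[1]

def g2(x):
    return x[0]**2 - x[1]

G = np.array([4.0, 2.0])

def is_feasible(x):
    # True if x satisfies all constraints gi(x) <= Gi
    return g1(x) <= G[0] and g2(x) <= G[1]

print(is_feasible([1.0, 1.0]))   # True: inside the feasible region
print(is_feasible([3.0, 2.0]))   # False: violates g1(x) <= 4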
Definitions
Global minimum
x* is a global minimum if f(x*) ≤ f(x) for every feasible x.
[Figure: graph of f(x) with the global minimum at x = x*.]
Existence of the solution
Weierstrass’ theorem
Weierstrass’ theorem guarantees the existence of a minimum if f(x) is continuous and the feasible set is closed and bounded.
A closed set: Lj ≤ xj ≤ Uj
An open set: Lj < xj < Uj
Example: f(x) = -1/x has no minimum on an open interval 0 < x < U; the function decreases without bound as x approaches 0, so the infimum is never attained.
Optimality conditions
One-dimensional problems without constraints
Necessary condition:
f’(x) = 0
Sufficient condition:
f’’(x) > 0
This will identify only local optima. For general functions, there are no conditions that will ensure a global optimum.
Optimality conditions
Multi-dimensional problems without constraints
Necessary condition:
∂f/∂xj = 0, for j = 1..n
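A quick numerical illustration of this condition (my own sketch, not from the slides): approximate the partial derivatives by central differences and check that they all vanish at a candidate point. The test function is made up.

import numpy as np

def grad_fd(f, x, h=1e-6):
    # Central-difference approximation of the gradient of f at x.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Made-up test function with minimum at (1, 2).
f = lambda x: (x[0] - 1.0)**2 + 3.0 * (x[1] - 2.0)**2

print(grad_fd(f, [1.0, 2.0]))  # approximately [0, 0]: necessary condition holds
print(grad_fd(f, [0.0, 0.0]))  # clearly nonzero: not a stationary point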
Convexity
When can we be sure to find an optimum?
Convexity (cont’d)
In two dimensions
In 2D, a convex function forms a surface that “curves upwards” everywhere.
[Figure: two example surfaces, one convex and one non-convex.]
Convex sets
For a convex set, S, the following is true:
For any pair of points, (P,Q), belonging to S, a straight line connecting P and Q will be completely contained in S.
This applies in any number of dimensions.
[Figure: a convex set and a non-convex set, each with two points P and Q joined by a straight line.]
Convex optimization problems
If the objective function is convex, and the feasible domain is a convex set, then the optimization problem is convex.
If all the constraint functions are convex, then the feasible domain is convex.
Convex optimization problems have only one optimum - the global one.
This is very algorithmically convenient. If we have found a stationary point, then we know that it is the global solution. The necessary conditions are also sufficient.
There are no good algorithms for treatment of non-convex problems. Most algorithms assume that the problem is convex. Many problems are not, so beware!
It is usually very difficult to check if a function is convex. If the function is implicit, then it is impossible. A good understanding of the physical nature of the problem is usually very helpful.
Linear problems are always convex.
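Since checking convexity analytically is usually difficult, one pragmatic option is a numerical spot-check of the defining inequality f(λp + (1−λ)q) ≤ λ·f(p) + (1−λ)·f(q) on random point pairs. The sketch below is my own illustration; it can only disprove convexity on the sampled region, never prove it.

import numpy as np

def convexity_spot_check(f, lower, upper, trials=10000, tol=1e-9):
    # Randomly test the convexity inequality
    #   f(lam*p + (1-lam)*q) <= lam*f(p) + (1-lam)*f(q)
    # on the box [lower, upper]. Returns False if a violation is found.
    # Passing the test does NOT prove convexity.
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    rng = np.random.default_rng(0)
    for _ in range(trials):
        p = lower + rng.random(lower.size) * (upper - lower)
        q = lower + rng.random(lower.size) * (upper - lower)
        lam = rng.random()
        if f(lam * p + (1 - lam) * q) > lam * f(p) + (1 - lam) * f(q) + tol:
            return False
    return True

# Made-up examples on the box [-2, 2] x [-2, 2]:
print(convexity_spot_check(lambda x: x[0]**2 + x[1]**2, [-2, -2], [2, 2]))       # True: no violation found
print(convexity_spot_check(lambda x: np.sin(x[0]) + x[1]**2, [-2, -2], [2, 2]))  # False: non-convex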
Optimization algorithms
We shall develop an algorithm for general constrained optimization in multiple dimensions.
[Figure: a function f(x) of one variable on an interval [A, B].]
Golden Section Search (cont’d)
- pruning
[Figure: f(x) on [A, B] with two interior trial points; the part of the interval beyond one of them is pruned.]
We don’t know the graph of the function, but based on the function values at the two interior points, and the assumption that we have only one minimum in the interval, we can deduce that the minimum cannot lie in part of the interval (in the pictured case, to the right of the rightmost interior point). So we prune that part.
Golden Section Search (cont’d)
- development
[Figure: f(x) on the pruned interval [A, B].]
We could continue like this, pruning the interval until it gets small enough.
We would have to compute two new function values for each pruning.
Is there a way to save some of these function evaluations?
Golden Section Search (cont’d)
- re-using function values
Shrink the interval by the same factor τ in every iteration: I(k+1) = τ·I(k) and I(k+2) = τ·I(k+1).
Requiring that one of the interior points can be re-used in the next iteration gives
τ² + τ − 1 = 0
I(k+1) / I(k) = τ = 0.618 = 1/1.618 = The Golden Section
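A compact sketch of the resulting algorithm (my own implementation of the idea described above, not code from the slides); the test function is made up:

def golden_section(f, a, b, tol=1e-6):
    # Minimize a unimodal function f on [a, b] by golden section search.
    # Only one new function evaluation per iteration; the other interior
    # point is re-used from the previous iteration.
    tau = 0.6180339887498949  # the golden section, (sqrt(5) - 1) / 2
    x1 = b - tau * (b - a)    # left interior point
    x2 = a + tau * (b - a)    # right interior point
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:           # minimum cannot be to the right of x2: prune (x2, b]
            b, x2, f2 = x2, x1, f1
            x1 = b - tau * (b - a)
            f1 = f(x1)
        else:                 # minimum cannot be to the left of x1: prune [a, x1)
            a, x1, f1 = x1, x2, f2
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# Example with a made-up function: minimum of (x - 2)^2 on [0, 5].
print(golden_section(lambda x: (x - 2.0)**2, 0.0, 5.0))  # approximately 2.0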
The bisection method
- the idea
If we have gradient information, then we know to which side of a computed function value the function decreases.
In that case, we can cut the interval in half each time and obtain faster convergence.
[Figure: f(x) on the interval [A, B], cut at the midpoint.]
The bisection method
- pros and cons
Not advisable for use in general algorithms, but very useful for special applications.
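A minimal sketch of the bisection idea applied to minimization (my own illustration): the sign of the derivative at the midpoint tells which half of the interval to keep.

def bisection_minimize(df, a, b, tol=1e-8):
    # Minimize a unimodal function on [a, b], given its derivative df.
    # Assumes df(a) < 0 < df(b), i.e. the minimum lies inside the interval.
    while b - a > tol:
        m = 0.5 * (a + b)
        if df(m) > 0.0:     # function increases at m: minimum is to the left
            b = m
        else:               # function decreases at m: minimum is to the right
            a = m
    return 0.5 * (a + b)

# Example with a made-up function f(x) = (x - 2)^2, so df(x) = 2*(x - 2).
print(bisection_minimize(lambda x: 2.0 * (x - 2.0), 0.0, 5.0))  # approximately 2.0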
Unconstrained minimization
- in multiple dimensions
Choose a search direction, d(k).
Minimize along the search direction (by golden section). Step = α(k)·d(k).
Repeat until convergence.
x(k+1) = x(k) + α(k)·d(k)
[Figure: iterates in the (x1, x2) plane.]
Choice of direction
Steepest descent.
The obvious choice when minimizing a function is to choose the path that goes as much downhill as possible. This algorithm is known as “steepest descent”.
Some people call this type of algorithm “greedy”. In real life, being greedy is often only profitable in the short term.
d(k) = −∇f(x(k))
[Figure: steepest descent steps in the (x1, x2) plane.]
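A minimal sketch of steepest descent with a golden-section line search for the step length α(k) (my own illustration; the test function and the step-length interval are made up):

import numpy as np

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    # Steepest descent: d = -grad(f), step length by golden section search.
    tau = 0.6180339887498949
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)
        if np.linalg.norm(d) < tol:        # gradient (almost) zero: stop
            break
        # Golden section search for the step length on [0, 1]
        # (an assumed interval, sufficient for this made-up problem).
        a, b = 0.0, 1.0
        while b - a > 1e-8:
            a1 = b - tau * (b - a)
            a2 = a + tau * (b - a)
            if f(x + a1 * d) < f(x + a2 * d):
                b = a2
            else:
                a = a1
        x = x + 0.5 * (a + b) * d
    return x

# Made-up quadratic test problem with minimum at (1, 2).
f = lambda x: (x[0] - 1.0)**2 + 5.0 * (x[1] - 2.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 10.0 * (x[1] - 2.0)])
print(steepest_descent(f, grad, [0.0, 0.0]))  # approximately [1, 2]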
Steepest descent
Zig-zagging by nature
It is possible to show mathematically that each new direction in steepest descent is perpendicular to the previous one.
This means that the algorithm approaches the optimum using only very few directions. In 2-D, only two different directions are used.
The steps in each direction tend to get smaller for each iteration. They may become so small that the algorithm thinks too soon that it has converged.
In any case, convergence can be very slow.
[Figure: zig-zag path of steepest descent in the (x1, x2) plane.]
Steepest descent
- may work well or terribly
On problems with “similar scales” in the different variable directions, steepest descent often works well.
If the level curves are circular, then the optimum is found in the first chosen direction.
If the level curves are “longish”, then the algorithm typically requires many iterations.
The conjugate gradient method
Evening the zig-zags
d(k) = −∇f(x(k)) + β(k)·d(k−1)
β(k) = ‖∇f(x(k))‖² / ‖∇f(x(k−1))‖²
[Figure: conjugate gradient path in the (x1, x2) plane.]
The conjugate gradient method
Why it works
d(k) = −∇f(x(k)) + β(k)·d(k−1)    (steepest descent term + correction term)
β(k) = ‖∇f(x(k))‖² / ‖∇f(x(k−1))‖²
We know that the gradient vanishes at the optimum. This means that if the process is going well, then the gradient gets smaller for each iteration. If this is true, then β(k) is a small number, and we do not add much correction to the steepest descent direction.
If the gradient does not get smaller, then we need more correction, and this is precisely what we get.
[Figure: search directions in the (x1, x2) plane.]
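A sketch of the corresponding iteration, the nonlinear conjugate gradient method with the β formula shown above (the Fletcher-Reeves form); my own illustration, with a made-up test problem and a simple golden-section line search:

import numpy as np

def conjugate_gradient(f, grad, x0, tol=1e-6, max_iter=200):
    # Nonlinear conjugate gradient (Fletcher-Reeves):
    #   d(k) = -grad f(x(k)) + beta(k) * d(k-1)
    #   beta(k) = ||grad f(x(k))||^2 / ||grad f(x(k-1))||^2
    tau = 0.6180339887498949
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Golden section line search for the step length on [0, 1]
        # (an assumed interval, sufficient for this made-up problem).
        a, b = 0.0, 1.0
        while b - a > 1e-9:
            a1, a2 = b - tau * (b - a), a + tau * (b - a)
            if f(x + a1 * d) < f(x + a2 * d):
                b = a2
            else:
                a = a1
        x = x + 0.5 * (a + b) * d
        g_new = grad(x)
        beta = np.dot(g_new, g_new) / np.dot(g, g)   # Fletcher-Reeves beta
        d = -g_new + beta * d                        # corrected search direction
        g = g_new
    return x

# Made-up quadratic test problem with minimum at (1, 2).
f = lambda x: (x[0] - 1.0)**2 + 5.0 * (x[1] - 2.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 10.0 * (x[1] - 2.0)])
print(conjugate_gradient(f, grad, [0.0, 0.0]))  # approximately [1, 2]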
Penalty methods
- a poor man’s approach to constrained optimization
There is a very easy way of making sure that an optimization process stays within a defined set of constraints: tax any violations.
This is the basis of the so-called penalty methods.
They are also called “transformation methods” because they replace the original constrained problem with an equivalent one without constraints. The transformed problem can then be solved by an unconstrained algorithm.
Penalty methods
- basic idea
[Figure: a one-dimensional objective f(x) with two constraints g1 and g2.]
Consider the problem in the figure. We want to minimize f(x) provided g1 and g2 are negative.
Golden section would solve this right away, but for the sake of the argument, let us just assume that we cannot impose the constraints. Instead, we can penalize them.
Minimize f(x)
subject to gi(x) ≤ 0
is converted to
Minimize φ(x) = f(x) + r·P(g(x))
Penalty methods
- penalization
φ(x) = f(x) + r · Σ [max(0, gi(x))]²,  summed over i = 1..m
[Figure: the penalized function, with the optimum marked.]
Exterior penalty
- example
Penalized problems with r = 0.1 and r = 1.0.
The optimum approaches the solution to the original problem but never reaches it completely.
The level curves get sharper edges and the problem becomes more difficult to solve numerically.
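A minimal sketch of the exterior penalty approach (my own illustration): build φ(x) from the formula above and minimize it for an increasing sequence of r values. Here scipy.optimize.minimize stands in for the unconstrained algorithms discussed earlier, and the example problem is made up.

import numpy as np
from scipy.optimize import minimize  # stand-in for the unconstrained algorithms above

# Made-up example:  minimize f(x) = (x1 - 3)^2 + (x2 - 2)^2
#                   subject to g1(x) = x1 + x2 - 4 <= 0
f = lambda x: (x[0] - 3.0)**2 + (x[1] - 2.0)**2
g = [lambda x: x[0] + x[1] - 4.0]

def phi(x, r):
    # Exterior penalty function: f(x) + r * sum_i [max(0, gi(x))]^2
    return f(x) + r * sum(max(0.0, gi(x))**2 for gi in g)

x = np.array([0.0, 0.0])
for r in [0.1, 1.0, 10.0, 100.0]:
    x = minimize(lambda z: phi(z, r), x, method="Nelder-Mead").x
    print(r, x)   # approaches the constrained optimum (2.5, 1.5) as r grows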
Exterior penalty
- properties
SLP (Sequential Linear Programming)
SQP (Sequential Quadratic Programming)
Method of Feasible Direction
Gradient Projection method