
The Books

Arora’s text is the main book of the course. It is a very practical guide to optimization for engineering applications. It contains enough theory to explain what is going on, but not more than that.

For people wishing to develop their own optimization software, the best reference is G.N. Vanderplaats: “Numerical Optimization Techniques for Engineering Design”. It focuses entirely on algorithms.

Haftka, Gürdal & Kamat: “Elements of Structural Optimization” has both theory and applications, but is mostly directed towards structures.

Various papers and notes will be available for download from http://www.ime.auc.dk/~no.
The Books (cont’d)
The famous book “Numerical Recipes” by Press et al. has been released on the internet at the address http://www.nr.com/. It offers many algorithms that are useful for optimization purposes.
Mathematical programming
The problem below is solvable if you have a model, typically a computer model, that can compute the functions g for given values of the vector x (a point in the design space).

Minimize the objective function

$$g_0(\mathbf{x}), \quad \mathbf{x} = \{x_1, x_2, \ldots, x_n\}$$

subject to as many constraints as necessary:

$$g_i(\mathbf{x}) \le G_i, \quad i = 1 \ldots m$$
Optimization
- graphical interpretation

[Figure: the design space with axes x₁ and x₂; the constraint curves g₁(x) = G₁ and g₂(x) = G₂ bound the feasible domain, and both the unconstrained and the constrained optimum are marked.]

Well-posed design problems can have many constraints, but never more than one objective!
Definitions
Global minimum
[Figure: two 1-D functions f(x), one with several equally good global minima, one with a single strict global minimum.]

Global minimum: f(x*) ≤ f(x) for all x. We can have many global minima, but they must all be equally good.

Strict global minimum: f(x*) < f(x) for all x.
Definitions
Local minimum

[Figure: f(x) with a local minimum at x*.]

Local minimum: f(x*) ≤ f(x* + ε), where ε is a small number.
Existence of the solution
Weierstrass’ theorem

If f(x) is continuous in a closed and bounded set, S, then f has a global minimum in S.

This sounds trivial, but it is important. It gives us hope: we know there is something to look for.

It is also not as simple as it sounds. Remember that the set must be closed. This can lead to many strange problems in practical optimization. Topology optimization is a good example.
Definitions
Open and closed sets

A closed set: Lⱼ ≤ xⱼ ≤ Uⱼ
An open set: Lⱼ < xⱼ < Uⱼ

Example: f(x) = -1/x.

If 0 < x ≤ 2, then the interval is open (at x = 0), and f has no minimum: it decreases without bound as x approaches 0.

If 0 ≤ x ≤ 2, then the interval is closed, but f is undefined for x = 0.
Optimality conditions
1-D problems

Necessary condition: f′(x) = 0
Sufficient condition: f″(x) > 0

This will identify only local optima. For general functions, there are no conditions that will ensure a global optimum.
Optimality conditions
Multi-dimensional problems without constraints
Necessary condition:

$$\frac{\partial f}{\partial x_j} = 0, \quad j = 1 \ldots n$$

Sufficient condition: the Hessian

$$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ & \dfrac{\partial^2 f}{\partial x_2^2} & & \vdots \\ & & \ddots & \\ \text{symm.} & & & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix}$$

must be positive definite.
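These conditions can also be checked numerically. The sketch below is our own illustration (not part of the course material); it assumes an explicit gradient and Hessian and tests positive definiteness via the eigenvalues.

```python
import numpy as np

def is_local_min(grad, hess, x, tol=1e-8):
    """Check the optimality conditions at a candidate point x:
    gradient ~ 0 (necessary) and Hessian positive definite (sufficient)."""
    stationary = np.linalg.norm(grad(x)) < tol
    pos_def = np.all(np.linalg.eigvalsh(hess(x)) > 0)
    return stationary and pos_def

# Example: f(x) = x1^2 + 2*x2^2, minimum at the origin
grad = lambda x: np.array([2 * x[0], 4 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(is_local_min(grad, hess, np.array([0.0, 0.0])))  # True
```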
The necessary condition(s)
- identify stationary points

When is a stationary point not the solution we are looking for?

When the stationary point is a maximum.
When the stationary point is a saddle point.
When the stationary point is a local optimum, but not the global one.
Convexity
When can we be sure to find an optimum?

A convex function is one that has a positive definite Hessian everywhere.

A convex function has only one minimum - the global one.

In 1-D, a convex function is one that everywhere has a positive second derivative.

[Figure: a convex function f(x).]
Convexity (cont’d)
In two dimensions
In 2D, a convex function forms a surface that “curves upwards” everywhere.

[Figure: a convex and a non-convex surface.]
Convex sets
For a convex set, S, the following is true: for any pair of points (P, Q) belonging to S, the straight line connecting P and Q is completely contained in S. This applies in any number of dimensions.

[Figure: a convex set and a non-convex set, each with two points P and Q connected by a straight line.]
Convex optimization problems
If the objective function is convex, and the feasible domain is a convex set, then the optimization problem is convex. If all the constraint functions are convex, then the feasible domain is convex.

Convex optimization problems have only one optimum - the global one. This is algorithmically very convenient: if we have found a stationary point, then we know that it is the global solution. The necessary conditions are also sufficient.

There are no good algorithms for the treatment of non-convex problems. Most algorithms assume that the problem is convex. Many problems are not, so beware!

It is usually very difficult to check whether a function is convex. If the function is implicit, then it is impossible. A good understanding of the physical nature of the problem is usually very helpful. Linear problems are always convex.
Optimization algorithms
We shall develop an algorithm for general constrained optimization in multiple dimensions:

Algorithms for 1-D minimization
Unconstrained problems in multiple dimensions
Constrained problems in multiple dimensions

Today, we develop and implement a 1-D algorithm. It is important that you finish the work for every lecture: the algorithm of each new lecture is built on top of the previous one.
1D algorithms
Categorization by order

Golden section search: 0th order
Bisection: 1st order
Brent, polynomial interpolation: 2nd order
1-D minimization problem
- definition
We assume that a function f(x) is given, and we want to find its minimum in the interval [A, B].

We also assume that it is expensive to compute f(x). So we must find the minimum with the least possible number of function evaluations.

The function is implicit - we don’t know what the graph looks like.
Golden Section Search
- a 0th order algorithm

The idea behind golden section is to successively prune the interval for parts that do not contain the minimum.

This way, the remaining interval shrinks until it is so small that we have determined the location of the minimum with sufficient accuracy.
Golden Section Search
- computing function values

It turns out that, if we don’t have gradient information, then we need two function evaluations, at two interior points x₁ and x₂, before we can identify an interval that does not contain the minimum.
Golden Section Search (cont’d)
- pruning

We don’t know the graph of the function, but based on the function values in x₁ and x₂, and the assumption that we have only one minimum in the interval, we can deduce that the minimum cannot be to the right of x₂. So we prune that part.
Golden Section Search (cont’d)
- development

We could continue like this, pruning the interval until it gets small enough. We would have to compute two new function values for each pruning.

Is there a way to save some of these function evaluations?
Golden Section Search (cont’d)
- re-using function values

If we position x₁ and x₂ carefully, then we can make sure that the x₁ of one iteration becomes the x₂ of the next and vice versa, so each iteration costs only one new function evaluation.

With interval length I⁽ᵏ⁾ at iteration k, this requires:

$$\frac{I^{(k+1)}}{I^{(k)}} = \tau, \qquad \tau\, I^{(k+1)} = (1 - \tau)\, I^{(k)}$$

$$\Rightarrow\ \tau^2 I^{(k)} = I^{(k)} - \tau I^{(k)} \ \Rightarrow\ \tau^2 + \tau - 1 = 0$$

$$\tau = \frac{\sqrt{5} - 1}{2} \approx 0.618 = \frac{1}{1.618} \quad \text{- the Golden Section}$$
The Golden Section
- the idea

The idea of the golden section dates back to Pythagoras and, later, the Italian mathematician Fibonacci (1202).

In the Middle Ages, the golden section became a measure of aesthetics adopted by many humanists and architects. It defines a rhythm of shape that pleases the eye. Composition of paintings, sculptures and buildings was done according to the golden section.

The golden section also played an important role for the cubists in the beginning of the 20th century.
Golden section (cont’d)
- the algorithm

Set a search interval [A, B], and initialize the interior points x₁ and x₂. Compute f(x₁) and f(x₂).

Choose the right or left interval to prune:
If the right-hand interval is removed, set x₂ := x₁ and f(x₂) := f(x₁). Compute a new x₁ and f(x₁).
If the left-hand interval is removed, set x₁ := x₂ and f(x₁) := f(x₂). Compute a new x₂ and f(x₂).

If the interval is small enough, stop the algorithm and return the happy news. Otherwise, repeat the iteration.
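A minimal Python sketch of these steps (our own code, not the course’s reference implementation; the names and tolerance are arbitrary choices):

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    tau = (math.sqrt(5) - 1) / 2            # ~0.618, the golden section
    x1 = b - tau * (b - a)                  # left interior point
    x2 = a + tau * (b - a)                  # right interior point
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                         # minimum cannot be right of x2
            b, x2, f2 = x2, x1, f1          # prune right; reuse x1 as new x2
            x1 = b - tau * (b - a)
            f1 = f(x1)
        else:                               # minimum cannot be left of x1
            a, x1, f1 = x1, x2, f2          # prune left; reuse x2 as new x1
            x2 = a + tau * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b)

# Example: minimum of (x - 2)^2 on [0, 5]
print(golden_section(lambda x: (x - 2)**2, 0.0, 5.0))   # ~2.0
```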
Golden section (cont’d)
- properties
Each iteration removes 1 - 0.618 ≈ 38% of the interval.

After n iterations, the interval is reduced to 0.618ⁿ times its original size. If n is 10, less than 1% of the original interval remains. If n = 15, less than 1‰ remains.

The algorithm is rock-solid stable. It removes a fixed fraction of the interval each time, and it requires only that the function is unimodal (has one minimum) in the interval.
The bisection method
- cutting the interval in half

If we have gradient information, then we know on which side of a computed point the function decreases. In that case, we can cut the interval in half each time and obtain faster convergence.
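A minimal sketch of this idea (our own illustration, assuming the derivative is available and changes sign exactly once in the interval):

```python
def bisection_min(f_prime, a, b, tol=1e-6):
    """Minimize on [a, b] by bisection, assuming f'(a) < 0 < f'(b),
    i.e. a single interior minimum."""
    while b - a > tol:
        m = 0.5 * (a + b)
        if f_prime(m) > 0:      # f increases at m: the minimum lies left
            b = m
        else:                   # f decreases at m: the minimum lies right
            a = m
    return 0.5 * (a + b)

# Example: f(x) = (x - 2)^2 has f'(x) = 2(x - 2)
print(bisection_min(lambda x: 2 * (x - 2), 0.0, 5.0))   # ~2.0
```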
The bisection method
- pros and cons

Faster convergence: after 10 iterations, about 1‰ of the interval is left.

We need gradient information. It usually requires more computation, but it can sometimes come very cheap. More about that in a later lecture.

When we rely on gradients, we also assume that the function is differentiable. Golden section does not have this requirement.

This method is less robust than golden section.
Polynomial Interpolation
- fast and delicate like an old sports car

We compute the function values at the end points and at a point in the middle.
We fit a parabola through the three points.
We analytically determine the minimum of the parabola.
We let the new point replace the worst of the previous ones and repeat until convergence.
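One possible sketch of this scheme is below. The three-point vertex formula is standard (it appears e.g. in Numerical Recipes); the surrounding loop and all names are our own, and it deliberately lacks the safeguards that a robust routine such as Brent’s adds.

```python
def poly_interp_min(f, a, b, tol=1e-6, max_iter=50):
    """Successive parabolic interpolation on [a, b] (unsafeguarded sketch)."""
    xs = [a, 0.5 * (a + b), b]
    fs = [f(x) for x in xs]
    for _ in range(max_iter):
        x1, x2, x3 = xs
        f1, f2, f3 = fs
        # Vertex of the parabola through (x1,f1), (x2,f2), (x3,f3)
        num = (x2 - x1)**2 * (f2 - f3) - (x2 - x3)**2 * (f2 - f1)
        den = (x2 - x1) * (f2 - f3) - (x2 - x3) * (f2 - f1)
        if den == 0 or max(xs) - min(xs) < tol:
            break                           # degenerate parabola or converged
        x_new = x2 - 0.5 * num / den        # analytic minimum of the parabola
        worst = fs.index(max(fs))           # replace the worst point
        xs[worst], fs[worst] = x_new, f(x_new)
    return xs[fs.index(min(fs))]

print(poly_interp_min(lambda x: (x - 2)**2, 0.0, 5.0))  # exact for a parabola
```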
Polynomial Interpolation
- pros and cons
Convergence is very fast if the function behaves nicely.
No gradients required.
Only one function evaluation for each new iteration.

The algorithm is very sensitive to non-convex functions.
The algorithm requires 2nd order differentiability.
The algorithm may diverge completely.

Not advisable for use in general algorithms, but very useful for special applications.
Unconstrained minimization
- in multiple dimensions

Choose a search direction, d⁽ᵏ⁾.
Minimize along the search direction (by golden section). The step is α⁽ᵏ⁾ d⁽ᵏ⁾.
Repeat until convergence:

$$x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$$
Choice of direction
Steepest descent

The obvious choice when minimizing a function is to choose the path that goes as much downhill as possible:

$$d^{(k)} = -\frac{\nabla f}{\|\nabla f\|}$$

This algorithm is known as “steepest descent”. Some people call this type of algorithm “greedy”. In real life, being greedy is often only profitable in the short term. This also applies in optimization.
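Continuing the sketches above, a minimal steepest-descent loop might look as follows. It reuses the golden_section routine from the 1-D lecture; the helper names, step limit and test problem are our own assumptions.

```python
import numpy as np

def line_search_min(f, x, d, step_max=10.0, tol=1e-8):
    """1-D minimization of phi(alpha) = f(x + alpha*d) on [0, step_max],
    using the golden_section routine defined earlier."""
    return golden_section(lambda alpha: f(x + alpha * d), 0.0, step_max, tol)

def steepest_descent(f, grad, x0, n_iter=1000, tol=1e-6):
    """Steepest descent: normalized negative gradient direction plus
    a golden-section line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # gradient vanishes: stationary point
            break
        d = -g / np.linalg.norm(g)         # steepest-descent direction
        x = x + line_search_min(f, x, d) * d
    return x

# Example with "longish" level curves, which provokes zig-zagging:
f = lambda x: x[0]**2 + 10 * x[1]**2
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
print(steepest_descent(f, grad, [3.0, 1.0]))   # ~[0, 0]
```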
Steepest descent
Zig-zagging by nature
It is possible to show mathematically that each new direction in steepest descent is perpendicular to the previous one. This means that the algorithm approaches the optimum using only very few directions. In 2-D, only two different directions are used.

The steps in each direction tend to get smaller for each iteration. They may become so small that the algorithm thinks too soon that it has converged. In any case, convergence can be very slow.
Steepest descent
- may work well or terribly
On problems with “similar scales” in the different variable directions, steepest descent often works well. If the level curves are circular, then the optimum is found in the first chosen direction. If the level curves are “longish”, then the algorithm typically requires many iterations.
The conjugate gradient method
Evening the zig-zags

The conjugate gradient method can be seen as a way of detecting and eliminating zig-zagging. It also has more subtle mathematical explanations, but we don’t have to worry much about those.

The search direction is computed by the formula:

$$d^{(k)} = -\nabla f(x^{(k)}) + \beta^{(k)} d^{(k-1)}, \qquad \beta^{(k)} = \frac{\|\nabla f(x^{(k)})\|^2}{\|\nabla f(x^{(k-1)})\|^2}$$
The conjugate gradient method
Why it works
$$d^{(k)} = \underbrace{-\nabla f(x^{(k)})}_{\text{steepest descent}} + \underbrace{\beta^{(k)} d^{(k-1)}}_{\text{correction}}, \qquad \beta^{(k)} = \frac{\|\nabla f(x^{(k)})\|^2}{\|\nabla f(x^{(k-1)})\|^2}$$

We know that the gradient vanishes at the optimum. This means that if the process is going well, then the gradient gets smaller for each iteration. If this is true, then β⁽ᵏ⁾ is a small number, and we don’t get much correction to the steepest-descent direction. If the gradient does not get smaller, then we need more correction, and this is precisely what we get.
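A matching sketch of the conjugate gradient loop (Fletcher-Reeves form of β; it reuses numpy and line_search_min from the steepest-descent sketch, and the names are again our own):

```python
def conjugate_gradient(f, grad, x0, n_iter=100, tol=1e-6):
    """Nonlinear conjugate gradient, Fletcher-Reeves variant."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                    # first direction: steepest descent
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:           # gradient vanished: stop
            break
        x = x + line_search_min(f, x, d) * d  # 1-D minimization along d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)      # ||grad_k||^2 / ||grad_(k-1)||^2
        d = -g_new + beta * d                 # zig-zag correction term
        g = g_new
    return x

# Same zig-zag-prone test problem as above:
print(conjugate_gradient(f, grad, [3.0, 1.0]))   # ~[0, 0]
```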
Penalty methods
- a poor man’s approach to constrained optimization

There is a very easy way of making sure that an optimization process stays within a defined set of constraints: tax any violations. This is the basis of the so-called penalty methods.

They are also called “transformation methods” because they replace the original constrained problem with an equivalent one without constraints. The transformed problem can then be solved by an unconstrained algorithm.
Penalty methods
- basic idea
[Figure: a 1-D objective f(x) with two constraints, g₁ and g₂.]

Consider the problem in the figure. We want to minimize f(x) provided g₁ and g₂ are negative. Golden section would solve this right away, but for the sake of the argument, let us just assume that we cannot impose the constraints. Instead, we can penalize them:

Minimize f(x)
subject to gᵢ(x) ≤ 0

is converted to

Minimize φ(x) = f(x) + r · P(g(x))
Penalty methods
- penalization

So we replace f by a new function, φ, which is constructed so that it increases rapidly when a constraint is violated. Minimizing φ will almost give us the solution to the original problem.

There are two types of penalization:
- Exterior (a tax)
- Interior (capital punishment)
Exterior penalty
- the mild form

$$\phi(x) = f(x) + r \sum_{i=1}^{m} \left[\max\bigl(0,\, g_i(x)\bigr)\right]^2$$

This penalty does not come into play until a constraint has been violated.

The severeness of the penalty depends on the penalty factor, r. Small values of r will cause constraint violations. Large values will make the problem difficult to solve because the function gets sharp kinks.

The acceptable r values are problem-dependent. It is a good idea to make the functions dimensionless.
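A minimal sketch of the transformation (our own helper and a 1-D toy problem invented for illustration; golden_section is the routine from earlier):

```python
def exterior_penalty(f, constraints, r):
    """Pseudo-objective phi(x) = f(x) + r * sum_i max(0, g_i(x))^2,
    for constraints of the form g_i(x) <= 0."""
    def phi(x):
        return f(x) + r * sum(max(0.0, g(x))**2 for g in constraints)
    return phi

# Toy problem: minimize f(x) = x subject to g(x) = 1 - x <= 0 (optimum x = 1)
phi = exterior_penalty(lambda x: x, [lambda x: 1 - x], r=10.0)
print(golden_section(phi, -5.0, 5.0))   # ~0.95: slightly infeasible, as expected
```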
Exterior penalty
- examples

Original problem: linear objective function and two constraints in two dimensions.

[Figure: level curves of the original problem with the optimum marked.]
Exterior penalty
- example

Penalized problem, r = 0.05. Notice that the optimum falls quite far from the solution to the original problem.
Exterior penalty
- examples

Penalized problems, r = 0.1 and r = 1.0. The optimum approaches the solution to the original problem but never reaches it completely. The level curves get sharper edges, and the problem becomes more difficult to solve numerically.
Exterior penalty
- properties

A penalty term is added only after constraint violation.
The objective function inside the feasible domain is unaffected.
The pseudo-objective function is defined everywhere the original function is. We don’t need a feasible point to get started.
The solution always falls slightly outside the feasible domain of the original problem. Notice that the original problem may be undefined outside the feasible domain.
Increasing the penalty brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.
It handles equality as well as inequality constraints.
Interior penalty
- capital punishment
$$\phi(x) = f(x) + r \sum_{i=1}^{m} \left(-\frac{1}{g_i(x)}\right)$$

This penalty is always present, and it really kicks in when a constraint is approached. The penalty goes to infinity at the constraint.

The severeness of the penalty depends on the penalty factor, r. Small values of r will cause the penalty to kick in late but suddenly as we approach a constraint.

The penalty is −∞ (!) right outside the constraint.
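The same toy problem with an interior penalty (again our own sketch; note that the search must start and stay strictly inside the feasible domain, so the interval begins just inside the constraint):

```python
def interior_penalty(f, constraints, r):
    """Pseudo-objective phi(x) = f(x) + r * sum_i (-1 / g_i(x)),
    defined only where all g_i(x) < 0 (strictly feasible)."""
    def phi(x):
        return f(x) + r * sum(-1.0 / g(x) for g in constraints)
    return phi

# Toy problem: minimize f(x) = x subject to g(x) = 1 - x <= 0 (optimum x = 1)
phi = interior_penalty(lambda x: x, [lambda x: 1 - x], r=0.01)
print(golden_section(phi, 1.001, 5.0))   # ~1.1: feasible, pushed off the constraint
```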
Interior penalty
- properties

The penalty is always present.
The pseudo-objective function is undefined at the constraints and goes to minus infinity just outside them. If the algorithm happens to violate a constraint, then chances are that it will never return to the feasible domain.
We need a feasible point to start the algorithm.
The solution always falls slightly inside the feasible domain of the original problem. This means that all solutions are usable.
Increasing the penalty brings the solution closer to the solution of the real problem, but it also makes the problem more difficult to solve numerically.
It handles only inequality constraints.
Penalty methods
- properties in general

Penalty methods are “cheap and dirty” solutions to constrained optimization. They are problem-dependent and may be difficult to apply. They are not suitable for general applications, but they may suffice for special purposes.

The Augmented Lagrangian method is a further development of penalty methods. It is only slightly more complicated, and it does away with many of the problems of interior and exterior penalties.
Constrained Nonlinear Optimization

SLP (Sequential Linear Programming)
SQP (Sequential Quadratic Programming)
Method of Feasible Directions
Gradient Projection method
