PKL Seminar

Runtime Analysis of Evolutionary Algorithms
Per Kristian Lehre

School of Computer Science
University of Nottingham
Functional Programming Laboratory Seminar

Nottingham, November 25th 2011
Black Box Optimisation

$f(x_1), f(x_2), f(x_3), \ldots, f(x_t)$
$x_1, x_2, x_3, \ldots, x_t$
Function class $\mathcal{F}$
$f : \mathcal{X} \to \mathbb{R}$
Photo: E. Gerhard (1846).
[Droste et al., 2006]
Evolutionary Algorithms
Selection
Variation
Meta-heuristics in Operations Research
Meta-heuristics are practical optimisation techniques:

- a pragmatic approach to NP-hard optimisation problems
- easy to implement, adaptable to many problem domains
- often produce solutions of strikingly good quality
A Theory of Meta-heuristics?
Weak theoretical foundation has been a major critique:

- lack of performance guarantees
- impact of parameter settings poorly understood
- no deep understanding of how and why they work

Largely ignored by the general theory community:

- not real algorithms
- no guarantees about performance
- mathematically very challenging
An Important Challenge...
"Developing the mathematical methodology for explaining and predicting the performance of these and other heuristics is one of the most important challenges facing the fields of optimization and algorithms today."
Papadimitriou and Steiglitz (1998)
Outline
Introduction
Runtime Analysis
Basic Definitions
Overview of Results
Evolutionary Algorithms
Exploration vs Exploitation
Analytical Techniques
Directions for Further Work
Heuristic Understanding
Systems to Build Systems
Conclusion
Runtime Analysis of Meta-heuristics
General Practical Question

- Under what conditions will a given heuristic return solutions of acceptable quality within reasonable time?

Theoretical Approach

- The runtime of a heuristic on a problem is the number of iterations until an optimal (or approximate) solution is found.
- Analysis of how the runtime depends on
  1. problem characteristics
  2. parameters of the heuristic
Meta-heuristics are Randomised Algorithms
[Figure: density of the number of iterations of the (1+1) EA on an Easy FSM instance (n=200).]
Definition
The expected runtime of an algorithm $A$ on function class $\mathcal{F}$ is
$$T_{A,\mathcal{F}} := \max_{f \in \mathcal{F}} E[T_{A,f}]$$
where $T_{A,f}$ is the number of $f$-evaluations before the optimum is found.
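The definition above can be made concrete with a small experiment. The sketch below estimates the expected number of f-evaluations empirically for a single function. It assumes the standard (1+1) EA with bit-flip probability 1/n and uses OneMax as the test function; neither is defined on this slide, and the function names are illustrative only.

```python
import random

def onemax(x):
    """Number of one-bits in x (assumed test function)."""
    return sum(x)

def one_plus_one_ea_runtime(n, rng):
    """Count f-evaluations until the (1+1) EA first evaluates the optimum of OneMax."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = onemax(x)
    evaluations = 1
    while fx < n:
        # Flip each bit independently with probability 1/n (assumed mutation rate).
        y = [b ^ 1 if rng.random() < 1.0 / n else b for b in x]
        fy = onemax(y)
        evaluations += 1
        if fy >= fx:  # elitist acceptance: keep the offspring if it is not worse
            x, fx = y, fy
    return evaluations

def estimate_expected_runtime(n, runs, seed=0):
    """Average runtime over independent runs, an empirical stand-in for E[T_{A,f}]."""
    rng = random.Random(seed)
    return sum(one_plus_one_ea_runtime(n, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, estimate_expected_runtime(n, runs=50))
```

Such an average only estimates the runtime on one function; the definition takes the maximum over the whole class.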
Expected runtime as a function of problem instance size
[Figure: iterations of RS against the number of states in the FSM (n), on the Easy FSM instance class.]
[Figure: iterations of the (1+1) EA against the number of states in the FSM (n), on the Easy FSM instance class.]

- Exponential = Algorithm impractical on problem.
- Polynomial = Possibly efficient algorithm.
Analytical Tool Box

- Artificial Fitness Levels [Wegener and Witt, 2005, Lehre, 2011a]
- Concentration of measure [Dubhashi and Panconesi, 2009]
- Typical Runs
- Expected Multiplicative Weight Decrease [Neumann and Wegener, 2007]
- Drift Analysis [Hajek, 1982]
- Branching Processes [Lehre and Yao, 2009]
- Electrical Resistive Networks [Lehre and Haddow, 2006]
- Yao's Minimax Principle [Motwani and Raghavan, 1995]
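Overview of Results

Problem | Algorithm | Runtime | Reference
OneMax | (1+1) EA | $O(n \log n)$ | [Mühlenbein, 1992]
OneMax | (1+$\lambda$) EA | $O(\lambda n + n \log n)$ | [Jansen et al., 2005]
OneMax | ($\mu$+1) EA | $O(\mu n + n \log n)$ | [Witt, 2006]
OneMax | 1-ANT | $O(n^2)$ w.h.p. | [Neumann and Witt, 2006]
OneMax | ($\mu$+1) IA | $O(\mu n + n \log n)$ | [Zarges, 2009]
Linear Functions | (1+1) EA | $\Theta(n \log n)$ | [Droste et al., 2002] and [He and Yao, 2003]
Linear Functions | cGA | $\Theta(n^{2+\varepsilon})$, $\varepsilon > 0$ const. | [Droste, 2006]
Max. Matching | (1+1) EA | $e^{\Omega(n)}$, PRAS | [Giel and Wegener, 2003]
Sorting | (1+1) EA | $\Theta(n^2 \log n)$ | [Scharnow et al., 2002]
SS Shortest Path | (1+1) EA | $O(n^3 \log(n w_{\max}))$ | [Baswana et al., 2009]
SS Shortest Path | MO (1+1) EA | $O(n^3)$ | [Scharnow et al., 2002]
MST | (1+1) EA | $\Theta(m^2 \log(n w_{\max}))$ | [Neumann and Wegener, 2007]
MST | (1+$\lambda$) EA | $O(n \log(n w_{\max}))$, $\lambda = \lceil m^2/n \rceil$ | [Neumann and Wegener, 2007]
MST | 1-ANT | $O(mn \log(n w_{\max}))$ | [Neumann and Witt, 2008]
Max. Clique (rand. planar) | (1+1) EA | $\Theta(n^5)$ | [Storch, 2006]
Max. Clique (rand. planar) | (16n+1) RLS | $\Theta(n^{5/3})$ | [Storch, 2006]
Eulerian Cycle | (1+1) EA | $\Theta(m^2 \log m)$ | [Doerr et al., 2007]
MinCut | ACO with $h/\ell = O(1)$ | $O(n^{2(h/\ell)})$ | [Kötzing et al., 2010]
Partition | (1+1) EA | PRAS, avg. | [Witt, 2005]
Vertex Cover | (1+1) EA | $e^{\Omega(n)}$, arb. bad approx. | [Friedrich et al., 2007] and [Oliveto et al., 2007]
Set Cover | (1+1) EA | $e^{\Omega(n)}$, arb. bad approx. | [Friedrich et al., 2007]
Set Cover | SEMO | Pol. $O(\log n)$-approx. | [Friedrich et al., 2007]
MaxLeafST (k leaves) | (1+1) EA | $m^{\Omega(k)}$ | [Kratsch et al., 2011]
MaxLeafST (k leaves) | (1+1) EA edge-exch. | $O(2^{15k \log k})$ | [Kratsch et al., 2011]
Intersection of $p \ge 3$ matroids | (1+1) EA | $1/p$-approximation in $O(|E|^{p+2} \log(|E| w_{\max}))$ | [Reichel and Skutella, 2008]
UIO/FSM conf. | (1+1) EA | $e^{\Omega(n)}$ | [Lehre and Yao, 2007]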
What about population-based EAs?
"Given the mathematical difficulty of the infinite population size model, we doubt that a mathematical analysis of finite populations will be possible." [Mühlenbein, 1997]
Exploration vs Exploitation...
Selection
Variation
Evolutionary Algorithm for $\max_{x \in \{0,1\}^n} f(x)$

for t = 0, 1, 2, ... until termination condition do
    for i = 1 to λ do
        Sample i-th parent x according to p_sel(P_t, f)
        Sample i-th offspring P_{t+1}(i) according to p_mut(x)
    end for
end for
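The loop above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It plugs in k-tournament selection (with replacement) for p_sel and bitwise mutation for p_mut, the two operators named on the following slides. All function names, the stopping rule and the parameter values are assumptions made for the example.

```python
import random

def tournament_selection(population, f, k, rng):
    """Return the fittest of k individuals sampled uniformly with replacement."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=f)

def bitwise_mutation(x, p, rng):
    """Flip each bit of x independently with probability p."""
    return tuple(b ^ 1 if rng.random() < p else b for b in x)

def non_elitist_ea(f, n, lam, k, p, max_generations, target, seed=0):
    """Run the selection-variation loop; return (generation, best) or (None, best)."""
    rng = random.Random(seed)
    # Initial population P_0: lam uniformly random bitstrings of length n.
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(lam)]
    for t in range(max_generations):
        best = max(population, key=f)
        if f(best) >= target:  # assumed termination condition
            return t, best
        # Each offspring is produced independently: select a parent, then mutate it.
        # The old population is discarded entirely (non-elitist).
        population = [
            bitwise_mutation(tournament_selection(population, f, k, rng), p, rng)
            for _ in range(lam)
        ]
    return None, max(population, key=f)

if __name__ == "__main__":
    generation, best = non_elitist_ea(sum, n=20, lam=100, k=4, p=1 / 20,
                                      max_generations=500, target=20)
    print(generation, best)
```

Because every generation is replaced wholesale, the best individual can be lost; this is what separates the non-elitist setting from the (1+1) EA.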
Selection and Variation
Tournament Selection
10001110111011
Bitwise Mutation
10010110111001
How large tournament size k?

- k = 1: no selective pressure
  - Unbiased random walk
  - Efficient optimisation is impossible
- Very large k: highest selective pressure
  - Only fittest individual reproduced
  - No population diversity

Example
The runtime T of a non-elitist EA with

- tournament size $k$
- bit-wise mutation rate $p$
- population size $\lambda > \log(nr)$

on any unimodal function with

- $n$ Boolean variables
- $r$ distinct function values

has expected value

$$E[T] = \begin{cases} e^{\Omega(n)} & \text{if } k < e^{pn} \\ O(\lambda^2 r + nr) & \text{if } k > e^{pn} \end{cases}$$

[Figure: runtime against tournament size k, with an exponential (exp) and a polynomial (poly) regime.]
Lehre (PPSN10), Lehre (GECCO11)
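To get a feel for the threshold in the example, the sketch below classifies a tournament size k by comparing it with e^{pn}. The helper names are hypothetical, and the check covers only the condition on k; the polynomial bound also needs the population-size condition stated in the example.

```python
import math

def tournament_threshold(p, n):
    """Threshold e^{pn} separating the two runtime regimes in the example."""
    return math.exp(p * n)

def regime(k, p, n):
    """Classify tournament size k relative to the threshold e^{pn}."""
    threshold = tournament_threshold(p, n)
    if k > threshold:
        return "polynomial"
    if k < threshold:
        return "exponential"
    return "boundary"

if __name__ == "__main__":
    n = 100
    p = 1 / n  # mutation rate 1/n gives a threshold of e, roughly 2.718
    for k in (2, 3, 5):
        print(k, regime(k, p, n))
```

With mutation rate 1/n the threshold is e, so binary tournaments (k = 2) fall on the exponential side and k = 3 on the polynomial side. Doubling the mutation rate raises the threshold to e^2, roughly 7.39.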
How close to the global optimum in polynomial time?
Theorem ([Lehre, 2011b])

With overwhelming probability (w.o.p.), the Hamming distance from any individual in the first $e^{cn}$
generations to the global optimum is at least
s

!
1
ln k
ln k
n
2
.
2
4pn
pn
Other Example Applications

Expected runtime of EA with bit-wise mutation rate $\chi/n$

Selection Mechanism | High Selective Pressure | Low Selective Pressure
Fitness Proportionate | $\nu > f_{\max} \ln(2e^{\chi})$ | $\nu < \chi/\ln 2$ and $\lambda \ge n^3$
Linear Ranking | $\eta > e^{\chi}$ | $\eta < e^{\chi}$
k-Tournament | $k > e^{\chi}$ | $k < e^{\chi}$
($\mu$, $\lambda$) | $\lambda/\mu > e^{\chi}$ | $\lambda/\mu < e^{\chi}$
Cellular EAs | | $\Delta(G) < e^{\chi}$

Problem | High Selective Pressure | Low Selective Pressure
OneMax | $O(n\lambda^2)$ | $e^{\Omega(n)}$
LeadingOnes | $O(n\lambda^2 + n^2)$ | $e^{\Omega(n)}$
Linear Functions | $O(n\lambda^2 + n^2)$ | $e^{\Omega(n)}$
r-Unimodal | $O(r\lambda^2 + nr)$ | $e^{\Omega(n)}$
Jump$_r$ | $O(n\lambda^2 + (n/\chi)^r)$ | $e^{\Omega(n)}$
Markov Chain Analysis often Difficult
Drift Analysis: Long-term behaviour of X from
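Theorem (Positive drift)
If there exists $\varepsilon > 0$ such that for all $t \ge 0$
$$E[\Delta_t \mid 0 < g(X_t)] \ge \varepsilon$$
...
then $E[T] \le s_{\max}/\varepsilon$.

Theorem (Negative drift)
If there exists $\varepsilon > 0$ such that for all $t \ge 0$
$$E[\Delta_t \mid 0 < g(X_t) < s] \le -\varepsilon$$
...
then $E[T] \ge e^{cs}$.

Here $\Delta_t := g(X_t) - g(X_{t+1})$ is the one-step decrease in distance.
[Hajek, 1982, Oliveto and Witt, 2010, He and Yao, 2001]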
State Aggregation Problem
How to come up with an appropriate distance function g?

- Should reflect the progress of the heuristic, given its state.
- Often hard to find for single-individual heuristics.
- Highly non-trivial to find for population-based heuristics.
A new Approach for Finite Populations
Fitness Levels (upper bounds)

- Drift analysis

Population Drift (lower bounds)

- Branching processes
- Drift analysis
Population Drift
Central Parameters

- Reproductive rate
  $$\alpha_0 = \max_{1 \le j \le \lambda} E[\#\text{offspring from parent } j]$$
- Drift of variation operator
  $$X_{t+1} \sim p_{\mathrm{mut}}(X_t), \qquad \Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1})$$
Population Drift: Decoupling Selection & Variation

Population drift [Lehre, 2011b]
If there exists a $\kappa > 0$ such that
$$M_{\Delta_{\mathrm{mut}}}(\kappa) < 1/\alpha_0$$
where
$$\Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1}), \qquad X_{t+1} \sim p_{\mathrm{mut}}(X_t)$$
and
$$\alpha_0 = \max_{j} E[\#\text{offspring from parent } j],$$
then the runtime is exponential.

Classical drift [Hajek, 1982]
If there exists a $\kappa > 0$ such that
$$M_{\Delta}(\kappa) < 1$$
where
$$\Delta = h(P_t) - h(P_{t+1}),$$
then the runtime is exponential.

Note: this slide only shows the main conditions of the theorems.
Population Drift - Implications
$$M_{\Delta_{\mathrm{mut}}}(\kappa) < \frac{1}{\alpha_0} \implies \text{Inefficient algorithm}$$
High negative drift induced by the variation operator must be compensated with a high reproductive rate.
Analysis of the algorithm can be decoupled into analyses of

- the drift of the variation operator, $\Delta_{\mathrm{mut}}$
- the reproductive rate of the selection mechanism, $\alpha_0$

Feasible to analyse highly complex processes!
Proof Idea
Population drift as a multi-type branching process.
The Perron root of the mean matrix satisfies
$$\forall \kappa < 0: \quad \rho(M) \le \alpha_0 M_{\Delta_{\mathrm{mut}}}(\kappa).$$
The process becomes extinct if $\rho(M) < 1$.
[Diagram: Heuristics, Explicit Problem Structure, Runtime Analysis, Performance Guarantees, Heuristic Understanding, Problem Insight.]
[Diagram: Heuristics, Problem Characterisation, Industrial Problems, Runtime Analysis, Performance Guarantees, Heuristic Understanding, Design Guidelines.]
Combine Runtime Analysis with Fitness Landscape Theory

- Heuristic Understanding cluster
- Systems to Build Systems cluster
Conclusion
- Strong empirical evidence for the benefit of meta-heuristics
- Recently, it has become possible to prove mathematical statements that describe the relationship between
  a) problem characteristics and parameters of the meta-heuristic
  b) the expected runtime of the meta-heuristic
- Progress made possible by appropriate analytical tools
  - Population drift theorem
Questions?
Contact Details
- PerKristian.Lehre@nottingham.ac.uk
- http://www.cs.nott.ac.uk/pkl

Selected References

- Lehre & Yao, IEEE TEC (2011)
- Lehre, PPSN (2010)
- Lehre, GECCO (2011)
- Lehre & Witt, GECCO (2010)
References

Baswana, S., Biswas, S., Doerr, B., Friedrich, T., Kurur, P. P., and Neumann, F. (2009).
Computing single source shortest paths using single-objective fitness.
In FOGA '09: Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, pages 59-66, New York, NY, USA. ACM.

Doerr, B., Klein, C., and Storch, T. (2007).
Faster evolutionary algorithms by superior graph representation.
In Proceedings of the 1st IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007), pages 245-250.

Droste, S. (2006).
A rigorous analysis of the compact genetic algorithm for linear functions.
Natural Computing, 5(3):257-283.

Droste, S., Jansen, T., and Wegener, I. (2002).
On the analysis of the (1+1) Evolutionary Algorithm.
Theoretical Computer Science, 276:51-81.

Droste, S., Jansen, T., and Wegener, I. (2006).
Upper and lower bounds for randomized search heuristics in black-box optimization.
Theory of Computing Systems, 39(4):525-544.

Dubhashi, D. and Panconesi, A. (2009).
Concentration of Measure for the Analysis of Randomized Algorithms.
Cambridge University Press.

Friedrich, T., Hebbinghaus, N., Neumann, F., He, J., and Witt, C. (2007).
Approximating covering problems by randomized search heuristics using multi-objective models.
In Proceedings of the 9th annual conference on Genetic and evolutionary computation (GECCO 2007), pages 797-804, New York, NY, USA. ACM Press.

Giel, O. and Wegener, I. (2003).
Evolutionary algorithms and the maximum matching problem.
In Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2003), pages 415-426.

Hajek, B. (1982).
Hitting-time and occupation-time bounds implied by drift analysis with applications.
Advances in Applied Probability, 14(3):502-525.

He, J. and Yao, X. (2001).
Drift analysis and average time complexity of evolutionary algorithms.
Artificial Intelligence, 127(1):57-85.

He, J. and Yao, X. (2003).
Towards an analytic framework for analysing the computation time of evolutionary algorithms.
Artificial Intelligence, 145(1-2):59-97.

Jansen, T., Jong, K. A. D., and Wegener, I. (2005).
On the choice of the offspring population size in evolutionary algorithms.
Evolutionary Computation, 13(4):413-440.

Kötzing, T., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2010).
Ant colony optimization and the minimum cut problem.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2010), pages 1393-1400, New York, NY, USA. ACM.

Kratsch, S., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2011).
Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A matter of mutations.
In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume 6238 of LNCS, pages 204-213. Springer Berlin / Heidelberg.

Lehre, P. K. (2011a).
Fitness-levels for non-elitist populations.
To appear in Proceedings of 2011 Genetic and Evolutionary Computation Conference (GECCO 2011).

Lehre, P. K. (2011b).
Negative drift in populations.
In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume 6238 of LNCS, pages 244-253. Springer Berlin / Heidelberg.

Lehre, P. K. and Haddow, P. C. (2006).
Accessibility and runtime between convex neutral networks.
In Wang, T.-D., Li, X., Chen, S.-H., Wang, X., Abbass, H. A., Iba, H., Chen, G., and Yao, X., editors, SEAL, volume 4247 of Lecture Notes in Computer Science, pages 734-741. Springer.

Lehre, P. K. and Yao, X. (2007).
Runtime analysis of (1+1) EA on computing unique input output sequences.
In Proceedings of 2007 IEEE Congress on Evolutionary Computation (CEC 2007), pages 1882-1889. IEEE Press.

Lehre, P. K. and Yao, X. (2009).
On the impact of the mutation-selection balance on the runtime of evolutionary algorithms.
In Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms (FOGA 2009), pages 47-58, New York, NY, USA. ACM.

Motwani, R. and Raghavan, P. (1995).
Randomized Algorithms.
Cambridge University Press.

Mühlenbein, H. (1992).
How genetic algorithms really work I. Mutation and Hillclimbing.
In Proceedings of the Parallel Problem Solving from Nature 2, (PPSN-II), pages 15-26. Elsevier.

Mühlenbein, H. (1997).
The equation for response to selection and its use for prediction.
Evolutionary Computation, 5(3):303-346.

Neumann, F. and Wegener, I. (2007).
Randomized local search, evolutionary algorithms, and the minimum spanning tree problem.
Theoretical Computer Science, 378(1):32-40.

Neumann, F. and Witt, C. (2006).
Runtime analysis of a simple ant colony optimization algorithm.
In Proceedings of The 17th International Symposium on Algorithms and Computation (ISAAC 2006), number 4288 in LNCS, pages 618-627.

Neumann, F. and Witt, C. (2008).
Ant colony optimization and the minimum spanning tree problem.
In Proceedings of Learning and Intelligent Optimization (LION 2008), pages 153-166.

Oliveto, P. and Witt, C. (2010).
Simplified drift analysis for proving lower bounds in evolutionary computation.
Algorithmica, pages 1-18. 10.1007/s00453-010-9387-z.

Oliveto, P. S., He, J., and Yao, X. (2007).
Evolutionary algorithms and the vertex cover problem.
In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2007).

Reichel, J. and Skutella, M. (2008).
Evolutionary algorithms and matroid optimization problems.
Algorithmica.

Scharnow, J., Tinnefeld, K., and Wegener, I. (2002).
Fitness landscapes based on sorting and shortest paths problems.
In Proceedings of 7th Conf. on Parallel Problem Solving from Nature (PPSN VII), number 2439 in LNCS, pages 54-63.

Storch, T. (2006).
How randomized search heuristics find maximum cliques in planar graphs.
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2006), pages 567-574, New York, NY, USA. ACM Press.

Wegener, I. and Witt, C. (2005).
On the analysis of a simple evolutionary algorithm on quadratic pseudo-boolean functions.
Journal of Discrete Algorithms, 3(1):61-78.

Witt, C. (2005).
Worst-case and average-case approximations by simple randomized search heuristics.
In Proceedings of the 22nd Annual Symposium on Theoretical Aspects of Computer Science (STACS '05), number 3404 in LNCS, pages 44-56.

Witt, C. (2006).
Runtime Analysis of the ($\mu$ + 1) EA on Simple Pseudo-Boolean Functions.
Evolutionary Computation, 14(1):65-86.

Zarges, C. (2009).
On the utility of the population size for inversely fitness proportional mutation rates.
In FOGA '09: Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic algorithms, pages 39-46, New York, NY, USA. ACM.
Drift Analysis - Upper bounds
Theorem ([He and Yao, 2001])

Given a stochastic process $(X_t)_{t \ge 0}$ over a state space $\mathcal{X}$ and a function $g : \mathcal{X} \to \mathbb{R}_0^+$.
Define $T$ to be the first time $t$ such that $g(X_t) = 0$.
If there exists a constant $D > 0$ such that for all $t \ge 0$
1. $\Pr[g(X_t) < B] = 1$, and
2. $E[g(X_t) - g(X_{t+1}) \mid T > t] \ge D$,
then
$$E[T] \le \frac{B}{D}.$$
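The upper bound can be checked numerically on a toy process. The sketch below is an illustration under assumptions, not part of the theorem: the distance g(X_t) drops by 1 with probability q and otherwise stays put, so the expected one-step decrease is exactly D = q, and the process starts strictly below B. The process and all names are hypothetical.

```python
import random

def hitting_time(start, q, rng):
    """Steps until the distance reaches 0 when it drops by 1 with probability q."""
    g, t = start, 0
    while g > 0:
        if rng.random() < q:
            g -= 1
        t += 1
    return t

def compare_with_drift_bound(B, start, q, runs, seed=0):
    """Return (empirical mean of T, drift bound B / D) for the toy process."""
    assert 0 < start < B and 0 < q <= 1
    rng = random.Random(seed)
    mean_T = sum(hitting_time(start, q, rng) for _ in range(runs)) / runs
    bound = B / q  # the drift is D = q in every state with g > 0
    return mean_T, bound

if __name__ == "__main__":
    mean_T, bound = compare_with_drift_bound(B=20, start=10, q=0.5, runs=500)
    print(f"empirical E[T] ~ {mean_T:.1f}, bound B/D = {bound:.1f}")
```

For this process the exact expectation is start/q, so the bound B/D is loose whenever the process starts well below B.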
Negative Drift for Populations

Theorem ([Lehre, 2011b])
Given the Population Selection-Variation Algorithm with
I transition matrix pmut over search space and distance function g : N+
I runtime T := min{t 0 | g(P0 ) b and g(Pt ) < a}, b a = (n)
if there exists a > 0, and constants 0 , 1 , 2 > 0, st for all t 0,
1.
2.
3.
4.
5.
E [Rt (i) | a < g(Pt (i)) < b] 0 for all i, 1 i ,

h
i
1
E et (i) | a < g(Xt ) < b <
0 (1 + 1 )
h
i
E e(g(Xt+1 )b) | g(Xt ) b = O(1)
Pr [t (i) = ` t+1 (i `) = k]
e(ba)(12 )
Pr [t (i) = ` k]
Pr [t (i) = j]
1 ` + k j,
= O(1)
Pr [t (i k) = `]
1 ` + k j,
where
- $R_t(i) := \sum_{j=1}^{\lambda} [I_t(j) = i]$ is the number of offspring from individual $i$,
- $(X_t)_{t \ge 0}$ is the Markov process on $\mathcal{X}$ associated with $p_{\mathrm{mut}}$, and
- $\Delta_t(i) := (g(X_{t+1}) - g(X_t) \mid g(X_t) = i)$,
then the runtime satisfies $\Pr[T \le e^{cn}] = e^{-\Omega(n)}$, for some constant $c > 0$.
