
Runtime Analysis of Evolutionary Algorithms

Per Kristian Lehre


School of Computer Science
University of Nottingham

Functional Programming Laboratory Seminar


Nottingham, November 25th 2011

Black Box Optimisation


$f(x_1), f(x_2), f(x_3), \ldots, f(x_t)$

$x_1, x_2, x_3, \ldots, x_t$

Function class $F$

$f : X \to \mathbb{R}$
Photo: E. Gerhard (1846).

[Droste et al., 2006]

Evolutionary Algorithms

Selection

Variation

Meta-heuristics in Operations Research

Meta-heuristics are practical optimisation-techniques


- a pragmatic approach to NP-hard optimisation problems
- easy to implement, adaptable to many problem domains
- often produce solutions of strikingly good quality

A Theory of Meta-heuristics?

Weak theoretical foundation has been a major critique
- lack of performance guarantees
- impact of parameter settings poorly understood
- no deep understanding of how and why they work

Largely ignored by the general theory community
- not real algorithms
- no guarantees about performance
- mathematically very challenging

An Important Challenge...

Developing the mathematical methodology for


explaining and predicting the performance of these and
other heuristics is one of the most important challenges
facing the fields of optimization and algorithms today.
Papadimitriou and Steiglitz (1998)

Outline
Introduction
Runtime Analysis
Basic Definitions
Overview of Results
Evolutionary Algorithms
Exploration vs Exploitation
Analytical Techniques
Directions for Further Work
Heuristic Understanding
Systems to Build Systems
Conclusion

Runtime Analysis of Meta-heuristics

General Practical Question


I

Under what conditions will a given heuristic return


solutions of acceptable quality within reasonable time?

Theoretical Approach
- The runtime of a heuristic on a problem is the number of iterations until an optimal (or approximate) solution is found
- Analysis of how the runtime depends on
  1. problem characteristics
  2. parameters of the heuristic

Meta-heuristics are Randomised Algorithms

[Figure: density of the number of iterations of the (1+1) EA on an Easy FSM instance (n=200).]

Definition
The expected runtime of an algorithm $A$ on function class $F$ is
$$T_{A,F} := \max_{f \in F} E[T_{A,f}],$$
where $T_{A,f}$ is the number of $f$-evaluations before the optimum is found.
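The definition above can be explored empirically. The following is a minimal sketch, assuming the standard (1+1) EA (flip each bit independently with probability 1/n, keep the offspring if it is not worse) and OneMax as a single member of the function class; the function names and the choice of OneMax are illustrative assumptions, not taken from the slides. It estimates E[T] for one function by averaging the number of f-evaluations over independent runs.

```python
import random


def one_max(x):
    """Number of one-bits; the optimum is the all-ones string."""
    return sum(x)


def one_plus_one_ea(n, rng):
    """Run a (1+1) EA on OneMax and return the number of f-evaluations used."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = one_max(x)
    evaluations = 1
    while fx < n:
        # Bitwise mutation: flip each bit independently with probability 1/n.
        y = [1 - b if rng.random() < 1.0 / n else b for b in x]
        fy = one_max(y)
        evaluations += 1
        # Elitist acceptance: keep the offspring if it is at least as good.
        if fy >= fx:
            x, fx = y, fy
    return evaluations


def estimate_expected_runtime(n, runs, seed=0):
    """Monte Carlo estimate of the expected runtime on OneMax with n bits."""
    rng = random.Random(seed)
    samples = [one_plus_one_ea(n, rng) for _ in range(runs)]
    return sum(samples) / runs


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, estimate_expected_runtime(n, runs=200))
```

Note that the definition takes a maximum over the whole function class, so an estimate for a single function is only a lower estimate of the class-wide quantity.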

Expected runtime as a function of problem instance size

[Figure: RS on Easy FSM instance class. Iterations of RS against the number of states in the FSM (n).]

Exponential $\Rightarrow$ algorithm impractical on problem.
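A short worked calculation shows how an exponential runtime can arise for random search (RS). As a simplifying assumption (this is not the FSM instance class of the plot), take a function on $\{0,1\}^n$ with a single optimum and let RS sample uniformly and independently in each iteration. The runtime $T$ is then geometrically distributed:

```latex
\Pr[\text{optimum sampled in one iteration}] = 2^{-n},
\qquad
E[T] = \sum_{t \ge 1} t \,\bigl(1 - 2^{-n}\bigr)^{t-1}\, 2^{-n} = 2^{n}.
```

Under this assumption, already n = 60 gives an expected runtime above $10^{18}$ iterations.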

Expected runtime as a function of problem instance size

[Figure: (1+1) EA on Easy FSM instance class. Iterations of (1+1) EA against the number of states in the FSM (n).]

- Exponential $\Rightarrow$ algorithm impractical on problem.
- Polynomial $\Rightarrow$ possibly efficient algorithm.

Analytical Tool Box

- Artificial Fitness Levels [Wegener and Witt, 2005, Lehre, 2011a]
- Concentration of measure [Dubhashi and Panconesi, 2009]
- Typical Runs
- Expected Multiplicative Weight Decrease [Neumann and Wegener, 2007]
- Drift Analysis [Hajek, 1982]
- Branching Processes [Lehre and Yao, 2009]
- Electrical Resistive Networks [Lehre and Haddow, 2006]
- Yao's Minimax Principle [Motwani and Raghavan, 1995]

Problem                         Algorithm                  Runtime                                                            Reference
OneMax                          (1+1) EA                   $O(n \log n)$                                                      [Mühlenbein, 1992]
                                $(1+\lambda)$ EA           $O(\lambda n + n \log n)$                                          [Jansen et al., 2005]
                                $(\mu+1)$ EA               $O(\mu n + n \log n)$                                              [Witt, 2006]
                                1-ANT                      $O(n^2)$ w.h.p.                                                    [Neumann and Witt, 2006]
                                $(\mu+1)$ IA               $O(\mu n + n \log n)$                                              [Zarges, 2009]
Linear Functions                (1+1) EA                   $\Theta(n \log n)$                                                 [Droste et al., 2002] and [He and Yao, 2003]
                                cGA                        $\Theta(n^{2+\varepsilon})$, $\varepsilon > 0$ const.              [Droste, 2006]
Max. Matching                   (1+1) EA                   $e^{\Omega(n)}$, PRAS                                              [Giel and Wegener, 2003]
Sorting                         (1+1) EA                   $\Theta(n^2 \log n)$                                               [Scharnow et al., 2002]
SS Shortest Path                (1+1) EA                   $O(n^3 \log(n w_{\max}))$                                          [Baswana et al., 2009]
                                MO (1+1) EA                $O(n^3)$                                                           [Scharnow et al., 2002]
MST                             (1+1) EA                   $\Theta(m^2 \log(n w_{\max}))$                                     [Neumann and Wegener, 2007]
                                $(1+\lambda)$ EA           $O(n\lambda \log(n w_{\max}))$, $\lambda = \lceil m^2/n \rceil$    [Neumann and Wegener, 2007]
                                1-ANT                      $O(mn \log(n w_{\max}))$                                           [Neumann and Witt, 2008]
Max. Clique (rand. planar)      (1+1) EA                   $\Theta(n^5)$                                                      [Storch, 2006]
                                (16n+1) RLS                $\Theta(n^{5/3})$                                                  [Storch, 2006]
Eulerian Cycle                  (1+1) EA                   $\Theta(m^2 \log m)$                                               [Doerr et al., 2007]
MinCut                          ACO with $h/\ell = O(1)$   $O(n^{2(h/\ell)})$                                                 [Kötzing et al., 2010]
Partition                       (1+1) EA                   PRAS, avg.                                                         [Witt, 2005]
Vertex Cover                    (1+1) EA                   $e^{\Omega(n)}$, arb. bad approx.                                  [Friedrich et al., 2007] and [Oliveto et al., 2007]
Set Cover                       (1+1) EA                   $e^{\Omega(n)}$, arb. bad approx.                                  [Friedrich et al., 2007]
                                SEMO                       Pol. $O(\log n)$-approx.                                           [Friedrich et al., 2007]
MaxLeafST (k leaves)            (1+1) EA                   m(k)                                                               [Kratsch et al., 2011]
                                (1+1) EA edge-exch.        $O(2^{15k \log k})$                                                [Kratsch et al., 2011]
Intersection of p ≥ 3 matroids  (1+1) EA                   $1/p$-approximation in $O(|E|^{p+2} \log(|E| w_{\max}))$           [Reichel and Skutella, 2008]
UIO/FSM conf.                   (1+1) EA                   $e^{\Omega(n)}$                                                    [Lehre and Yao, 2007]

What about population-based EAs?

"Given the mathematical difficulty of the infinite population size model, we doubt that a mathematical analysis of finite populations will be possible." [Mühlenbein, 1997]

Exploration vs Exploitation...

Selection

Variation

Evolutionary Algorithm for $\max_{x \in \{0,1\}^n} f(x)$

for t = 0, 1, 2, ... until termination condition do
    for i = 1 to $\lambda$ do
        Sample i-th parent x according to $p_{\mathrm{sel}}(P_t, f)$
        Sample i-th offspring $P_{t+1}(i)$ according to $p_{\mathrm{mut}}(x)$
    end for
end for
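The scheme above can be sketched in Python as follows. This is an illustrative sketch, not the author's implementation: it assumes k-tournament selection (with replacement) for the selection step and bitwise mutation for the variation step, and the names `lam`, `k`, `p`, `target` and `max_generations` are hypothetical parameters introduced for the example.

```python
import random


def tournament_select(population, f, k, rng):
    """Return the fittest of k individuals drawn uniformly with replacement."""
    contestants = [rng.choice(population) for _ in range(k)]
    return max(contestants, key=f)


def bitwise_mutation(x, p, rng):
    """Flip each bit of x independently with probability p."""
    return tuple(1 - b if rng.random() < p else b for b in x)


def non_elitist_ea(f, n, lam, k, p, max_generations, target, rng):
    """Generational, non-elitist EA on bitstrings of length n.

    Returns (t, population), where t is the first generation whose
    population contains an individual with fitness >= target, or
    max_generations if none was seen in generations 0..max_generations-1.
    """
    population = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(lam)]
    for t in range(max_generations):
        if max(f(x) for x in population) >= target:
            return t, population
        # Every offspring is created independently: select a parent, then mutate it.
        population = [
            bitwise_mutation(tournament_select(population, f, k, rng), p, rng)
            for _ in range(lam)
        ]
    return max_generations, population


if __name__ == "__main__":
    rng = random.Random(42)
    generations, final_population = non_elitist_ea(
        sum, n=30, lam=100, k=4, p=1.0 / 30, max_generations=500, target=30, rng=rng
    )
    print(generations, max(sum(x) for x in final_population))
```

Because the new population fully replaces the old one, the best individual can be lost, which is what makes the balance between selection and variation matter.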

Selection and Variation

Tournament Selection

10001110111011

Bitwise Mutation
10010110111001

How large tournament size k?

k = 1 No selective pressure
- Unbiased random walk
- Efficient optimisation is impossible

k = Highest selective pressure


- Only fittest individual reproduced
- No population diversity

How large tournament size k?


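Example
The runtime $T$ of a non-elitist EA with
- tournament size $k$
- bit-wise mutation rate $p$
- population size $\lambda > \log(nr)$
on any unimodal function with
- $n$ Boolean variables
- $r$ distinct function values
has expected value
$$E[T] = \begin{cases} e^{\Omega(n)} & \text{if } k < e^{pn} \\ O(\lambda^2 r + nr) & \text{if } k > e^{pn} \end{cases}$$

[Figure: runtime against tournament size k, with an exponential (exp) regime and a polynomial (poly) regime.]

Lehre (PPSN 2010), Lehre (GECCO 2011)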
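The case distinction in this example can be turned into a small check. The sketch below assumes the threshold reads $e^{pn}$ (the exponent is not fully legible in the extracted slide) and it ignores the population-size condition, so it only indicates which side of the threshold a parameter setting falls on. The function names are hypothetical.

```python
import math


def tournament_threshold(p, n):
    """Threshold e^(p*n) on the tournament size, for bitwise mutation rate p."""
    return math.exp(p * n)


def runtime_regime(k, p, n):
    """Classify tournament size k relative to the assumed threshold e^(p*n)."""
    threshold = tournament_threshold(p, n)
    if k < threshold:
        return "exponential"
    if k > threshold:
        return "polynomial"
    return "boundary"


if __name__ == "__main__":
    n = 100
    p = 1.0 / n  # the common choice of one expected bit flip per mutation
    for k in (2, 3, 5):
        print(k, runtime_regime(k, p, n))
```

With mutation rate 1/n the threshold is e, roughly 2.72, so under this reading binary tournaments (k = 2) fall in the exponential regime and k = 3 in the polynomial one.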

How close to the global optimum in polynomial time?

Theorem ([Lehre, 2011b])
W.o.p., the Hamming distance from any individual in the first $e^{cn}$ generations to the global optimum is at least
$$n\left(\frac{1}{2} - \sqrt{\frac{\ln k}{4pn}\left(2 - \frac{\ln k}{pn}\right)}\right).$$
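Other Example Applications

Expected runtime of EA with bit-wise mutation rate $\chi/n$

Selection Mechanism     High Selective Pressure               Low Selective Pressure
Fitness Proportionate   $\nu > f_{\max} \ln(2e^{\chi})$       $\nu < \chi/\ln 2$ and $\lambda \ge n^3$
Linear Ranking          $\eta > e^{\chi}$                     $\eta < e^{\chi}$
k-Tournament            $k > e^{\chi}$                        $k < e^{\chi}$
$(\mu, \lambda)$        $\lambda > \mu e^{\chi}$              $\lambda < \mu e^{\chi}$
Cellular EAs                                                  $\Delta(G) < e^{\chi}$

Problem                 High Selective Pressure               Low Selective Pressure
OneMax                  $O(n\lambda^2)$                       $e^{\Omega(n)}$
LeadingOnes             $O(n\lambda^2 + n^2)$                 $e^{\Omega(n)}$
Linear Functions        $O(n\lambda^2 + n^2)$                 $e^{\Omega(n)}$
r-Unimodal              $O(r\lambda^2 + nr)$                  $e^{\Omega(n)}$
Jump$_r$                $O(n\lambda^2 + (n/\chi)^r)$          $e^{\Omega(n)}$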

Markov Chain Analysis often Difficult

Drift Analysis: Long-term behaviour of X from

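Theorem (Positive drift)
If there exists $\varepsilon > 0$ s.t. for all $t \ge 0$
$$E[\Delta_t \mid 0 < g(X_t)] \ge \varepsilon$$
...
then $E[T] \le s_{\max}/\varepsilon$.

Theorem (Negative drift)
If there exists $\varepsilon > 0$ s.t. for all $t \ge 0$
$$E[\Delta_t \mid 0 < g(X_t) < s] \le -\varepsilon$$
...
then $E[T] \ge e^{cs}$.

[Hajek, 1982, Oliveto and Witt, 2010, He and Yao, 2001]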

State Aggregation Problem

How to come up with an appropriate distance function g?
- Should reflect the progress of the heuristic, given its state.
- Often hard to find for single-individual heuristics.
- Highly non-trivial to find for population-based heuristics.

A new Approach for Finite Populations

Fitness Levels (upper bounds)
- Concentration of measure
- Drift analysis

Population Drift (lower bounds)
- Branching processes
- Drift analysis

Population Drift

Central Parameters
- Reproductive rate
  $$\alpha_0 = \max_{1 \le j \le \lambda} E[\#\text{offspring from parent } j]$$
- Drift of variation operator
  $$X_{t+1} \sim p_{\mathrm{mut}}(X_t), \qquad \Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1})$$
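As an illustration of the reproductive rate, the sketch below computes it for k-tournament selection. It assumes that each of the offspring is produced by an independent tournament of k individuals drawn uniformly with replacement, that the population has a unique fittest individual, and that the population size is written `lam`; these are assumptions made for the example, not details taken from the slide. Under them the fittest individual wins a tournament exactly when it is drawn at least once.

```python
def tournament_reproductive_rate(lam, k):
    """Expected number of offspring of a unique fittest individual.

    Each of the lam offspring comes from an independent k-tournament with
    replacement; the fittest individual wins if it is among the k draws.
    """
    p_win = 1.0 - (1.0 - 1.0 / lam) ** k
    return lam * p_win


if __name__ == "__main__":
    for lam in (10, 100, 10_000):
        print(lam, [round(tournament_reproductive_rate(lam, k), 3) for k in (1, 2, 4)])
```

The value never exceeds k and approaches k as the population grows, which is why the tournament size plays the role of the reproductive rate in this setting.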

Population Drift: Decoupling Selection & Variation

Population drift [Lehre, 2011b]
If there exists a $\kappa > 0$ such that
$$M_{\Delta_{\mathrm{mut}}}(\kappa) < 1/\alpha_0$$
where
$$\Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1}), \qquad X_{t+1} \sim p_{\mathrm{mut}}(X_t)$$
and
$$\alpha_0 = \max_j E[\#\text{offspring from parent } j],$$
then the runtime is exponential.

Classical drift [Hajek, 1982]
If there exists a $\kappa > 0$ such that
$$M_{\Delta}(\kappa) < 1$$
where
$$\Delta = h(P_t) - h(P_{t+1}),$$
then the runtime is exponential.

Note: this slide only shows the main conditions of the theorems.
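A hedged worked example of the population drift condition: assume bitwise mutation with rate $p$, let $g$ be the Hamming distance to the optimum, read the drift as $\Delta_{\mathrm{mut}} = g(X_t) - g(X_{t+1})$, and write $\kappa$ for the argument of the moment generating function and $\alpha_0$ for the reproductive rate (these symbol names are assumptions, since the symbols are not legible in the extracted slide). For a parent at distance $i$, each of the $i$ incorrect bits that flips reduces the distance by one and each of the $n-i$ correct bits that flips increases it by one, independently, which gives:

```latex
M_{\Delta_{\mathrm{mut}}}(\kappa)
  = E\!\left[e^{\kappa \Delta_{\mathrm{mut}}}\right]
  = \left(1 - p + p\,e^{\kappa}\right)^{i}
    \left(1 - p + p\,e^{-\kappa}\right)^{n-i}.
% At i = 0 and for large \kappa this tends to (1-p)^n \approx e^{-pn}.
% With a reproductive rate \alpha_0 close to k (k-tournament selection),
% M_{\Delta_{\mathrm{mut}}}(\kappa) < 1/\alpha_0 then requires roughly k < e^{pn}.
```

This is only a heuristic reading at distance $i = 0$; the theorem itself imposes the condition over a range of distances, but the calculation indicates where a threshold of the form $k < e^{pn}$ for tournament selection can come from.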

Population Drift - Implications

$$M_{\Delta_{\mathrm{mut}}}(\kappa) < \frac{1}{\alpha_0} \implies \text{inefficient algorithm}$$

- High negative drift induced by the variation operator must be compensated with high reproductive rate.
- Analysis of algorithm can be decoupled into analyses of
  - the drift of the variation operator $\Delta_{\mathrm{mut}}$
  - the reproductive rate of the selection mechanism $\alpha_0$
- Feasible to analyse highly complex processes!

Proof Idea

Population drift as a multi-type branching process.

The Perron root of the mean matrix satisfies


< 0 :

$\rho(M) \le \alpha_0 M_{\Delta_{\mathrm{mut}}}(\kappa)$.

The process becomes extinct if $\rho(M) < 1$.

Directions for Further Work

[Diagram relating: Problem Insight, Heuristics, Problem Characterisation, Runtime Analysis, Industrial Problems, Performance Guarantees, Heuristic Understanding, Design Guidelines.]

Combine Runtime Analysis with Fitness Landscape Theory
- Heuristic Understanding cluster
- Systems to Build Systems cluster

Conclusion

- Strong empirical evidence for the benefit of meta-heuristics
- Recently, it has become possible to prove mathematical statements that describe the relationship between
  a) problem characteristics and parameters of the meta-heuristic
  b) the expected runtime of the meta-heuristic
- Progress made possible by appropriate analytical tools
  - Population drift theorem

Questions?

Contact Details
- PerKristian.Lehre@nottingham.ac.uk
- http://www.cs.nott.ac.uk/pkl

Selected References
- Lehre & Yao, IEEE TEC (2011)
- Lehre, PPSN (2010)
- Lehre, GECCO (2011)
- Lehre & Witt, GECCO (2010)

References I
Baswana, S., Biswas, S., Doerr, B., Friedrich, T., Kurur, P. P., and Neumann, F.
(2009).
Computing single source shortest paths using single-objective fitness.
In FOGA '09: Proceedings of the tenth ACM SIGEVO workshop on Foundations
of genetic algorithms, pages 59–66, New York, NY, USA. ACM.
Doerr, B., Klein, C., and Storch, T. (2007).
Faster evolutionary algorithms by superior graph representation.
In Proceedings of the 1st IEEE Symposium on Foundations of Computational
Intelligence (FOCI 2007), pages 245–250.
Droste, S. (2006).
A rigorous analysis of the compact genetic algorithm for linear functions.
Natural Computing, 5(3):257–283.
Droste, S., Jansen, T., and Wegener, I. (2002).
On the analysis of the (1+1) Evolutionary Algorithm.
Theoretical Computer Science, 276:51–81.
Droste, S., Jansen, T., and Wegener, I. (2006).
Upper and lower bounds for randomized search heuristics in black-box
optimization.
Theory of Computing Systems, 39(4):525–544.

References II
Dubhashi, D. and Panconesi, A. (2009).
Concentration of Measure for the Analysis of Randomized Algorithms.
Cambridge University Press.
Friedrich, T., Hebbinghaus, N., Neumann, F., He, J., and Witt, C. (2007).
Approximating covering problems by randomized search heuristics using
multi-objective models.
In Proceedings of the 9th annual conference on Genetic and evolutionary
computation (GECCO 2007), pages 797–804, New York, NY, USA. ACM Press.
Giel, O. and Wegener, I. (2003).
Evolutionary algorithms and the maximum matching problem.
In Proceedings of the 20th Annual Symposium on Theoretical Aspects of
Computer Science (STACS 2003), pages 415–426.
Hajek, B. (1982).
Hitting-time and occupation-time bounds implied by drift analysis with
applications.
Advances in Applied Probability, 14(3):502–525.
He, J. and Yao, X. (2001).
Drift analysis and average time complexity of evolutionary algorithms.
Artificial Intelligence, 127(1):57–85.

References III
He, J. and Yao, X. (2003).
Towards an analytic framework for analysing the computation time of
evolutionary algorithms.
Artificial Intelligence, 145(1-2):59–97.
Jansen, T., Jong, K. A. D., and Wegener, I. (2005).
On the choice of the offspring population size in evolutionary algorithms.
Evolutionary Computation, 13(4):413–440.
Kötzing, T., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2010).
Ant colony optimization and the minimum cut problem.
In Proceedings of the 12th annual conference on Genetic and evolutionary
computation (GECCO 2010), pages 1393–1400, New York, NY, USA. ACM.
Kratsch, S., Lehre, P. K., Neumann, F., and Oliveto, P. S. (2011).
Fixed parameter evolutionary algorithms and maximum leaf spanning trees: A
matter of mutations.
In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume
6238 of LNCS, pages 204–213. Springer Berlin / Heidelberg.
Lehre, P. K. (2011a).
Fitness-levels for non-elitist populations.
To appear in Proceedings of 2011 Genetic and Evolutionary Computation
Conference (GECCO 2011).

References IV
Lehre, P. K. (2011b).
Negative drift in populations.
In Proceedings of Parallel Problem Solving from Nature - (PPSN XI), volume
6238 of LNCS, pages 244–253. Springer Berlin / Heidelberg.
Lehre, P. K. and Haddow, P. C. (2006).
Accessibility and runtime between convex neutral networks.
In Wang, T.-D., Li, X., Chen, S.-H., Wang, X., Abbass, H. A., Iba, H., Chen, G.,
and Yao, X., editors, SEAL, volume 4247 of Lecture Notes in Computer Science,
pages 734–741. Springer.
Lehre, P. K. and Yao, X. (2007).
Runtime analysis of (1+1) EA on computing unique input output sequences.
In Proceedings of 2007 IEEE Congress on Evolutionary Computation
(CEC 2007), pages 1882–1889. IEEE Press.
Lehre, P. K. and Yao, X. (2009).
On the impact of the mutation-selection balance on the runtime of evolutionary
algorithms.
In Proceedings of the tenth ACM SIGEVO workshop on Foundations of genetic
algorithms (FOGA 2009), pages 47–58, New York, NY, USA. ACM.
Motwani, R. and Raghavan, P. (1995).
Randomized Algorithms.
Cambridge University Press.

References V
Mühlenbein, H. (1992).
How genetic algorithms really work I. Mutation and Hillclimbing.
In Proceedings of the Parallel Problem Solving from Nature 2, (PPSN-II), pages
15–26. Elsevier.
Mühlenbein, H. (1997).
The equation for response to selection and its use for prediction.
Evolutionary Computation, 5(3):303–346.
Neumann, F. and Wegener, I. (2007).
Randomized local search, evolutionary algorithms, and the minimum spanning
tree problem.
Theoretical Computer Science, 378(1):32–40.
Neumann, F. and Witt, C. (2006).
Runtime analysis of a simple ant colony optimization algorithm.
In Proceedings of The 17th International Symposium on Algorithms and
Computation (ISAAC 2006), number 4288 in LNCS, pages 618–627.
Neumann, F. and Witt, C. (2008).
Ant colony optimization and the minimum spanning tree problem.
In Proceedings of Learning and Intelligent Optimization (LION 2008), pages
153–166.

References VI
Oliveto, P. and Witt, C. (2010).
Simplified drift analysis for proving lower bounds in evolutionary computation.
Algorithmica, pages 1–18.
10.1007/s00453-010-9387-z.
Oliveto, P. S., He, J., and Yao, X. (2007).
Evolutionary algorithms and the vertex cover problem.
In Proceedings of the IEEE Congress on Evolutionary Computation
(CEC 2007).
Reichel, J. and Skutella, M. (2008).
Evolutionary algorithms and matroid optimization problems.
Algorithmica.
Scharnow, J., Tinnefeld, K., and Wegener, I. (2002).
Fitness landscapes based on sorting and shortest paths problems.
In Proceedings of 7th Conf. on Parallel Problem Solving from Nature
(PPSN VII), number 2439 in LNCS, pages 54–63.
Storch, T. (2006).
How randomized search heuristics find maximum cliques in planar graphs.
In Proceedings of the 8th annual conference on Genetic and evolutionary
computation (GECCO 2006), pages 567–574, New York, NY, USA. ACM Press.

References VII
Wegener, I. and Witt, C. (2005).
On the analysis of a simple evolutionary algorithm on quadratic pseudo-boolean
functions.
Journal of Discrete Algorithms, 3(1):61–78.
Witt, C. (2005).
Worst-case and average-case approximations by simple randomized search
heuristics.
In Proceedings of the 22nd Annual Symposium on Theoretical Aspects of
Computer Science (STACS '05), number 3404 in LNCS, pages 44–56.
Witt, C. (2006).
Runtime Analysis of the (μ + 1) EA on Simple Pseudo-Boolean Functions.
Evolutionary Computation, 14(1):65–86.
Zarges, C. (2009).
On the utility of the population size for inversely fitness proportional mutation
rates.
In FOGA '09: Proceedings of the tenth ACM SIGEVO workshop on Foundations
of genetic algorithms, pages 39–46, New York, NY, USA. ACM.

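Drift Analysis - Upper bounds

[Figure: one step of the process, from $g(X_t)$ to $g(X_{t+1})$.]

Theorem ([He and Yao, 2001])
Given a stochastic process $(X_t)_{t \ge 0}$ over a state space $\mathcal{X}$ and $g : \mathcal{X} \to \mathbb{R}_0^+$.
Define $T$ to be the first time $t$ such that $g(X_t) = 0$.
If there exists a constant $D > 0$ such that for all $t \ge 0$
1. $\Pr[g(X_t) < B] = 1$, and
2. $E[g(X_t) - g(X_{t+1}) \mid T > t] \ge D$,
then
$$E[T] \le \frac{B}{D}.$$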
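The upper bound can be checked numerically on a toy process. The sketch below is an assumption-laden illustration rather than anything from the slides: the distance starts at a fixed value, decreases by one with probability `success_prob` in each step and otherwise stays put, so the drift is exactly `success_prob` and the distance never exceeds its starting value. All names are hypothetical.

```python
import random


def hitting_time(start, success_prob, rng):
    """Steps until the distance reaches 0 when it drops by 1 w.p. success_prob."""
    distance, steps = start, 0
    while distance > 0:
        if rng.random() < success_prob:
            distance -= 1
        steps += 1
    return steps


def mean_hitting_time(start, success_prob, runs, seed=0):
    """Average hitting time over independent runs with a seeded generator."""
    rng = random.Random(seed)
    return sum(hitting_time(start, success_prob, rng) for _ in range(runs)) / runs


def drift_upper_bound(bound_b, drift_d):
    """The bound B / D from the additive drift theorem."""
    return bound_b / drift_d


if __name__ == "__main__":
    start, q = 10, 0.5
    # B = start + 1 satisfies the strict condition g(X_t) < B; D = q is the drift.
    print(mean_hitting_time(start, q, runs=2000), drift_upper_bound(start + 1, q))
```

For this process the exact expectation is `start / success_prob`, so the empirical mean should sit just below the bound B / D.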

Negative Drift for Populations


Theorem ([Lehre, 2011b])
Given the Population Selection-Variation Algorithm with
I transition matrix pmut over search space and distance function g : N+
I runtime T := min{t 0 | g(P0 ) b and g(Pt ) < a}, b a = (n)
if there exists a > 0, and constants 0 , 1 , 2 > 0, st for all t 0,
1.
2.
3.
4.
5.

E [Rt (i) | a < g(Pt (i)) < b] 0 for all i, 1 i ,


h
i
1
E et (i) | a < g(Xt ) < b <
0 (1 + 1 )
h
i
E e(g(Xt+1 )b) | g(Xt ) b = O(1)
Pr [t (i) = ` t+1 (i `) = k]
e(ba)(12 )
Pr [t (i) = ` k]
Pr [t (i) = j]
1 ` + k j,
= O(1)
Pr [t (i k) = `]
1 ` + k j,

where
- $R_t(i) := \sum_{j=1}^{\lambda} [I_t(j) = i]$ is the number of offspring from individual $i$,
- $(X_t)_{t \ge 0}$ is the Markov process on the search space associated with $p_{\mathrm{mut}}$, and
- $\Delta_t(i) := (g(X_{t+1}) - g(X_t) \mid g(X_t) = i)$,
then the runtime satisfies $\Pr[T \le e^{cn}] = e^{-\Omega(n)}$, for some constant $c > 0$.
