
AdaBoost

Lecturer: Jan Šochman
Authors: Jan Šochman, Jiří Matas
Center for Machine Perception
Czech Technical University, Prague
http://cmp.felk.cvut.cz
Presentation

Motivation
"AdaBoost with trees is the best off-the-shelf classifier in the world." (Breiman 1998)

Outline:
AdaBoost algorithm
How does it work?
Why does it work?
Online AdaBoost and other variants
What is AdaBoost?

AdaBoost is an algorithm for constructing a strong classifier as a linear combination

$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$

of simple weak classifiers $h_t(x): \mathcal{X} \to \{-1, +1\}$.

Terminology
$h_t(x)$ ... weak or basis classifier, hypothesis, feature
$H(x) = \mathrm{sign}(f(x))$ ... strong or final classifier/hypothesis

Interesting properties
AB is capable of reducing both the bias (e.g. stumps) and the variance (e.g. trees) of the weak classifiers
AB has good generalisation properties (it maximises the margin)
AB output converges to the logarithm of the likelihood ratio
AB can be seen as a feature selector with a principled strategy (minimisation of an upper bound on the empirical error)
AB is close to sequential decision making (it produces a sequence of gradually more complex classifiers)
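To make the linear-combination view concrete, here is a minimal evaluation sketch (illustrative, not from the slides), assuming the weak classifiers h_t and the weights alpha_t have already been trained and that each h_t returns -1 or +1:

```python
def strong_classify(x, weak_classifiers, alphas):
    """H(x) = sign(f(x)) with f(x) = sum_t alpha_t * h_t(x)."""
    f = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return +1 if f >= 0 else -1
```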
The AdaBoost Algorithm

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in \mathcal{X}$, $y_i \in \{-1, +1\}$

Initialise weights $D_1(i) = 1/m$

For $t = 1, \ldots, T$:

  Find $h_t = \arg\min_{h_j \in \mathcal{H}} \epsilon_j = \sum_{i=1}^{m} D_t(i)\, [\![\, y_i \neq h_j(x_i) \,]\!]$

  If $\epsilon_t \geq 1/2$ then stop

  Set $\alpha_t = \frac{1}{2} \log\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$

  Update
  $D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$
  where $Z_t$ is a normalisation factor

Output the final classifier:

$H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

[Figure: training error vs. boosting step (0 to 40) on a toy example, shown after rounds t = 1, 2, ..., 7 and t = 40.]
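The loop above translates almost line by line into code. Below is a minimal sketch (names such as train_adaboost and candidates are illustrative, not from the slides), assuming the weak classifiers are drawn from a finite pool of callables returning -1 or +1:

```python
import math

def train_adaboost(X, y, candidates, T):
    """Discrete AdaBoost as on the slide.

    X          -- list of training samples x_i
    y          -- list of labels y_i in {-1, +1}
    candidates -- finite pool H of weak classifiers h(x) -> -1 or +1
    T          -- maximum number of boosting rounds
    Returns the selected weak classifiers and their weights alpha_t."""
    m = len(X)
    D = [1.0 / m] * m                      # D_1(i) = 1/m
    chosen, alphas = [], []

    for t in range(T):
        # Find h_t minimising the weighted error eps_j = sum_i D_t(i) [y_i != h_j(x_i)]
        errors = [sum(D[i] for i in range(m) if h(X[i]) != y[i]) for h in candidates]
        j = min(range(len(candidates)), key=lambda k: errors[k])
        h_t, eps_t = candidates[j], errors[j]

        if eps_t >= 0.5:                   # no weak classifier better than chance: stop
            break
        eps_t = max(eps_t, 1e-12)          # guard against a perfect weak classifier

        alpha_t = 0.5 * math.log((1.0 - eps_t) / eps_t)

        # Update D_{t+1}(i) proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i))
        D = [D[i] * math.exp(-alpha_t * y[i] * h_t(X[i])) for i in range(m)]
        Z_t = sum(D)                       # normalisation factor
        D = [d / Z_t for d in D]

        chosen.append(h_t)
        alphas.append(alpha_t)

    return chosen, alphas
```

The returned weak classifiers and weights define the strong classifier H(x) = sign(sum_t alpha_t h_t(x)).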
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 2
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 3
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 4
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 5
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 6
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 7
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
4/17
The AdaBoost Algorithm
Given: (x
1
, y
1
), . . . , (x
m
, y
m
); x
i
X, y
i
{1, +1}
Initialise weights D
1
(i) = 1/m
For t = 1, ..., T:
Find h
t
= arg min
h
j
H

j
=
m

i=1
D
t
(i)Jy
i
= h
j
(x
i
)K
If
t
1/2 then stop
Set
t
=
1
2
log(
1
t

t
)
Update
D
t+1
(i) =
D
t
(i)exp(
t
y
i
h
t
(x
i
))
Z
t
where Z
t
is normalisation factor
Output the nal classier:
H(x) = sign

t=1

t
h
t
(x)

step
t
r
a
i
n
i
n
g
e
r
r
o
r
t = 40
0 5 10 15 20 25 30 35 40
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
Reweighting

Effect on the training set:

$D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$

$\exp(-\alpha_t y_i h_t(x_i)) \begin{cases} < 1, & y_i = h_t(x_i) \\ > 1, & y_i \neq h_t(x_i) \end{cases}$

Increase (decrease) the weight of wrongly (correctly) classified examples
The weight is an upper bound on the error of a given example
All information about the previously selected features is captured in $D_t$
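As a small numeric illustration (not on the slide): with a weighted error of epsilon_t = 0.3, hence alpha_t ≈ 0.42, the update multiplies the weight of every correctly classified example by about 0.65 and of every misclassified example by about 1.53 before renormalising by Z_t:

```python
import math

eps_t = 0.3                                    # illustrative weighted error
alpha_t = 0.5 * math.log((1 - eps_t) / eps_t)  # ~0.4236

correct_factor = math.exp(-alpha_t)            # ~0.65: weight shrinks
wrong_factor = math.exp(alpha_t)               # ~1.53: weight grows
print(alpha_t, correct_factor, wrong_factor)
```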
Upper Bound Theorem

Theorem: The following upper bound holds on the training error of H:

$\frac{1}{m} \left|\{i : H(x_i) \neq y_i\}\right| \leq \prod_{t=1}^{T} Z_t$

Proof: By unravelling the update rule,

$D_{T+1}(i) = \frac{D_T(i) \exp(-\alpha_T y_i h_T(x_i))}{Z_T} = \frac{\exp(-\sum_t \alpha_t y_i h_t(x_i))}{m \prod_t Z_t} = \frac{\exp(-y_i f(x_i))}{m \prod_t Z_t}$

If $H(x_i) \neq y_i$ then $y_i f(x_i) \leq 0$, implying that $\exp(-y_i f(x_i)) \geq 1$, thus

$[\![\, H(x_i) \neq y_i \,]\!] \leq \exp(-y_i f(x_i))$

$\frac{1}{m} \sum_i [\![\, H(x_i) \neq y_i \,]\!] \leq \frac{1}{m} \sum_i \exp(-y_i f(x_i)) = \sum_i \Big(\prod_t Z_t\Big) D_{T+1}(i) = \prod_t Z_t$

[Figure: err vs. $yf(x)$ for $yf(x) \in [-2, 2]$: the exponential loss $e^{-yf(x)}$ upper-bounds the 0/1 step loss.]
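The key inequality in the proof, [[H(x_i) != y_i]] <= exp(-y_i f(x_i)), can be checked numerically; a tiny illustration with made-up margins y_i f(x_i):

```python
import math

margins = [1.3, 0.4, -0.2, 2.1, -0.7, 0.9]   # made-up values of y_i * f(x_i)

zero_one_error = sum(1 for s in margins if s <= 0) / len(margins)
exp_loss = sum(math.exp(-s) for s in margins) / len(margins)

print(zero_one_error, exp_loss)              # 0.33... <= 0.78..., as the bound promises
```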
Consequences of the Theorem

Instead of minimising the training error, its upper bound can be minimised
This can be done by minimising $Z_t$ in each training round by:
  choosing the optimal $h_t$, and
  finding the optimal $\alpha_t$
AdaBoost can be proved to maximise the margin
AdaBoost iteratively fits an additive logistic regression model
Choosing $\alpha_t$

We attempt to minimise $Z_t = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i))$:

$\frac{dZ}{d\alpha} = -\sum_{i=1}^{m} D(i)\, y_i h(x_i)\, e^{-\alpha y_i h(x_i)} = 0$

$-\sum_{i: y_i = h(x_i)} D(i)\, e^{-\alpha} + \sum_{i: y_i \neq h(x_i)} D(i)\, e^{\alpha} = 0$

$-e^{-\alpha} (1 - \epsilon) + e^{\alpha} \epsilon = 0$

$\alpha = \frac{1}{2} \log \frac{1 - \epsilon}{\epsilon}$

The minimiser of the upper bound is

$\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$
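A quick numerical sanity check of this derivation (illustrative, not on the slide): for a fixed weighted error epsilon, the closed-form alpha should minimise Z(alpha) = (1 - epsilon) e^{-alpha} + epsilon e^{alpha}, and a simple grid search agrees:

```python
import math

eps = 0.3                                        # illustrative weighted error
Z = lambda a: (1 - eps) * math.exp(-a) + eps * math.exp(a)

alpha_star = 0.5 * math.log((1 - eps) / eps)     # closed-form minimiser
alpha_grid = min((a / 1000 for a in range(1, 3000)), key=Z)

print(alpha_star, Z(alpha_star))                 # ~0.4236, Z = 2*sqrt(0.21) ~ 0.9165
print(alpha_grid, Z(alpha_grid))                 # grid minimiser agrees to ~1e-3
```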
Choosing $h_t$

Weak classifier examples:
  Decision tree (or stump), Perceptron ($\mathcal{H}$ infinite)
  Selecting the best one from a given finite set $\mathcal{H}$

Justification of the weighted error minimisation:
Having $\alpha_t = \frac{1}{2} \log \frac{1 - \epsilon_t}{\epsilon_t}$,

$Z_t = \sum_{i=1}^{m} D_t(i)\, e^{-\alpha_t y_i h_t(x_i)} = \sum_{i: y_i = h_t(x_i)} D_t(i)\, e^{-\alpha_t} + \sum_{i: y_i \neq h_t(x_i)} D_t(i)\, e^{\alpha_t} = (1 - \epsilon_t) e^{-\alpha_t} + \epsilon_t e^{\alpha_t} = 2 \sqrt{\epsilon_t (1 - \epsilon_t)}$

$Z_t$ is minimised by selecting $h_t$ with minimal weighted error $\epsilon_t$
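As a concrete example of "selecting the best one from a given finite set", here is a sketch of a decision-stump weak learner (the name best_stump and its interface are illustrative) that minimises the weighted error over thresholds on each feature; each selected stump with epsilon_t < 1/2 multiplies the bound prod_t Z_t by the factor 2*sqrt(epsilon_t(1 - epsilon_t)) < 1.

```python
def best_stump(X, y, D):
    """Pick the stump h(x) = s * (+1 if x[feat] > thresh else -1), s in {+1, -1},
    minimising the weighted error sum_i D(i) * [y_i != h(x_i)].

    X -- list of feature vectors, y -- labels in {-1, +1}, D -- weights summing to 1.
    Returns (stump_callable, weighted_error)."""
    m, n_features = len(X), len(X[0])
    best_h, best_eps = None, float("inf")
    for feat in range(n_features):
        for thresh in sorted({x[feat] for x in X}):
            for sign in (+1, -1):
                # default arguments freeze the loop variables inside the lambda
                h = lambda x, f=feat, t=thresh, s=sign: s * (1 if x[f] > t else -1)
                eps = sum(D[i] for i in range(m) if h(X[i]) != y[i])
                if eps < best_eps:
                    best_h, best_eps = h, eps
    return best_h, best_eps
```

Such a routine could replace the fixed candidate pool in the earlier training sketch.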
Generalisation (Schapire & Singer 1999)

Maximising margins in AdaBoost:

$P_{(x,y) \sim S}\left[y f(x) \leq \theta\right] \leq 2^T \prod_{t=1}^{T} \sqrt{\epsilon_t^{1-\theta} (1 - \epsilon_t)^{1+\theta}}$

where $f(x) = \dfrac{\boldsymbol{\alpha} \cdot \mathbf{h}(x)}{\|\boldsymbol{\alpha}\|_1}$.

Choosing $h_t(x)$ with minimal $\epsilon_t$ in each step minimises this upper bound on the margin distribution
The margin in SVMs uses the $L_2$ norm instead: $(\boldsymbol{\alpha} \cdot \mathbf{h}(x)) / \|\boldsymbol{\alpha}\|_2$

Upper bounds based on the margin:
With probability $1 - \delta$ over the random choice of the training set $S$,

$P_{(x,y) \sim D}\left[y f(x) \leq 0\right] \leq P_{(x,y) \sim S}\left[y f(x) \leq \theta\right] + O\!\left( \frac{1}{\sqrt{m}} \left( \frac{d \log^2(m/d)}{\theta^2} + \log(1/\delta) \right)^{1/2} \right)$

where $D$ is a distribution over $\mathcal{X} \times \{+1, -1\}$, and $d$ is the pseudodimension of $\mathcal{H}$.

Problem: the upper bound is very loose. In practice AdaBoost works much better.
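To relate the bound to data, the empirical margin distribution P_S[y f(x) <= theta] with the L1-normalised score can be computed directly from a trained ensemble; a small sketch with an illustrative interface:

```python
def margin_distribution(X, y, weak_classifiers, alphas, theta):
    """Estimate P_S[y * f(x) <= theta] with the L1-normalised score
    f(x) = sum_t alpha_t * h_t(x) / sum_t alpha_t."""
    norm = sum(alphas)
    def margin(x, label):
        return label * sum(a * h(x) for a, h in zip(alphas, weak_classifiers)) / norm
    return sum(1 for x, label in zip(X, y) if margin(x, label) <= theta) / len(X)
```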
Convergence (Friedman et al. 1998)

Proposition 1: The discrete AdaBoost algorithm minimises $J(f(x)) = E(e^{-y f(x)})$ by adaptive Newton updates.

Lemma: $J(f(x))$ is minimised at

$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x) = \frac{1}{2} \log \frac{P(y = 1 \mid x)}{P(y = -1 \mid x)}$

Hence

$P(y = 1 \mid x) = \frac{e^{f(x)}}{e^{-f(x)} + e^{f(x)}}$ and $P(y = -1 \mid x) = \frac{e^{-f(x)}}{e^{-f(x)} + e^{f(x)}}$

Additive logistic regression model:

$\sum_{t=1}^{T} a_t(x) = \log \frac{P(y = 1 \mid x)}{P(y = -1 \mid x)}$

Proposition 2: By minimising $J(f(x))$ the discrete AdaBoost fits (up to a factor of 2) an additive logistic regression model.
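Under this interpretation the boosting score can be mapped to a posterior estimate through a logistic link; a minimal sketch, assuming f_value is the sum sum_t alpha_t h_t(x) (so that f is half the log-odds, as in the lemma above):

```python
import math

def posterior_positive(f_value):
    """P(y = +1 | x) = e^f / (e^f + e^{-f}) = 1 / (1 + e^{-2f})."""
    return 1.0 / (1.0 + math.exp(-2.0 * f_value))
```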
The Algorithm Recapitulation

Given: $(x_1, y_1), \ldots, (x_m, y_m)$; $x_i \in \mathcal{X}$, $y_i \in \{-1, +1\}$

Initialise weights $D_1(i) = 1/m$

For $t = 1, \ldots, T$:
  Find $h_t = \arg\min_{h_j \in \mathcal{H}} \epsilon_j = \sum_{i=1}^{m} D_t(i)\, [\![\, y_i \neq h_j(x_i) \,]\!]$
  If $\epsilon_t \geq 1/2$ then stop
  Set $\alpha_t = \frac{1}{2} \log\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$
  Update $D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t}$

Output the final classifier:

$H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

[Figure: training error vs. boosting step, shown after rounds t = 1, 2, ..., 7 and t = 40, as in the earlier algorithm slide.]
AdaBoost Variants

Freund & Schapire 1995:
  Discrete ($h : \mathcal{X} \to \{0, 1\}$)
  Multiclass AdaBoost.M1 ($h : \mathcal{X} \to \{0, 1, \ldots, k\}$)
  Multiclass AdaBoost.M2 ($h : \mathcal{X} \to [0, 1]^k$)
  Real-valued AdaBoost.R ($Y = [0, 1]$, $h : \mathcal{X} \to [0, 1]$)

Schapire & Singer 1999:
  Confidence-rated prediction ($h : \mathcal{X} \to \mathbb{R}$, two-class)
  Multilabel AdaBoost.MR, AdaBoost.MH (different formulation of the minimised loss)

Oza 2001:
  Online AdaBoost

Many other modifications since then: cascaded AB, WaldBoost, probabilistic boosting tree, ...
Online AdaBoost

Offline
Given:
  Set of labeled training samples $X = \{(x_1, y_1), \ldots, (x_m, y_m)\}$, $y = \pm 1$
  Weight distribution over $X$: $D_0 = 1/m$
For $t = 1, \ldots, T$:
  Train a weak classifier using the samples and the weight distribution: $h_t(x) = \mathcal{L}(X, D_{t-1})$
  Calculate the error $\epsilon_t$
  Calculate the coefficient $\alpha_t$ from $\epsilon_t$
  Update the weight distribution $D_t$
Output:
  $F(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$

Online
Given:
  One labeled training sample $(x, y)$, $y = \pm 1$
  The strong classifier to update
  Initial importance $\lambda = 1$
For $t = 1, \ldots, T$:
  Update the weak classifier using the sample and the importance: $h_t(x) = \mathcal{L}(h_t, (x, y), \lambda)$
  Update the error estimate $\epsilon_t$
  Update the weight $\alpha_t$ based on $\epsilon_t$
  Update the importance weight $\lambda$
Output:
  $F(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
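A rough sketch of the online pass over a single incoming sample, following the online column above. The importance update shown (scale lambda up when the t-th weak classifier errs, down when it is correct) is the rule used in Oza's online boosting; update_weak stands for whatever online weak-learner update L is available, and both the names and the error bookkeeping are illustrative assumptions, not taken from the slide:

```python
import math

def online_boost_update(sample, label, weak_classifiers, correct_mass, wrong_mass, update_weak):
    """Process one labeled sample (x, y), y in {-1, +1}, updating every weak
    classifier in turn. correct_mass / wrong_mass accumulate the importance seen
    by each stage and give a running error estimate eps_t; returns the alpha_t."""
    lam = 1.0                                           # initial importance of the sample
    alphas = []
    for t, h in enumerate(weak_classifiers):
        update_weak(h, sample, label, lam)              # h_t = L(h_t, (x, y), lambda)
        if h(sample) == label:
            correct_mass[t] += lam
        else:
            wrong_mass[t] += lam
        eps = wrong_mass[t] / (correct_mass[t] + wrong_mass[t])
        eps = min(max(eps, 1e-10), 0.5 - 1e-10)         # clamped for stability in this sketch
        alphas.append(0.5 * math.log((1 - eps) / eps))  # alpha_t from the running eps_t
        # Oza-style importance update: misclassified samples gain importance
        lam *= 1.0 / (2 * eps) if h(sample) != label else 1.0 / (2 * (1 - eps))
    return alphas
```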
Online AdaBoost

Converges to the offline results given the same training set and number of iterations N.

N. Oza and S. Russell. Online Bagging and Boosting. Artificial Intelligence and Statistics, 2001.
Pros and Cons of AdaBoost

Advantages
  Very simple to implement
  General learning scheme: can be used for various learning tasks
  Feature selection on very large sets of features
  Good generalisation
  Seems not to overfit in practice (probably due to margin maximisation)

Disadvantages
  Suboptimal solution (greedy learning)
Selected references

Y. Freund, R.E. Schapire. A Decision-theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences, 1997.
R.E. Schapire, Y. Freund, P. Bartlett, W.S. Lee. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. The Annals of Statistics, 1998.
R.E. Schapire, Y. Singer. Improved Boosting Algorithms Using Confidence-rated Predictions. Machine Learning, 1999.
J. Friedman, T. Hastie, R. Tibshirani. Additive Logistic Regression: a Statistical View of Boosting. Technical report, 1998.
N.C. Oza. Online Ensemble Learning. PhD thesis, 2001.
http://www.boosting.org
Thank you for your attention