DD3364
Introduction
In this lecture we consider linear methods for classification, that is, methods whose decision boundaries are linear.
The decision boundaries are linear whenever some monotone transformation $g$ of the discriminant function is linear:
$$g(\delta_k(x)) = \beta_{k0} + \beta_k^t x$$
For two classes, logistic regression models the posteriors as
$$P(G = 1 \mid X = x) = \frac{\exp(\beta_0 + \beta^t x)}{1 + \exp(\beta_0 + \beta^t x)}, \qquad P(G = 2 \mid X = x) = \frac{1}{1 + \exp(\beta_0 + \beta^t x)}$$
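As a quick numerical check, a minimal Python sketch of these two posteriors; the values of $\beta_0$ and $\beta$ below are illustrative, not fitted:

```python
import numpy as np

# Two-class logistic posteriors; beta0 and beta are made-up illustrative values.
beta0, beta = -0.5, np.array([1.0, 2.0])
x = np.array([0.3, -0.1])
eta = beta0 + beta @ x
p1 = np.exp(eta) / (1 + np.exp(eta))  # P(G = 1 | X = x)
p2 = 1 / (1 + np.exp(eta))            # P(G = 2 | X = x)
assert np.isclose(p1 + p2, 1.0)       # the two posteriors sum to one
```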
These methods find separating hyperplanes if they exist, and fail if the points are not linearly separable. There are fixes for the non-separable case, but we will not pursue them here.
[Figure 4.1: scatter plot of training data from three classes, labeled 1, 2 and 3.]
FIGURE 4.1. The left plot shows some data from three classes, with linear decision boundaries found by linear discriminant analysis.
Each training point $x_i$ has a class label $g_i \in \{1, \ldots, K\}$.
Define the linear discriminant functions
$$\delta_k(x) = \beta_{0k} + \beta_k^t x$$
Classify a new point $x$ with $G(x) = \arg\max_k \delta_k(x)$.
3 class example
[Figure: plots of the three linear discriminant functions $\delta_1(x)$, $\delta_2(x)$, $\delta_3(x)$ and the resulting decision regions.]
Let $f_k(x)$ be the class-conditional density of $X$ in class $k$, and let $\pi_k$ be the prior probability of class $k$, with $\sum_{k=1}^K \pi_k = 1$. Bayes' theorem gives
$$P(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^K f_l(x)\,\pi_l}$$
so having the $f_k(x)$ is almost equivalent to having $P(G = k \mid X = x)$.
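The formula translates directly into code. A minimal sketch, assuming the caller supplies the class densities f_k and priors pi_k (both hypothetical placeholders here):

```python
import numpy as np

def posterior(x, densities, priors):
    """Bayes' theorem: P(G = k | X = x) = f_k(x) pi_k / sum_l f_l(x) pi_l."""
    joint = np.array([f(x) * pi for f, pi in zip(densities, priors)])
    return joint / joint.sum()
```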
Many techniques are based on models for the class densities: Gaussian class distributions, which lead to linear or quadratic decision boundaries; nonparametric density estimates, which allow the most flexibility; and naive Bayes models, which assume the inputs are conditionally independent so that $f_k(X) = \prod_{j=1}^p f_{kj}(X_j)$.
Model each class density as a multivariate Gaussian:
$$f_k(x) = \frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} \exp\left\{-\tfrac{1}{2}(x - \mu_k)^t \Sigma_k^{-1} (x - \mu_k)\right\}$$
Linear Discriminant Analysis (LDA) arises in the special case when the classes are assumed Normally distributed with equal covariance matrices, $\Sigma_k = \Sigma$ for all $k$. One then gets linear decision boundaries.
[Figure: class distributions, decision boundary and the resulting partition of input space.]
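For concreteness, the Gaussian class density can be evaluated with scipy; the mean and covariance below are illustrative placeholders, not the lecture's values:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu_k = np.array([0.0, 0.0])                   # illustrative class mean
Sigma_k = np.array([[1.0, 0.3], [0.3, 1.0]])  # illustrative covariance
f_k = multivariate_normal(mean=mu_k, cov=Sigma_k).pdf
print(f_k(np.array([0.5, -0.2])))             # f_k evaluated at one point
```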
LDA
Can see this as
$$\log \frac{P(G = k \mid X = x)}{P(G = l \mid X = x)} = \log \frac{f_k(x)}{f_l(x)} + \log \frac{\pi_k}{\pi_l}$$
$$= \log \frac{\pi_k}{\pi_l} - \tfrac{1}{2}\mu_k^t \Sigma^{-1} \mu_k + \tfrac{1}{2}\mu_l^t \Sigma^{-1} \mu_l + x^t \Sigma^{-1}(\mu_k - \mu_l)$$
$$= x^t a + b, \quad \text{a linear function of } x$$
The equal covariance matrices make the quadratic terms $x^t \Sigma_k^{-1} x$ and $x^t \Sigma_l^{-1} x$ cancel. The linear discriminant functions
$$\delta_k(x) = x^t \Sigma^{-1} \mu_k - \tfrac{1}{2}\mu_k^t \Sigma^{-1} \mu_k + \log \pi_k$$
are an equivalent description of the decision rule, with
$$G(x) = \arg\max_k \delta_k(x)$$
In practice the parameters are estimated from the training data; the pooled covariance estimate is
$$\hat\Sigma = \sum_{k=1}^K \sum_{g_i = k} (x_i - \hat\mu_k)(x_i - \hat\mu_k)^t / (n - K)$$
[Figure: three-class training data illustrating the estimated LDA boundaries.]
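Putting the pieces together, a minimal LDA sketch (not the lecture's code) that estimates the class means, priors and pooled covariance, and classifies with the linear discriminants derived above:

```python
import numpy as np

def lda_fit(X, g):
    """X is (n, p); g holds the class label of each row."""
    classes = np.unique(g)
    n, K = len(g), len(classes)
    pis = np.array([np.mean(g == k) for k in classes])         # priors pi_k
    mus = np.array([X[g == k].mean(axis=0) for k in classes])  # means mu_k
    # Pooled covariance: sum_k sum_{g_i=k} (x_i - mu_k)(x_i - mu_k)^t / (n - K)
    Sigma = sum((X[g == k] - mu).T @ (X[g == k] - mu)
                for k, mu in zip(classes, mus)) / (n - K)
    return classes, pis, mus, np.linalg.inv(Sigma)

def lda_predict(x, classes, pis, mus, Sigma_inv):
    # delta_k(x) = x^t Sigma^{-1} mu_k - 0.5 mu_k^t Sigma^{-1} mu_k + log pi_k
    deltas = [x @ Sigma_inv @ mu - 0.5 * mu @ Sigma_inv @ mu + np.log(pi)
              for mu, pi in zip(mus, pis)]
    return classes[int(np.argmax(deltas))]
```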
Bivariate example
When the covariance matrices are not assumed equal, the Gaussian discriminant functions keep their quadratic terms:
$$\delta_k(x) = -\tfrac{1}{2}\log|\Sigma_k| - \tfrac{1}{2}(x - \mu_k)^t \Sigma_k^{-1} (x - \mu_k) + \log \pi_k$$
[Figure: a two-class bivariate example with equal priors $P(k) = .5$, showing the class distributions, decision boundaries and the resulting partition; the numeric values of $\mu_k$ and $\Sigma_k$ were given in the original figure.]
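A one-function sketch of the quadratic discriminant above, with per-class parameters supplied by the caller (illustrative interface, not the lecture's):

```python
import numpy as np

def qda_delta(x, mu_k, Sigma_k, pi_k):
    # delta_k(x) = -0.5 log|Sigma_k| - 0.5 (x-mu_k)^t Sigma_k^{-1} (x-mu_k) + log pi_k
    d = x - mu_k
    return (-0.5 * np.linalg.slogdet(Sigma_k)[1]   # log-determinant term
            - 0.5 * d @ np.linalg.solve(Sigma_k, d)
            + np.log(pi_k))
```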
FIGURE 4.6. Two methods for fitting quadratic boundaries. The left plot shows
the quadratic decision boundaries for the data in Figure 4.1 (obtained using LDA
in the five-dimensional space $X_1, X_2, X_1 X_2, X_1^2, X_2^2$). The right plot shows the
quadratic decision boundaries found by QDA. The differences are small, as is
usually the case.
$$u = \mu_1 + \alpha_1(\mu_2 - \mu_1) + \alpha_2(\mu_3 - \mu_1) + \cdots + \alpha_{K-1}(\mu_K - \mu_1)$$
$$= \mu_1 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{K-1} d_{K-1}$$
Any point $x$ can be decomposed as
$$x = \mu_1 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{K-1} d_{K-1} + x_\perp,$$
where $x_\perp \perp H_{K-1}$. Then
$$\|x - \mu_j\| = \|\mu_1 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{K-1} d_{K-1} + x_\perp - \mu_j\|$$
To summarize: the $K$ centroids in $p$-dimensional input space lie in an affine subspace of dimension at most $K - 1$.
[Figure: scatter plots ('o') of the training data projected onto pairs of canonical variates; panels involve Coordinates 1, 2, 3, 7, 9 and 10.]
FIGURE 4.8. Four projections onto pairs of canonical variates. Notice that as the rank of the canonical variates increases, the centroids become less spread out.
Fisher's problem:
$$\max_a \frac{a^t B a}{a^t W a} \quad \text{or equivalently} \quad \max_a \; a^t B a \; \text{ subject to } a^t W a = 1$$
The optimal $a_1$ is given by the eigenvector corresponding to the largest eigenvalue of $W^{-1} B$. The next direction solves
$$a_2 = \arg\max_a \frac{a^t B a}{a^t W a} \quad \text{subject to } a^t W a_1 = 0$$
Once again $a_2 = W^{-\frac{1}{2}} v_2$. In a similar fashion one can find $a_3, a_4, \ldots$
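In code this is a generalized symmetric eigenproblem, which scipy solves directly. A sketch, assuming B and W are the between- and within-class scatter matrices computed beforehand:

```python
import numpy as np
from scipy.linalg import eigh

def canonical_directions(B, W, num=2):
    # eigh(B, W) solves B v = lambda W v; the eigenvectors come back
    # W-orthogonal, so successive directions satisfy a^t W a_1 = 0.
    evals, evecs = eigh(B, W)
    order = np.argsort(evals)[::-1]   # largest eigenvalues first
    return evecs[:, order[:num]]      # columns a_1, a_2, ...
```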
Classification in Reduced Subspace
[Figure: vowel training data plotted against Canonical Coordinate 1 and Canonical Coordinate 2, with the classification regions in this reduced subspace.]
FIGURE 4.11. Decision boundaries for the vowel training data, in the two-dimensional subspace spanned by the first two canonical variates. Note that in
any higher-dimensional subspace, the decision boundaries are higher-dimensional
affine planes, and could not be represented as lines.
Logistic Regression
Logistic regression arises from trying to model the posterior probabilities of the $K$ classes via linear functions in $x$. The model: for $k = 1, \ldots, K-1$
$$P(G = k \mid X = x) = \frac{\exp(\beta_{k0} + \beta_k^t x)}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^t x)}$$
and for $k = K$
$$P(G = K \mid X = x) = \frac{1}{1 + \sum_{l=1}^{K-1} \exp(\beta_{l0} + \beta_l^t x)}$$
Write the parameters as $\theta = \{\beta_{10}, \beta_1^t, \beta_{20}, \beta_2^t, \ldots\}$ and $P(G = k \mid X = x) = p_k(x; \theta)$. The log-likelihood is
$$\ell(\theta) = \log \prod_{i=1}^n p_{g_i}(x_i; \theta) = \sum_{i=1}^n \log p_{g_i}(x_i; \theta)$$
In the two-class case, code the classes with $y_i \in \{0, 1\}$ and write
$$p_1(x; \beta) = \frac{\exp(\beta^t x)}{1 + \exp(\beta^t x)} \quad \text{and} \quad p_2(x; \beta) = 1 - p_1(x; \beta)$$
Then
$$\ell(\beta) = \sum_{i=1}^n \left[ y_i \beta^t x_i - y_i \log(1 + e^{\beta^t x_i}) - (1 - y_i)\log(1 + e^{\beta^t x_i}) \right] = \sum_{i=1}^n \left[ y_i \beta^t x_i - \log(1 + e^{\beta^t x_i}) \right]$$
1 + exp( t xi )
i=1
n
X
exp( t xi )
=
xi yi
1 + exp( t xi )
i=1
=
n
X
i=1
xi (yi p1 (xi ; )) = 0
These equations are solved with the Newton-Raphson algorithm, which needs the gradient and Hessian:
$$\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^n x_i (y_i - p_1(x_i; \beta)), \qquad \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^t} = -\sum_{i=1}^n x_i x_i^t \, p_1(x_i; \beta)(1 - p_1(x_i; \beta))$$
A Newton step is
$$\beta^{\text{new}} = \beta^{\text{old}} - \left( \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^t} \right)^{-1} \frac{\partial \ell(\beta)}{\partial \beta}$$
In matrix notation, with $\mathbf{X}$ the matrix of inputs, $\mathbf{y}$ the vector of $y_i$, $\mathbf{p}$ the vector of fitted probabilities and $\mathbf{W}$ the diagonal matrix with entries $p_1(x_i; \beta)(1 - p_1(x_i; \beta))$:
$$\frac{\partial \ell(\beta)}{\partial \beta} = \mathbf{X}^t(\mathbf{y} - \mathbf{p}) \quad \text{and} \quad \frac{\partial^2 \ell(\beta)}{\partial \beta \, \partial \beta^t} = -\mathbf{X}^t \mathbf{W} \mathbf{X}$$
so the Newton step becomes
$$\beta^{\text{new}} = (\mathbf{X}^t \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^t \mathbf{W} \left( \mathbf{X}\beta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p}) \right) = (\mathbf{X}^t \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^t \mathbf{W} \mathbf{z}$$
with response $\mathbf{z} = \mathbf{X}\beta^{\text{old}} + \mathbf{W}^{-1}(\mathbf{y} - \mathbf{p})$, known as the adjusted response. Note that $\mathbf{W}$, $\mathbf{p}$ and $\mathbf{z}$ change at each iteration; this scheme is iteratively reweighted least squares.
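A minimal IRLS sketch of the update above, assuming X already includes a column of ones for the intercept and y is a 0/1 vector (no safeguard against W_ii hitting zero):

```python
import numpy as np

def irls(X, y, n_iter=20):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # p_1(x_i; beta)
        W = p * (1.0 - p)                    # diagonal of W
        z = X @ beta + (y - p) / W           # adjusted response z
        # beta_new = (X^t W X)^{-1} X^t W z, solved as a linear system
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta
```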
A toy example
[Figure: iterations of the algorithm on a toy data set; point size is proportional to $1/W_{ii}$.]
L1-regularized logistic regression maximizes a penalized log-likelihood:
$$\max_{\beta_0, \beta} \sum_{i=1}^n \left[ y_i(\beta_0 + \beta^t x_i) - \log(1 + e^{\beta_0 + \beta^t x_i}) \right] - \lambda \sum_{j=1}^p |\beta_j|$$
Note: the intercept $\beta_0$ is not penalized, and the predictors should be standardized for the penalty to be meaningful.
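In practice one rarely codes this from scratch; for example, scikit-learn exposes the same criterion (a sketch, where C plays the role of $1/\lambda$ and the intercept is left unpenalized by default):

```python
from sklearn.linear_model import LogisticRegression

# The liblinear solver supports the L1 penalty; standardize X beforehand
# so the penalty treats the predictors comparably.
clf = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
# clf.fit(X_standardized, y)
```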
Separating Hyperplanes
A hyperplane is defined as
$$L = \{x : \beta_0 + \beta^t x = 0\}$$
[Figure: a point $x$, a point $x_0$ on $L$, and the hyperplane $\beta_0 + \beta^T x = 0$.]
Properties of $L = \{x : f(x) = \beta_0 + \beta^t x = 0\}$:
1. For any two points $x_1$ and $x_2$ lying in $L$, $\beta^t(x_1 - x_2) = 0$, and hence $\beta^* = \beta / \|\beta\|$ is the vector normal to $L$.
2. For any point $x_0$ in $L$, $\beta^t x_0 = -\beta_0$.
3. The signed distance of any point $x$ to $L$ is given by
$$\beta^{*t}(x - x_0) = \frac{1}{\|\beta\|}\left(\beta^t x + \beta_0\right) = \frac{1}{\|f'(x)\|} f(x)$$
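Property 3 is one line of code. A sketch:

```python
import numpy as np

def signed_distance(x, beta, beta0):
    # (beta^t x + beta0) / ||beta||, positive on the side beta points to
    return (beta @ x + beta0) / np.linalg.norm(beta)
```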
Perceptron Learning
The perceptron learning algorithm tries to minimize
$$D(\beta, \beta_0) = -\sum_{i \in M} y_i (x_i^t \beta + \beta_0) = -\sum_{i \in M} y_i f_{\beta, \beta_0}(x_i)$$
where $M$ is the set of misclassified points.
$D(\beta, \beta_0)$ is non-negative.
$D(\beta, \beta_0)$ is proportional to the distance of the misclassified points to the decision boundary.
Questions:
Is there a unique $\beta, \beta_0$ which minimizes $D(\beta, \beta_0)$ (disregarding re-scaling of $\beta$ and $\beta_0$)?
The gradients are
$$\frac{\partial D(\beta, \beta_0)}{\partial \beta} = -\sum_{i \in M} y_i x_i \quad \text{and} \quad \frac{\partial D(\beta, \beta_0)}{\partial \beta_0} = -\sum_{i \in M} y_i$$
Stochastic gradient descent visits the misclassified points one at a time and updates
$$\beta \leftarrow \beta + \rho\, y_i x_i \quad \text{and} \quad \beta_0 \leftarrow \beta_0 + \rho\, y_i$$
where $\rho$ is the learning rate.
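A minimal perceptron sketch of this update loop, assuming labels $y_i \in \{-1, +1\}$ and learning rate rho (the convergence break applies only to the separable case):

```python
import numpy as np

def perceptron(X, y, rho=1.0, n_epochs=1000):
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(n_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ beta + beta0) <= 0:  # i in M: misclassified
                beta = beta + rho * yi * xi    # beta   <- beta   + rho y_i x_i
                beta0 = beta0 + rho * yi       # beta_0 <- beta_0 + rho y_i
                mistakes += 1
        if mistakes == 0:                      # separating hyperplane found
            break
    return beta, beta0
```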
[Figure: the current estimate $\beta^{(0)}$, a point misclassified by $\beta^{(0)}$, and the successive estimates $\beta^{(2)}, \beta^{(3)}, \ldots, \beta^{(17)}$ produced by the perceptron updates.]
Is this the best separating hyperplane we could have found?
Cons
All separating hyperplanes are considered equally valid.
The one found depends on the initial guess for $\beta$ and $\beta_0$.
The finite number of steps can be very large.
If the data is non-separable, the algorithm will not converge.
Optimal Separating Hyperplane
Consider the problem of finding a separating hyperplane for a linearly separable dataset. Which of the infinite hyperplanes should be chosen? Intuitively, a hyperplane that passes too close to the training examples will be sensitive to noise and, therefore, less likely to generalize well for data outside the training set. Instead, it seems reasonable to expect that a hyperplane that is farthest from all the training points will generalize better.
Bad: a hyperplane passing too close to the points is likely to generalize poorly. The optimal separating hyperplane separates the two classes and maximizes the distance to the closest point from either class [Vapnik 1996].
Which separating hyperplane? One which maximizes the margin (the maximum-margin hyperplane).
[Slide credit: Ricardo Gutierrez-Osuna, Introduction to Pattern Analysis, Texas A&M University.]
The maximum-margin problem is
$$\max_{\beta, \beta_0, \|\beta\| = 1} M \quad \text{subject to } y_i(\beta^t x_i + \beta_0) \geq M, \; i = 1, \ldots, n$$
Dropping the constraint $\|\beta\| = 1$ (the conditions become $y_i(\beta^t x_i + \beta_0) \geq M\|\beta\|$) and setting $\|\beta\| = 1/M$, this is equivalent to
$$\min_{\beta, \beta_0} \tfrac{1}{2}\|\beta\|^2 \quad \text{subject to } y_i(\beta^t x_i + \beta_0) \geq 1, \; i = 1, \ldots, n$$
With this formulation of the problem, the margin has thickness $1/\|\beta\|$, as shown in the figure (the notation there is slightly different: the distance between a point $x$ and a plane $(w, b)$ is $|w^T x + b| / \|w\|$).
[Figure: the separating hyperplane and its margin.]
The Lagrangian (primal) is
$$L_p(\beta, \beta_0, \alpha) = \tfrac{1}{2}\|\beta\|^2 + \sum_{i=1}^n \alpha_i \left( 1 - y_i(\beta^t x_i + \beta_0) \right)$$
The KKT conditions at the optimum are
$$\nabla_{\beta, \beta_0} L_p(\beta^*, \beta_0^*, \alpha^*) = 0$$
$$\alpha_j \geq 0 \quad \text{for } j = 1, \ldots, n$$
$$\alpha_j \left(1 - y_j(\beta_0 + x_j^t \beta)\right) = 0 \quad \text{for } j = 1, \ldots, n$$
$$\left(1 - y_j(\beta_0 + x_j^t \beta)\right) \leq 0 \quad \text{for } j = 1, \ldots, n$$
The stationarity conditions give
$$\beta^* = \sum_{j \in A} \alpha_j y_j x_j \quad \text{and} \quad 0 = \sum_{j \in A} \alpha_j y_j$$
where $A = \{j : \alpha_j > 0\}$ is the set of support vectors, and at the optimum $L_p = \tfrac{1}{2}\|\beta^*\|^2$.
To summarize
As we have a convex optimization problem it has one local minimum.
If $i \notin A$ then $y_i(\beta_0 + x_i^t \beta) > 1$ and $x_i$ lies outside of the margin.
$\beta^*$ is a linear combination of the support vectors:
$$\beta^* = \sum_{j \in A} \alpha_j y_j x_j$$
The support vectors lie on one of the two hyperplanes bounding the margin, where $y_i(\beta_0 + x_i^t \beta) = 1$; only the support vectors contribute to
$$\beta^* = \sum_{j \in A} \alpha_j y_j x_j$$
Thus the SVM in fact depends only on a small subset of the training set, the support vectors.
How do I calculate $\alpha$?
You have seen that the optimal solution is a weighted sum of the training points. The weights $\alpha$ are found by solving the dual problem
$$\max_\alpha \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{k=1}^n \alpha_i \alpha_k y_i y_k x_i^t x_k \quad \text{subject to } \alpha_i \geq 0 \;\; \forall i$$
(Solutions to these problems are the same because of the original quadratic cost function and linear inequality constraints.)
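A toy sketch of one way to attack this dual numerically: projected gradient ascent on $\alpha$, keeping $\alpha_i \geq 0$. It ignores the equality constraint $\sum_i \alpha_i y_i = 0$ that comes from the intercept, so it illustrates the objective rather than being a production solver (a real implementation would use a QP package):

```python
import numpy as np

def svm_dual(X, y, lr=1e-3, n_iter=5000):
    n = len(y)
    Q = (y[:, None] * X) @ (y[:, None] * X).T    # Q_ik = y_i y_k x_i^t x_k
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = np.ones(n) - Q @ alpha               # gradient of the dual objective
        alpha = np.maximum(alpha + lr * grad, 0.0)  # ascent step, then project
    beta = (alpha * y) @ X                          # beta = sum_j alpha_j y_j x_j
    return alpha, beta
```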