CHAPTER 2
SUPERVISED LEARNING
Contents
2.1 Concepts
2.2 Regression
2.2.1 Gradient descent
2.2.2 The derivative method
2.2.3 The probabilistic method
2.2.4 Locally weighted regression
2.2.5 Generalized linear models
2.2.6 Multiclass classification
CONCEPTS
Supervised learning:
  The input is a dataset together with the corresponding correct output values.
  Learn the relationship between the inputs and the output values.
  Then produce the correct output values for new input data.
8/25/2010
CONCEPTS
Some notation
  m : the number of training examples
  x : the input variables / the feature values of one example
  y : the corresponding output/target value
  The pair (x, y) describes one training example
  In the training set, (x^{(i)}, y^{(i)}) is the i-th training example
Regression
Before learning we must choose how to represent the hypothesis.
E.g., in the room-price prediction problem we choose
  h_\theta(x) = \theta_0 + \theta_1 x
  x : the area of the room
More generally,
  h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x  (with x_0 = 1)
where \theta = [\theta_0, \theta_1, \ldots, \theta_n]^T and x = [1, x_1, \ldots, x_n]^T.
We must choose the parameters \theta so that h_\theta(x) is as close as possible to y.
Regression
Cost function
  J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2
It measures how close h_\theta(x) is to y over the whole training set.
We choose the parameters \theta that minimize J(\theta).
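As a concrete sketch of the cost function above (the slides contain no code; NumPy and the tiny dataset here are illustrative), with a design matrix X whose first column is all ones:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
    residuals = X @ theta - y
    return 0.5 * np.dot(residuals, residuals)

# Tiny example: y = 2x fits exactly, so the cost at theta = [0, 2] is zero.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))  # 0.0
```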
The Gradient Descent Algorithm
The gradient descent algorithm
  Start with a randomly initialized value of \theta.
  Repeat the update step
    \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)   with j = 0, \ldots, n
  until convergence.
  \alpha is a positive constant called the learning rate, usually chosen to be a small value.
The Gradient Descent Algorithm
With a single training example:
  \frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2} (h_\theta(x) - y)^2
  = (h_\theta(x) - y) \frac{\partial}{\partial \theta_j} (h_\theta(x) - y)
  = (h_\theta(x) - y)\, x_j
Update rule (with one example):
  \theta_j := \theta_j - \alpha (h_\theta(x) - y)\, x_j
The Gradient Descent Algorithm
Generalized to m training examples:
  \theta_j := \theta_j - \alpha \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}
The update rule above is repeated until convergence.
This algorithm is called batch gradient descent.
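A minimal sketch of batch gradient descent for linear regression, following the update rule above. The step size alpha, the iteration count, and the example data are illustrative choices, not from the slides:

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, n_iters=5000):
    theta = np.zeros(X.shape[1])          # initial value of theta
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y)  # sum_i (h(x^(i)) - y^(i)) x^(i)
        theta -= alpha * gradient
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])             # generated by y = 1 + 2x
theta = batch_gradient_descent(X, y)
print(theta)                              # approaches [1.0, 2.0]
```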
The Gradient Descent Algorithm
The stochastic gradient descent algorithm
(also called incremental gradient descent):
  Repeat {
    for i = 1 to m do
      \theta_j := \theta_j - \alpha (h_\theta(x^{(i)}) - y^{(i)})\, x_j^{(i)}
  }
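The loop above can be sketched as follows; unlike the batch version, the parameters are updated after every single example. Again the data and step size are illustrative:

```python
import numpy as np

def sgd(X, y, alpha=0.05, n_epochs=200):
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in range(len(y)):            # "for i = 1 to m"
            error = X[i] @ theta - y[i]    # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])              # exact fit exists: theta = [1, 2]
print(sgd(X, y))                           # close to [1.0, 2.0]
```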
Thut ton Gradient Descent
So snh batch gradient descent v stochastic gradient
descent:
C hai u l thut ton tm kim cc b
batch gradient descent phi duyt ton b cc mu
hun luyn trong mi bc lp (chi ph cao nu kch
thc tp hun luyn ln).
stochastic gradient descent duyt ln lt tng mu,
nn tm c ti im cc tr nhanh hn. Tuy nhin c
th n khng th hi t c n im cc tr m ch
dao ng xung quanh.
The Gradient Descent Algorithm
Matrix notation
  \nabla_\theta J = \begin{bmatrix} \partial J / \partial \theta_0 \\ \vdots \\ \partial J / \partial \theta_n \end{bmatrix} \in \mathbb{R}^{n+1}
The update rule of the gradient descent algorithm:
  \theta := \theta - \alpha \nabla_\theta J
The derivative method
Finding \theta using derivatives
The training inputs:
  X = \begin{bmatrix} x_0^{(1)} & x_1^{(1)} & \cdots & x_n^{(1)} \\ x_0^{(2)} & x_1^{(2)} & \cdots & x_n^{(2)} \\ \vdots \\ x_0^{(m)} & x_1^{(m)} & \cdots & x_n^{(m)} \end{bmatrix} = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}
with x_0^{(j)} = 1 (j = 1, \ldots, m).
The derivative method
The target values:
  \vec{y} = [y^{(1)}, \ldots, y^{(m)}]^T
The parameters:
  \theta = [\theta_0, \ldots, \theta_n]^T
We have h_\theta(x^{(j)}) = (x^{(j)})^T \theta, so
  X\theta - \vec{y} = \begin{bmatrix} (x^{(1)})^T \theta \\ (x^{(2)})^T \theta \\ \vdots \\ (x^{(m)})^T \theta \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(x^{(1)}) - y^{(1)} \\ h_\theta(x^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(x^{(m)}) - y^{(m)} \end{bmatrix}
The derivative method
For any vector z, z^T z = \sum_i z_i^2, so
  J(\theta) = \frac{1}{2} (X\theta - \vec{y})^T (X\theta - \vec{y})
and therefore
  \nabla_\theta J = X^T X \theta - X^T \vec{y}
The derivative method
Using the normal equation:
To minimize J we set its first derivative to zero:
  \nabla_\theta J = 0
  X^T X \theta = X^T \vec{y}
We obtain
  \theta = (X^T X)^{-1} X^T \vec{y}
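The closed-form solution above can be written directly with NumPy (an illustrative sketch; np.linalg.solve is used rather than an explicit matrix inverse for numerical stability):

```python
import numpy as np

def normal_equation(X, y):
    # Solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])          # generated by y = 1 + 2x
print(normal_equation(X, y))           # [1.0, 2.0] up to rounding
```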
Regression
Linear regression: h_\theta(x) = \theta_0 + \theta_1 x
Regression
Polynomial regression
  Quadratic (degree-2) polynomial regression: h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2
  Degree-k polynomial regression: h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_k x^k
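Polynomial regression reduces to linear regression on the expanded features [1, x, x^2, ..., x^k]. A small sketch (the degree and the data are illustrative), fitted with the normal equation from the previous section:

```python
import numpy as np

def poly_features(x, k):
    # Columns x^0, x^1, ..., x^k (Vandermonde matrix).
    return np.vander(x, k + 1, increasing=True)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2              # a known quadratic
X = poly_features(x, 2)
theta = np.linalg.solve(X.T @ X, X.T @ y)   # normal equation
print(theta)                                # ~ [1.0, 2.0, 3.0]
```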
Regression
Finding \theta through a probabilistic representation
  y^{(i)} = \theta^T x^{(i)} + \epsilon^{(i)}
  \epsilon^{(i)} is an error term.
Assume the \epsilon^{(i)} are independent and identically distributed (IID) values following a Gaussian (normal) distribution, \epsilon^{(i)} \sim \mathcal{N}(0, \sigma^2):
  p(\epsilon^{(i)}) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\epsilon^{(i)})^2}{2\sigma^2}\right)
It follows that
  p(y^{(i)} \mid x^{(i)}; \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
Regression
[Figure: the normal (Gaussian) distribution]
Regression
Given the training examples X and the parameters \theta, what is the distribution of the y^{(i)}?
The probability of the data is given by p(\vec{y} \mid X; \theta).
The relationship between \theta, X and \vec{y}, viewed as a function of \theta, is the likelihood function:
  L(\theta) = L(\theta; X, \vec{y}) = p(\vec{y} \mid X; \theta)
Since we assume the \epsilon^{(i)} are independent (and hence the y^{(i)} are independent given the x^{(i)}), we can write the likelihood L(\theta) as:
  L(\theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
Regression
We want to choose \theta so that this likelihood is as large as possible (maximum likelihood): find \theta such that
  L(\theta) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
is maximized.
Trick: any \theta that maximizes \log L(\theta) also maximizes L(\theta).
Regression
  \ell(\theta) = \log L(\theta) = \log \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
  = \sum_{i=1}^{m} \log \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y^{(i)} - \theta^T x^{(i)})^2}{2\sigma^2}\right)
  = m \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{\sigma^2} \cdot \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2
This is equivalent to finding the \theta that minimizes
  \frac{1}{2} \sum_{i=1}^{m} (y^{(i)} - \theta^T x^{(i)})^2
Regression
Remarks:
  Under these probabilistic assumptions on the data, least-squares regression corresponds to finding the maximum-likelihood estimate of \theta.
  The choice of \theta does not depend on \sigma in this probabilistic model of the data.
Regression
Locally weighted linear regression
Locally weighted linear regression finds the \theta that minimizes
  \sum_{i} w^{(i)} (y^{(i)} - \theta^T x^{(i)})^2
  The w^{(i)} are non-negative numbers called weights:
  w^{(i)} = \exp\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right)
  \tau is the bandwidth parameter.
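A sketch of locally weighted linear regression: to predict at a query point x0, solve a weighted normal equation with the Gaussian weights above. The bandwidth value and data are illustrative:

```python
import numpy as np

def lwr_predict(x0, X, y, tau=0.5):
    # w^(i) = exp(-(x^(i) - x0)^2 / (2 tau^2)), on the non-intercept column.
    w = np.exp(-((X[:, 1] - x0[1]) ** 2) / (2 * tau ** 2))
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # weighted normal eq.
    return x0 @ theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])         # exactly linear: y = x
print(lwr_predict(np.array([1.0, 1.5]), X, y))  # ~ 1.5
```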
Regression
Parametric learning algorithms (e.g. unweighted regression): we need only a finite, fixed number of parameters \theta. Once we have found a suitable set of parameter values, we no longer need to keep the training data in order to predict.
Non-parametric learning algorithms (e.g. locally weighted linear regression): to predict a new case we must always keep the data in the training set. We have no general model for prediction.
CLASSIFICATION AND LOGISTIC
REGRESSION
Classification
Consider the binary classification case:
  y \in \{0, 1\}
E.g. spam classification, classification of articles of interest, ...
[Figure: binary labels y \in \{0, 1\} plotted against x]
Logistic regression
Ignoring the fact that y takes discrete values, one could use the linear regression model directly to predict y from input x.
  This performs very poorly.
  Values of h_\theta(x) greater than 1 or less than 0 make no sense here.
Improvement: change the form of h_\theta(x):
  h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}
  g(z) = \frac{1}{1 + e^{-z}} is called the logistic function or the sigmoid function.
Logistic regression
  g(z) = \frac{1}{1 + e^{-z}}
The derivative of the sigmoid function:
  g'(z) = \frac{d}{dz} \frac{1}{1 + e^{-z}} = \frac{1}{(1 + e^{-z})^2}\, e^{-z}
  = \frac{1}{1 + e^{-z}} \left(1 - \frac{1}{1 + e^{-z}}\right)
  = g(z)(1 - g(z))
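The identity g'(z) = g(z)(1 - g(z)) can be checked numerically with a central difference (a small illustrative sketch, not from the slides):

```python
import numpy as np

def g(z):
    # The sigmoid (logistic) function.
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
analytic = g(z) * (1 - g(z))                # g'(z) via the identity
h = 1e-6
numeric = (g(z + h) - g(z - h)) / (2 * h)   # central-difference estimate
print(analytic, numeric)                    # the two values agree
```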
Logistic regression
Finding \theta:
  Build the likelihood function.
  Find the \theta that maximizes the likelihood over the training set.
  (Derive an iterative gradient algorithm to find \theta.)
Building the likelihood function: assume
  P(y = 1 \mid x; \theta) = h_\theta(x)
  P(y = 0 \mid x; \theta) = 1 - h_\theta(x)
which can be written compactly as
  p(y \mid x; \theta) = h_\theta(x)^{y} (1 - h_\theta(x))^{1-y}
Assuming the m training examples were generated independently:
  L(\theta) = p(\vec{y} \mid X; \theta) = \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}; \theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{y^{(i)}} (1 - h_\theta(x^{(i)}))^{1 - y^{(i)}}
Finding \theta
Find the \theta that maximizes the logarithm of the likelihood:
  \ell(\theta) = \log L(\theta) = \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)}))
How do we maximize \ell(\theta)?
  Use the iterative gradient ascent algorithm.
  Update rule:
    \theta := \theta + \alpha \nabla_\theta \ell(\theta)
Finding \theta
  \frac{\partial}{\partial \theta_j} \ell(\theta) = \left(\frac{y}{g(\theta^T x)} - \frac{1 - y}{1 - g(\theta^T x)}\right) \frac{\partial}{\partial \theta_j} g(\theta^T x)
  = \left(\frac{y}{g(\theta^T x)} - \frac{1 - y}{1 - g(\theta^T x)}\right) g(\theta^T x)(1 - g(\theta^T x)) \frac{\partial}{\partial \theta_j} \theta^T x
  = \left(y(1 - g(\theta^T x)) - (1 - y)\, g(\theta^T x)\right) x_j
  = (y - h_\theta(x))\, x_j
Gradient ascent
Update rule with gradient ascent:
  \theta_j := \theta_j + \alpha (y^{(i)} - h_\theta(x^{(i)}))\, x_j^{(i)}
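The update rule above can be sketched as follows; the vectorized form updates all \theta_j at once. The data, step size, and iteration count are illustrative:

```python
import numpy as np

def g(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=2000):
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Gradient ascent on the log-likelihood: theta += alpha * X^T (y - h).
        theta += alpha * X.T @ (y - g(X @ theta))
    return theta

X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])      # separable at x = 0
theta = fit_logistic(X, y)
preds = (g(X @ theta) >= 0.5).astype(int)
print(preds)                            # [0 0 1 1]
```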
Another algorithm for maximizing \ell(\theta)
Newton's iterative method: used to find a zero of a function f : \mathbb{R} \to \mathbb{R}.
  Find \theta such that f(\theta) = 0.
  Update rule for \theta in each iteration:
    \theta := \theta - \frac{f(\theta)}{f'(\theta)}
The Newton-Raphson method (Fisher scoring)
Applied to finding the \theta that maximizes \ell(\theta):
  Find a zero of the derivative \ell'(\theta).
  Update rule:
    \theta := \theta - \frac{\ell'(\theta)}{\ell''(\theta)}
  Or, in vector form:
    \theta := \theta - H^{-1} \nabla_\theta \ell(\theta)
  where H is the Hessian matrix with entries
    H_{ij} = \frac{\partial^2 \ell(\theta)}{\partial \theta_i \, \partial \theta_j}
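The scalar Newton iteration \theta := \theta - f(\theta)/f'(\theta) can be sketched on an illustrative function, here f(\theta) = \theta^2 - 2 (chosen for the example; any differentiable f with a known derivative works the same way):

```python
def newton(f, fprime, theta, n_iters=20):
    # Repeated Newton update: theta := theta - f(theta) / f'(theta).
    for _ in range(n_iters):
        theta = theta - f(theta) / fprime(theta)
    return theta

root = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0)
print(root)  # ~ 1.41421356 (sqrt(2))
```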
Generalized linear models
In the previous sections we had
  y \mid x; \theta \sim \mathcal{N}(\mu, \sigma^2) in regression
  y \mid x; \theta \sim \mathrm{Bernoulli}(\phi) in logistic regression
All of these cases belong to one family of models: the Generalized Linear Models.
Generalized linear models
The exponential family of distributions
  p(y; \eta) = b(y) \exp(\eta^T T(y) - a(\eta))
where:
  \eta : the natural parameter
  T(y) : the sufficient statistic (for the distributions considered here, usually T(y) = y)
  a(\eta) : the log partition function
  e^{-a(\eta)} : a normalization constant (so that p(y; \eta) sums to 1 over y)
The exponential family of distributions
The Bernoulli distribution:
  p(y; \phi) = \phi^{y} (1 - \phi)^{1-y}
  = \exp\left(y \log \phi + (1 - y) \log(1 - \phi)\right)
  = \exp\left(\log\frac{\phi}{1 - \phi} \cdot y + \log(1 - \phi)\right)
We obtain:
  \eta = \log\frac{\phi}{1 - \phi}, which gives \phi = \frac{1}{1 + e^{-\eta}}
  T(y) = y
  a(\eta) = -\log(1 - \phi) = \log(1 + e^{\eta})
  b(y) = 1
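The rewriting above can be checked numerically: with \eta = \log(\phi/(1-\phi)), T(y) = y, a(\eta) = \log(1 + e^{\eta}) and b(y) = 1, the exponential-family form reproduces the ordinary Bernoulli probabilities (the value \phi = 0.3 is illustrative):

```python
import numpy as np

phi = 0.3
eta = np.log(phi / (1 - phi))       # natural parameter
a = np.log(1 + np.exp(eta))         # log partition function

for y in (0, 1):
    direct = phi**y * (1 - phi) ** (1 - y)      # ordinary Bernoulli
    exp_family = 1 * np.exp(eta * y - a)        # b(y) exp(eta T(y) - a(eta))
    print(y, direct, exp_family)                # the two columns match
```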
The exponential family of distributions
The Gaussian distribution:
Remark: in linear regression the choice of \theta does not depend on \sigma, so we may pick \sigma arbitrarily (e.g. \sigma = 1):
  p(y; \mu) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(y - \mu)^2\right)
  = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} y^2\right) \exp\left(\mu y - \frac{1}{2}\mu^2\right)
We obtain:
  \eta = \mu
  T(y) = y
  a(\eta) = \frac{\mu^2}{2} = \frac{\eta^2}{2}
  b(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right)
Constructing a model from an exponential-family distribution
Assumptions:
  1. y \mid x; \theta \sim \mathrm{ExponentialFamily}(\eta): given x and \theta, the distribution of y follows some exponential-family distribution with parameter \eta.
  2. Given x, we want to predict the value T(y) (usually T(y) = y). That is, we predict the output h_\theta(x) with a hypothesis satisfying h(x) = \mathrm{E}[y \mid x].
  3. The natural parameter \eta and the inputs x are related linearly: \eta = \theta^T x.
Constructing a model from an exponential-family distribution
Logistic regression:
  y \in \{0, 1\}, so we choose the Bernoulli family to model the conditional distribution of y given x.
  In the exponential-family form of the Bernoulli distribution we have \phi = \frac{1}{1 + e^{-\eta}}.
  In the Bernoulli distribution, \mathrm{E}[y \mid x; \theta] = \phi.
Hypothesis:
  h_\theta(x) = \mathrm{E}[y \mid x; \theta] = \phi = \frac{1}{1 + e^{-\eta}} = \frac{1}{1 + e^{-\theta^T x}}
Multiclass classification
Multiclass classification with softmax regression:
  E.g. classifying email into the classes {personal, work, spam}.
  Each output takes one of k possible values {1, 2, \ldots, k}.
  The output is still a discrete value, but it can take more than 2 values.
We will model it with a multinomial (multivariate) distribution.
Multiclass classification
We use parameters \phi_1, \ldots, \phi_k to parameterize the k possible outputs (\phi_i is the probability that the output is class i).
However, since \sum_{i=1}^{k} \phi_i = 1, only k - 1 parameters \phi_1, \ldots, \phi_{k-1} are needed, with
  \phi_k = 1 - \sum_{i=1}^{k-1} \phi_i
To express the multinomial distribution as an exponential-family distribution we define T(y) \in \mathbb{R}^{k-1}:
  T(1) = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad T(2) = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad T(k-1) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \quad T(k) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}
Multiclass classification
(T(y))_i denotes the i-th component of T(y).
The indicator function 1\{\cdot\}:
  1\{\mathrm{True}\} = 1, e.g. 1\{3 = 3\} = 1
  1\{\mathrm{False}\} = 0, e.g. 1\{3 > 5\} = 0
The relationship between T(y) and y is (T(y))_i = 1\{y = i\}.
We also have \mathrm{E}[(T(y))_i] = P(y = i) = \phi_i. Then:
  p(y; \phi) = \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_k^{1\{y=k\}}
  = \phi_1^{1\{y=1\}} \phi_2^{1\{y=2\}} \cdots \phi_{k-1}^{1\{y=k-1\}} \phi_k^{1 - \sum_{i=1}^{k-1} 1\{y=i\}}
  = \phi_1^{(T(y))_1} \phi_2^{(T(y))_2} \cdots \phi_{k-1}^{(T(y))_{k-1}} \phi_k^{1 - \sum_{i=1}^{k-1} (T(y))_i}
  = \exp\left((T(y))_1 \log \phi_1 + (T(y))_2 \log \phi_2 + \ldots + (T(y))_{k-1} \log \phi_{k-1} + \left(1 - \sum_{i=1}^{k-1} (T(y))_i\right) \log \phi_k\right)
  = \exp\left((T(y))_1 \log\frac{\phi_1}{\phi_k} + (T(y))_2 \log\frac{\phi_2}{\phi_k} + \ldots + (T(y))_{k-1} \log\frac{\phi_{k-1}}{\phi_k} + \log \phi_k\right)
  = b(y) \exp\left(\eta^T T(y) - a(\eta)\right)
Multiclass classification
where
  \eta = \begin{bmatrix} \log(\phi_1 / \phi_k) \\ \log(\phi_2 / \phi_k) \\ \vdots \\ \log(\phi_{k-1} / \phi_k) \end{bmatrix}
  a(\eta) = -\log \phi_k
  b(y) = 1
Multiclass classification
\eta is a vector with \eta_i = \log\frac{\phi_i}{\phi_k} for i = 1, \ldots, k-1.
Also define \eta_k = \log\frac{\phi_k}{\phi_k} = 0.
We then have:
  e^{\eta_i} = \frac{\phi_i}{\phi_k}
  \phi_k e^{\eta_i} = \phi_i
  \phi_k \sum_{i=1}^{k} e^{\eta_i} = \sum_{i=1}^{k} \phi_i = 1
It follows that
  \phi_k = \frac{1}{\sum_{i=1}^{k} e^{\eta_i}}
and
  \phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}}
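The link function just derived, \phi_i = e^{\eta_i} / \sum_j e^{\eta_j}, is the softmax function. A small sketch (subtracting the maximum before exponentiating is a standard numerical-stability trick, not from the slides):

```python
import numpy as np

def softmax(eta):
    # phi_i = e^{eta_i} / sum_j e^{eta_j}; shift by max(eta) for stability.
    z = np.exp(eta - np.max(eta))
    return z / z.sum()

eta = np.array([1.0, 2.0, 0.0])   # eta_k = 0 by the convention above
phi = softmax(eta)
print(phi, phi.sum())             # probabilities that sum to 1
```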
Multiclass classification
By assumption 3 we have \eta_i = \theta_i^T x (for i = 1, \ldots, k-1), where \theta_i \in \mathbb{R}^{n+1} are the parameters of the model.
Define \theta_k = 0, so that \eta_k = \theta_k^T x = 0.
The assumed conditional distribution of y given x is then
  p(y = i \mid x; \theta) = \phi_i = \frac{e^{\eta_i}}{\sum_{j=1}^{k} e^{\eta_j}} = \frac{e^{\theta_i^T x}}{\sum_{j=1}^{k} e^{\theta_j^T x}}
Softmax regression is a generalization of logistic regression.
Multiclass classification
The hypothesis outputs
  h_\theta(x) = \mathrm{E}[T(y) \mid x; \theta]
  = \mathrm{E}\left[\begin{bmatrix} 1\{y = 1\} \\ 1\{y = 2\} \\ \vdots \\ 1\{y = k-1\} \end{bmatrix} \,\middle|\, x; \theta\right] = \begin{bmatrix} \phi_1 \\ \phi_2 \\ \vdots \\ \phi_{k-1} \end{bmatrix} = \begin{bmatrix} e^{\theta_1^T x} / \sum_{j=1}^{k} e^{\theta_j^T x} \\ e^{\theta_2^T x} / \sum_{j=1}^{k} e^{\theta_j^T x} \\ \vdots \\ e^{\theta_{k-1}^T x} / \sum_{j=1}^{k} e^{\theta_j^T x} \end{bmatrix}
Multiclass classification
With a training set of m examples, the logarithm of the likelihood is
  \ell(\theta) = \sum_{i=1}^{m} \log p(y^{(i)} \mid x^{(i)}; \theta) = \sum_{i=1}^{m} \log \prod_{l=1}^{k} \left(\frac{e^{\theta_l^T x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_j^T x^{(i)}}}\right)^{1\{y^{(i)} = l\}}
We can find the parameters that maximize \ell(\theta) using gradient ascent or Newton's method.
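A sketch of softmax regression fitted by gradient ascent on the log-likelihood above. The data, step size, and iteration count are illustrative; Theta is stored as an (n+1) x k matrix whose last column is kept at zero, matching the convention \theta_k = 0:

```python
import numpy as np

def softmax_probs(Theta, X):
    # Row i holds p(y = l | x^(i); theta) for l = 1..k.
    z = X @ Theta
    z -= z.max(axis=1, keepdims=True)       # numerical-stability shift
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, k, alpha=0.1, n_iters=2000):
    Theta = np.zeros((X.shape[1], k))
    Y = np.eye(k)[y]                        # one-hot targets 1{y^(i) = l}
    for _ in range(n_iters):
        grad = X.T @ (Y - softmax_probs(Theta, X))  # ascent direction
        grad[:, -1] = 0.0                   # keep theta_k fixed at zero
        Theta += alpha * grad
    return Theta

X = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
y = np.array([0, 1, 2])                     # three separable classes
Theta = fit_softmax(X, y, k=3)
print(softmax_probs(Theta, X).argmax(axis=1))  # [0 1 2]
```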