Bayes Theorem
MAP, ML hypotheses
MAP learners
Bayes optimal classifier
Naive Bayes learner
Bayesian belief networks
Bayes Theorem
P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}
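Bayes theorem can be checked numerically. A minimal sketch with made-up priors and likelihoods, where P(D) is expanded by summing P(D|h')P(h') over all hypotheses:

```python
# A numeric check of Bayes theorem with made-up numbers: two mutually
# exclusive hypotheses h1, h2 with priors P(h) and likelihoods P(D|h).

def posterior(prior, likelihood, h):
    """P(h|D) = P(D|h)P(h) / P(D), with P(D) = sum over h' of P(D|h')P(h')."""
    p_D = sum(likelihood[hp] * prior[hp] for hp in prior)
    return likelihood[h] * prior[h] / p_D

prior = {"h1": 0.3, "h2": 0.7}          # assumed priors
likelihood = {"h1": 0.9, "h2": 0.1}     # assumed P(D|h)

print(posterior(prior, likelihood, "h1"))  # 0.27 / 0.34 ≈ 0.794
```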
Choosing Hypotheses

P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}

Generally want the most probable hypothesis given the training data.

Maximum a posteriori hypothesis h_{MAP}:

h_{MAP} = \arg\max_{h \in H} P(h|D)
        = \arg\max_{h \in H} \frac{P(D|h)\,P(h)}{P(D)}
        = \arg\max_{h \in H} P(D|h)\,P(h)
If we assume P(h_i) = P(h_j) for all i, j, then we can further simplify, and choose the Maximum likelihood (ML) hypothesis

h_{ML} = \arg\max_{h_i \in H} P(D|h_i)
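The difference between h_MAP and h_ML shows up whenever the prior is not uniform. A minimal sketch over a toy hypothesis space (all numbers are made up for illustration):

```python
# Selecting h_MAP and h_ML over a finite hypothesis space.

def h_map(prior, likelihood):
    # argmax_h P(D|h) P(h)  (the P(D) denominator does not affect the argmax)
    return max(prior, key=lambda h: likelihood[h] * prior[h])

def h_ml(likelihood):
    # argmax_h P(D|h) -- equivalent to h_MAP under a uniform prior
    return max(likelihood, key=lambda h: likelihood[h])

prior = {"h1": 0.7, "h2": 0.2, "h3": 0.1}       # assumed P(h)
likelihood = {"h1": 0.3, "h2": 0.9, "h3": 0.5}  # assumed P(D|h)

print(h_map(prior, likelihood))  # h1: 0.3*0.7 = 0.21 beats 0.9*0.2 = 0.18
print(h_ml(likelihood))          # h2: the strong prior on h1 is ignored
```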
Theorem of total probability: if events A_1, \ldots, A_n are mutually exclusive with \sum_{i=1}^{n} P(A_i) = 1, then

P(B) = \sum_{i=1}^{n} P(B|A_i)\,P(A_i)
Brute Force MAP Hypothesis Learner

1. For each hypothesis h in H, calculate the posterior probability

   P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}

2. Output the hypothesis h_{MAP} with the highest posterior probability

   h_{MAP} = \arg\max_{h \in H} P(h|D)
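The brute force MAP learner computes the posterior of every hypothesis explicitly. A minimal sketch for concept learning, with a uniform prior and P(D|h) = 1 iff h is consistent with the data; modeling hypotheses as boolean functions here is an assumption for illustration:

```python
# Brute-force MAP learning: uniform prior P(h) = 1/|H|, and
# P(D|h) = 1 if h is consistent with every training example, 0 otherwise.

def brute_force_map(H, D):
    """H: list of (name, hypothesis-function); D: list of (x, label)."""
    prior = 1.0 / len(H)
    posts = {}
    for name, h in H:
        consistent = all(h(x) == label for x, label in D)
        likelihood = 1.0 if consistent else 0.0
        posts[name] = likelihood * prior          # unnormalized P(h|D)
    p_D = sum(posts.values())                     # the normalizer P(D)
    return {name: p / p_D for name, p in posts.items()}

H = [("always_true", lambda x: True),
     ("always_false", lambda x: False),
     ("positive", lambda x: x > 0)]
D = [(1, True), (-2, False)]

print(brute_force_map(H, D))  # only "positive" is consistent -> posterior 1.0
```

Note that every consistent hypothesis ends up with the same posterior, 1 / |version space|, which is how this learner mirrors Candidate Elimination.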
[Figure: evolution of posterior probabilities over the hypothesis space — (a) the prior P(h), (b) the posterior P(h|D1) after one training example, (c) the posterior P(h|D1, D2) after two training examples.]
Characterizing Learning Algorithms by Equivalent MAP Learners

Inductive system: training examples D and hypothesis space H are input to the Candidate Elimination Algorithm, which produces output hypotheses.

Equivalent Bayesian inference system: the same training examples and hypothesis space, fed to a brute force MAP learner with P(h) uniform and P(D|h) = 0 if h is inconsistent with D, 1 if consistent, produce the same output hypotheses. The prior assumptions are thereby made explicit.
Learning a Real Valued Function

[Figure: a target function f and the maximum likelihood hypothesis h_ML fit to noisy training values; e marks the error between an observed value y and h_ML.]

Under zero-mean Gaussian noise on the training values, the maximum likelihood hypothesis is the one minimizing the sum of squared errors \sum_{i=1}^{m} (d_i - h(x_i))^2:
h_{ML} = \arg\max_{h \in H} p(D|h)
       = \arg\max_{h \in H} \prod_{i=1}^{m} p(d_i|h)
       = \arg\max_{h \in H} \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2}

Maximize the natural log of this instead:

h_{ML} = \arg\max_{h \in H} \sum_{i=1}^{m} \left[ \ln\frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2 \right]
       = \arg\max_{h \in H} \sum_{i=1}^{m} -\frac{1}{2}\left(\frac{d_i - h(x_i)}{\sigma}\right)^2
       = \arg\max_{h \in H} \sum_{i=1}^{m} -\left(d_i - h(x_i)\right)^2
       = \arg\min_{h \in H} \sum_{i=1}^{m} \left(d_i - h(x_i)\right)^2
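The derivation can be checked numerically: for the hypothesis space of constant functions h(x) = c (an assumption chosen for simplicity), the Gaussian log likelihood and the squared error are optimized by the same c, the sample mean:

```python
# Verifying h_ML = argmin sum (d_i - h(x_i))^2 on assumed data:
# for constant hypotheses h(x) = c, the squared-error minimizer is the
# sample mean, which is also where the Gaussian log likelihood peaks.
import math

d = [2.0, 3.0, 5.0, 6.0]  # assumed noisy training values

def sq_error(c):
    return sum((di - c) ** 2 for di in d)

def log_likelihood(c, sigma=1.0):
    # sum_i ln( 1/sqrt(2 pi sigma^2) * exp(-(d_i - c)^2 / (2 sigma^2)) )
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (di - c) ** 2 / (2 * sigma**2) for di in d)

cs = [i / 100 for i in range(0, 1001)]  # grid of candidate constants c
best_sq = min(cs, key=sq_error)
best_ll = max(cs, key=log_likelihood)
print(best_sq, best_ll)  # both 4.0, the sample mean of d
```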
Bayes Optimal Classifier

Bayes optimal classification:

\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j|h_i)\,P(h_i|D)

Example:

P(h_1|D) = .4,  P(-|h_1) = 0,  P(+|h_1) = 1
P(h_2|D) = .3,  P(-|h_2) = 1,  P(+|h_2) = 0
P(h_3|D) = .3,  P(-|h_3) = 1,  P(+|h_3) = 0

therefore

\sum_{h_i \in H} P(+|h_i)\,P(h_i|D) = .4
\sum_{h_i \in H} P(-|h_i)\,P(h_i|D) = .6

and

\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j|h_i)\,P(h_i|D) = -
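The example can be computed directly; this sketch reproduces the .4/.6 sums and the resulting classification:

```python
# Bayes optimal classification for the three-hypothesis example above.

posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}           # P(h_i|D)
p_label_given_h = {"h1": {"+": 1.0, "-": 0.0},
                   "h2": {"+": 0.0, "-": 1.0},
                   "h3": {"+": 0.0, "-": 1.0}}           # P(v_j|h_i)

def bayes_optimal(V=("+", "-")):
    # score[v] = sum over h_i of P(v|h_i) P(h_i|D)
    score = {v: sum(p_label_given_h[h][v] * posteriors[h]
                    for h in posteriors) for v in V}
    return max(V, key=lambda v: score[v]), score

label, score = bayes_optimal()
print(score)  # {'+': 0.4, '-': 0.6}
print(label)  # '-'
```

Note that no single hypothesis predicts '-', yet '-' is the Bayes optimal classification, since the hypotheses voting for it carry more total posterior weight.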
Naive Bayes Classifier

For an instance described by attribute values a_1, \ldots, a_n, the most probable target value is

v_{MAP} = \arg\max_{v_j \in V} P(v_j | a_1, \ldots, a_n)
        = \arg\max_{v_j \in V} P(a_1, \ldots, a_n | v_j)\,P(v_j)

Naive Bayes assumption:

P(a_1, \ldots, a_n | v_j) = \prod_i P(a_i|v_j)

which gives

Naive Bayes classifier: v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i|v_j)

To classify a new instance x, return v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_{a_i \in x} P(a_i|v_j), using the estimated probabilities P(v_j) and P(a_i|v_j).
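A minimal Naive Bayes learner sketch, estimating P(v_j) and P(a_i|v_j) by frequency counts over a toy dataset (the attribute names and data below are assumptions for illustration, not from the text):

```python
# Naive Bayes: estimate P(v_j) and P(a_i|v_j) by counting, then
# classify with v_NB = argmax_v P(v) * prod_i P(a_i|v).
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (attribute-tuple, label)."""
    label_counts = Counter(label for _, label in examples)
    cond_counts = defaultdict(Counter)   # (i, label) -> counts of a_i values
    for attrs, label in examples:
        for i, a in enumerate(attrs):
            cond_counts[(i, label)][a] += 1
    n = len(examples)
    priors = {v: c / n for v, c in label_counts.items()}
    def p_attr(i, a, v):
        return cond_counts[(i, v)][a] / label_counts[v]
    return priors, p_attr

def classify(x, priors, p_attr):
    def score(v):
        s = priors[v]
        for i, a in enumerate(x):
            s *= p_attr(i, a, v)
        return s
    return max(priors, key=score)

# Assumed toy data: (Outlook, Wind) -> PlayTennis
data = [(("sunny", "weak"), "yes"), (("sunny", "strong"), "no"),
        (("rain", "weak"), "yes"), (("rain", "strong"), "no"),
        (("sunny", "weak"), "yes")]
priors, p_attr = train(data)
print(classify(("rain", "weak"), priors, p_attr))  # 'yes'
```

This sketch uses raw frequency estimates, so an attribute value never seen with a class zeroes out that class; in practice the estimates would be smoothed (e.g. the m-estimate).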
Conditional Independence

Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is,

P(X | Y, Z) = P(X | Z)
Bayesian Belief Network

[Figure: a Bayesian belief network over Storm, BusTourGroup, Lightning, Campfire, Thunder, and ForestFire. The conditional probability table for Campfire, given its parents Storm (S) and BusTourGroup (B):]

            S,B    S,~B   ~S,B   ~S,~B
Campfire    0.4    0.1    0.8    0.2
~Campfire   0.6    0.9    0.2    0.8
The network represents the joint probability distribution over all of its variables:

P(y_1, \ldots, y_n) = \prod_{i=1}^{n} P(y_i \mid Parents(Y_i))
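The product formula can be illustrated on a two-parent fragment of the network. Only the Campfire table is given in the text, so the Storm and BusTourGroup priors below are made-up values:

```python
# Joint probability P(y_1,...,y_n) = prod_i P(y_i | Parents(Y_i)),
# on the fragment Storm -> Campfire <- BusTourGroup.

p_storm = {True: 0.2, False: 0.8}   # assumed P(Storm)
p_bus = {True: 0.5, False: 0.5}     # assumed P(BusTourGroup)
# P(Campfire=True | Storm, BusTourGroup), from the table in the text
p_campfire = {(True, True): 0.4, (True, False): 0.1,
              (False, True): 0.8, (False, False): 0.2}

def joint(storm, bus, campfire):
    p_c = p_campfire[(storm, bus)]
    if not campfire:
        p_c = 1.0 - p_c
    return p_storm[storm] * p_bus[bus] * p_c

# P(Storm=T, Bus=T, Campfire=T) = 0.2 * 0.5 * 0.4 = 0.04
print(joint(True, True, True))
# the joint distribution sums to 1 over all eight assignments
print(sum(joint(s, b, c) for s in (True, False)
          for b in (True, False) for c in (True, False)))
```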
Training a Bayesian network by gradient ascent: each conditional probability table entry w_{ijk} = P(Y_i = y_{ij} \mid Parents(Y_i) = u_{ik}) is adjusted to climb the log likelihood of the data, summing contributions over the training examples d \in D; after each step the weights are renormalized so that \sum_j w_{ijk} = 1 and 0 \le w_{ijk} \le 1.