
Bayesian Learning

Bayesian learning involves direct manipulation of probabilities in order to find correct hypotheses.
The quantities of interest are governed by probability distributions.
Optimal decisions can be made by reasoning about those probabilities.

Cao Hoang Tru, CSE Faculty, HCMUT
25 November 2008

Bayesian Learning
Bayesian learning algorithms are among the most practical approaches to certain types of learning problems.
They also provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.


Features of Bayesian Learning


Each training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
Prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
Hypotheses that make probabilistic predictions can be accommodated.
New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.

Bayes Theorem
P(h | D) = P(D | h) P(h) / P(D)

P(h): prior probability of hypothesis h
P(D): prior probability of training data D
P(h | D): probability that h holds given D
P(D | h): probability that D is observed given h

Bayes Theorem
Maximum a posteriori hypothesis (MAP):

h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h)

P(h) is not a uniform distribution over H (it depends on prior experience).


Bayes Theorem
Maximum likelihood hypothesis (ML):

h_ML = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h)

P(h) is a uniform distribution over H.
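As a minimal sketch of the difference between the two criteria, the snippet below selects h_MAP and h_ML from a small hypothesis set; the priors and likelihoods are illustrative numbers, not values from the slides.

```python
# Minimal sketch: MAP vs. ML hypothesis selection over a tiny hypothesis set.
# The priors and likelihoods below are illustrative values only.

priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}        # P(h), non-uniform
likelihoods = {"h1": 0.2, "h2": 0.6, "h3": 0.9}   # P(D | h)

# MAP: weight the likelihood by the prior.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: ignore the prior (equivalent to assuming a uniform P(h)).
h_ml = max(likelihoods, key=lambda h: likelihoods[h])

print(h_map, h_ml)  # h1 is favoured by its prior, h3 by its likelihood
```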


Bayes Theorem
0.008 of the population have cancer.
The lab test returns a correct positive result in only 98% of the cases in which the disease is actually present.
It returns a correct negative result in only 97% of the cases in which the disease is not present.
Would a new patient with a positive result have cancer or not?

Bayes Theorem
The question is which posterior is larger: P(cancer | +) or P(¬cancer | +)?

Bayes Theorem
P(cancer) = .008        P(¬cancer) = .992
P(+ | cancer) = .98     P(− | ¬cancer) = .97
P(+ | ¬cancer) = .03

P(cancer | +) ∝ P(+ | cancer) P(cancer) = .0078
P(¬cancer | +) ∝ P(+ | ¬cancer) P(¬cancer) = .0298


Bayes Theorem
Maximum a posteriori hypothesis (MAP):

h_MAP = argmax_{h ∈ {cancer, ¬cancer}} P(h | +) = argmax_{h ∈ {cancer, ¬cancer}} P(+ | h) P(h) = ¬cancer
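A short computation of the two unnormalized posteriors from the slide; the variable names are arbitrary.

```python
# Unnormalized posteriors for the cancer test example.
p_cancer, p_not_cancer = 0.008, 0.992
p_pos_given_cancer = 0.98        # true positive rate
p_pos_given_not_cancer = 0.03    # false positive rate (1 - 0.97)

post_cancer = p_pos_given_cancer * p_cancer              # ~0.0078
post_not_cancer = p_pos_given_not_cancer * p_not_cancer  # ~0.0298

h_map = "cancer" if post_cancer > post_not_cancer else "not cancer"
print(post_cancer, post_not_cancer, h_map)  # MAP hypothesis: not cancer

# Normalizing gives the actual posterior probability of cancer given a positive test:
print(post_cancer / (post_cancer + post_not_cancer))  # ~0.21
```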


Bayes Theorem and Concept Learning


P(h | D) = P(D | h) P(h) / P(D)

P(h) = 1 / |H|
P(D | h) = 1 if h is consistent with D, 0 otherwise
P(D) = |VS_{H,D}| / |H|

Hence P(h | D) = 1 / |VS_{H,D}| if h is consistent with D, and 0 otherwise.
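A sketch of the brute-force Bayesian concept learner these assumptions imply: hypotheses consistent with D share the posterior mass equally. The `consistent` helper and the toy threshold hypotheses are assumptions made for illustration.

```python
# Brute-force MAP learning sketch: every hypothesis consistent with the data D
# gets posterior 1/|VS_{H,D}|, every other hypothesis gets posterior 0.

def consistent(h, data):
    """h: a callable hypothesis; data: list of (x, label) pairs."""
    return all(h(x) == label for x, label in data)

def posteriors(H, data):
    vs = [h for h in H if consistent(h, data)]    # version space VS_{H,D}
    return {h: (1.0 / len(vs) if h in vs else 0.0) for h in H}

# Toy example: hypotheses are threshold classifiers over integers.
H = [lambda x, t=t: x >= t for t in range(5)]
data = [(1, False), (3, True)]
print([round(p, 2) for p in posteriors(H, data).values()])  # only t=2 and t=3 survive
```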

Bayes Theorem and Concept Learning


FIND-S outputs a MAP hypothesis if P(h1) ≥ P(h2) whenever h1 is more specific than h2:

h_MAP = argmax_{h ∈ H} P(D | h) P(h) = h_most specific


Maximum Likelihood and Least-Squared Error Hypotheses


(Figure: training values d_i plotted over x, with a candidate hypothesis h and the maximum likelihood hypothesis h_ML fitted to the points (x_i, d_i).)

Maximum Likelihood and Least-Squared Error Hypotheses


Assuming the training values d_i are corrupted by normally distributed noise:

h_ML = argmax_{h ∈ H} P(D | h)
     = argmax_{h ∈ H} ∏_{i=1..m} P(d_i | h)
     = argmax_{h ∈ H} ∏_{i=1..m} k · e^{−c (d_i − h(x_i))²}
     = argmin_{h ∈ H} Σ_{i=1..m} (d_i − h(x_i))²

where k and c are positive constants determined by the normal distribution.
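Under this Gaussian-noise assumption, maximizing the likelihood is the same as minimizing the squared error, so an ordinary least-squares fit already yields h_ML. A small numpy sketch with a linear hypothesis space and synthetic, illustrative data:

```python
import numpy as np

# Synthetic data: d_i = f(x_i) + Gaussian noise (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
d = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# Linear hypothesis space h(x) = a*x + b: least squares gives the ML hypothesis,
# because minimizing sum_i (d_i - h(x_i))^2 maximizes the Gaussian likelihood.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
print(a, b)  # close to the true slope 2.0 and intercept 1.0
```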

Minimum Description Length Principle


h_MAP = argmax_{h ∈ H} P(D | h) P(h)
      = argmax_{h ∈ H} [log₂ P(D | h) + log₂ P(h)]
      = argmin_{h ∈ H} [−log₂ P(D | h) − log₂ P(h)]

−log₂ P(h): the description length of h under the optimal encoding
−log₂ P(D | h): the description length of D given h under the optimal encoding
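The equivalence can be checked numerically: picking the hypothesis with the smallest total code length −log₂ P(D | h) − log₂ P(h) selects the same hypothesis as MAP. The probabilities below are illustrative values only.

```python
import math

# Illustrative priors and likelihoods (assumed numbers, not from the slides).
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}       # P(h)
likelihoods = {"h1": 0.1, "h2": 0.4, "h3": 0.3}  # P(D | h)

def description_length(h):
    # bits to encode h plus bits to encode D given h, under optimal codes
    return -math.log2(likelihoods[h]) - math.log2(priors[h])

h_mdl = min(priors, key=description_length)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_mdl, h_map)  # the two criteria select the same hypothesis
```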

Bayes Optimal Classifier


What is the most probable hypothesis given the training data?

versus

What is the most probable classification of a new instance given the training data?


Bayes Optimal Classifier


Hypothesis space H = {h1, h2, h3}
Posterior probabilities: P(h1 | D) = .4, P(h2 | D) = .3, P(h3 | D) = .3 (so h1 is h_MAP)
A new instance x is classified positive by h1 and negative by h2 and h3.
What is the most probable classification of x?


Bayes Optimal Classifier


The most probable classification of a new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities:

argmax_{c ∈ C} P(c | D) = argmax_{c ∈ C} Σ_{h ∈ H} P(c | h) P(h | D)
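Applied to the three-hypothesis example above, this weighted vote classifies x as negative even though h_MAP alone says positive. A minimal sketch:

```python
# Bayes optimal classification for the example above:
# posteriors P(h | D) and each hypothesis's (deterministic) prediction for x.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": "+", "h2": "-", "h3": "-"}

def class_probability(c):
    # sum_h P(c | h) * P(h | D), where P(c | h) is 1 if h predicts c, else 0
    return sum(p for h, p in posterior.items() if prediction[h] == c)

best = max(["+", "-"], key=class_probability)
print(class_probability("+"), class_probability("-"), best)  # 0.4 0.6 -
```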


Naive Bayes Classifier


Each instance x is described by a conjunction of attribute values <a1, a2, ..., an>.
The target function f(x) can take on any value from a finite set C.
The task is to assign the most probable target value to a new instance.


Naive Bayes Classifier


c_MAP = argmax_{c ∈ C} P(c | a1, a2, ..., an) = argmax_{c ∈ C} P(a1, a2, ..., an | c) P(c)

c_NB = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(ai | c)

assuming that a1, a2, ..., an are conditionally independent given c



Naive Bayes Classifier


x = (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong)

P(PlayTennis = yes | x) = ?
P(PlayTennis = no | x) = ?

Here x plays the role of the attribute vector (a1, a2, ..., an) and the PlayTennis value is the class c.
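A sketch of the naive Bayes decision for this instance, using conditional probability estimates taken from Mitchell's 14-example PlayTennis table (the numbers below should be verified against that table):

```python
# Naive Bayes decision for x = (sunny, cool, high, strong),
# with probability estimates assumed from Mitchell's PlayTennis table.
p_class = {"yes": 9 / 14, "no": 5 / 14}
p_attr = {
    "yes": {"sunny": 2 / 9, "cool": 3 / 9, "high": 3 / 9, "strong": 3 / 9},
    "no":  {"sunny": 3 / 5, "cool": 1 / 5, "high": 4 / 5, "strong": 3 / 5},
}
x = ["sunny", "cool", "high", "strong"]

def score(c):
    # P(c) * product over attributes of P(a_i | c)
    s = p_class[c]
    for a in x:
        s *= p_attr[c][a]
    return s

print(score("yes"), score("no"))       # ~0.0053 vs ~0.0206
print(max(["yes", "no"], key=score))   # naive Bayes predicts "no"
```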


Naive Bayes Classifier


Estimating probabilities with the m-estimate:

P(ai | c) = (nc + m·p) / (n + m)

n: total number of training examples of a particular class
nc: number of those examples having a particular attribute value
m: equivalent sample size
p: prior estimate of the probability (= 1/k, where k is the number of possible values of the attribute)

Naive Bayes Classifier


Learning to classify text:

c_NB = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(ai = wk | c) = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(wk | c)

where ai is the word at position i in the text and wk ranges over the vocabulary, assuming that every word has the same chance of occurring at any position.
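A compact sketch of this text classifier: word probabilities are shared across positions and estimated with the smoothed estimate (nk + 1) / (n + |Vocabulary|), in the spirit of the text-classification algorithm in Mitchell's Chapter 6. The toy documents and labels are made up for illustration.

```python
import math
from collections import Counter

# Toy training documents (made up for illustration).
docs = [("good great fun", "pos"), ("bad boring bad", "neg"), ("great fun", "pos")]

vocab = {w for text, _ in docs for w in text.split()}
classes = {c for _, c in docs}

p_class = {c: sum(1 for _, lbl in docs if lbl == c) / len(docs) for c in classes}
word_counts = {c: Counter(w for text, lbl in docs if lbl == c for w in text.split())
               for c in classes}
total_words = {c: sum(word_counts[c].values()) for c in classes}

def log_score(text, c):
    # log P(c) + sum_i log P(w_i | c), with P(w | c) = (n_k + 1) / (n + |Vocab|)
    s = math.log(p_class[c])
    for w in text.split():
        s += math.log((word_counts[c][w] + 1) / (total_words[c] + len(vocab)))
    return s

print(max(classes, key=lambda c: log_score("great fun", c)))  # pos
```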

Exercises
In Mitchell's Machine Learning (Chapter 6): Exercises 6.1 to 6.4.

