
Bayesian Learning

Bayesian learning involves direct manipulation of probabilities in order to find correct hypotheses.
The quantities of interest are governed by probability distributions.
Optimal decisions can be made by reasoning about those probabilities.

Cao Hoang Tru, CSE Faculty, HCMUT
25 November 2008

Bayesian Learning
Bayesian learning algorithms are among the most practical approaches to certain types of learning problems.
They also provide a useful perspective for understanding many learning algorithms that do not explicitly manipulate probabilities.


Features of Bayesian Learning


Each training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
Prior knowledge can be combined with observed data to determine the final probability of a hypothesis.
Hypotheses that make probabilistic predictions can be accommodated.
New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.

Bayes Theorem
P(h | D) = P(D | h) P(h) / P(D)

P(h): prior probability of hypothesis h
P(D): prior probability of training data D
P(h | D): probability that h holds given D
P(D | h): probability that D is observed given h

Bayes Theorem
Maximum a posteriori hypothesis (MAP):

h_MAP = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h) P(h)

P(h) is not a uniform distribution over H (it depends on prior experience).


Bayes Theorem
Maximum likelihood hypothesis (ML):

h_ML = argmax_{h ∈ H} P(h | D) = argmax_{h ∈ H} P(D | h)

P(h) is a uniform distribution over H.
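As a minimal sketch of the difference between the two criteria, the snippet below selects h_MAP and h_ML from a small hypothesis set; the priors and likelihoods are illustrative numbers, not values from the slides.

```python
# Minimal sketch: MAP vs. ML hypothesis selection over a tiny hypothesis set.
# The priors and likelihoods below are illustrative values only.

priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}        # P(h), non-uniform
likelihoods = {"h1": 0.2, "h2": 0.6, "h3": 0.9}   # P(D | h)

# MAP: weight the likelihood by the prior.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])

# ML: ignore the prior (equivalent to assuming a uniform P(h)).
h_ml = max(likelihoods, key=lambda h: likelihoods[h])

print(h_map, h_ml)  # h1 is favoured by its prior, h3 by its likelihood
```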


Bayes Theorem
0.008 of the population have cancer.
The lab test returns a correct positive result in only 98% of the cases in which the disease is actually present.
It returns a correct negative result in only 97% of the cases in which the disease is not present.
Would a new patient with a positive result have cancer or not?

Bayes Theorem
The question is which posterior is larger: P(cancer | +) or P(¬cancer | +)?

Bayes Theorem
P(cancer) = .008        P(¬cancer) = .992
P(+ | cancer) = .98     P(− | ¬cancer) = .97
P(+ | ¬cancer) = .03

P(cancer | +) ∝ P(+ | cancer) P(cancer) = .0078
P(¬cancer | +) ∝ P(+ | ¬cancer) P(¬cancer) = .0298


Bayes Theorem
Maximum a posteriori hypothesis (MAP):

h_MAP = argmax_{h ∈ {cancer, ¬cancer}} P(h | +) = argmax_{h ∈ {cancer, ¬cancer}} P(+ | h) P(h) = ¬cancer
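A short computation of the two unnormalized posteriors from the slide; the variable names are arbitrary.

```python
# Unnormalized posteriors for the cancer test example.
p_cancer, p_not_cancer = 0.008, 0.992
p_pos_given_cancer = 0.98        # true positive rate
p_pos_given_not_cancer = 0.03    # false positive rate (1 - 0.97)

post_cancer = p_pos_given_cancer * p_cancer              # ~0.0078
post_not_cancer = p_pos_given_not_cancer * p_not_cancer  # ~0.0298

h_map = "cancer" if post_cancer > post_not_cancer else "not cancer"
print(post_cancer, post_not_cancer, h_map)  # MAP hypothesis: not cancer

# Normalizing gives the actual posterior probability of cancer given a positive test:
print(post_cancer / (post_cancer + post_not_cancer))  # ~0.21
```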


Bayes Theorem and Concept Learning


P(h | D) = P(D | h) P(h) / P(D)

P(h) = 1 / |H|
P(D | h) = 1 if h is consistent with D, 0 otherwise
P(D) = |VS_{H,D}| / |H|

Hence P(h | D) = 1 / |VS_{H,D}| if h is consistent with D, and 0 otherwise.
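A sketch of the brute-force Bayesian concept learner these assumptions imply: hypotheses consistent with D share the posterior mass equally. The `consistent` helper and the toy threshold hypotheses are assumptions made for illustration.

```python
# Brute-force MAP learning sketch: every hypothesis consistent with the data D
# gets posterior 1/|VS_{H,D}|, every other hypothesis gets posterior 0.

def consistent(h, data):
    """h: a callable hypothesis; data: list of (x, label) pairs."""
    return all(h(x) == label for x, label in data)

def posteriors(H, data):
    vs = [h for h in H if consistent(h, data)]    # version space VS_{H,D}
    return {h: (1.0 / len(vs) if h in vs else 0.0) for h in H}

# Toy example: hypotheses are threshold classifiers over integers.
H = [lambda x, t=t: x >= t for t in range(5)]
data = [(1, False), (3, True)]
print([round(p, 2) for p in posteriors(H, data).values()])  # only t=2 and t=3 survive
```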

Bayes Theorem and Concept Learning


FIND-S outputs a MAP hypothesis if P(h1) ≥ P(h2) whenever h1 is more specific than h2:

h_MAP = argmax_{h ∈ H} P(D | h) P(h) = h_most specific


Maximum Likelihood and Least-Squared Error Hypotheses


(Figure: training values d_i plotted over x, with a candidate hypothesis h and the maximum likelihood hypothesis h_ML fitted to the points (x_i, d_i).)

Maximum Likelihood and Least-Squared Error Hypotheses


Assuming the training values d_i are corrupted by normally distributed noise:

h_ML = argmax_{h ∈ H} P(D | h)
     = argmax_{h ∈ H} ∏_{i=1..m} P(d_i | h)
     = argmax_{h ∈ H} ∏_{i=1..m} k · e^{−c (d_i − h(x_i))²}
     = argmin_{h ∈ H} Σ_{i=1..m} (d_i − h(x_i))²

where k and c are positive constants determined by the normal distribution.
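Under this Gaussian-noise assumption, maximizing the likelihood is the same as minimizing the squared error, so an ordinary least-squares fit already yields h_ML. A small numpy sketch with a linear hypothesis space and synthetic, illustrative data:

```python
import numpy as np

# Synthetic data: d_i = f(x_i) + Gaussian noise (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
d = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# Linear hypothesis space h(x) = a*x + b: least squares gives the ML hypothesis,
# because minimizing sum_i (d_i - h(x_i))^2 maximizes the Gaussian likelihood.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, d, rcond=None)
print(a, b)  # close to the true slope 2.0 and intercept 1.0
```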

Minimum Description Length Principle


h_MAP = argmax_{h ∈ H} P(D | h) P(h)
      = argmax_{h ∈ H} [log₂ P(D | h) + log₂ P(h)]
      = argmin_{h ∈ H} [−log₂ P(D | h) − log₂ P(h)]

−log₂ P(h): the description length of h under the optimal encoding
−log₂ P(D | h): the description length of D given h under the optimal encoding
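The equivalence can be checked numerically: picking the hypothesis with the smallest total code length −log₂ P(D | h) − log₂ P(h) selects the same hypothesis as MAP. The probabilities below are illustrative values only.

```python
import math

# Illustrative priors and likelihoods (assumed numbers, not from the slides).
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}       # P(h)
likelihoods = {"h1": 0.1, "h2": 0.4, "h3": 0.3}  # P(D | h)

def description_length(h):
    # bits to encode h plus bits to encode D given h, under optimal codes
    return -math.log2(likelihoods[h]) - math.log2(priors[h])

h_mdl = min(priors, key=description_length)
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
print(h_mdl, h_map)  # the two criteria select the same hypothesis
```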

Bayes Optimal Classifier


What is the most probable hypothesis given the training data?

versus

What is the most probable classification of a new instance given the training data?


Bayes Optimal Classifier


Hypothesis space H = {h1, h2, h3}
Posterior probabilities: P(h1 | D) = .4, P(h2 | D) = .3, P(h3 | D) = .3 (so h1 is h_MAP)
A new instance x is classified positive by h1 and negative by h2 and h3.
What is the most probable classification of x?


Bayes Optimal Classifier


The most probable classification of a new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities:

argmax_{c ∈ C} P(c | D) = argmax_{c ∈ C} Σ_{h ∈ H} P(c | h) P(h | D)
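Applied to the three-hypothesis example above, this weighted vote classifies x as negative even though h_MAP alone says positive. A minimal sketch:

```python
# Bayes optimal classification for the example above:
# posteriors P(h | D) and each hypothesis's (deterministic) prediction for x.
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
prediction = {"h1": "+", "h2": "-", "h3": "-"}

def class_probability(c):
    # sum_h P(c | h) * P(h | D), where P(c | h) is 1 if h predicts c, else 0
    return sum(p for h, p in posterior.items() if prediction[h] == c)

best = max(["+", "-"], key=class_probability)
print(class_probability("+"), class_probability("-"), best)  # 0.4 0.6 -
```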


Naive Bayes Classifier


Each instance x is described by a conjunction of attribute values <a1, a2, ..., an>.
The target function f(x) can take on any value from a finite set C.
The task is to assign the most probable target value to a new instance.


Naive Bayes Classifier


c_MAP = argmax_{c ∈ C} P(c | a1, a2, ..., an) = argmax_{c ∈ C} P(a1, a2, ..., an | c) P(c)

c_NB = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(ai | c)

assuming that a1, a2, ..., an are conditionally independent given c



Naive Bayes Classifier


x = (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong)

P(PlayTennis = yes | x) = ?
P(PlayTennis = no | x) = ?

Here x plays the role of the attribute vector (a1, a2, ..., an) and the PlayTennis value is the class c.
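A sketch of the naive Bayes decision for this instance, using conditional probability estimates taken from Mitchell's 14-example PlayTennis table (the numbers below should be verified against that table):

```python
# Naive Bayes decision for x = (sunny, cool, high, strong),
# with probability estimates assumed from Mitchell's PlayTennis table.
p_class = {"yes": 9 / 14, "no": 5 / 14}
p_attr = {
    "yes": {"sunny": 2 / 9, "cool": 3 / 9, "high": 3 / 9, "strong": 3 / 9},
    "no":  {"sunny": 3 / 5, "cool": 1 / 5, "high": 4 / 5, "strong": 3 / 5},
}
x = ["sunny", "cool", "high", "strong"]

def score(c):
    # P(c) * product over attributes of P(a_i | c)
    s = p_class[c]
    for a in x:
        s *= p_attr[c][a]
    return s

print(score("yes"), score("no"))       # ~0.0053 vs ~0.0206
print(max(["yes", "no"], key=score))   # naive Bayes predicts "no"
```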


Naive Bayes Classifier


Estimating probabilities with the m-estimate:

P(ai | c) = (nc + m·p) / (n + m)

n: total number of training examples of a particular class
nc: number of those examples having a particular attribute value
m: equivalent sample size
p: prior estimate of the probability (= 1/k, where k is the number of possible values of the attribute)

Naive Bayes Classifier


Learning to classify text:

c_NB = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(ai = wk | c) = argmax_{c ∈ C} P(c) ∏_{i=1..n} P(wk | c)

where ai is the word at position i in the text and wk ranges over the vocabulary, assuming that every word has the same chance of occurring at any position.
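A compact sketch of this text classifier: word probabilities are shared across positions and estimated with the smoothed estimate (nk + 1) / (n + |Vocabulary|), in the spirit of the text-classification algorithm in Mitchell's Chapter 6. The toy documents and labels are made up for illustration.

```python
import math
from collections import Counter

# Toy training documents (made up for illustration).
docs = [("good great fun", "pos"), ("bad boring bad", "neg"), ("great fun", "pos")]

vocab = {w for text, _ in docs for w in text.split()}
classes = {c for _, c in docs}

p_class = {c: sum(1 for _, lbl in docs if lbl == c) / len(docs) for c in classes}
word_counts = {c: Counter(w for text, lbl in docs if lbl == c for w in text.split())
               for c in classes}
total_words = {c: sum(word_counts[c].values()) for c in classes}

def log_score(text, c):
    # log P(c) + sum_i log P(w_i | c), with P(w | c) = (n_k + 1) / (n + |Vocab|)
    s = math.log(p_class[c])
    for w in text.split():
        s += math.log((word_counts[c][w] + 1) / (total_words[c] + len(vocab)))
    return s

print(max(classes, key=lambda c: log_score("great fun", c)))  # pos
```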

Exercises
In Mitchell's Machine Learning (Chapter 6): Exercises 6.1 to 6.4.

