Bayesian networks
Chapter 14, AIMA
Inference
• Inference in the statistical setting means computing the probabilities of different outcomes given the available information:
$$P(\text{Outcome} \mid \text{Information})$$
[Figure: example network with the nodes Weather, Cavity, Toothache and Catch]
The alarm network
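A burglar alarm responds to both earthquakes and burglaries.
[Figure: the alarm network; Burglary and Earthquake are parents of Alarm, which is the parent of JohnCalls and MaryCalls]
Two neighbors, John and Mary, have promised to call you when the alarm goes off.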
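[Figure: the cancer network, with nodes Age (A), Gender (G), Toxics (T), Smoking (S), Cancer (C), Genetic Damage (GD), Serum Calcium (SC) and Lung Tumour (LT)]
$$P(A,G,T,S,C,SC,LT,GD) = P(A)\,P(G)\,P(T \mid A)\,P(S \mid A,G)\,P(C \mid T,S)\,P(GD)\,P(SC \mid C)\,P(LT \mid C,GD)$$
[Figure: the subnetwork Genetic Damage, Cancer, Serum Calcium, Lung Tumour]
$$P(SC,C,LT,GD) = P(SC \mid C)\,P(LT \mid C,GD)\,P(C)\,P(GD)$$
From Breese and Koller 1997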
The product (chain) rule
$$P(X_1 = x_1 \wedge X_2 = x_2 \wedge \dots \wedge X_n = x_n) = P(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$
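To make the product rule concrete, the sketch below writes out one factor per node from a mapping of each node to its parents. The helper name `factorization` and the parent-map encoding are hypothetical, not from the slides. Applied to the parent sets read off the cancer network, it reproduces the factorization given earlier.

```python
from typing import Dict, List


def factorization(parents: Dict[str, List[str]]) -> str:
    """Write the product rule P(x1, ..., xn) = prod_i P(xi | parents(Xi))
    as a string, one factor per node, in the dict's insertion order."""
    factors = []
    for node, pars in parents.items():
        if pars:
            factors.append(f"P({node}|{','.join(pars)})")
        else:
            factors.append(f"P({node})")  # root node: unconditional prior
    return "".join(factors)


# Parent sets read off the cancer-network factorization (A=Age, G=Gender,
# T=Toxics, S=Smoking, C=Cancer, GD=Genetic Damage, SC=Serum Calcium, LT=Lung Tumour).
cancer_parents = {
    "A": [], "G": [], "T": ["A"], "S": ["A", "G"],
    "C": ["T", "S"], "GD": [], "SC": ["C"], "LT": ["C", "GD"],
}
print(factorization(cancer_parents))
# P(A)P(G)P(T|A)P(S|A,G)P(C|T,S)P(GD)P(SC|C)P(LT|C,GD)
```

The point of the exercise: the network structure alone fixes which conditional tables are needed, and each table is only as large as the node's parent set.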
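Bayes network node is a function
[Figure: node C with parents A and B]
A BN node is a conditional distribution function:
• Inputs = parent values
• Output = distribution over the node's values
Example with a continuous node: $P(C \mid \neg a, b) = U[0.7, 1.9]$, i.e. C is uniformly distributed on [0.7, 1.9] when A is false and B is true.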
Alarm (parents: Burglary, Earthquake)
B    E    P(A=a)
b    e    0.95
b    ¬e   0.94
¬b   e    0.29
¬b   ¬e   0.001

JohnCalls (parent: Alarm)
A    P(J=j)
a    0.90
¬a   0.05

MaryCalls (parent: Alarm)
A    P(M=m)
a    0.70
¬a   0.01
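Read as code, a node is a function from its parent values to a distribution over its own values. A minimal sketch, assuming Boolean variables encoded as Python `bool`; the function names `alarm_node` and `john_calls_node` are hypothetical, and the probabilities are copied from the tables above.

```python
from typing import Dict

# P(A = a | B, E) from the Alarm table, keyed by (burglary, earthquake).
P_ALARM_GIVEN_BE = {
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}


def alarm_node(burglary: bool, earthquake: bool) -> Dict[bool, float]:
    """Inputs: parent values. Output: distribution over the Alarm values."""
    p_true = P_ALARM_GIVEN_BE[(burglary, earthquake)]
    return {True: p_true, False: 1.0 - p_true}


def john_calls_node(alarm: bool) -> Dict[bool, float]:
    """Same idea for JohnCalls, whose only parent is Alarm."""
    p_true = 0.90 if alarm else 0.05
    return {True: p_true, False: 1.0 - p_true}


print(alarm_node(burglary=False, earthquake=True))
print(john_calls_node(alarm=True))
```

Only P(A=a | ...) is stored for each parent combination; the probability of the other value follows because each returned distribution must sum to 1.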
Example: The alarm network
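$$\begin{aligned}
P(j \wedge m \wedge a \wedge \neg b \wedge \neg e) &= P(\neg b)\,P(\neg e)\,P(a \mid \neg b, \neg e)\,P(m \mid a)\,P(j \mid a) \\
&= 0.999 \cdot 0.998 \cdot 0.001 \cdot 0.70 \cdot 0.90 \approx 0.00063
\end{aligned}$$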
[Figure: the alarm network with priors P(B=b) = 0.001 and P(E=e) = 0.002]
This is the probability of "no earthquake, no burglary, but an alarm, and both Mary and John make the call".
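The calculation above can be checked by coding the full joint as a product of one CPT entry per node. A minimal sketch; the helper names `bern` and `joint` are hypothetical, and the numbers are the priors and CPTs from the slides.

```python
# Priors and CPTs of the alarm network, copied from the tables above.
P_B = 0.001                                         # P(B = b)
P_E = 0.002                                         # P(E = e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A = a | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J = j | A)
P_M = {True: 0.70, False: 0.01}                     # P(M = m | A)


def bern(p_true: float, value: bool) -> float:
    """Probability that a Boolean variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(b: bool, e: bool, a: bool, j: bool, m: bool) -> float:
    """One entry of the full joint: the product of one CPT entry per node."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


p = joint(b=False, e=False, a=True, j=True, m=True)
print(f"{p:.5f}")  # 0.00063
```

Summing `joint` over all 32 assignments gives 1, which is a quick sanity check that the CPTs define a proper distribution.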
Meaning of Bayesian network
The general chain rule (always true):
$$\begin{aligned}
P(x_1, x_2, \dots, x_n) &= P(x_1 \mid x_2, x_3, \dots, x_n)\,P(x_2, x_3, \dots, x_n) \\
&= P(x_1 \mid x_2, x_3, \dots, x_n)\,P(x_2 \mid x_3, x_4, \dots, x_n)\,P(x_3, x_4, \dots, x_n) \\
&= \dots = \prod_{i=1}^{n} P(x_i \mid x_{i+1}, \dots, x_n)
\end{aligned}$$
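[Figure: the cancer subnetwork (Genetic Damage, Cancer, Serum Calcium, Lung Tumour)]
A node X is conditionally independent of its non-descendants Z_{1j}, ..., Z_{nj}, given its parents U_1, ..., U_m:
$$P(X, Z_{1j}, \dots, Z_{nj} \mid U_1, \dots, U_m) = P(X \mid U_1, \dots, U_m)\,P(Z_{1j}, \dots, Z_{nj} \mid U_1, \dots, U_m)$$
If the variables are numbered so that every node's parents come after it, the conditioning set x_{i+1}, ..., x_n of each chain-rule factor contains the parents of X_i and otherwise only non-descendants of X_i. By this independence, P(x_i | x_{i+1}, ..., x_n) = P(x_i | parents(X_i)), so the general chain rule reduces to the Bayesian-network product rule.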
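As a worked instance (the variable ordering is chosen here for illustration), apply the general chain rule to the alarm network with the variables ordered J, M, A, B, E, so that every node's parents come after it. Each factor then collapses to the node's own CPT: J and M are independent of their non-descendants given A, and B is a root that does not depend on the non-descendant E.

```latex
\begin{aligned}
P(j, m, a, b, e) &= P(j \mid m, a, b, e)\,P(m \mid a, b, e)\,P(a \mid b, e)\,P(b \mid e)\,P(e) \\
                 &= P(j \mid a)\,P(m \mid a)\,P(a \mid b, e)\,P(b)\,P(e)
\end{aligned}
```

The second line is exactly the product-rule factorization used in the alarm-network example.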
Markov blanket
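[Figure: node X with its parents U_1, ..., U_m, its children Y_1, ..., Y_n, its children's other parents Z_{1j}, ..., Z_{nj}, and the remaining nodes X_1, ..., X_k]
A node is conditionally independent of all other nodes in the network, given its parents, children, and children's parents. These constitute the node's Markov blanket.
$$P(X, X_1, \dots, X_k \mid U_1, \dots, U_m, Z_{1j}, \dots, Z_{nj}, Y_1, \dots, Y_n) = P(X \mid U_1, \dots, U_m, Z_{1j}, \dots, Z_{nj}, Y_1, \dots, Y_n)\,P(X_1, \dots, X_k \mid U_1, \dots, U_m, Z_{1j}, \dots, Z_{nj}, Y_1, \dots, Y_n)$$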
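The definition of the Markov blanket translates directly into a graph query. A minimal sketch, assuming the network is given as a dict from each node to its list of parents; the function name `markov_blanket` is hypothetical.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], node: str) -> Set[str]:
    """Parents, children, and children's other parents of `node`.

    `parents` maps every node in the DAG to the list of its parents."""
    children = {n for n, ps in parents.items() if node in ps}
    blanket = set(parents[node]) | children
    for child in children:
        blanket |= set(parents[child])  # co-parents of each child
    blanket.discard(node)               # a node is not in its own blanket
    return blanket


alarm_net = {
    "Burglary": [], "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
}
print(sorted(markov_blanket(alarm_net, "Burglary")))  # ['Alarm', 'Earthquake']
```

Earthquake enters Burglary's blanket only as a co-parent of Alarm: once Alarm is observed, the two causes become dependent.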
Efficient representation of PDs
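How should P(C | a, b) be represented? A compact representation depends on the types of the parents A, B and of the child C (parent type → child type):
[Figure: node C with parents A and B]
• Boolean → Boolean
• Boolean → Discrete
• Boolean → Continuous
• Discrete → Boolean
• Discrete → Discrete
• Discrete → Continuous
• Continuous → Boolean
• Continuous → Discrete
• Continuous → Continuous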
Efficient representation of PDs
• Boolean → Boolean: Noisy-OR, Noisy-AND
• Boolean/Discrete → Discrete: Noisy-MAX
• Boolean/Discrete/Continuous → Continuous: parametric distribution (e.g. Gaussian)
• Continuous → Boolean: logit/probit
Noisy-OR example
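Boolean → Boolean
Table: P(E | C1, C2, C3) for the eight cause configurations
C1   0 1 0 0 1 1 0 1
C2   0 0 1 0 1 0 1 1
C3   0 0 0 1 0 1 1 1
The effect (E) is off (false) when none of the causes is true. The probability of the effect increases with the number of true causes.
Each cause is coded as C_i = 1 if true and C_i = 0 if false.
[Figure: causes C1, C2, ..., Cn, each with an arrow to the effect E, with CPT P(E | C1, ...)]
$$P(E = e_k \mid C_1, C_2, \dots, C_n) = \prod_{i=1}^{n} q_{i,k}^{C_i}$$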
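A minimal noisy-OR sketch, assuming a Boolean effect and reading the product formula as the probability that the effect stays off. Under that reading, q_i (the slide's q_{i,k} for the "off" value) is the probability that cause i, when true, fails to trigger the effect. The q values and the name `noisy_or` are hypothetical.

```python
from typing import Sequence


def noisy_or(q: Sequence[float], causes: Sequence[int]) -> float:
    """P(E = true | causes) under a noisy-OR model.

    causes[i] is 1 if cause i is true and 0 if false; q[i] is the probability
    that cause i, when true, fails to switch the effect on. The effect stays
    off with probability prod_i q[i] ** causes[i].
    """
    p_off = 1.0
    for q_i, c_i in zip(q, causes):
        p_off *= q_i ** c_i  # absent causes contribute a factor of 1
    return 1.0 - p_off


q = [0.6, 0.2, 0.1]  # hypothetical inhibition probabilities for C1, C2, C3
# The eight (C1, C2, C3) configurations, in the column order of the table above.
configs = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1),
           (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
for c in configs:
    print(c, round(noisy_or(q, c), 3))
```

The model needs only n numbers instead of a full table of 2^n rows, and it reproduces both properties stated above: P(E = true) is 0 when no cause is true, and adding a true cause never lowers it.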
Parametric probability densities
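Boolean/Discrete/Continuous → Continuous
Gaussian density:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = N(\mu, \sigma)$$
Linear Gaussian, whose mean α + βa depends linearly on the value a of a continuous parent:
$$P(x \mid a) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\alpha-\beta a)^2}{2\sigma^2}\right)$$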
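A minimal sketch of the two densities above, using only the standard `math` module. The function names and the parameter values in the usage line are hypothetical.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def linear_gaussian_pdf(x: float, a: float, alpha: float, beta: float, sigma: float) -> float:
    """Density of X given a continuous parent value a: a Gaussian with mean alpha + beta * a."""
    return gaussian_pdf(x, alpha + beta * a, sigma)


# Hypothetical parameters: alpha = 1, beta = 2, sigma = 0.5, parent value a = 3,
# so X is centred at 1 + 2 * 3 = 7.
print(linear_gaussian_pdf(7.0, a=3.0, alpha=1.0, beta=2.0, sigma=0.5))
```

For a discrete parent, the same idea is usually applied with a separate (α, β, σ) per parent value; here only the single continuous-parent case from the slide is shown.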
Probit & Logit
Continuous → Boolean
$$\text{Logit:}\quad P(A = a \mid x) = \frac{1}{1 + \exp[-2(\mu - x)/\sigma]}$$
$$\text{Probit:}\quad P(A = a \mid x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(x-\mu)/\sigma} e^{-t^2/2}\,dt$$
[Figure: the logistic sigmoid, P(A|x) plotted against x]
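A minimal sketch of both link functions, assuming the formulas exactly as written above. The probit is the standard normal CDF, computed with `math.erf` via Φ(z) = (1 + erf(z/√2))/2; the function names are hypothetical.

```python
import math


def logit_cpd(x: float, mu: float, sigma: float) -> float:
    """Logit model as written above: 1 / (1 + exp(-2 (mu - x) / sigma))."""
    return 1.0 / (1.0 + math.exp(-2.0 * (mu - x) / sigma))


def probit_cpd(x: float, mu: float, sigma: float) -> float:
    """Probit model: the standard normal CDF evaluated at (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for x in (-2.0, 0.0, 2.0):
    print(x, round(logit_cpd(x, mu=0.0, sigma=1.0), 3), round(probit_cpd(x, mu=0.0, sigma=1.0), 3))
```

Both give 0.5 at x = µ, with σ controlling how sharp the transition is. With the signs as written, the logit decreases in x while the probit increases; replacing µ − x by x − µ in the logit makes the two orientations agree.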
The cancer network
[Figure: the cancer network annotated with variable types; Age: discrete or continuous; Gender: discrete/boolean]
Inference in BN
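Inference means computing P(X | e), where X is a query variable and e is the set of observed values of the evidence variables. With y ranging over the values of the remaining (hidden) variables:
$$P(X \mid e) = \frac{P(X, e)}{P(e)} = \alpha\,P(X, e) = \alpha \sum_{y} P(X, e, y)$$
where α = 1/P(e) is a normalization constant.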
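The normalization trick can be sketched on a small full joint table: keep the rows consistent with the evidence, sum out the hidden variables, then rescale by α. The joint below over the Toothache, Catch and Cavity variables uses illustrative values chosen to sum to 1, and the names `JOINT`, `VARS` and `query` are hypothetical.

```python
from typing import Dict, Tuple

VARS = ("Toothache", "Catch", "Cavity")
# Illustrative full joint over (Toothache, Catch, Cavity); the values sum to 1.
JOINT: Dict[Tuple[bool, bool, bool], float] = {
    (True, True, True): 0.108, (True, False, True): 0.012,
    (True, True, False): 0.016, (True, False, False): 0.064,
    (False, True, True): 0.072, (False, False, True): 0.008,
    (False, True, False): 0.144, (False, False, False): 0.576,
}


def query(var: str, evidence: Dict[str, bool]) -> Dict[bool, float]:
    """P(var | evidence) = alpha * sum_y P(var, evidence, y) over a full joint table."""
    idx = VARS.index(var)
    unnormalized = {True: 0.0, False: 0.0}
    for assignment, p in JOINT.items():
        # Keep rows consistent with the evidence; hidden variables are summed out.
        if all(assignment[VARS.index(name)] == val for name, val in evidence.items()):
            unnormalized[assignment[idx]] += p
    alpha = 1.0 / sum(unnormalized.values())  # alpha = 1 / P(e)
    return {val: alpha * p for val, p in unnormalized.items()}


print(query("Cavity", {"Toothache": True}))
```

Here P(e) is never computed separately: it falls out as the sum of the unnormalized entries, which is why writing α instead of 1/P(e) is convenient.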
[Figure: the alarm network with P(B=b) = 0.001, P(E=e) = 0.002 and the Alarm, JohnCalls and MaryCalls CPTs tabulated above]
$$\begin{aligned}
P(B=b, E, A, j, m) &= P(j, m \mid b, E, A)\,P(b, E, A) \\
&= P(j \mid A)\,P(m \mid A)\,P(b, E, A) \\
&= P(j \mid A)\,P(m \mid A)\,P(A \mid b, E)\,P(b, E) \\
&= P(j \mid A)\,P(m \mid A)\,P(A \mid b, E)\,P(b)\,P(E) \\
&= 10^{-3}\,P(j \mid A)\,P(m \mid A)\,P(A \mid b, E)\,P(E)
\end{aligned}$$
The second step uses that J and M depend only on A; the last uses P(b) = 0.001 = 10^{-3}.
Example: The alarm network
What is the probability of a burglary if both John and Mary call?
$$P(B \mid j, m) = \alpha \sum_{E \in \{e, \neg e\}} \sum_{A \in \{a, \neg a\}} P(B, E, A, j, m)$$
$$P(b, j, m) = 10^{-3} \sum_{E \in \{e, \neg e\}} \sum_{A \in \{a, \neg a\}} P(j \mid A)\,P(m \mid A)\,P(A \mid b, E)\,P(E) \approx 0.5922 \times 10^{-3}$$
$$P(\neg b, j, m) \approx 1.492 \times 10^{-3}$$
$$\alpha = P(j, m)^{-1} = [P(b, j, m) + P(\neg b, j, m)]^{-1} \approx [2.084 \times 10^{-3}]^{-1}$$
$$P(b \mid j, m) = \alpha P(b, j, m) \approx 0.284$$
$$P(\neg b \mid j, m) = \alpha P(\neg b, j, m) \approx 0.716$$
Answer: 28%
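The whole query can be reproduced by enumeration: sum the product of CPT entries over the hidden variables E and A for each value of B, then normalize. A minimal sketch; the helper names `bern` and `p_b_j_m` are hypothetical, and the numbers are the slides' priors and CPTs.

```python
from itertools import product

P_B = 0.001                                         # P(B = b)
P_E = 0.002                                         # P(E = e)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A = a | B, E)
P_J = {True: 0.90, False: 0.05}                     # P(J = j | A)
P_M = {True: 0.70, False: 0.01}                     # P(M = m | A)


def bern(p_true: float, value: bool) -> float:
    """Probability that a Boolean variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def p_b_j_m(b: bool) -> float:
    """P(B = b, j, m): sum over hidden E and A of the product of CPT entries."""
    total = 0.0
    for e, a in product((True, False), repeat=2):
        total += (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                  * P_J[a] * P_M[a])  # evidence j and m are both true
    return total


p_b = p_b_j_m(True)            # about 0.5922e-3
p_not_b = p_b_j_m(False)       # about 1.492e-3
alpha = 1.0 / (p_b + p_not_b)  # 1 / P(j, m)
print(f"P(b | j, m) = {alpha * p_b:.3f}")  # P(b | j, m) = 0.284
```

Even with both neighbors calling, a burglary stays fairly unlikely because the prior P(b) = 0.001 is so small.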