What is ‘a good measure’?
• Precision
• Significance
• Applicable to various types of data
Joint entropy
• Venn diagram for the definition of entropies
[Figure: Venn diagram with circles H(X) and H(Y); their union is the joint entropy H(X,Y)]
Example of joint entropy
• 성도 (X) and 성완 (Y) each tossed a coin 10 times
• 0 : head, 1 : tail
• X : { 0, 0, 0, 0, 0, 1, 1, 1, 1, 1 }
• Y : { 0, 0, 1, 0, 0, 0, 1, 0, 1, 1 }
• H(X,Y) = 1.85
• Note: $H(X,Y) \le H(X) + H(Y)$
R code for the calculation
> X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
> Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
>
> freq <- table(X, Y)   # 2x2 contingency table of joint outcomes
>
> ret <- 0
> for( i in 1:2 ) {
+   for( j in 1:2 ) {
+     # accumulate -p * log2(p) over the joint probabilities
+     ret <- ret - freq[i,j]/10.0 * log2(freq[i,j]/10.0)
+   }
+ }
> ret
[1] 1.846439
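As a quick cross-check (a sketch assuming the 'entropy' package from the next slide is installed), the same value can be read off the joint table directly:
> library("entropy")
> entropy(table(X, Y), unit="log2")   # joint entropy H(X,Y) in bits
[1] 1.846439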
‘entropy’ library
> library("entropy")
> x1 = runif(10000)                          # 10,000 draws from Uniform(0,1)
> hist(x1, xlim=c(0,1), freq=FALSE)
> y1 = discretize(x1, numBins=10, r=c(0,1))  # counts over 10 equal-width bins
> entropy(y1)                                # natural-log units by default
[1] 2.30244
> y1
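Since a uniform variable over 10 bins has maximal entropy log(10) ≈ 2.3026, the value above is nearly maximal. Passing unit="log2" reports the result in bits instead:
> entropy(y1, unit="log2")   # close to log2(10) ≈ 3.32 bits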
Mutual information
• Measure of mutual dependence or interaction
• $I(X;Y) = H(X) + H(Y) - H(X,Y) \le \min\{H(X), H(Y)\}$
[Figure: Venn diagram; the overlap of H(X) and H(Y) is I(X;Y)]
Mutual information
• Some properties of mutual information (verified numerically below)
• $I(X;Y) = \sum_{x \in X,\, y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$
• $I(X;Y) = I(Y;X)$
• $I(X;Y) \le \min\{H(X), H(Y)\}$
• $I(X;Y) = H(X) - H(X|Y)$
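These identities are easy to check numerically; a minimal sketch on the coin-toss data from the earlier slide, using the 'entropy' package in log2 units:
> library("entropy")
> X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
> Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
> HX <- entropy(table(X), unit="log2")       # H(X) = 1
> HY <- entropy(table(Y), unit="log2")       # H(Y) ≈ 0.971
> HXY <- entropy(table(X, Y), unit="log2")   # H(X,Y) ≈ 1.846
> HX + HY - HXY                              # I(X;Y) ≈ 0.125
> mi.empirical(table(X, Y), unit="log2")     # same value; and I(X;Y) = I(Y;X)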
How to measure mutual information
Contingency Table (counts)
Genotype   AABB  AABb  AAbb  AaBB  AaBb  Aabb  aaBB  aaBb  aabb   sum
Case         39    91    95    92    14    31    63     4    71   500
Control     100    15    55     5    22   150    50    93    10   500
sum         139   106   150    97    36   181   113    97    81  1000

Frequency Table
Genotype    AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case       0.039  0.091  0.095  0.092  0.014  0.031  0.063  0.004  0.071  0.500
Control    0.100  0.015  0.055  0.005  0.022  0.150  0.050  0.093  0.010  0.500
sum        0.139  0.106  0.150  0.097  0.036  0.181  0.113  0.097  0.081  1.000
How to measure mutual information
Entropy Table (each cell is -p log2 p for the corresponding frequency; the margins hold the -p log2 p terms of the marginal distributions)
Genotype    AABB   AABb   AAbb   AaBB   AaBb   Aabb   aaBB   aaBb   aabb    sum
Case       0.183  0.315  0.323  0.317  0.086  0.155  0.251  0.032  0.271  0.500
Control    0.332  0.091  0.230  0.038  0.121  0.411  0.216  0.319  0.066  0.500
sum        0.396  0.343  0.411  0.326  0.173  0.446  0.355  0.326  0.294

Summing the margins and cells gives H(X) = 3.070, H(Y) = 1.000, and H(X,Y) = 3.757, so
I(X;Y) = H(X) + H(Y) - H(X,Y) = 3.070 + 1.000 - 3.757 ≈ 0.31
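The same number can be reproduced from the count table; a minimal sketch with the 'entropy' package:
> library("entropy")
> counts <- rbind(Case    = c( 39, 91, 95, 92, 14,  31, 63,  4, 71),
+                 Control = c(100, 15, 55,  5, 22, 150, 50, 93, 10))
> mi.empirical(counts, unit="log2")   # I(X;Y) ≈ 0.31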
‘entropy’ library
> x1 = runif(10000)
> x2 = runif(10000)                                     # independent of x1
> y2d = discretize2d(x1, x2, numBins1=10, numBins2=10)  # 10x10 table of bin counts
> H12 = entropy(y2d)                                    # joint entropy H(X1,X2)
>
> # mutual information
> mi.empirical(y2d)                                     # approximately zero
> H1 = entropy(rowSums(y2d))                            # marginal entropy H(X1)
> H2 = entropy(colSums(y2d))                            # marginal entropy H(X2)
> H1+H2-H12                                             # I = H1 + H2 - H12, same value
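For contrast (an illustrative addition, not from the original slide), a dependent pair gives a clearly positive value:
> x3 = x1 + 0.1 * runif(10000)                          # x3 is a noisy copy of x1
> y2d.dep = discretize2d(x1, x3, numBins1=10, numBins2=10)
> mi.empirical(y2d.dep)                                 # substantially greater than zero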
Applications
Association measure between genomic features and outcome
$I(X_1, X_2; Y) = H(X_1, X_2) + H(Y) - H(X_1, X_2, Y)$
where $(X_1, X_2)$ is a pair of genomic features and $Y$ is the binary outcome
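A minimal sketch of this three-way measure, treating the pair $(X_1, X_2)$ as a single joint variable; the vectors x1, x2 (genotypes coded 0/1/2) and y (binary outcome) are hypothetical stand-ins:
> library("entropy")
> x1 <- c(0, 1, 2, 0, 1, 2, 0, 1, 2, 1)
> x2 <- c(0, 0, 1, 1, 2, 2, 0, 1, 2, 2)
> y  <- c(0, 0, 1, 0, 1, 1, 0, 1, 1, 0)
> x12 <- interaction(x1, x2, drop=TRUE)      # encode the pair (X1, X2) as one variable
> entropy(table(x12), unit="log2") + entropy(table(y), unit="log2") -
+   entropy(table(x12, y), unit="log2")      # I(X1, X2; Y)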
Mutual Information With Clustering
(Leem et al., 2014) (1/2)
[Figure: SNPs, including the causative SNPs, are grouped into three clusters; each centroid is one of the 3 SNPs with the highest mutual information value, each cluster contributes m candidate SNPs, and a SNP is scored by its distances to the centroids (Score = d1 + d2)]
Mutual Information With Clustering
(Leem et al., 2014) (2/2)
• Mutual information
  • Used as the distance measure for clustering
  • K-means clustering algorithm
• Candidate selection
  • Reduces the search space dramatically
• Can detect high-order epistatic interactions
• Also shows better performance (statistical power, execution time) than previous methods (a rough sketch of the clustering idea follows)
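A loose sketch of the idea only, not Leem et al.'s actual implementation: pairwise mutual information serves as the similarity guiding cluster assignment, with the 3 highest-MI SNPs as centroids; all data and index choices below are hypothetical:
> library("entropy")
> set.seed(1)
> snps <- matrix(sample(0:2, 100 * 50, replace=TRUE), nrow=100)   # 100 samples x 50 SNPs
> mi_pair <- function(a, b) mi.empirical(table(a, b), unit="log2")
> centroids <- c(1, 2, 3)                     # pretend these are the 3 highest-MI SNPs
> cluster <- sapply(1:ncol(snps), function(j)
+   which.max(sapply(centroids, function(k) mi_pair(snps[, j], snps[, k]))))
> table(cluster)                              # each cluster yields its m candidates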
Outcome-guided mutual information network
in network-based prediction (Jeong et al., 2015)
(1/2)
• Two parameters, $\theta$ and $\sigma$:

$\theta = \max_{i \neq j} I_{\text{avg}}(i, j)$, where $I_{\text{avg}}(i, j) = \frac{1}{30} \sum_{p=1}^{30} I(g_i, g_j; Y_p)$

$G_\sigma = \{(g_i, g_j) \mid (g_i, g_j) \in P \text{ and } I(g_i, g_j; Y) \ge \theta(1+\sigma)\}$

[Figure: number of edges as a function of the mutual-information threshold, with $\theta$ and $\theta(1+\sigma)$ marked]
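A minimal sketch of constructing $G_\sigma$ under assumptions: expr is a hypothetical discretized feature matrix (samples x genes), y a binary outcome, and sigma an illustrative value; here $\theta$ is simply taken as the maximum pairwise score:
> library("entropy")
> set.seed(1)
> expr <- matrix(sample(0:2, 100 * 5, replace=TRUE), nrow=100)
> y <- sample(0:1, 100, replace=TRUE)
> pair_mi <- function(i, j) {                        # I(g_i, g_j; Y) as defined above
+   gij <- interaction(expr[, i], expr[, j], drop=TRUE)
+   entropy(table(gij), unit="log2") + entropy(table(y), unit="log2") -
+     entropy(table(gij, y), unit="log2")
+ }
> pairs <- t(combn(ncol(expr), 2))                   # all candidate pairs P
> scores <- apply(pairs, 1, function(p) pair_mi(p[1], p[2]))
> theta <- max(scores); sigma <- -0.05               # sigma tunes the sparsity of G_sigma
> pairs[scores >= theta * (1 + sigma), , drop=FALSE] # edges of G_sigma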
Outcome-guided mutual information network
in network-based prediction (Jeong et al., 2015)
(2/2)
[Figure: feature network; relevance networks for gastritis (Jeong and Sohn, 2014)]
MINA: Mutual Information Network Analysis framework
https://github.com/hhjeong/MINA
Conclusion
Problems with mutual information and their solutions
• Noise in continuous data
  • Solution: alternative discretization techniques
• Assessment of significance
  • Solution: permutation test (a minimal sketch follows below)
  • We should also account for the multiple testing problem.
• Mutual information is not a metric!
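A minimal sketch of such a permutation test, reusing the coin-toss data from earlier; the number of permutations is an illustrative choice:
> library("entropy")
> set.seed(1)
> X <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
> Y <- c(0, 0, 1, 0, 0, 0, 1, 0, 1, 1)
> observed <- mi.empirical(table(X, Y), unit="log2")
> null_mi <- replicate(1000, mi.empirical(table(X, sample(Y)), unit="log2"))
> mean(null_mi >= observed)            # one-sided empirical p-value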