COMP 4641 - Social Information Networks Analysis and Engineering

Social Media & Networks
Tuesday, March 21, 2017 3:34 PM

Introduction
- Why study networks:
  ○ Natural fit with interaction: Users only interact with a small subset of others
  ○ Can predict behavior with network view
  ○ Social impact: drug design
  ○ Universality: Networks from science, nature & tech more similar than expected
  ○ Shared vocabulary between fields
- What to study in networks:
  ○ Structure
  ○ Evolution (how the network came to have such structure)
  ○ Processes & Dynamics: How do info, behaviors, diseases spread in the network?
- Why study social media:
  ○ Observe social interaction at scale:
    ▪ ↑ Confidence in results
    ▪ Certain effects only seen at scale
  ○ Data availability:
    ▪ ↓ Field work
    ▪ Sites have complete history record → Provide entire evolution of user groups
- Challenges of studying social media:
  ○ Miss many local interactions
  ○ Links "mean" less
  ○ Hard to compare networks

Classical paper: The Strength of Weak Ties
(Granovetter, 1973)
- Question: How do large-scale patterns (micro → macro) emerge?
  ○ Micro-lvl interaction within small group
  ○ Macro-lvl patterns within society
- Previously: Strong ties considered important (close friends, family)
- Idea:
  ○ Triadic closure: 2 people have a common strong tie → Must have a tie between each other
  ○ Bridge:
    ▪ Link that is the only path between 2 users
      → Must be a weak tie (according to triadic closure)
    ▪ However, helps connect communities:
      □ Must necessarily carry new info
      □ Helps map micro → macro
  ⇒ Strength of weak ties

Classical paper: Neocortex Size as Constraint on Group Size in Primates
(Robin D, 1989)
- Hypothesis: Large brain size due to "social" nature of primates
  ○ Measure "social" lvl by looking at typical group size
  ○ If true, then brain size should correlate with being "social"
- Findings: [figure]
- Implication: # of relationships an individual can maintain is bounded by neocortex size

Classical paper: Experimental Study of Small World Problem
(Jeffrey T, Stanley M, 1969)
- Experiment:
  ○ Select 296 peo in Nebraska & Boston
  ○ Mail packet to them:
    ▪ Specific destination person
    ▪ Ask to forward to someone they know personally
- Question: How long are successful paths?
- Findings:
  48% of the 64 chains pass through 3 people
- Implication: Not only do short chains exist
  ○ But people can find them
  ○ With only local information
  ⇒ Social network navigable

Leveraging Social Media: Predicting Future with Social Media
(Sitaram A, Bernardo H)
- Question: Can we extract info from conversation on social media networks?
  (Collective wisdom)
- Method: Use Twitter to predict box-office return for movies
  ○ Search data repeatedly for each movie 3 weeks around release date
  ○ Make prediction
    ▪ How many tweets before movie release date?
    ▪ Compare with change of Hollywood Stock Exchange index
- Result: Surprisingly accurate
- Comment: Do movie studios only promote movies they expect to be hits?
Leveraging Social Media: Meme-tracking & Dynamics of News Cycle
(Jure L, Lars B, Jon K, 2009)
- Question: Can social media shed light on info flow?
- Method:
  ○ Collect data: over 90 million docs
  ○ Quote clustering on graph
    ▪ Graph structure: Node = Quote, Edge (weighted) = Inclusion relation
    ▪ Remove all but strongest outgoing edge
- Findings:
  ○ Nature of 24-hour news cycle: Memes quickly enter & leave collective consciousness
  ○ Media access peaks 2.5 hours before blog access peaks
  ○ Blog access volume persists much longer

Basic Network Properties
Tuesday, March 21, 2017 5:11 AM

Networks vs Graphs
| Network                 | Graph                          |
| Refer to real sys       | Math representation of network |
| Terminology: node, link | Terminology: vertex, edge      |
Usually network & graph used interchangeably

Graph Structure of the Web
- Modelling method: Directed graph
  Edge (u, v) ⇔ Website u has hyperlink to v
- Resulting structure: Bow-tie
  ○ 1 giant SCC
  ○ IN: Can reach SCC, but can't be reached from SCC
    New sites not yet discovered
  ○ OUT: Accessible from SCC, but does not link back to SCC
    Corporate website (only internal links)
  ○ TENDRILS: Can't reach SCC, can't be reached from SCC
  ○ TUBES: Link IN → OUT without going through SCC
- Issues: Structure discovered depends on crawling process

Fundamentals of undirected graph
- Deg k_u: # of edges adj to node u
  ○ Avg deg:
    $\bar{k} = \frac{1}{N}\sum_{i=1}^{N} k_i = \frac{2E}{N}$
  ○ Deg distr: Prob randomly chosen node has deg k
    $P(k) = \frac{N_k}{N}$
    N_k: # nodes having deg k
    N: Total # nodes
- Distance/Path length h_uv: # edges along shortest path connecting 2 nodes
  ○ h_uu = 0
    h_uv = ∞ if no path u → v
  ○ Avg path length:
    $\bar{h} = \frac{1}{2E_{max}}\sum_{(u,v),\,u \neq v} h_{uv} = \frac{1}{N(N-1)}\sum_{(u,v),\,u \neq v} h_{uv}$
    (Ignore disconnected node pairs)
  ○ Diameter: Max distance between any pair of nodes in graph
    (⇔ "longest" shortest path)
- Clustering coef C_u: Portion of u's neighbors which are also neighbors of each other
    $C_u = \frac{2e_u}{k_u(k_u - 1)}$
    e_u: # of edges among u's neighbors
  ○ Avg clustering coef:
    $\bar{C} = \frac{1}{N}\sum_{u} C_u$
- Complete (undirected) graph: All nodes adj to each other
    $E_{max} = \frac{N(N-1)}{2}$

Fundamentals of directed graph
- In-degree indeg(u): # edges pointing to u
- Out-degree outdeg(u): # edges starting from u
- Avg deg:
    $\bar{k} = \frac{E}{N}$
- Strongly connected component (SCC): All nodes in component can visit each other (A → B, B → A)
- Weakly connected component: All nodes can visit each other in "undirected" version of component
- Directed Acyclic graph (DAG): No cycle
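The undirected-graph measures above (avg deg, clustering coef, avg path length) can be checked on a toy graph. This is only a sketch: the adjacency dict `adj` and the function names are made up for illustration, and the avg path length here is averaged over connected ordered pairs only, which equals the N(N-1) form when the graph is connected.

```python
from collections import deque

def clustering(adj, u):
    """C_u = 2 e_u / (k_u (k_u - 1)); defined as 0 when k_u < 2."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    # e_u: number of edges among u's neighbors
    e_u = sum(1 for i in range(k) for j in range(i + 1, k)
              if nbrs[j] in adj[nbrs[i]])
    return 2 * e_u / (k * (k - 1))

def bfs_distances(adj, src):
    """Shortest-path length h_uv from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def average_path_length(adj):
    """Mean h_uv over ordered pairs u != v that are connected."""
    total, pairs = 0, 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs

# Toy graph (assumed for illustration): triangle a-b-c plus pendant node d on c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
avg_degree = sum(len(nb) for nb in adj.values()) / len(adj)   # 2E/N with E = 4
avg_clustering = sum(clustering(adj, u) for u in adj) / len(adj)
avg_path = average_path_length(adj)
```

Node c has 3 neighbors with 1 edge among them, so its C_u is 2·1/(3·2) = 1/3, while a and b each have C_u = 1.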

Small-World Phenomena

Small-world phenomenon
Typical length of shortest path usually small
Small-world experiment [Milgram 1967]: 6 deg of separation

Real network vs Simple graph
|                           | Real network                                | Simple Graph                            |
| Giant connected component | Exist (NOT emerge through phase transition) | Exist (emerge through phase transition) |
| Avg path length           | Small                                       | Small                                   |
| Clustering coef           | Big                                         | Small (no local structure)              |
| Deg distr                 | Power law                                   | Binomial                                |

Simple Graph Model Gnp
(Erdos-Renyi, 1960)
- Def:
  ○ Undirected, n nodes
  ○ Each edge (u, v) appears with prob p
- Edge properties:
  ○ $E_{max} = \frac{n(n-1)}{2}$
  ○ Prob Gnp-type graph has E edges:
    $P(E) = \binom{E_{max}}{E} p^{E} (1-p)^{E_{max}-E}$
    (binomial)
- Degree distr: Binomial
    $P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$
    $\bar{k} = (n-1)p$
    $\sigma^2 = p(1-p)(n-1)$
    $\frac{\sigma}{\bar{k}} = \left[\frac{1-p}{p} \cdot \frac{1}{n-1}\right]^{1/2}$
    (n (Graph size) ↑ → σ/k̄ ↓ → σ ≪ k̄ → Distr narrower → ↑ Confidence that node deg is around k̄)
- Clustering coef:
    $E(C_i) = E\left[\frac{2e_i}{k_i(k_i-1)}\right] = \frac{2}{k_i(k_i-1)} E(e_i)$
    $E(e_i) = p\,\frac{k_i(k_i-1)}{2}$
    $\Rightarrow E(C_i) = p = \frac{\bar{k}}{n-1}$
    (For fixed k̄: n (Graph size) ↑ → C ↓)
- Avg path length: small
- Connectivity: Strongly controlled by np (phase transition)
    np < 1: Small components
    np > 1: Giant connected component emerges

Small World Model
(Watts-Strogatz 1998)
- Construction:
  ○ Initialization: Low-dim regular lattice
    Each node connected to α nearest neighbors
    (high clustering coef, high diameter)
  ○ Rewiring:
    ▪ (1) For each node, clockwise around ring:
      □ Select edge connecting it to its nearest neighbor
      □ With prob p, reconnect this edge to node chosen uniformly over entire ring
    ▪ (2) Repeat (1) for 2nd, 3rd, …, α-th nearest neighbor
- Intention: ↑ Rewire → ↑ Randomness → Interpolate between regular lattice & random graph
- Observation:
  ○ Lots of rewiring required → ↓ Clustering coef
  ○ BUT: Very small amount of newly-created long-range edges (result of rewiring) → ↓ Diameter
- Model strength:
  ○ Provides insight into interplay between clustering & small-world
  ○ Captures structure of many realistic networks
  ○ Explains high clustering of real networks
- Model weakness:
  ○ Can't explain real networks' deg distr
  ○ Does not enable navigation among clusters (communities)
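The Gnp claims k̄ = (n-1)p and E(C_i) = p can be checked by sampling one graph. A rough sketch only: `gnp`, the seed and the sizes n = 400, p = 0.05 are arbitrary choices for illustration, and a single sample will only land near the expected values.

```python
import random

def gnp(n, p, seed=0):
    """Sample an undirected G(n, p) graph as an adjacency dict of sets."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:      # each edge appears independently with prob p
                adj[u].add(v)
                adj[v].add(u)
    return adj

def clustering(adj, u):
    """C_u = 2 e_u / (k_u (k_u - 1)); 0 when k_u < 2."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e_u = sum(1 for i in range(k) for j in range(i + 1, k)
              if nbrs[j] in adj[nbrs[i]])
    return 2 * e_u / (k * (k - 1))

n, p = 400, 0.05                      # assumed sizes for the demo
adj = gnp(n, p)
avg_k = sum(len(nb) for nb in adj.values()) / n
expected_k = (n - 1) * p              # k-bar = (n - 1) p
avg_c = sum(clustering(adj, u) for u in adj) / n   # should be close to p
```

Since avg_c stays near p while real networks of the same density show much larger clustering, this illustrates the "Small (no local structure)" entry in the comparison table.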

Community Structure in Network

Basic concepts
- Bridge edge: If removed → Graph disconnected
- Local bridge:
  ○ Edges whose endpoints have no common neighbor
  ○ Exclude bridge
- Edge overlap:
    $O_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$
    (# nodes that are neighbors of both i and j / # nodes that are neighbors of i or j)
    N(i): Set of i's neighbors (excluding j)
    O_ij = 0 → Local bridge

Testing of tie strength in real network
(Onnela 2007)
- Data:
  ○ Cell-phone network of 20% of a country's population
  ○ Edge strength: Aggregated call duration
- Link removal by edge strength (Low strength = weak link):
  ○ Low → High: Network disconnected sooner
  ○ High → Low: Network gradually shrinks, but does not collapse
- Link removal by edge overlap (Low overlap = weak link):
  Low → High: Network disconnected sooner, collapses
- Conclusion:
  ○ Weak ties crucial for maintaining network's structural integrity
  ○ Strong ties important for maintaining local communities

Structural Hole
- "Empty space" in network between sets of nodes not interacting closely
- Network spanning structural holes: Source of social capital

Network Constraints Measure
To what extent a person's contacts are redundant
Node's "performance" negatively associated with its network constraint
    $c_i = \sum_{j} c_{ij} = \sum_{j} \left(p_{ij} + \sum_{k} p_{ik}\,p_{kj}\right)^{2}$
    c_i: i's network constraint
    p_ij: Proportion of i's energy invested in relationship with j
Example (p_ij matrix, row i, column j):
      1    2    3    4    5
  1   0   1/4  1/4  1/4  1/4
  2  1/2   0    0    0   1/2
  3   1    0    0    0    0
  4  1/2   0    0    0   1/2
  5  1/3  1/3   0   1/3   0
(Assumption: All links equally important)
    $c_2 = (p_{25} + p_{21}p_{15})^2 + (p_{21} + p_{25}p_{51})^2 = \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{4}\right)^2 + \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{3}\right)^2$
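Edge overlap and the network constraint from the notes above can be computed on the same 5-node example. A minimal sketch, assuming the p_ij matrix comes from the adjacency `adj` below with all links equally weighted, and assuming j ranges over i's own contacts (which matches the two-term worked example); the function names are illustrative.

```python
from fractions import Fraction

# Adjacency implied by the example p_ij matrix (all links equally important)
adj = {1: {2, 3, 4, 5}, 2: {1, 5}, 3: {1}, 4: {1, 5}, 5: {1, 2, 4}}

def overlap(adj, i, j):
    """O_ij = |N(i) & N(j)| / |N(i) | N(j)|, neighbors taken excluding i and j."""
    ni, nj = adj[i] - {j}, adj[j] - {i}
    union = ni | nj
    return Fraction(len(ni & nj), len(union)) if union else Fraction(0)

def p(adj, i, j):
    """Proportion of i's energy invested in j: 1/k_i for each neighbor."""
    return Fraction(1, len(adj[i])) if j in adj[i] else Fraction(0)

def constraint(adj, i):
    """c_i = sum over contacts j of (p_ij + sum_k p_ik p_kj)^2."""
    total = Fraction(0)
    for j in adj[i]:                  # assumption: j ranges over i's contacts
        indirect = sum(p(adj, i, k) * p(adj, k, j)
                       for k in adj if k not in (i, j))
        total += (p(adj, i, j) + indirect) ** 2
    return total

c2 = constraint(adj, 2)               # (1/2 + 1/8)^2 + (1/2 + 1/6)^2
```

Here `overlap(adj, 1, 3)` is 0 because node 3 has no other neighbor, so edge (1, 3) behaves like a bridge, while `overlap(adj, 2, 5)` is 1/2.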
Girvan-Newman algorithm
- Input: Undirected graph
- Output: Hierarchical decomposition of network
- Step: Repeat until no edges left
  ○ Calculate edge betweenness
  ○ Remove edges with highest betweenness (may remove ≥ 2 edges)
  ○ Connected components are communities
- Example (14-node graph, figure not shown):
  After step 1: [1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]
  After step 2: [1, 2, 3], [4, 5, 6], [7], [8], [9, 10, 11], [12], [13], [14]
  After step 3: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]

Edge betweenness
- # shortest paths passing over particular edge
- Computation:
  ○ Initialize betweenness(w, v) = 0 ∀w, v
  ○ For each node s (taken as BFS root):
    ▪ BFS to find f(v): # shortest paths s → v
    ▪ Initialize nodeFlow(v) = 0 ∀v ≠ s
    ▪ Go upward from lowest node v on BFS tree:
      □ nodeFlow(v) += 1
      □ For each node w one level higher than v on BFS tree & edge (w, v) exists:
        Δ = f(w) / f(v) * nodeFlow(v)
        betweenness(w, v) += Δ
        nodeFlow(w) += Δ
  ○ Divide all betweenness() by 2
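The edge betweenness computation above can be sketched as follows. This is one possible reading of the steps, not necessarily the course's reference code: the adjacency-dict input and the `frozenset` edge keys are choices made here for illustration.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected graph given as {node: set(neighbors)}."""
    bet = defaultdict(float)
    for s in adj:
        # BFS from root s: f[v] = # shortest paths s -> v, parents on BFS tree
        dist = {s: 0}
        f = defaultdict(int)
        f[s] = 1
        parents = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v is one level above w
                    f[w] += f[v]
                    parents[w].append(v)
        # Go upward from the lowest nodes, pushing flow to parents
        flow = defaultdict(float)
        for v in reversed(order):
            if v == s:
                continue
            flow[v] += 1
            for w in parents[v]:
                delta = f[w] / f[v] * flow[v]
                bet[frozenset((w, v))] += delta
                flow[w] += delta
    # Every pair was counted from both endpoints, so halve the totals
    return {edge: value / 2 for edge, value in bet.items()}

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path_bet = edge_betweenness(path)
cycle_bet = edge_betweenness(cycle)
```

On the 3-node path, edge a-b carries the pairs (a, b) and (a, c), so its betweenness is 2. On the 4-cycle each edge gets 1 from its own endpoints plus 1/2 from each of the two opposite pairs, also 2.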
Network Community
- Community: Set of tightly connected nodes
- Modularity Q: Measures how well network is partitioned into communities
  Given that we divide graph into several communities:
    $Q = \frac{1}{2m}\sum_{i,j} \delta_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)$
    m: Total # edges
    δ_ij = 1 if i & j assigned to same community, 0 otherwise
    A_ij = 1 if edge (i, j) exists, 0 otherwise
    0.3 < Q < 0.7: Significant community structure
- Application to Girvan-Newman algorithm:
  Best network decomposition = Highest Q
  (NOTE: When calculating Q, use original graph, NOT graph with edges removed after each step)
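The modularity formula can be evaluated directly with a double loop over node pairs. A small sketch on an assumed toy graph (two triangles joined by one edge); `modularity` and the `community` label dict are illustrative names.

```python
def modularity(adj, community):
    """Q = 1/(2m) * sum_ij delta_ij (A_ij - k_i k_j / (2m)) on the ORIGINAL graph."""
    m = sum(len(nb) for nb in adj.values()) / 2      # total # edges
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:         # delta_ij = 0
                continue
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / (2 * m)
    return q / (2 * m)

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3)
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, {0: "L", 1: "L", 2: "L", 3: "R", 4: "R", 5: "R"})
q_single = modularity(adj, {u: "all" for u in adj})
```

Splitting at the joining edge gives Q = 5/14 ≈ 0.357, inside the 0.3 to 0.7 range quoted above, while putting every node in one community gives Q = 0.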

Centrality Measures

Centrality Measures for Undirected Graph
- Degree Centrality
  Idea: How neighbor-connective a node is
  Formula: $C_D(u) = k_u$
  With normalization: $C'_D(u) = \frac{C_D(u)}{N-1}$
- Closeness Centrality
  Idea:
  ○ How "close" to others a node is
  ○ Not only considers neighbors
  Formula: $C_C(u) = \left[\sum_{v=1}^{n} h_{uv}\right]^{-1}$
  With normalization: $C'_C(u) = (N-1)\,C_C(u)$
- Betweenness Centrality
  Idea: How many individual pairs have to go through a node to reach one another
  Formula: $C_B(u) = \sum_{v<k} \frac{g_{vk}(u)}{g_{vk}}$
    g_vk(u): # shortest paths v → k passing through u
    g_vk: Total # shortest paths between v, k
  With normalization: $C'_B(u) = \frac{2}{(n-1)(n-2)}\,C_B(u)$

Illustration: Node positions according to centrality measure
|                  | Low Degree                                 | Low Closeness                                    | Low Betweenness                                       |
| High Degree      |                                            | Located in cluster far away from rest of network | Many neighbors, few shortest paths going through      |
| High Closeness   | Close to many people, few neighbors        |                                                  | Many shortest paths between same node pair in network |
| High Betweenness | Located in "bridge" connecting 2 clusters  | Very rare                                        |                                                       |

Centrality Measures for Directed Graph
- Degree Prestige
    $C_D(u) = indeg(u)$
- Closeness Prestige
    $C_C(u) = \frac{1}{|I_u|}\sum_{v \in I_u} h_{vu}$
    I_u (u's influence range): Set of vertices able to reach u
- Betweenness Prestige
    $C_B(u) = \sum_{(v,k)} \frac{g_{vk}(u)}{g_{vk}}$
    g_vk(u): # shortest paths v → k passing through u
    g_vk: Total # shortest paths v → k
    NOTE: g_vk ≠ g_kv
    Normalization: $C'_B(u) = \frac{C_B(u)}{(N-1)(N-2)}$
- Directed Geodesic
  A node does not necessarily lie on geodesic j → k if it lies on geodesic k → j

Freeman's Network Centrality
- Measure centrality of whole network:
  Larger network → More likely single node is quite central, remaining less central
- Calculation:
    $C_D = \frac{\sum_{u}\left[C^{*}(n) - C(u)\right]}{(N-1)(N-2)}$
    C*(n) = max(C(u))
    C(u): Can use C_D(u), C_C(u) or C_B(u)
    C_D ∈ [0, 1]

Eigenvector Centrality
- Principle: Node importance increased by having neighbors who are themselves also important
- Computation:
  Centrality vector $\vec{x}$, Adj matrix A
  Initially $x_i = 1\ \forall i$
  Obtain better estimation: $x'_i = \sum_{j} A_{ij}\,x_j$
  Matrix form: $\vec{x}' = A\vec{x}$
  Repeat t steps: $\vec{x}(t) = A^{t}\,\vec{x}(0)$
  (Intuitively, want $\vec{x}(t)$ to converge)
- Alternative method to compute $\vec{x}(t)$:
  Let $\vec{v}$ be eigenvector of A
  ⇒ $\vec{v}$ is also eigenvector of M = A + I
  Let $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_n$ be eigenvectors of M
  λ_1, λ_2, …, λ_n: Corresponding eigenvalues
  Consider $\vec{x}$ not orthogonal to $\vec{v}_1$:
    $\vec{x} = \alpha_1\vec{v}_1 + \alpha_2\vec{v}_2 + \dots + \alpha_n\vec{v}_n$
    $M^{k}\vec{x} = \alpha_1 M^{k}\vec{v}_1 + \alpha_2 M^{k}\vec{v}_2 + \dots + \alpha_n M^{k}\vec{v}_n = \lambda_1^{k}\alpha_1\vec{v}_1 + \lambda_2^{k}\alpha_2\vec{v}_2 + \dots + \lambda_n^{k}\alpha_n\vec{v}_n$
    $\frac{M^{k}\vec{x}}{\lambda_1^{k}} = \alpha_1\vec{v}_1 + \frac{\lambda_2^{k}\alpha_2\vec{v}_2}{\lambda_1^{k}} + \dots + \frac{\lambda_n^{k}\alpha_n\vec{v}_n}{\lambda_1^{k}}$
    $\lim_{k \to \infty} \frac{M^{k}\vec{x}}{\lambda_1^{k}} = \alpha_1\vec{v}_1$
    ⇒ $M^{k}\vec{x} \propto \vec{v}_1$
- Supplement defs & theorem:
  ○ Primitive matrix M: Nonneg, square
    ∃k > 0 (k ∈ Z+): M^k strictly positive
  ○ Perron-Frobenius theorem:
    M: n×n primitive matrix
    ⇒ ∃ eigenvalue λ_1 such that:
      λ_1 > 0
      λ_1 has unique eigenvector $\vec{v}_1$: v_1i > 0 ∀i (all entries positive)
      λ_1 > |λ| ∀ eigenvalue λ ≠ λ_1 (λ_1 is largest eigenvalue)
- Implication:
    $\vec{x}(t) \propto \vec{v}_1$
    $\vec{v}_1$ is eigenvector of M = A + I
    ⇒ Can regard $\vec{v}_1$ as eigenvector centrality (as we only consider relative difference among entries of $\vec{x}(t)$)
    ⇒ Finding $\vec{x}(t)$ is same as finding leading eigenvalue of (A + I) and its corresponding eigenvector $\vec{v}_1$
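The eigenvector centrality iteration with M = A + I can be sketched as plain power iteration. The rescaling by the largest entry at each step is an addition made here to keep numbers bounded (only relative values matter); the function name, step count and star-graph example are assumptions for illustration.

```python
def eigenvector_centrality(adj, steps=200):
    """Power iteration x <- (A + I) x, rescaled so the largest entry is 1."""
    x = {u: 1.0 for u in adj}                     # initially x_i = 1 for all i
    for _ in range(steps):
        # (A + I) x: own value plus the sum over neighbors
        nxt = {u: x[u] + sum(x[v] for v in adj[u]) for u in adj}
        top = max(nxt.values())
        x = {u: value / top for u, value in nxt.items()}
    return x

# Star graph: center 0 with leaves 1, 2, 3
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
x = eigenvector_centrality(star)
```

For the star, the leading eigenvector of A has center entry √3 and leaf entries 1, so each leaf converges to 1/√3 ≈ 0.577 relative to the center. Iterating with A alone from the all-ones vector alternates between (3, 1, 1, 1) and (3, 3, 3, 3) directions on this graph, which is one reason to iterate with A + I instead.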

Network Formation Process

Heavy Tailed Distribution
Distr P(X > x) heavy tailed if:
    $\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$
NOT heavy tailed:
    Normal PDF: $p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
    Exponential PDF: $p(x) = \lambda e^{-\lambda x}$
    ⇒ $P(X > x) = e^{-\lambda x}$
Heavy tailed:
    Power law: $P(x) \propto x^{-\alpha}$
    Stretched exponential: $P(x) \propto x^{\beta-1} e^{-\lambda x^{\beta}}$
    Log-normal: $P(x) \propto \frac{1}{x}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$

Power Law Distribution
Deg distr of many real networks
- Form: $p(x) = Z x^{-\alpha}$
  Set cutoff value x_m
- Normalizing const Z:
    $1 = \int_{x_m}^{\infty} p(x)\,dx = Z\int_{x_m}^{\infty} x^{-\alpha}\,dx = \frac{Z}{-\alpha+1}\left[x^{-\alpha+1}\right]_{x_m}^{\infty} = -\frac{Z}{\alpha-1}\left[\infty^{1-\alpha} - x_m^{1-\alpha}\right]$
    Assume α > 1 ⇒ ∞^(1-α) = 0
    ⇒ $Z = (\alpha-1)\,x_m^{\alpha-1}$
    $p(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
- Expectation:
    $E(X) = \int_{x_m}^{\infty} x\,p(x)\,dx = Z\int_{x_m}^{\infty} x^{-\alpha+1}\,dx = \frac{Z}{-\alpha+2}\left[x^{2-\alpha}\right]_{x_m}^{\infty} = -\frac{(\alpha-1)\,x_m^{\alpha-1}}{\alpha-2}\left[\infty^{2-\alpha} - x_m^{2-\alpha}\right]$
    Assume α > 2 ⇒ ∞^(2-α) = 0
    ⇒ $E(X) = \frac{\alpha-1}{\alpha-2}\,x_m$
- Variance:
    $Var(X) = E(X^2) - [E(X)]^2$
    $E(X^2) = \int_{x_m}^{\infty} x^2\,p(x)\,dx = Z\int_{x_m}^{\infty} x^{-\alpha+2}\,dx = \frac{Z}{-\alpha+3}\left[x^{3-\alpha}\right]_{x_m}^{\infty} = -\frac{(\alpha-1)\,x_m^{\alpha-1}}{\alpha-3}\left[\infty^{3-\alpha} - x_m^{3-\alpha}\right]$
    Assume α > 3 ⇒ ∞^(3-α) = 0
    ⇒ $E(X^2) = \frac{\alpha-1}{\alpha-3}\,x_m^2$
    ⇒ $Var(X) = \frac{\alpha-1}{\alpha-3}\,x_m^2 - \left[\frac{\alpha-1}{\alpha-2}\,x_m\right]^2$
- In real network: 2 < α < 3:
    E(X) = const
    Var(X) = ∞

Scale-Free Network
- Network with deg distr's tail in power law form
- Name origin:
  ○ Scale invariance: Whatever scale we look at, distr looks the "same" (No characteristic scale)
  ○ Scale-free function: $f(ax) = a^{\lambda} f(x)$
    Power law: $f(ax) = (ax)^{\lambda} = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$

Estimation of Power-law Exponent α
- Method 1: Fit line on log-log graph using least squares
    $y = b\,x^{-\alpha}$
    ⇒ $\log(y) = \log(b) - \alpha\log(x)$
    Gradient = -α
  Not good: log-log graph tends to "spread" for large k
    For large k: N(k) usually very "equally" small (typical value 0, 1, 2, …)
    (Usually: Graph used to generate deg distr is small ⇒ ↑ Graph size → ↓ "Spreading" phenomenon)
- Method 2: Plot complementary CDF (CCDF) P(X ≥ x)
    $P(X \ge x) = \sum_{j=x}^{\infty} p(j) \approx \int_{x}^{\infty} Z j^{-\alpha}\,dj = \frac{Z}{-\alpha+1}\left[j^{1-\alpha}\right]_{x}^{\infty} = \frac{Z}{\alpha-1}\,x^{-(\alpha-1)}$
    $y = P(X \ge x)$
    ⇒ $\log(y) = \log\left(\frac{Z}{\alpha-1}\right) - (\alpha-1)\log(x)$
    Gradient = -(α - 1)
  Better estimation: Aggregate N(k) → ΣN(k) big for large k

Model explaining Power Law Degree Distribution
- Preferential attachment: "Rich get richer":
  ○ Nodes arrive in order 1, 2, …, n
  ○ New node j creates m new links
    $P(j \to i) = \frac{k_i}{\sum_{u} k_u}$
    u: Previously created node
  Example: New citations to a paper proportional to number it already has
- Exact model:
  ○ Graph formation:
    ▪ Nodes arrive in order 1, 2, …, n
    ▪ New node j creates ONLY 1 out link (by doing EITHER (1) or (2))
      □ (1) With prob p, j links to i chosen uniformly at random from previous nodes
      □ (2) With prob (1 - p):
        ▪ Choose uniformly at random a node i previously created
        ▪ Link j to u which i points to
      ((2) ⇔ With prob (1 - p), j links to u with prob ∝ indeg(u))
  ○ $P(k) \propto k^{-\left(1 + \frac{1}{1-p}\right)}$
    $\alpha = 1 + \frac{1}{1-p}$
  ○ Behavior:
    ▪ p → 1:
      □ Link formation mainly based on uniform random choices
      □ α → ∞
      □ Few nodes with large indeg
    ▪ p → 0:
      □ Growth of network strongly governed by "rich-get-richer" behavior
      □ α → 2
      □ Many nodes with large indeg
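Method 2 (fit the CCDF on log-log axes, gradient = -(α - 1)) can be sketched with a least-squares line. To keep the result reproducible, the "sample" below is an idealized one placed at evenly spaced quantiles of the power law via the inverse CDF x = x_m (1 - u)^(-1/(α-1)); that placement, the sample size and the function name are assumptions for illustration, and real data would be noisier.

```python
import math

def ccdf_exponent(samples):
    """Estimate alpha from the slope of log P(X >= x) versus log x."""
    xs = sorted(samples)
    n = len(xs)
    # Empirical CCDF at the i-th smallest sample: fraction of samples >= xs[i]
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    slope = (sum((px - mean_x) * (py - mean_y) for px, py in pts)
             / sum((px - mean_x) ** 2 for px, _ in pts))
    return 1 - slope                  # gradient = -(alpha - 1)

alpha, x_m, n = 2.5, 1.0, 1000        # assumed true exponent and cutoff
# Idealized sample: one point at each mid-quantile of the power law
samples = [x_m * (1 - (i + 0.5) / n) ** (-1 / (alpha - 1)) for i in range(n)]
alpha_hat = ccdf_exponent(samples)
```

Because the CCDF aggregates counts over the whole tail, the fitted points stay close to a straight line even where the raw counts N(k) would be 0, 1 or 2.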

Network Effects & Cascading Behavior
Wednesday, May 10, 2017 10:31 PM

Network cascades
- Contagion spreading over edges of network
- Creates propagation tree
- Examples:
  Bio: Diseases
  Social: Viral marketing

Diffusion Models
| Decision-based model                                    | Probability model                                                     |
| Node observes neighbors' decisions → Makes own decision | Node gets influenced with some prob from already influenced neighbors |
| Application: product adoption, decision making, …       | Application: disease spreading, …                                     |

Decision-based Model of Diffusion
Monday, May 15, 2017 10:19 PM

Game-theoretic Model of Cascade
- Rules:
  ○ Each node: Can choose only 1 of 2 actions A/B
  ○ Adj pair's payoff matrix:
    ▪ 2 A: Each payoff = a > 0
    ▪ 2 B: Each payoff = b > 0
    ▪ 1 A + 1 B: Each payoff = 0
- Single node decision process:
  Node v
  d neighbors: pd already use A, (1 - p)d already use B (0 ≤ p ≤ 1)
  v's payoff = apd if v uses A
             = b(1 - p)d if v uses B
  v uses A if:
    $apd > b(1-p)d \Leftrightarrow p > q = \frac{b}{a+b}$
  Interpretation example:
    If > 50% of my friends take A, I'll also take A
    ⇔ $q = \frac{b}{a+b} \ge \frac{1}{2} \Leftrightarrow b \ge a$

Game-theoretic Model: Properties
- Monotonic spreading: Starting from all B, node only switches B → A, NEVER back A → B
  Proof (by contradiction), with p(t) = fraction of u's neighbors using A at time t:
    Let u = First node switching back A → B, at time t
    ⇒ $p(t) \le \frac{b}{a+b}$
    At time t' < t: u switched B → A
    ⇒ $p(t') > \frac{b}{a+b} \ge p(t)$
    ⇒ During time t' → t: # of u's neighbors using A ↓
    ⇒ Impossible, because u is the first node switching A → B
- Cascade capacity: Max q → ∃ finite set S that can cause cascade
  ○ Capacity ↑ → Cascade more easily
  ○ ∀ graph G: Cascade capacity ≤ 1/2
- Stopping cascade:
  ○ Cluster C with density ρ: All of C's nodes have ≥ ρ fraction of edges in C
  ○ Stopping condition:
    S: Initial set of A's adopters
    G\S contains cluster with density > (1 - q)
    ⇔ S cannot cause cascade

Game-theoretic Model: Cascade in Infinite Graph
(each node has FINITE neighbors)
- Infinite path: q < 1/2 → Cascade
- Infinite tree: q < 1/3 → Cascade
- Infinite grid: q < 1/4 → Cascade

Extended Model of Cascade
- Rules:
  ○ Each node: Can use both A & B
  ○ Adj pair's payoff matrix:
    ▪ AB - A: Each payoff = a
    ▪ AB - B: Each payoff = b
    ▪ AB - AB: Each payoff = max(a, b) - c
      (c = Dual-maintenance cost)
- Single node decision process:
  ○ Initialization:
    ▪ Infinite path, all B
    ▪ b = 1
    Consider node w
  ○ Case 1:
    Payoff = a          if w chooses A
           = 1          if w chooses B
           = a + 1 - c  if w chooses AB
  ○ Case 2:
    Payoff = a          if w chooses A
           = 1 + 1 = 2  if w chooses B
           = a + 1 - c  if w chooses AB

Extended Model of Cascade: Analysis Summary
- Present condition:
  ○ Default B
  ○ Better A comes
- Future scenarios:
  ○ Infiltration (B → AB → A):
    ▪ A & B too compatible with each other
    ▪ People first use both, but gradually drop B
  ○ Direct conquest (B → A):
    ▪ A & B too incompatible
    ▪ People immediately drop B, pick A
  ○ Buffer zone (B → AB):
    ▪ A complements B
    ▪ People use both
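The single-node rule of the basic game-theoretic model (adopt A once the fraction of neighbors using A exceeds q = b / (a + b)) can be simulated on a finite graph. A minimal sketch: `run_cascade`, the 8-node path and the seed set are illustrative, and nodes are updated one at a time, which is safe here because spreading is monotonic.

```python
def run_cascade(adj, seeds, a, b):
    """Return the final set of A-adopters when everyone else starts with B."""
    q = b / (a + b)                       # adoption threshold
    adopters = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in adopters or not adj[v]:
                continue
            frac_a = sum(1 for w in adj[v] if w in adopters) / len(adj[v])
            if frac_a > q:                # payoff a*p*d > b*(1-p)*d
                adopters.add(v)
                changed = True
    return adopters

# Finite path 0 - 1 - ... - 7 with two adjacent initial adopters in the middle
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 8} for i in range(8)}
full = run_cascade(path, {3, 4}, a=3, b=2)      # q = 0.4 < 1/2
stuck = run_cascade(path, {3, 4}, a=2, b=3)     # q = 0.6 > 1/2
```

With q = 0.4 each boundary node sees half of its neighbors using A and switches, so the cascade covers the whole path; with q = 0.6 the same boundary nodes stay with B, consistent with the path threshold of 1/2.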

Probabilistic Model of Diffusion
Monday, May 15, 2017 11:06 PM

Virus' Spreading Model Family
- Params:
  ○ Birth rate β = P(Infected node attacks neighbors)
  ○ Death rate δ = P(Infected node healed)
- Virus strength s = β / δ
- Epidemic threshold τ:
    $\tau = \frac{1}{\lambda_{1,A}}$
    λ_{1,A}: Max eigenvalue of adj matrix A
  ○ s < τ: Disease dies out
  ○ s > τ: Epidemic happens

Virus' Spreading: SIR Model
- Phases:
  ○ Susceptible: No disease
  ○ Infectious: Has disease, can attack Susceptible
  ○ Recovered: Healed, can no longer be infected
- Model development:
  Pop N = S + I + R
  P(Contact with S-node) = S / N
  P(S → I) = p
  Per unit time, per I-node:
    # Contacts made = cN
    # Contacts with S-nodes made = cN × S / N = cS
    # S → I conversions made = pcS
  Per unit time: # of S → I = pcSI = βSI (β = pc)
    ⇒ $\frac{dS}{dt} = -\beta S I$
  Similarly, we have:
    $\frac{dR}{dt} = \delta I, \quad \frac{dI}{dt} = \beta S I - \delta I$
- Observation: S always ↓, R always ↑
- Typical assumption: Network topology not considered (Every node has equal contact with others)

Virus' Spreading: SIS Model
- Model development:
    $\frac{dI}{dt} = \beta S I - \delta I, \quad \frac{dS}{dt} = -\beta S I + \delta I$
- Observation:
  ○ Case 1: t → ∞, I → 0
  ○ Case 2: Disease remains indefinitely

Independent Cascade Model
- Model:
  ○ Directed finite graph G = (V, E)
  ○ S: Initial set of "active" nodes
  ○ Edge (v, w):
    ▪ P_vw = Prob that node v, if active, also makes neighbor w active
    ▪ v only has 1 chance to make w active
- Limitation: Many params → Hard to estimate from data
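The SIR equations dS/dt = -βSI, dI/dt = βSI - δI, dR/dt = δI can be integrated numerically to see the "S always ↓, R always ↑" observation. A sketch using forward Euler steps; the step size, population and rate values are arbitrary assumptions for illustration, not values from the course.

```python
def simulate_sir(s0, i0, r0, beta, delta, dt=0.01, steps=5000):
    """Forward-Euler integration of the SIR model; returns [(S, I, R), ...]."""
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt     # S -> I flow during dt
        new_recoveries = delta * i * dt        # I -> R flow during dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# Assumed demo values: N = 1000, beta * S(0) > delta so an outbreak takes off
history = simulate_sir(s0=990, i0=10, r0=0, beta=0.0005, delta=0.1)
final_s, final_i, final_r = history[-1]
```

Each step moves people only along S → I → R, so S + I + R stays at N throughout the run.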
Exposure & Adoption Model
- Model:
  ○ States:
    ▪ Exposure: Node exposed to contagion by neighbors
    ▪ Adoption: Node acts on contagion
  ○ Params: Exposure curve:
    P(Adopt new behavior) = f(# adopted neighbors)
- Observation in viral marketing:
  Too many marketing incentives
  → ↑ Recommendations from neighbors (in social network)
  → ↓ Effectiveness
  Book:
    ▪ Incentives:
      □ Product recommender: 10%
      □ First recommendee purchasing same item: 10% discount
    ▪ Effectiveness: p(Buy item) = f(# Recommendations received)
      □ # ≤ 3: p ≈ const
      □ # > 3: p ↓

Exposure & Adoption Model: Exposure Curve Modelling
Twitter case
- Question: Given user X & hashtag H
  How do successive exposures to H affect P(X mentions H)?
- Development:
  ○ User X:
    ▪ Has not yet mentioned H
    ▪ But k neighbors already have
  ○ p(k) = P(X mentions H before (k+1)th neighbor does so)
    ⇔ p(k) = Fraction of users adopting H after kth exposure
  ○ Stickiness of H = max(p(k))

Internal & External exposures
- Sources of exposures:
  ○ Internal: From inside network
    User sees URLs posted by friends
  ○ External: From outside network
    User sees URLs from source outside social network
Twitter case:
  ○ Trace emergence of URL, label each URL by its topic
  ○ Findings:
    ▪ Topic's max(p(k)):
      P(Retweet | Art, edu article) < P(Retweet | Entertainment)
    ▪ k at max(p(k)):
      World news reaches max infectiousness earliest → More time sensitive
    ▪ Decline of p(k) over time (Viral duration):
      Topics with small "k at max(p(k))" tend to have short viral duration
    ▪ External vs Internal exposure: Political news most externally driven

Influence Maximization in Network
Tuesday, May 16, 2017 12:48 AM

Influence Maximization (NP-complete)
- Formulation:
  S: Initial active set
  f(S): Expected size of final active set
  "Expected" = Random process
    $f(S) = \frac{1}{n}\sum_{i=1}^{n} f_i(S)$
    f_i(S): Size of final active set in ith realization started from S
- ↑ f(S) → S more influential
- Problem: Max f(S)
- Approximation algorithm (greedy):
    S_0 = {}
    for i = 1 … k:
      Choose remaining X_j → Max f(S_{i-1} ∪ {X_j})
      S_i = S_{i-1} ∪ {X_j}
  Example:
    Evaluate f({X1}), …, f({Xm}) → Pick max at f({X2})
    Evaluate f({X2, X1}), f({X2, X3}), …, f({X2, Xm}) → Pick max at f({X2, X4})
    Evaluate f({X2, X4, X1}), f({X2, X4, X3}), …, f({X2, X4, Xm}) → Pick max at f({X2, X4, X1})
- Approx guarantee: f(S) ≥ (1 - 1/e) * Optimal
  Claim holds for f() with 2 properties:
    ▪ f monotone (activating more nodes doesn't hurt):
      f({}) = 0
      S ⊆ T ⇒ f(S) ≤ f(T)
    ▪ f submodular (activating an additional node helps less, diminishing return):
      S ⊆ T ⇒ f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)

Set Cover Problem (NP-complete)
- Def:
  Given: U = {u1, …, un}
         X1, …, Xm ⊆ U
  Check: ∃ k sets: X_i1 ∪ X_i2 ∪ … ∪ X_ik = U
- Approach: Influence Maximization in bipartite X-U graph
  Edge (Xi, uj) if uj ∈ Xi
  Find S = {X_i1, X_i2, …, X_ik} → f(S) = k + n

Simulation Experiment: Collaboration network
- Data: Co-authorship in papers of arXiv high-energy physics theory
- Use Independent Cascade model
  ○ Case 1: All edges have prob p
  ○ Case 2: p_vw = 1 / deg(w)
    (w has fewer friends → Each friend has more influence on w)
- Compare greedy approx algorithm with 3 common heuristics:
  ○ Degree centrality (Choose k peo with highest centrality as initial active set)
  ○ Distance centrality
  ○ Random nodes
- Case 1 result: Degree & distance centrality do not perform well
  (Most central nodes can belong to same cluster)
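The greedy approximation above, with f(S) estimated by repeated Independent Cascade simulations, can be sketched as follows. This is an illustration under assumptions: the function names, the `prob(v, w)` callback, the number of runs and the tiny example graph are all made up here, and f(S) is only a Monte Carlo estimate.

```python
import random

def simulate_ic(graph, prob, seeds, rng):
    """One Independent Cascade realization; returns size of final active set."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for v in frontier:
            for w in graph.get(v, ()):
                # v gets exactly one chance to activate each inactive neighbor w
                if w not in active and rng.random() < prob(v, w):
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return len(active)

def expected_spread(graph, prob, seeds, runs=200, seed=0):
    """Estimate f(S) as the average final size over `runs` realizations."""
    rng = random.Random(seed)
    return sum(simulate_ic(graph, prob, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, prob, k, candidates, runs=200):
    """Pick k seeds one by one, each time adding the best marginal candidate."""
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: expected_spread(graph, prob, chosen + [c], runs))
        chosen.append(best)
    return chosen

# Directed toy graph; with prob 1 on every edge the process is deterministic
graph = {0: [1, 2, 3], 4: [5, 6], 7: [3]}
always = lambda v, w: 1.0
seeds = greedy_seeds(graph, always, k=2, candidates=[0, 4, 7])
spread = expected_spread(graph, always, seeds)
```

Node 0 reaches 4 nodes on its own, and adding node 4 brings in 3 more, whereas adding node 7 would add only itself because its target 3 is already covered: a small instance of the diminishing-return property.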

Network with Signed Edges

Theory of Structural Balance
- Graph:
  ○ Undirected
  ○ Edge: (+) = Friend, (-) = Enemy
    Wikipedia: (+) = A supports B to become admin
    Epinions: (+) = A trusts B's product review
- Rule: Intuitive:
  ○ Friend's friend = Friend
  ○ Enemy's enemy = Friend
  ○ Friend's enemy = Enemy
- Triangle unit:
  | Balanced                                         | Unbalanced              |
  | Consistent with rules → Exactly 1 or 3 (+) edges | Inconsistent with rules |
- Balanced, complete graph: Every triangle is balanced
  2 scenarios:
    ▪ All edges (+)
    ▪ Nodes split into 2 sets: Only (-) edges between the sets
- 2 EQUIVALENT views of balance in non-complete graph:
  | Local                           | Global                         |
  | Fill in missing edges → Balance | Divide graph into 2 coalitions |
- Theory: Graph is balanced
  ⇔ No cycle with odd # of (-) edges
- Balance-checking algorithm:
  ○ 1: Find connected components based on (+) edges
  ○ 2: For each component: If ∃ (-) edge inside → Unbalanced
  ○ 3: Regard connected components as SUPERnodes in new SUPERgraph
    (AT MOST 1 edge between any pair of supernodes)
  ○ 4:
    ▪ Run BFS (from any node) → Form BFS tree
    ▪ If ∃ edge connecting 2 nodes in same layer → Unbalanced

Global Structure of Signed Networks
- (A, B)-Embeddedness = # shared neighbors
- Question: How does network structure interact with links?
- Observation (from Wikipedia, Epinions):
  ○ High embeddedness → Significantly more likely (+) link
    (Why: Users with more common neighbors have greater implicit pressure to remain (+))
  ○ (+) ties tend to be closed together

Edge Sign Prediction
Theory of status: Story of soccer team
  For each node X, ask how the skill of B compares to yours:
- Directed graph:
  A gives positive review to B → B has HIGHER status than A
  A gives negative review to B → B has LOWER status than A
- Question: Want to predict A → B based on X-A, X-B
  A → B | X
  NOTE: X-A, X-B can be of any direction, any sign (2 × 2 × 2 × 2 = 16 possible contexts)
- Measurement:
  ○ Baseline: Users differ in fraction of (+) they give/receive:
    ▪ Generative baseline: Fraction of (+) given by U (of all (+) given out)
    ▪ Receptive baseline: Fraction of (+) received by U (of all (+) given out)
  ○ Surprise, in context: How
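The four-step balance-checking algorithm from the structural balance notes can be sketched as below. The input format (a node list plus separate lists of (+) and (-) edges) and the function name are assumptions; the BFS in step 4 is run from every unvisited supernode so that disconnected supergraphs are also covered.

```python
from collections import deque

def is_balanced(nodes, pos_edges, neg_edges):
    """True if the signed undirected graph has no cycle with an odd # of (-) edges."""
    # Step 1: connected components using (+) edges only
    pos_adj = {u: set() for u in nodes}
    for u, v in pos_edges:
        pos_adj[u].add(v)
        pos_adj[v].add(u)
    comp = {}
    for start in nodes:
        if start in comp:
            continue
        comp[start] = start
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in pos_adj[x]:
                if y not in comp:
                    comp[y] = start
                    queue.append(y)
    # Steps 2 and 3: a (-) edge inside a component means unbalanced;
    # otherwise it becomes an edge between two supernodes
    super_adj = {c: set() for c in set(comp.values())}
    for u, v in neg_edges:
        if comp[u] == comp[v]:
            return False
        super_adj[comp[u]].add(comp[v])
        super_adj[comp[v]].add(comp[u])
    # Step 4: BFS layering of the supergraph; an edge inside one layer
    # closes an odd cycle of (-) edges
    layer = {}
    for start in super_adj:
        if start in layer:
            continue
        layer[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in super_adj[x]:
                if y not in layer:
                    layer[y] = layer[x] + 1
                    queue.append(y)
                elif layer[y] == layer[x]:
                    return False
    return True

tri = ["a", "b", "c"]
two_friends_common_enemy = is_balanced(tri, [("a", "b")], [("b", "c"), ("a", "c")])
one_enemy_pair = is_balanced(tri, [("a", "b"), ("b", "c")], [("a", "c")])
all_enemies = is_balanced(tri, [], [("a", "b"), ("b", "c"), ("a", "c")])
```

The three triangles reproduce the triangle rule: 1 (+) edge is balanced, while 2 (+) edges or 0 (+) edges are not.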

Homophily & Social Influence

Homophily
- Def: Peo link to others similar to them
  Agents in social network have characteristics:
    ▪ Non-mutable: race, gender, age, …
    ▪ Mutable: place to live, occupation, …
- Mechanism:
  |                                                      | Selection                                   | Social influence                                        |
  | Def                                                  | Characteristics drive link formation        | Existing links shape peo's mutable characteristics      |
  | Example                                              | A & B love dancing → Form link              | A starts smoking → B follows                            |
  | Social policy implication (smoking prevention case)  | Target characteristics: family background, … | Target "key players", let them positively influence rest |

Social-Affiliation Network
- Def: Social network of peo + Affiliation network of peo & foci
  ○ Foci: Set of activities a person participates in
  ○ Affiliation network: Participation of some people in foci
- Closure: 2 nodes have common neighbor → Form link
  ○ Triadic closure: Among peo only
  ○ Focal closure: Due to Selection
    Karate introduces Anna to Daniel
  ○ Membership closure: Due to Social influence
    Anna introduces Bob to Karate

Homophily & Segregation
- Observation: Neighborhoods tend to be segregated on race/culture basis
- Schelling's grid model:
  ○ Rules:
    ▪ 2 different agent types: x, o
    ▪ Agent discovers < k neighbors are of same type → Interested in moving to new cell
    ▪ Each round:
      □ Consider unsatisfied agents in some order
      □ Move unsatisfied agent to any unoccupied cell that satisfies him
  ○ Result: Surprising relation between micro-behavior & macro-outcomes:
    Even weak preferences for homophily can be sufficient to create complete segregation
  ○ Conclusion:
    ▪ ∃ Solution: No segregation, all agents satisfied
    ▪ Individual-based decisions/mis-coordination → Segregation

Recommendation System

Rating: Utility function
- Def: u: C × S → R
  C = Set of Customers
  S = Set of Items
  R = Set of ratings: Can be 0 .. 5 stars, …
- Real problem:
  ○ Gather known ratings:
    ▪ Explicit:
      □ Ask peo to rate
      □ Does not work well in practice
    ▪ Implicit:
      □ Learn ratings from user actions
      □ Not accurate in inferring low ratings
  ○ Extrapolate unknown ratings:
    ▪ Content-based
    ▪ Collaborative
    ▪ Hybrid

Hybrid Recommendation System
- Augment collaborative filtering with content-based:
  ○ New item problem: Use item profile
  ○ New user problem: Use demographics
- Weighting: Combine different recommenders' results:
  ○ Linear combination
  ○ Consensus scheme: Treat output of each recommender as set of votes
- Switch between recommenders:
  ○ If content-based system cannot recommend with sufficient confidence, use collaborative filtering
  ○ However, both systems have "new user" problem
- Feature combination:
  ○ Treat collaborative info (other users' ratings) as additional feature
  ○ Apply content-based technique

Single-method Recommendation System

Content-based
- Idea: Recommend items similar to previous items highly rated by user
  Movie: Movies with same actors, director, genre, …
  Book: Similar content
- Detail:
  ○ Item profile: Set of features
    Movie: author, title, action, …
    Document: Set of "important" words
  ○ User profile:
    ▪ Weighted avg of rated item profiles
    ▪ Weight ∝ Deviation from avg rating
  ○ Prediction heuristic:
    $u(c, s) = \cos(c, s) = \frac{c \cdot s}{|c|\,|s|}$
    c: User profile
    s: Item profile
  ○ TF.IDF:
    f_ij = Frequency of term t_i in doc d_j
    n_i = # docs mentioning term t_i
    N = Total # docs
    $TF_{ij} = \frac{f_{ij}}{\max_{k} f_{kj}}$
    $IDF_i = \log\frac{N}{n_i}$
    TF.IDF score: $w_{ij} = TF_{ij} \times IDF_i$
    Doc profile = Set of words with highest TF.IDF scores + Their scores
- Advantages:
  ○ No need for data on other users
  ○ No "cold-start"/sparsity problems
  ○ Can recommend to users with unique tastes
  ○ Can recommend new & unpopular items
  ○ Can explain to user why item is recommended by showing features that caused the recommendation
- Disadvantages:
  ○ Must define features
  ○ Users' tastes must be represented as learnable function of these features
  ○ Can't exploit other users' quality judgments

Collaborative filtering
- Idea:
  ○ User-based: For given user:
    • Find similar users whose ratings strongly correlate with his
    • Recommend items highly rated by these similar users
  ○ Item-based: For given user:
    • Look into his rated items
    • Compute how similar they are to other items not yet rated
    • Select k most similar new items
- Detail:
  ○ Weight all users according to similarity with active user
    a: Active user, u: Another user
    r_a, r_u: Rating vectors for m items rated by BOTH a & u
    r_{x,j}: Rating of user x for item j
    $\bar{r}_x = \frac{1}{m}\sum_{j=1}^{m} r_{x,j}$
    $\sigma_{r_x} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\left(r_{x,j} - \bar{r}_x\right)^2}$
    $cov(r_a, r_u) = \frac{\sum_{i=1}^{m}\left(r_{a,i} - \bar{r}_a\right)\left(r_{u,i} - \bar{r}_u\right)}{m}$
    a,u-similarity: $c_{a,u} = \frac{cov(r_a, r_u)}{\sigma_{r_a}\,\sigma_{r_u}}$
    Significance weighting: Do not trust correlation based on very few co-rated items
    $w_{a,u} = s_{a,u}\,c_{a,u}$
    $s_{a,u} = 1$ if m > 50, $s_{a,u} = \frac{m}{50}$ if m ≤ 50
  ○ Select subset of users to serve as predictors:
    • Method 1: Choose n users with largest w_{a,u}
    • Method 2: Choose users with w_{a,u} > Threshold
  ○ Rating prediction:
    $p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u}\left(r_{u,i} - \bar{r}_u\right)}{\sum_{u=1}^{n} w_{a,u}}$
    n selected neighbor users u = 1, 2, …, n
    $\bar{r}_a$: Avg rating of user a
  ○ Present items with highest predicted rating as recommendations
- Advantages:
  ○ No feature selection needed
- Disadvantages:
  ○ Cold start: New users/items: Few ratings → Can't make accurate recommendations
  ○ Prediction based on nearest neighbor algorithm may be inaccurate
  ○ Scalability:
    ▪ Nearest neighbor computation grows with # users + # items
    ▪ Solution: Isolate neighborhood generation & prediction
      Neighborhood generation: Offline
      Prediction: Online
  ○ Popularity bias:
    ▪ Tends to recommend popular items
    ▪ Can't recommend to users with unique tastes
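The user-based collaborative filtering steps (correlation over co-rated items, significance weighting with the 50-item cutoff, then the weighted deviation-from-mean prediction) can be sketched as follows. The nested-dict `ratings` layout, the names and the toy data are assumptions; r̄_a and r̄_u in the prediction are taken here over all of each user's ratings, which is one possible reading of "avg rating of user".

```python
import math

def pearson(ra, ru):
    """c_{a,u} = cov(r_a, r_u) / (sigma_a * sigma_u) over co-rated items."""
    m = len(ra)
    mean_a, mean_u = sum(ra) / m, sum(ru) / m
    cov = sum((x - mean_a) * (y - mean_u) for x, y in zip(ra, ru)) / m
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in ra) / m)
    sd_u = math.sqrt(sum((y - mean_u) ** 2 for y in ru) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0                        # no variation: correlation undefined
    return cov / (sd_a * sd_u)

def weight(ratings, a, u, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}, with s = m / cutoff when m <= cutoff."""
    common = sorted(set(ratings[a]) & set(ratings[u]))
    m = len(common)
    if m == 0:
        return 0.0
    c = pearson([ratings[a][j] for j in common],
                [ratings[u][j] for j in common])
    return min(m / cutoff, 1.0) * c

def predict(ratings, a, item, neighbors, cutoff=50):
    """p_{a,i} = mean_a + sum_u w (r_{u,i} - mean_u) / sum_u w."""
    def mean(x):
        return sum(ratings[x].values()) / len(ratings[x])
    num = den = 0.0
    for u in neighbors:
        if item not in ratings[u]:
            continue
        w = weight(ratings, a, u, cutoff)
        num += w * (ratings[u][item] - mean(u))
        den += w
    return mean(a) if den == 0 else mean(a) + num / den

# Hypothetical toy data: bob rates everything one star lower than alice
ratings = {
    "alice": {"i1": 5, "i2": 3, "i3": 4},
    "bob": {"i1": 4, "i2": 2, "i3": 3, "i4": 5},
}
w_ab = weight(ratings, "alice", "bob")
p_alice_i4 = predict(ratings, "alice", "i4", ["bob"])
```

Bob's ratings correlate perfectly with Alice's on the 3 co-rated items, so the correlation is 1 but the significance weight shrinks it to 3/50. Bob rated i4 1.5 above his own average of 3.5, so the prediction for Alice is her average 4 plus 1.5, which also shows that this formula can leave the rating scale.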
