
Social Media & Networks Classical paper: The Strength of Weak Ties

Tuesday, March 21, 2017 3:34 PM

(Granovetter, 1973)

- Question: How do large-scale patterns emerge (micro → macro)?

○ Micro-lvl interaction within small group
○ Macro-lvl patterns within society

- Previously: Strong ties considered important

close friends, family

- Idea:
○ Triadic closure: 2 people have common strong tie → Must have tie
between each other
○ Bridge:
 Link that is only path between 2 users
→ Must be weak tie (according to triadic closure)
 However, help connect community:
□ Must necessarily carry new info
□ Help map micro → macro
⇒ Strength of weak ties

- Why study network:

○ Natural fit with interaction: Users only interact with small subset of
○ Can predict behavior with network view
○ Social impact:
drug design

○ Universality: Networks from science, nature & tech more similar than
○ Shared vocabulary between fields

- What to study in networks:

○ Structure
○ Evolution (How network comes to have such structure)
○ Processes & Dynamics
How do info, behaviors, diseases spread in network?

- Why study social media:
○ Observe social interaction at scale:
 ↑ Confidence in results
 Certain effects only seen at scale
○ Data availability:
 ↓ Field work
 Sites have complete history record → Provide entire evolution of
user groups

- Challenges of studying social media:

○ Miss many local interactions
○ Links "mean" less
○ Hard to compare networks

Classical paper: Neocortex Size as a Constraint on Group Size in Primates

(Robin D, 1989)

- Hypothesis: Large brain size due to "social" nature of primates
○ Measure "social" lvl by looking at typical group size
○ If true, then brain size should correlate with being "social"

- Findings:

- Implication: # of relationships an individual can maintain is bounded by neocortex size

Classical paper: Experimental Study of Small World Problem

(Jeffrey T, Stanley M, 1969)

- Experiment:
○ Select 296 peo in Nebraska & Boston
○ Mail packet to them:
 Specific destination person
 Ask to forward to someone they know personally

- Question: How long are successful paths?

- Findings:

48% of 64 chains pass through 3 people

- Implication: Not only do short chains exist

○ But people can find them
○ With only local information
⇒ Social network navigable

Leveraging Social Media: Predicting Future with Social Media

(Sitaram A, Bernardo H)

- Question: Can we extract info from conversation on social media networks?
(Collective wisdom)

- Method: Use Twitter to predict box-office return for movies

○ Search data repeatedly for each movie 3 weeks around release date
○ Make prediction
 How many tweets before movie release date?
 Compare with change of Hollywood Stock Exchange index

- Result: Surprisingly accurate

- Comment: Do movie studios only promote movies they expect to be hits?

COMP 4641 Page 1

Leveraging Social Media: Meme-tracking & Dynamic of News Cycle

(Jure L, Lars B, Jon K, 2009)

- Question: Can social media shed light on info flow?

- Method:
○ Collect data over 90 million docs
○ Quote clustering on graph
 Graph structure: Node = Quote
Edge (weighted) = Inclusion relation
 Remove all but strongest outgoing edge

- Findings:
○ Nature of 24-hour news cycle: Memes quickly enter & leave
collective conscience
○ Media access peak 2.5 hours before blog access peak
○ Blog access volume persists much longer


Basic Network Properties
Tuesday, March 21, 2017 5:11 AM

Networks vs Graphs

Network                     | Graph
- Refer to real sys         | - Math representation of network
- Terminology: node, link   | - Terminology: vertex, edge
Usually network & graph used interchangeably

Fundamentals of undirected graph

- Deg ku: # of edges adj to node u

○ Avg deg:
$\bar{k} = \frac{1}{N} \sum_{i=1}^{N} k_i = \frac{2E}{N}$

○ Deg distr: Prob randomly chosen node has deg k
$P(k) = \frac{N_k}{N}$
Nk: # nodes having deg k
N: Total # nodes

- Distance/Path length huv: # edges along shortest path connecting 2 nodes
○ huu = 0
huv = ∞ if no path u → v

○ Avg path length:
$\bar{h} = \frac{1}{2E_{max}} \sum_{(u,v)} h_{uv} = \frac{1}{n(n-1)} \sum_{(u,v)} h_{uv}$
(Ignore disconnected node pairs)

○ Diameter: Max distance between any pair of nodes in graph
(⇔ "longest" shortest path)

- Clustering coef Cu: Portion of u's neighbors which are also neighbors of each other
$C_u = \frac{2 e_u}{k_u (k_u - 1)}$
eu: No of edges among u's neighbors

○ Avg clustering coef:
$\bar{C} = \frac{1}{N} \sum_{u} C_u$

- Complete (undirected) graph: All nodes adj to each other
$E_{max} = \frac{N(N-1)}{2}$

Fundamentals of directed graph

- In-degree indeg(u): # edges pointing to u

- Out-degree outdeg(u): # edges starting from u

- Avg deg:
$\bar{k} = \frac{E}{N}$

- Strongly connected component (SCC): All nodes in component can visit
each other (A → B, B → A)

- Weakly connected component: All nodes can visit each other in
"undirected" version of component

- Directed Acyclic graph (DAG): No cycle

Graph Structure of the Web

- Modelling method: Directed graph
Edge (u, v) ⇔ Website u has hyperlink to v

- Resulting structure: Bow-tie
○ 1 giant SCC
○ IN: Can reach SCC, but can't be reached from SCC
New sites not yet discovered
○ OUT: Accessible from SCC, but no link back to SCC
Corporate website (only internal link)
○ TENDRILS: Can't reach SCC, can't be reached from SCC
○ TUBES: Links IN → OUT without going through SCC

- Issues: Structure discovered depends on crawling process
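
The undirected-graph quantities defined in this section (avg degree, clustering coefficient, avg path length, diameter) can be computed directly from an adjacency list. The following is a minimal sketch on a hypothetical 4-node graph (a triangle plus one pendant node); the function names and the convention that a node with fewer than 2 neighbors gets clustering coefficient 0 are assumptions made for illustration, not part of the notes.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering(adj, u):
    """C_u = 2 e_u / (k_u (k_u - 1)); 0 when k_u < 2 (assumed convention)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
    return 2 * e / (k * (k - 1))

def bfs_dist(adj, s):
    """Shortest-path lengths h_sv from s to every reachable node."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def avg_path_length_and_diameter(adj):
    """Average over connected ordered pairs; disconnected pairs are ignored."""
    total, pairs, diam = 0, 0, 0
    for s in adj:
        for t, d in bfs_dist(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diam = max(diam, d)
    return total / pairs, diam

# Hypothetical example: triangle 0-1-2 with pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_graph(edges)
avg_deg = sum(len(n) for n in adj.values()) / len(adj)       # equals 2E/N
avg_clust = sum(clustering(adj, u) for u in adj) / len(adj)
avg_h, diameter = avg_path_length_and_diameter(adj)
print(avg_deg, avg_clust, avg_h, diameter)
```

Averaging over ordered pairs gives the same value as averaging over unordered pairs in an undirected graph, which matches the 1/(n(n-1)) normalization when the graph is connected.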


Small-World Phenomena
Tuesday, March 21, 2017 5:41 AM

Small-world phenomenon

Typical length of shortest path usually small
Small-world experiment [Milgram 1967]: 6 deg of separation

Simple Graph Model Gnp
(Erdos-Renyi, 1960)

- Def:
○ Undirected, n nodes
○ Each edge (u, v) appears with prob p

- Edge properties:
○ $E_{max} = \frac{n(n-1)}{2}$
○ Prob Gnp-type graph has E edges:
$P(E) = \binom{E_{max}}{E} p^{E} (1-p)^{E_{max} - E}$

- Degree distr: Binomial

$P(k) = \binom{n-1}{k} p^{k} (1-p)^{n-1-k}$
$\bar{k} = (n-1)p$
$\sigma^2 = p(1-p)(n-1)$
$\frac{\sigma}{\bar{k}} = \left[ \frac{1-p}{p} \cdot \frac{1}{n-1} \right]^{1/2}$
(n (Graph size) ↑ → $\sigma / \bar{k}$ ↓ → $\sigma \ll \bar{k}$ → Distr narrower → ↑
Confidence that node deg is around $\bar{k}$)

- Clustering coef:

$E(C_i) = E\left[ \frac{2 e_i}{k_i (k_i - 1)} \right] = \frac{2}{k_i (k_i - 1)} E(e_i)$
$E(e_i) = p \frac{k_i (k_i - 1)}{2}$
⇒ $E(C_i) = p = \frac{\bar{k}}{n-1}$
(For fixed $\bar{k}$: n (Graph size) ↑ → C ↓)

- Avg path length: small

- Connectivity: Strongly controlled by np (phase transition)

np < 1: Small components
np > 1: Giant connected component emerges

Real network vs Simple graph

Property                  | Real network                                 | Simple Graph
Giant connected component | Exist (NOT emerge through phase transition)  | Exist (emerge through phase transition)
Avg path length           | Small                                        | Small
Clustering coef           | Big                                          | Small (no local structure)
Deg distr                 | Power law                                    | Binomial

Small World Model
(Watts-Strogatz 1998)

- Construction:
○ Initialization: Low-dim regular lattice
Each node connected to α nearest neighbors
(high clustering coef, high diameter)

○ Rewiring:
 (1) For each node, clockwise around ring:
□ Select edge connecting it to its nearest neighbor
□ With prob p, reconnect this edge to node chosen uniformly
over entire ring
 (2) Repeat (1) for 2nd, 3rd, …, αth nearest neighbor

- Intention: ↑ Rewire → ↑ Randomness → Interpolate between regular lattice &
random graph

- Observation:
○ Lots of rewiring required → ↓ Clustering coef
○ BUT: Very small amount of newly-created long-range edges (result of
rewiring) → ↓ Diameter

- Model strength:
○ Provide insight into interplay between clustering & small-world
○ Capture structure of many realistic networks
○ Explain high clustering of real network

- Model weakness:
○ Can't explain real network's deg distr
○ Not enable navigation among clusters (communities)
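
As a quick sanity check of the Gnp degree statistics, the sketch below samples one graph and compares its mean degree with (n - 1)p. The function name `gnp`, the values n = 400 and p = 0.05, and the fixed seed are arbitrary choices for illustration only.

```python
import random

def gnp(n, p, seed=0):
    """Sample a G(n, p) graph; return per-node degrees and the edge count."""
    rng = random.Random(seed)
    deg = [0] * n
    edges = 0
    for u in range(n):
        for v in range(u + 1, n):
            # Each of the n(n-1)/2 possible edges appears independently with prob p.
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
                edges += 1
    return deg, edges

n, p = 400, 0.05            # illustrative values, not from the notes
deg, E = gnp(n, p)
mean_deg = sum(deg) / n     # always equals 2E / n
expected = (n - 1) * p      # binomial mean from the notes
print(mean_deg, expected)
```

For larger n with the same mean degree the sampled degrees concentrate more tightly around the mean, which is the "distr narrower" remark in the notes.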


Community Structure in Network
Tuesday, March 21, 2017 6:16 AM

Basic concepts

- Bridge edge: If removed → Graph disconnected

- Local bridge:
○ Edges whose endpoints have no common neighbor
○ Exclude bridge

- Edge overlap:
$O_{ij} = \frac{|\text{Neighbors of } i \text{ and } j|}{|\text{Neighbors of } i \text{ or } j|} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$
N(i): Set of i's neighbors (excluding j)

Oij = 0 → Local bridge

Testing of tie strength in real network
(Onnela 2007)

- Data:
○ Cell-phone network of 20% of country's population
○ Edge strength: Aggregated call duration

- Link removal by edge strength (Low strength = weak link):
○ Low → High: Network disconnected sooner
○ High → Low: Network gradually shrinks, but does not collapse

- Link removal by edge overlap (Low overlap = weak link)

Low → High: Network disconnected sooner, collapses

- Conclusion:
○ Weak ties crucial for maintaining network's structural integrity
○ Strong ties important for maintaining local communities

Structural Hole

- "Empty space" in network between sets of nodes not
interacting closely

- Network spanning structural holes: Source of social capital

Network Constraints Measure

To what extent person's contacts are redundant

Node's "performance" negatively associated with its network constraint

$c_i = \sum_{j} c_{ij} = \sum_{j} \left( p_{ij} + \sum_{k} p_{ik} p_{kj} \right)^2$

ci: i's network constraint

pij: Proportion of i's energy invested in rel with j

    1    2    3    4    5
1   0    1/4  1/4  1/4  1/4
2   1/2  0    0    0    1/2
3   1    0    0    0    0
4   1/2  0    0    0    1/2
5   1/3  1/3  0    1/3  0
(Assumption: All links equally important)

Example for node 2 (node 4 has an identical row):
$c_2 = (p_{25} + p_{21} p_{15})^2 + (p_{21} + p_{25} p_{51})^2 = \left( \frac{1}{2} + \frac{1}{2} \times \frac{1}{4} \right)^2 + \left( \frac{1}{2} + \frac{1}{2} \times \frac{1}{3} \right)^2$
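
The network-constraint computation can be checked mechanically on the 5-node proportion matrix above. This is a minimal sketch, assuming the squared per-contact term of Burt's constraint (the exponent is not legible in the extracted formula, so treat it as an assumption) and summing only over j that are direct contacts of i, as the worked example does. The dictionary `P` and the function name `constraint` are illustrative.

```python
from fractions import Fraction as F

# p_ij: proportion of i's energy invested in j (rows of the table above).
P = {
    1: {2: F(1, 4), 3: F(1, 4), 4: F(1, 4), 5: F(1, 4)},
    2: {1: F(1, 2), 5: F(1, 2)},
    3: {1: F(1)},
    4: {1: F(1, 2), 5: F(1, 2)},
    5: {1: F(1, 3), 2: F(1, 3), 4: F(1, 3)},
}

def constraint(P, i):
    """c_i = sum over contacts j of (p_ij + sum_k p_ik p_kj)^2, with k != i, j."""
    total = F(0)
    for j, p_ij in P[i].items():
        indirect = sum(
            (P[i].get(k, F(0)) * P[k].get(j, F(0)) for k in P if k not in (i, j)),
            F(0),
        )
        # Squared term: assumed from Burt's definition, see lead-in.
        total += (p_ij + indirect) ** 2
    return total

c2 = constraint(P, 2)
c3 = constraint(P, 3)
print(c2, c3)
```

Exact fractions are used so the result can be compared with a hand calculation term by term.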

Girvan-Newman algorithm

- Input: Undirected graph

- Output: Hierarchical decomposition of network

- Step: Repeat until no edges left
○ Calculate edge betweenness
○ Remove edges with highest betweenness (may remove ≥ 2 edges)
○ Connected components are communities

Example (14-node graph):
After step 1: [1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]
After step 2: [1, 2, 3], [4, 5, 6], [7], [8], [9, 10, 11], [12], [13], [14]
After step 3: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]

Edge betweenness

- # shortest paths passing over particular edge

- Computation:
○ Initialize betweenness(u, v) = 0 ∀u, v

○ For each node u:

 BFS to find f(v): # shortest paths u → v

 Initialize nodeFlow(v) = 0 ∀v ≠ u
 Go upward from lowest node on BFS tree:
□ nodeFlow(v) += 1
□ For each node w one level higher than v on BFS tree with edge (w, v):
Δ = f(w) / f(v) * nodeFlow(v)
betweenness(w, v) += Δ
nodeFlow(w) += Δ

○ Divide all betweenness() by 2
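
The BFS-based edge betweenness computation above can be sketched as follows. This is an illustrative implementation of the steps as written in the notes (path counts f, upward flow accumulation, final division by 2); the function name and the use of `frozenset` edge keys are assumptions, and the parent node is called `w` to avoid clashing with the BFS root.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an undirected graph given as {node: iterable of neighbors}."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for root in adj:
        # BFS from root: dist = level, f = number of shortest paths root -> node.
        dist, f, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            x = queue.popleft()
            order.append(x)
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    f[y] = 0
                    queue.append(y)
                if dist[y] == dist[x] + 1:
                    f[y] += f[x]
        # Go upward from the lowest nodes, pushing flow to parents.
        flow = {v: 0.0 for v in order}
        for v in reversed(order):
            if v == root:
                continue
            flow[v] += 1
            for w in adj[v]:
                if dist[w] == dist[v] - 1:
                    delta = f[w] / f[v] * flow[v]
                    bet[frozenset((w, v))] += delta
                    flow[w] += delta
    # Each unordered pair was counted from both endpoints.
    return {e: b / 2 for e, b in bet.items()}

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(edge_betweenness(path))
print(edge_betweenness(cycle))
```

One Girvan-Newman step would then remove every edge whose value equals the maximum of this dictionary and recompute on the remaining graph.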

Network Community

- Community: Sets of tightly connected node

- Modularity Q: Measure how well network is partitioned into communities

Given that we divide graph into several communities

$Q = \frac{1}{2m} \sum_{i,j} \delta_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$

m: Total # edges
δij = 1, if i & j assigned to same community
0, otherwise
Aij = 1, if edge (i, j) exists
0, otherwise

0.3 < Q < 0.7: Significant community structure

- Application to Girvan-Newman algorithm:

Best network decomposition = Highest Q
(NOTE: When calculating Q, use original graph, NOT graph with
edges removed after each step)
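
To make the modularity formula concrete, here is a small sketch that evaluates Q for a given partition. The example graph (two triangles joined by a single edge) and the function name are hypothetical; the sum runs over all ordered pairs (i, j), including i = j, as in the formula.

```python
def modularity(edges, community):
    """Q = (1/2m) * sum_{i,j} delta_ij * (A_ij - k_i k_j / 2m) on an undirected graph."""
    m = len(edges)
    deg, adjacent = {}, set()
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        adjacent.add((u, v))
        adjacent.add((v, u))
    q = 0.0
    for i in deg:
        for j in deg:
            if community[i] == community[j]:          # delta_ij = 1
                a_ij = 1 if (i, j) in adjacent else 0
                q += a_ij - deg[i] * deg[j] / (2 * m)
    return q / (2 * m)

# Hypothetical example: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "a" for v in range(6)}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Putting every node in one community gives Q = 0, so a positive Q indicates more within-community edges than the degree-based expectation.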


Centrality Measures
Tuesday, March 21, 2017 6:48 AM

Centrality Measures for Undirected Graph

Degree Centrality
- Idea: How neighbor-connective a node is
- Formula: $C_D(u) = k_u$
- Formula, with normalization: $C'_D(u) = \frac{C_D(u)}{N - 1}$

Closeness Centrality
- Idea:
○ How "close" to others a node is
○ Not only consider neighbors
- Formula: $C_C(u) = \left[ \sum_{v=1}^{n} h_{uv} \right]^{-1}$
- Formula, with normalization: $C'_C(u) = (N - 1) C_C(u)$

Betweenness Centrality
- Idea: How many individual pairs have to go through a node to reach another
- Formula: $C_B(u) = \sum_{v} \sum_{k} \frac{g_{vk}(u)}{g_{vk}}$
gvk(u): # shortest paths v → k passing through u
gvk: Total # shortest paths between v, k
- Formula, with normalization: $C'_B(u) = \frac{2}{(n - 1)(n - 2)} C_B(u)$

Node positions according to centrality measure

- High Degree, Low Closeness: Located in cluster far away from rest of network
- High Degree, Low Betweenness:
○ Many neighbors
○ Few shortest paths going through
- High Closeness, Low Degree:
○ Close to many people
○ Few neighbors
- High Closeness, Low Betweenness: Many shortest paths between same node pair in network
- High Betweenness, Low Degree: Located in "bridge" connecting 2 clusters
- High Betweenness, Low Closeness: Very rare

Centrality Measures for Directed Graph

Degree Prestige
$C_D(u) = indeg(u)$

Closeness Prestige
$C_C(u) = \frac{1}{|I_u|} \sum_{v \in I_u} h_{vu}$
Iu (u's influence range): Set of vertices able to reach u

Betweenness Prestige
$C_B(u) = \sum_{v} \sum_{k} \frac{g_{vk}(u)}{g_{vk}}$
gvk(u): # shortest paths v → k passing through u
gvk: Total # shortest paths v → k
NOTE: gvk ≠ gkv
With normalization: $C'_B(u) = \frac{C_B(u)}{(N - 1)(N - 2)}$

Directed Geodesic
A node does not necessarily lie on geodesic j → k if it lies on
geodesics k → j

Freeman's Network Centrality

- Measure centrality of whole network:
Larger network → More likely single node is
quite central, remaining less central

- Calculation:
$C_D = \frac{\sum_{u} \left[ C^*(n) - C(u) \right]}{(N - 1)(N - 2)}$

C*(n) = max(C(u))

C(u): Can use CD(u), CC(u) or CB(u)

CD ∈ [0, 1]

Eigenvector Centrality

- Principle: Node importance increased by having neighbors who themselves
are also important

- Computation:
Centrality vector $\vec{x}$, Adj matrix A

Initially $x_i = 1 \; \forall i$
Obtain better estimation: $x'_i = \sum_{j} A_{ij} x_j$
Matrix form: $\vec{x}' = A \vec{x}$

Repeat t steps: $\vec{x}(t) = A^t \vec{x}(0)$

(Intuitively, want $\vec{x}(t)$ to converge)

Supplement defs & theorem:

○ Primitive matrix M: Nonneg, square
∃k > 0 (k ∈ Z+): $M^k$ strictly positive

○ Perron-Frobenius theorem:
M: n×n primitive matrix
⇒ ∃ eigenvalue λ1 such that:
λ1 > 0
λ1 has unique eigenvector $\vec{v_1}$: v1i > 0 ∀i (all entries
positive)
λ1 > |λ| ∀ eigenvalue λ ≠ λ1 (λ1 is largest eigenvalue)

- Alternative method to compute $\vec{x}(t)$

Let $\vec{v}$ be eigenvector of A
⇒ $\vec{v}$ is also eigenvector of M = A + I

Let $\vec{v_1}, \vec{v_2}, \ldots, \vec{v_n}$ be eigenvectors of M

λ1, λ2, … , λn: Corresponding eigenvalues

Consider $\vec{x}$ not orthogonal to $\vec{v_1}$

$\vec{x} = \alpha_1 \vec{v_1} + \alpha_2 \vec{v_2} + \cdots + \alpha_n \vec{v_n}$
$M^k \vec{x} = \alpha_1 M^k \vec{v_1} + \alpha_2 M^k \vec{v_2} + \cdots + \alpha_n M^k \vec{v_n}$
$= \lambda_1^k \alpha_1 \vec{v_1} + \lambda_2^k \alpha_2 \vec{v_2} + \cdots + \lambda_n^k \alpha_n \vec{v_n}$
$\frac{M^k \vec{x}}{\lambda_1^k} = \alpha_1 \vec{v_1} + \frac{\lambda_2^k \alpha_2 \vec{v_2}}{\lambda_1^k} + \cdots + \frac{\lambda_n^k \alpha_n \vec{v_n}}{\lambda_1^k}$

$\lim_{k \to \infty} \frac{M^k \vec{x}}{\lambda_1^k} = \alpha_1 \vec{v_1}$
⇒ $M^k \vec{x} \propto \vec{v_1}$

$\vec{x}(t) \propto \vec{v_1}$
$\vec{v_1}$ is eigenvector of M = A + I
⇒ Can regard $\vec{v_1}$ as eigenvector centrality (as we only consider
relative difference among entries of $\vec{x}(t)$)

⇒ Finding $\vec{x}(t)$ is same as finding leading eigenvalue of (A + I)
and its corresponding eigenvector $\vec{v_1}$
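
The repeated-multiplication view of eigenvector centrality can be sketched as a power iteration on M = A + I. This is a minimal illustration, not a reference implementation: the function name, the number of steps, and rescaling by the largest entry each round (to avoid overflow; only relative values matter) are all assumptions.

```python
def eigenvector_centrality(A, steps=100):
    """Power iteration on M = A + I, starting from x_i = 1 for all i."""
    n = len(A)
    x = [1.0] * n
    for _ in range(steps):
        # x <- (A + I) x
        new = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Rescale so the largest entry is 1; relative differences are unchanged.
        top = max(new)
        x = [value / top for value in new]
    return x

# Hypothetical example: path graph 0 - 1 - 2 (node 1 is the middle node).
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
x = eigenvector_centrality(A)
print(x)
```

For this path graph the leading eigenvector is proportional to (1, √2, 1), so the middle node ends up with the highest score.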


Network Formation Process
Tuesday, March 21, 2017 7:11 AM

Heavy Tailed Distribution

Distr P(X > x) heavy tailed if:

$\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$

NOT Heavy tailed:

Normal PDF: $p(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$

Exponential PDF: $p(x) = \lambda e^{-\lambda x}$
⇒ $P(X > x) = e^{-\lambda x}$

Heavy tailed:
Power law: $P(x) \propto x^{-\alpha}$
Stretched exponential: $P(x) \propto x^{\beta - 1} e^{-\lambda x^{\beta}}$
Log-normal: $P(x) \propto \frac{1}{x} \exp\left( -\frac{(\ln x - \mu)^2}{2 \sigma^2} \right)$

Power Law Distribution
Deg distr of many real networks

- Form: $p(x) = Z x^{-\alpha}$
Set cutoff value xm

- Normalizing const Z:

$1 = \int_{x_m}^{\infty} p(x) dx = Z \int_{x_m}^{\infty} x^{-\alpha} dx = \frac{Z}{-\alpha + 1} \left[ x^{-\alpha + 1} \right]_{x_m}^{\infty} = -\frac{Z}{\alpha - 1} \left[ \infty^{1 - \alpha} - x_m^{1 - \alpha} \right]$

Assume α > 1 ⇒ $\infty^{1 - \alpha} = 0$

⇒ $Z = (\alpha - 1) x_m^{\alpha - 1}$

$p(x) = \frac{\alpha - 1}{x_m} \left( \frac{x}{x_m} \right)^{-\alpha}$

- Expectation:

$E(X) = \int_{x_m}^{\infty} x p(x) dx = Z \int_{x_m}^{\infty} x^{-\alpha + 1} dx = \frac{Z}{-\alpha + 2} \left[ x^{-\alpha + 2} \right]_{x_m}^{\infty} = -\frac{(\alpha - 1) x_m^{\alpha - 1}}{\alpha - 2} \left[ \infty^{2 - \alpha} - x_m^{2 - \alpha} \right]$

Assume α > 2 ⇒ $\infty^{2 - \alpha} = 0$
⇒ $E(X) = \frac{\alpha - 1}{\alpha - 2} x_m$

- Variance:
$Var(X) = E(X^2) - [E(X)]^2$

$E(X^2) = \int_{x_m}^{\infty} x^2 p(x) dx = Z \int_{x_m}^{\infty} x^{-\alpha + 2} dx = \frac{Z}{-\alpha + 3} \left[ x^{-\alpha + 3} \right]_{x_m}^{\infty} = -\frac{(\alpha - 1) x_m^{\alpha - 1}}{\alpha - 3} \left[ \infty^{3 - \alpha} - x_m^{3 - \alpha} \right]$

Assume α > 3 ⇒ $\infty^{3 - \alpha} = 0$
⇒ $E(X^2) = \frac{\alpha - 1}{\alpha - 3} x_m^2$

⇒ $Var(X) = \frac{\alpha - 1}{\alpha - 3} x_m^2 - \left[ \frac{\alpha - 1}{\alpha - 2} x_m \right]^2$

- In real network: 2 < α < 3:

E(X) = const
Var(X) = ∞

Scale-Free Network

- Network with deg distr's tail in power law form

- Name origin:

○ Scale invariance: Whatever scale we look at,
distr looks "same" (No characteristic scale)

○ Scale-free function: $f(ax) = a^{\lambda} f(x)$
Power law: $f(ax) = (ax)^{\lambda} = a^{\lambda} x^{\lambda} = a^{\lambda} f(x)$

Estimation of Power-law Exponent α

- Method 1: Fit line on log-log graph using least squares

$y = b x^{-\alpha}$
⇒ $\log(y) = \log(b) - \alpha \log(x)$
Gradient = -α

Not good: log-log graph tends to "spread" for large k
For large k: N(k) usually very "equally" small (typical
value 0, 1, 2, …)
(Usually: Graph used to generate deg distr is small
⇒ ↑ Graph size → ↓ "Spreading" phenomenon)

- Method 2: Plot complementary CDF (CCDF) P(X ≥ x)

$P(X \geq x) = \sum_{j = x}^{\infty} p(j) \approx \int_{x}^{\infty} Z j^{-\alpha} dj = \frac{Z}{-\alpha + 1} \left[ j^{-\alpha + 1} \right]_{x}^{\infty} = \frac{Z}{\alpha - 1} x^{-(\alpha - 1)}$

$y = P(X \geq x)$
⇒ $\log(y) = \log\left( \frac{Z}{\alpha - 1} \right) - (\alpha - 1) \log(x)$
Gradient = -(α - 1)

Better estimation: Aggregate N(k) → ΣN(k) big for large k

Model explaining Power Law Degree Distribution

- Preferential attachment: "Rich get richer":

○ Nodes arrive in order 1, 2, …, n
○ New node j creates m new links
$P(j \to i) = \frac{k_i}{\sum_{u} k_u}$
u: Previously created node

New citations to paper proportional to number it already has

- Exact model:
○ Graph formation:
 Nodes arrive in order 1, 2, .., n

 New node j creates ONLY 1 out link (by doing EITHER (1) or (2))
□ (1) With prob p, j links to i chosen randomly, uniformly from
previous nodes

□ (2) With prob (1 - p):

 Randomly uniformly, choose node i previously created
 Link j to u which i points to

((2) ⇔ With prob (1 - p), j links to u with prob ∝ indeg(u))

○ $P(k) \propto k^{-\left( 1 + \frac{1}{1 - p} \right)}$

$\alpha = 1 + \frac{1}{1 - p}$

○ Behavior:

 p → 1:
□ Link formation mainly based on uniform random choices
□ α→∞
□ Few nodes with large indeg

 p → 0:
□ Growth of network strongly governed by "rich-get-richer" behavior
□ α→2
□ Many nodes with large indeg
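
Method 2 for estimating α (least-squares slope of the CCDF on log-log axes, gradient = -(α - 1)) can be illustrated with a short sketch. To keep the example deterministic it uses evenly spaced quantile points of a continuous power law instead of random samples; that choice, the values α = 2.5 and x_m = 1, and the function name are all assumptions made for illustration.

```python
import math

def ccdf_slope(xs):
    """Least-squares slope of log P(X >= x) against log x for the given values."""
    xs = sorted(xs)
    n = len(xs)
    # Empirical CCDF at the i-th smallest value is (n - i) / n.
    pts = [(math.log(x), math.log((n - i) / n)) for i, x in enumerate(xs)]
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return sxy / sxx

alpha, x_m, n = 2.5, 1.0, 10000   # illustrative values
# Inverse of the CCDF P(X >= x) = (x / x_m)^(-(alpha - 1)) at evenly spaced quantiles.
xs = [x_m * (1 - (i + 0.5) / n) ** (-1 / (alpha - 1)) for i in range(n)]
alpha_hat = 1 - ccdf_slope(xs)    # gradient = -(alpha - 1)
print(alpha_hat)
```

With real degree data the same fit would be applied to the observed degrees, usually restricted to values above the chosen cutoff x_m.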


Network Effects & Cascading Behavior
Wednesday, May 10, 2017 10:31 PM

Network cascades

- Contagion spreading over edges of network

- Create propagation tree

Bio: Diseases
Social: Viral marketing

Diffusion Models

Decision-based model
- Node observes neighbors' decisions → Makes own decision
- Application: product adoption, decision making, …

Probability model
- Node gets influenced with some prob from already influenced neighbors
- Application: disease spreading, …

Decision-based Model of Diffusion
Monday, May 15, 2017 10:19 PM

Game-theoretic Model of Cascade

- Rules:
○ Each node: Can choose only 1 of 2 actions A/B
○ Adj pair's payoff matrix:
 2 A: Each payoff = a > 0
 2 B: Each payoff = b > 0
 1 A + 1 B: Each payoff = 0

- Single node decision process:

Node v
d neighbors: pd already use A,
(1 - p)d already use B
(0 ≤ p ≤ 1)

v's payoff = apd if v use A
= b(1 - p)d if v use B

v use A if:
$apd > b(1 - p)d$
⇔ $p > q = \frac{b}{a + b}$

Interpretation example:
If > 50% of my friends take A, I'll also take A
⇔ $q = \frac{b}{a + b} \geq \frac{1}{2}$ ⇔ b ≥ a

Game-theoretic Model: Properties

- Monotonic spreading: Node only switch B → A, NEVER back A → B
Proof (by contradiction):
Let u = First node switching A → B at time t
⇒ $p(t) \leq \frac{b}{a + b}$
At time t' < t: u switch B → A
⇒ $p(t') > \frac{b}{a + b} \geq p(t)$
⇒ During time t' → t: # u's neighbors using A ↓
⇒ Impossible, because u is the first node switching A → B

- Cascade capacity: Max q → ∃ finite set S can cause cascade
○ Capacity ↑ → Cascade more easily
○ ∀ graph G: Cascade capacity ≤ 1/2

- Stopping cascade:
○ Cluster C with density ρ: All C's nodes have ≥ ρ fraction of edges in C

○ Stopping condition:
S: Initial set of A's adopters
G\S contains cluster with density > (1 - q)
⇔ S cannot cause cascade

Game-theoretic Model: Cascade in Infinite Graph
(each node has FINITE neighbors)

- Infinite path: q < 1/2 → Cascade

- Infinite tree: q < 1/3 → Cascade

- Infinite grid: q < 1/4 → Cascade

Extended Model of Cascade

- Rules:
○ Each node: Can use both A & B
○ Adj pair's payoff matrix:
 AB - A: Each payoff = a
 AB - B: Each payoff = b
 AB - AB: Each payoff = max(a, b) - c
(c = Dual-maintenance cost)

- Single node decision process:

○ Initialization:
 Infinite path, all B
 b = 1

Consider node w

○ Case 1:

Payoff = a if w choose A
= 1 if w choose B
= a + 1 - c if w choose AB

○ Case 2:

Payoff = a if w choose A
= 1 + 1 = 2 if w choose B
= a + 1 - c if w choose AB

Extended Model of Cascade: Analysis Summary

- Present condition:
○ Default B
○ Better A comes

- Future scenarios:
○ Infiltration (B → AB → A):
 A & B too compatible with each other
 People first use both, but gradually drop B

○ Direct conquest (B → A):

 A & B too incompatible
 People immediately drop B, pick A

○ Buffer zone (B → AB):

 A complements B
 People use both
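
The single-node decision rule of the basic game-theoretic model (adopt A when the fraction p of neighbors using A exceeds q = b / (a + b)) can be simulated directly. The sketch below uses a finite 10-node path as a stand-in for the infinite path; the function name, the synchronous update order, and the assumption that initial adopters never switch back are illustrative choices.

```python
def cascade(adj, seeds, a, b):
    """Spread A from the seed set; a node adopts A when its A-fraction p > q."""
    q = b / (a + b)
    adopters = set(seeds)          # seeds are assumed to keep using A
    while True:
        newly = set()
        for v in adj:
            if v in adopters:
                continue
            p = sum(1 for w in adj[v] if w in adopters) / len(adj[v])
            if p > q:
                newly.add(v)
        if not newly:
            return adopters
        adopters |= newly

# Finite path 0 - 1 - ... - 9 (illustrative stand-in for the infinite path).
n = 10
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

spread = cascade(path, {4, 5}, a=3, b=2)    # q = 0.4 < 1/2
blocked = cascade(path, {4, 5}, a=2, b=3)   # q = 0.6 > 1/2
print(sorted(spread), sorted(blocked))
```

With q below 1/2 each node next to the adopters sees half of its neighbors using A and switches, so the whole path converts; with q above 1/2 the two seeds stay isolated.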

Probabilistic Model of Diffusion
Monday, May 15, 2017 11:06 PM

Virus' Spreading Model Family

- Params:
○ Birth rate β = P(Infected node attacks neighbors)
○ Death rate δ = P(Infected node healed)

- Virus strength s = β / δ

- Epidemic threshold τ:
$\tau = \frac{1}{\lambda_{1,A}}$
λ1,A: Max eigenvalue of adj matrix A

○ s < τ: Disease dies out
○ s > τ: Epidemic happens

- Typical assumption: Network topology not considered (Every
node has equal contact to others)

Independent Cascade Model

- Model:
○ Directed finite graph G = (V, E)
○ S: Initial set of "active" nodes
○ Edge (v, w):
 Pvw = Prob that node v, if active, also makes neighbor w
active
 v only has 1 chance to activate w

- Limitation: Many params → Hard to estimate from data

Virus' Spreading: SIR Model

- Phases:

○ Susceptible: No disease
○ Infectious: Got disease, can attack Susceptible
○ Recovered: Healed, can no longer be infected

- Model development:
Pop N = S + I + R
P(Contact with S-node) = S / N
P(S → I) = p

Per unit time, per I-node:
# Contacts made = cN
# Contacts with S-node made = cN × S / N = cS
# S→I conversions made = pcS

Per unit time: No of S → I = pcSI = βSI (β = pc)

⇒ $\frac{dS}{dt} = -\beta S I$

Similarly, we have:
$\frac{dR}{dt} = \delta I$ , $\frac{dI}{dt} = \beta S I - \delta I$

- Observation: S always ↓ , R always ↑

Virus' Spreading: SIS Model

- Model development:
$\frac{dI}{dt} = \beta S I - \delta I$ , $\frac{dS}{dt} = -\beta S I + \delta I$

- Observation:
○ Case 1: t → ∞, I → 0
○ Case 2: Disease remains indefinitely
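
The SIR equations (dS/dt = -βSI, dI/dt = βSI - δI, dR/dt = δI) can be explored numerically with a simple forward-Euler integration. This is a rough sketch under assumed parameter values (population 1000, β = 0.0005, δ = 0.1, step 0.1); none of these numbers come from the notes, and a smaller step or a proper ODE solver would be more accurate.

```python
def simulate_sir(S, I, R, beta, delta, dt, steps):
    """Forward-Euler integration of the SIR equations; returns the full trajectory."""
    history = [(S, I, R)]
    for _ in range(steps):
        new_infections = beta * S * I * dt   # S -> I flow during this step
        new_recoveries = delta * I * dt      # I -> R flow during this step
        S = S - new_infections
        I = I + new_infections - new_recoveries
        R = R + new_recoveries
        history.append((S, I, R))
    return history

# Assumed illustrative values: 990 susceptible, 10 infectious, nobody recovered.
hist = simulate_sir(990.0, 10.0, 0.0, beta=0.0005, delta=0.1, dt=0.1, steps=1000)
final_S, final_I, final_R = hist[-1]
print(final_S, final_I, final_R)
```

The trajectory shows the observation stated in the notes: S only decreases and R only increases, while the total S + I + R stays constant. Adding `delta * I * dt` back to S instead of R would give the corresponding SIS update.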

Exposure & Adoption Model

- Model:

○ States:
 Exposure: Node exposed to contagion by neighbors
 Adoption: Node acts on contagion

○ Params: Exposure curve:

P(Adopt new behavior) = f(# adopted neighbors)

- Observation in viral marketing:
Too much marketing incentives
→ ↑ Recommendation from neighbor (in social network)
→ ↓ Effectiveness

 Incentives:
□ Product recommender: 10%
□ First recommendee purchasing same item: 10% discount
 Effectiveness: p(Buy item) = f(# Recommendation received)
□ # ≤ 3: p ≈ const
□ # > 3: p ↓

Exposure & Adoption Model: Exposure Curve Modelling

Twitter case

- Question: Given user X & hashtag H
How does successive exposure to H affect P(X mentions H)?

- Development:
○ User X:
 Not yet mentioned H
 But k neighbors already have

○ p(k) = P(X mentions H before (k+1)th neighbor does so)
⇔ p(k) = Fraction of users adopting H after kth exposure

○ Stickiness of H = max(p(k))

Internal & External exposures

- Sources of exposures:
○ Internal: From inside network
User sees URLs posted by friends

○ External: From outside network

User sees URLs from source outside social network

Twitter case:
○ Trace emergence of URL, label each URL by its topic:
○ Findings:
 Topic's Max(p(k)):
P(Retweet | Art, edu article) < P(Retweet | Entertainment)

 k at Max(p(k)):
World news reaches max infectiousness earliest → More time sensitive

 Decline of p(k) over time (Viral duration):

Topics with small "k at max(p(k))" tend to have short viral duration

 External vs Internal exposure: Political news most externally driven

Influence Maximization in Network
Tuesday, May 16, 2017 12:48 AM

Influence Maximization (NP-complete)

- Formulation:
S: Initial active set
f(S): Expected size of final active set
"Expected" = Random process
$f(S) = \frac{1}{R} \sum_{i=1}^{R} f_i(S)$
fi(S): Size of final active set in ith realization (R: # realizations)

- ↑ f(S) → S more influential

- Problem: Max f(S)

- Approximation algorithm:
S0 = {}
for i = 1 … k:
Choose remaining Xj → Max f(Si-1 ∪ {Xj})
Si = Si-1 ∪ {Xj}

Example:
Evaluate f({X1}), …, f({Xm}) → Pick max at f({X2})
Evaluate f({X2, X1}), f({X2, X3}), …, f({X2, Xm}) → Pick max at f({X2, X4})
Evaluate f({X2, X4, X1}), f({X2, X4, X3}), …, f({X2, X4, Xm}) → Pick max at f({X2, X4, X1})

- Approx guarantee: f(S) ≥ (1 - 1/e)*Optimal

Claims hold for f() with 2 properties:
 f monotone (activating more nodes doesn't hurt):
f({}) = 0
S ⊆ T ⇒ f(S) ≤ f(T)
 f submodular (activating additional node helps less):
S ⊆ T ⇒ f(S ∪ {u}) - f(S) ≥ f(T ∪ {u}) - f(T)
(Diminishing return)

Set Cover Problem (NP-complete)

- Def:
Given: U = {u1, …, un}
X1, …, Xm ⊆ U
Check: ∃ k sets: Xi1 ∪ Xi2 ∪ … ∪ Xik = U

- Approach: Influence Maximization in bipartite X-U graph

Edge (Xi, uj) if uj ∈ Xi

Find S = {Xi1, Xi2, … Xik} → f(S) = k + n

Simulation Experiment: Collaboration network

- Data: Co-authorship in papers of arXiv high-energy physics theory

- Use Independent Cascade model

○ Case 1: All edges have prob p
○ Case 2: pvw = 1 / deg(w)
(Fewer friends → More influence from each friend)

- Compare greedy approx algorithm with 3 common heuristics:
○ Degree centrality (Choose k peo with highest centrality as
initial active set)
○ Distance centrality
○ Random nodes

- Case 1 result: Degree & distance centrality do not perform well
(Most central nodes can belong to same cluster)
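
The greedy approximation algorithm can be sketched on the set-cover style instance, where choosing set Xi "activates" every element it contains. In this illustration f(S) is simply the number of covered elements, a deterministic stand-in for the expected cascade size that is monotone and submodular; the four sets, the function names, and this choice of f are assumptions for the example only.

```python
def greedy_max_influence(candidates, k, f):
    """Pick k candidates one at a time, each time adding the one that maximizes f."""
    chosen = []
    for _ in range(k):
        remaining = [name for name in candidates if name not in chosen]
        best = max(remaining, key=lambda name: f(chosen + [name]))
        chosen.append(best)
    return chosen

# Hypothetical instance: each X_i covers some elements of U = {1, ..., 6}.
sets = {
    "X1": {1, 2, 3, 4},
    "X2": {3, 4, 5},
    "X3": {5, 6},
    "X4": {1, 6},
}

def coverage(names):
    """f(S): number of elements covered by the chosen sets."""
    covered = set()
    for name in names:
        covered |= sets[name]
    return len(covered)

S = greedy_max_influence(sets, 2, coverage)
print(S, coverage(S))
```

In the real influence maximization setting f would be estimated by averaging many simulated cascades per candidate set, which is why each greedy round is expensive.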


Network with Signed Edges
Tuesday, May 16, 2017 2:42 AM

Theory of Structural Balance

- Graph:
○ Undirected
○ Edge: (+) = Friend, (-) = Enemy

Wikipedia: (+) = A supports B to become admin

Epinions: (+) = A trusts B's product review

- Rule: Intuitive:
○ Friend's friend = Friend
○ Enemy's enemy = Friend
○ Friend's enemy = Enemy

- Triangle unit:

Balanced: Consistent with rules
→ Exactly 1 or 3 (+) edges
Unbalanced: Inconsistent with rules

- Balanced, complete graph: Every triangle is balanced

2 scenarios:
 All edges (+)
 Nodes split into 2 sets: Only (-) edges between the sets

Global Structure of Signed Networks

- (A, B)-Embeddedness = # shared neighbors

- Question: How network structure interacts with link:

- Observation (from Wikipedia, Epinions):

○ High embeddedness → Significantly more likely (+) link
(Why: Users with more common neighbors have greater
implicit pressure to remain (+))

○ (+) ties tend to be closed together

Theory of Structural Balance: Edge Sign Prediction

- 2 EQUIVALENT views of balance in non-complete graph:

Local: Fill in missing edges → Balance
Global: Divide graph into 2 coalitions

- Theory: Graph is balanced
⇔ No cycle with odd # (-) edges

- Balance-checking algorithm:

○ 1: Find connected components based on (+) edges

○ 2: For each component: If ∃ (-) edge inside → Unbalanced

○ 3: Regard connected components as SUPERnodes in new SUPERgraph
(AT MOST 1 edge between any pair of supernodes)

○ 4:
 Run BFS (from any node) → Form BFS tree:
 If ∃ edge connecting 2 nodes in same layer → Unbalanced
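
The four steps of the balance-checking algorithm above can be sketched as follows. This is an illustrative implementation: the union-find helper for step 1, running BFS once per connected part of the supergraph (the notes start from any node, which presumes a connected supergraph), and all function and variable names are assumptions.

```python
from collections import deque

def is_balanced(nodes, pos_edges, neg_edges):
    """Structural balance check: (+) components, then same-layer test on the supergraph."""
    # Step 1: connected components over (+) edges (union-find).
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in pos_edges:
        parent[find(u)] = find(v)

    # Steps 2 and 3: a (-) edge inside a component means unbalanced;
    # otherwise it becomes an edge between supernodes.
    super_adj = {find(v): set() for v in nodes}
    for u, v in neg_edges:
        a, b = find(u), find(v)
        if a == b:
            return False
        super_adj[a].add(b)
        super_adj[b].add(a)

    # Step 4: BFS layers; an edge inside one layer closes an odd cycle.
    layer = {}
    for start in super_adj:
        if start in layer:
            continue
        layer[start] = 0
        queue = deque([start])
        while queue:
            x = queue.popleft()
            for y in super_adj[x]:
                if y not in layer:
                    layer[y] = layer[x] + 1
                    queue.append(y)
                elif layer[y] == layer[x]:
                    return False
    return True

two_camps = is_balanced([1, 2, 3, 4], [(1, 2), (3, 4)], [(1, 3), (2, 4)])
bad_triangle = is_balanced([1, 2, 3], [(1, 2), (2, 3)], [(1, 3)])
all_enemies = is_balanced([1, 2, 3], [], [(1, 2), (2, 3), (1, 3)])
print(two_camps, bad_triangle, all_enemies)
```

The first example splits cleanly into two coalitions, the second has a (-) edge inside a (+) component, and the third has a cycle with three (-) edges.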

Theory of status: Story of soccer team

For each node X, ask how does skill of B compare to yours:


Directed graph

A give positive review to B

B has HIGHER status than A

A give negative review to B

B has LOWER status than A

- Question: Want to predict A → B based on X-A, X-B

NOTE: X-A, X-B can be of any direction, any sign (2 × 2
× 2 × 2 = 16 possible context)

- Measurement:
○ Baseline: User differ in fraction of (+) they give/receive:
 Generative baseline: Fraction of (+) given by U (of all
(+) given out)
 Receptive baseline: Fraction of (+) received by U (of all
(+) given out)

○ Surprise, in context: HOw


Homophily & Social Influence
Tuesday, May 16, 2017 1:12 AM

- Def: Peo link to others similar to them

Agents in social network have characteristics:
 Non-mutable:
race, gender, age, …
 Mutable:
place to live, occupation, …

- Mechanism

Selection
○ Def: Characteristics drive link formation
○ Example: A & B love dancing → Form link
○ Social policy implication (smoking prevention): Target characteristics
family background, …

Social influence
○ Def: Existing links shape peo's mutable characteristics
○ Example: A starts smoking → B follows
○ Social policy implication (smoking prevention): Target "key players", let
them positively influence the rest

Social-Affiliation Network

- Def: Social network of peo + Affiliation network of peo & foci
○ Foci: Set of activities person participates in
○ Affiliation network: Participation of some people in foci

- Closure: 2 nodes have common neighbor → Form link

○ Triadic closure: Among peo only

○ Focal closure: Due to Selection

Karate introduces Anna to Daniel

○ Membership closure: Due to Social influence

Anna introduces Bob to Karate

Homophily & Segregation

- Observation: Neighborhoods tend to be segregated on race/culture basis

- Schelling's grid model:

○ Rules:
 2 different agents: x, o
 Agent discovers < k neighbors are of same type → Interested in
moving to new cell
 Each round:
□ Consider unsatisfied agents in some order
□ Move unsatisfied agent to any unoccupied cell that satisfies it

○ Result: Surprising relation between micro-behavior & macro-lvl patterns

Weak preferences for homophily can be sufficient to
create complete segregation

○ Conclusion:
 ∃ Solution: No segregation, all agents satisfied
 Individual-based decision/mis-coordination → Segregation

Recommendation System
Tuesday, May 16, 2017 1:36 AM

Rating: Utility function

- Def: C × S → R
C = Set of Customers
S = Set of Items
R: Can be 0 .. 5 stars, …

- Real problem:
○ Gather known ratings:
 Explicit:
□ Ask peo to rate
□ Does not work well in practice
 Implicit:
□ Learn rating from user action
□ Not accurate in inferring low rating

○ Extrapolate unknown ratings:
 Content-based
 Collaborative
 Hybrid

Hybrid Recommendation System

- Augment collaborative filtering with content-based:
○ New item problem: Use item profile
○ New user problem: Use demographics

- Weighting: Combine different recommenders' results:
○ Linear combination
○ Consensus scheme: Treat output of each recommender
as set of votes

- Switch between recommenders:
○ If content-based system cannot recommend with
sufficient confidence, use collaborative filtering
○ However, both systems have "new user" problem

- Feature combination:
○ Treat collaborative info (other users' ratings) as additional feature
○ Apply content-based technique

Single-method Recommendation System

Content-based

- Idea: Recommend items similar to previous items highly
rated by user
Movie: Movies with same actors, director,
genre, …
Book: Similar content

- Details:

○ Item profile: Set of features
Movie: author, title, action, …
Document: Set of "important" words

○ User profile:
 Weighted avg of rated item profiles
 Weight ∝ Deviation from avg rating

○ Prediction heuristic:
u(c, s) = cos(c, s) = c · s / (|c| · |s|)
c: User profile
s: Item profile

○ TF.IDF:
fij = Frequency of term ti in doc dj
ni = # docs mentioning term ti
N = Total # docs

$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}}$
$IDF_i = \log \frac{N}{n_i}$

TF.IDF score: $w_{ij} = TF_{ij} \times IDF_i$

Doc profile = Set of words with highest
TF.IDF score + Their scores

- Advantages:
○ No need for data on other users
○ No "cold-start"/sparsity problems
○ Can recommend to users with unique tastes
○ Can recommend new & unpopular items
○ Can explain to user why item is recommended by
showing features that caused it to be recommended

- Disadvantages:
○ Must define features
○ Users' tastes must be represented as learnable
function of these features
○ Can't exploit other users' quality judgment

Collaborative filtering

- Idea:

○ User-based: For given user:
• Find similar users whose ratings strongly correlate with his
• Recommend items highly rated by these similar users

○ Item-based: For given user:
• Look into his rated items
• Compute how similar they are to other items not yet rated
• Select k most similar new items

- Details:

○ Weight all users according to similarity with active user
a: Active user, u: Another user
ra, ru: Rating vectors for m items rated by BOTH a & u
rx,j: Rating of user x for item j

$\bar{r}_x = \frac{1}{m} \sum_{i=1}^{m} r_{x,i}$

$\sigma_{r_x} = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (r_{x,i} - \bar{r}_x)^2}$

$cov(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$

a,u-similarity: $c_{a,u} = \frac{cov(r_a, r_u)}{\sigma_{r_a} \sigma_{r_u}}$

Significance Weighting: Do not trust correlation based on very few co-rated items
$w_{a,u} = s_{a,u} c_{a,u}$
$s_{a,u} = 1$ if m > 50
$s_{a,u} = \frac{m}{50}$ if m ≤ 50

○ Select subset of users to serve as predictors:
• Method 1: Choose n users with largest wa,u
• Method 2: Choose users with wa,u > Threshold

○ Rating prediction:
$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u} (r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} w_{a,u}}$

n selected neighbor users u = 1, 2, …, n

$\bar{r}_a$: Avg rating of user a

○ Present items with highest predicted rating as recommendations

- Advantages:
○ No feature selection needed

- Disadvantages:
○ Cold start: New user/items: Few ratings → Can't make accurate recommendation
○ Prediction based on nearest neighbor algorithm may be inaccurate
○ Scalability:
 Nearest neighbor computation grows with # users + # items
 Solution: Isolate neighborhood generation & prediction
□ Neighborhood generation: Offline
□ Prediction: Online
○ Popularity bias:
 Tends to recommend popular items
 Can't recommend to users with unique tastes
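
The user-based collaborative filtering steps (Pearson similarity on co-rated items, significance weighting with the cutoff of 50, then the weighted prediction) can be sketched as below. This is a minimal illustration under stated assumptions: the correlation uses means over co-rated items, the prediction uses each user's mean over all of their ratings, and the rating dictionaries and function names are hypothetical.

```python
import math

def pearson_weight(ra, ru, cutoff=50):
    """w_{a,u} = s_{a,u} * c_{a,u}, computed over the m items rated by both users."""
    common = [item for item in ra if item in ru]
    m = len(common)
    if m == 0:
        return 0.0
    mean_a = sum(ra[i] for i in common) / m
    mean_u = sum(ru[i] for i in common) / m
    cov = sum((ra[i] - mean_a) * (ru[i] - mean_u) for i in common) / m
    sd_a = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common) / m)
    sd_u = math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common) / m)
    if sd_a == 0 or sd_u == 0:
        return 0.0                               # correlation undefined
    c = cov / (sd_a * sd_u)
    s = 1.0 if m > cutoff else m / cutoff        # significance weighting
    return s * c

def predict(ra, neighbours, item):
    """p_{a,i} = mean(r_a) + sum_u w_{a,u} (r_{u,i} - mean(r_u)) / sum_u w_{a,u}."""
    mean_a = sum(ra.values()) / len(ra)          # assumed: mean over all of a's ratings
    num = den = 0.0
    for ru in neighbours:
        if item not in ru:
            continue
        w = pearson_weight(ra, ru)
        mean_u = sum(ru.values()) / len(ru)
        num += w * (ru[item] - mean_u)
        den += w
    return mean_a if den == 0 else mean_a + num / den

# Hypothetical ratings: the active user has not rated item "C".
active = {"A": 4, "B": 2}
other = {"A": 5, "B": 3, "C": 5}
w = pearson_weight(active, other)
p = predict(active, [other], "C")
print(w, p)
```

In practice the neighbours passed to `predict` would first be filtered with Method 1 or Method 2 from the notes, and negative weights would need separate handling since the denominator here sums the raw weights as written in the formula.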
