- Idea:
○ Triadic closure: 2 people have common strong tie → Must have tie
between each other
○ Bridge:
Link that is only path between 2 users
→ Must be weak ties (according to triadic closure)
However, help connect community:
□ Must necessarily carry new info
□ Help map micro → macro
⇒ Strength of weak ties
○ Universality: Networks from science, nature & tech more similar than
expected
○ Shared vocabulary between fields
- Experiment:
○ Select 296 people in Nebraska & Boston
○ Mail packet to them:
Specific destination person
Ask to forward to someone they know personally
- Question: How long are successful paths?
- Findings:
Leveraging Social Media: Predicting Future with Social Media (Sitaram A, Bernardo H)
- Question: Can we extract info from conversation on social media networks?
(Collective wisdom)
- Method:
○ Collect data: over 90 million docs
○ Quote clustering on graph
Graph structure: Node = Quote
Edge (weighted) = Inclusion relation
Remove all but strongest outgoing edge
- Findings:
○ Nature of 24-hour news cycle: Memes quickly enter & leave
collective conscience
○ Media access peak 2.5 hours before blog access peak
○ Blog access volume persists much longer
Network vs. Graph
- Network: Refer to real sys. Terminology: node, link
- Graph: Math representation of network. Terminology: vertex, edge
Usually network & graph used interchangeably
○ Deg distr: Prob randomly chosen node has deg k
$P(k) = \frac{N_k}{N}$
$N_k$: # nodes having deg k
$N$: Total # nodes
- Distance/Path length huv: # edges along shortest path connecting 2 nodes
○ huu = 0
huv = ∞ if no path u → v
Small-world phenomenon
Typical length of shortest path usually small
Small-world experiment [Milgram 1967]: 6 deg of separation
Real network vs. Simple Graph:
 | Real network | Simple Graph
Giant connected component | Exist (NOT emerge through phase transition) | Exist (emerge through phase transition)
Avg path length | Small | Small
Clustering coef | Big | Small (no local structure)
Deg distr | Power law | Binomial
Simple Graph Model Gnp (Erdos-Renyi, 1960)
- Def:
○ Undirected, n nodes
○ Each edge (u, v) appear with prob p
$\frac{\sigma}{\bar{k}} = \left[\frac{1-p}{p} \cdot \frac{1}{n-1}\right]^{1/2}$
(n (Graph size) ↑ → $\sigma/\bar{k}$ ↓ → $\sigma \ll \bar{k}$ → Distr narrower → ↑
Confidence that node deg around $\bar{k}$)
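The Gnp definition and the σ/k̄ relation above can be checked numerically. The following is a minimal sketch, assuming nodes are labelled 0..n-1 and the graph is stored as adjacency sets; the names `gnp` and `degree_spread` are illustrative and not part of the notes.

```python
import random
from statistics import mean, pstdev

def gnp(n, p, seed=None):
    """Undirected G(n, p): each of the n(n-1)/2 possible edges appears with prob p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # independent coin flip per edge
                adj[u].add(v)
                adj[v].add(u)
    return adj

def degree_spread(adj):
    """Return (mean degree, sigma / mean degree) of the graph."""
    degs = [len(nb) for nb in adj.values()]
    k_bar = mean(degs)
    return k_bar, pstdev(degs) / k_bar

n, p = 400, 0.1
adj = gnp(n, p, seed=0)
k_bar, ratio = degree_spread(adj)
predicted_ratio = ((1 - p) / p / (n - 1)) ** 0.5  # formula from the notes
print(k_bar, ratio, predicted_ratio)
```

For a single random graph the measured ratio should only be close to the predicted one, not equal to it.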
- Construction:
○ Initialization: Low-dim regular lattice
Each node connected to α nearest neighbor
(high clustering coef, high diameter)
○ Rewiring:
(1) For each node, clockwise around ring:
□ Select edge connecting it to its nearest neighbor
□ With prob p, reconnect this edge to node chosen uniformly
over entire ring
(2) Repeat (1) for 2nd, 3rd, …, α-th nearest neighbor
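The construction above can be sketched as follows. This is an illustrative version under stated assumptions, not a definitive implementation: `k_half` (a name introduced here) is the number of lattice neighbors on each side of a node, and a rewired edge is never allowed to create a self-loop or a duplicate edge.

```python
import random

def watts_strogatz(n, k_half, p, seed=None):
    """Ring lattice with k_half neighbors per side, then rewiring with prob p."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    # Initialization: regular ring lattice.
    for u in range(n):
        for d in range(1, k_half + 1):
            v = (u + d) % n
            adj[u].add(v)
            adj[v].add(u)
    # Rewiring: lap d goes clockwise and handles each node's d-th nearest neighbor.
    for d in range(1, k_half + 1):
        for u in range(n):
            v = (u + d) % n
            if v not in adj[u] or rng.random() >= p:
                continue  # edge keeps its place
            # Assumption: only nodes that are not u and not already linked to u.
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[u].discard(v)
            adj[v].discard(u)
            adj[u].add(w)
            adj[w].add(u)
    return adj

lattice = watts_strogatz(20, 2, 0.0, seed=1)   # p = 0: pure lattice
rewired = watts_strogatz(20, 2, 0.3, seed=1)   # some shortcuts
```

Rewiring moves edges but never adds or deletes them, so the total number of edges stays at n × k_half.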
Basic concepts
- Local bridge:
○ Edges whose endpoints have no common neighbor
○ Exclude bridge
- Edge overlap:
$O_{ij} = \frac{|\text{Neighbor of } i \text{ and } j|}{|\text{Neighbor of } i \text{ or } j|} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$
N(i): Set of i's neighbor (excluding j)
- Data:
○ Cell-phone network of 20% country's population
○ Edge strength: Aggregated call duration
○ High → Low: Network gradually shrink, but not collapse
- Link removal by edge overlap (Low overlap = weak link)
- Conclusion:
○ Weak ties crucial for maintaining network's structural integrity
○ Strong ties important for maintaining local communities
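The edge overlap O_ij defined above can be computed directly from adjacency sets. A minimal sketch, assuming an undirected graph stored as a dict of neighbor sets; returning 0 when both endpoints have no other neighbor is a convention chosen here, not something the notes specify.

```python
def edge_overlap(adj, i, j):
    """O_ij = |N(i) ∩ N(j)| / |N(i) ∪ N(j)|, with j excluded from N(i) and i from N(j)."""
    n_i = adj[i] - {j}
    n_j = adj[j] - {i}
    union = n_i | n_j
    if not union:
        return 0.0  # assumed convention for an isolated edge
    return len(n_i & n_j) / len(union)

# Toy graph: triangle a-b-c, plus a tail a-d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(edge_overlap(toy, "a", "b"))  # edge inside the triangle
print(edge_overlap(toy, "a", "d"))  # local bridge: no common neighbor
```

An overlap of 0 on an edge whose endpoints do have other neighbors marks a local bridge, which matches the "low overlap = weak link" reading in the notes.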
Structural Hole
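$c_i = \sum_j \left(p_{ij} + \sum_k p_{ik} p_{kj}\right)^2$
$p_{ij}$ (row i, column j):
 | 1 | 2 | 3 | 4 | 5
1 | 0 | 1/4 | 1/4 | 1/4 | 1/4
2 | 1/2 | 0 | 0 | 0 | 1/2
3 | 1 | 0 | 0 | 0 | 0
4 | 1/2 | 0 | 0 | 0 | 1/2
5 | 1/3 | 1/3 | 0 | 1/3 | 0
(Assumption: All link equally important)
$c_2 = (p_{25} + p_{21} p_{15})^2 + (p_{21} + p_{25} p_{51})^2 = \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{4}\right)^2 + \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{3}\right)^2$
$c_4 = (p_{45} + p_{41} p_{15})^2 + (p_{41} + p_{45} p_{51})^2 = \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{4}\right)^2 + \left(\frac{1}{2} + \frac{1}{2} \times \frac{1}{3}\right)^2$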
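The constraint values for the 5-node example above can be reproduced with exact fractions. This sketch assumes the squared form of the constraint (Burt's definition), sums only over each node's direct neighbors, and uses p_ij = 1/deg(i) per the "all links equally important" assumption; the function names are illustrative.

```python
from fractions import Fraction

# Edges implied by the p matrix above: 1-2, 1-3, 1-4, 1-5, 2-5, 4-5.
graph = {1: {2, 3, 4, 5}, 2: {1, 5}, 3: {1}, 4: {1, 5}, 5: {1, 2, 4}}

def p(adj, i, j):
    """Share of i's ties invested in j (all links equally important)."""
    return Fraction(1, len(adj[i])) if j in adj[i] else Fraction(0)

def constraint(adj, i):
    """c_i = sum over neighbors j of (p_ij + sum_k p_ik * p_kj)^2."""
    total = Fraction(0)
    for j in adj[i]:
        indirect = sum(p(adj, i, k) * p(adj, k, j) for k in adj if k not in (i, j))
        total += (p(adj, i, j) + indirect) ** 2
    return total

for node in graph:
    print(node, constraint(graph, node))
```

Node 3, whose only contact is node 1, comes out fully constrained, while node 1 has the lowest value.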
Girvan-Newman algorithm
- Step: Repeat until no edges left
○ Calculate edge betweenness
○ Remove edges with highest betweenness (may remove ≥ 2 edges)
○ Connected components are communities
Example (communities after each step):
After step 1: [1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]
After step 2: [1, 2, 3], [4, 5, 6], [7], [8], [9, 10, 11], [12], [13], [14]
After step 3: [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14]
Edge betweenness
- # shortest paths passing over particular edge
- Computation:
○ Initialize betweenness(u, v) = 0 ∀u, v
Initialize nodeFlow(v) = 0 ∀v ≠ u
Go upward from lowest node on BFS tree:
□ nodeFlow(v) += 1
□ For each node u higher than v on BFS tree & edge (u, v)
exist:
Δ = f(u) / f(v) * nodeFlow(v)
betweenness(u, v) += Δ
nodeFlow(u) += Δ
○ Divide all betweenness() by 2
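The betweenness computation and one Girvan-Newman removal step can be sketched as below. Assumptions: the graph is undirected and unweighted, stored as adjacency sets; f(v) is taken to be the number of shortest paths from the BFS root to v (the notes do not define f); the function names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for root in adj:
        dist, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:  # BFS: layers and number of shortest paths f(v)
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], paths[w] = dist[v] + 1, 0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
        flow = {v: 0.0 for v in order}
        for v in reversed(order):  # go upward from the lowest nodes
            if v == root:
                continue
            flow[v] += 1
            for u in adj[v]:
                if dist[u] == dist[v] - 1:  # u is one layer higher than v
                    delta = paths[u] / paths[v] * flow[v]
                    bet[frozenset((u, v))] += delta
                    flow[u] += delta
    return {edge: b / 2 for edge, b in bet.items()}  # each pair counted twice

def girvan_newman_step(adj):
    """Remove every edge tied for the highest betweenness; return the new graph."""
    bet = edge_betweenness(adj)
    new_adj = {u: set(nb) for u, nb in adj.items()}
    if bet:
        top = max(bet.values())
        for edge, b in bet.items():
            if abs(b - top) < 1e-9:
                u, v = tuple(edge)
                new_adj[u].discard(v)
                new_adj[v].discard(u)
    return new_adj

def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

# Two triangles joined by the edge 3-4.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
scores = edge_betweenness(two_triangles)
after_one_step = components(girvan_newman_step(two_triangles))
```

On this toy graph the joining edge carries all 3 × 3 cross-triangle shortest paths, so it is removed first and the two triangles become the communities.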
Network Community
$Q = \frac{1}{2m} \sum_{i,j} \delta_{ij} \left(A_{ij} - \frac{k_i k_j}{2m}\right)$
m: Total # edges
δij =1, if i & j assigned to same community
0, otherwise
Aij = 1, if edge (i, j) exist
0, otherwise
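The modularity Q above can be evaluated with a direct double sum over node pairs. A minimal sketch, assuming an undirected graph as adjacency sets, k_i taken as the degree of node i, and a `community` dict (a name introduced here) mapping each node to its community label.

```python
def modularity(adj, community):
    """Q = 1/(2m) * sum over same-community pairs (i, j) of A_ij - k_i * k_j / (2m)."""
    two_m = sum(len(nb) for nb in adj.values())  # every edge counted from both ends
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta_ij = 0
            a_ij = 1 if j in adj[i] else 0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Two triangles joined by the edge 3-4 (m = 7 edges).
g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
split = {1: "left", 2: "left", 3: "left", 4: "right", 5: "right", 6: "right"}
lumped = {v: "all" for v in g}
print(modularity(g, split), modularity(g, lumped))
```

Splitting into the two triangles gives Q = 5/14, while putting every node in one community gives Q = 0.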
Normalization:
$C'_B(u) = \frac{C_B(u)}{(N-1)(N-2)}$
$C_D = \frac{\sum_u \left[C^*(n) - C(u)\right]}{(N-1)(N-2)}$
Eigenvector Centrality
- Calculation: Initially $x_i = 1 \ \forall i$
Obtain better estimation: $x_i = \sum_j A_{ij} x_j$
Matrix form: $\vec{x} = A\vec{x}$
Implication:
$\vec{x}(t) \propto \vec{v_1}$
$\vec{v_1}$ is eigenvector of M = A + I
⇒ Can regard $\vec{v_1}$ as eigenvector centrality (as we only consider
relative difference among entries of $\vec{x}(t)$)
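The repeated-estimation idea above amounts to power iteration. The sketch below is one possible reading, assuming the update uses M = A + I (each node keeps its own score and adds its neighbors' scores) and rescales by the largest entry after every round, since only relative differences matter; the function name is illustrative.

```python
def eigenvector_centrality(adj, iterations=100):
    """Power iteration with M = A + I, rescaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}  # initially x_i = 1 for all i
    for _ in range(iterations):
        new_x = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new_x.values())
        x = {v: score / top for v, score in new_x.items()}
    return x

# Path a - b - c: the middle node should score highest.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
scores = eigenvector_centrality(path)
print(scores)
```

Adding I shifts every eigenvalue of A by 1 without changing its eigenvectors, which keeps the iteration from oscillating on bipartite graphs such as this path.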
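- Network with deg distr's tail in power law form
$\int_{x_m}^{\infty} Z x^{-\alpha}\,dx = \frac{Z}{-\alpha+1}\left[x^{-\alpha+1}\right]_{x_m}^{\infty} = -\frac{Z}{\alpha-1}\left[\infty^{1-\alpha} - x_m^{1-\alpha}\right]$
$p(x) = \frac{\alpha-1}{x_m}\left(\frac{x}{x_m}\right)^{-\alpha}$
- Variance:
$Var(X) = E[X^2] - [E(X)]^2$
$E[X^2] = \int_{x_m}^{\infty} x^2 p(x)\,dx = Z \int_{x_m}^{\infty} x^{2-\alpha}\,dx$
$= \frac{Z}{-\alpha+3}\left[x^{3-\alpha}\right]_{x_m}^{\infty} = -\frac{(\alpha-1)x_m^{\alpha-1}}{\alpha-3}\left[\infty^{3-\alpha} - x_m^{3-\alpha}\right]$
Assume α > 3 ⇒ $\infty^{3-\alpha} = 0$
⇒ $E[X^2] = \frac{\alpha-1}{\alpha-3} x_m^2$
⇒ $Var(X) = \frac{\alpha-1}{\alpha-3} x_m^2 - \left(\frac{\alpha-1}{\alpha-2} x_m\right)^2$
Estimation of Power-law Exponent α
- Method 1: Fit line on log-log graph using least squares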
$P(X \ge x) = \sum_{j=x}^{\infty} p(j) \approx \int_x^{\infty} Z j^{-\alpha}\,dj = \frac{Z}{-\alpha+1}\left[j^{-\alpha+1}\right]_x^{\infty} = \frac{Z}{\alpha-1} x^{-(\alpha-1)}$
$y = P(X \ge x)$
⇒ $\log(y) = \log\left(\frac{Z}{\alpha-1}\right) - (\alpha-1)\log(x)$
Gradient = -(α - 1)
Better estimation: Aggregate N(k) → ΣN(k) big for large k
- Exact model:
○ Graph formation:
Nodes arrive in order 1, 2, .., n
New node j create ONLY 1 out link (by doing EITHER (1) or (2))
□ (1) With prob p, j link to i chosen randomly, uniformly from
previous node
□ (2) With prob (1 - p), j link to u with prob ∝ indeg(u)
○ $P(k) \propto k^{-\left(1 + \frac{1}{1-p}\right)}$
$\alpha = 1 + \frac{1}{1-p}$
○ Behavior:
p → 1:
□ Link formation mainly based on uniform random choices
□ α → ∞
□ Few nodes with large indeg
p → 0:
□ Growth of network strongly governed by "rich-get-richer" behavior
□ α → 2
□ Many nodes with large indeg
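The graph-formation rule of the exact model can be simulated in a few lines. This is a sketch under one assumption the notes leave open: while no link exists yet (all in-degrees are 0), the new node falls back to the uniform choice. Picking a random entry from the list of earlier link targets is used as a simple way to choose u with probability ∝ indeg(u); the function name is illustrative.

```python
import random
from collections import Counter

def grow_network(n, p, seed=None):
    """Return targets, where targets[j - 2] is the node that node j links to."""
    rng = random.Random(seed)
    targets = []
    for j in range(2, n + 1):
        if not targets or rng.random() < p:
            i = rng.randint(1, j - 1)    # (1) uniform over previous nodes
        else:
            i = rng.choice(targets)      # (2) prob ∝ indeg(u)
        targets.append(i)
    return targets

uniform_links = grow_network(50, 1.0, seed=1)   # p → 1: uniform random choices
rich_links = grow_network(50, 0.0, seed=1)      # p → 0: pure rich-get-richer
print(Counter(rich_links).most_common(3))
```

With p = 0 every node ends up pointing at node 1, the extreme version of the rich-get-richer behavior described above.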
Bio: Diseases
Social: Viral marketing
Game-theoretic Model of Cascade
- Rules:
○ Each node: Can choose only 1 of 2 actions A/B
○ Adj pair's payoff matrix:
2 A: Each payoff = a > 0
2 B: Each payoff = b > 0
1 A + 1 B: Each payoff = 0
- Single node decision process:
Node v
d neighbors: pd already use A,
(1 - p)d already use B
(0 ≤ p ≤ 1)
v's payoff = apd if v use A
= b(1 - p)d if v use B
v use A if:
$apd > b(1-p)d$
⇔ $p > q = \frac{b}{a+b}$
Interpretation example:
If > 50% of my friends take A, I'll also take A
⇔ $q = \frac{b}{a+b} \ge \frac{1}{2}$ ⇔ b ≥ a
- Monotonic spreading: Node only switch B → A, NEVER back A → B
Proof (by contradiction):
Let u = First node switching A → B at time t
⇒ $p(t) \le \frac{b}{b+a}$
At time t' < t: u switch B → A
⇒ $p(t') > \frac{b}{b+a} \ge p(t)$
⇒ During time t' → t: # u's neighbor using A ↓
⇒ Impossible, because u is the first node switching A → B
Game-theoretic Model: Cascade in Infinite Graph
(each node has FINITE neighbors)
- Cascade capacity: Max q → ∃ finite set S can cause cascade
○ Capacity ↑ → Cascade more easily
○ ∀ graph G: Cascade capacity ≤ 1/2
- Stopping cascade:
○ Cluster C with density ρ: All C's nodes have ≥ ρ fraction of edges in C
○ Stopping condition:
S: Initial set of A's adopter
G\S contains cluster with density > (1 - q)
⇔ S cannot cause cascade
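The single-node decision rule (v uses A if p > q = b/(a+b)) can be turned into a small cascade simulation on a finite graph. A sketch with these assumptions: all non-seed nodes start with B, nodes update synchronously in rounds, and adopters never switch back; `threshold` and `run_cascade` are names introduced here.

```python
def threshold(a, b):
    """q = b / (a + b): fraction of A-neighbors a node needs to exceed."""
    return b / (a + b)

def run_cascade(adj, seeds, q):
    """Spread A from the seed set until no further node wants to switch."""
    adopters = set(seeds)
    while True:
        switching = set()
        for v in adj:
            if v in adopters or not adj[v]:
                continue
            p = len(adj[v] & adopters) / len(adj[v])  # fraction of neighbors using A
            if p > q:
                switching.add(v)
        if not switching:
            return adopters
        adopters |= switching

# Path 1 - 2 - 3 - 4 - 5, seeded at node 1.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
full = run_cascade(path, {1}, threshold(a=3, b=2))     # q = 0.4
stuck = run_cascade(path, {1}, threshold(a=2, b=3))    # q = 0.6
print(sorted(full), sorted(stuck))
```

On the path each interior node sees exactly half of its neighbors using A when the front arrives, so the cascade completes for q < 1/2 and stalls at the seed otherwise.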
- Rules:
○ Each node: Can use both A & B
○ Adj pair's payoff matrix:
AB - A: Each payoff = a
AB - B: Each payoff = b
AB - AB: Each payoff = max(a, b) - c
(c = Dual-maintenance cost)
Consider node w
○ Case 1:
Payoff = a if w choose A
= 1 if w choose B
= a + 1 - c if w choose AB
○ Case 2:
Payoff = a if w choose A
= 1 + 1 = 2 if w choose B
= a + 1 - c if w choose AB
- Present condition:
○ Default B
○ Better A comes
- Future scenarios:
○ Infiltration (B → AB → A):
A & B too compatible with each other
People first use both, but gradually drop B
○ Susceptible: No disease
○ Infectious: Get disease, can attack Susceptible
○ Recovered: Healed, no more infected
- Model development:
Pop N = S + I + R
P(Contact with S-node) = S / N
P(S → I) = p
- Model:
○ Directed finite graph G = (V, E)
○ S: Initial set of "active" nodes
○ Edge (v, w):
Pvw = Prob of node v, if active, also make neighbor w active
v only have 1 chance to make w active
Virus' Spreading: SIS Model
- Model development:
$\frac{dI}{dt} = \beta S I - \delta I, \quad \frac{dS}{dt} = -\beta S I + \delta I$
- Observation:
○ Case 1: t → ∞, I → 0
○ Case 2: Disease remains infinitely
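The two long-run cases of the SIS equations can be seen by integrating them numerically. A minimal sketch using forward Euler steps, assuming S and I are population fractions with S + I = 1; the step size, step count and function name are choices made here, not part of the notes.

```python
def simulate_sis(beta, delta, i0, dt=0.05, steps=20_000):
    """Integrate dI/dt = beta*S*I - delta*I and dS/dt = -beta*S*I + delta*I."""
    s, i = 1.0 - i0, i0
    for _ in range(steps):
        infections = beta * s * i * dt   # S -> I
        recoveries = delta * i * dt      # I -> S
        s += recoveries - infections
        i += infections - recoveries
    return s, i

_, dies_out = simulate_sis(beta=0.1, delta=0.2, i0=0.01)   # Case 1: I -> 0
s_end, persists = simulate_sis(beta=0.5, delta=0.2, i0=0.01)  # Case 2: disease remains
print(dies_out, persists)
```

Under this fraction-based reading, the infected share settles near 1 − δ/β when β > δ and decays to 0 when β < δ.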
- Model:
○ States:
Exposure: Node exposed to contagion by neighbors
Adoption: Node act on contagion
- Development:
○ User X:
Not yet mentioned H
But k neighbors already mentioned H
○ p(k) = P(X mention H before (k+1)th neighbors do so)
⇔ p(k) = Fraction of users adopting H after kth exposure
○ Stickiness of H = max(p(k))
- Observation in viral marketing:
Too much marketing incentives
→ ↑ Recommendation from neighbor (in social network)
→ ↓ Effectiveness
Book:
Incentives:
□ Product recommender: 10%
□ First recommendee purchasing same item: 10% discount
Effectiveness: p(Buy item) = f(# Recommendation received)
□ # ≤ 3: p ≈ const
□ # > 3: p ↓
- Sources of exposures:
○ Internal: From inside network
User see URLs posted by friends
Twitter case:
○ Trace emergence of URL, label each URL by its topic:
○ Findings:
Topic's Max(p(k)):
P(Retweet | Art, edu article) < P(Retweet | Entertainment)
k at Max(p(k)):
World news reach max infectious earliest → More time sensitive
- Formulation:
S: Initial active set
f(S): Expected size of final active set
"Expected" = Random process
$f(S) = \frac{1}{n} \sum_i f_i(S)$
fi(S): ith realization of S
- Def:
Given: U = {u1, …, un}
X1, …, Xm ⊆ U
Check: ∃ k sets: Xi1 ∪ Xi2 ∪ … ∪ Xik = U
- Approach: Influence Maximization in bipartite X-U graph
- Approximation algorithm:
S0 = {}
for i = 1 … m:
Choose remaining Xj → Max f(Si-1 ∪ {Xj})
Si = Si-1 ∪ {Xj}
(Diminishing return)
Simulation Experiment: Collaboration network
- Data: Co-authorship in papers of arXiv high-energy physics theory
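The greedy approximation algorithm above can be sketched for node seeds, with f(S) estimated by averaging repeated simulations of the cascade model described earlier (active v gets one chance to activate w with probability Pvw). Assumptions: the graph is a dict mapping each node to a list of (neighbor, Pvw) pairs, and `runs`, `k` and all function names are illustrative choices.

```python
import random

def simulate_cascade(graph, seeds, rng):
    """One realization: each newly active v gets a single try on each neighbor w."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for w, p_vw in graph.get(v, []):
                if w not in active and rng.random() < p_vw:
                    active.add(w)
                    next_frontier.append(w)
        frontier = next_frontier
    return len(active)

def expected_spread(graph, seeds, runs, rng):
    """f(S): average final active-set size over `runs` realizations."""
    return sum(simulate_cascade(graph, seeds, rng) for _ in range(runs)) / runs

def greedy_seeds(graph, nodes, k, runs=200, seed=0):
    """Add, k times, the node giving the largest estimated f(S ∪ {v})."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best = max(
            (v for v in nodes if v not in chosen),
            key=lambda v: expected_spread(graph, chosen + [v], runs, rng),
        )
        chosen.append(best)
    return chosen

# Toy graph with certain activations (Pvw = 1), so the outcome is deterministic.
toy = {"a": [("b", 1.0), ("c", 1.0), ("d", 1.0)], "e": [("f", 1.0)]}
picked = greedy_seeds(toy, ["a", "b", "c", "d", "e", "f"], k=2, runs=5)
print(picked)
```

With probabilities below 1 the estimate of f(S) is noisy, so a larger `runs` trades time for a more stable choice.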
- Rule: Intuitive:
○ Friend's friend = Friend
○ Enemy's enemy = Friend
○ Friend's enemy = Enemy
- Triangle unit:
- Theory: Graph is balanced ⇔ No cycle with odd # (-) edges
○ 4:
Run BFS (from any node) → Form BFS tree:
If ∃ edge connecting 2 nodes in same layer → Unbalanced
- Measurement:
○ Baseline: User differ in fraction of (+) they give/receive:
Generative baseline: Fraction of (+) given by U (of all
(+) given out)
Receptive baseline: Fraction of (+) received by U (of all
(+) given out)
Homophily
- Mechanism
 | Selection | Social influence
Def | Characteristics drive link formation | Existing link shape peo's mutable characteristics
Example | A & B love dancing → Form link | A start smoking → B follow
Social policy implication: Smoking prevention case | Target characteristics: family background, … | Target "key players", let them positively influence rest
Social-Affiliation Network
- Def: Social network of peo + Affiliation network of peo & foci
○ Foci: Set of activities person participate
○ Affiliation network: Participation of some people in foci
- Closure: 2 peo node have common neighbor → Form link
○ Conclusion:
∃ Solution: No segregation, all agents satisfied
Individual-based decision/mis-coordination → Segregation
- TF.IDF:
$f_{ij}$ = Frequency of term $t_i$ in doc $d_j$
$n_i$ = # doc mentioning term $t_i$
N = Total # docs
$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}}$
$IDF_i = \log \frac{N}{n_i}$
TF.IDF score: $w_{ij} = TF_{ij} \times IDF_i$
$cov(r_a, r_u) = \frac{\sum_{i=1}^{m} (r_{a,i} - \bar{r}_a)(r_{u,i} - \bar{r}_u)}{m}$
a,u-similarity: $c_{a,u} = \frac{cov(r_a, r_u)}{\sigma_{r_a} \sigma_{r_u}}$
Significance Weighting: Not trust correlation based on very few co-rated
items
$w_{a,u} = s_{a,u} c_{a,u}$
$s_{a,u} = \begin{cases} 1 & \text{if } m > 50 \\ \frac{m}{50} & \text{if } m \le 50 \end{cases}$
- Rating prediction:
$p_{a,i} = \bar{r}_a + \frac{\sum_{u=1}^{n} w_{a,u} (r_{u,i} - \bar{r}_u)}{\sum_{u=1}^{n} w_{a,u}}$
- Popularity bias:
Tend to recommend popular item
Can't recommend to user with unique tastes
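The similarity, significance weighting and rating prediction formulas above can be combined into a small user-based predictor. A sketch under assumptions the notes do not settle: m is the number of co-rated items, the correlation uses means over the co-rated items, the prediction uses each user's overall mean rating, and the fallback to the active user's mean when the weights sum to 0 is a choice made here.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def weight(ratings, a, u):
    """w_au = s_au * c_au: Pearson correlation scaled by m/50 when m <= 50."""
    common = sorted(set(ratings[a]) & set(ratings[u]))
    m = len(common)
    if m == 0:
        return 0.0
    r_a = [ratings[a][i] for i in common]
    r_u = [ratings[u][i] for i in common]
    mean_a, mean_u = mean(r_a), mean(r_u)
    cov = sum((x - mean_a) * (y - mean_u) for x, y in zip(r_a, r_u)) / m
    sigma_a = sqrt(sum((x - mean_a) ** 2 for x in r_a) / m)
    sigma_u = sqrt(sum((y - mean_u) ** 2 for y in r_u) / m)
    if sigma_a == 0 or sigma_u == 0:
        return 0.0  # correlation undefined for constant ratings
    significance = 1.0 if m > 50 else m / 50
    return significance * cov / (sigma_a * sigma_u)

def predict(ratings, a, item):
    """p_ai = mean_a + sum_u w_au * (r_ui - mean_u) / sum_u w_au."""
    mean_a = mean(list(ratings[a].values()))
    num = den = 0.0
    for u in ratings:
        if u == a or item not in ratings[u]:
            continue
        w = weight(ratings, a, u)
        num += w * (ratings[u][item] - mean(list(ratings[u].values())))
        den += w
    return mean_a if den == 0 else mean_a + num / den

ratings = {
    "a": {"i1": 5, "i2": 3, "i3": 4},
    "u": {"i1": 5, "i2": 3, "i3": 4, "i4": 5},
}
print(weight(ratings, "a", "u"), predict(ratings, "a", "i4"))
```

Here the two users agree perfectly on three co-rated items, so the correlation is 1 but the significance factor shrinks the weight to 3/50.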