Figure: architecture overview. Components: input on secondary storage; a dispatcher; a CPU kernel with host memory; a GPU kernel with streaming multiprocessors, L1 cache/shared memory and device (global) memory, connected to the host over PCIe. Edges and vertices pass through Init, Activate, Apply and Sync stages.
- Conflicts
  - Linear penalty: cost grows with the number of conflicting updates
  - Intra-warp conflicts cost far more than inter-warp conflicts
- Customized replication
  - O(N) -> O(log N), N ≤ 32
  - Modeling: balance profits and costs
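To make the O(N) -> O(log N) claim concrete, here is a toy step-count model in Python, not the actual GPU kernel. It assumes all N conflicting updates (N ≤ 32, one warp) target the same vertex, that serialized atomic updates cost one step each, and that replicas are merged pairwise with one step per merge round. The function names are hypothetical.

```python
def serialized_update(values):
    """Toy model: N conflicting atomic adds to one shared-memory word are
    serialized by the hardware, so they cost N steps."""
    total, steps = 0, 0
    for v in values:
        total += v
        steps += 1
    return total, steps


def replicated_tree_update(values):
    """Toy model of replication: each conflicting thread owns a private
    replica, and replicas are combined pairwise. Every round halves the
    number of live replicas, giving ceil(log2 N) rounds."""
    replicas = list(values)
    if not replicas:
        return 0, 0
    rounds = 0
    while len(replicas) > 1:
        merged = [replicas[i] + replicas[i + 1]
                  for i in range(0, len(replicas) - 1, 2)]
        if len(replicas) % 2 == 1:
            merged.append(replicas[-1])  # odd replica carries over to next round
        replicas = merged
        rounds += 1
    return replicas[0], rounds


if __name__ == "__main__":
    for n in (2, 8, 32):
        contributions = [1] * n
        print(n, serialized_update(contributions)[1],
              replicated_tree_update(contributions)[1])
```

Both functions produce the same sum; only the step count differs. For a full warp (N = 32) the model drops from 32 serialized steps to 5 merge rounds. The extra replicas consume shared memory, which is the kind of cost a replication model has to weigh against the saved steps.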
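Figure: replication-based gather on the GPU. 1. Gather: threads 0 … p-1 handle ranges 0 … r0, r0 … r1, …, r_{p-1} … n, reading edges and aggregating into replicas (Rep) of the local vertices in shared memory. 2. Apply: the aggregated values are applied to the global vertices in global memory.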
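The two-phase gather/apply flow in the figure can be modeled sequentially as below. This is a sketch under stated assumptions: the ranges r0, r1, … are taken to be contiguous edge slices, each simulated thread keeps a replica of all vertices (rather than only a local vertex range), aggregation is a plain sum, and the edge format and the name `gather_apply` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # hypothetical format: (src, dst, contribution)


def gather_apply(num_vertices: int, edges: List[Edge], p: int) -> List[float]:
    """Sequential model of replication-based gather followed by apply.
    Requires p >= 1."""
    # 1. Gather: simulated thread t takes a contiguous slice of edges and
    #    accumulates into its own replica, so threads never touch the same
    #    accumulator during this phase.
    replicas = [[0.0] * num_vertices for _ in range(p)]
    chunk = (len(edges) + p - 1) // p
    for t in range(p):
        for _src, dst, w in edges[t * chunk:(t + 1) * chunk]:
            replicas[t][dst] += w

    # 2. Apply: combine the replicas into the global vertex array.
    global_vertices = [0.0] * num_vertices
    for v in range(num_vertices):
        global_vertices[v] = sum(rep[v] for rep in replicas)
    return global_vertices


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (2, 1, 2.0), (1, 0, 0.5)]
    print(gather_apply(3, edges, p=2))
```

The result does not depend on `p`; only the amount of replica state and the merge work in the apply phase change.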
- CPU
  - Pros: sequential processing within each thread
  - Suits: pull/notify-pull dual-mode processing
- GPU
  - Pros: SIMD parallel processing
  - Suits: replication-based gather processing (pull only)
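The CPU's pull/notify-pull dual mode can be sketched as follows. In pull mode every vertex scans all of its in-neighbors. In notify-pull mode only active vertices notify their out-neighbors, and only the notified vertices pull. This is an illustration, not the system's implementation: the switch rule (fraction of active vertices below a threshold), the `threshold=0.05` default, the sum aggregation and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_adjacency(n: int, edges: List[Tuple[int, int]]):
    """Return (in_nbrs, out_nbrs) adjacency lists for a directed graph."""
    in_nbrs: List[List[int]] = [[] for _ in range(n)]
    out_nbrs: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    return in_nbrs, out_nbrs


def gather(v: int, in_nbrs, active: Set[int], values) -> float:
    # Sum the values of v's active in-neighbors.
    return sum(values[u] for u in in_nbrs[v] if u in active)


def pull(in_nbrs, active: Set[int], values) -> Dict[int, float]:
    # Pull: every vertex scans all of its in-neighbors, even when few are active.
    return {v: gather(v, in_nbrs, active, values)
            for v in range(len(in_nbrs))
            if any(u in active for u in in_nbrs[v])}


def notify_pull(in_nbrs, out_nbrs, active: Set[int], values) -> Dict[int, float]:
    # Notify-pull: active vertices notify out-neighbors; only those vertices pull.
    notified = {w for u in active for w in out_nbrs[u]}
    return {v: gather(v, in_nbrs, active, values) for v in notified}


def dual_mode_step(in_nbrs, out_nbrs, active: Set[int], values,
                   threshold: float = 0.05):
    # Hypothetical switch: use notify-pull when the active set is small.
    if len(active) < threshold * len(in_nbrs):
        return "notify-pull", notify_pull(in_nbrs, out_nbrs, active, values)
    return "pull", pull(in_nbrs, active, values)


if __name__ == "__main__":
    in_nbrs, out_nbrs = build_adjacency(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
    values = [1.0, 2.0, 3.0, 4.0]
    print(dual_mode_step(in_nbrs, out_nbrs, {0}, values))
    print(dual_mode_step(in_nbrs, out_nbrs, {0}, values, threshold=0.5))
```

Both modes compute the same updates. Notify-pull only avoids scanning vertices that have no active in-neighbor, which pays off when the active set is sparse.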
Figure: Runtime (s) of CuSha, Ligra, Gemini, Garaph-C, Garaph-G and Garaph-H on uk-2007-05@1M, uk-2014-host and enwiki-2013 (two panels) and on gsh-2015-tpd, twitter-2010, sk-2005 and renren-2010, plus a further runtime panel. Annotations in the plots: 4.05x, 32.15x.
- GPU is much slower than CPU without the activation scheme