Seminar of
Shashidhar G
M.S. Seminar
Graph Algorithms
Graph algorithms: shortest path, triangle counting, community
detection, chemical reaction simulation, PageRank, etc.
Shortest Path Algorithm
dist[1...m] = ∞
dist[0] = 0
For all nodes V[0...x] in worklist {
    For all neighbors n of Vi {
        if (dist[n] > dist[Vi] + len(Vi, n)) {
            dist[n] = dist[Vi] + len(Vi, n)
            add n to worklist
        }
    }
}
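The worklist loop above can be sketched sequentially in C++. This is only an illustration under stated assumptions: the `Edge` type, the adjacency-list layout, and the name `shortest_paths` are hypothetical, node 0 is the source as in the pseudocode, and edge lengths are assumed to be non-negative and small enough not to overflow `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <limits>
#include <vector>

// Hypothetical edge type for illustration: destination node and edge length.
struct Edge {
    std::size_t to;
    int len;
};

constexpr int kInf = std::numeric_limits<int>::max();

// Worklist-based single-source shortest paths from node 0.
// adj[v] lists the outgoing edges of v.
std::vector<int> shortest_paths(const std::vector<std::vector<Edge>>& adj) {
    assert(!adj.empty());
    std::vector<int> dist(adj.size(), kInf);  // dist[1...m] = infinity
    dist[0] = 0;                              // dist[0] = 0
    std::deque<std::size_t> worklist{0};
    while (!worklist.empty()) {
        std::size_t v = worklist.front();
        worklist.pop_front();
        for (const Edge& e : adj[v]) {
            // dist[v] is finite here: a node enters the worklist only
            // after it has been reached.
            if (dist[e.to] > dist[v] + e.len) {
                dist[e.to] = dist[v] + e.len;
                worklist.push_back(e.to);  // add n to worklist
            }
        }
    }
    return dist;
}
```

A node may be pushed more than once; each push corresponds to one improvement of its distance, so the loop terminates once no edge can be relaxed further.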
Shortest Path Algorithm
dist[1...m] = ∞
dist[0] = 0
For all nodes V[0...x] in worklist {
    For all neighbors n of Vi {
        Critical Section for n
        if (dist[n] > dist[Vi] + len(Vi, n)) {
            dist[n] = dist[Vi] + len(Vi, n)
            add n to worklist
        }
        Critical Section Ends for n
    }
}
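One common way to realize the per-node critical section is an atomic minimum built from a compare-and-swap loop. The sketch below assumes that approach; the slide itself only marks the critical section, and the names `atomic_min` and `relax` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr int kInf = std::numeric_limits<int>::max();

// Atomically lowers `target` to `candidate` if `candidate` is smaller.
// Returns true when this call performed the update, so the caller knows
// whether to add the node to the worklist.
bool atomic_min(std::atomic<int>& target, int candidate) {
    int current = target.load(std::memory_order_relaxed);
    while (candidate < current) {
        // On failure, `current` is refreshed with the latest stored value.
        if (target.compare_exchange_weak(current, candidate,
                                         std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Relaxes edge (v, n): the test and the update of dist[n] form one atomic step.
bool relax(std::vector<std::atomic<int>>& dist, std::size_t v, std::size_t n,
           int len) {
    assert(v < dist.size() && n < dist.size());
    int dv = dist[v].load(std::memory_order_relaxed);
    if (dv == kInf) {
        return false;  // v has not been reached yet; nothing to propagate
    }
    return atomic_min(dist[n], dv + len);
}
```

Because the comparison and the store happen in a single compare-and-swap, two threads relaxing edges into the same node cannot overwrite a smaller distance with a larger one.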
Irregularity: no regular pattern in
    Work distribution
    Memory accesses
    Control flow
    Communication
These cannot be predicted at compile time.
Scalability: millions of nodes and edges.
1
http://www.fmsasg.com/socialnetworkanalysis/
Shashidhar G LightHouse March 22, 2017 3 / 22
Parallelization of Graph Algorithms
Related Work
Motivation
Outline
Green-Marl: A Domain Specific Language
Front-end Compilation of Green-Marl
Contribution of LightHouse
Back-end code generation for GPUs (CUDA).
GPU code Optimization.
1 Eliminate Atomics in Boolean Reduction.
2 Loop Collapsing.
Experimental results and Conclusion.
Procedure Test(G : Graph,
               A : N_P<Int>, root : Node) {

    N_P<Int> B;
    Int rootValue;
    Foreach (n : G.Nodes)
        Foreach (s : n.Nbrs)
            n.B = n.A + s.A;
    rootValue = root.B;
}
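To make the loop nest concrete, here is a sequential C++ reading of the Test procedure over a CSR graph. It is a sketch only: the `CsrGraph` type, its field names, and the zero initialization of B are assumptions, not the code the compiler emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR graph: the neighbors of node n are
// col[row[n]] .. col[row[n + 1] - 1].
struct CsrGraph {
    std::vector<std::size_t> row;  // size = number of nodes + 1
    std::vector<std::size_t> col;  // size = number of edges
    std::size_t num_nodes() const { return row.size() - 1; }
};

// Sequential reading of the Test procedure; returns rootValue.
// B gets one entry per node and starts at 0 (an assumption).
int test_procedure(const CsrGraph& g, const std::vector<int>& A,
                   std::size_t root, std::vector<int>& B) {
    assert(!g.row.empty());
    assert(A.size() == g.num_nodes() && root < g.num_nodes());
    B.assign(g.num_nodes(), 0);
    for (std::size_t n = 0; n < g.num_nodes(); ++n) {            // Foreach (n : G.Nodes)
        for (std::size_t e = g.row[n]; e < g.row[n + 1]; ++e) {  // Foreach (s : n.Nbrs)
            std::size_t s = g.col[e];
            B[n] = A[n] + A[s];  // overwritten per neighbor; the last one visited wins
        }
    }
    return B[root];  // rootValue = root.B
}
```

In this sequential version each B[n] ends up holding the sum for the last neighbor visited. If the inner loop runs in parallel, which write survives is not determined by the program text.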
Foreach (n : G.Nodes) {
    sum += n.B;
    val<maxNode> max= n.B<n>;
}
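Read sequentially, the two reductions above amount to a running sum plus a maximum that carries its argument along. The C++ sketch below assumes that reading; `ReductionResult` and `reduce` are hypothetical names, and ties are resolved in favor of the first node, which the Green-Marl snippet does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ReductionResult {
    long long sum;        // sum += n.B
    int max;              // running maximum of n.B (val)
    std::size_t maxNode;  // node that attains the maximum
};

// Sequential reading of the two reductions over a non-empty property B.
ReductionResult reduce(const std::vector<int>& B) {
    assert(!B.empty());
    ReductionResult r{0, B[0], 0};
    for (std::size_t n = 0; n < B.size(); ++n) {
        r.sum += B[n];
        if (B[n] > r.max) {  // max= updates the value and its argument together
            r.max = B[n];
            r.maxNode = n;
        }
    }
    return r;
}
```

The sum needs only one shared location, while the argmax has to keep the value and the node consistent with each other, which is why the two kinds of reduction call for different treatment in parallel code.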
Node_Prop<Int> A;
Foreach (n : G.Nodes)
    Foreach (t : n.Nbrs)
        t.A = n.A;
2 Scope of Variables.
Normal reductions are converted to atomic instructions.
argmin/argmax reductions are handled differently.

Green-Marl source:
Int T = 0;
Node src, dst;
Foreach (s : G.Nodes)
    Foreach (t : s.Nbrs)
        T<from,to> max= s.A + t.A<s,t>;

Generated CUDA code:
GPU_T = 0;
KernelCall<<<LaunchPara>>>(C, R, A);
GPUMemCpy(from, GPU_from, DeviceToHost);
GPUMemCpy(to, GPU_to, DeviceToHost);

KernelCall(C, R, A) {
    ...
    localMax = 0;
    expr = s.A + t.A;
    atomicMax(&GPU_T, expr);
    if (localMax < expr) {
        localMax = expr;
        localFrom = s;
        localTo = t;
    }
    SoftwareBarrier();
    if (localMax == GPU_T)
        chooseThread = threadID;
    SoftwareBarrier();
    if (chooseThread == threadID) {
        GPU_from = localFrom;
        GPU_to = localTo;
    }
}
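The phases of the kernel above can be emulated on the host with plain loops. This is a sequential sketch, not GPU code: each loop iteration stands in for one thread that owns a single edge (s, t), the end of each loop plays the role of SoftwareBarrier(), and the names `ArgMax` and `argmax_edges` are hypothetical. Property values are assumed to be non-negative, matching the initial value GPU_T = 0.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ArgMax {
    int value;
    std::size_t from;
    std::size_t to;
};

// src[i] and dst[i] are the endpoints of the edge owned by "thread" i.
ArgMax argmax_edges(const std::vector<std::size_t>& src,
                    const std::vector<std::size_t>& dst,
                    const std::vector<int>& A) {
    assert(!src.empty() && src.size() == dst.size());
    const std::size_t threads = src.size();
    std::vector<int> localMax(threads);

    // Phase 1: every thread evaluates its expression and folds it into the
    // shared maximum (atomicMax on the GPU).
    int globalMax = 0;  // mirrors GPU_T = 0
    for (std::size_t tid = 0; tid < threads; ++tid) {
        localMax[tid] = A[src[tid]] + A[dst[tid]];
        globalMax = std::max(globalMax, localMax[tid]);
    }

    // Phase 2: threads whose local value equals the maximum compete for
    // chooseThread; in this emulation the last matching thread wins.
    std::size_t chooseThread = 0;
    for (std::size_t tid = 0; tid < threads; ++tid) {
        if (localMax[tid] == globalMax) {
            chooseThread = tid;
        }
    }

    // Phase 3: only the chosen thread publishes its arguments.
    return {globalMax, src[chooseThread], dst[chooseThread]};
}
```

Splitting the work into phases means the value needs only one atomic operation per thread, and the pair (from, to) is written by exactly one thread, so the two arguments always belong to the same edge.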
GPU Optimizations
1 Boolean Value Reduction: Eliminates atomics.
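A plausible reading of this optimization, stated here as an assumption rather than as the LightHouse implementation, is that a Boolean OR-reduction only ever stores the constant true, so the result is the same whichever write lands last. The sequential C++ sketch below emulates that pattern with one loop iteration per thread; `any_changed` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Boolean OR-reduction: did any node's value change between two iterations?
// Each loop iteration stands in for one GPU thread.
bool any_changed(const std::vector<int>& before, const std::vector<int>& after) {
    assert(before.size() == after.size());
    bool changed = false;  // shared flag, cleared before the kernel launch
    for (std::size_t tid = 0; tid < before.size(); ++tid) {
        if (before[tid] != after[tid]) {
            // Every writer stores the same constant, so the final value does
            // not depend on write order and no atomic OR is required.
            changed = true;
        }
    }
    return changed;
}
```

Flags of this kind appear in fixed-point loops such as SSSP, where the host relaunches the kernel until no thread reports a change.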
Experiments
Bipartite Matching: fewer conflicts across threads, load-balanced
task, data parallel.
Conductance: many atomics, thread-divergent code.
PageRank: floating-point operations.
SSSP: atomic instructions.
[Figure: speedup (y-axis "Speed up") of OMP-1T, OMP-Fast, CUDA, and CUDA-OPT across the input graphs. Panels: (a) Matching, (b) Conductance.]
[Figure: speedup (y-axis "Speed up") of OMP-1T, OMP-Fast, CUDA, and CUDA-OPT across the input graphs; one panel also includes Ligra. Panels: (a) PageRank-Gather, (b) PageRank-Propagate.]
[Figure: speedup (y-axis "Speed up") of OMP-1T, OMP-Fast, CUDA, CUDA-OPT, LoneStarGPU, Totem-OPT, and Ligra across the input graphs. Panel: (a) SSSP.]
Future work
Conclusion
Thank You
Acknowledgements
GPU Architecture
[Figure: GPU block diagram. Components shown: compute work distribution, geometry controllers, SMCs, SMs (multithreaded multiprocessors) each with I-cache, MT issue, C-cache, SPs, SFUs, and shared memory, texture units with Tex L1, and the interconnection network.]