Christian Terboven
terboven@rz.rwth-aachen.de
Center for Computing and Communication
RWTH Aachen University
PPCES 2010
March 24th, RWTH Aachen University
24.03.2010 C. Terboven
Agenda
o Tools for OpenMP
o OpenMP 3.0 & Tasks
o Example + Case Study
o OpenMP & Architecture
o Summary
o OpenMP is a parallel programming model for shared-memory machines. That is, all threads have access to a
shared main memory. In addition, each thread
may have private data.
o The parallelism has to be expressed explicitly by the
programmer. The base construct is a Parallel Region:
a team of threads is provided by the runtime system.
o Using the available Worksharing constructs, the work can be
distributed among the threads of a team. The
scheduling of this distribution can be influenced.
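As a minimal sketch of the three points above, the following C function opens a Parallel Region and distributes a loop over the team with a Worksharing construct, using a schedule clause to influence how iterations are assigned. The function name axpy, its parameters, and the chunk size 4 are assumptions made for illustration; they do not come from the slides.

```c
/* Hypothetical example: y[i] = a * x[i] + y[i] for all i.
 * x and y live in shared memory; the loop index i is private to each thread. */
void axpy(int n, double a, const double *x, double *y)
{
    /* Parallel Region: the runtime system provides a team of threads. */
    #pragma omp parallel
    {
        /* Worksharing: the iterations are handed out to the team in
         * chunks of 4 (an assumed chunk size) on a first-come basis. */
        #pragma omp for schedule(dynamic, 4)
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and the loop runs serially with the same result.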
flush
C/C++
#pragma omp flush [(list)]
Book recommendation
Race Condition
}
}
/* end of parallel region */
printf("error: %f\n", error);
o i
o resid
o error
}
/* end of parallel region */
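The variables i, resid, and error listed above point to a residual-accumulation loop, where unsynchronized updates of a shared error variable are a typical Race Condition. The following is a minimal sketch of one race-free formulation, assuming that resid is a per-iteration temporary and error is a sum. The function name squared_error and the arrays u and f are hypothetical and not taken from the slides.

```c
/* Hypothetical example: sum of squared residuals between u and f. */
double squared_error(int n, const double *u, const double *f)
{
    double error = 0.0;
    double resid;
    int i;

    /* i is private because it is the loop variable. resid must be made
     * private as well, otherwise all threads overwrite the same temporary.
     * error is combined safely through the reduction clause. */
    #pragma omp parallel for private(resid) reduction(+:error)
    for (i = 0; i < n; i++) {
        resid = u[i] - f[i];
        error += resid * resid;
    }
    /* end of parallel region */
    return error;
}
```

Without the private and reduction clauses, resid and error would both be shared, and the result would depend on thread timing.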
Our advice
unbounded loops
recursive algorithms
Producer / Consumer patterns
and more
Code to execute
Data environment
Internal control variables (ICV)
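To illustrate the unbounded-loop case and the notion of a task's data environment, here is a minimal sketch that traverses a linked list and creates one task per node. The node type, the function name double_all, and the work done per node are assumptions chosen for illustration only.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Doubles the value stored in every node. The list length is not known
 * up front (an unbounded loop), so one task is created per node. */
void double_all(struct node *head)
{
    #pragma omp parallel
    {
        /* One thread walks the list and creates the tasks;
         * the other threads of the team can execute them. */
        #pragma omp single
        {
            struct node *p;
            for (p = head; p != NULL; p = p->next) {
                /* firstprivate: each task captures the pointer value at
                 * creation time as part of its data environment. */
                #pragma omp task firstprivate(p)
                {
                    p->value *= 2;
                }
            }
        } /* implicit barrier: all tasks have finished here */
    }
}
```

The code to execute is the block following the task directive, and the captured copy of p is its data environment.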
C/C++
#pragma omp task [clause [[,] clause] ... ]
... structured block ...
Scheduling clauses:
untied
Data-sharing clauses:
shared(list)
private(list)
firstprivate(list)
default(shared | none)
Other clauses:
if(expr)
o Default: Tasks are tied to the thread that first executes them,
which is not necessarily the creator. Scheduling constraints:
Only the thread a task is tied to can execute it
A task can only be suspended at a suspend point:
task creation, task finish, taskwait, barrier
o Loop collapsing
#pragma omp for collapse(2)
for(i = 1; i < N; i++)
  for(j = 1; j < M; j++)
    for(k = 1; k < K; k++)
      foo(i, j, k);
The iteration spaces of the i-loop and the j-loop are
collapsed into a single one, provided the loops are
perfectly nested and form a rectangular
iteration space.
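A small runnable variant of the same idea follows as a sketch. The array dimensions ROWS and COLS and the function name fill_indices are assumptions for illustration, and a two-level nest is used so that every loop takes part in the collapse.

```c
#define ROWS 3
#define COLS 4

/* Hypothetical example: out[i][j] receives its own linear index. */
void fill_indices(int out[ROWS][COLS])
{
    int i, j;
    /* collapse(2): the ROWS x COLS iterations of both loops form one
     * iteration space of 12 iterations that is divided among the threads.
     * Both i and j are private loop variables. */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            out[i][j] = i * COLS + j;
}
```

Collapsing helps mostly when the outer loop alone has too few iterations to keep all threads busy.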
vector<int> v;            // element type int is assumed
vector<int>::iterator it;
#pragma omp for
for (it = v.begin(); it < v.end(); it++)
  foo(it);
omp_set_num_threads(3);
#pragma omp parallel
{
  omp_set_num_threads(omp_get_thread_num() + 2);
  #pragma omp parallel
  {
    foo();
  }
}
int fib(int n)
{
  if (n < 2) return n;
  int x = fib(n - 1);
  int y = fib(n - 2);
  return x+y;
}
int fib(int n)
{
  if (n < 2) return n;
  int x, y;
  #pragma omp task shared(x)
  {
    x = fib(n - 1);
  }
  #pragma omp task shared(y)
  {
    y = fib(n - 2);
  }
  #pragma omp taskwait
  return x+y;
}
[Figure: Speedup over #Threads; curves: optimal, omp-v1]
int fib(int n)
{
  if (n < 2) return n;
  int x, y;
  #pragma omp task shared(x) \
    if(n > 30)
  {
    x = fib(n - 1);
  }
  #pragma omp task shared(y) \
    if(n > 30)
  {
    y = fib(n - 2);
  }
  #pragma omp taskwait
  return x+y;
}
[Figure: Speedup over #Threads; curves: optimal, omp-v1, omp-v2]
int fib(int n)
{
  if (n < 2) return n;
  if (n <= 30)
    return serfib(n);
  int x, y;
  #pragma omp task shared(x)
  {
    x = fib(n - 1);
  }
  #pragma omp task shared(y)
  {
    y = fib(n - 2);
  }
  #pragma omp taskwait
  return x+y;
}
o Skip the OpenMP overhead once a certain n is reached (here: n <= 30)
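The fib variants on these slides are shown without the code that calls them. As a sketch of how the cutoff version could be driven, the block below wraps the first call in a parallel region with a single construct, so that one thread creates the initial tasks and the whole team executes them. The body of serfib, the driver fib_parallel, and the cutoff parameter (the slides hard-code 30) are assumptions for illustration.

```c
/* Serial version. The slides call serfib but do not show its body,
 * so this definition is an assumption. */
int serfib(int n)
{
    if (n < 2) return n;
    return serfib(n - 1) + serfib(n - 2);
}

/* Task-parallel version with a serial cutoff. The cutoff is a parameter
 * here; the slides use the fixed value 30. */
int fib(int n, int cutoff)
{
    int x, y;
    if (n < 2) return n;
    if (n <= cutoff) return serfib(n);   /* no task overhead for small n */
    #pragma omp task shared(x)
    x = fib(n - 1, cutoff);
    #pragma omp task shared(y)
    y = fib(n - 2, cutoff);
    /* Wait for both child tasks before x and y are read. */
    #pragma omp taskwait
    return x + y;
}

/* Hypothetical driver: one thread starts the recursion, the whole team
 * executes the tasks that are created along the way. */
int fib_parallel(int n, int cutoff)
{
    int result = 0;
    #pragma omp parallel shared(result)
    {
        #pragma omp single
        result = fib(n, cutoff);
    }
    return result;
}
```

A call such as fib_parallel(40, 30) corresponds to the setting on the slide. Smaller cutoffs create more, finer-grained tasks and therefore more overhead.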
[Figure: Speedup over #Threads; curves: optimal, omp-v1, omp-v2, omp-v3]
o Everything ok now
o Performance Measurements
o Performance Impacts
Load Imbalance
Data Locality on cc-NUMA architectures
Memory Bandwidth (consumption per thread)
Cache Effects
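For the data-locality item, one commonly used technique is parallel initialization. It relies on a first-touch page placement policy, under which a memory page is placed near the thread that first writes to it. That policy is an assumption about the operating system (it is the usual default on Linux) and is not stated on the slides. The function name first_touch_sum and the vector contents are likewise hypothetical.

```c
#include <stdlib.h>

/* Hypothetical example: allocate, initialize and sum a vector of n doubles.
 * Returns -1.0 if the allocation fails. */
double first_touch_sum(int n)
{
    double sum = 0.0;
    double *a = malloc((size_t)n * sizeof *a);
    int i;

    if (a == NULL) return -1.0;

    /* Parallel initialization: under a first-touch policy each thread
     * places the pages of its own chunk in its local memory. */
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++)
        a[i] = 1.0;

    /* Same static schedule: with an unchanged thread count, each thread
     * typically reads the chunk it initialized itself. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i];

    free(a);
    return sum;
}
```

If the initialization loop were serial instead, all pages would typically end up on the node of the initial thread, and the other threads would access them remotely.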
[Figure panels: original building, lattice model, matrix shape]
[Figure: performance in MFLOPS over number of threads, AMD Opteron; annotations: ca. 850 MFLOPS, ca. 1200 MFLOPS, up to 25% better]
[Figure: performance in MFLOPS over number of threads; labels: Intel Xeon, scatter; AMD Opteron; annotations: ca. 660 MFLOPS, ca. 985 MFLOPS]
[Figure: performance in MFLOPS over number of threads; labels: AMD Opteron, Intel Xeon; annotations: ca. 1000 MFLOPS, ca. 2000 MFLOPS]
o If the matrix were smaller and fit into the cache, the
result would look different
Metric \ Server     SF V40z          FSC RX200 S4
Processor Chip
# sockets
# cores             8 (dual-core)    8 (quad-core)
# threads
Accumulated L2 $    8 MB             16 MB
L2 $ Strategy                        Shared by 2 cores
Technology          90 nm            45 nm
Peak Performance    35.2 GFLOPS      96 GFLOPS
Dimension           3 units          1 unit
Note: Here we compare machines of different ages, which can be seen as unfair!
For example, newer Opteron-based machines provide similar settings in 1 unit.
[Figure: cores (C) with attached memories (M); annotation: ccNUMA!]
8 OpenMP threads: 18.470 GB/s
FSC RX200 S4 (Xeon)
GFLOPS: 2.17, 9.34
GFLOPS: 1.47, 0.91
The End