Decomposition Techniques
[Figure: a dense matrix-vector product y = Ab decomposed into n tasks; each task computes one element of the output vector y.]
Computation of each element of the output vector y is independent of the other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.
Observations: While tasks share data (namely, the vector b), they have no control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are the same size in terms of number of operations. Is this the maximum number of tasks into which we could decompose this problem?
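As a rough sketch of this decomposition (illustrative Python; the names `matvec_task` and `matvec` are assumptions, not from the text), each task computes one element of y from one row of A and the shared vector b:

```python
# Sketch of the n-task decomposition of a dense matrix-vector product
# y = A b: task i computes the single output element y[i].

def matvec_task(A_row, b):
    """One task: the dot product of one matrix row with the shared vector b."""
    return sum(a * x for a, x in zip(A_row, b))

def matvec(A, b):
    # The tasks share b but have no control dependencies, so in a real
    # parallel setting each call below could run concurrently.
    return [matvec_task(row, b) for row in A]

A = [[1, 2], [3, 4]]
b = [5, 6]
print(matvec(A, b))  # [17, 39]
```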
[Figure: task dependency graph for the query MODEL = "Civic" AND YEAR = 2001 AND (COLOR = "Green" OR COLOR = "White"): separate tasks produce the tables of Civic, 2001, Green, and White ID#s; intermediate tasks combine them, and the final task yields ID# 6734.]
Decomposing the given query into a number of tasks. Edges in this graph denote that the output of one task is needed to accomplish the next.
[Figure: an alternate task dependency graph for the same query: the White and Green tables are first combined into a single Color table, which is then intersected with the results of the Civic and 2001 selections.]
An alternate decomposition of the given problem into subtasks, along with their data dependencies. Different task decompositions may lead to significant differences in their eventual parallel performance.
A coarse grained counterpart to the dense matrix-vector product example. Each task in this example corresponds to the computation of three elements of the result vector.
Degree of Concurrency
The number of tasks that can be executed in parallel is the degree of concurrency of a decomposition. Since the number of tasks that can be executed in parallel may change over program execution, the maximum degree of concurrency is the maximum number of such tasks at any point during execution. What is the maximum degree of concurrency of the database query examples? The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program. Assuming that each task in the database example takes identical processing time, what is the average degree of concurrency in each decomposition? The degree of concurrency increases as the decomposition becomes finer in granularity, and vice versa.
[Figure: task dependency graphs (a) and (b) for the two database query decompositions.]
What are the critical path lengths for the two task dependency graphs? If each task takes 10 time units, what is the shortest parallel execution time for each decomposition? How many processors are needed in each case to achieve this minimum parallel execution time? What is the maximum degree of concurrency?
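These quantities can be computed mechanically from a task-dependency graph. A minimal sketch (illustrative Python; the graph `deps` below is an arbitrary example, not one of the figures): assign each task the level equal to its longest chain of prerequisites, so the critical-path length is the deepest level and the maximum degree of concurrency is the size of the largest level.

```python
# Critical-path length and maximum degree of concurrency of a
# task-dependency graph, given as {task: [prerequisite tasks]}.

def levels(deps):
    """Level of a task = 1 + length of the longest chain of prerequisites."""
    memo = {}
    def level(t):
        if t not in memo:
            memo[t] = 1 + max((level(d) for d in deps[t]), default=0)
        return memo[t]
    for t in deps:
        level(t)
    return memo

deps = {1: [], 2: [], 3: [], 4: [1, 2], 5: [3, 4]}  # example graph
lv = levels(deps)
critical_path = max(lv.values())           # tasks on the longest chain
max_concurrency = max(                     # most tasks sharing one level
    sum(1 for v in lv.values() if v == k) for k in set(lv.values()))
print(critical_path, max_concurrency)  # 3 3
```

With 10 time units per task, this example's shortest parallel execution time is the critical-path length times 10.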
[Figure: mappings (a) and (b) of the tasks in the two query decompositions onto processes.]
Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes.
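The level-based mapping just described can be sketched as follows (illustrative Python; `map_by_levels` and the example graph are assumptions, not from the text): tasks are grouped into dependency levels, and the mutually independent tasks within a level are dealt round-robin to different processes.

```python
# Level-based mapping: no two tasks in a level depend on each other,
# so tasks within a level are assigned to different processes.

def map_by_levels(deps, nproc):
    level = {}
    def lv(t):
        if t not in level:
            level[t] = 1 + max((lv(d) for d in deps[t]), default=0)
        return level[t]
    for t in deps:
        lv(t)
    mapping = {}
    for k in sorted(set(level.values())):
        for slot, t in enumerate(sorted(t for t in deps if level[t] == k)):
            mapping[t] = slot % nproc      # round-robin within the level
    return mapping

deps = {1: [], 2: [], 3: [1, 2], 4: [3]}
print(map_by_levels(deps, 2))  # {1: 0, 2: 1, 3: 0, 4: 0}
```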
Decomposition Techniques
So how does one decompose a task into various subtasks? While there is no single recipe that works for all problems, we present a set of commonly used techniques that apply to broad classes of problems. These include: recursive decomposition, data decomposition, exploratory decomposition, and speculative decomposition.
Recursive Decomposition
Generally suited to problems that are solved using the divide-and-conquer strategy. A given problem is first decomposed into a set of sub-problems. These sub-problems are recursively decomposed further until a desired granularity is reached.
[Figure: recursion tree of quicksort on a small list; each level partitions its sublists around pivots until single elements remain.]
In this example, once the list has been partitioned around the pivot, each sublist can be processed concurrently (i.e., each sublist represents an independent subtask). This can be repeated recursively.
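A minimal sketch of this recursive decomposition (illustrative Python, not a tuned implementation): after partitioning around the pivot, the two sublists are independent subtasks.

```python
# Recursive decomposition in quicksort: the two sublists produced by a
# partitioning step have no dependencies and could run concurrently.

def quicksort(a):
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    left = [x for x in rest if x < pivot]     # subtask 1
    right = [x for x in rest if x >= pivot]   # subtask 2
    # In a parallel setting these two recursive calls are independent.
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([5, 12, 11, 10, 6, 9]))  # [5, 6, 9, 10, 11, 12]
```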
[Figure: recursive decomposition of finding the minimum of {4, 9, 1, 7, 8, 11, 2, 12}: pairwise operations min(4,9), min(1,7), min(8,11), min(2,12) feed further min operations up a binary tree.]
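The tree of pairwise minima can be sketched as follows (illustrative Python; `rec_min` is an assumed name): all minima at one level of the tree are independent tasks.

```python
# Recursive (tree) decomposition of finding the minimum: take pairwise
# minima, then recurse on the half-size list of partial results.

def rec_min(a):
    if len(a) == 1:
        return a[0]
    # Each pairwise min below is an independent task at this level.
    pairs = [min(a[i], a[i + 1]) for i in range(0, len(a) - 1, 2)]
    if len(a) % 2:             # odd leftover element carries forward
        pairs.append(a[-1])
    return rec_min(pairs)

print(rec_min([4, 9, 1, 7, 8, 11, 2, 12]))  # 1
```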
Data Decomposition
Identify the data on which computations are performed, and partition this data across various tasks. This partitioning induces a decomposition of the problem. Data can be partitioned in various ways; this choice critically impacts the performance of a parallel algorithm.
Task 1: C1,1 = A1,1 B1,1 + A1,2 B2,1
Task 2: C1,2 = A1,1 B1,2 + A1,2 B2,2
Task 3: C2,1 = A2,1 B1,1 + A2,2 B2,1
Task 4: C2,2 = A2,1 B1,2 + A2,2 B2,2
Decomposition I
Task 1: C1,1 = A1,1 B1,1
Task 2: C1,1 = C1,1 + A1,2 B2,1
Task 3: C1,2 = A1,1 B1,2
Task 4: C1,2 = C1,2 + A1,2 B2,2
Task 5: C2,1 = A2,1 B1,1
Task 6: C2,1 = C2,1 + A2,2 B2,1
Task 7: C2,2 = A2,1 B1,2
Task 8: C2,2 = C2,2 + A2,2 B2,2
Decomposition II
Task 1: C1,1 = A1,1 B1,1
Task 2: C1,1 = C1,1 + A1,2 B2,1
Task 3: C1,2 = A1,2 B2,2
Task 4: C1,2 = C1,2 + A1,1 B1,2
Task 5: C2,1 = A2,2 B2,1
Task 6: C2,1 = C2,1 + A2,1 B1,1
Task 7: C2,2 = A2,1 B1,2
Task 8: C2,2 = C2,2 + A2,2 B2,2
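As a rough sketch of Decomposition II (illustrative Python; scalars stand in for the matrix blocks, and `matmul_2x2` is an assumed name), the eight tasks can be written directly:

```python
# Decomposition II of the 2x2 block matrix multiply C = A B, one
# statement per task. Here each "block" is a scalar (1x1 block).

def matmul_2x2(A, B):
    C = [[0, 0], [0, 0]]
    C[0][0] = A[0][0] * B[0][0]    # Task 1
    C[0][0] += A[0][1] * B[1][0]   # Task 2
    C[0][1] = A[0][1] * B[1][1]    # Task 3 (note the Decomposition II order)
    C[0][1] += A[0][0] * B[0][1]   # Task 4
    C[1][0] = A[1][1] * B[1][0]    # Task 5
    C[1][0] += A[1][0] * B[0][0]   # Task 6
    C[1][1] = A[1][0] * B[0][1]    # Task 7
    C[1][1] += A[1][1] * B[1][1]   # Task 8
    return C

print(matmul_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Both decompositions compute the same C; they differ only in which products each task starts with, which affects data reuse and dependencies.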
[Figure: data decompositions of itemset-frequency counting over a transaction database: (a) partitioning the itemsets (and their frequency counts) among the tasks, (b) partitioning the database transactions among the tasks, and (c) partitioning both transactions and itemsets among four tasks.]
[Figure: intermediate data partitioning of the 2 x 2 block matrix multiplication: Stage I computes the partial products Dk,i,j = Ai,k Bk,j; Stage II adds Ci,j = D1,i,j + D2,i,j.]
Task 01: D1,1,1 = A1,1 B1,1
Task 02: D2,1,1 = A1,2 B2,1
Task 03: D1,1,2 = A1,1 B1,2
Task 04: D2,1,2 = A1,2 B2,2
Task 05: D1,2,1 = A2,1 B1,1
Task 06: D2,2,1 = A2,2 B2,1
Task 07: D1,2,2 = A2,1 B1,2
Task 08: D2,2,2 = A2,2 B2,2
Task 09: C1,1 = D1,1,1 + D2,1,1
Task 10: C1,2 = D1,1,2 + D2,1,2
Task 11: C2,1 = D1,2,1 + D2,2,1
Task 12: C2,2 = D1,2,2 + D2,2,2
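The intermediate-data decomposition can be sketched in the same scalar-for-block style (illustrative Python; `matmul_intermediate` is an assumed name): the eight Stage I products are fully independent, and the four Stage II additions each depend on only two of them.

```python
# Intermediate data partitioning: Stage I computes all partial products
# D[k][i][j] = A[i][k] * B[k][j] (Tasks 1-8, all independent); Stage II
# sums C[i][j] = D[0][i][j] + D[1][i][j] (Tasks 9-12).

def matmul_intermediate(A, B):
    D = [[[A[i][k] * B[k][j] for j in range(2)] for i in range(2)]
         for k in range(2)]                       # Stage I
    return [[D[0][i][j] + D[1][i][j] for j in range(2)]
            for i in range(2)]                    # Stage II

print(matmul_intermediate([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```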
Exploratory Decomposition
In many cases, the decomposition of the problem goes hand-in-hand with its execution. These problems typically involve the exploration (search) of a state space of solutions. Problems in this class include a variety of discrete optimization problems (0/1 integer programming, QAP, etc.), theorem proving, game playing, etc.
[Figure: a sequence of 15-puzzle board configurations, (a) through (d), leading from an initial state to the solved state.]
Of course, the problem of computing the solution, in general, is much more difficult than in this simple example.
The state space can be explored by generating the various successor states of the current state and viewing them as independent tasks.
[Figure: the states generated by an instance of the 15-puzzle: the successors of the initial configuration are explored as independent tasks 1 through 4, one of which leads to the solution.]
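Successor generation for the 15-puzzle can be sketched as follows (illustrative Python; the board encoding and the name `successors` are assumptions): each legal move of the blank yields a new board, and each such board becomes an independent search task.

```python
# Exploratory decomposition on the 15-puzzle: boards are 4x4 grids
# flattened to 16-tuples, with 0 marking the blank. Each successor
# (one blank move) is handed to a separate search task.

def successors(board):
    b = list(board)
    i = b.index(0)
    r, c = divmod(i, 4)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # blank moves
        nr, nc = r + dr, c + dc
        if 0 <= nr < 4 and 0 <= nc < 4:
            j = nr * 4 + nc
            nb = b[:]
            nb[i], nb[j] = nb[j], nb[i]                 # slide the tile
            out.append(tuple(nb))
    return out

start = tuple(list(range(1, 16)) + [0])   # blank in the corner
print(len(successors(start)))  # 2
```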
Speculative Decomposition
In some applications, dependencies between tasks are not known a priori. For such applications, it is impossible to identify independent tasks up front. There are generally two approaches to dealing with such applications: conservative approaches, which identify independent tasks only when they are guaranteed not to have dependencies, and optimistic approaches, which schedule tasks even when they may potentially be erroneous. Conservative approaches may yield little concurrency, and optimistic approaches may require a roll-back mechanism in the case of an error.
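A toy sketch of the optimistic approach (illustrative Python; `speculate` and the use of `concurrent.futures` as a stand-in for real parallel tasks are assumptions): both branches of a not-yet-resolved condition run speculatively, and the "roll-back" here is simply discarding the losing branch's result.

```python
# Speculative (optimistic) decomposition: start both branch tasks before
# the branch condition is known; keep one result once it resolves.

from concurrent.futures import ThreadPoolExecutor

def speculate(cond_task, then_task, else_task):
    with ThreadPoolExecutor() as pool:
        t = pool.submit(then_task)     # speculative work
        e = pool.submit(else_task)     # speculative work
        cond = cond_task()             # the long-latency condition
    # Roll-back in this sketch = discarding the losing branch's result.
    return t.result() if cond else e.result()

print(speculate(lambda: True, lambda: "then", lambda: "else"))  # then
```

Real systems (e.g., optimistic discrete event simulation) must also undo any side effects of mis-speculated tasks, which this sketch sidesteps by keeping tasks side-effect free.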
Hybrid Decompositions
Often, a mix of decomposition techniques is necessary for decomposing a problem. Consider the following examples:
In quicksort, recursive decomposition alone limits concurrency (Why?). A mix of data and recursive decompositions is more desirable. In discrete event simulation, there might be concurrency in task processing. A mix of speculative decomposition and data decomposition may work well. Even for simple problems like finding the minimum of a list of numbers, a mix of data and recursive decomposition works well.
[Figure: hybrid decomposition for finding the minimum of a list: data decomposition splits the list across tasks, and recursive decomposition combines the local minima.]
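The hybrid minimum computation can be sketched as follows (illustrative Python; `hybrid_min` and the chunk count are assumptions): data decomposition gives each task one chunk to scan serially, then the local minima are combined by the recursive tree decomposition.

```python
# Hybrid decomposition for the minimum of a list: data decomposition
# across chunks, then recursive (tree) combination of local minima.

def hybrid_min(a, ntasks=4):
    chunk = (len(a) + ntasks - 1) // ntasks
    local = [min(a[i:i + chunk])                     # data decomposition
             for i in range(0, len(a), chunk)]
    while len(local) > 1:                            # recursive combination
        local = [min(local[i:i + 2]) for i in range(0, len(local), 2)]
    return local[0]

print(hybrid_min([3, 7, 2, 9, 1, 11, 4, 5, 8, 7, 10, 6]))  # 1
```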
Characteristics of Tasks
Once a problem has been decomposed into independent tasks, the characteristics of these tasks critically impact the choice and performance of parallel algorithms. Relevant task characteristics include: task generation, task sizes, and the size of the data associated with tasks.
Task Generation
Static task generation: Concurrent tasks can be identified a priori. Typical matrix operations, graph algorithms, image processing applications, and other regularly structured problems fall in this class. These can typically be decomposed using data or recursive decomposition techniques. Dynamic task generation: Tasks are generated as we perform the computation. A classic example of this is in game playing: each 15-puzzle board is generated from the previous one. These applications are typically decomposed using exploratory or speculative decompositions.
Task Sizes
Task sizes may be uniform (i.e., all tasks are the same size) or non-uniform. Non-uniform task sizes may or may not be determinable (or estimable) a priori. Examples in this class include discrete optimization problems, in which it is difficult to estimate the effective size of a state space.
[Figure: (a) and (b): decompositions of an image's pixels into tasks numbered 0 through 11.]
Mapping Techniques
Once a problem has been decomposed into concurrent tasks, these must be mapped to processes (that can be executed on a parallel platform). Mappings must minimize overheads. The primary overheads are communication and idling. Minimizing these overheads often involves conflicting objectives: assigning all work to one processor trivially minimizes communication, at the expense of significant idling.
[Figure: a mapping of the tasks of a matrix computation A x B = C onto processes P0 through P11.]
Consider the LU factorization of a 3 x 3 block matrix A into a lower-triangular factor L (blocks Li,j) and an upper-triangular factor U (blocks Ui,j). The computation decomposes into the following 14 tasks (X^-1 denotes a block inverse):
1: A1,1 = L1,1 U1,1 (factor)
2: L2,1 = A2,1 U1,1^-1
3: L3,1 = A3,1 U1,1^-1
4: U1,2 = L1,1^-1 A1,2
5: U1,3 = L1,1^-1 A1,3
6: A2,2 = A2,2 - L2,1 U1,2
7: A3,2 = A3,2 - L3,1 U1,2
8: A2,3 = A2,3 - L2,1 U1,3
9: A3,3 = A3,3 - L3,1 U1,3
10: A2,2 = L2,2 U2,2 (factor)
11: L3,2 = A3,2 U2,2^-1
12: U2,3 = L2,2^-1 A2,3
13: A3,3 = A3,3 - L3,2 U2,3
14: A3,3 = L3,3 U3,3 (factor)
[Figure: a typical computation in Gaussian elimination: at step k, rows above and to the left of (k,k) are the inactive part; within the active part, the element at (k,j) in row k is scaled as A[k,j] := A[k,j]/A[k,k], and each element (i,j) of the active part is then updated using elements (i,k) and (k,j).]
[Figure: a mapping of the 14 LU-factorization tasks T1 through T14 onto processes P0 through P8.]
Block-Cyclic Distribution
A cyclic distribution is a special case in which block size is one. A block distribution is a special case in which block size is n/p, where n is the dimension of the matrix and p is the number of processes.

[Figure: (a) and (b): examples of block-cyclic distributions of a matrix among processes.]
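The one-dimensional case can be sketched in a few lines (illustrative Python; `owner` is an assumed name): element i goes to process (i // block) mod p, and the two special cases above fall out by choosing the block size.

```python
# 1-D block-cyclic distribution: blocks of consecutive elements are
# dealt out to the p processes in round-robin order.

def owner(i, block, p):
    """Process that owns element i under block size `block` and p processes."""
    return (i // block) % p

n, p = 8, 4
print([owner(i, 1, p) for i in range(n)])       # cyclic: block size 1
print([owner(i, n // p, p) for i in range(n)])  # block:  block size n/p
```

With block size 1 the ownership pattern is 0,1,2,3,0,1,2,3 (cyclic); with block size n/p it is 0,0,1,1,2,2,3,3 (block).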
Random Partitioning
[Figure: a random partitioning among processes; for example, process 2 receives the set C2 = (1, 2, 4, 5, 7, 8).]
Hierarchical Mappings
Sometimes a single mapping technique is inadequate. For example, the task mapping of the binary tree (quicksort) cannot use a large number of processors. For this reason, task mapping can be used at the top level, with data partitioning within each level.
[Figure: a hierarchical mapping for the quicksort task tree: the top-level task is mapped to all eight processes P0 through P7, and each subtree is mapped to a progressively smaller group of processes, down to one process per leaf task.]