Join Processing
Rows to be joined must be on the same AMP. For join processing, copies of some or all of the rows may have to be moved to a common AMP.
General scenarios:
Join column is the PI of both tables: rows to be joined are already on the same AMP. No data movement is necessary, and rows are already in sorted order (within the block). This is the best case scenario.
Join column is the PI of one of the tables: rows of the other table are redistributed to their target AMPs by the hash code of the join column value. If that table is small, the Optimizer may choose to duplicate it on all AMPs instead.
Join column is not a PI of either table: rows of both tables are redistributed to their target AMPs by the hash code of the join column value. The Optimizer might instead choose to duplicate the smaller table on all AMPs if that reduces the amount of data movement.
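The three scenarios above map directly onto how much data has to move before the join can start. The sketch below is a simplified decision table, assuming only the PI status of each join column is known; `Movement` and `join_movement` are hypothetical names, and the real Optimizer also weighs table sizes and statistics.

```python
from enum import Enum


class Movement(Enum):
    NONE = "no data movement"
    MOVE_ONE = "redistribute (or duplicate, if small) the table not hashed on the join column"
    MOVE_BOTH = "redistribute both tables (or duplicate the smaller one)"


def join_movement(join_col_is_pi_1: bool, join_col_is_pi_2: bool) -> Movement:
    """Hypothetical mapping from the three general scenarios to data movement.

    Simplified: ignores table sizes, statistics and costs.
    """
    if join_col_is_pi_1 and join_col_is_pi_2:
        # Equal join values hash to the same AMP already.
        return Movement.NONE
    if join_col_is_pi_1 or join_col_is_pi_2:
        # Only one side is hashed on the join column.
        return Movement.MOVE_ONE
    # Neither side is hashed on the join column.
    return Movement.MOVE_BOTH


# Assuming Col1 is the PI of both example tables:
print(join_movement(True, True).value)   # t1.Col1 = t2.Col1
print(join_movement(False, True).value)  # t1.Col3 = t2.Col1
```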
Nested Join
The Optimizer chooses this join strategy when there is:
An equality value for a unique index (UPI or USI) on table 1, and
A join on a column of that single row to any index on table 2.
[Figure: nested join access to table 2; when the index on table 2 is a NUSI, the lookup is an all-AMP operation that returns one or more rows]
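A nested join is essentially two index lookups in sequence. The following is a minimal sketch using the example tables, assuming Col1 is the unique index of Table 1 and an index on Table 2's Col1; Python dictionaries stand in for the indexes, and `build_index` and `nested_join` are hypothetical helpers, not Teradata internals.

```python
from collections import defaultdict

# Table 1 keyed by its unique primary index (Col1) -> (Col2, Col3).
TABLE1 = {100: ("P", 600), 200: ("Q", 600), 300: ("R", 700), 400: ("S", 200),
          500: ("T", 500), 600: ("X", 200), 700: ("Y", 300), 800: ("Z", 500),
          900: ("A", 800), 1000: ("B", 300), 2000: ("C", 300),
          3000: ("D", 300), 4000: ("E", 200)}
# Table 2 rows as (Col1, Col2).
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]


def build_index(rows, key_pos):
    """Hypothetical index stand-in: column value -> list of matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key_pos]].append(row)
    return index


def nested_join(upi_value, t2_index):
    # Step 1: an equality value on a unique index returns at most one row.
    match = TABLE1.get(upi_value)
    if match is None:
        return []
    col2, col3 = match
    # Step 2: probe Table 2's index with that single row's join column value.
    return [(upi_value, col2, col3) + t2_row for t2_row in t2_index.get(col3, [])]


t2_index = build_index(TABLE2, 0)
# Roughly: ... WHERE t1.Col1 = 400 AND t1.Col3 = t2.Col1
print(nested_join(400, t2_index))  # [(400, 'S', 200, 200, 'L')]
```

Only one Table 1 row is ever touched, which is why this strategy is attractive when the first access is through a unique index.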
Product Join
Most general form of join. The Optimizer chooses a product join under the following conditions:
The WHERE clause is missing.
The join condition is not based on equality.
Join conditions are ORed together.
Table aliases are used incorrectly.
The Optimizer determines that it is less expensive than the other join types.
Steps:
Identify the smaller table and duplicate it in spool on all AMPs.
Join each spool row of the smaller table to every row of the larger table.
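The product join steps can be simulated to show why the cost grows with the product of the two row counts. This sketch assumes a hypothetical 4-AMP system and spreads Table 1 round-robin purely for illustration (real placement is by row hash); `product_join` and `N_AMPS` are hypothetical names. The condition `t1.Col3 > t2.Col1` is an example of a non-equality join.

```python
from itertools import product

# Example rows as (Col1, Col2, Col3) and (Col1, Col2).
TABLE1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
          (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
          (900, "A", 800), (1000, "B", 300), (2000, "C", 300),
          (3000, "D", 300), (4000, "E", 200)]
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]
N_AMPS = 4  # hypothetical system size


def product_join(small, large_by_amp, condition):
    """Duplicate `small` on every AMP, then compare each spool row with
    every local row of the larger table."""
    results, comparisons = [], 0
    for local_rows in large_by_amp:      # each AMP works independently
        spool = list(small)              # full copy of the smaller table
        for s_row, l_row in product(spool, local_rows):
            comparisons += 1
            if condition(s_row, l_row):
                results.append(l_row + s_row)
    return results, comparisons


# Spread Table 1 over the AMPs (round-robin, for illustration only).
large_by_amp = [TABLE1[i::N_AMPS] for i in range(N_AMPS)]
# Non-equality condition: ON t1.Col3 > t2.Col1
rows, comparisons = product_join(TABLE2, large_by_amp, lambda t2, t1: t1[2] > t2[0])
print(len(rows), comparisons)  # 42 104
```

Every one of the 8 x 13 row pairs is compared, regardless of how many actually qualify.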
Merge Join
Commonly done when the join conditions are based on equality. Generally more efficient than a product join because the number of row comparisons is smaller.
Steps:
Identify the smaller table.
Put the qualifying rows from one or both tables into spool.
Move the spool rows to the AMPs based on the join column hash (if required).
Sort the spool rows by join column hash value (if necessary).
Compare those rows with matching join column hash values.
[Figure: merge join; both spools are sorted by row hash (e.g. 110A, 120B, 203C, 210D) and only rows with matching row hash values are compared]
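The sort-then-compare steps of a merge join can be sketched as a classic sort-merge over row hashes. This assumes a hypothetical `row_hash` built from SHA-256 as a stand-in for Teradata's hashing algorithm; because different values can share a hash, rows in a matching hash run are still checked for equal join values. `merge_join` is a hypothetical helper, and the example joins `t1.Col3 = t2.Col1`.

```python
import hashlib

TABLE1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
          (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
          (900, "A", 800), (1000, "B", 300), (2000, "C", 300),
          (3000, "D", 300), (4000, "E", 200)]
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]


def row_hash(value):
    # Hypothetical 32-bit stand-in for Teradata's row hash.
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:4], "big")


def merge_join(left, right, lkey, rkey):
    # Sort both spools by the row hash of the join column.
    left = sorted(left, key=lambda r: row_hash(r[lkey]))
    right = sorted(right, key=lambda r: row_hash(r[rkey]))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lh, rh = row_hash(left[i][lkey]), row_hash(right[j][rkey])
        if lh < rh:
            i += 1
        elif lh > rh:
            j += 1
        else:
            # Find the run of rows sharing this hash on each side.
            i_end = i
            while i_end < len(left) and row_hash(left[i_end][lkey]) == lh:
                i_end += 1
            j_end = j
            while j_end < len(right) and row_hash(right[j_end][rkey]) == rh:
                j_end += 1
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    if l_row[lkey] == r_row[rkey]:  # guard against hash collisions
                        out.append(l_row + r_row)
            i, j = i_end, j_end
    return out


joined = merge_join(TABLE1, TABLE2, 2, 0)
print(len(joined))  # 13
```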
Example
Table 1
Col1 (PK)  Col2  Col3 (FK)
100        P     600
200        Q     600
300        R     700
400        S     200
500        T     500
600        X     200
700        Y     300
800        Z     500
900        A     800
1000       B     300
2000       C     300
3000       D     300
4000       E     200
Table 2
Col1 (PK)  Col2
100        K
200        L
300        M
400        N
500        O
600        P
700        Q
800        R
[Figure: Table 1 and Table 2 rows distributed across four AMPs by the row hash of their primary index, Col1; e.g. Table 2 rows 100 K and 800 R share one AMP, and 200 L and 500 O share another]
Case 1 - Example
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.Col1 = t2.Col1
[Figure: both tables are distributed on the row hash of Col1, so matching rows are already on the same AMP and are joined without any redistribution]
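Case 1 can be simulated by hashing both tables on Col1 and letting each AMP join only what it already holds. This is a sketch assuming a hypothetical 4-AMP system and a SHA-256 stand-in for the row hash and hash map (`amp_for`, `distribute`, `N_AMPS` are hypothetical names), so it will not reproduce the exact AMP placement shown in the figure.

```python
import hashlib

TABLE1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
          (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
          (900, "A", 800), (1000, "B", 300), (2000, "C", 300),
          (3000, "D", 300), (4000, "E", 200)]
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]
N_AMPS = 4  # hypothetical system size


def amp_for(value):
    # Hypothetical stand-in for row hash + hash map lookup.
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:4], "big") % N_AMPS


def distribute(rows, key_pos):
    """Place each row on the AMP chosen by hashing column `key_pos`."""
    amps = [[] for _ in range(N_AMPS)]
    for row in rows:
        amps[amp_for(row[key_pos])].append(row)
    return amps


t1_amps = distribute(TABLE1, 0)  # Table 1 stored by its PI, Col1
t2_amps = distribute(TABLE2, 0)  # Table 2 stored by its PI, Col1

# ON t1.Col1 = t2.Col1: each AMP joins only the rows it already holds.
joined = []
for amp in range(N_AMPS):
    local_t2 = {row[0]: row for row in t2_amps[amp]}
    for row in t1_amps[amp]:
        if row[0] in local_t2:
            joined.append(row + local_t2[row[0]])

print(sorted(row[0] for row in joined))  # [100, 200, 300, 400, 500, 600, 700, 800]
```

No row crosses an AMP boundary, yet the result matches a global join, because equal Col1 values always hash to the same AMP.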
When one of the tables is already distributed on the join column row hash, the Optimizer redistributes the other table and sorts it on the join column row hash.
Case 2 Example
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.Col3 = t2.Col1
[Figure: Table 1 rows are redistributed into spool by the row hash of Col3, so each lands on the AMP holding the Table 2 row with the matching Col1; e.g. rows 400 S 200, 600 X 200 and 4000 E 200 go to the AMP holding 200 L]
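The Case 2 redistribution can be sketched the same way: Table 2 stays on its Col1 AMPs, while Table 1 is rehashed on Col3 into spool and sorted before the local join. As before, this assumes a hypothetical 4-AMP system and a SHA-256 stand-in for the row hash; `amp_for`, `distribute` and `N_AMPS` are hypothetical names.

```python
import hashlib

TABLE1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
          (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
          (900, "A", 800), (1000, "B", 300), (2000, "C", 300),
          (3000, "D", 300), (4000, "E", 200)]
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]
N_AMPS = 4  # hypothetical system size


def row_hash(value):
    # Hypothetical 32-bit stand-in for Teradata's row hash.
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:4], "big")


def amp_for(value):
    return row_hash(value) % N_AMPS


def distribute(rows, key_pos):
    amps = [[] for _ in range(N_AMPS)]
    for row in rows:
        amps[amp_for(row[key_pos])].append(row)
    return amps


t2_amps = distribute(TABLE2, 0)  # Table 2 stays put: Col1 is its PI
spool = distribute(TABLE1, 2)    # Table 1 redistributed by the hash of Col3

# ON t1.Col3 = t2.Col1
joined = []
for amp in range(N_AMPS):
    local_t2 = {row[0]: row for row in t2_amps[amp]}
    # Sort the spool on the join column hash, as the merge step expects.
    for row in sorted(spool[amp], key=lambda r: row_hash(r[2])):
        if row[2] in local_t2:
            joined.append(row + local_t2[row[2]])

print(len(joined))  # 13
```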
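Alternatively, the Optimizer may duplicate the smaller table on all AMPs and locally build a spool of the larger table, sorting it on the join column row hash.
The Optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.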
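The trade-off between redistributing and duplicating can be illustrated with a deliberately crude cost model. This sketch assumes cost is just the number of rows written to spool (the real Optimizer also considers row size, sorting, statistics and more); `choose_strategy` is a hypothetical name and the row counts are made up for illustration.

```python
def choose_strategy(small_rows: int, large_rows: int, n_amps: int) -> str:
    """Hypothetical cost model that counts spool rows only.

    Redistribution: every row of the larger table is rehashed into spool.
    Duplication: every AMP receives a full copy of the smaller table.
    """
    redistribute_cost = large_rows
    duplicate_cost = small_rows * n_amps
    return "duplicate" if duplicate_cost < redistribute_cost else "redistribute"


print(choose_strategy(100, 10_000_000, 100))        # duplicate (10,000 < 10,000,000)
print(choose_strategy(5_000_000, 10_000_000, 100))  # redistribute
```

Duplication cost scales with the number of AMPs, so it only pays off when the smaller table is much smaller than the larger one.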
Case 2 Example
[Figure: all of Table 2 (rows 100 K to 800 R) is duplicated into spool on every AMP, while Table 1 rows stay on their own AMPs and are spooled locally]
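The duplication variant of Case 2 can be sketched by giving every AMP a full copy of Table 2 and leaving Table 1 where its PI placed it. This assumes a hypothetical 4-AMP system and a SHA-256 stand-in for the row hash; `row_hash`, `amp_for`, `distribute` and `N_AMPS` are hypothetical names.

```python
import hashlib

TABLE1 = [(100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
          (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
          (900, "A", 800), (1000, "B", 300), (2000, "C", 300),
          (3000, "D", 300), (4000, "E", 200)]
TABLE2 = [(100, "K"), (200, "L"), (300, "M"), (400, "N"),
          (500, "O"), (600, "P"), (700, "Q"), (800, "R")]
N_AMPS = 4  # hypothetical system size


def row_hash(value):
    # Hypothetical 32-bit stand-in for Teradata's row hash.
    return int.from_bytes(hashlib.sha256(str(value).encode()).digest()[:4], "big")


def distribute(rows, key_pos):
    amps = [[] for _ in range(N_AMPS)]
    for row in rows:
        amps[row_hash(row[key_pos]) % N_AMPS].append(row)
    return amps


t1_amps = distribute(TABLE1, 0)                # Table 1 stays on its PI (Col1) AMPs
t2_spool = [list(TABLE2) for _ in range(N_AMPS)]  # full copy of Table 2 per AMP

# ON t1.Col3 = t2.Col1
joined = []
for amp in range(N_AMPS):
    lookup = {row[0]: row for row in t2_spool[amp]}
    # Local spool of the larger table, sorted on the join column (Col3) hash.
    for row in sorted(t1_amps[amp], key=lambda r: row_hash(r[2])):
        if row[2] in lookup:
            joined.append(row + lookup[row[2]])

print(sum(len(s) for s in t2_spool), len(joined))  # 32 13
```

No Table 1 row moves between AMPs; the price is 8 x 4 = 32 spool rows for the duplicated table.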
Explain Facility
Provides an English translation of the steps the Optimizer chooses to execute a query.
Useful for understanding complex queries.
Helps physical designers with their index choices.
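Common EXPLAIN phrases:
...with no residual conditions...: there are no conditions other than those used to locate the row.
...eliminating duplicates...: a DISTINCT operation is being done.
...we do a SMS...: a set manipulation such as UNION or EXCEPT is being done.
...we do a BMSMS...: NUSI bit mapping is being used.
...distributed by hash code to all AMPs...: rows are being redistributed by the hash code of the join column.
...duplicated on all AMPs...: a table, typically the smaller one, is being copied to every AMP.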
Statistics
The Optimizer needs demographic information to create the best execution plan for a query:
Number of rows in the table.
Row size.
Number of rows per value.
Index information and demographics.
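To make "number of rows per value" concrete, the sketch below computes simple demographics for Table 1's Col3 from the example. It only illustrates the kind of numbers involved, not how Teradata's COLLECT STATISTICS stores them; `demographics` is a hypothetical helper.

```python
from collections import Counter

# Table 1, Col3 values from the example.
COL3 = [600, 600, 700, 200, 500, 200, 300, 500, 800, 300, 300, 300, 200]


def demographics(values):
    """Hypothetical summary of a column's value distribution."""
    per_value = Counter(values)
    return {
        "rows": len(values),
        "distinct_values": len(per_value),
        "rows_per_value": dict(per_value),
        "max_rows_per_value": max(per_value.values()),
        "avg_rows_per_value": len(values) / len(per_value),
    }


stats = demographics(COL3)
print(stats["rows"], stats["distinct_values"], stats["max_rows_per_value"])  # 13 6 4
```

Skew like this (value 300 appears four times, 700 once) is what helps the Optimizer estimate how many rows a join on Col3 will produce per AMP.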