
Teradata Join Processing

Center of Excellence Data Warehousing Wipro Technologies

Join Processing
 Rows to be joined must be on the same AMP.
 For join processing, copies of some or all of the rows may have to be moved to a common AMP.
 Join plans
     Product join
     Merge join
     Nested join

Join Processing
 General scenarios:
     Join column is the PI of both tables.
     Join column is the PI of one of the tables.
     Join column is not the PI of either table.

Case 1- PI of both the tables


 Rows taking part in the join are already on the same AMP.
 No data movement is necessary.
 Rows are already in sorted order (within the block).
 This is the best-case scenario.

Case 2 - PI of one of the tables


 One table already has its rows on the target AMP.
 Rows of the other table need to be redistributed to their target AMPs by the hash code of the join column value.
 If the table is small, the optimizer may choose to duplicate it on all AMPs instead.

Case 3 - not a PI of either table

 Rows of both tables need to be redistributed to their target AMPs by the hash code of the join column value.
 The optimizer might choose to duplicate the smaller table on all AMPs.
 This join scenario involves the most data movement.
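
The three cases can be illustrated with a small simulation. This is a minimal sketch, not Teradata's implementation: CRC32 stands in for the real row-hash function, and the AMP count, the helper names (`amp_for`, `rows_to_move`) and the sample rows are assumptions made for the example.

```python
import zlib

N_AMPS = 4  # assumed number of AMPs for this illustration


def amp_for(value, n_amps=N_AMPS):
    """Map a column value to an AMP using a stand-in hash (CRC32, not Teradata's)."""
    return zlib.crc32(str(value).encode()) % n_amps


def rows_to_move(rows, pi_col, join_col, n_amps=N_AMPS):
    """Count rows stored on a different AMP (by PI) than the one their join value hashes to."""
    return sum(
        1 for r in rows
        if amp_for(r[pi_col], n_amps) != amp_for(r[join_col], n_amps)
    )


# Hypothetical table: "id" is the primary index, "ref" is a non-PI column.
rows = [{"id": i, "ref": (i * 7) % 5} for i in range(20)]

# Joining on the PI: every row is already on its target AMP.
moved_pi = rows_to_move(rows, "id", "id")

# Joining on a non-PI column: some rows may have to be redistributed.
moved_non_pi = rows_to_move(rows, "id", "ref")

print("rows moved when joining on the PI:", moved_pi)
print("rows moved when joining on a non-PI column:", moved_non_pi)
```

When the join column is the PI of both tables, this count is zero for both; when it is the PI of neither, both tables contribute rows that have to move.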

Nested Join
 The optimizer chooses this join strategy when the query has:
     An equality value for a unique index (UPI or USI) on table 1.
     A join on a column of that single row to any index on table 2.
 This join uses the fewest system resources.


Table 1 access        Join to table 2          AMPs involved   Rows returned
UPI = data value      data column = PI         2 AMPs          1 or more rows
USI = data value      data column = PI         3 AMPs          1 or more rows
UPI = data value      data column = USI        3 AMPs          1 row
USI = data value      data column = USI        4 AMPs          1 row
UPI = data value      data column = NUSI       All AMPs        1 or more rows
USI = data value      data column = NUSI       All AMPs        1 or more rows

Product Join
 Most general form of join.
 The optimizer chooses a product join under the following conditions:
     The WHERE clause is missing.
     The join condition is not based on equality.
     Join conditions are ORed together.
     Table aliases are used incorrectly.
     The optimizer determines that it is less expensive than other join types.
 Identify the smaller table and duplicate it in spool on all AMPs.
 Join each spool row of the smaller table to every row of the larger table.
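
The row-by-row comparison described above can be sketched in a few lines. This is an illustration of the general technique on a single AMP, not Teradata code; the function name `product_join` and the sample values are assumptions.

```python
def product_join(small, large, predicate):
    """Compare every row of the smaller (spooled) table with every row of the larger table."""
    comparisons = 0
    result = []
    for s in small:          # spool copy of the smaller table on this AMP
        for l in large:      # local rows of the larger table
            comparisons += 1
            if predicate(s, l):
                result.append((s, l))
    return result, comparisons


# Hypothetical single-column rows and a non-equality join condition.
small = [1, 5, 9]
large = [2, 4, 6, 8]
matches, n_comparisons = product_join(small, large, lambda s, l: s < l)

print(matches)
print("comparisons:", n_comparisons)  # always len(small) * len(large)
```

The number of comparisons is the product of the two row counts regardless of how many rows qualify, which is why the optimizer prefers other join types when an equality condition is available.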

Merge Join
 Commonly done when the join conditions are based on equality.
 Generally more efficient than a product join because fewer row comparisons are made.
 Steps
     Identify the smaller table.
     Put the qualifying rows from one or both tables into spool.
     Move the spool rows to the AMPs based on the join column hash (if required).
     Sort the spool rows by join column hash value (if necessary).
     Compare those rows with matching join column hash values.
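
The sort and compare steps can be sketched as a classic sort-merge join on one AMP. This is a minimal sketch under stated assumptions: it sorts on the join column value itself rather than on a Teradata row hash, and the function name `merge_join` and the sample rows (taken from the style of the example tables) are illustrative.

```python
def merge_join(left, right, key_left, key_right):
    """Sort both inputs on the join key, then match rows in a single forward pass."""
    left = sorted(left, key=key_left)
    right = sorted(right, key=key_right)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Pair the current left row with the whole run of equal keys on the right.
            j_run = j
            while j_run < len(right) and key_right(right[j_run]) == kl:
                out.append((left[i], right[j_run]))
                j_run += 1
            i += 1  # j stays at the run start in case the next left row has the same key
    return out


# Rows shaped like (Col1, Col2, Col3) and (Col1, Col2); the join is left.Col3 = right.Col1.
left_rows = [(600, "X", 200), (400, "S", 200), (800, "Z", 500)]
right_rows = [(200, "L"), (500, "O")]
joined = merge_join(left_rows, right_rows, lambda r: r[2], lambda r: r[0])

for pair in joined:
    print(pair)
```

Because both inputs are sorted, each side is scanned forward only, so far fewer comparisons are needed than in a product join.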

Merge Join
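Figure: two inputs sorted on join column row hash; rows with matching row hash values are compared.
    Input 1 (Row Hash, Col1):        110A, 120B, 203C, 210D
    Input 2 (Row Hash, Col1, Col2):  110A, 110A, 111B, 111B, 203C, 203C, 203C, 110E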

Example
Table 1
Col1 (PK)   Col2   Col3 (FK)
100         P      600
200         Q      600
300         R      700
400         S      200
500         T      500
600         X      200
700         Y      300
800         Z      500
900         A      800
1000        B      300
2000        C      300
3000        D      300
4000        E      200

Table 2
Col1 (PK)   Col2
100         K
200         L
300         M
400         N
500         O
600         P
700         Q
800         R

Example
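Row placement across the AMPs (each table distributed on its Col1):

AMP 1
    Table 1:  100 P 600, 800 Z 500, 1000 B 300
    Table 2:  100 K, 800 R
AMP 2
    Table 1:  400 S 200, 700 Y 300, 2000 C 300, 4000 E 200
    Table 2:  400 N, 700 Q
AMP 3
    Table 1:  200 Q 600, 500 T 500, 3000 D 300
    Table 2:  200 L, 500 O
AMP 4
    Table 1:  300 R 700, 600 X 200, 900 A 800
    Table 2:  300 M, 600 P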

Row Distribution Strategy 1


 No distribution needed.
 No sorting needed.
 Join columns of both tables are PIs.
     Rows involved in the join are located on the same AMP.

Case 1 - Example
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.Col1 = t2.Col1

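Rows with equal Col1 values are already on the same AMP, so each AMP joins its own rows:

AMP 1
    Table 1:  100 P 600, 800 Z 500, 1000 B 300
    Table 2:  100 K, 800 R
AMP 2
    Table 1:  400 S 200, 700 Y 300, 2000 C 300, 4000 E 200
    Table 2:  400 N, 700 Q
AMP 3
    Table 1:  200 Q 600, 500 T 500, 3000 D 300
    Table 2:  200 L, 500 O
AMP 4
    Table 1:  300 R 700, 600 X 200, 900 A 800
    Table 2:  300 M, 600 P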

Row Distribution Strategy 2


 Distribute and sort one of the tables on the join column row hash.
 Join column is the PI of one of the tables.
     One of the tables is already distributed on the join column row hash.
     The optimizer redistributes the other table and sorts it on the join column row hash.

Case 2 Example
SELECT * FROM Table1 t1 INNER JOIN Table2 t2 ON t1.Col3 = t2.Col1
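Table 2 stays in place (Col1 is its PI). Table 1 is redistributed into spool on the row hash of Col3:

AMP 1
    Table 1 (base rows):  100 P 600, 800 Z 500, 1000 B 300
    Table 1 spool:        900 A 800
    Table 2:              100 K, 800 R
AMP 2
    Table 1 (base rows):  400 S 200, 700 Y 300, 2000 C 300, 4000 E 200
    Table 1 spool:        300 R 700
    Table 2:              400 N, 700 Q
AMP 3
    Table 1 (base rows):  200 Q 600, 500 T 500, 3000 D 300
    Table 1 spool:        600 X 200, 400 S 200, 4000 E 200, 800 Z 500, 500 T 500
    Table 2:              200 L, 500 O
AMP 4
    Table 1 (base rows):  300 R 700, 600 X 200, 900 A 800
    Table 1 spool:        1000 B 300, 3000 D 300, 700 Y 300, 2000 C 300, 200 Q 600, 100 P 600
    Table 2:              300 M, 600 P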
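
The redistribution in this example can be reproduced with a short sketch. The value-to-AMP mapping below is read off the example's placement of Table 2; the AMP numbers 1 to 4, the names `AMP_OF_VALUE` and `redistribute`, and sorting on the Col3 value (rather than on a real row hash) are assumptions made for illustration.

```python
# Table 1 rows from the example: (Col1, Col2, Col3)
TABLE1 = [
    (100, "P", 600), (200, "Q", 600), (300, "R", 700), (400, "S", 200),
    (500, "T", 500), (600, "X", 200), (700, "Y", 300), (800, "Z", 500),
    (900, "A", 800), (1000, "B", 300), (2000, "C", 300), (3000, "D", 300),
    (4000, "E", 200),
]

# Where each Table 2 Col1 value lives in the example (AMP numbering is assumed).
AMP_OF_VALUE = {100: 1, 800: 1, 400: 2, 700: 2, 200: 3, 500: 3, 300: 4, 600: 4}


def redistribute(rows, join_col_index):
    """Send each row to the AMP that owns its join value, then sort each spool locally."""
    spool = {amp: [] for amp in sorted(set(AMP_OF_VALUE.values()))}
    for row in rows:
        spool[AMP_OF_VALUE[row[join_col_index]]].append(row)
    for amp_rows in spool.values():
        amp_rows.sort(key=lambda r: r[join_col_index])  # stand-in for a row-hash sort
    return spool


spool = redistribute(TABLE1, 2)  # index 2 is Col3, the join column
for amp, amp_rows in spool.items():
    print("AMP", amp, amp_rows)
```

After this step each Table 1 spool row sits on the same AMP as the Table 2 row it joins to, so the join itself runs locally on every AMP.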

Row Distribution Strategy 3


 Duplicate and sort the smaller table on all AMPs, and build the larger table locally and sort it.
 The optimizer considers this strategy if it finds that redistributing the larger table is more expensive than duplicating the smaller table.

Case 2 Example
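Table 2 is duplicated into spool on every AMP. Table 1 is built into spool locally and sorted on Col3:

AMP 1
    Table 1 (base rows):  100 P 600, 800 Z 500, 1000 B 300
    Table 1 spool:        1000 B 300, 100 P 600, 800 Z 500
    Table 2 (base rows):  100 K, 800 R
    Table 2 spool:        100 K, 200 L, 300 M, 400 N, 500 O, 600 P, 700 Q, 800 R
AMP 2
    Table 1 (base rows):  400 S 200, 700 Y 300, 2000 C 300, 4000 E 200
    Table 1 spool:        400 S 200, 4000 E 200, 700 Y 300, 2000 C 300
    Table 2 (base rows):  400 N, 700 Q
    Table 2 spool:        100 K, 200 L, 300 M, 400 N, 500 O, 600 P, 700 Q, 800 R
AMP 3
    Table 1 (base rows):  200 Q 600, 500 T 500, 3000 D 300
    Table 1 spool:        3000 D 300, 500 T 500, 200 Q 600
    Table 2 (base rows):  200 L, 500 O
    Table 2 spool:        100 K, 200 L, 300 M, 400 N, 500 O, 600 P, 700 Q, 800 R
AMP 4
    Table 1 (base rows):  300 R 700, 600 X 200, 900 A 800
    Table 1 spool:        600 X 200, 300 R 700, 900 A 800
    Table 2 (base rows):  300 M, 600 P
    Table 2 spool:        100 K, 200 L, 300 M, 400 N, 500 O, 600 P, 700 Q, 800 R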
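
The trade-off between redistributing the larger table and duplicating the smaller one can be made concrete with a deliberately crude model that counts only rows shipped between AMPs. This is an assumption-laden sketch: the real optimizer's cost estimate is not described in these slides, and the function names are hypothetical.

```python
def rows_shipped_redistribute(large_rows):
    """Upper bound: every row of the larger table may be sent to another AMP."""
    return large_rows


def rows_shipped_duplicate(small_rows, n_amps):
    """Every row of the smaller table is copied to every AMP."""
    return small_rows * n_amps


def cheaper_strategy(large_rows, small_rows, n_amps):
    """Pick the strategy that ships fewer rows under this simplified model."""
    if rows_shipped_duplicate(small_rows, n_amps) < rows_shipped_redistribute(large_rows):
        return "duplicate smaller table"
    return "redistribute larger table"


# The 13-row and 8-row example tables on 4 AMPs: duplication ships 32 rows, redistribution at most 13.
print(cheaper_strategy(13, 8, 4))

# A much larger table against the same 8-row table: duplication ships far fewer rows.
print(cheaper_strategy(1_000_000, 8, 4))
```

Under this model, duplication pays off only when the smaller table's row count times the number of AMPs is below the larger table's row count, which is consistent with the strategy being reserved for small tables.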

Row Distribution Strategy 4


 Duplicate the smaller table on every AMP.
 The optimizer chooses this strategy when the join condition is not based on equality.
 Product join scenario.

Explain Facility
 Provides an English translation of the steps chosen by the optimizer.
 Very helpful for estimating the performance of complex queries.
 Helps physical designers with index selection by showing the execution strategy chosen by the optimizer.

Explaining the EXPLAIN


 EXPLAIN output is generally clear and easy to understand; however, it contains a few phrases one needs to be familiar with.
     "...with no residual conditions": there are no residual conditions other than the conditions used to locate the row.
     "...eliminating duplicates...": a DISTINCT operation is being done.
     "we do a SMS": set manipulations such as UNION or EXCEPT are being done.
     "we do a BMSMS": NUSI bit mapping is being used.
     "distributed by hash code to all AMPs": rows are being redistributed to their target AMPs by the hash code of the join column value.
     "duplicated on all AMPs": a copy of the (smaller) table is being placed on every AMP.

Statistics
 The optimizer needs demographic information to create the best execution plan for a query.
     Number of rows in the table.
     Row size.
     Number of rows per value.
     Index information and demographics.
 Based on the statistics, the optimizer estimates the cost and creates the best plan.
 Statistics must be collected for the columns and indexes that are accessed frequently.
 If statistics are not provided, the optimizer does dynamic sampling (random AMP).
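
To make "number of rows per value" concrete, the sketch below computes that demographic for Col3 of Table 1 from the earlier example. It only illustrates the kind of information statistics carry; it does not show how Teradata collects or stores statistics, and the variable names are assumptions.

```python
from collections import Counter

# Col3 (FK) values of Table 1 from the example
col3 = [600, 600, 700, 200, 500, 200, 300, 500, 800, 300, 300, 300, 200]

rows_per_value = Counter(col3)          # value -> number of rows carrying it
row_count = len(col3)                   # number of rows in the table
distinct_values = len(rows_per_value)   # number of distinct join values

for value, count in sorted(rows_per_value.items()):
    print(value, count)
print("rows:", row_count, "distinct values:", distinct_values)
```

A skewed distribution, such as the value 300 appearing on four of the thirteen rows here, is the kind of information that lets the optimizer anticipate how many rows will land on one AMP after redistribution.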

Questions ?
