
Teradata V13

Join Strategy
LEVEL LEARNER

About the Author

Created By:

Sujata Datta (173236)

Credential
Information:

5 years of Teradata hands on experience in WellPoint account


NR 011(Basic SQL) and NR 013(Teradata SQL Assistant)

Version and
Date:

TD/PPT/2012-09-10/1.0

Cognizant Certified Official Curriculum

Icons Used

Questions
Tools
Coding Standards
Test Your Understanding
Reference
A Welcome Break
Contacts
Demonstration
Hands on Exercise

Join Strategy - Overview


Introduction:
Teradata supports a number of Join strategies. The join strategy
for a query is chosen by the optimizer at compile time.
The optimizer chooses by evaluating the relative cost of each
possible strategy and choosing the best.
Most Teradata joins operate on two tables at a time. The optimizer
builds a query plan out of successive two table joins until the
result relation has been built.
In this chapter you will learn the various kinds of join strategies.

Join Strategy - Objective


This chapter will give a brief idea of the following:
-- Join types
-- What is meant by a join
-- Different types of join strategies, such as Hash Join, Merge Join,
   Nested Join, Exclusion Join, and Product Join
-- Usage of the join index and hash index
-- How Teradata chooses the best join strategy

Join Strategy Vs. Join Types

Teradata Optimizer has the ability to interpret a user's join
types and then decide on the best path, or join strategy, to
take in order to complete the query.
Teradata allows up to 64 tables to be joined in a single query.
Some of the common join types are:
1. Inner
2. Outer (Left, Right, Full)
3. Self
4. Cross
5. Cartesian
When a user specifies a join type, Teradata will then use one of the
join strategies below to perform the join.
1. Merge Join
2. Nested Join
3. Hash Join
4. Product Join

Teradata Join Concept and key things

Each AMP holds a portion of a table.
Teradata uses the Primary Index to distribute the rows among
the AMPs.
Each AMP keeps its tables separated from other tables, like
someone might keep clothes in separate dresser drawers.
For a JOIN to take place, the two rows being joined must find a
way to get to the same AMP.
If the rows to be joined are not on the same AMP, Teradata will
either redistribute the data or duplicate the data in spool to
make that happen.
Each AMP sorts its tables by Row ID.
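The distribution idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: NUM_AMPS, amp_for and distribute are hypothetical names, and zlib.crc32 is a stand-in for Teradata's real row-hash algorithm, which is not reproduced here. The point it shows is that two tables sharing the same Primary Index value always place those rows on the same AMP.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size, chosen only for the illustration


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a Primary Index value to an AMP number.

    zlib.crc32 is a stand-in for the real row hash (an assumption).
    """
    return zlib.crc32(str(pi_value).encode()) % num_amps


def distribute(rows, pi_column, num_amps=NUM_AMPS):
    """Place each row on the AMP chosen by hashing its Primary Index column."""
    amps = {amp: [] for amp in range(num_amps)}
    for row in rows:
        amps[amp_for(row[pi_column], num_amps)].append(row)
    return amps


# Two tables that both use EMP as their Primary Index.
employee1 = [{"EMP": e, "Name": f"name{e}"} for e in range(1, 9)]
employee2 = [{"EMP": e, "Salary": 1000 * e} for e in range(1, 9)]

amps1 = distribute(employee1, "EMP")
amps2 = distribute(employee2, "EMP")

# Rows with the same EMP value end up on the same AMP in both tables.
for amp in range(NUM_AMPS):
    emps1 = sorted(r["EMP"] for r in amps1[amp])
    emps2 = sorted(r["EMP"] for r in amps2[amp])
    print(amp, emps1, emps2)
```

Because both tables hash the same column, a join on EMP needs no data movement in this model, which is the situation the first merge join strategy relies on.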

Merge Join Strategy - 1


The first merge join strategy uses the Primary Index of both tables in
the join equality.
The key here is that the Primary Index columns of both tables
are used in the WHERE or ON clause of the join.
SELECT E1.EMP, E2.DEPT, E1.Name, E2.Salary
FROM EMPLOYEE1 E1
INNER JOIN EMPLOYEE2 E2
ON E1.EMP=E2.EMP;
EMP is the Primary Index for both tables.
This first merge join strategy is extremely efficient because both
columns in the ON clause are the Primary Indexes of their
respective tables.
When this occurs, no data has to be moved into spool and the
join can be performed AMP-local, that is, entirely within each AMP.

Merge Join Strategy - 1

The inner join above returns only the rows that have a match in
both tables.
Because the matching rows are already on the same AMP, Teradata can
perform this join very quickly.
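What happens on a single AMP during a merge join can be sketched as follows. This is a simplified illustration, not Teradata's implementation: merge_join is a hypothetical helper, and the rows are assumed to be already sorted on the join column (Teradata sorts by row hash; the plain EMP value is used here to keep the sketch readable).

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are already sorted on `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1          # left row has no partner yet, move on
        elif lk > rk:
            j += 1          # right row has no partner, move on
        else:
            # Pair the current left row with every right row in the matching run.
            k = j
            while k < len(right) and right[k][key] == lk:
                result.append({**left[i], **right[k]})
                k += 1
            # Advance only the left side so a duplicate left key re-reads the run.
            i += 1
    return result


employee1 = [{"EMP": 1, "Name": "Ann"}, {"EMP": 2, "Name": "Bob"}, {"EMP": 4, "Name": "Dee"}]
employee2 = [{"EMP": 2, "Salary": 500}, {"EMP": 3, "Salary": 600}, {"EMP": 4, "Salary": 700}]

matches = merge_join(employee1, employee2, "EMP")
print(matches)
```

Both inputs are read once from top to bottom, which is why the strategy is cheap when the rows are already sorted and on the same AMP.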

Merge Join Strategy - 2


This merge join is performed on a Primary Index column (DEPT) of
one table (DEPARTMENT) to a non-primary-index column of
another table (EMPLOYEE).

SELECT EMP, E.Dept, Dept_Name
FROM EMPLOYEE E
INNER JOIN DEPARTMENT D
ON E.DEPT=D.DEPT;

Merge Join Strategy - 2


The department table, which has an equality condition match on its
Primary Index column, would be stationary on the AMP.
The next step would be to move the rows from the Employee table
into spool.
This would be accomplished by hashing (locating) the join columns in
the employee table, and then moving these rows into spool on the
appropriate AMPs where the department table rows reside.
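The redistribution step can be sketched in Python. The names below (amp_for, NUM_AMPS, spool) are hypothetical and zlib.crc32 stands in for the real row hash, so treat this as a model of the idea rather than of Teradata internals. Department rows stay on the AMP chosen by DEPT, while each Employee row is rehashed on DEPT and copied into spool on that same AMP.

```python
import zlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value, num_amps=NUM_AMPS):
    """Stand-in for the row hash: map a column value to an AMP number."""
    return zlib.crc32(str(value).encode()) % num_amps


department = [{"DEPT": d, "Dept_Name": f"dept{d}"} for d in (10, 20, 30)]
employee = [{"EMP": e, "DEPT": 10 * (e % 3 + 1)} for e in range(1, 10)]

# Department is distributed on its Primary Index (DEPT) and does not move.
dept_on_amp = {amp: [] for amp in range(NUM_AMPS)}
for row in department:
    dept_on_amp[amp_for(row["DEPT"])].append(row)

# Employee lives on the AMP chosen by EMP; rehash it on DEPT into spool.
spool = {amp: [] for amp in range(NUM_AMPS)}
moved = 0
for row in employee:
    target = amp_for(row["DEPT"])
    spool[target].append(row)
    if target != amp_for(row["EMP"]):
        moved += 1  # the row had to travel to another AMP

# Every AMP can now join its spool rows with its local department rows.
joined = [
    {**emp, **dept}
    for amp in range(NUM_AMPS)
    for emp in spool[amp]
    for dept in dept_on_amp[amp]
    if emp["DEPT"] == dept["DEPT"]
]
print(len(joined), "rows joined;", moved, "employee rows changed AMP")
```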

Merge Join Strategy - 3


Strategy 3 happens when neither table is being joined on its Primary Index.
In this case Teradata will redistribute both tables into spool and sort them
by hash code.
To redistribute and sort by hash code, Teradata hashes the non-primary-index
join columns and moves the rows to the spool of the target AMPs, where they
are sorted by hash.
Once this is accomplished, the matching rows are together in spool
on all the AMPs.
In the example, the Primary Index of the department table is DEPT and the
Primary Index of the manager table is LOC.

Merge Join Strategy - 3


SELECT LOC, Dept_Name, Budget
FROM DEPARTMENT
INNER JOIN MANAGER
ON MgrEmp=MgrNo;

Rows from both tables will need to be rehashed and redistributed
into spool.
The reason is that neither of the columns in the ON clause is the
Primary Index of its respective table.
Therefore, both tables are redistributed based on the ON clause.

Merge Join Strategy - 3


The next step in this process is to redistribute the rows and
locate them on the matching AMPs.
When this is completed, the rows from both tables will be located
in two different spools.
Lastly, the rows in each spool will be joined together to bring back
the matching rows.
This type of join strategy is expensive.
It consumes considerable resources and time to redistribute, sort, and
join both tables.

Merge Join Strategy - 4


This join strategy is used for a Small Table - Big Table join.
If one of the tables being joined is small, then Teradata may choose a
strategy that will duplicate the smaller table across all the AMPs.
The key point about this strategy is that, regardless of whether the join
column is the table's Primary Index column, Teradata could still choose to
duplicate the table across all the AMPs.

Merge Join Strategy - 4


SELECT EMP, Salary, Dept_name
FROM EMPLOYEE E
INNER JOIN DEPARTMENT D
ON e.dept=d.dept;

In the inner join above, the two tables involved in the join are the
Employee table and the Department table.
The join equality is on the DEPT column, which is the Primary Index
column of the Department table.
The Employee table has the EMP column as its Primary Index.
The Department table is small, which makes it a good candidate for
this type of join strategy.
Teradata will choose to duplicate the entire Department table into
spool on each AMP.
Once this is completed, the next step is for the AMPs to join
the base Employee rows with the Department rows.

Merge Join Strategy - 4


Instead of redistributing the larger Employee table, whose Primary
Index column is not part of the equality (ON) condition, Teradata's
strategy would be to duplicate the smaller table across all the
AMPs (Big Table - Small Table Join).

This merge join strategy will consume minimal
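The trade-off between redistributing the big table and duplicating the small one comes down to simple arithmetic. The sketch below uses a deliberately simplified cost model that counts only rows written to spool; the function names and the model itself are assumptions for illustration, and the real optimizer weighs statistics and many other factors.

```python
def rows_written_to_spool(big_rows, small_rows, num_amps):
    """Rows copied into spool under each plan (simplified, assumed model)."""
    return {
        # Each big-table row is rehashed and written once to one AMP's spool.
        "redistribute_big": big_rows,
        # Every AMP receives a full copy of the small table.
        "duplicate_small": small_rows * num_amps,
    }


def cheaper_plan(big_rows, small_rows, num_amps):
    """Pick the plan that writes fewer rows to spool."""
    costs = rows_written_to_spool(big_rows, small_rows, num_amps)
    return min(costs, key=costs.get)


# 1,000,000 employees, 50 departments, 100 AMPs:
# duplicating writes 50 * 100 = 5,000 rows versus 1,000,000 for redistribution.
print(cheaper_plan(1_000_000, 50, 100))

# When the "small" table is not that small, duplication stops paying off:
# 500 * 100 = 50,000 rows versus only 1,000 for redistribution.
print(cheaper_plan(1_000, 500, 100))
```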

Hash Join Strategy


A Hash Join can only take place if one or both of the tables on each
AMP can fit completely inside the AMP's memory.
SELECT Emp, DeptName, MgrEmp
FROM Employee_Table
INNER JOIN Department_Table
ON Emp = MgrEmp;

Hash Join Strategy Example


In the example, the join condition is based on the Emp and MgrEmp columns;
the column names differ between the two tables.
The smaller of the two tables (Department) will be sorted by
row hash and duplicated on each AMP.
Teradata will then use the join column of the larger table to look for a
match among the duplicated rows of the smaller table.
Because the smaller table is held in AMP memory, the larger table does not
need to be sorted or redistributed, which improves performance.
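The build-and-probe idea behind a hash join can be sketched as follows. This is a generic in-memory hash join in Python, not Teradata's implementation; hash_join and the sample rows are made up for the illustration.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Build an in-memory hash table on the small table, then probe it."""
    table = defaultdict(list)
    for row in small:                       # build phase: small table only
        table[row[small_key]].append(row)
    return [
        {**big_row, **small_row}            # probe phase: one lookup per big row
        for big_row in large
        for small_row in table.get(big_row[large_key], [])
    ]


department = [
    {"DeptName": "Sales", "MgrEmp": 2},
    {"DeptName": "IT", "MgrEmp": 5},
]
employee = [{"Emp": n} for n in range(1, 7)]

managers = hash_join(department, employee, "MgrEmp", "Emp")
print(managers)
```

The larger table is scanned once and never sorted, which is the main saving compared with a merge join.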

Nested join Strategy


This join is designed to use a unique index (either a Unique
Primary Index or a Unique Secondary Index) from one of the tables
in the join statement in order to retrieve a single row.
It then matches that row to one or more rows in the other table
being used in the join.
In the example below, the nested join has the join equality (ON)
condition based on the DEPT column.
The dept column is the Primary Index column of the department
table. In addition, the dept column is a Secondary Index column
in the employee table.

SELECT Emp, Salary, DeptName
FROM Employee_Table e
INNER JOIN Department_Table d
ON e.dept = d.dept
WHERE d.dept = 10;

Nested join Strategy Example


Since only one row in the department table matches department = 10,
which comes from the WHERE condition in the query, the Teradata
Optimizer will choose a path that moves the department table columns
into spool and duplicates them across all the AMPs.
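A nested join can be pictured as two index lookups. In the sketch below, plain Python dictionaries stand in for the unique index on the department table and the secondary index on the employee table; the names and data are hypothetical and the model ignores AMPs entirely.

```python
# Unique index on Department_Table.dept: at most one row per value.
department_by_dept = {
    10: {"dept": 10, "DeptName": "Sales"},
    20: {"dept": 20, "DeptName": "IT"},
}

# Secondary index on Employee_Table.dept: possibly many rows per value.
employees_by_dept = {
    10: [{"Emp": 1, "Salary": 500}, {"Emp": 3, "Salary": 650}],
    20: [{"Emp": 2, "Salary": 700}],
}


def nested_join(dept_value):
    """Fetch the single department row, then the employee rows that match it."""
    dept_row = department_by_dept.get(dept_value)
    if dept_row is None:
        return []
    return [{**emp, **dept_row} for emp in employees_by_dept.get(dept_value, [])]


rows = nested_join(10)
print(rows)
```

No table is scanned in full here: the single department row is fetched by its unique index and only the employee rows pointing at it are read.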

Exclusion Join Strategy


SELECT EMP, Dept, Name
FROM EMPLOYEE E
WHERE E.dept='10'
AND EMP NOT IN ( SELECT MgrEmp FROM
DEPARTMENT
WHERE MgrEmp IS NOT NULL);
The above join excludes rows during the join, instead of finding
matching rows between the joined tables.
Exclusion joins are used for finding rows that don't have a
matching row in the other table.
Queries with the NOT IN operator are the kind of queries that
typically result in exclusion joins.
In this case, the query will find all the employees who belong to
department 10 and who are NOT managers.

Exclusion Join Strategy


These joins will always involve a Full Table Scan because Teradata
will need to compare every record to eliminate the rows that need
to be excluded.
This type of join can be resource intensive if the two tables in the
comparison are large.
NULLs are treated as unknown, so if the NOT IN subquery returns a
NULL, the comparison is unknown for every non-matching row and the
query returns no rows.
There are two ways to correct this:
Define the NOT IN columns as NOT NULL in the CREATE TABLE.
Add a Column IS NOT NULL condition to the subquery, as seen in
the above example.
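The NULL behaviour of NOT IN is standard SQL three-valued logic, so it can be demonstrated outside Teradata. The sketch below uses Python's built-in sqlite3 module as a stand-in database; the table contents are made up, and only the NOT IN semantics (not the exclusion join strategy itself) are being shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp INTEGER, dept INTEGER, name TEXT);
CREATE TABLE department (dept INTEGER, mgremp INTEGER);
INSERT INTO employee VALUES (1, 10, 'Ann'), (2, 10, 'Bob'), (3, 10, 'Cy');
-- Department 20 has no manager yet, so its mgremp is NULL.
INSERT INTO department VALUES (10, 1), (20, NULL);
""")

# Without the NULL guard, "emp NOT IN (1, NULL)" is never true.
unguarded = conn.execute(
    "SELECT emp FROM employee WHERE dept = 10 "
    "AND emp NOT IN (SELECT mgremp FROM department) ORDER BY emp"
).fetchall()

# With the guard, the subquery returns only real manager numbers.
guarded = conn.execute(
    "SELECT emp FROM employee WHERE dept = 10 "
    "AND emp NOT IN (SELECT mgremp FROM department WHERE mgremp IS NOT NULL) "
    "ORDER BY emp"
).fetchall()

print("unguarded:", unguarded)
print("guarded:  ", guarded)
conn.close()
```

The unguarded query returns an empty result even though two employees are clearly not managers, which is why the IS NOT NULL condition (or a NOT NULL column definition) matters.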

Product Join
Product Joins compare every row of one table to every row of
another table.
The result of this join is the product of the number of rows in table
one multiplied by the number of rows in table two.
About 99% of the time, product joins are major mistakes, because
all rows in both tables will be compared.
SELECT EMP, E.Dept, Name , DeptName
FROM EMPLOYEE E, DEPARTMENT D
WHERE EMP LIKE '_b%';
To avoid a product join, the join should be based on an EQUALITY
condition.
The WHERE clause above reads WHERE EMP LIKE '_b%', but it contains no
common domain condition between the two tables (e.g., e.dept =
d.dept).
Another cause of a product join is when aliases are not used after
being established.
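The row-count effect of a missing join condition is easy to reproduce. The sketch below uses itertools.product on small made-up tables; the filter on EMP mimics LIKE '_b%' (second character is "b"), and the variable names are hypothetical.

```python
from itertools import product

employee = [
    {"EMP": "ab1", "Dept": 10},
    {"EMP": "cb2", "Dept": 20},
    {"EMP": "xy3", "Dept": 10},
    {"EMP": "zz4", "Dept": 30},
]
department = [{"Dept": d, "DeptName": f"dept{d}"} for d in (10, 20, 30)]

# Every employee row is paired with every department row.
all_pairs = list(product(employee, department))

# Filter only on EMP (like WHERE EMP LIKE '_b%'): still a product per kept row.
filter_only = [(e, d) for e, d in all_pairs if e["EMP"][1] == "b"]

# Adding the common domain condition e.dept = d.dept removes the blow-up.
with_equality = [(e, d) for e, d in filter_only if e["Dept"] == d["Dept"]]

print(len(all_pairs), len(filter_only), len(with_equality))
```

With 4 employees and 3 departments the unrestricted pairing has 4 x 3 = 12 rows; the EMP filter alone leaves 2 x 3 = 6 rows, and only the equality condition brings it down to one row per matching employee.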

Cartesian Product Join


It is a kind of product join that has no WHERE clause at all.
This kind of join can use up all of the assigned spool space and is a
huge performance bottleneck.
SELECT EMP, E.Dept, Name , DeptName
FROM EMPLOYEE E, DEPARTMENT D;
To avoid a Cartesian product join, the join should be based on an
EQUALITY condition, or at least a WHERE condition should be
present.
The statement above has no WHERE clause, so there is no
common domain condition between the two tables (e.g., e.dept =
d.dept).
Another cause of a product join is when aliases are not used after
being established.
Finally, check your join syntax to ensure the WHERE clause is not
missing.

Join Index Concept


A join index joins two tables and holds the result set in
Teradata permanent space. At the time of the join, the Parsing
Engine (PE) decides whether it is faster to build the result set from
the base tables or from the join index.
It can be defined on one or several tables.
The join index result set cannot be accessed by a query
directly; only the PE can access it.
The main fundamentals of a join index are:
1. A join index is not a pointer to data; it actually stores data in
PERM space.
2. Users never query it directly; it is the PE that decides which
result set is more suitable for efficient processing of
data.

Join Index Restrictions


No more than 64 columns can be defined per joined table in a
join index.
No more than 128 columns can be defined for a compressed
join index definition.
There is no limit on how many columns can be defined in an
uncompressed join index, other than system restrictions on the
amount of SQL text required to define them.
Although the Optimizer substitutes only one multitable join
index per referenced table in a query, it also considers
additional single-table join indexes for inclusion in the join plan
after the optimal multitable join index has been substituted and
evaluated for the plan.
Columns with BLOB and CLOB data types cannot be included
in a join index definition.

Join Index Types and Examples

Multi-table join index:
CREATE JOIN INDEX EMP_DEPT AS
SELECT EMP_NO, EMP_NAME, EMP_DEPT,
EMP_SAL
FROM EMPLOYEE EMP
INNER JOIN DEPARTMENT DEPT
ON EMP.DEPT_NO = DEPT.DEPT_NO
PRIMARY INDEX (EMP_NO);
With the above join index created, the result set of the join is held in
perm space. When the tables are actually joined during
execution, the PE can refer to this result set for fast processing of
data.

Join Index Types and Examples Contd.

Single-table join index:
CREATE JOIN INDEX EMP_SNAP AS
SELECT EMP_NO, EMP_NAME, EMP_DEPT,
EMP_SAL
FROM EMPLOYEE EMP
PRIMARY INDEX (EMP_DEPT);
This join index duplicates a single table, but with a different primary
index, as specified in the index definition above.
When a user queries the base table, the PE decides, based on the query
issued, whether it is faster to obtain the data from the join
index result set or from the base table directly.

Join Index Types and Examples Contd.

Aggregate join index:
CREATE JOIN INDEX AGG_Table AS
SELECT EMP_NO, SUM(EMP_SAL)
FROM EMPLOYEE EMP
GROUP BY 1;
This join index allows tracking of the average, sum, or count of a
column in a table. It helps the PE get aggregated data
faster during query processing.

Hash Index - Purpose
A hash index is used for the same purpose as a single-table join index,
i.e. to generate a result set and store it in PERM space
for the PE's use.
A hash index creates a partial or full replication of a base table with a
primary index on a foreign key column, to facilitate joins
of very large tables by hashing them to the same AMP.
Unlike a single-table join index, a hash index cannot contain
aggregates.
A hash index can be defined on one table only.
The result set generated by a hash index cannot be accessed directly by
a query.
The rows of a hash index are sometimes a little shorter than
single-table join index rows. Hence have a small storage

Hash Index Limitation


Excluding the Primary Index, only 32 indexes can be defined on a
table. These 32 indexes can be any combination of hash,
secondary, and join indexes.
Columns having BLOB and CLOB data types are not allowed in a
hash index definition.
A hash index cannot have a partitioned primary index.

Hash Index - Examples
This index is built for the table 'emp1' which is defined as follows:
CREATE SET TABLE emp1
(employee_number INTEGER
, manager_employee_number INTEGER
, department_number INTEGER
, job_code INTEGER
, last_name CHAR(20) NOT NULL
, first_name VARCHAR(30) NOT NULL
, salary_amount DECIMAL(10,2) NOT NULL)
UNIQUE PRIMARY INDEX ( employee_number );
Example 1:
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1
BY (employee_number)
ORDER BY HASH (employee_number);

Hash Index - Examples Contd.
Each hash index row contains the employee number and the
department number.
Specifying the employee number is unnecessary, since it is the
primary index of the base table and will therefore be
automatically included.
The BY clause indicates that the rows of this index will be
distributed by the employee_number hash value.
The ORDER BY clause indicates that the index rows will be
ordered on each AMP in sequence by the employee_number
hash value.
Example 2:
The same hash index definition could have been abbreviated as
follows:
CREATE HASH INDEX hash_1
(employee_number, department_number) ON emp1;

Questions

Welcome Break

Test Your Understanding


Which join strategy works for an equality join condition?
Which join strategy works for an inequality join condition?
Which join strategy works for a filter condition?
What is the maximum number of columns per table that can be defined in a
join index?
What should be done to avoid a product join?

Source
Teradata Forum
Release 13.10

Disclaimer: Parts of the content of this course are based on the materials available from the
Web sites and books listed above. The materials that can be accessed from linked sites are
not maintained by Cognizant Academy and we are not responsible for the contents thereof.
All trademarks, service marks, and trade names in this course are the marks of the
respective owner(s).

Teradata 13.0
You have successfully
completed Join Strategy
