
Existence check

Sometimes we need to check whether a specific datum is in the table in order to decide what to do.
(Datum - singular, data - plural.)

e.g. If it exists, do “A”;
else, do “B”.
Existence check - BAD (1)

SELECT DISTINCT 1
INTO :HV
FROM TABLE
WHERE ….

1. If there is no index to satisfy the WHERE clause, we get a tablespace scan and a sort.
2. Even if we have an index, there may be duplicates, so the sort is still invoked.
3. Even if we have a unique index, it may be changed in the future.
Existence check - BAD (2)
SELECT COUNT(*)
INTO :HV
FROM TABLE
WHERE ….

1. If there is no index to satisfy the WHERE clause, we get a tablespace scan.
2. Even if we have an index, we count all occurrences.
(That's OK if we were actually asked to count all rows.)
Existence check – GOOD (3)

1. DECLARE CURSOR (with OPTIMIZE FOR 1 ROW and WITH UR).
2. OPEN CURSOR.
3. FETCH.
4. CLOSE CURSOR.

1. Requires coding 4 SQL statements.
2. Looks cumbersome.
3. Uses more resources than the next approach.
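The four steps can be sketched with Python's built-in sqlite3 module standing in for DB2. This is an assumption: SQLite has no OPTIMIZE FOR 1 ROW or WITH UR, so only the control flow carries over, not the DB2 performance behavior. The table `orders` and column `customer_id` are hypothetical.

```python
import sqlite3

def exists_via_cursor(conn, customer_id):
    """Existence check: open a cursor, fetch at most one row, close it."""
    # Hypothetical table/column; the DB2 version would also carry
    # OPTIMIZE FOR 1 ROW and WITH UR, which SQLite does not support.
    cur = conn.cursor()                                    # roughly: DECLARE
    cur.execute("SELECT 1 FROM orders WHERE customer_id = ?",
                (customer_id,))                            # OPEN
    row = cur.fetchone()                                   # FETCH (one row only)
    cur.close()                                            # CLOSE
    return row is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20)])

print(exists_via_cursor(conn, 10))  # True
print(exists_via_cursor(conn, 99))  # False
```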
Existence check – GOOD (4)

SELECT 1
INTO :HV
FROM TABLE
WHERE ….

IF SQLCODE = +100 THEN “NOT_EXISTS”
IF SQLCODE = 0 OR SQLCODE = -811 THEN “EXISTS”

1. Needs some documentation in the program.
2. Don’t use any data returned by it.
Internal secret
• A singleton SELECT is done internally as a cursor!
• It does not use cross-memory calls as a regular cursor does.
• Its internal code is shorter than that of a regular cursor.
• DB2 internally builds a read cursor for the singleton SELECT and issues 1 or 2 FETCH commands:
– If the 1st FETCH finds nothing, we get SQLCODE = +100.
– If the 2nd FETCH finds something, we get SQLCODE = -811.
– Otherwise we get the data and SQLCODE = 0.
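The one-or-two-FETCH logic above, and the SQLCODE test from the previous slide, can be modeled in Python. This is a simplified model using sqlite3 as a stand-in: SQLite returns no SQLCODEs, so the +100 / -811 / 0 values are produced by our own hypothetical `singleton_select` helper. The table `orders` is also hypothetical.

```python
import sqlite3

def singleton_select(conn, sql, params=()):
    """Model a singleton SELECT ... INTO as an internal cursor with at most
    two fetches. Returns (sqlcode, row); row is None unless sqlcode == 0."""
    cur = conn.execute(sql, params)
    first = cur.fetchone()              # 1st FETCH
    if first is None:
        cur.close()
        return 100, None                # +100: no row found
    second = cur.fetchone()             # 2nd FETCH
    cur.close()
    if second is not None:
        return -811, None               # -811: more than one row qualifies
    return 0, first                     # exactly one row: data and SQLCODE 0

def exists(conn, customer_id):
    """Existence check: 0 or -811 means EXISTS, +100 means NOT_EXISTS."""
    sqlcode, _ = singleton_select(
        conn, "SELECT 1 FROM orders WHERE customer_id = ?", (customer_id,))
    return sqlcode in (0, -811)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 10), (3, 20)])

for cid in (10, 20, 99):
    code, _ = singleton_select(
        conn, "SELECT 1 FROM orders WHERE customer_id = ?", (cid,))
    print(cid, code, exists(conn, cid))
# 10 -811 True
# 20 0 True
# 99 100 False
```

On -811 the sketch deliberately returns no row, in line with "Don't use any data returned by it."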
Sub Select - BAD

SELECT * FROM T1
WHERE T1.CODE IN
(select T2.code
from T2 where T2.key = ‘X’)

• Will cause a tablespace scan on T1.
• DB2 may transform this type of sub-select into a join (if possible).
Do it as a join - GOOD

SELECT T1.*
FROM T1 , T2
WHERE T2.KEY = ‘X’
AND T1.CODE=T2.CODE
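A quick check of the rewrite, using sqlite3 as a stand-in for DB2 (the ID column and sample rows are hypothetical). The join returns the same rows as the IN sub-select only while T2 has at most one row per CODE for KEY = 'X'. With a duplicate, the join repeats the matching T1 row and the IN form does not. The script shows both cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER NOT NULL, CODE TEXT NOT NULL);
    CREATE TABLE T2 ("KEY" TEXT NOT NULL, CODE TEXT NOT NULL);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES ('X', 'a'), ('X', 'b'), ('Y', 'c');
""")

SUBSELECT = """SELECT * FROM T1
               WHERE T1.CODE IN (SELECT T2.CODE FROM T2 WHERE T2."KEY" = 'X')
               ORDER BY T1.ID"""
JOIN = """SELECT T1.* FROM T1, T2
          WHERE T2."KEY" = 'X' AND T1.CODE = T2.CODE
          ORDER BY T1.ID"""

before_sub = conn.execute(SUBSELECT).fetchall()
before_join = conn.execute(JOIN).fetchall()
print(before_sub, before_join)   # identical: [(1, 'a'), (2, 'b')]

# A second ('X', 'a') row: the join now repeats T1 row 1, the IN form does not.
conn.execute("INSERT INTO T2 VALUES ('X', 'a')")
after_sub = conn.execute(SUBSELECT).fetchall()
after_join = conn.execute(JOIN).fetchall()
print(after_sub)    # [(1, 'a'), (2, 'b')]
print(after_join)   # [(1, 'a'), (1, 'a'), (2, 'b')]
```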
Sub Select - GOOD

SELECT * FROM T1
WHERE T1.CODE NOT IN
(select T2.code
from T2
where T2.key = ‘X’)

Can’t be done as a Join.


Sub Select - GOOD

SELECT * FROM T1
WHERE NOT EXISTS
(select 1
from T2
where T1.code=T2.code)

Can’t be done as a Join.
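Why these can't be plain joins: a sketch in sqlite3 (hypothetical data) compares both anti-join forms with a naive "inverted" join. The naive join keeps a T1 row if its CODE differs from any T2 row, not from all of them, so it answers a different question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (ID INTEGER NOT NULL, CODE TEXT NOT NULL);
    CREATE TABLE T2 ("KEY" TEXT NOT NULL, CODE TEXT NOT NULL);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO T2 VALUES ('X', 'a'), ('Y', 'b');
""")

# NOT IN form. CODE is NOT NULL here; NOT IN behaves differently
# if the sub-select can return NULL.
not_in = conn.execute("""SELECT * FROM T1
    WHERE T1.CODE NOT IN (SELECT T2.CODE FROM T2 WHERE T2."KEY" = 'X')
    ORDER BY T1.ID""").fetchall()

# NOT EXISTS form: T1 rows with no matching CODE anywhere in T2.
not_exists = conn.execute("""SELECT * FROM T1
    WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T1.CODE = T2.CODE)
    ORDER BY T1.ID""").fetchall()

# Naive "inverted join": matches T1 rows that differ from ANY T2 row.
naive_join = conn.execute("""SELECT DISTINCT T1.* FROM T1, T2
    WHERE T1.CODE <> T2.CODE
    ORDER BY T1.ID""").fetchall()

print(not_in)      # [(2, 'b'), (3, 'c')]
print(not_exists)  # [(3, 'c')]
print(naive_join)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```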
Sub Select - BAD or GOOD?

SELECT A1, A2, A3
FROM T1
WHERE A1 = ?
AND A2 =
(select max(A2)
from T1
where A1 = ?)
Use a cursor - BAD or GOOD?

DECLARE CRS1 CURSOR FOR
SELECT A1, A2, A3
FROM T1
WHERE A1 = ?
ORDER BY A2 DESC
OPTIMIZE FOR 1 ROW;

OPEN CRS1;
FETCH CRS1 INTO …. ;
CLOSE CRS1;
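Both forms can be compared on a toy table, with sqlite3 standing in for DB2. It shows result equivalence only, not the DB2 timings below. The index and the sample rows are hypothetical. The sub-query returns the same row as the cursor only if its MAX(A2) is restricted to the same A1 value. The last call shows what happens when MAX(A2) is taken over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (A1 INTEGER, A2 INTEGER, A3 TEXT)")
conn.execute("CREATE INDEX IX_T1 ON T1 (A1, A2)")  # hypothetical "proper index"
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(1, 5, "old"), (1, 9, "newest"), (1, 7, "mid"), (2, 50, "other")])

def latest_by_subquery(a1, restrict_inner=True):
    """Sub-query form; with restrict_inner=False, MAX(A2) spans the whole table."""
    if restrict_inner:
        inner, params = "SELECT MAX(A2) FROM T1 WHERE A1 = ?", (a1, a1)
    else:
        inner, params = "SELECT MAX(A2) FROM T1", (a1,)
    return conn.execute(
        f"SELECT A1, A2, A3 FROM T1 WHERE A1 = ? AND A2 = ({inner})", params
    ).fetchall()

def latest_by_cursor(a1):
    """Cursor form: ORDER BY A2 DESC, then fetch only the first row."""
    cur = conn.execute(
        "SELECT A1, A2, A3 FROM T1 WHERE A1 = ? ORDER BY A2 DESC", (a1,))
    row = cur.fetchone()
    cur.close()
    return row

print(latest_by_subquery(1))                        # [(1, 9, 'newest')]
print(latest_by_cursor(1))                          # (1, 9, 'newest')
print(latest_by_subquery(1, restrict_inner=False))  # [] (global MAX(A2) is 50)
```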
Statistics

             Time      CPU       SQL   Sorts   Locks   Rows
Cursor       0.00392   0.00341   4     0       7       4
Sub-Select   0.00625   0.00517   4     1       9       24

Assuming a proper index in both cases!!!
Sub-Query vs. Cursor - Conclusion

• Assuming a proper index:
– If the command is used infrequently, we can use the sub-query; otherwise, use the cursor.
• If no proper index exists:
– The cursor will invoke a sort on all rows that match the search criteria.
– The sub-query will scan all rows for the max/min value but will not sort.
– Use the sub-query.
Conclusion

Proper indexes can help!


Real life example (1)

SELECT *
FROM MNTB.TVTNSDRA A

WHERE A.LOT_NUMBER IN
(SELECT B.LOT_NUMBER
FROM MNTB.TVTNITUR B
WHERE UNIT = '638' AND
B.LOT_NUMBER = A.LOT_NUMBER);
Canceled after 23 minutes elapsed

Join column not 1st in index


Real life example (2)

SELECT A.*
FROM MNTB.TVTNSDRA A,
MNTB.TVTNITUR B
WHERE UNIT = '638' AND
B.LOT_NUMBER = A.LOT_NUMBER
WITH UR ;

Canceled after 14 minutes elapsed

Join column not 1st in index


Real life example (3)

SELECT *
FROM MNTB.TVTNSDRA A
WHERE A.LOT_NUMBER IN
(SELECT DISTINCT B.LOT_NUMBER
FROM MNTB.TVTNITUR B
WHERE UNIT = '638')
WITH UR;

Finished after 14 seconds elapsed

Join column not 1st in index


Why?
• The 1st example is a correlated sub-query: the inner query is executed for every row of the outer query.
• The 2nd example is a join that has no suitable index.
• The 3rd example is a non-correlated sub-query: the inner query is executed only once, its result table is kept sorted in memory, and the outer query checks against it.
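The difference between the 1st and 3rd examples can be modeled in plain Python. This is a deliberately simplified model of evaluation order, not DB2's actual access paths, and the LOT_NUMBER/UNIT values are hypothetical. The correlated form re-runs the inner query for each outer row. The non-correlated form runs it once, keeps the result sorted, and probes it with a binary search.

```python
from bisect import bisect_left

# Hypothetical data: outer LOT_NUMBERs (TVTNSDRA) and inner (UNIT, LOT_NUMBER) rows (TVTNITUR).
OUTER = [101, 102, 103, 104, 105]
INNER = [("638", 102), ("638", 104), ("638", 104), ("700", 101)]

def correlated(outer, inner, unit):
    """Inner query re-executed once per outer row (scans the inner rows each time)."""
    executions, result = 0, []
    for lot in outer:
        executions += 1
        if any(u == unit and l == lot for u, l in inner):
            result.append(lot)
    return result, executions

def non_correlated(outer, inner, unit):
    """Inner query executed once; its sorted result is probed for every outer row."""
    executions = 1
    lots = sorted({l for u, l in inner if u == unit})  # like SELECT DISTINCT ... WHERE UNIT = '638'
    result = []
    for lot in outer:
        i = bisect_left(lots, lot)                     # binary search in the sorted result
        if i < len(lots) and lots[i] == lot:
            result.append(lot)
    return result, executions

print(correlated(OUTER, INNER, "638"))      # ([102, 104], 5)
print(non_correlated(OUTER, INNER, "638"))  # ([102, 104], 1)
```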
Need a date?

Select distinct current date
from table1;

select current date
from sysibm.sysdummy1;

EXEC SQL
SET :HV = CURRENT DATE ;
Sub Select – IN vs. EXISTS (3)

SELECT A, B, C
FROM TAB1 OUTER
WHERE EXISTS
(SELECT 1
FROM TAB2 INNER
WHERE ……);
Sub Select – IN vs. EXISTS (4)

• If the “inner” table is big, or if there is a usable index on it, then EXISTS will perform better.

• If the “inner” table is small, or there is no usable index on it, then IN will perform better.

• If only a few rows qualify, the query will be converted to an IN (list), which allows a matching index scan.
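To make "converted to an IN (list)" concrete, here is an application-side analogue in sqlite3. It is only an illustration of the resulting query shape; per the slide, DB2 does the conversion itself. The FLAG column, the index, and the data are hypothetical. The inner query runs once, and its few qualifying values become an IN list that an index on TAB1.A can match directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TAB1 (A INTEGER, B TEXT, C TEXT);
    CREATE INDEX IX_TAB1_A ON TAB1 (A);
    CREATE TABLE TAB2 (A INTEGER, FLAG TEXT);
    INSERT INTO TAB1 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2'), (3, 'b3', 'c3');
    INSERT INTO TAB2 VALUES (1, 'Y'), (3, 'Y'), (2, 'N');
""")

# Step 1: run the inner query once; only a few rows qualify.
keys = [a for (a,) in conn.execute("SELECT A FROM TAB2 WHERE FLAG = 'Y'")]

# Step 2: one placeholder per value gives an IN (list) on the indexed column.
placeholders = ", ".join("?" for _ in keys)
rows = conn.execute(
    f"SELECT A, B, C FROM TAB1 WHERE A IN ({placeholders}) ORDER BY A", keys
).fetchall() if keys else []

# The EXISTS form returns the same rows.
exists_rows = conn.execute("""SELECT A, B, C FROM TAB1 T
    WHERE EXISTS (SELECT 1 FROM TAB2 I WHERE I.A = T.A AND I.FLAG = 'Y')
    ORDER BY A""").fetchall()

print(rows)  # [(1, 'b1', 'c1'), (3, 'b3', 'c3')]
```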
SELECT *
• Don’t use “SELECT *” unless you really need all columns.

• Each column has to be moved from the DB2 page to the DM, then to the RDS, and then to the application program.

• This move is done field by field.

ORDER BY

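• Include only the columns needed for the sort. When the WHERE clause fixes a column to a single value (here A1 = :hv1), that column adds nothing to the ORDER BY and can be dropped.

Select A1, B1, C1
From table
Where A1 = :hv1
Order by A1, A2, A3

Select A1, B1, C1
From table
Where A1 = :hv1
Order by A2, A3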
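A small sqlite3 check that dropping A1 from the ORDER BY does not change the result order, since every returned row has the same A1. The table name T and its rows are hypothetical, because `table` itself is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (A1 INTEGER, A2 INTEGER, A3 INTEGER, B1 TEXT, C1 TEXT)")
conn.executemany("INSERT INTO T VALUES (?, ?, ?, ?, ?)",
                 [(1, 2, 1, "x", "p"), (1, 1, 2, "y", "q"),
                  (1, 1, 1, "z", "r"), (2, 0, 0, "w", "s")])

hv1 = 1
with_a1 = conn.execute(
    "SELECT A1, B1, C1 FROM T WHERE A1 = ? ORDER BY A1, A2, A3", (hv1,)).fetchall()
without_a1 = conn.execute(
    "SELECT A1, B1, C1 FROM T WHERE A1 = ? ORDER BY A2, A3", (hv1,)).fetchall()

print(with_a1 == without_a1)  # True
print(without_a1)             # [(1, 'z', 'r'), (1, 'y', 'q'), (1, 'x', 'p')]
```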
Cursor within a cursor

• A cursor within a cursor (in program code) means many unnecessary OPEN and CLOSE operations on the inner cursor.
• Code it as a join / sub-select / in-list instead.
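The contrast can be sketched in sqlite3 with hypothetical DEPT/EMP tables. The nested version opens and closes an inner cursor once per outer row. The join produces the same rows from a single statement. The open counter is our own instrumentation, not a database statistic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO INTEGER, NAME TEXT);
    CREATE TABLE EMP (EMPNO INTEGER, DEPTNO INTEGER);
    INSERT INTO DEPT VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO EMP VALUES (10, 1), (11, 1), (12, 3);
""")

def nested_cursors():
    """Outer cursor over DEPT; an inner cursor is opened and closed per outer row."""
    opens, result = 0, []
    outer = conn.execute("SELECT DEPTNO, NAME FROM DEPT ORDER BY DEPTNO")
    for deptno, name in outer.fetchall():
        inner = conn.execute(
            "SELECT EMPNO FROM EMP WHERE DEPTNO = ? ORDER BY EMPNO", (deptno,))
        opens += 1
        result.extend((name, empno) for (empno,) in inner.fetchall())
        inner.close()
    return result, opens

def single_join():
    """Same rows from one statement: one open, one pass."""
    rows = conn.execute("""SELECT D.NAME, E.EMPNO FROM DEPT D, EMP E
                           WHERE D.DEPTNO = E.DEPTNO
                           ORDER BY D.DEPTNO, E.EMPNO""").fetchall()
    return rows, 1

print(nested_cursors())  # ([('A', 10), ('A', 11), ('C', 12)], 3)
print(single_join())     # ([('A', 10), ('A', 11), ('C', 12)], 1)
```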
Divide and conquer
• Teachers table
• Courses table
• Each teacher can teach any
number of courses.
• We look for teachers who can
teach all courses.
DIVIDE (1)

CREATE TABLE DIV1
(KD1 INT NOT NULL,
DD1 CHAR(5) NOT NULL);

CREATE TABLE DIV2
(KD2 INT NOT NULL,
KD1 INT NOT NULL);

Bring every KD2 from DIV2 that has a row for each KD1 in DIV1.
(DIV1 plays the role of the courses table; DIV2 holds teacher/course pairs.)
DIVIDE (2)
DIV1             DIV2
KD1  DD1         KD2  KD1
1    AAA         100  1
2    BBB         100  2
3    CCC         100  3
                 101  1
                 101  5
                 102  1
                 102  2
                 102  3
                 102  4
                 104  1

Result: 100


DIVIDE (3)
SELECT A.KD2
FROM (SELECT DISTINCT DIV2.KD2 AS KD2,
DIV2.KD1 AS KD1
FROM DIV2
GROUP BY DIV2.KD2, DIV2.KD1) AS A
GROUP BY A.KD2
HAVING COUNT(*) = (SELECT COUNT(*) FROM DIV1)
AND NOT EXISTS (SELECT DIV2.KD1
FROM DIV2
WHERE A.KD2=DIV2.KD2
AND DIV2.KD1 NOT IN
(SELECT DIV1.KD1 FROM DIV1));
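The DIVIDE query can be run unchanged on the sample data in sqlite3, with a plain-Python set comparison as a cross-check. This assumes KD1 is unique in DIV1. Under that assumption the two HAVING conditions together mean "the set of KD1 values for this KD2 equals DIV1 exactly". That is why 102, which also has KD1 = 4, is excluded along with the incomplete 101 and 104.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIV1 (KD1 INT NOT NULL, DD1 CHAR(5) NOT NULL);
    CREATE TABLE DIV2 (KD2 INT NOT NULL, KD1 INT NOT NULL);
    INSERT INTO DIV1 VALUES (1, 'AAA'), (2, 'BBB'), (3, 'CCC');
    INSERT INTO DIV2 VALUES (100, 1), (100, 2), (100, 3), (101, 1), (101, 5),
                            (102, 1), (102, 2), (102, 3), (102, 4), (104, 1);
""")

DIVIDE = """
SELECT A.KD2
FROM (SELECT DISTINCT DIV2.KD2 AS KD2, DIV2.KD1 AS KD1
      FROM DIV2
      GROUP BY DIV2.KD2, DIV2.KD1) AS A
GROUP BY A.KD2
HAVING COUNT(*) = (SELECT COUNT(*) FROM DIV1)
   AND NOT EXISTS (SELECT DIV2.KD1
                   FROM DIV2
                   WHERE A.KD2 = DIV2.KD2
                     AND DIV2.KD1 NOT IN (SELECT DIV1.KD1 FROM DIV1))
"""
sql_result = [kd2 for (kd2,) in conn.execute(DIVIDE)]

# Cross-check: KD2 values whose KD1 set equals DIV1's KD1 set exactly.
required = {kd1 for (kd1,) in conn.execute("SELECT KD1 FROM DIV1")}
pairs = defaultdict(set)
for kd2, kd1 in conn.execute("SELECT KD2, KD1 FROM DIV2"):
    pairs[kd2].add(kd1)
set_result = sorted(kd2 for kd2, kd1s in pairs.items() if kd1s == required)

print(sql_result, set_result)  # [100] [100]
```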
Find Duplicates

SELECT A, B, C, COUNT(*) AS "NUM#"
FROM T1
GROUP BY A, B, C
HAVING COUNT(*) > 1
[ORDER BY 4 DESC]
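Running the duplicate finder (with the optional ORDER BY) on a hypothetical T1 in sqlite3. The alias is a double-quoted identifier; single quotes would make it a string literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (A TEXT, B TEXT, C TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [("x", "y", "z"), ("x", "y", "z"), ("x", "y", "z"),
                  ("p", "q", "r"), ("p", "q", "r"), ("u", "v", "w")])

dups = conn.execute("""
    SELECT A, B, C, COUNT(*) AS "NUM#"
    FROM T1
    GROUP BY A, B, C
    HAVING COUNT(*) > 1
    ORDER BY 4 DESC
""").fetchall()

print(dups)  # [('x', 'y', 'z', 3), ('p', 'q', 'r', 2)]
```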
GROUP BY ON FUNCTIONS (1)

SELECT DEPT, GROSS_SALARY
FROM (SELECT DEPT, SALARY+BONUS
AS GROSS_SALARY
FROM EMP
WHERE RANK >= 30) AS A
GROUP BY DEPT, GROSS_SALARY
GROUP BY ON FUNCTIONS (2)

SELECT SUM(SALARY), MONTH_SAL
FROM (SELECT SALARY
,MONTH(SALARY_DATE)
AS MONTH_SAL
FROM EMP ) AS A
GROUP BY MONTH_SAL
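Both patterns share one idea: the nested table expression gives the computed value a name, and the outer GROUP BY groups on that name. This sketch runs them in sqlite3 with hypothetical EMP rows. SQLite has no MONTH() function, so `CAST(strftime('%m', ...) AS INTEGER)` is substituted for it. RANK is quoted only as a precaution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (DEPT TEXT, SALARY INTEGER, BONUS INTEGER,
                      "RANK" INTEGER, SALARY_DATE TEXT);
    INSERT INTO EMP VALUES
        ('D1', 100, 10, 30, '2024-01-31'),
        ('D1',  90, 20, 35, '2024-01-31'),
        ('D2', 200,  0, 10, '2024-02-29'),
        ('D2', 150,  5, 40, '2024-02-29');
""")

# (1) Group on a computed expression named in the nested table expression.
gross = conn.execute("""
    SELECT DEPT, GROSS_SALARY
    FROM (SELECT DEPT, SALARY + BONUS AS GROSS_SALARY
          FROM EMP
          WHERE "RANK" >= 30) AS A
    GROUP BY DEPT, GROSS_SALARY
    ORDER BY DEPT
""").fetchall()

# (2) Group on a function result (strftime stands in for DB2's MONTH()).
monthly = conn.execute("""
    SELECT SUM(SALARY), MONTH_SAL
    FROM (SELECT SALARY,
                 CAST(strftime('%m', SALARY_DATE) AS INTEGER) AS MONTH_SAL
          FROM EMP) AS A
    GROUP BY MONTH_SAL
    ORDER BY MONTH_SAL
""").fetchall()

print(gross)    # [('D1', 110), ('D2', 155)]
print(monthly)  # [(190, 1), (350, 2)]
```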
How much does it cost?

Statement type          Estimated number of machine instructions
Simple FETCH            3,500 to 9,000
Singleton SELECT        12,000 to 40,000
Update/Delete/Insert    40,000 to 90,000
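A back-of-the-envelope use of the table: scale each per-statement range by how often the statement runs. The example compares reading 1,000 rows with FETCHes against issuing 1,000 singleton SELECTs. Cursor OPEN/CLOSE costs are not in the table, so this sketch leaves them out.

```python
# Per-statement machine-instruction ranges (low, high), taken from the table above.
COST = {
    "simple_fetch": (3_500, 9_000),
    "singleton_select": (12_000, 40_000),
    "update_delete_insert": (40_000, 90_000),
}

def estimate(statement, executions):
    """Scale a statement's instruction range by how many times it runs."""
    low, high = COST[statement]
    return low * executions, high * executions

print(estimate("simple_fetch", 1_000))      # (3500000, 9000000)
print(estimate("singleton_select", 1_000))  # (12000000, 40000000)
```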
The END
