DB2

Relational Model
Overview
•Introduced by Dr. E. F. Codd in 1970
•Model based on mathematical foundations
•Earlier models were intrinsically tied to internal representations
•Developers had to be aware of navigational principles
•Need for a model divorced from physical organization
•Required independence between the physical and logical models
The Model
•In October 1985, Dr. E. F. Codd published a two-part paper
•It introduced rules for the relational model
•The rules determine whether a product is fully relational or not
Two papers
•Is your DBMS really relational? - Oct 14, 1985
•Does your DBMS run by the rules? - Oct 21, 1985
The implications were
•satisfying the rules was a technically feasible proposition
•there were practical benefits if a system did satisfy the rules
A DBMS is said to be fully relational if it supports Codd's 12 rules and the 9 structural, 3 integrity and 18 manipulative features.
Basic Concepts
•A relation corresponds to a table (resembles a file)
•Rows of a relation are called tuples (resemble records)
•Columns of a relation are called attributes (resemble fields)
•Entities and relationships are both represented as relations
•A relation consists of tuples of the same kind
•The structure of a relation is defined by its scheme (definition)
•An instance of a scheme is called a relation instance
Properties of a Relation
•A relation is a set of tuples
•No two tuples in a relation are identical
•Tuples in a relation have no order among themselves
•Attribute values are atomic
•Attribute values map onto a domain
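To make these properties concrete, here is a minimal Python sketch that models a relation as a frozenset of tuples. This is an illustrative assumption (it is not how DB2 stores tables), and the supplier rows are hypothetical sample data; the set type gives duplicate elimination and order independence for free.

```python
# A relation modelled as a frozenset of tuples (illustrative assumption,
# not DB2's storage format). Each tuple holds atomic values for
# (S#, SNAME, STATUS, CITY). The rows are hypothetical sample data.
supplier = frozenset([
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S1", "Smith", 20, "London"),  # identical tuple: collapses into one
])

# The same tuples listed in a different order form the same relation.
supplier_reordered = frozenset([
    ("S2", "Jones", 10, "Paris"),
    ("S1", "Smith", 20, "London"),
])

print(len(supplier))                   # 2: no two tuples are identical
print(supplier == supplier_reordered)  # True: tuples have no order
```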
Notion of Keys
Superkey - unique identifier for a tuple
Candidate key - minimal superkey
Primary key - Designated candidate key
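The three key definitions can be checked mechanically against a relation instance. The sketch below is a hypothetical helper, not a DB2 facility, and it uses a made-up SP instance. Note that one instance can only disprove a key: whether a set of attributes really is a key depends on real-world rules, not on the rows currently stored.

```python
from itertools import combinations

# Hypothetical instance of SP(S#, P#, QTY); every QTY happens to be 300.
ATTRS = ("S#", "P#", "QTY")
SP = {
    ("S1", "P1", 300),
    ("S1", "P2", 300),
    ("S2", "P1", 300),
}

def is_superkey(attrs, rows, key):
    """True if no two rows share the same values for every attribute in key."""
    positions = [attrs.index(a) for a in key]
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(projected) == len(set(projected))

def candidate_keys(attrs, rows):
    """Minimal superkeys: superkeys that contain no smaller superkey."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for key in combinations(attrs, size):
            if any(set(k) <= set(key) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(attrs, rows, key):
                keys.append(key)
    return keys

print(is_superkey(ATTRS, SP, ("S#", "P#", "QTY")))  # True: a superkey
print(candidate_keys(ATTRS, SP))                    # [('S#', 'P#')]
```

Choosing ('S#', 'P#') as the primary key is then a designation, as the notes say.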
A Relational model consists of
Structural part
relations, domains, etc.
Integrity part
entity, referential, domain/user defined
Manipulative part
operators & extensions
Structural Features
Relations
Mathematical entity that holds data in the relational model.
A set of n-tuples, perceived as a two-dimensional table
where the intersection of a row and a column is an atomic value.
Base tables
A named and autonomous relation, one that actually holds
data.
Query tables
Relations that result from the execution of queries; they are not
named and do not have a persistent existence.
View tables
A named, virtual and derived relation, that is defined in terms
of other named relations.
Snapshot tables
A named, derived and real relation, represented in terms of
other relations and also by its own materialized data.
Attributes
Correspond to columns of the table, all attribute values are of
the same type and atomic in nature.
Domains
All possible values from which attribute values are chosen.
Primary key
A set of attributes, whose value is a minimal unique identifier
to a row of the table. A designated candidate key.
Foreign keys
A set of attributes whose value matches the primary key value of
another table.
Integrity Features
Entity Integrity
No component of the primary key of a base relation is allowed
to be NULL.
Referential Integrity
The database must not contain any unmatched foreign key
values
Domain-defined Integrity
Attribute values must lie within the domain that the attribute
is mapped onto
Manipulative Features
•Restrict
•Project
•Cross product
•Union, Intersection, Difference
•Join
•Divide
•Extensions
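The manipulative operators listed above can be sketched on in-memory sets. This is an illustrative model, not DB2 internals: a tuple is assumed to be a frozenset of (attribute, value) pairs, a relation is a Python set of such tuples, and the sample relations are hypothetical. Union, Intersection and Difference then map directly onto the set operators.

```python
# Relational-algebra sketch (illustrative assumption, not DB2 internals):
# a tuple is a frozenset of (attribute, value) pairs and a relation is a
# Python set of such tuples, so Union, Intersection and Difference are
# simply the set operators |, & and -.

def rel(*rows):
    """Build a relation from dicts; duplicate rows collapse automatically."""
    return {frozenset(row.items()) for row in rows}

def restrict(r, predicate):
    """Keep only the tuples that satisfy predicate (a horizontal subset)."""
    return {t for t in r if predicate(dict(t))}

def project(r, attrs):
    """Keep only the named attributes (a vertical subset), removing duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Pair tuples that agree on all common attributes. With no common
    attributes the condition is always true, giving the Cartesian product."""
    result = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

def divide(r, s):
    """Tuples over r's remaining attributes that are paired in r with
    every tuple of s (assumes s's attributes are a subset of r's)."""
    s_attrs = {a for t in s for a, _ in t}
    r_attrs = {a for t in r for a, _ in t}
    candidates = project(r, r_attrs - s_attrs)
    return {c for c in candidates if all((c | t) in r for t in s)}

# Hypothetical sample relations.
S = rel({"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"})
P = rel({"P#": "P1"}, {"P#": "P2"})
SP = rel({"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"},
         {"S#": "S2", "P#": "P1"})

london = restrict(S, lambda t: t["CITY"] == "London")
supplied_parts = project(SP, {"P#"})
shipments_with_city = natural_join(S, SP)
suppliers_of_all_parts = divide(SP, P)   # only S1 supplies both P1 and P2
```

Divide answers "for all" questions such as "suppliers who supply every part", which is why it appears alongside the other operators.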
Twelve Rules
Information Rule
All information must be represented in one and only one way,
i.e. as values in column positions within rows of tables.
Guaranteed Access Rule
Every individual scalar value in the database must be logically
addressable by specifying the table name, column name and
primary key value.
Systematic Treatment of NULL
Support for a representation of missing and inapplicable information
that is systematic and distinct from all regular values.
Active Online Catalog
Support for an online, relational catalog accessible to authorised
users by means of the regular query language.
Comprehensive Data Sublanguage Rule
Support for at least one relational language that has a linear syntax,
can be used interactively and within application programs,
and supports data definition, manipulation, security, integrity
and transaction management operations.
View Updating Rule
All views that are theoretically updatable must be updatable by
the system.
High-level Insert, Update, Delete
Support for set-at-a-time Insert, Update and Delete operations.
Physical Data Independence
Changes at the internal level do not affect the conceptual level.
Logical Data Independence
Changes at the conceptual level do not affect the external level.
Integrity Independence
Integrity constraints are specified separately from application
programs and stored in the catalog; it is possible to change
integrity constraints without affecting existing code.
Distribution Independence
Existing applications should operate successfully when a
distributed version of the DBMS is first introduced and when
existing data is redistributed around the system.
Non-subversion Rule
If the system provides a low-level (record-at-a-time) interface,
that interface cannot be used to subvert the system, i.e. to bypass
relational security or integrity constraints.
SQL
•a special-purpose language for accessing and manipulating data
•different from application programming languages like C, COBOL
•uses a combination of relational algebra and relational calculus constructs
History
1970 - Dr.Codd proposed Relational model
1971-79 - SEQUEL implemented in System R
1980 - SEQUEL became SQL
1986 - SQL 86 (ANSI standard)
1989 - Follow on to SQL-86 - SQL-89
1992 - SQL-92 (SQL-2)
1999 - SQL:1999 (SQL-3)
SQL components
Data Definition Language (DDL)
•specifies the database schema
•creates, modifies and deletes database objects (tables, views, indexes)
Data Manipulation Language (DML)
•manipulates data using Insert, Update, Delete operations
•accesses data from the database for queries
Data Control Language (DCL)
•grants and revokes authorisation for database access
•audits database use
•provides transaction management
Data Definition
Base tables
•consist of a row of column headings
•and zero or more rows of data values
•each data row contains one scalar value for each column
•all values in a column are of the same data type
•row ordering is irrelevant
•an order is imposed only when rows are retrieved
•columns are considered to be ordered from left to right
•column ordering has a significance
•rows and columns do have a physical ordering in the stored version
•physical row and column ordering is transparent to the user
•a base table is autonomous; it exists in its own right
Table Creation
Create Table S
( S# Char(5) Not Null,
SNAME Char(20) Not Null
With Default,
STATUS Smallint Not Null
With Default,
CITY Char(15) Not Null
With Default,
Primary Key (S#)
);
Create Table SCOPY Like S ;
Table Modification
Alter Table S
Add DISCOUNT SmallInt ;
Table Deletion
Drop Table S ;
Index
•indexes are created and dropped using SQL
•data manipulation statements do not refer to indexes at all
•the decision to use or not use an index is made by DB2
Index Creation
Create [Unique] Index X On T (P, Q Desc, R) ;
Index Deletion
Drop Index X ;
Notes on Data definition
•data definition statements can be executed at any time
•it is possible to create a few tables and start using them
•new columns can subsequently be added
•it is possible to experiment with the effects of indexes
•one need not get everything right the first time
Data Manipulation
Select Statement reference
Select [All | Distinct] <scalar-expr>
From <table-names>
Where <condition-expr>
Group By <columns>
Having <condition-expr>
Order By <columns>
Sample table definitions
1. Supplier S (S#,SNAME,CITY,STATUS)
2. Part P (P#,PNAME,COLOR,WEIGHT)
3. Supp-Part SP (S#,P#)
Simple retrieval
Get part names of all parts
Select PNAME
From P
Retrieval with duplicate elimination
Get part numbers for all parts supplied
Select Distinct P#
From SP
Retrieval of computed values
Select P#,'Height',HEIGHT*250
From P
Retrieval of full details
Select *
From S
Qualified Retrieval
Get supplier numbers for suppliers in Bombay with status above 20
Select S#
From S
Where CITY = 'Bombay' and
STATUS > 20
Retrieval using ordering
Get supplier numbers and status for suppliers in Bombay in descending order of status
Select S#,STATUS
From S
Where CITY = 'Bombay'
Order By STATUS Desc
Order by 3rd column
Select P#,'Height',HEIGHT*250
From P
Order By 3,P#
Retrieval using Range of values
Get parts whose weight is in the range 16..19, both limit values inclusive
Normal way
Select P#,PNAME
From P
Where WEIGHT >= 16 and
WEIGHT <= 19
Using BETWEEN
Select P#,PNAME
From P
Where WEIGHT Between 16 and 19
Similarly
Where WEIGHT Not Between 16 and 19

Retrieval using IN
Get parts whose weight is any one of the following values: 12, 16, 17
Normal way
Select P#,PNAME
From P
Where WEIGHT = 12 or
WEIGHT = 16 or
WEIGHT = 17
Using IN
Select P#,PNAME
From P
Where WEIGHT IN (12,16,17)
Similarly
Where WEIGHT NOT IN (12,16,17)
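The "normal way" and the BETWEEN/IN forms above should return the same rows. A quick way to see this is to run both in Python's built-in SQLite, used here as a stand-in for DB2 (an assumption: dialects differ). SQLite does not accept '#' in unquoted names, so P# is renamed PNO, and the part rows are hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database standing in for DB2 (assumption: dialects
# differ). SQLite does not accept '#' in unquoted names, so P# is PNO here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT, WEIGHT INTEGER)")
con.executemany("INSERT INTO P VALUES (?, ?, ?)", [
    ("P1", "Nut", 12), ("P2", "Bolt", 17), ("P3", "Screw", 17),
    ("P4", "Screw", 14), ("P5", "Cam", 12), ("P6", "Cog", 19),
])

def part_numbers(condition):
    """Return sorted part numbers whose row satisfies the WHERE condition.
    The condition is pasted into the SQL, so only pass trusted literals."""
    sql = "SELECT PNO FROM P WHERE " + condition + " ORDER BY PNO"
    return [pno for (pno,) in con.execute(sql)]

range_normal = part_numbers("WEIGHT >= 16 AND WEIGHT <= 19")
range_between = part_numbers("WEIGHT BETWEEN 16 AND 19")
list_normal = part_numbers("WEIGHT = 12 OR WEIGHT = 16 OR WEIGHT = 17")
list_in = part_numbers("WEIGHT IN (12, 16, 17)")

print(range_normal, range_between)  # ['P2', 'P3', 'P6'] twice
print(list_normal, list_in)         # ['P1', 'P2', 'P3', 'P5'] twice
```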

Retrieval using NULL
Since NULL represents missing or inapplicable information, normal comparisons won't work
Get supplier numbers for those suppliers for whom STATUS is inapplicable
Select S#
From S
Where STATUS Is Null
Similarly
Where STATUS Is Not Null
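The sketch below shows why "normal comparisons won't work". It uses SQLite as a stand-in for DB2 (an assumption), renames S# to SNO, and uses hypothetical rows. SQLite accepts `= NULL` and simply never matches; other systems may reject that syntax outright. The last query shows that a row with a NULL STATUS satisfies neither a condition nor its negation, because both evaluate to unknown.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S# is renamed SNO. Python None is
# stored as SQL NULL, so supplier S2 has no STATUS value.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, STATUS INTEGER)")
con.executemany("INSERT INTO S VALUES (?, ?)",
                [("S1", 20), ("S2", None), ("S3", 30)])

def suppliers(condition):
    """Sorted supplier numbers matching a trusted literal WHERE condition."""
    sql = "SELECT SNO FROM S WHERE " + condition + " ORDER BY SNO"
    return [sno for (sno,) in con.execute(sql)]

print(suppliers("STATUS = NULL"))       # []: the comparison is never true
print(suppliers("STATUS IS NULL"))      # ['S2']
print(suppliers("STATUS IS NOT NULL"))  # ['S1', 'S3']
# S2 satisfies neither the condition nor its negation: both are unknown.
print(suppliers("STATUS > 20 OR NOT (STATUS > 20)"))  # ['S1', 'S3']
```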

Cartesian product
Each row of one table is joined with every row of the other table
Select S.*,P.*
From S,P
Equi Join
If the join condition consists of the equality operator, then the join is known as an Equi Join
Select S.*,P.*
From S,P
Where S.CITY = P.CITY
Theta Join
If the rows of a Cartesian product are eliminated by a restrict operation on the basis of any condition, then the join is known as a Theta Join.
Select P.*,SP.*
From P,SP
Where P.P# = SP.P# And
P.RATE < SP.RATE
Natural Join
From a Cartesian product, choose the common fields and compare their values for equality; finally, one of the common fields is eliminated from the projection
Select S.*,SP.P#
From S,SP
Where S.S# = SP.S#
Natural join on 3 tables
Select S.*,P.*
From S,SP,P
Where S.S# = SP.S# And
P.P# = SP.P#
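The join variants above can be compared side by side. This sketch uses SQLite as a stand-in for DB2 (an assumption), renames S#/P# to SNO/PNO, and uses hypothetical rows. It also uses SQLite's NATURAL JOIN keyword, which these notes do not use, to show that writing the natural join by hand gives the same rows once the duplicate S# column is dropped.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S#/P# are renamed SNO/PNO and the
# rows are hypothetical sample data.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, CITY TEXT);
CREATE TABLE P  (PNO TEXT PRIMARY KEY, PNAME TEXT, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
INSERT INTO S  VALUES ('S1', 'Smith', 'London'), ('S2', 'Jones', 'Paris');
INSERT INTO P  VALUES ('P1', 'Nut', 'London'), ('P2', 'Bolt', 'Paris'),
                      ('P3', 'Screw', 'Oslo');
INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1');
""")

# Cartesian product: every S row with every P row, 2 x 3 = 6 rows.
product = con.execute("SELECT S.*, P.* FROM S, P").fetchall()

# Equi join: restrict the product to pairs located in the same city.
equi = con.execute("SELECT S.SNO, P.PNO FROM S, P "
                   "WHERE S.CITY = P.CITY ORDER BY S.SNO").fetchall()

# Natural join by hand: equality on S#, then keep only one S# column.
by_hand = con.execute("SELECT S.SNO, S.SNAME, S.CITY, SP.PNO FROM S, SP "
                      "WHERE S.SNO = SP.SNO ORDER BY S.SNO, SP.PNO").fetchall()

# SQLite's NATURAL JOIN keyword gives the same rows and omits the
# right-hand copy of the common column.
cur = con.execute("SELECT * FROM S NATURAL JOIN SP ORDER BY S.SNO, SP.PNO")
natural = cur.fetchall()
natural_columns = [col[0] for col in cur.description]

print(len(product))                        # 6
print(equi)                                # [('S1', 'P1'), ('S2', 'P2')]
print(natural == by_hand, natural_columns)
```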
Join table with Self
Get employees with their manager names
Select First.E#,First.ENAME,First.M#,
Second.ENAME
From E First, E Second
Where First.M# = Second.E#
Simple Subquery
Suppose the suppliers supplying part P2 are S1, S2, S3 and S4; then the query to get the names of suppliers supplying part P2 could be as follows.
Select SNAME
From S
Where S# In ('S1','S2','S3','S4')
However, we can get the S# of the suppliers supplying part P2 from the database with the following query
Select S#
From SP
Where P# = 'P2'
The IN clause of SQL requires a list of values, and the result of a query with one attribute is also a list of values, hence such a query can be substituted in the IN clause
Select SNAME
From S
Where S# In ( Select S#
From SP
Where P# = 'P2')
This is known as a subquery or nested query

Query with multiple nesting
Get supplier names for suppliers supplying at least one red part
Select SNAME
From S
Where S# In (List of suppliers supplying red part)
Select SNAME
From S
Where S# In ( Select S#
From SP
Where P# In (List of red parts))
Select SNAME
From S
Where S# In ( Select S#
From SP
Where P# In ( Select P#
From P
Where COLOR = 'RED'))
Subquery & Outer query referring to same table
Get supplier numbers for suppliers who supply at least one part supplied by supplier S2
Select Distinct S#
From SP
Where P# In (List of parts supplied by supplier S2)
Select Distinct S#
From SP
Where P# In ( Select P#
From SP
Where S# = 'S2')
Subquery with scalar comparisons
Get supplier numbers for suppliers located in the same city as supplier S1
Select S#
From S
Where CITY = (City of supplier S1)
Select S#
From S
Where CITY = ( Select CITY
From S
Where S# = 'S1')
Query using EXISTS
EXISTS is always associated with a subquery.
EXISTS tests whether the result of a subquery contains zero rows or one or more rows.
EXISTS with a subquery is used as a condition in the Where clause of the outer query.
If the subquery returns one or more rows, the EXISTS condition is TRUE; otherwise it is FALSE.
Similarly, NOT EXISTS is the negation of EXISTS.

Get supplier names for suppliers who supply part P2
Select SNAME
From S
Where Exists (
List of suppliers where S# is the same as that
of the outer query and P# is 'P2')
Select SNAME
From S
Where Exists ( Select *
From SP
Where S# = S.S# And
P# = 'P2')
Example with NOT EXISTS
Get supplier names for suppliers who do not supply part P2
Select SNAME
From S
Where Not Exists ( Select *
From SP
Where S# = S.S# And
P# = 'P2')
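The IN and EXISTS formulations above are interchangeable for this question, and nesting works to any depth. The sketch runs them in SQLite as a stand-in for DB2 (an assumption), with S#/P# renamed SNO/PNO and hypothetical rows in which S5 supplies nothing. In the EXISTS form, the reference to S.SNO ties the inner query to the current outer row, which makes it a correlated subquery.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S#/P# renamed SNO/PNO and the
# sample rows are hypothetical. S5 supplies nothing.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE P  (PNO TEXT PRIMARY KEY, COLOR TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
INSERT INTO S  VALUES ('S1','Smith'), ('S2','Jones'), ('S3','Blake'),
                      ('S4','Clark'), ('S5','Adams');
INSERT INTO P  VALUES ('P1','Red'), ('P2','Green'), ('P4','Red');
INSERT INTO SP VALUES ('S1','P1'), ('S1','P2'), ('S2','P2'),
                      ('S3','P2'), ('S4','P2'), ('S4','P4');
""")

def names(sql):
    return [name for (name,) in con.execute(sql)]

# Uncorrelated subquery: the inner query does not depend on the outer row.
with_in = names("""SELECT SNAME FROM S
                   WHERE SNO IN (SELECT SNO FROM SP WHERE PNO = 'P2')
                   ORDER BY SNAME""")

# Correlated subquery: S.SNO refers to the current row of the outer query.
with_exists = names("""SELECT SNAME FROM S
                       WHERE EXISTS (SELECT * FROM SP
                                     WHERE SP.SNO = S.SNO AND PNO = 'P2')
                       ORDER BY SNAME""")

without_p2 = names("""SELECT SNAME FROM S
                      WHERE NOT EXISTS (SELECT * FROM SP
                                        WHERE SP.SNO = S.SNO AND PNO = 'P2')""")

# Multiple nesting: suppliers of at least one red part.
red_part_suppliers = names("""SELECT SNAME FROM S
                              WHERE SNO IN (SELECT SNO FROM SP
                                            WHERE PNO IN (SELECT PNO FROM P
                                                          WHERE COLOR = 'Red'))
                              ORDER BY SNAME""")

print(with_in == with_exists)  # True
print(without_p2)              # ['Adams']
print(red_part_suppliers)      # ['Clark', 'Smith']
```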
Quantified comparisons
Get part names for parts whose height is greater than that of every blue part
List of heights of blue parts
Select HEIGHT
From P
Where COLOR = 'Blue'
Part names for required parts
Select PNAME
From P
Where HEIGHT > ALL
(List of heights of blue parts)
Select PNAME
From P
Where HEIGHT > ALL
( Select HEIGHT
From P
Where COLOR = 'Blue')
Similarly
Where HEIGHT > ANY
( Subquery )
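The meaning of > ALL and > ANY can be sketched with Python's all() and any(), assuming no NULLs are involved (NULLs add SQL's "unknown" truth value, which this sketch ignores). The part rows are hypothetical. One edge case is worth noticing: when the subquery returns no rows, "> ALL" is true for every row. As a result, rewriting it as "> (Select Max(...))" is not equivalent in that case, because the maximum of an empty set is NULL.

```python
# Quantified comparisons sketched in plain Python (no NULLs assumed).
# "x > ALL (subquery)" is true when x exceeds every value returned;
# "x > ANY (subquery)" is true when x exceeds at least one of them.
parts = [  # hypothetical (P#, COLOR, HEIGHT) rows
    ("P1", "Red", 12), ("P2", "Blue", 17), ("P3", "Blue", 14), ("P4", "Red", 19),
]

def heights_of(color):
    return [h for (_, c, h) in parts if c == color]

def greater_than_all(x, values):
    return all(x > v for v in values)   # vacuously True for an empty list

def greater_than_any(x, values):
    return any(x > v for v in values)   # False for an empty list

blue = heights_of("Blue")
above_all_blue = [p for (p, _, h) in parts if greater_than_all(h, blue)]
above_any_blue = [p for (p, _, h) in parts if greater_than_any(h, blue)]

print(above_all_blue)  # ['P4']
print(above_any_blue)  # ['P2', 'P4']
# With no blue parts at all, every part would satisfy "> ALL".
print(greater_than_all(5, heights_of("Purple")))  # True
```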
Aggregate Functions
Aggregate functions operate on a collection of scalar values in one column of a table to produce a single scalar value as their result
Some aggregate functions are as follows:
COUNT - number of rows
SUM - sum of values
AVG - average of values
MAX - largest/maximum value
MIN - smallest/minimum value
Examples
Get total number of suppliers
Select Count(*)
From S
Get total number of suppliers currently supplying parts
Select Count(Distinct S#)
From SP
Get number of shipments for part P2
Select Count(*)
From SP
Where P# = 'P2'
Get total quantity of part P2 supplied
Select Sum(QTY)
From SP
Where P# = 'P2'
Get average quantity of part P3 supplied
Select Sum(QTY)/Count(*)
From SP
Where P# = 'P3'
Or
Select Avg(QTY)
From SP
Where P# = 'P3'
The two forms agree only if QTY contains no NULLs: Avg ignores NULLs, whereas Count(*) counts every row.
Aggregate functions in subquery
Get supplier numbers of suppliers with status less than the maximum status
Select S#
From S
Where STATUS < (Maximum status)
Select S#
From S
Where STATUS < ( Select Max(STATUS)
From S)
Get supplier number, status and city for all suppliers whose status is greater than or equal to the average status of their city
Select S#,STATUS,CITY
From S
Where STATUS >= (Average of their city)
Select S#,STATUS,CITY
From S SX
Where STATUS >= ( Select Avg(STATUS)
From S SY
Where SY.CITY = SX.CITY
)
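The aggregate examples can be checked in SQLite as a stand-in for DB2 (an assumption), with S#/P# renamed SNO/PNO and hypothetical rows. One P3 shipment has a NULL quantity, which shows how Sum(QTY)/Count(*) and Avg(QTY) can disagree. The last two queries are the scalar and correlated subqueries from this section.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S#/P# renamed SNO/PNO, sample rows
# hypothetical. One P3 shipment has an unknown (NULL) quantity.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, STATUS INTEGER, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, PRIMARY KEY (SNO, PNO));
INSERT INTO S VALUES ('S1', 20, 'London'), ('S2', 10, 'Paris'),
                     ('S3', 30, 'Paris'), ('S4', 20, 'London'),
                     ('S5', 30, 'Athens');
INSERT INTO SP VALUES ('S1', 'P3', 200), ('S2', 'P3', NULL), ('S3', 'P3', 400);
""")

def one(sql):
    return con.execute(sql).fetchone()[0]

# AVG and COUNT(column) skip NULLs; COUNT(*) counts every row.
# SUM / COUNT is integer division here because both operands are integers.
avg_qty = one("SELECT AVG(QTY) FROM SP WHERE PNO = 'P3'")              # 300.0
sum_by_count_star = one("SELECT SUM(QTY) / COUNT(*) FROM SP WHERE PNO = 'P3'")   # 200
sum_by_count_qty = one("SELECT SUM(QTY) / COUNT(QTY) FROM SP WHERE PNO = 'P3'")  # 300

# Subquery returning one aggregate value: status below the maximum.
below_max = [r[0] for r in con.execute(
    "SELECT SNO FROM S WHERE STATUS < (SELECT MAX(STATUS) FROM S) ORDER BY SNO")]

# Correlated subquery: SX.CITY changes for each outer row.
at_least_city_avg = [r[0] for r in con.execute("""
    SELECT SNO FROM S SX
    WHERE STATUS >= (SELECT AVG(STATUS) FROM S SY WHERE SY.CITY = SX.CITY)
    ORDER BY SNO""")]

print(avg_qty, sum_by_count_star, sum_by_count_qty)
print(below_max)          # ['S1', 'S2', 'S4']
print(at_least_city_avg)  # ['S1', 'S3', 'S4', 'S5']
```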
Group By, Having
Group By statement
•rearranges rows into groups
•on the basis of Group By attributes
•such that each group has same value for Group By attributes
•expressions in Select should be single valued for the group
•such as Aggregate functions or Group By attributes
Example
A part is supplied by more than one supplier at different rates; this information is maintained in the SP table, where each row contains the part supplied, the supplier and the specific rate.
Get average rate for each part supplied.
Note: For each part there would be a group of rows; the average rate for that part is the average of all values of the RATE attribute in that group.
Select P#,Avg(RATE)
From SP
Group By P#
Use of Where and Group By
The Where clause restricts which rows participate in the groups specified by the Group By clause
Get part number, total and maximum quantity for parts, excluding shipments by supplier S1
Select P#,Sum(QTY),Max(QTY)
From SP
Where S# <> 'S1'
Group By P#
Get part number and average rate supplied by suppliers, excluding those of type B
Select P#,Avg(RATE)
From SP
Where TYPE <> 'B'
Group By P#
Having
The Having clause restricts the output of the groups formed by the
Group By clause; the restriction is based on a condition that
usually includes an aggregate function.
Get part numbers for all such parts that are supplied
by more than one supplier
Select P#
From SP
Group By P#
The result of this query is all parts that are supplied;
we need to choose from this list only those that are
supplied by more than one supplier. This information
is available as an aggregate function, i.e. Count(*)
Select P#
From SP
Group By P#
Having Count(*) > 1
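The difference between Where (which filters rows before grouping) and Having (which filters whole groups afterwards) shows up clearly in a small run. The sketch uses SQLite as a stand-in for DB2 (an assumption), with S#/P# renamed SNO/PNO and hypothetical shipment rows.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S#/P# renamed SNO/PNO and the
# shipment rows are hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER, "
            "PRIMARY KEY (SNO, PNO))")
con.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 300),
    ("S2", "P2", 400), ("S3", "P2", 200), ("S4", "P4", 300),
])

# WHERE removes rows before grouping; each group yields one output row.
per_part = con.execute("""SELECT PNO, SUM(QTY), MAX(QTY) FROM SP
                          WHERE SNO <> 'S1'
                          GROUP BY PNO ORDER BY PNO""").fetchall()

# HAVING filters whole groups after they are formed.
multi_supplier = [pno for (pno,) in con.execute(
    """SELECT PNO FROM SP GROUP BY PNO
       HAVING COUNT(*) > 1 ORDER BY PNO""")]

print(per_part)        # [('P1', 300, 300), ('P2', 600, 400), ('P4', 300, 300)]
print(multi_supplier)  # ['P1', 'P2']
```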
Union
The Union operator performs a union of tuples from two tables; the two tables must be union-compatible, i.e. they must have the same number of columns and compatible column types.
Get part numbers of parts that either weigh more than 16 pounds or are supplied by supplier S2, or both.
Parts that weigh more than 16 pounds
UNION
Parts supplied by supplier S2
Select P#
From P
Where WEIGHT > 16
Union
Select P#
From SP
Where S# = 'S2'
Note:
Union eliminates redundant duplicates
Union All retains duplicates
The output of the union can be ordered by an ORDER BY clause; ORDER BY cannot appear in the individual SQL statements that participate in the Union.
Select P#,'Weight'
From P
Where WEIGHT > 16
Union All
Select P#,'Supplied'
From SP
Where S# = 'S2'
Order By 2,1
Intersection
The Intersection operator (INTERSECT in SQL) performs an intersection of tuples from two tables; the output is the tuples common to both tables.
Difference
The Difference operator (EXCEPT in SQL) performs a difference of tuples from two tables; the output is the tuples from one table that are not present in the other.
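The set operators can be compared on the "weight over 16 / supplied by S2" example. This sketch runs Union, Union All, Intersect and Except in SQLite as a stand-in for DB2 (an assumption), with P#/S# renamed PNO/SNO and hypothetical rows. As the notes require, ORDER BY appears only once, after the whole compound query.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); P# renamed PNO, S# renamed SNO,
# sample rows hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE P  (PNO TEXT PRIMARY KEY, WEIGHT INTEGER);
CREATE TABLE SP (SNO TEXT, PNO TEXT, PRIMARY KEY (SNO, PNO));
INSERT INTO P  VALUES ('P1', 12), ('P2', 17), ('P3', 17), ('P6', 19);
INSERT INTO SP VALUES ('S2', 'P1'), ('S2', 'P2');
""")

heavy = "SELECT PNO FROM P WHERE WEIGHT > 16"   # P2, P3, P6
by_s2 = "SELECT PNO FROM SP WHERE SNO = 'S2'"   # P1, P2

def run(sql):
    return [pno for (pno,) in con.execute(sql)]

# ORDER BY applies to the whole compound query, written once at the end.
union = run(f"{heavy} UNION {by_s2} ORDER BY 1")
union_all = run(f"{heavy} UNION ALL {by_s2} ORDER BY 1")
both = run(f"{heavy} INTERSECT {by_s2} ORDER BY 1")
heavy_not_s2 = run(f"{heavy} EXCEPT {by_s2} ORDER BY 1")

print(union)         # ['P1', 'P2', 'P3', 'P6']
print(union_all)     # ['P1', 'P2', 'P2', 'P3', 'P6']: P2 kept twice
print(both)          # ['P2']
print(heavy_not_s2)  # ['P3', 'P6']
```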
SQL Exercises
Schema
SAILORS (sid, sname, rating)
RES (sid, bid, date)
BOATS (bid, bname, color)
1. Find names of sailors who have reserved boat #2
2. Find names of sailors who have reserved a red boat
3. Find names of sailors with rating > 5
4. Find names of sailors who have reserved boats on 1/1/95
5. Find color of boats reserved by Ravi
6. Find names of sailors who have reserved at least one boat
7. Find names of sailors who have reserved all boats
8. Find names of sailors who have reserved red or green boats
9. Find names of sailors who have reserved red and green boats
10. Find names of sailors who have reserved boats reserved by Ravi
Schema
CUSTOMER (cno, cname, city, status, category)
PRODUCT (pno, pname, type)
SALES (cno, pno, date, qty, rate)
1. A sales report giving sales quantity for products of type 'P01';
report should have customer and product names,
report should be sorted on product type, customer name
2. A sales report giving total sales quantity for each product;
report should have product name
3. A sales report giving total sales value for each customer;
report should have customer name
4. A sales report giving citywise total sales value;
report should contain only type 'P04' products,
report should consider customers whose status is 'SMP' or 'STP'
5. A sales report giving maximum & minimum sales quantity for
each product; report should contain product name
Embedded SQL
Introduction
Any SQL statement that can be used as an online query can also be used in an application program as an embedded statement.
This is known as the dual-mode principle.
Embedded SQL statements are prefixed with EXEC SQL to distinguish them. Executable SQL statements can appear wherever an executable host language statement can appear.
eg: Procedure Division in a COBOL program.
SQL statements can include references to host variables; such references are prefixed with a colon ':' to distinguish them from column names.
Host variables must be declared in a DECLARE section.
Declaration of host variables must physically precede use of the variable in an SQL statement.
eg:
EXEC SQL BEGIN DECLARE SECTION
...<host variable declaration>
EXEC SQL END DECLARE SECTION

In SQL statements, host variables can appear wherever a literal is permitted. Host variables can also be used to receive output from SQL statements; they can thus be used as both source and target for data in SQL statements.
Any tables used in the program can optionally be declared by means of an EXEC SQL DECLARE TABLE statement; this makes the program self-documenting.
After any SQL statement is executed, feedback information reflecting the outcome of execution is returned to the program in two special host variables. They are
• SQLCODE - a signed binary integer (31 bits plus sign)
• SQLSTATE - a character string of length 5
In principle, every SQL statement should be followed by a test on either SQLCODE or SQLSTATE.
A zero value in SQLCODE indicates successful completion; a positive value means the statement executed but some exceptional condition occurred; a negative value means the statement did not execute successfully.
SQLSTATE is subdivided into a two-character class code and a three-character subclass code.
A program can contain EXEC SQL INCLUDE SQLCA
This causes the precompiler to insert the declaration of the SQL communication area, which contains declarations of SQLCODE and SQLSTATE along with other feedback variables.
Host variables must have a data type compatible with the SQL data type of the columns they are compared with or assigned to or from.
SELECT statements usually retrieve multiple rows, and host languages are not equipped to handle more than one row at a time. It is therefore necessary to provide some kind of bridge between the set-at-a-time operations of SQL and the row-at-a-time operations that the host language can handle. Cursors provide such a bridge.
A cursor is a new kind of object relevant to embedded SQL; interactive SQL has no need for it. It is a kind of pointer that is used to run through a set of rows, thus providing addressability to the rows retrieved by an SQL statement, one row at a time.
SQL examples - Not involving Cursors
Singleton Select
EXEC SQL Select STATUS, CITY
Into :RANK, :CITY
From S
Where S# = :GIVENS# ;
EXEC SQL Select STATUS, CITY
Into :RANK Indicator :RANKIND, :CITY
From S
Where S# = :GIVENS# ;
If RANKIND = -1 Then
...<RANK has NULL value>
End-If
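Python's DB-API offers a loose analogy to host variables and indicators, which may help readers who know Python. This is an analogy only: '?' placeholders play the role of :host variables, and Python None plays the role of a -1 indicator (NULL). The singleton_select helper and the rows are hypothetical, and SQLite stands in for DB2.

```python
import sqlite3

# Analogy only (assumption): Python's DB-API "?" placeholders play the role
# of :host variables, and None plays the role of a -1 indicator (NULL).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, STATUS INTEGER, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?)",
                [("S1", 20, "London"), ("S2", None, "Paris")])

def singleton_select(given_sno):
    """Mimic SELECT STATUS, CITY INTO :RANK INDICATOR :RANKIND, :CITY."""
    row = con.execute("SELECT STATUS, CITY FROM S WHERE SNO = ?",
                      (given_sno,)).fetchone()
    if row is None:
        return None              # DB2 would report "not found" (SQLCODE +100)
    rank, city = row
    rank_ind = -1 if rank is None else 0   # indicator convention from the notes
    return rank, city, rank_ind

print(singleton_select("S1"))  # (20, 'London', 0)
print(singleton_select("S2"))  # (None, 'Paris', -1)
print(singleton_select("S9"))  # None
```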
INSERT
EXEC SQL Insert
Into P (P#, PNAME, WEIGHT)
Values (:PNO, :PNAME, :PWT) ;
COLORIND = -1
CITYIND = -1
EXEC SQL Insert
Into P (P#, PNAME, COLOR, CITY)
Values (:PNO, :PNAME,
:PCOLOR Indicator :COLORIND,
:PCITY Indicator :CITYIND) ;
UPDATE
EXEC SQL Update S
Set STATUS = STATUS + :RAISE
Where CITY = 'LONDON' ;
RANKIND = -1
EXEC SQL Update S
Set STATUS = :RANK Indicator :RANKIND ;
EXEC SQL Update S
Set STATUS = NULL ;
DELETE
EXEC SQL Delete
From SP
Where :CITY = ( Select CITY
From S
Where S.S# = SP.S#) ;
SQL examples - Involving Cursors
The DECLARE X CURSOR ... statement defines a cursor called X and associates it with a query specified by a SELECT statement, which is also part of the cursor declaration. The query is not executed at this point.
The SELECT statement is effectively executed when the cursor is opened, using the current values of host variables in the procedural part of the program.
The FETCH ... INTO statement is used to retrieve rows from the result table, one row at a time. The INTO clause specifies a list of host variables that match the SELECT clause of the cursor declaration.
EXEC SQL Declare X Cursor For
Select S#, SNAME
From S
Where CITY = :Y ;
EXEC SQL Open X ;
For all rows accessible via X
EXEC SQL Fetch X Into :S#, :SNAME ;
EXEC SQL Close X ;
Since there are multiple rows in the result table, Fetch is placed in
a loop. A Fetch results in SQLCODE +100 if no more rows exist
in the result table; this condition is used to terminate the loop.
The cursor is finally closed by the CLOSE statement.
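The OPEN / FETCH-in-a-loop / CLOSE pattern has a close analogue in Python's DB-API. This is an analogy only: execute() plays the role of OPEN with the host variable bound, fetchone() plays the role of FETCH, and a None result corresponds to SQLCODE +100. SQLite stands in for DB2, and the suppliers_in helper and rows are hypothetical.

```python
import sqlite3

# Analogy only (assumption): a Python DB-API cursor stands in for an
# embedded-SQL cursor. execute() ~ OPEN, fetchone() ~ FETCH, and a None
# result ~ SQLCODE +100 ("no more rows").
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, CITY TEXT)")
con.executemany("INSERT INTO S VALUES (?, ?, ?)", [
    ("S1", "Smith", "London"), ("S2", "Jones", "Paris"), ("S4", "Clark", "London"),
])

def suppliers_in(city):
    cur = con.cursor()
    # Like OPEN X: the query runs with the current value bound to :Y.
    cur.execute("SELECT SNO, SNAME FROM S WHERE CITY = ? ORDER BY SNO", (city,))
    rows = []
    while True:
        row = cur.fetchone()      # like FETCH X INTO :S#, :SNAME
        if row is None:           # like SQLCODE = +100: leave the loop
            break
        rows.append(row)
    cur.close()                   # like CLOSE X
    return rows

print(suppliers_in("London"))  # [('S1', 'Smith'), ('S4', 'Clark')]
print(suppliers_in("Rome"))    # []
```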
Cursor Declaration
EXEC SQL Declare <cursor name> Cursor For
<union expression>
Order By <columns>
For [Fetch Only / Update Of <columns>]
Optimize For <n> Rows
Notes:
1. Union expression is a SELECT expression or a union of SELECT expressions
2. DECLARE CURSOR is a declarative, not an executable, statement
3. ORDER BY cannot be specified if UPDATE or DELETE ... WHERE CURRENT OF needs to be invoked
4. ORDER BY orders the rows retrieved by FETCH statements
5. Specifying FOR FETCH ONLY performs better, and is the default
6. OPTIMIZE FOR n ROWS may be specified purely for performance reasons; it tells the optimizer that only n rows are expected to be fetched, so it can choose an access path that returns the first n rows efficiently.
Executable Statements
EXEC SQL OPEN <cursor name>
•Opens or activates the specified cursor
•A set of rows is identified and becomes active for the cursor
•The cursor also identifies a position within that set of rows
•The active set of rows is considered to have an order
EXEC SQL FETCH <cursor name> INTO :<host-var> ...
•the identified cursor must be open
•advances the cursor to the next position
•assigns values from that row to host variables
•if there is no next row, SQLCODE is +100
•fetch next is the only cursor movement operation
EXEC SQL CLOSE <cursor name>
•deactivates the specified cursor, which must currently be open
•a closed cursor can be opened again
•when opened again, the active set of rows may be different
•since the values in host variables may have changed
•changes to host variables while the cursor is open have no effect on the active set
Update & Delete
EXEC SQL Update <table name>
Set <column name> = <expr>,
<column name> = <expr> ...
Where Current Of <cursor name>
EXEC SQL Delete
From <table name>
Where Current Of <cursor name>
Update and Delete using a cursor are not permitted if the cursor declaration involves Union or an ORDER BY clause, or if the union expression involves a non-updatable view.
For Update, the cursor declaration should have a FOR UPDATE OF clause identifying the columns that appear as targets of the SET clause.
Relational Integrity
Relational Integrity Rules
Need for Integrity rules
•Any database consists of some configuration of data values
•That configuration is supposed to reflect a real-world situation
•Some configurations of values do not make sense
•These do not represent any possible state of the real world
Rules as Database definition
•The database definition needs to be extended to include some rules
•Rules inform the DBMS of certain constraints in the real world
•Rules can also prevent such impossible configurations of values
•Such rules are known as Integrity rules.
Since base tables are supposed to reflect reality, all Integrity rules
apply to base tables. Integrity rules are either general or specific.
General Integrity rules are those that concern primary keys and
foreign keys. Specific Integrity rules are those concerning
domain and user-defined integrity.
Notion of Primary key
Primary key
•is a unique identifier for a relation
•can be composite
•is a designated candidate key
•no component can be eliminated without destroying uniqueness
Notes
•Every relation has a primary key, especially base relations
•Reason for choosing primary key is outside the scope of model
•Important for primary key to be really significant
•There need not be an index on primary key
•Primary key is pre-requisite to foreign key support
•Provides tuple level addressing mechanism in relational system
Entity Integrity
Entity Integrity Rule
No component of the primary key of a base relation is allowed to be NULL.
Justification
Base relations correspond to the real world, and entities in the real world must be distinguishable. Hence their representatives in the database must also be distinguishable.
A NULL value in the primary key would imply that an entity in the database has no identity, or only a partial one, whereas the primary key is supposed to perform a unique identification function.
A NULL value implies 'value is unknown'. A primary key with NULL implies that a tuple represents an entity whose identity we do not know. This means that we do not know how many entities exist in the database or in the real world.
An entity without identity does not exist.
Rule
In a relational model, we never record information about
something we cannot identify.
Foreign keys
In a table, a given value of a foreign key attribute should be permitted to appear in the database only if the same value also appears as a primary key value of some other table in the database.
A foreign key value represents a reference to the tuple containing the matching primary key value.
The relation containing the foreign key is called the referencing relation. The relation containing the matching primary key is called the referenced or target relation.
The problem of ensuring that the database does not include any
invalid foreign key values is known as the Referential Integrity
problem.
A foreign key is either wholly NULL or wholly NON NULL.
There should exist a base relation with a primary key such that
each NON NULL foreign key value has a corresponding primary
key value.
Notes
1. Foreign key and primary key should be defined on the same underlying domain.
2. Foreign keys need not be components of primary keys. Any attribute can be a foreign key.
3. A given relation can be both a referencing and a referenced relation.
4. A relation might include a foreign key whose values are required to match the primary key values of the same relation. Such relations are known as self-referencing relations.
5. Foreign keys, unlike primary keys, can have NULLs.
6. The foreign key to primary key relationship is said to be the glue that holds the database together.
Referential Integrity Rule
The database must not contain any unmatched foreign key values.
An unmatched foreign key value is a NON NULL foreign key value for which there is no matching primary key value in the relevant target relation.
Note
1. Referential integrity requires foreign keys to match primary keys.
2. Foreign keys and referential integrity are defined in terms of each other.
3. Support for referential integrity and support for foreign keys mean the same thing.
Foreign key rules
The referential integrity rule is framed purely in terms of database states. Any state of the database that does not satisfy the rule is by definition incorrect.
These incorrect states can be avoided as follows:
- the system could reject any operation that would result in an illegal state
- the system could accept the operation and perform additional
compensating operations to guarantee that the overall result is
a legal state
Some key issues
Can a foreign key accept NULLs?
The answer to this question depends not on the database designer but on the policies in effect in the real world.
What happens on an attempt to delete the target of a foreign key reference?
Restrict - the target record is not deleted if there are any referencing records
Cascade - all referencing records are also deleted
Nullify - foreign keys of all referencing records are set to NULL
What happens on an attempt to update the primary key of the target of a foreign key reference?
Same as with Delete: Restrict, Cascade, Nullify
Primary key definition in DB2
Create Table SP
( S# Char(5) Not Null,
P# Char(6) Not Null,
QTY Integer,
Primary Key (S#, P#)
);
Foreign key definition in DB2
Create Table SP
( S# Char(5) Not Null,
P# Char(6) Not Null,
QTY Integer,
Primary Key (S#, P#),
Foreign Key SKF (S#) References S
On Delete Cascade,
Foreign Key PKF (P#) References P
On Delete Restrict
);
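The effect of these definitions can be tried out in SQLite, used here as a stand-in for DB2 (an assumption). S#/P# are renamed SNO/PNO, the constraint names are dropped, and the rows are hypothetical. Two SQLite-specific points are needed for the sketch to behave like the rules above. First, foreign keys are enforced only after `PRAGMA foreign_keys = ON`. Second, non-integer primary-key columns need an explicit NOT NULL to enforce entity integrity.

```python
import sqlite3

# SQLite stand-in for DB2 (assumption); S#/P# renamed SNO/PNO. SQLite only
# enforces foreign keys after this PRAGMA, and it needs explicit NOT NULL
# on non-integer primary-key columns to enforce entity integrity.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE S (SNO TEXT NOT NULL PRIMARY KEY, SNAME TEXT);
CREATE TABLE P (PNO TEXT NOT NULL PRIMARY KEY, PNAME TEXT);
CREATE TABLE SP (
    SNO TEXT NOT NULL,
    PNO TEXT NOT NULL,
    QTY INTEGER,
    PRIMARY KEY (SNO, PNO),
    FOREIGN KEY (SNO) REFERENCES S ON DELETE CASCADE,
    FOREIGN KEY (PNO) REFERENCES P ON DELETE RESTRICT
);
INSERT INTO S VALUES ('S1', 'Smith'), ('S2', 'Jones');
INSERT INTO P VALUES ('P1', 'Nut'), ('P2', 'Bolt');
INSERT INTO SP VALUES ('S1', 'P1', 300), ('S1', 'P2', 200), ('S2', 'P1', 400);
""")

def attempt(sql):
    """Return 'ok' if the statement succeeds, 'rejected' on a constraint error."""
    try:
        con.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

null_key = attempt("INSERT INTO S VALUES (NULL, 'Nobody')")     # entity integrity
unmatched = attempt("INSERT INTO SP VALUES ('S9', 'P1', 100)")  # no supplier S9
restricted = attempt("DELETE FROM P WHERE PNO = 'P2'")          # still referenced
cascaded = attempt("DELETE FROM S WHERE SNO = 'S1'")            # removes S1's shipments
remaining = con.execute("SELECT SNO, PNO FROM SP").fetchall()

print(null_key, unmatched, restricted, cascaded)  # rejected rejected rejected ok
print(remaining)                                  # [('S2', 'P1')]
```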
Views
Introduction
A view is a named virtual table that is derived from other tables (base tables or views). It does not exist in its own right, but appears to the user as if it did.
Views do not have separate, physically distinguishable stored data of their own. Instead, their definition in terms of other tables is stored in the catalog.
An example:
Create View GOOD_SUPPLIERS
As Select S#, STATUS, CITY
From S
Where STATUS > 15 ;
GOOD_SUPPLIERS is in effect a window into the real table S.
Further, this window is dynamic: changes to S are automatically and instantaneously visible through that window. Likewise, changes to GOOD_SUPPLIERS are automatically and instantaneously applied to the real table S.
Users can operate on GOOD_SUPPLIERS as if it were a real table. The system handles the operation by converting it into an equivalent operation on the underlying base table.
View Creation
Examples
Create View REDPARTS (P#, PNAME, WT, CITY)
As Select P#, PNAME, WEIGHT, CITY
From P
Where COLOR = 'RED' ;
Create View PQ (P#, TOTQTY)
As Select P#, Sum(QTY)
From SP
Group By P# ;
Create View SUPPLIER_PARTS (S#, P#, SNAME, PNAME)
As Select SP.S#, SP.P#, S.SNAME, P.PNAME
From S, SP, P
Where S.S# = SP.S# And SP.P# = P.P# ;
Create View LONDON_REDPARTS
As Select P#, WT
From REDPARTS
Where CITY = 'LONDON' ;
Create View GOOD_SUPPLIERS
As Select S#, STATUS, CITY
From S
Where STATUS > 15
With Check Option ;
Types of Views
Column subset Views
Create View SCITY
As Select S#, CITY
From S ;
Create View STATUS_CITY
As Select STATUS, CITY
From S ;
The above views are vertical (column) subsets of the base table, hence they are called column-subset views.
View SCITY includes the primary key of the base table, whereas view STATUS_CITY does not.
For a given record in the view STATUS_CITY, it would be impossible to identify the corresponding record in the base table, because the view does not include the primary key of the base table.
Views that include the primary key of the base table are known as key-preserving views. Column-subset views are theoretically updatable if they preserve the primary key of the base table.
Row subset Views
Create View LONDON_SUPPLIERS
As Select S#, SNAME, STATUS, CITY
From S
Where CITY = 'LONDON' ;
The above view is a horizontal (row) subset of the base table, hence it is called a row-subset view.
The view LONDON_SUPPLIERS includes the primary key of the base table, hence it is a key-preserving view.
Row-subset views are theoretically updatable if they preserve the primary key of the base table.
Types of Views
Join Views
Create View SUPPLIER_PART
As Select SP.S#, SP.P#, S.SNAME
From S, SP
Where S.S# = SP.S# ;
View SUPPLIER_PART is constructed from a join of two tables;
such views are known as colocated views. Colocated views suffer
from all kinds of problems from the standpoint of updatability.
Statistical Summary
Create View PQ (P#, TOTQTY)
As Select P#, Sum(QTY)
From SP
Group By P# ;
The view PQ is constructed such that each row of the view is
the result of an aggregate function applied to a set of rows in the
base table.
Such a view cannot be updated, as it would be impossible to
know how to distribute the updates across the rows of the base table.
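A small run of PQ on Python's sqlite3 (a stand-in for DB2, assumed here) makes the ambiguity concrete; P# and S# are spelled PNO and SNO, and the shipments are made up. Note that SQLite refuses every UPDATE on a plain view, so its rejection reflects SQLite's blanket rule rather than the reasoning above.

```python
import sqlite3

def pq_demo():
    """Return the PQ rows and whether an UPDATE on PQ was rejected."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER,
                         PRIMARY KEY (SNO, PNO));
        INSERT INTO SP VALUES ('S1', 'P1', 300), ('S2', 'P1', 300),
                              ('S1', 'P2', 200), ('S3', 'P2', 200),
                              ('S4', 'P2', 200);
        CREATE VIEW PQ (PNO, TOTQTY) AS
            SELECT PNO, SUM(QTY) FROM SP GROUP BY PNO;
    """)
    rows = con.execute("SELECT PNO, TOTQTY FROM PQ ORDER BY PNO").fetchall()
    try:
        # Which of the three P2 shipments should absorb the extra 300?
        con.execute("UPDATE PQ SET TOTQTY = 900 WHERE PNO = 'P2'")
        rejected = False
    except sqlite3.OperationalError:
        rejected = True
    con.close()
    return rows, rejected

print(pq_demo())  # ([('P1', 600), ('P2', 600)], True)
```

Raising TOTQTY for P2 from 600 to 900 could be achieved by many different changes to the three underlying SP rows, so there is no single correct translation of the update.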
View Updatability
Updatable views are those on which Insert, Delete and Update
operations can occur. Not all views are updatable.
There are some views that are theoretically updatable but are
not updatable in SQL systems.
In general, join views cannot be updated; however, there are also
some views that are not joins which cannot be updated.
Check Option
Create View GOOD_SUPPLIERS
As Select S#, STATUS, CITY
From S
Where STATUS > 15 ;
The view is a row-and-column subset, key preserving and updatable.
Would it be possible to insert a supplier with STATUS = 10
through this view? The CHECK option is designed to deal with such situations.
During Insert and Update operations, the view is checked to
ensure that all inserted and updated rows satisfy the view
definition condition.
If the Check option is not specified, all data would be accepted,
but some newly inserted or updated rows may disappear from
the view.
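SQLite has no WITH CHECK OPTION and its views are read-only, so the sketch below emulates both cases on Python's sqlite3 (an assumption, not DB2 behaviour) with hypothetical INSTEAD OF INSERT triggers named GS_INS: one passes rows straight through to S, the other rejects any row that fails the view condition STATUS > 15. S# is spelled SNO.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT,
                    STATUS INTEGER, CITY TEXT);
    CREATE VIEW GOOD_SUPPLIERS AS
        SELECT SNO, STATUS, CITY FROM S WHERE STATUS > 15;
"""

# No check: the trigger forwards every row to the base table S.
NO_CHECK = """
    CREATE TRIGGER GS_INS INSTEAD OF INSERT ON GOOD_SUPPLIERS
    BEGIN
        INSERT INTO S (SNO, STATUS, CITY)
        VALUES (NEW.SNO, NEW.STATUS, NEW.CITY);
    END;
"""

# Emulated check option: abort when the view condition is false or unknown.
WITH_CHECK = """
    CREATE TRIGGER GS_INS INSTEAD OF INSERT ON GOOD_SUPPLIERS
    BEGIN
        SELECT RAISE(ABORT, 'row does not satisfy view definition')
        WHERE (NEW.STATUS > 15) IS NOT 1;
        INSERT INTO S (SNO, STATUS, CITY)
        VALUES (NEW.SNO, NEW.STATUS, NEW.CITY);
    END;
"""

def insert_low_status(trigger_sql):
    """Insert a STATUS = 10 supplier through the view and report the outcome."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA + trigger_sql)
    try:
        con.execute("INSERT INTO GOOD_SUPPLIERS VALUES ('S6', 10, 'PARIS')")
        accepted = True
    except sqlite3.DatabaseError:
        accepted = False
    in_base = con.execute("SELECT COUNT(*) FROM S").fetchone()[0]
    in_view = con.execute("SELECT COUNT(*) FROM GOOD_SUPPLIERS").fetchone()[0]
    con.close()
    return accepted, in_base, in_view

print(insert_low_status(NO_CHECK))    # (True, 1, 0)
print(insert_low_status(WITH_CHECK))  # (False, 0, 0)
```

Without the check the row lands in S yet is invisible through the view it was inserted into, which is exactly the "disappearing row" described above.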
Logical Data Independence
Since application programs are not dependent on the physical
structure of the stored database, DB2 provides physical data
independence.
If application programs are also independent of the logical
structure of the database, the system is said to provide logical
data independence.
The logical structure of a database can change in two ways:
growth and restructuring.
Growth
•Database grows to incorporate new kinds of information
•A table is expanded to include new fields
•The database is expanded to include a new table
•Growth does not affect application programs in DB2
Restructuring
•Database is restructured so that the overall information remains the same
•Placement of information within the database changes
•Restructuring is undesirable, but unavoidable
Logical data Independence
An Example
A base table S(S#,SNAME,STATUS,CITY) is split into two
tables SX(S#,SNAME,CITY) and SY(S#,STATUS).
Application programs that referred to base table S would need to
be changed, since they would now have to refer to SX & SY
for any database operations.
The old table S can be reconstructed as a join of SX & SY.
Hence a view can substitute for references to the old base table S
after it is split. Application programs that referred to base table S
would now refer to the view S, hence they need not undergo
change.
Create View S(S#,SNAME,STATUS,CITY)
As Select SX.S#,SX.SNAME,SY.STATUS,SX.CITY
From SX,SY
Where SX.S# = SY.S# ;
Having created the view S, Select operations would continue to
work as before; however, Update operations would not work.
Although such a view is theoretically updatable, DB2 does not
allow updates on a view that is defined as a join.
Thus application programs performing update operations are not
immune to this type of change.
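The split can be replayed on Python's sqlite3 as a stand-in for DB2 (an assumption); S# is spelled SNO and the rows are made up. An unchanged retrieval written against S keeps working, while an UPDATE against the join view is refused. SQLite refuses updates on every plain view, so the outcome matches the DB2 behaviour described above, though for a broader reason.

```python
import sqlite3

def split_supplier_demo():
    """Replace table S by SX and SY, then recreate S as a join view."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE SX (SNO TEXT PRIMARY KEY, SNAME TEXT, CITY TEXT);
        CREATE TABLE SY (SNO TEXT PRIMARY KEY, STATUS INTEGER);
        INSERT INTO SX VALUES ('S1', 'Smith', 'LONDON'),
                              ('S2', 'Jones', 'PARIS');
        INSERT INTO SY VALUES ('S1', 20), ('S2', 10);
        -- The old table name S now denotes a join of SX and SY.
        CREATE VIEW S (SNO, SNAME, STATUS, CITY) AS
            SELECT SX.SNO, SX.SNAME, SY.STATUS, SX.CITY
            FROM SX, SY
            WHERE SX.SNO = SY.SNO;
    """)
    # A retrieval written for the original table S runs unchanged.
    rows = con.execute(
        "SELECT SNO, STATUS FROM S WHERE CITY = 'LONDON'").fetchall()
    try:
        con.execute("UPDATE S SET STATUS = 30 WHERE SNO = 'S1'")
        updated = True
    except sqlite3.OperationalError:
        updated = False
    con.close()
    return rows, updated

print(split_supplier_demo())  # ([('S1', 20)], False)
```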
Advantages of Views
•Provide a certain amount of logical data independence
•allow the same data to be seen by different users in different ways
•simplify users’ perception
•allow focus on data that is of concern, ignoring the rest
•provide automatic security
SQL Access Guidelines
Never Use SELECT *
•Never ask DB2 for anything more than required
•Query should access only those columns that are needed
•With SELECT *, changes to table structure may imply changes to the program
Singleton SELECT versus Cursor
•Singleton SELECT outperforms a cursor
•When a row must be updated after it is retrieved, a cursor is preferred
•The FOR UPDATE clause of a cursor ensures integrity
•DB2 places an X lock prohibiting concurrent updates
Use FOR FETCH ONLY
•Enables DB2 to use block fetch
•Increases efficiency
Avoid using DISTINCT

•Distinct eliminates duplicates
•Invokes Sort to eliminate duplicates
•Code only when duplicate elimination is mandatory
Limit the SELECTed data

•Select should return minimal but required rows
•Do not code generic queries without WHERE clause
•More efficient to use WHERE clause to restrict retrieval
Code Predicates on Indexed columns
•Requests satisfied more efficiently using an existing index
•Not efficient when most rows in a table are to be accessed
Multi Column Indexes
•Used only when the high-order (leading) column is specified in the WHERE clause
Several Indexes instead of multicolumn Index
•Multiple indexes can be more efficient than a single multicolumn index
•May provide better overall performance across all queries
•Possibly at the expense of individual queries
Use ORDER BY when sequence is important
•DB2 doesn’t guarantee the order of rows returned
•The access path may change from one execution to the next
•ORDER BY is mandatory when sequence is important
Limit columns in ORDER BY clause

•For ORDER BY clause, DB2 invokes a Sort
•The more columns in ORDER BY, the less efficient
•Specify only those columns that are essential
Use Equivalent data types
•Essential when comparing column values to host variables
•Eliminates the need for data conversion
•Index is not used if data types are incompatible
Use BETWEEN instead of <= and >=
•BETWEEN is more efficient than a combination of <= and >=
•Optimizer selects a more efficient access path
Use IN instead of LIKE

•For known list of data occurrences use IN
•IN with specific list is more efficient than LIKE
Formulate LIKE predicates with care
•Avoid % or _ at the beginning of the comparison string
•Avoid using LIKE with a host variable
Avoid using NOT (except with EXISTS)

•Recode queries to avoid use of NOT
•By taking advantage of knowledge of data being accessed
Code most restrictive predicate first

•Place predicate that eliminates greatest number of rows first
Use Predicates wisely

•Reduce number of predicates
•Know your data to reduce predicates
Specify number of rows to be returned

•Code cursor statement with OPTIMIZE FOR n ROWS
•DB2 selects optimal path
•Does not prevent program from fetching more rows
Complex SQL Guidelines
UNION versus UNION ALL
•UNION invokes a sort, UNION ALL does not
•Use UNION ALL when the retrieved data does not contain duplicates
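The difference in results (not the sort cost, which SQLite cannot show for DB2) can be seen on Python's sqlite3 as a stand-in engine (an assumption); S# and P# are spelled SNO and PNO, and the cities are made up.

```python
import sqlite3

def union_demo():
    """Return the cities of suppliers and parts via UNION and via UNION ALL."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE S (SNO TEXT PRIMARY KEY, CITY TEXT);
        CREATE TABLE P (PNO TEXT PRIMARY KEY, CITY TEXT);
        INSERT INTO S VALUES ('S1', 'LONDON'), ('S2', 'PARIS');
        INSERT INTO P VALUES ('P1', 'LONDON'), ('P2', 'ROME');
    """)
    template = "SELECT CITY FROM S {} SELECT CITY FROM P ORDER BY CITY"
    union = con.execute(template.format("UNION")).fetchall()
    union_all = con.execute(template.format("UNION ALL")).fetchall()
    con.close()
    return union, union_all

print(union_demo())
# ([('LONDON',), ('PARIS',), ('ROME',)],
#  [('LONDON',), ('LONDON',), ('PARIS',), ('ROME',)])
```

UNION ALL is only a safe substitute when, as the guideline says, the two inputs cannot produce the same row, or when duplicates are acceptable to the program.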
Use NOT EXISTS instead of NOT IN
•NOT EXISTS verifies nonexistence
•NOT IN must have the complete set of subquery rows materialized
•Use NOT EXISTS for a subquery using negation logic
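Besides performance, the two forms are interchangeable only when the subquery column cannot be NULL. The sketch below uses Python's sqlite3 as a stand-in (an assumption) with a made-up SP row whose SNO is NULL; in a real SP table S# would normally be part of the key and NOT NULL.

```python
import sqlite3

def not_in_vs_not_exists():
    """Find suppliers with no shipments, using NOT EXISTS and NOT IN."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE S (SNO TEXT PRIMARY KEY);
        CREATE TABLE SP (SNO TEXT, PNO TEXT);
        INSERT INTO S VALUES ('S1'), ('S2'), ('S5');
        INSERT INTO SP VALUES ('S1', 'P1'), ('S2', 'P1'), (NULL, 'P2');
    """)
    not_exists = con.execute("""
        SELECT SNO FROM S
        WHERE NOT EXISTS (SELECT 1 FROM SP WHERE SP.SNO = S.SNO)
        ORDER BY SNO""").fetchall()
    # 'S5' NOT IN ('S1', 'S2', NULL) evaluates to unknown, so S5 is dropped.
    not_in = con.execute("""
        SELECT SNO FROM S
        WHERE SNO NOT IN (SELECT SNO FROM SP)
        ORDER BY SNO""").fetchall()
    con.close()
    return not_exists, not_in

print(not_in_vs_not_exists())  # ([('S5',)], [])
```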
Use constant for existence checking
•If a subquery tests existence, specify a constant in the select-list
•Select-list of the subquery is unimportant, as it will not return data
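A quick check that the select-list of an EXISTS subquery does not affect the result, on Python's sqlite3 as a stand-in for DB2 (an assumption); S# and P# are spelled SNO and PNO and the shipments are made up.

```python
import sqlite3

def exists_demo():
    """Find suppliers of part P2 with EXISTS (SELECT 1 ...) and EXISTS (SELECT * ...)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE S (SNO TEXT PRIMARY KEY);
        CREATE TABLE SP (SNO TEXT, PNO TEXT);
        INSERT INTO S VALUES ('S1'), ('S2'), ('S3');
        INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S3', 'P2');
    """)
    template = """
        SELECT SNO FROM S
        WHERE EXISTS (SELECT {} FROM SP
                      WHERE SP.SNO = S.SNO AND SP.PNO = 'P2')
        ORDER BY SNO"""
    with_constant = con.execute(template.format("1")).fetchall()
    with_star = con.execute(template.format("*")).fetchall()
    con.close()
    return with_constant, with_star

print(exists_demo())  # ([('S1',), ('S3',)], [('S1',), ('S3',)])
```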
Predicate transitive closure rules
•DB2 optimizer uses the rule of transitivity: if a=b and b=c then a=c
•Efficient to code a redundant predicate to exploit transitivity
Instead of:
where a.col1 = b.col1
and a.col1 = :hostvar
code:
where a.col1 = b.col1
and a.col1 = :hostvar
and b.col1 = :hostvar
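The redundant predicate never changes the answer, only the optimizer's options. The sketch below confirms the two forms return the same rows on Python's sqlite3 (a stand-in for DB2, assumed here); the tables A and B, their data, and the column names X and Y are hypothetical, and :hostvar is bound as a named parameter.

```python
import sqlite3

PLAIN = """
    SELECT A.X, B.Y FROM A, B
    WHERE A.COL1 = B.COL1
      AND A.COL1 = :hostvar
    ORDER BY A.X, B.Y"""

# Same query plus the predicate implied by transitivity.
REDUNDANT = """
    SELECT A.X, B.Y FROM A, B
    WHERE A.COL1 = B.COL1
      AND A.COL1 = :hostvar
      AND B.COL1 = :hostvar
    ORDER BY A.X, B.Y"""

def transitive_demo(hostvar):
    """Run both predicate forms with the same host-variable value."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE A (COL1 INTEGER, X TEXT);
        CREATE TABLE B (COL1 INTEGER, Y TEXT);
        INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
        INSERT INTO B VALUES (1, 'b1'), (1, 'b2'), (3, 'b3');
    """)
    params = {"hostvar": hostvar}
    plain = con.execute(PLAIN, params).fetchall()
    redundant = con.execute(REDUNDANT, params).fetchall()
    con.close()
    return plain, redundant

plain, redundant = transitive_demo(1)
print(plain == redundant, plain)  # True [('a1', 'b1'), ('a1', 'b2')]
```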
Minimize number of tables in a join

•Do not join more than 5 tables
•Eliminate unnecessary tables from join statements
Denormalize to reduce joining

•To minimize the need for joins, consider denormalizing
•Would imply redundancy, dual updating & usage of space
Reduce the number of rows to be joined
•number of rows participating in join determines response time
•to reduce join response time, reduce number of rows
•code predicates to minimize number of rows
Join using SQL instead of program logic
•efficient to join using SQL instead of application code
•optimizer has a vast array of tools to optimize performance
•application code would not consider equivalent optimizations
Use Joins instead of Subqueries

•Join is more efficient than a correlated subquery or using IN
Join on clustered column
•Use clustered columns in join criteria when possible
•reduces the need for intermediate sorts
•might require clustering the parent table by primary key
•and the child table by foreign key
Join on Indexed columns
•efficient when tables are joined on indexed columns
•consider creating indexes for join predicates
Avoid Cartesian products

•Never use a join without a predicate
•A join without a predicate results in a cartesian product
•A lot of resources are spent on such a join
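The cost of a missing join predicate grows multiplicatively with table sizes. A minimal count on Python's sqlite3 (a stand-in for DB2, assumed here) with made-up S and SP rows; S# and P# are spelled SNO and PNO.

```python
import sqlite3

def cartesian_demo():
    """Count result rows with and without the join predicate."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE S (SNO TEXT PRIMARY KEY);
        CREATE TABLE SP (SNO TEXT, PNO TEXT);
        INSERT INTO S VALUES ('S1'), ('S2'), ('S3'), ('S4');
        INSERT INTO SP VALUES ('S1', 'P1'), ('S1', 'P2'), ('S2', 'P1'),
                              ('S3', 'P2'), ('S4', 'P4'), ('S4', 'P5');
    """)
    # No join predicate: every S row is paired with every SP row (4 x 6).
    cartesian = con.execute("SELECT COUNT(*) FROM S, SP").fetchone()[0]
    # With the predicate: each SP row pairs only with its own supplier.
    joined = con.execute(
        "SELECT COUNT(*) FROM S, SP WHERE S.SNO = SP.SNO").fetchone()[0]
    con.close()
    return cartesian, joined

print(cartesian_demo())  # (24, 6)
```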
Provide adequate search criteria
•provide additional search criteria in WHERE clause
•these to be in addition to the join criteria
•provides DB2 an opportunity for ranking tables to be joined
•allows queries to perform adequately
Limit columns to be used in GROUP BY

•specify only minimum columns to be grouped
•DB2 needs to sort retrieved data
•the more columns, the more expensive is the sort
Increase the possibility of Stage 1 processing
A predicate that is satisfied by Stage 1 processing is evaluated
by the Data Manager portion of DB2 rather than the
Relational Data System. The Data Manager component is
closer to the data than the Relational Data System.
A Stage 1 predicate is evaluated at an earlier stage of data
retrieval, thereby avoiding the overhead of passing data
from one component to another.
The following list shows predicates satisfied by Stage 1:
Colname operator value
Colname IS NULL
Colname BETWEEN val1 AND val2
Colname IN (list)
Colname LIKE pattern
Colname LIKE hostvariable
a.Colname operator b.Colname
The last item in the list refers to two columns in different tables.
Predicates formulated with AND, OR, NOT are not Stage 1 predicates
Increase the possibility of Index processing
A query that can use an index has more access path options,
so it can be more efficient than a query that cannot use an index.
The DB2 optimizer can use one or more indexes in a variety of
ways to speed the retrieval of data from DB2 tables.
The following list shows predicates that can be satisfied using an index:
Colname operator value
Colname IS NULL
Colname BETWEEN val1 AND val2
Colname IN (list)
Colname LIKE pattern
Colname LIKE hostvariable
a.Colname operator b.Colname
The last item in the list refers to two columns in different tables.
Predicates formulated with AND, OR, NOT are not indexable
