You are on page 1of 80

Relational Model

Overview

•Introduced by Dr.E.F.Codd in 1970


•Model based on mathematical foundations
•Earlier models intrinsically tied to internal representations
•Developer had to be aware of navigational principles
•Need for a model to be divorced from physical organization
•Require independence between physical & logical model

The Model

•In Oct.85 Dr.E.F.Codd published a two part paper


•It introduced rules for Relational model
•Rules determined whether a product is fully relational or not

Two papers

•Is your DBMS really relational - Oct 14,1985


•Does your DBMS run by the rules - Oct 21,1985

The implications were

•satisfying rules was a technically feasible proposition


•practical benefits if system did satisfy rules

A DBMS is said to be fully relational if it supports Codd’s 12


rules, 9 Structural, 3 Integrity and 18 Manipulative features.
Basic Concepts

•Relation corresponds to a table, resembles files


•Rows of a relation are called Tuples, resemble records
•Columns of a relation are called attributes, resemble fields
•Entity and relationships are both represented as relations
•A relation consists of same kind of tuples
•Structure of a relation is defined by Scheme (definition)
•An instance of a scheme is called Relational instance

Properties of a Relation

•Relation is a set of tuples


•No two tuples in a relation are identical
•Tuples in a relation have no order among themselves
•Attribute values are atomic
•Attribute values map onto a domain

Notion of Keys

Superkey - unique identifier for tuple


Candidate key - minimal superkey
Primary key - Designated candidate key
A Relational model consists of

Structural part
relations, domains, etc.

Integrity part
entity, referential, domain/user defined

Manipulative part
operators & extensions
Structural Features

Relations
Mathematical entity to hold data in the relational model.
A set of n-tuple, perceived as a two dimensional table
where an intersection of a row and a column is a atomic value.
Base tables
A named and autonomous relation, one that actually holds
data.
Query tables
Relations that result from execution of queries, not named and
do not have persistent existance.
View tables
A named, virtual and derived relation, that is defined in terms
of other named relations.
Snapshot tables
A named, derived and real relation, represented in terms of
other relations and also by it’s own materialized data.
Attributes
Correspond to columns of the table, all attribute values are of
the same type and atomic in nature.
Domains
All possible values from which attribute values are chosen.
Primary key
A set of attributes, whose value is a minimal unique identifier
to a row of the table. A designated candidate key.
Foreign keys
A set of attributes, whose value is the the primary key of
another table
Integrity Features

Entity Integrity
No component of primary key of base relation is allowed
to be NULLS

Referential Integrity
The database must not contain any unmatched foreign key
values

Domain defined Integrity


Attribute values should be those within the domain that it
is mapped onto

Manipulative Features

•Restrict
•Project
•Cross product
•Union, Intersection, Difference
•Join
•Divide
•Extensions
Twelve Rules

Information Rule
All information be represented in one and only one way,
i.e values in column positions within rows of tables.

Guaranteed access Rule


Every individual scalar value in database must be logically
addressable by specifying table name, column name and
primary key value.

Systematic treatment of NULL


Support representation of missing and inapplicable information
that is systematic and distinct from all regular values.

Active Online Catalog


Support for online, relational catalog accessible to authorised
users by means of regular query language.

Comprehensive data sublanguage Rule


Supports one relational language that has linear syntax,
that is used interactively and within application programs,
that supports data definition, manipulation, security, integrity
and transaction management operations.

View updating Rule


All views that are technically updatable must be updatable by
the system.
Twelve Rules

High level Insert, Update, Delete


Support for set-at-a-time Insert, Update, Delete operations.

Physical Data Independence


Changes at Internal level do not affect Conceptual level.

Logical Data Independence


Changes at Conceptual level do not affect External level.

Integrity Independence
Integrity constraints specified seperately from application
programs, they are stored in catalog, it is possible to change
integrity constraints without affecting existing code.

Distributive Independence
Existing applications should operate sucessfully when
distributed version of DBMS is first introduced and when
existing data is redistributed around the system.

Non subversion Rule


If the system provides low-level record-at-a-time interface,
then that interface cannot subvert the system, thereby bypassing
relational security or integrity constraints.
SQL
SQL

•special purpose language for accessing & manipulating data


•different from application progamming languages like C,Cobol
•uses a combination of relational algebra & calculus constructs

History
1970 - Dr.Codd proposed Relational model
1971-79 - SEQUEL implemented in System R
1980 - SEQUEL became SQL
1986 - SQL 86 (ANSI standard)
1989 - Follow on to SQL-86 - SQL-89
1992 - SQL-2
1995 - SQL-3
SQL components

Data Definition Language (DDL)


•specifies database schema
•creates, modifies, deletes database objects (tables, view, index)

Data Manipulation Language (DML)


•manipulates data using Insert, Modify, Delete operations
•accesses data from database for queries

Data Control Language (DCL)


•grants and revokes authorisation for database access
•audits database use
•provides transaction management
Data Definition
Base tables

•consists of a row of column headings


•zero or more rows of data values
•each data row contains one scalar value for each column
•all values in a column are of same data type
•row ordering is irrelevant
•order is imposed when rows are retrieved
•columns are considered to be ordered from left to right
•column ordering has a significance
•rows and columns do have a physical ordering as stored version
•physical row and column ordering is transparent to user
•base table is autonomous, it exists in it’s own right
Table Creation

Create Table S
( S# Char(5) Not Null,
SNAME Char(20) Not Null
With Default,
STATUS Smallint Not Null
With Default,
CITY Char(15) Not Null
With Default,
Primary Key (S#)
);

Create Table SCOPY Like S ;

Table Modification

Alter Table S
Add DISCOUNT SmallInt ;

Table Deletion

Drop Table S ;
Index

•indexes are created and dropped using SQL


•data manipulation statements do not refer to indexes at all
•decision to use or not to use index is made by DB2

Index Creation

Create [Unique] Index X on T


(P,Q Desc, R) ;

Index Deletion

Drop Index X ;

Notes on Data definition

•data definition statements can be executed at any time


•possible to create a few tables and start using them
•subsequently new columns could be added
•possible to experiment with effects of indexes
•permits one not to get everything right the first time
Data Manipulation
Select Statement reference

Select [All/Distinct] <scalar-expr>


From <table-names>
Where <condition-expr>
Group By <columns>
Having <condition-expr>
Order By <columns>

Sample table definitions

1. Supplier S (S#,SNAME,CITY,STATUS)

2. Part P (P#,PNAME,COLOR,WEIGHT)

3. Supp-Part SP (S#,P#)
Simple retrieval

Get part names of all parts

Select PNAME
From P

Retrieval with duplicate elimination

Get part numbers for all part supplied

Select Distinct P#
From SP

Retrieval of computed values

Select P#,’Height’,HEIGHT*250
From P

Retrieval of full details

Select *
From S
Qualified Retrieval

Get supplier numbers for suppliers in Bombay


with status above 20

Select S#
From S
Where CITY = ‘Bombay’ and
STATUS > 20

Retrieval using ordering

Get supplier numbers and status for suppliers in


Bombay in descending order of status

Select S#,STATUS
From S
Where CITY = ‘Bombay’
Order By STATUS desc

Order by 3rd column

Select P#,’Height’,’HEIGHT*250
From P
Order By 3,P#
Retrieval using Range of values

Get parts whose weight is in the range of 16..19


both limit values inclusive

Normal way

Select P#,PNAME
From P
Where WEIGHT >= 16 and
WEIGHT <= 19

Using BETWEEN

Select P#,PNAME
From P
Where WEIGHT Between 16 and 19

Similarly

Where WEIGHT Not Between 16 and 19


Retrieval using IN

Get parts whose weight is any one of the following


values 12,16,17

Normal way

Select P#,PNAME
From P
Where WEIGHT = 12 or
WEIGHT = 16 or
WEIGHT = 17

Using IN

Select P#,PNAME
From P
Where WEIGHT IN (12,16,17)

Similarly

Where WEIGHT NOT IN (12,16,17)


Retrieval using NULL

Since NULL is missing or inapplicable information,


normal comparisons won’t work

Get supplier numbers for those suppliers for whom


STATUS is inapplicable

Select S#
From S
Where STATUS Is Null

Similarly

Where STATUS Is Not Null


Cartesian product

Each row of one table joined with every row of


the other table

Select S.*,P.*
From S,P

Equi Join

If the join condition comprises of equality operator,


then the join is known as Equi Join

Select S.*,P.*
From S,P
Where S.CITY = P.CITY

Theta Join

If the rows from a cartesian product are eliminated


by restrict operation on the basis of any condition,
then the join is known as Theta Join.

Select S.*,P.*
From P,SP
Where P.P# = SP.P# And
P.RATE < SP.RATE
Natural Join

From a cartesian product, choose common fields


and compare their values for equality, finally
one of the common fields is eliminated from the
projection

Select S.*,SP.*
From S,SP
Where S.S# = SP.S#

Natural join on 3 tables

Select S.*,SP.*,P.*
From S,SP,P
Where S.S# = SP.S# And
P.P# = SP.P#

Join table with Self

Get employees with their manager names

Select First.E#,First.ENAME,First.M#,
Second.ENAME
From E First, E Second
Where First.M# = Second.E#
Simple Subquery

Suppose suppliers supplying part P2 are S1,S2,S3,S4,


then the query to get supplier names supplying part P2
could be as follows.

Select SNAME
From S
Where S# In (‘S1’,’S2’,’S3’,’S4’)

However we can get S# of suppliers supplying part P2


from the database by the following query

Select S#
From SP
Where P# = ‘P2’

The IN clause of SQL requires a list of values,


and a query with one attribute is also a list of values,
hence can be substituted in the IN clause

Select SNAME
From S
Where S# In ( Select S#
From SP
Where P# = ‘P2’)

This is known as subquery or nested query


Query with multiple nesting

Get supplier names for suppliers supplying at least


one red part

Select SNAME
From S
Where S# In (List of suppliers supplying red part)

Select SNAME
From S
Where S# In ( Select S#
From SP
Where P# In (List of red parts))

Seleet SNAME
From S
Where S# In ( Select S#
From SP
Where P# In ( Select P#
From P
Where COLOR =
‘RED’))
Subquery & Outer query referring to same table

Get supplier number for suppliers who supply atleast


one part supplied by supplier S2

Select Distinct S#
From SP
Where P# In (List of parts supplied by supplier S2)

Select Distinct S#
From SP
Where P# In ( Select P#
From SP
Where S# = ‘S2’)

Subquery with scalar comparisons

Get supplier number for suppliers located in the same


city as supplier S1

Select S#
From S
Where CITY = (City of supplier S2)

Select S#
From S
Where CITY = ( Select CITY
From S
Where S# = ‘S1’)
Query using EXISTS

EXISTS is always associated with a subquery.

EXISTS tests whether the results of an suquery


return zero rows or non zero rows.

EXISTS with a subquery is used as a condition in the


Where clause of the outer query

If the subquery returns one or more rows the


EXISTS condition is TRUE, otherwise it is FALSE

Similarly NOT EXISTS is negation if EXISTS


Query using EXISTS

Get supplier names for suppliers who supply part P2

Select SNAME
From S
Where Exists (
List of suppliers where S# is the same as that
of the outer query and P# is ‘P2’)

Select SNAME
From S
Where Exists ( Select *
From SP
Where S# = S.S# And
P# = ‘P2’)

Example with NOT EXISTS

Get supplier names for suppliers who do not supply


part P2

Select SNAME
From S
Where Not Exists ( Select *
From SP
Where S# = S.S# And
P# = ‘P2’)
Quantified comparisons

Get part names for parts whose height is greater than


every blue part

List of heights of blue parts

Select HEIGHT
From P
Where COLOR = ‘Blue’

Part names for required parts

Select PNAME
From P
Where HEIGHT > ALL
(List of heights of blue parts)

Select PNAME
From P
Where HEIGHT > ALL
( Select HEIGHT
From P
Where COLOR = ‘Blue’)

Similarly

Where HEIGHT > ANY


( Subquery )
Aggregate Functions

Aggregate functions operate on a collection of scalar values of


one column of a table to produce a single scalar value defined
as it’s result

Some aggregate functions are as follows:

COUNT - number of rows


SUM - sum of values
AVG - average of values
MAX - largest/maximum value
MIN - smallest/minimum value

Examples

Get total number of suppliers

Select Count(*)
From S

Get total suppliers supplying parts

Select Count(Distinct S#)


From SP
Examples

Get number of shipments for part P2

Select Count(*)
From SP
Where P# = ‘P2’

Get total quantity of part P2 supplied

Select Sum(QTY)
From SP
Where P# = ‘P2’

Get average quantity of part P3 supplied

Select Sum(QTY)/Count(*)
From SP
Where P# = ‘P3’

Or

Select Avg(QTY)
From SP
Where P# = ‘P3’
Aggregate functions in subquery

Get supplier numbers of suppliers with status less than


maximum status

Select S#
From S
Where STATUS < (Maximum status)

Select S#
From S
Where STATUS < ( Select Max(STATUS)
From S)

Get supplier number and city for all suppliers whose


status is greater than or equal to the average status
of their city

Select S#,STATUS,CITY
From S
Where STATUS >= (Average of their city)

Select S#,STATUS,CITY
From S SX
Where STATUS >= ( Select Avg(STATUS)
From S SY
Where SY.CITY = SX.CITY
)
Group By, Having

Group By statement
•rearranges rows into groups
•on the basis of Group By attributes
•such that each group has same value for Group By attributes
•expressions in Select should be single valued for the group
•such as Aggregate functions or Group By attributes

Example

A part is supplied by more than one supplier at different


rates, this information is maintained in SP table,
each row contains the part supplied, by a supplier and
at specific rate.

Get average rate for each part supplied.

Note: For each part there would be a group of rows,


the average rate for that part would be the average of
all values of RATE attribute in that group.

Select P#,Avg(RATE)
From SP
Group By P#
Use of Where and Group By

The use of Where clause restricts some rows from participating


in groups as specified by Group By clause

Get part number, total and maximum quantity for


parts excluding those supplied by supplier S1

Select P#,Sum(QTY),Max(QTY)
From SP
Where S# <> ‘S1’
Group By P#

Get part number and average rate supplied by suppliers


excluding those of type B

Select P#,Avg(RATE)
From SP
Where TYPE <> ‘B’
Group BY P#
Having

The Having clause restricts output of rows that result from the
Group By clause, the restriction is based on a condition that
usually includes an aggregate function.

Get part numbers for all such parts that are supplied
by more than one supplier

Select P#
From SP
Group By P#

The results of this query are all parts that are supplied,
we need to choose from this list only those that are
supplied by more than one supplier, this information
is available as an aggregate function i.e.Count(*)

Select P#
From SP
Group By P#
Having Count(*) > 1
Union

The Union operator performs a union of tuples from two tables,


the two tables are said to be union-compatible if the number of
columns are the same and column types are compatible.

Get part number of parts that either weigh more than


16 pounds or are supplied by supplier S2 or both.

Parts that weigh more than 16 pounds


UNION
Parts supplied by supplier S2

Select P#
From P
Where WEIGHT > 16
Union
Select P#
From SP
Where S# = ‘S2’

Note:
Union eliminates redundant duplicates
Union All retains duplicates
Union

The Output of the union can be ordered by the ORDER BY


clause, ORDER BY clause cannot appear in individual SQL
statements that participate in the Union.

Select P#,’Weight’
From P
Where WEIGHT > 16

Union All

Select P#,’Supplied’
From SP
Where S# = ‘S2’

Order By 2,1

Intersection

Intersection operator performs Intersection of tuples from two


tables, the output is common tuples from two tables.

Difference

Difference operator performs Difference of tuples from two


tables, the output is tuples from one table that are not there in
the other.
SQL Exercises

Schema
SAILORS (sid, sname, rating)
RES (sid, bid, date)
BOATS (bid, bname, color)

1. Find names of sailors who have reserved boat #2

2. Find names of sailors who have reserved a red boat

3. Find names of sailors with rating > 5

4. Find names of sailors who have reserved boats on 1/1/95

5. Find color of boats reserved by Ravi

6. Find names of sailors who have reserved at least one boat

7. Find names of sailors who have reserved all boats

8. Find names of sailors who have reserved red or green boats

9. Find names of sailors who have reserved red and green boats

10. Find names of sailors who have reserved boats reserved by Ravi
SQL Exercises

Schema
CUSTOMER (cno, cname, city, status, category)
PRODUCT (pno, pname, type)
SALES (cno, pno, date, qty, rate)

1. A sales report giving sales quantity for products of type ‘P01’,


report should have customer and product names,
report should be sorted on product type, customer name

2. A sales report giving total sales quantity for each product,


report should have product name

3. A sales report giving total sales value for each customer,


report should have customer name

4. A sales report giving citywise total sales value,


report should contain only type ‘P04’ products,
report should consider customer whose status is ‘SMP’ or ‘STP’

5. A sales report giving maximum & minimum sales quantity for


each product, report should contain product name
Embedded SQL
Introduction

Any SQL statement that is used as an online query can also be


used in an application program as an embedded statement.
This is known as dual-mode principle.

Embedded SQL statements are prefixed with EXEC SQL for


distinction. Executable SQL statements appear wherever an
executable host language statement can appear.
eg: Procedure division in Cobol program.

SQL statements include references to host variables, such


references are prefixed with colon ‘:’ to distinguish them from
column names.

Host variables must be declared in ‘DECLARE’ section.


Declaration of host variables must physically preceed use of
variable in SQL statement.

eg:

BEGIN DECLARE SECTION

...<host variable declaration>

END DECLARE SECTION


Introduction

In SQL statements, host variables can appear wherever a literal is


permitted. Host variables can also be used to place output from
SQL statement, they can thus be used as source and target for data
in SQL statements.

Any tables used in program can optionally be declared by means


of EXEC SQL DECLARE TABLE statement, this is to make the
program self-documentary.

After any SQL statement is executed, a feedback information


reflecting the outcome of execution is returned to the program
in two special host variables. They are
• SQLCODE - a 31 bit signed integer
• SQLSTATE - character string of length 5

In principle, every SQL statement should be followed by a test


on either SQLCODE or SQLSTATE.

A zero value in SQLCODE indicates successful completion,


a positive value means the statement executed but some
exceptional condition occurred, a negative value means the
statement did not execute successfully.

SQLSTATE is subdivided into a two character class code and


three character subclass code.
Introduction

A program can contain EXEC SQL INCLUDE SQLCA

This causes the precompiler to insert declaration of SQL


communication area, which contains declaration of SQLCODE
and SQLSTATE along with other feedback variables.

Host variables must have datatype compatibility with


SQL datatype of columns that they are to be compared or
assigned to or from.

SELECT statements usually retrieve multiple rows, and host


languages are not equipped to handle more than one row at a
time. It is therefore necessary to provide some kind of bridge
between set-at-a-time operations of SQL statements and
row-at-a-time operations that host language can handle.

Cursors provide such a bridge.

A cursor is a new kind of object relevant to embedded SQL,


interactive SQL has no need for it. It consists of a kind of a
pointer that is used to run thru a set of rows, thus providing
addressability to rows retrieved by SQL statement, one row at
a time.
SQL examples - Not involving Cursors

Singleton Select

EXEC SQL Select STATUS, CITY


Into :RANK, :CITY
From S
Where S# = :GIVENS# ;

EXEC SQL Select STATUS, CITY


Into :RANK Indicator :RANKIND, :CITY
From S
Where S# = :GIVENS# ;

If RANKIND = -1 Then
...<RANK has NULL value>
End-If
SQL examples - Not involving Cursors

INSERT

EXEC SQL Insert


Into P (P#, PNAME, WEIGHT)
Values (:PNO, :PNAME, :PWT) ;

COLORIND = -1
CITYIND = -1

EXEC SQL Insert


Into P (P#, PNAME, COLOR, CITY)
Values (:PNO, :PNAME,
:PCOLOR Indicator :COLORIND,
:PCITY Indicator :CITYIND) ;
SQL examples - Not involving Cursors

UPDATE

EXEC SQL Update S


Set STATUS = STATUS + :RAISE
Where CITY = ‘LONDON’ ;

RANKIND = -1

EXEC SQL Update S


Set STATUS = :RANK Indicator :RANKIND
Where CITY = ‘LONDON’ ;

EXEC SQL Update S


Set STATUS = NULL
Where CITY = ‘LONDON’ ;

DELETE

EXEC SQL Delete


From SP
Where :CITY = ( Select CITY
From S
Where S.S# = SP.S#) ;
SQL examples - Involving Cursors

The DECLARE X CURSOR ... statement defines a cursor called


X, and associates itself with a query specified by SELECT
statement, which is also a part of Cursor declaration. The query
is not executed at this point.

The SELECT statement is effectively executed when the cursor


is opened using current values of host variables in the procedural
part of the program.

The FETCH ... INTO statement is used to retrieve rows from the
result table, one row at a time. The INTO clause specifies a list
of host variables that match the SELECT clause declaration of
the cursor.

EXEC SQL Declare X Cursor For


Select S#, SNAME
From S
Where CITY = :Y ;
EXEC SQL Open X
For all rows accessible via X
EXEC SQL Fetch X Into :S#, :SNAME ;
EXEC SQL Close X ;

Since there are multiple rows in the result table, Fetch is placed in
a loop. A Fetch would result in SQLCODE as +100 if no more
rows exist in the result table, this condition is used to terminate
the loop. The cursor is finally closed by CLOSE statement.
Cursor Declaration

EXEC SQL Declare <cursor name> Cursor For


<union expression>
Order By <columns>
For [Fetch Only / Update of Columns]
Optimize For <n> Rows

Notes:

1. Union expression is a SELECT expression or a union of


SELECT expressions

2. DECLARE cursor is a declarative and not executable statement

3. ORDER BY cannot be specified if UPDATE or


DELETE CURRENT needs to be invoked

4. ORDER BY orders rows to be retrieved by FETCH statements

5. Specifying FOR FETCH ONLY performs better,


and is the default

6. OPTIMIZE may be specified purely for performance reasons,


it causes the optimizer to choose a more efficient access.
Executable Statements

EXEC SQL OPEN <cursor name>

•Opens or activites a specified cursor


•A set of rows are identifed and become active for the cursor
•Cursor also identifes a position within that set of rows
•Active set of rows is considered to have an order

EXEC SQL FETCH <cursor name> INTO :<host-var> ...

•identified cursor must be open


•advances cursor to next position
•assigns values from that row to host variables
•if there is no row then SQLCODE is +100
•fetch next is the only cursor movement operation

EXEC SQL CLOSE <cursor name>

•deactivates specified cursor, which is currently open


•closed cursor can be opened again
•when opened again, the active set of rows may be different
•values in host variables can be different
•changes to host variables while cursor is open is redundant
Executable Statements

Update & Delete

EXEC SQL Update <table name>


Set <column name> = <expr>,
<column name> = <expr> ...
Where Current of Cursor

EXEC SQL Delete


From <table name>
Where Current of Cursor

Update and Delete using cursor are not permissible if


cursor declaration involves Union or ORDER BY clause or
if union expression involves non-updatable view

Update should have FOR UPDATE clause identifying columns


that appear as targets of SET statement.
Relational Integrity
Relational Integrity Rules

Need for Integrity rules


•Any database consists of some configuration of data values
•That configuration is supposed to reflect real world situation
•Some configuration of values do not make sense
•These do not represent any possible state of real world

Rules as Database definition


•Database definition needs to be extended to include some rules
•rules inform DBMS of certain constraints in the real world
•rules can also prevent such impossible configuration of values
•Such rules are known as Integrity rules.

Since base tables are supposed to reflect reality, all Integrity rules
apply to base tables. Integrity rules are specific and general.
General Integrity rules are those that concern primary key and
foreign key. Specific Integrity rules are those concerning
domain & user defined integrity.
Notion of Primary key

Primary key
•is a unique identifier for a relation
•can be composite
•is a designated candidate key
•no component can be eliminated without destroying uniqueness

Notes
•Every relation has a primary key, moreso the base relations
•Reason for choosing primary key is outside the scope of model
•Important for primary key to be really significant
•There need not be an index on primary key
•Primary key is pre-requisite to foreign key support
•Provides tuple level addressing mechanism in relational system
Entity Integrity

Entity Integrity Rule


No component of primary key of base relation is allowed to be
NULLS.

Justification

Base relations correspond to real world and entities in real world


must be distinguishable. Hence their representatives in database
must also be distinguishable.

Null value for primary key implies that entity in database has
no full or partial identity. Primary key is supposed to perform a
unique identification function.

Null value imples ‘value is unknown’. Primary key with Null


implies that a tuple represents an identity we do not know. This
means that we do not know how many entities exist in the
database or real world.

An entity without identity does not exist.

Rule
In a relational model, we never record information about
something we cannot identify.
Foreign keys

In a table, a given value of an attribute should be permitted to


appear in the database if the same value also appears as a
primary key value of some other table in the database.

Foreign key value represents a reference to a tuple containing


a matching primary key value.

The relation containing foreign key is called referencing relation.


The relation containing the matching primary key is called the
referenced or target relation.

The problem ensuring that the database does not include any
invalid foreign key values is known as Referential Integrity
problem.

The foreign key is either wholly NULL or wholly NON NULL.

There should exist a base relation with primary key such that
each NON NULL foreign key value has a corresponding primary
key value.
Notes

1. Foreign key and primary key should be defined on the same


underlying domain.

2. Foreign keys need not be component of primary keys. Any


attribute can be a foreign key

3. A given relation can be referencing as well as referenced


relation.

4. A relation might include a foreign key where it’s values are


required to match the primary key value of the same relation.
Such relations are known as self-referencing relations.

5. Foreign keys, unlike primary keys, can have NULLS

6. Foreign key to primary key relationship is said to be the glue


that holds the database together.
Referential Integrity Rule

The database must not contain any unmatched foreign key values

An unmatched foreign key value is a NON NULL foreign key


value for which there does not exist a matching value of
primary key in the relevant target relation.

Note

1. Referential integrity requires foreign keys to match


primary keys
2. Foreign key and referential integrity are defined in terms of
each other
3. Support for referential integrity and support for foreign key
mean the same.

Foreign key rules

Referential integrity rule is framed purely in terms of database


states. Any state of database that does not satisfy the rule, is
by definition incorrect.

These incorrect states can be avoided as follows


- system could reject any operation that results in illegal state
- system accepts the operation and performs additional
compensating operation to guarantee that overall results are
a legal state
Some key issues

Can a foreign key accept NULLS ?

The answer to this question does not depend on the database


designer but the policies that are in effect in the real world.

What happens to an attempt to delete a target record of a


foreign key reference ?

Restricted - target record is not deleted if there are any referencing


records
Cascade - All referencing records are deleted
Nullifies - Foreign keys of all referencing records
are set to NULLS.

What happens on an attempt to update primary key of a target


of foreign key reference ?

Same as with Delete. Restrict, Cascade, Nullify


Primary key definition in DB2

Create Table SP
( S# Char(5) Not Null,
P# Char(6) Not Null,
QTY Integer,
Primary Key (S#, P#)
);

Foreign key definition in DB2

Create Table SP
( S# Char(5) Not Null,
P# Char(6) Not Null,
QTY Integer,
Primary Key (S#, P#),
Foreign Key SKF (S#) References S
On Delete Cascade,
Foreign Key PKF (P#) References P
On Delete Restrict
);
Views
Introduction

View is a named virtual table that is derived from a base table.


It does not exist in it’s own right, but appears to the user as if
it did.

Views do not have their own physical seperate, distinguishable


stored data. Instead their definition in terms of other tables is
stored in the catalog.

An example:

Create View GOOD_SUPPLIERS


As Select S#, STATUS, CITY
From S
Where STATUS > 15 ;

GOOD_SUPPLIERS is in effect a window into the real table S.

Further this window is dynamic in nature, changes to S would


automatically and instantaneously be visible thru that window.
Likewise changes to GOOD_SUPPLIERS would automatically
and instantaneously be applied to real table S.

Users can operate on GOOD_SUPPLIERS as if it were a real


table. The system handles the operation by converting it into an
equivalent operation on the underlying base table.
View Creation

Examples

Create View REDPARTS (P#, PNAME, WT, CITY)


As Select P#, PNAME, WEIGHT, CITY
From P
Where COLOR = ‘RED’ ;

Create View PQ (P#, TOTQTY)


As Select P#, Sum(QTY)
From SP
Group By P# ;

Create View SUPPLIER_PARTS (S#, P#, SNAME, PNAME)


As Select SP.S#, SP.P#, S.SNAME, P.PNAME
From S, SP, P ;

Create View LONDON_REDPARTS


As Select P#, WT
From REDPARTS
Where CITY = ‘LONDON’ ;

Create View GOOD_SUPPLIERS


As Select S#, STATUS, CITY
From S
Where STATUS > 15
With Check Option ;
Types of Views

Column subset Views

Create View SCITY


As Select S#, CITY
From S

Create View STATUS_CITY


As Select STATUS, CITY
From S

The above views are created such that they are vertical or
column subsets of base tables, hence they are called
column-subset views.

View SCITY includes the primary key of the base table, whereas
view STATUS_CITY does not.

For a given record, in the view STATUS_CITY, it would be


impossible to identify the corresponding record in the base table.
This is because the view does not include the primary key of the
base table.

Views that include the primary key of the base tables are known
as Key preserving views. Column subset views are theoretically
updatable if they preserve the primary key of the base table.
Types of Views

Row subset Views

Create View LONDON_SUPPLIERS


As Select S#, SNAME, STATUS, CITY
From S
Where CITY = ‘LONDON’ ;

The above view is created such that it is horizontal or row


subset of the base tables, hence it is called row-subset view.

The view LONDON_SUPPLIERS includes the primary key


of the base table, hence it is a Key preserving view.

Row subset views are theoretically updatable if they preserve


the primary key of the base table.
Types of Views

Join Views

Create View SUPPLIER_PART


As Select SP.S#, SP.P#, S.SNAME
From S,SP
Where S.S# = SP.S# ;

View SUPPLIER_PART is constructed from join of two tables,


these are known as colocated views. Colocated views suffer
from all kinds of problems from standpoint of updatability.

Statistical Summary

Create View PQ (P#, TOTQTY)


As Select P#, Sum(QTY)
From SP
Group By P# ;

The view PQ is constructed such that each row of the view is


a result of some aggregate function on a set of rows in the base
table.

Such a view cannot be updated as it would be impossible to


know how to distribute the updates in the rows of the base table.
View Updatability

Updatable views are those on which Insert, Delete and Update


operations can occur. Not all views are Updatable.

There are some views that are theoretically updatable but are
not updatable in SQL systems.

In general Join views cannot be updated, however there are some


views that are not joins which cannot be updated.

Check Option

Create View GOOD_SUPPLIERS


As Select S#, STATUS, CITY
From S
Where STATUS > 15 ;

The view is row-column subset, key preserving and updatable.


Would it be possible to insert a supplier with STATUS = 10,
the CHECK option is designed to deal with such situations.

During Insert and Update operations, the view is checked to


ensure that all Inserted and Updated rows satisfy the view
definition condition.

If Check option is not specified, all data would be accepted,


but some newly Inserted or Updated rows may disappear from
the view.
Logical Data Independence

Since application programs are not dependent on the physical


structure of stored database, DB2 provides physical data
independence.

If application programs are also independent of the logical


structure of database, the system is said to provide logical
data independence.

Logical structure of database can change due to two aspects,


Growth & Restructuring.

Growth

•Database grows to incorporate new kinds of information


•A table is expanded to include new fields
•The database is expanded to include a new table
•Growth does not affect application programs in DB2

Restructuring

•Database is restructured so that overall information remains same


•Placement of information within database changes
•Restructuring is undesirable, but unavoidable
Logical data Independence

An Example

A base table S(S#,SNAME,STATUS,CITY) is split into two


tables SX(S#,SNAME,CITY) and SY(S#,STATUS)

Application programs that referred to base table S would need to


be changed, since they would not have to refer to SX & SY
for any database operations.

The old table S can be reconstructed as a join of SX & SY.


Hence the view can substitute reference to old base table S after
it is split. Application programs that referred to base table S
would not refer to the view S, hence they need not undergo
change.

Create View S(S#,SNAME, STATUS,CITY)


As Select SX.S#,SX.SNAME,SY.STATUS,SX.CITY
From SX,SY
Where SX.S# = SY.S# ;

Having create the view S, Select operations would continue to


work as before, however Update operations would not work.

Although such a view is theoretically updatable, DB2 does not


allow updates on a view that is defined as a join.
Thus application programs performing update operations are not
immune to this type of change.
Advantages of Views

•Provide certain amount of logical data independence


•allow some data to be seen by different users in different ways
•simplifies user’ perception
•allows focus on data that is of concern and ignore the rest
•provides automatic security
SQL Access Guidelines
Never Use SELECT *
•Never ask DB2 anything more than required
•Query should access only those columns that are needed
•Changes to table structure may imply changes to program

Singleton SELECT verses Cursor


•Singleton Select outperforms Cursor
•When a row needs to be retrieved, cursor is preferred
•FOR UPDATE clause of cursor ensures integrity
•DB2 places an X lock prohibiting concurrent updates

Use FOR FETCH ONLY


•Enales DB2 to use block fetch
•Increases efficiency

Avoid using DISTINCT


•Distinct eliminates duplicates
•Invokes Sort to eliminate duplicates
•Code only when duplicate elimination is mandatory

Limit the SELECTed data


•Select should return minimal but required rows
•Do not code generic queries without WHERE clause
•More efficient to use WHERE clause to restrict retrieval
Code Predicates on Indexed columns
•Requests satisfied more efficiently using an existing index
•Not efficient when most rows in a table are to be accessed

Multi Column Indexes


•Used when high-level column is specified in WHERE clause

Several Indexes instead of multicolumn Index


•Multiple indexes more efficient than single multicolumn index
•Provide better overall performance for all queries
•At the expense of individual queries

Use ORDER BY when sequence is important


•DB2 doesn’t guarantee the order of rows returned
•Path of data retrieved may change from each execution
•ORDER BY is mandatory when sequence is important

Limit columns in ORDER BY clause


•For ORDER BY clause, DB2 invokes a Sort
•The more columns in ORDER BY, the less efficient
•Specify only those columns that are essential

Use Equivalent data types


•Essential when comparing column values to host variables
•Eliminates need for data conversion
•Index is not used if data types incompatible
Use Between instead of <= and >=
•Between more efficient than combination of <= and >=
•Optimizer selects a more efficient path

Use IN instead of LIKE


•For known list of data occurrences use IN
•IN with specific list is more efficient than LIKE

Formulate LIKE predicates with care


•Avoid % or _ at the begining of comparison string
•Avoid using LIKE with host variable

Avoid using NOT (except with EXISTS)


•Recode queries to avoid use of NOT
•By taking advantage of knowledge of data being accessed

Code most restrictive predicate first


•Place predicate that eliminates greatest number of rows first

Use Predicates wisely


•Reduce number of predicates
•Know your data to reduce predicates

Specify number of rows to be returned


•Code cursor statement with OPTIMIZE FOR n ROWS
•DB2 selects optimal path
•Does not prevent program from fetching more rows
Complex SQL Guidelines
UNION versus UNION ALL
•Union invokes a sort, UNION ALL does not
•Use UNION ALL when retrieved data does not have duplicates

Use NOT EXISTS instead of NOT IN


•NOT EXISTS verifies non existance
•NOT IN must have complete set of rows materialized
•Use NOT EXISTS for subquery using negation logic

Use constant for existance checking


•If subquery tests existance, specify constant in select-list
•Select-list of subquery is unimportant, will not return data

Predicate transitive closure rules


•DB2 optimizer uses the rule of transitivity, which states
•If a=b and b=c then a=c
•Efficient to code a redundant predicate to exploit transitivity
where a.col1 = b.col1 where a.col1 = b.col1
and a.col1 = :hostvar and a.col1 = :hostvar
and b.col1 = :hostvar

Minimize number of tables in a join


•Do not join more than 5 tables
•Eliminate unnecessary tables from join statements

Denormalize to reduce joining


•To minimize the need for joins, consider denormalizing
•Would imply redundancy, dual updating & usage of space
Reduce the number of rows to be joined
•number of rows participating in join determines response time
•to reduce join response time, reduce number of rows
•code predicates to minimize number of rows

Join using SQL instead of program logic


•efficient to join using SQL instead of application code
•optimizer has a vast array of tools of optimize performance
•application code would not consider equivalent optimizations

Use Joins instead of Subqueries


•Join is more efficient than a correlated subquery or using IN

Join on clustered column


•Use clustered columns in join criteria when possible
•reduces the need for immediate sorts
•might require clustering parent table by primary key
•and child table by foreign key

Join in Indexed columns


•efficient when tables are joined on indexed columns
•consider creating indexes for join predicates

Avoid Cartesian products


•Never use a join without a predicate
•A join without a predicate results in a cartesian product
•A lot of resources are spent on such a join
Provide adequate search criteria
•provide additional search criteria in WHERE clause
•these to be in addition to the join criteria
•provides DB2 an opportunity for ranking tables to be joined
•allows queries to perform adequately

Limit columns to be used in GROUP BY


•specify only minimum columns to be grouped
•DB2 needs to sort retrieved data
•the more columns, the more expensive is the sort
Increase the possibility of Stage 1 processing

A predicate that is satisfied by Stage 1 processing is evaluated


by the Data Manager portion of DB2 rather than the
Relational Data System. The Data Manager component is
at a closer level to data than the Relational System.

Stage 1 predicate is evaluated at earlier stage of data


retrieval, thereby avoiding the overhead of passing data
from one component to another.

The following list shows predicates satisfied by Stage 1

Colname operator value


Colname IS NULL
Colname BETWEEN val1 AND val2
Colname IN (list)
Colname LIKE pattern
Colname LIKE hostvariable
a.Colname operator b.Colname

The last item in list refers to two columns in different tables.


Predicates formulated with AND, OR, NOT are not at Stage 1
Increase the possibility of Index processing

A query that can use an index has more access path options,
so it has the capability of being a more efficient query than
the query that cannot use an index. DB2 optimizer can use
an index or indexes in a variety of ways to speed the retrieval
of data from DB2 tables.

The following list shows predicates satisfied by using Index

Colname operator value


Colname IS NULL
Colname BETWEEN val1 AND val2
Colname IN (list)
Colname LIKE pattern
Colname LIKE hostvariable
a.Colname operator b.Colname

The last item in list refers to two columns in different tables.


Predicates formulated with AND, OR, NOT are not Indexable

You might also like