
SQL and More Databases Final

Simple SQL Queries


A SQL query has a form:
SELECT . . .
FROM . . .
WHERE . . .;
The SELECT clause indicates which attributes should
appear in the output.
The FROM gives the relation(s) the query refers to
The WHERE clause is a Boolean expression indicating
which tuples are of interest.
The query result is a relation
Note that the result relation is unnamed.
Example SQL Query
Relation schema:
Course (courseNumber, name, noOfCredits)
Query:
Find all the courses stored in the database
Query in SQL:
SELECT *
FROM Course;
Note:
* means all the attributes in the relations involved.
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query:
Find the titles of all movies stored in the database
Query in SQL:
SELECT title
FROM Movie;
Example SQL Query
Relation schema:
Student (ID, firstName, lastName, address, GPA)
Query:
Find the ID of every student who has GPA > 3
Query in SQL:
SELECT ID
FROM Student
WHERE GPA > 3;
Example SQL Query
Relation schema:
Student (ID, firstName, lastName, address, GPA)
Query:
Find the ID and last name of every student with first name John,
who has GPA > 3
Query in SQL:
SELECT ID, lastName
FROM Student
WHERE firstName = 'John' AND GPA > 3;
WHERE clause
The expressions that may follow WHERE are conditions
Standard comparison operators include { =, <>, <, >, <=, >= }
The values that may be compared include constants and
attributes of the relation(s) mentioned in FROM clause
Simple expression
A op Value
A op B
Where A, B are attributes and op is a comparison operator
We may also apply the usual arithmetic operators, +,-,*,/, etc. to
numeric values before comparing them
(year - 1930) * (year - 1930) < 100
The result of a comparison is a Boolean value TRUE or FALSE
Boolean expressions can be combined by the logical operators
AND, OR, and NOT
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query:
Find the titles of all color movies produced in 1990
Query in SQL:
SELECT title
FROM Movie
WHERE filmType = 'color' AND year = 1990;
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query:
Find the titles of all color movies that are either made after 1970 or
are less than 90 minutes long
Query in SQL:
SELECT title
FROM Movie
WHERE (year > 1970 OR length < 90) AND filmType = 'color';
Note on precedence rules:
AND takes precedence over OR, and
NOT takes precedence over both
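The effect of the precedence rules can be checked with a small sketch. It uses Python's built-in sqlite3 module and four made-up Movie rows (the titles and values are assumptions for illustration only), and runs the condition from the previous example with and without the parentheses.

```python
import sqlite3

# Hypothetical sample rows for Movie(title, year, length, filmType).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, filmType TEXT)"
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("OldShortBW", 1950, 80, "blackAndWhite"),
        ("NewLongColor", 1985, 120, "color"),
        ("OldLongColor", 1960, 130, "color"),
        ("NewLongBW", 1990, 110, "blackAndWhite"),
    ],
)

def titles(where_clause):
    """Return the sorted titles selected by the given WHERE condition."""
    rows = conn.execute(f"SELECT title FROM Movie WHERE {where_clause}")
    return sorted(row[0] for row in rows)

# Parenthesized as in the slide: the filmType test applies to both alternatives.
with_parens = titles("(year > 1970 OR length < 90) AND filmType = 'color'")

# Without parentheses AND binds tighter:
# year > 1970 OR (length < 90 AND filmType = 'color')
without_parens = titles("year > 1970 OR length < 90 AND filmType = 'color'")

print(with_parens)
print(without_parens)
```

Dropping the parentheses lets a black-and-white movie made after 1970 into the result, which is not what the query asks for.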
Products and Joins
SQL has a simple way to couple relations in one query
list each relevant relation in the FROM clause
All the relations in the FROM clause are coupled through
Cartesian product (×, in algebra)
Cartesian Product
From Set Theory:
The Cartesian Product of two sets R and S is the
set of all pairs (a, b) such that: a ∈ R and b ∈ S.
Denoted as R × S
Note:
In general, R × S ≠ S × R
Example
Instance R:
A  B
1  2
3  4
Instance S:
B  C   D
2  5   6
4  7   8
9  10  11
R × S:
A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11
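The product above can be reproduced with a short sketch using Python's built-in sqlite3 module. The tables R and S hold exactly the instances from this example; listing both in the FROM clause with no WHERE clause yields their Cartesian product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.execute("CREATE TABLE S (B INTEGER, C INTEGER, D INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (3, 4)])
conn.executemany(
    "INSERT INTO S VALUES (?, ?, ?)", [(2, 5, 6), (4, 7, 8), (9, 10, 11)]
)

# No WHERE clause: every tuple of R is paired with every tuple of S.
# Result columns are A, R.B, S.B, C, D.
product = sorted(conn.execute("SELECT * FROM R, S"))

for row in product:
    print(row)
```

The result has |R| × |S| = 2 × 3 = 6 tuples.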
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Course:
courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4
SELECT * FROM Student, Course;
ID   firstName  lastName  GPA  Address      courseNumber  name             noOfCredits
111  Joe        Smith     4.0  45 Pine av.  Comp352       Data structures  3
111  Joe        Smith     4.0  45 Pine av.  Comp353       Databases        4
222  Sue        Brown     3.1  71 Main st.  Comp352       Data structures  3
222  Sue        Brown     3.1  71 Main st.  Comp353       Databases        4
333  Ann        Johns     3.7  39 Bay st.   Comp352       Data structures  3
333  Ann        Johns     3.7  39 Bay st.   Comp353       Databases        4
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Course:
courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4
SELECT ID, courseNumber
FROM Student, Course;
ID   courseNumber
111  Comp352
111  Comp353
222  Comp352
222  Comp353
333  Comp352
333  Comp353
Example
Relation schemas:
Student (ID, firstName, lastName, address, GPA)
Ugrad (ID, major)
Query:
Find all information available about every undergraduate student
We can try to compute the Cartesian product (×)
SELECT * FROM Student, Ugrad;
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Ugrad:
ID   major
111  CS
333  EE
SELECT * FROM Student, Ugrad;
ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
111  Joe        Smith     4.0  45 Pine av.  333  EE
222  Sue        Brown     3.1  71 Main st.  111  CS
222  Sue        Brown     3.1  71 Main st.  333  EE
333  Ann        Johns     3.7  39 Bay st.   111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE
Which tuples should be in the query result and which shouldn't?
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Ugrad:
ID   major
111  CS
333  EE
SELECT *
FROM Student, Ugrad
WHERE Student.ID = Ugrad.ID;
ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE
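As a quick check of the example above, the following sketch loads the same Student and Ugrad instances into an in-memory SQLite database (via Python's sqlite3 module) and compares the size of the plain product with the result of the query that adds the WHERE condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student "
    "(ID INTEGER, firstName TEXT, lastName TEXT, address TEXT, GPA REAL)"
)
conn.execute("CREATE TABLE Ugrad (ID INTEGER, major TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
    [
        (111, "Joe", "Smith", "45 Pine av.", 4.0),
        (222, "Sue", "Brown", "71 Main st.", 3.1),
        (333, "Ann", "Johns", "39 Bay st.", 3.7),
    ],
)
conn.executemany("INSERT INTO Ugrad VALUES (?, ?)", [(111, "CS"), (333, "EE")])

# Cartesian product: 3 students x 2 undergraduate records.
product_size = conn.execute("SELECT COUNT(*) FROM Student, Ugrad").fetchone()[0]

# The WHERE condition keeps only the pairs that describe the same student.
joined = sorted(
    conn.execute(
        "SELECT Student.ID, lastName, major "
        "FROM Student, Ugrad "
        "WHERE Student.ID = Ugrad.ID"
    )
)

print(product_size)
print(joined)
```

Student 222 has no Ugrad tuple, so it does not appear in the joined result.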
Joins in SQL
The above query is an example of a join operation
There are various kinds of joins and we will study them
later in detail
To join relations R₁, …, Rₙ in SQL:
List all these relations in the FROM clause
Express the conditions in the WHERE clause in order to get the
desired join
Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
Query:
Find title, length, and studio name of every movie
Query in SQL:
SELECT Movie.title, Movie.length, Owns.studioName
FROM Movie, Owns
WHERE Movie.title = Owns.title AND Movie.year = Owns.year;
Is "Owns" in "Owns.studioName" necessary?


Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
Query:
Find the title and length of every movie produced by Disney
Query in SQL:
SELECT Movie.title, length
FROM Movie, Owns
WHERE Movie.title = Owns.title AND Movie.year = Owns.year
AND studioName = 'Disney';
Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
StarsIn (title, year, starName)
Query:
Find the title and length of each movie with Julia Roberts,
produced by Disney
Query in SQL:
SELECT Movie.title, Movie.length
FROM Movie, Owns, StarsIn
WHERE Movie.title = Owns.title AND Movie.year = Owns.year
AND Movie.title = StarsIn.title AND Movie.year = StarsIn.year
AND studioName = 'Disney' AND starName = 'Julia Roberts';
Example
Movie
title  year  length  filmType
T1     1990  124     color
T2     1991  144     color
Owns
title  year  studioName
T1     1990  Disney
T2     1991  MGM
StarsIn
title  year  starName
T1     1990  JR
T2     1991  JR
SELECT Movie.title, Movie.length
FROM Movie, Owns, StarsIn
WHERE Movie.title = Owns.title AND Movie.year = Owns.year AND
Movie.title = StarsIn.title AND Movie.year = StarsIn.year AND
studioName = 'Disney' AND starName = 'Julia Roberts';
title  length
T1     124
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query:
Find the name of the producer of Star Wars
Query in SQL:
SELECT Exec.name
FROM Movie, Exec
WHERE Movie.title = 'Star Wars' AND
Movie.producerC# = Exec.cert#;
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query:
Find the name of the producer of Star Wars
Query with Subquery:
SELECT name
FROM Exec
WHERE cert# = ( SELECT producerC#
FROM Movie
WHERE title = 'Star Wars' );
Example
Relation schemas:
Movie(title, year, length, filmType, studioName, producerC#)
Exec(name, address, cert#, netWorth)
StarsIn(title, year, starName)
Query: Find the names of the producers of Harrison Ford's movies
Query in SQL:
SELECT name
FROM Exec
WHERE cert# IN (SELECT producerC#
FROM Movie
WHERE (title, year) IN (SELECT title, year
FROM StarsIn
WHERE starName = 'Harrison Ford'));
Example
Relation schemas:
Movie(title, year, length, filmType, studioName, producerC#)
Exec(name, address, cert#, netWorth)
StarsIn(title, year, starName)
Query: Find names of the producers of Harrison Ford's movies
Query in SQL:
SELECT Exec.name
FROM Exec, Movie, StarsIn
WHERE Exec.cert# = Movie.producerC# AND
Movie.title = StarsIn.title AND
Movie.year = StarsIn.year AND
starName = 'Harrison Ford';
Correlated Subqueries
Relation schema:
Movie(title, year, length, filmType, studioName, producerC#)
Query:
Find movie titles that appear more than once
Query in SQL:
SELECT title
FROM Movie Old
WHERE year < ANY (SELECT year
FROM Movie
WHERE title = Old.title);
Note the scopes of the variables in this query.
Correlated Subqueries
Query in SQL
SELECT title
FROM Movie Old
WHERE year < ANY (SELECT year
FROM Movie
WHERE title = Old.title);
The condition in the outer WHERE is true only if there is a movie with same
title as Old.title that has a later year
The query will produce a title one fewer times than there are movies with that title
What would be the result if we used <>, instead of < ?
For a movie title appearing 3 times, we would get 3 copies of the title in the output
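The counts claimed above can be checked with a small sketch. SQLite (used here through Python's sqlite3 module) does not support the ANY quantifier, so the sketch rewrites `year < ANY (...)` as an equivalent EXISTS subquery; that rewrite, the alias Other, the cut-down Movie(title, year) table and its rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical Movie(title, year) table; other attributes omitted.
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [
        ("King Kong", 1933),
        ("King Kong", 1976),
        ("King Kong", 2005),
        ("Star Wars", 1977),
    ],
)

def titles(comparison):
    """Run the correlated query with the given comparison between years."""
    sql = f"""
        SELECT title
        FROM Movie Old
        WHERE EXISTS (SELECT 1
                      FROM Movie Other
                      WHERE Other.title = Old.title
                        AND Old.year {comparison} Other.year)
    """
    return [row[0] for row in conn.execute(sql)]

earlier_than_some = titles("<")      # stands in for: year < ANY (...)
different_from_some = titles("<>")   # stands in for: year <> ANY (...)

print(earlier_than_some)
print(different_from_some)
```

With three King Kong tuples, the `<` version returns the title twice and the `<>` version returns it three times; Star Wars appears once in the table and is never returned.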
Aggregation in SQL
SQL provides five operators that apply to a column of
a relation and produce some kind of summary
These operators are called aggregations
These operators are used by applying them to a
scalar-valued expression, typically a column name, in
a SELECT clause
Aggregation Operators
SUM
the sum of values in the column
AVG
the average of values in the column
MIN
the least value in the column
MAX
the greatest value in the column
COUNT
the number of values in the column, including the duplicates, unless
the keyword DISTINCT is used explicitly
Example
Relation schema:
Exec(name, address, cert#, netWorth)
Query:
Find the average net worth of all movie executives
Query in SQL:
SELECT AVG(netWorth)
FROM Exec;
The sum of all values in the column netWorth divided by
the number of these values
In general, if a tuple appears n times in a relation, it will be
counted n times when computing the average
Example
Relation schema:
Exec (name, address, cert#, netWorth)
Query:
How many tuples are there in the Exec relation?
Query in SQL:
SELECT COUNT(*)
FROM Exec;
The use of * as a parameter is unique to COUNT;
using * does not make sense for other aggregation operations
Example
Relation schema:
Exec (name, address, cert#, netWorth)
Query:
How many different names are there in the Exec relation?
Query in SQL:
SELECT COUNT (DISTINCT name)
FROM Exec;
At query processing time, the system first eliminates the duplicates
from column name, and then counts the number of remaining values
Aggregation -- Grouping
Often we need to consider the tuples in an SQL query in
groups, with regard to the value of some other column(s)
Example: suppose we want to compute:
Total length in minutes of movies produced by each studio:
Movie(title, year, length, filmType, studioName, producerC#)
We must group the tuples in the Movie relation according to
their studio, and get the sum of the length values within each
group; the result would be something like:
studio SUM(length)
Disney 12345
MGM 54321


Aggregation - Grouping
Relation schema:
Movie(title, year, length, filmType, studioName, producerC#)
Query: What is the total length in minutes produced by each studio?
Query in SQL:
SELECT studioName, SUM(length)
FROM Movie
GROUP BY studioName;
Whatever aggregation used in the SELECT clause will be applied
only within groups
Only those attributes mentioned in the GROUP BY clause may
appear unaggregated in the SELECT clause
Can we use GROUP BY without using aggregation? (Yes/No)
Aggregation -- Grouping
Relation schema:
Movie(title, year, length, filmType, studioName, producerC#)
Exec(name, address, cert#, netWorth)
Query:
For each producer (name), list the total length of the films produced
Query in SQL:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE Movie.producerC# = Exec.cert#
GROUP BY Exec.name;
Aggregation -- HAVING clause
We might be interested in not all but some groups of tuples
that satisfy certain conditions
We can follow a GROUP BY clause with a HAVING clause
HAVING is followed by some conditions about the group
HAVING is normally used together with GROUP BY; without GROUP BY,
standard SQL treats the whole result as a single group
Aggregation -- HAVING clause
Relation schema:
Movie (title, year, length, filmType, studioName, producerC#)
Exec(name, address, cert#, netWorth)
Query:
For those producers who made at least one film prior to 1930, list the
total length of the films produced
Query in SQL:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930;
Aggregation -- HAVING clause
This query chooses the group based on the property of the group

SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930;

This query chooses the movies based on the property of each movie tuple

SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert# AND Movie.year < 1930
GROUP BY Exec.name;

Note the difference!
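The difference can be seen on a tiny instance. The sketch below uses Python's sqlite3 module; because `#` is not valid in an unquoted SQLite identifier, the hypothetical column names certNo and producerCNo stand in for cert# and producerC#, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# certNo / producerCNo are stand-ins for cert# / producerC# (assumption).
conn.execute("CREATE TABLE Exec (name TEXT, certNo INTEGER)")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, producerCNo INTEGER)"
)
conn.executemany(
    "INSERT INTO Exec VALUES (?, ?)", [("Early Producer", 1), ("Late Producer", 2)]
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("Silent One", 1925, 100, 1),
        ("Talkie Two", 1950, 120, 1),
        ("Modern Three", 1960, 90, 2),
    ],
)

# HAVING filters whole groups: keep producers with at least one pre-1930 film,
# then sum the lengths of ALL their films.
group_filter = conn.execute("""
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE producerCNo = certNo
    GROUP BY Exec.name
    HAVING MIN(Movie.year) < 1930
""").fetchall()

# WHERE filters tuples before grouping: only pre-1930 films are summed.
tuple_filter = conn.execute("""
    SELECT Exec.name, SUM(Movie.length)
    FROM Exec, Movie
    WHERE producerCNo = certNo AND Movie.year < 1930
    GROUP BY Exec.name
""").fetchall()

print(group_filter)
print(tuple_filter)
```

Both queries report the same producer, but the first total covers both of that producer's films (220 minutes) while the second covers only the 1925 film (100 minutes).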

Order By
The SQL statements/queries we looked at so far return an unordered
relation/bag (except when using ORDER BY)
Movie (title, year, length, filmType, studioName, producerC#)

SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930
ORDER BY Exec.name ASC;

In general:
ORDER BY A1 ASC, B DESC, C ASC;

Database Modifications
SQL & Database Modifications?
Next we will look at SQL statements that do not return something,
but rather change the state of the database
There are three types of such SQL statements/transactions:
Insert tuples into a relation
Delete certain tuples from a relation
Update values of certain components of certain existing tuples
We refer to these types of operations collectively as database
modifications, and refer to such requests as transactions
Insertion
The insertion statement consists of:
The keyword INSERT INTO
The name of a relation R
A parenthesized list of attributes of the relation R
The keyword VALUES
A tuple expression, that is, a parenthesized list of concrete values,
one for each attribute in the attribute list
The form of an insert statement:
INSERT INTO R(A₁, …, Aₙ)
VALUES (v₁, …, vₙ);
A tuple is created and added, where vᵢ is the value of
attribute Aᵢ, for i = 1, 2, …, n
Insertion
Relation schema:
StarsIn (title, year, starName)
Update the database:
Add Sydney Greenstreet to the list of stars of The Maltese Falcon

In SQL:
INSERT INTO StarsIn (title, year, starName)
VALUES('The Maltese Falcon', 1942, 'Sydney Greenstreet');
Another formulation of this query:
INSERT INTO StarsIn
VALUES('The Maltese Falcon', 1942, 'Sydney Greenstreet');


Insertion
The previous insertion statement was very simple
It added only one tuple into a relation
Instead of using explicit values for one tuple, we can
compute a set of tuples to be inserted using a subquery
This subquery replaces the keyword VALUES and the tuple
expression in the INSERT statement
Insertion
Database schema:
Studio(name, address, presC#)
Movie(title, year, length, filmType, studioName, producerC#)
Update the database:
Add to Studio, all studio names mentioned in the Movie relation

If the list of attributes does not include all attributes of relation
R, then the tuple created has default values for the missing
attributes
Since there is no way to determine an address or a
president for such a studio value, NULL will be used for the
attributes address and presC#
Insertion
Database schema:
Studio(name, address, presC#)
Movie(title, year, length, filmType, studioName, producerC#)
Update the database:
Add to Studio, all studio names mentioned in the Movie relation

In SQL:
INSERT INTO Studio(name)
SELECT DISTINCT studioName
FROM Movie
WHERE studioName NOT IN (SELECT name
FROM Studio);
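A minimal sketch of this insertion, run against an in-memory SQLite database through Python's sqlite3 module. The column presCNo is a hypothetical stand-in for presC# (which SQLite would not accept unquoted), Movie is cut down to two columns, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# presCNo stands in for presC#; Movie is reduced to the columns needed here.
conn.execute("CREATE TABLE Studio (name TEXT, address TEXT, presCNo INTEGER)")
conn.execute("CREATE TABLE Movie (title TEXT, studioName TEXT)")
conn.execute("INSERT INTO Studio VALUES ('Disney', 'Burbank', 1)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [("M1", "Disney"), ("M2", "MGM"), ("M3", "MGM")],
)

# The subquery replaces VALUES: one new tuple per studio name not yet in Studio.
conn.execute("""
    INSERT INTO Studio(name)
    SELECT DISTINCT studioName
    FROM Movie
    WHERE studioName NOT IN (SELECT name FROM Studio)
""")

studios = conn.execute(
    "SELECT name, address, presCNo FROM Studio ORDER BY name"
).fetchall()
print(studios)
```

MGM is added exactly once, and its address and presCNo are NULL (None in Python) because no values were supplied for them.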
Deletion
A deletion statement consists of :
The keyword DELETE FROM
The name of a relation R
The keyword WHERE
A condition
The form of a delete statement:
DELETE FROM R WHERE <condition>;
The effect of executing this statement is that every tuple in
relation R satisfying the condition will be deleted from R
Note: unlike INSERT, DELETE uses a WHERE clause to select tuples;
if the WHERE clause is omitted, every tuple of R is deleted
Deletion
Relation schema:
StarsIn(title, year, starName)
Update:
Delete the fact that Sydney Greenstreet was a star in The Maltese Falcon

In SQL:
DELETE FROM StarsIn
WHERE title = 'The Maltese Falcon' AND
starName = 'Sydney Greenstreet';
Deletion
Relation schema:
Exec(name, address, cert#, netWorth)
Update:
Delete every movie executive whose net worth is < $10,000,000

In SQL:
DELETE FROM Exec
WHERE netWorth < 10,000,000;

Anything wrong here?!
Deletion
Relation schema:
Studio(name, address, presC#)
Movie(title, year, length, filmType, studioName, producerC#)
Update:
Delete from Studio all studios that are not mentioned in
Movie (i.e., we don't want to have non-producing studios)

In SQL:
DELETE FROM Studio
WHERE name NOT IN (SELECT StudioName
FROM Movie);
Defining Database Schema
To create a table in SQL:
CREATE TABLE name (list of elements);
Principal elements are attributes and their types, but key
declarations and constraints may also appear
Example:
CREATE TABLE Star (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1),
birthdate DATE
);
Defining Database Schema
To delete a table:
DROP TABLE name;
Example:
DROP TABLE Star;
Data types
INT or INTEGER
REAL or FLOAT
DECIMAL(n, d) -- NUMERIC(n, d)
DECIMAL(6, 2), e.g., 0123.45
CHAR(n) / BIT(n): fixed-length character / bit string
Unused part is padded with the "pad character"
VARCHAR(n) / BIT VARYING(n): variable-length strings of up
to n characters
Data types (cont'd)
Time:
SQL2 format is TIME 'hh:mm:ss[.ss...]'
Date:
SQL2 format is DATE 'yyyy-mm-dd' (m = 0 or 1)
The default format of date in Oracle is 'dd-mon-yy'
Example:
CREATE TABLE Days(d DATE);
INSERT INTO Days VALUES('08-aug-02');
Oracle function to_date converts a specified format into
default format, e.g.,
INSERT INTO Days VALUES (to_date('2002-08-08', 'yyyy-mm-dd'));
Altering Relation Schemas
Adding Columns
Add an attribute to a relation R with
ALTER TABLE R ADD <column declaration>;
Example: Add attribute phone to table Star
ALTER TABLE Star ADD phone CHAR(16);
Removing Columns
Remove an attribute from a relation R using DROP:
ALTER TABLE R DROP COLUMN <column_name>;
Example: Remove column phone from Star
ALTER TABLE Star DROP COLUMN phone;
Note: Can't drop a column if it is the only column
Attribute Properties
We can assert the following about the value of an attribute:
NOT NULL
every tuple must have a real (non-null) value for this attribute
DEFAULT value
Null is the default value for every attribute A
The owner of the relation can define some other value as the
default, instead of NULL
Attribute Properties
CREATE TABLE Star (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1) DEFAULT '?',
birthdate DATE NOT NULL);
Example: Add an attribute with a default value:
ALTER TABLE Star ADD phone CHAR(16) DEFAULT 'unlisted';
INSERT INTO Star(name, birthdate) VALUES ('Sally', '0000-00-00');
name   address  gender  birthdate   phone
Sally  NULL     ?       0000-00-00  unlisted
INSERT INTO Star(name, phone) VALUES ('Sally', '333-2255');
This insertion cannot be performed: no value is given for birthdate,
and NULL is not allowed as its default
Schema Refinement
Functional Dependencies:
Essential to Normalization
Theory

Functional Dependencies
Consider the relation:
Movie (title, year, length, filmType, studioName, starName)
What are the functional dependencies?
title, year → length
title, year → filmType
title, year → studioName
title, year → length, filmType, studioName
Note that the FD title, year → starName does not hold


Logical Implication: Reasoning with FDs
Consider relation R(A, B,C) with the set of FDs:
F = { A → B, B → C }
We can deduce from F that A → C also holds on R.
How? Apply the definition
To detect possible redundancy, is it necessary to
consider all the given FDs?
As shown above, there might be some additional hidden
(nontrivial) FDs implied by a given set of FDs
Logical Implication (Cont'd)
Consider R(A₁, A₂, A₃, A₄, A₅) with FDs:
F = { A₁ → A₂, A₂ → A₃, A₂A₃ → A₄, A₂A₃A₄ → A₅ }
Prove that F ⊭ A₅ → A₁
Solution method: Provide a counter-example; give a relation
instance r of R that satisfies every FD in F but not A₅ → A₁

     A₁  A₂  A₃  A₄  A₅
t1:  0   1   1   1   1
t2:  1   1   1   1   1

A desired instance r of R.

Closure of a set of FDs
Defn: The closure of F, denoted F⁺, is the set of
FDs that are logically implied by F
How can we compute F⁺?
Definitely, F⁺ includes F but possibly more FDs
We need to know how to reason about FDs

Equivalence
Defn: Suppose R is a relation schema, and S and T are
sets of functional dependencies on R.
T and S are equivalent (S ≡ T) if S⁺ = T⁺

Example: Suppose R = {A, B, C}, and
S = { A → B, B → C, A → C }
T = { A → B, B → C }
Can show that S ≡ T
Armstrong's Axioms [1974]
R is a relation schema, and X, Y and Z are subsets of R.
Reflexivity
If Y ⊆ X, then X → Y (trivial FDs)
Augmentation
If X → Y, then XZ → YZ, for every Z
Transitivity
If X → Y and Y → Z, then X → Z
These are sound and complete inference rules for FDs
Additional rules / axioms
Other useful rules that follow from Armstrong's Axioms
Union (Combining) Rule
If X → Y and X → Z, then X → YZ
Decomposition (Splitting) Rule
If X → YZ, then X → Y and X → Z
Pseudotransitivity Rule
If X → Y and WY → Z, then XW → Z
NOTE: X, Y, Z, and W are sets of attributes
Example: Discovering hidden FDs
Consider a relation schema R = {A, B, C, G, H, I} with
FDs F = { A → B, A → C, CG → H, CG → I, B → H }
Using these rules, we can derive the following FDs
Since A → B and B → H, then A → H, by transitivity
Since CG → H and CG → I, then CG → HI, by union
Since A → C then AG → CG, by augmentation
Now, since AG → CG and CG → I, then AG → I, by
transitivity (Do AG → H)
Many trivial dependencies can be derived(!) by augmentation
Computing the Closure of Attributes
Given a set F of FDs and a set X of attributes, how do we
compute the closure of X w.r.t. F?
Starting with X, we repeatedly expand the set, by adding the right
hand side (RHS) of every FD, once we have included its LHS in
the set.
When the set cannot be expanded anymore, we have obtained
the result, X⁺
An Algorithm to Compute X⁺ under F

X⁺ ← X (initialization step)
repeat
    for each FD W → Z in F do:
        if W ⊆ X⁺ then
            X⁺ ← X⁺ ∪ Z // include Z in the result
until X⁺ does not change anymore



Complexity question: In the worst case, how many times
will the repeat statement be executed?
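The algorithm can be sketched in a few lines of Python. The representation is an assumption made for illustration: an FD is a pair of strings (LHS, RHS) whose characters are single-letter attribute names, and the function name closure is hypothetical. The FD set is the one used in the worked example of this section.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs in fds.

    fds is an iterable of (lhs, rhs) pairs; each side is a string (or set)
    of single-letter attribute names.
    """
    result = set(attrs)          # X+ := X (initialization step)
    changed = True
    while changed:               # repeat ... until X+ does not change
        changed = False
        for lhs, rhs in fds:
            # If W is contained in X+ and Z adds something new, include Z.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FDs: AB -> C, BC -> AD, D -> E, CF -> B
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

ab_closure = closure("AB", F)
d_closure = closure("D", F)
print(sorted(ab_closure))
print(sorted(d_closure))
```

Each pass of the outer loop either adds at least one attribute or stops, so the loop body runs at most once per attribute of the schema, plus one final pass that detects no change.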

Example
Consider a relation scheme R = { A, B, C, D, E, F } with
the set of FDs { AB → C, BC → AD, D → E, CF → B }
Compute {A, B}⁺
Execution result at each iteration:
X⁺ = {A, B}
Using AB → C, we get X⁺ = {A, B, C}
Using BC → AD, we get X⁺ = {A, B, C, D}
Using D → E, we get X⁺ = {A, B, C, D, E}
No more change to X⁺ is possible.
X⁺ = {A, B}⁺ = {A, B, C, D, E}
Does the order in which FDs appear matter in this computation?


Implication Problem Revisited
Is a given FD X → Y implied by a set F of FDs?
That is to ask whether X → Y is in F⁺?
How to answer this question?
An algorithm for this:
Compute X⁺ under F, and
Check if Y is contained in X⁺
If yes, then F ⊨ X → Y
Otherwise F ⊭ X → Y
Example
Consider a relation schema R = { A, B, C, D, E, F } with
the FDs F = { AB → C, BC → AD, D → E, CF → B }
True/false: F ⊨ AB → D?
Two steps:
Compute X⁺ = {A, B}⁺ = {A, B, C, D, E}
Check if D ∈ X⁺
Yes, AB → D is implied by F
Example
Consider a relation scheme R = { A, B, C, D, E, F } with
FDs F = { AB → C, BC → AD, D → E, CF → B }
Is D → A implied by F?
Two steps:
Compute X⁺ = {D}⁺ = {D, E}
Check if A ∈ X⁺
Since A is not in {D, E}, we conclude that D → A is not
implied by F
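The two-step test used in the last two examples can be sketched as follows. The helper names closure and implies, and the encoding of FDs as pairs of strings of single-letter attributes, are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """Return True iff the FD lhs -> rhs is implied by fds.

    Step 1: compute the closure of lhs under fds.
    Step 2: check that every attribute of rhs is in that closure.
    """
    return set(rhs) <= closure(lhs, fds)

# FDs: AB -> C, BC -> AD, D -> E, CF -> B
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

ab_implies_d = implies(F, "AB", "D")
d_implies_a = implies(F, "D", "A")
print(ab_implies_d, d_implies_a)
```

This avoids computing F⁺ itself, which can be exponentially large; only one attribute closure is needed per question.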
Closures and Keys
When will X⁺ include all attributes of a relation R?
Clearly, the answer is yes iff X is a (superkey) key of R
To check if X is a candidate key of R, we may check if:
1. X⁺ contains all attributes of R, i.e., X⁺ = R, and
2. No proper subset S of X has this property, i.e., ∀A ∈ X, (X − {A})⁺ ≠ R
Knowledge about keys is essential for Normal forms
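The two checks can be sketched in Python as below. The function names and the string encoding of attribute sets are assumptions for illustration, and the schema and FDs are borrowed from the closure examples earlier in this section. Removing one attribute at a time is enough for check 2, because a smaller set can never have a larger closure.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(x, schema, fds):
    """Check 1: the closure of x contains every attribute of the schema."""
    return closure(x, fds) >= set(schema)

def is_candidate_key(x, schema, fds):
    """Checks 1 and 2: x is a superkey and dropping any attribute breaks that."""
    x = set(x)
    return is_superkey(x, schema, fds) and all(
        not is_superkey(x - {a}, schema, fds) for a in x
    )

R = "ABCDEF"
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]

abf_is_key = is_candidate_key("ABF", R, F)
cf_is_key = is_candidate_key("CF", R, F)
abcf_is_key = is_candidate_key("ABCF", R, F)   # superkey, but not minimal
print(abf_is_key, cf_is_key, abcf_is_key)
```

Under these FDs, both ABF and CF are candidate keys, while ABCF is only a superkey.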
Canonical Cover
Number of iterations of the algorithm for computing the
closure of a set of attributes depends on the number of
FDs in F
The same will be observed for other algorithms that we will study
(such as the decomposition algorithms)
Can we minimize F?
Covers
FDs can be represented in several different ways without changing
the set of legal/valid instances of the relation
Let F and G be sets of FDs. We say G follows from F, if every
relation instance that satisfies F also satisfies G. In symbols: F ⊨ G.
We may also say: G is implied by F or G is covered by F.
If both F ⊨ G and G ⊨ F hold, then we say that G and F are
equivalent and denote this by F ≡ G
Note that F ≡ G iff F⁺ = G⁺
If F ≡ G we may also say: G is a cover of F and vice versa


Canonical Cover
Let F be a set of FDs. A canonical / minimal cover
of F is a set G of FDs that satisfies the following:
1. G is equivalent to F; that is, G ≡ F
2. G is minimal; that is, if we obtain a set H of FDs from
G by deleting one or more of its FDs, or by deleting
one or more attributes from some FD in G, then F ≢ H
3. Every FD in G is of the form X → A, where A is a
single attribute
Canonical Cover
A canonical cover G is minimal in two respects:
1. Every FD in G is required in order for G to be equivalent to F

2. Every FD in G is as small as possible, that is,
each attribute on the left hand side is necessary.
Recall: the RHS of every FD in G is a single attribute
Computing Canonical Cover
Given a set F of FDs, how to compute a canonical cover G of F?
Step 1: Put the FDs in the standard form
Initialize G := F
Replace each FD X → A₁A₂…Aₖ in G with X → A₁, X → A₂, …, X → Aₖ
Step 2: Minimize the left hand side of each FD
E.g., for each FD AB → C in G, check if A or B on the LHS is redundant,
i.e., (G − {AB → C} ∪ {A → C})⁺ = F⁺?
Step 3: Delete redundant FDs
For each FD X → A in G, check if it is redundant, i.e., whether
(G − {X → A})⁺ = F⁺?
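The three steps can be sketched in Python as follows. This is one possible rendering under stated assumptions, not a definitive implementation: FDs are given as pairs of strings of single-letter attributes, the names closure and canonical_cover are hypothetical, and the equivalence tests of Steps 2 and 3 are carried out with attribute closures rather than by comparing F⁺ directly. The order in which attributes and FDs are tried is fixed here (alphabetical, then input order); a different order may produce a different, equally valid cover.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs_set, single_attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def canonical_cover(fds):
    """Return a canonical cover of fds as a set of (frozenset lhs, attribute)."""
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # Step 2: drop an LHS attribute if the remaining ones still determine the RHS.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and a in closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, a)

    # Step 3: drop an FD if the remaining FDs still imply it.
    g = list(dict.fromkeys(g))   # remove exact duplicates, keep order
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest             # redundant: remove and re-test position i
        else:
            i += 1
    return set(g)

# F = { A -> B, DE -> A, BC -> E, AC -> E, BCD -> A, AED -> B }
F = [("A", "B"), ("DE", "A"), ("BC", "E"), ("AC", "E"), ("BCD", "A"), ("AED", "B")]
cover = canonical_cover(F)
for lhs, a in sorted(cover, key=lambda fd: (sorted(fd[0]), fd[1])):
    print("".join(sorted(lhs)), "->", a)
```

For this input the sketch returns the three FDs A → B, BC → E and DE → A.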
Computing Canonical Cover
R = { A, B, C, D, E, H}
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step 1: put FDs in the standard form
All present FDs are in the standard form
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Computing Canonical Cover
Step 2: Check for left redundancy
For every FD X → A in G, check if the closure of a subset of X
determines A. If so, remove redundant attribute(s) from X
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
A → B
obviously OK (no left redundancy)
DE → A
D⁺ = D
E⁺ = E
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BC → E
B⁺ = B
C⁺ = C
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AC → E
A⁺ = AB
C⁺ = C
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BCD → A
B⁺ = B
C⁺ = C
D⁺ = D
BC⁺ = BCE
CD⁺ = CD
BD⁺ = BD
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AED → B
A⁺ = AB
E & D are redundant; we can remove them from AED → B
G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
Step 3: Check for redundant FDs
For every FD X → A in G
Remove X → A from G; call the result G′
Compute X⁺ under G′
If A ∈ X⁺, then X → A is redundant and hence we remove
the FD X → A from G (that is, we rename G′ to G)
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove DE → A from G
G′ = { BC → E, AC → E, BCD → A, A → B }
Compute DE⁺ under G′
DE⁺ = DE (computed under G′)
Since A ∉ DE, the FD DE → A is not redundant
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove BC → E from G
G′ = { DE → A, AC → E, BCD → A, A → B }
Compute BC⁺ under G′
BC⁺ = BC
BC → E is not redundant
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove AC → E from G
G′ = { DE → A, BC → E, BCD → A, A → B }
Compute AC⁺ under G′
AC⁺ = ACBE
Since E ∈ ACBE, AC → E is redundant; remove it from G
G = { DE → A, BC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, BCD → A, A → B }
Remove BCD → A from G
G′ = { DE → A, BC → E, A → B }
Compute BCD⁺ under G′
BCD⁺ = BCDEA
This FD is redundant; remove it from G
G = { DE → A, BC → E, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, A → B }
Remove A → B from G
G′ = { DE → A, BC → E }
Compute A⁺ under G′
A⁺ = A
This FD is not redundant (Another reason why this is true?)
G = { DE → A, BC → E, A → B }
G is a minimal cover for F
Several Canonical Covers Possible?
Relation R = {A, B, C} with F = { A → B, A → C,
B → A, B → C, C → B, C → A }
Several canonical covers exist
G = { A → B, B → A, B → C, C → B }
G = { A → B, B → C, C → A }
[Figure: three diagrams over the attributes A, B and C]
Can you find more?
How to Deal with Redundancy?
Relation Schema:
Star (name, address, representingFirm, spokesPerson)
Relation Instance:
Name           Address       RepresentingFirm  SpokesPerson
Carrie Fisher  123 Maple     Star One          Joe Smith
Harrison Ford  789 Palm dr.  Star One          Joe Smith
Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns
F = { name → address, representingFirm, spokesPerson;
      representingFirm → spokesPerson }
We can decompose this relation into two smaller relations
How to Deal with Redundancy?
Relation Schema:
Star (name, address, representingFirm, spokesPerson)
Decompose this relation into the following relations:
Star (name, address, representingFirm)
with F1 = { name → address, representingFirm }
and
Firm (representingFirm, spokesPerson)
with F2 = { representingFirm → spokesPerson }

F = { representingFirm → spokesPerson }
How to Deal with Redundancy?
Relation Instance before decomposition:
Name           Address       RepresentingFirm  SpokesPerson
Carrie Fisher  123 Maple     Star One          Joe Smith
Harrison Ford  789 Palm dr.  Star One          Joe Smith
Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns
Relation Instances after decomposition:
Name           Address       RepresentingFirm
Carrie Fisher  123 Maple     Star One
Harrison Ford  789 Palm dr.  Star One
Mark Hamill    456 Oak rd.   Movies & Co

RepresentingFirm  SpokesPerson
Star One          Joe Smith
Movies & Co       Mary Johns
Decomposition
A decomposition of a relation schema R consists of replacing R by
two or more non-empty relation schemas such that each one is a
subset of R and together they include all attributes of R. Formally,
{R₁, …, Rₘ} is a decomposition of R if all conditions below hold:
(0) Rᵢ ≠ ∅, for all i in {1, …, m}
(1) R₁ ∪ … ∪ Rₘ = R
(2) Rᵢ ≠ Rⱼ, for different i and j in {1, …, m}
When m = 2, the decomposition {R₁, R₂} is called binary
Not every decomposition of R is desirable
Properties of a decomposition?
(1) Lossless-join: this is a must
(2) Dependency-preserving: this is desirable
Explanation follows
Example
Relation Instance:
A  B  C
1  2  3
4  2  5
Decomposed into:
A  B
1  2
4  2
and
B  C
2  3
2  5
To recover information, we join the relations:
A  B  C
1  2  3
4  2  5
4  2  3
1  2  5
Why do we have new tuples?
Lossless-Join Decomposition
R is a relation schema and F is a set of FDs over R.
A binary decomposition of R into relation schemas R₁ and
R₂ with attribute sets X and Y is said to be a lossless-join
decomposition with respect to F, if for every instance r of
R that satisfies F, we have π_X(r) ⋈ π_Y(r) = r
Thm: Let R be a relation schema and F a set of FDs on R.
A binary decomposition of R into R₁ and R₂ with attribute
sets X and Y is lossless iff X ∩ Y → X or X ∩ Y → Y,
i.e., this binary decomposition is lossless if the common
attributes of X and Y form a key of R₁ or R₂
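The theorem gives a direct test that can be sketched with an attribute closure: compute (X ∩ Y)⁺ under F and check whether it contains all of X or all of Y. The function names and the string encoding of attribute sets and FDs are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(x, y, fds):
    """Return True iff splitting into attribute sets x and y is lossless.

    Uses the theorem: lossless iff (x ∩ y) -> x or (x ∩ y) -> y is implied
    by fds, i.e. the closure of the common attributes covers x or y.
    """
    common_closure = closure(set(x) & set(y), fds)
    return set(x) <= common_closure or set(y) <= common_closure

# R(A, B, C) split into AB and BC.
with_fd = is_lossless_binary("AB", "BC", [("B", "C")])   # B -> C holds
without_fd = is_lossless_binary("AB", "BC", [])          # no FDs at all
print(with_fd, without_fd)
```

With B → C the common attribute B determines all of BC, so the split is lossless; with no FDs the test fails, matching the earlier instance where the join produced new tuples.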
Example: Lossless-join
F = { B → C }
Relation Instance:
A  B  C
1  2  3
4  2  3
Decomposed into:
A  B
1  2
4  2
and
B  C
2  3
To recover the original relation r, we join the two relations:
A  B  C
1  2  3
4  2  3
No new tuples!
Example: Dependency Preservation
F = { B → C, B → D, A → D }
Relation Instance:
A  B  C  D
1  2  5  7
4  3  6  8
Decomposed into:
A  B
1  2
4  3
and
B  C  D
2  5  7
3  6  8
Can we enforce A → D?
How?
Dependency-Preserving Decomposition
A dependency-preserving decomposition allows us to enforce
every FD, on each insertion or modification of a tuple, by
examining just one single relation instance
Let R be a relation schema that is decomposed into two schemas
with attribute sets X and Y, and let F be a set of FDs over R. The
projection of F on X (denoted by F_X) is the set of FDs in F⁺ that
involve only attributes in X
Recall that a FD U → V in F⁺ is in F_X if all the attributes in U
and V are in X; In this case we say this FD is relevant to X
The decomposition of < R, F > into two schemas with attribute sets
X and Y is dependency-preserving if (F_X ∪ F_Y)⁺ = F⁺
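Checking the definition literally would require computing F⁺ and its projections. The sketch below instead uses a commonly taught shortcut, which is a swapped-in technique and not taken from these slides: for each FD U → V in F, grow U using only what each component schema can "see", and then check whether V was reached. Function names and the string encoding of schemas and FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves_dependencies(schemas, fds):
    """Return True iff every FD in fds is implied by the projected FDs.

    For each FD lhs -> rhs, result starts as lhs and is extended by
    closure(result ∩ S) ∩ S for each component schema S until it is stable.
    """
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                s = set(schema)
                gained = (closure(result & s, fds) & s) - result
                if gained:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False          # this FD cannot be enforced locally
    return True

# Example from this section: F = { B -> C, B -> D, A -> D }, split into AB and BCD.
F = [("B", "C"), ("B", "D"), ("A", "D")]
ab_bcd = preserves_dependencies(["AB", "BCD"], F)

# A contrasting case: F = { A -> B, B -> C }, split into AB and BC.
chain = preserves_dependencies(["AB", "BC"], [("A", "B"), ("B", "C")])
print(ab_bcd, chain)
```

In the first case A → D is lost, because A and D never appear together in one component and no chain of projected FDs connects them.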

Normal Forms
Given a relation schema R, we must be able to determine
whether it is good or we need to decompose it into
smaller relations, and if so, how?
To address these issues, we need to study normal forms
If a relation schema is in one of these normal forms, we
know that it is in some good shape in the sense that
certain kinds of problems (related to redundancy) cannot arise
1NF 2NF 3NF BCNF
Normal Forms
The normal forms based on FDs are
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
These normal forms have increasingly restrictive
requirements
