
BTM 382 Database Management

Chapter 6:
Normalization of Database Tables

Chitu Okoli
Associate Professor in Business Technology Management
John Molson School of Business, Concordia University, Montréal

Structure of BTM 382 Database


Week 1: Introduction and overview
Management

ch1: Introduction
Weeks 2-6: Database design
ch3: Relational model
ch4: ER modeling
ch6: Normalization
ERD modeling exercise
ch5: Advanced data modeling
Week 7: Midterm exam
Weeks 8-10: Database programming
ch7: Intro to SQL
ch8: Advanced SQL
SQL exercises
Weeks 11-13: Database management
ch2,12,14: Data models
ch13: Business intelligence and data warehousing
ch9,15,16: Selected managerial topics

Review of Chapter 6:
Normalization of Database Tables
What are dependencies between attributes in a table,
and how is tracing of dependencies used to
normalize tables?
What are the normal forms in a relational database?
Why and when would you consider denormalizing
tables in a relational database?

Problems with unnormalized tables


Needless redundancy, hence insert, update and
delete anomalies (inconsistencies)
Data updates are less efficient because tables are
larger
Indexing is less effective
Views (virtual tables) are more cumbersome

Understanding dependencies to be
able to properly normalize tables

Functional dependency
Functional dependency: A → B or (A,B) → (C,D)
B is functionally dependent on A means that if you know A, then
you definitely know the correct value for B
E.g. Project.ID → Project.Name
Also called determination: A determines B
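
A minimal sketch of testing a functional dependency against sample rows. The helper name holds and the sample data are assumptions made for illustration, not part of the course material. Note that sample data can only refute a dependency; whether A → B truly holds is a business rule.

```python
def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows for a Project table.
projects = [
    {"ID": 1, "Name": "Apollo", "Manager": "Kim"},
    {"ID": 2, "Name": "Borealis", "Manager": "Kim"},
    {"ID": 1, "Name": "Apollo", "Manager": "Kim"},
]

id_determines_name = holds(projects, ["ID"], ["Name"])
manager_determines_name = holds(projects, ["Manager"], ["Name"])
print(id_determines_name, manager_determines_name)
```

Here Project.ID → Project.Name survives the check, while Project.Manager → Project.Name is refuted because one manager runs two differently named projects.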

Full functional dependency: (A,B) → C where A ↛ C and B ↛ C
When all the attributes in a key are required for the determination (none is
optional)
E.g. (Project.ID, Project.Manager) → Project.Name
Project.Manager is optional, so this is not a full functional dependency
E.g. (Project.Manager, Project.StartDate) → Project.Name
This is a full functional dependency, assuming a manager can launch no
more than one project on a given date
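
The two examples above can be sketched as a check that a dependency holds and that no proper subset of its determinant also determines the same attribute. The helper names holds and is_full and the sample rows are assumptions for illustration only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def is_full(rows, lhs, rhs):
    """Return True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False  # part of the determinant is optional
    return True


# Hypothetical sample rows: a manager launches at most one project per date.
projects = [
    {"ID": 1, "Manager": "Kim", "StartDate": "2015-01-05", "Name": "Apollo"},
    {"ID": 2, "Manager": "Kim", "StartDate": "2015-03-02", "Name": "Borealis"},
    {"ID": 3, "Manager": "Lee", "StartDate": "2015-01-05", "Name": "Cascade"},
]

id_manager_full = is_full(projects, ["ID", "Manager"], ["Name"])
manager_date_full = is_full(projects, ["Manager", "StartDate"], ["Name"])
print(id_manager_full, manager_date_full)
```

With these rows, (ID, Manager) → Name is not full because ID alone already determines Name, while (Manager, StartDate) → Name needs both attributes.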

Repeating group = multivalued attribute

Attribute whose value is a list or array of values, instead of a single value
Illegal in the relational model; troublesome for normalization if you don't catch it
Two possible solutions (e.g. Project.ID → Project.Location):
1. Create multiple attributes for each possible value (e.g.
Project.Location1, Project.Location2, Project.Location3)
2. Create a new entity to store multiple possible values (e.g.
Location)
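
A minimal sketch of the second solution, using Python's built-in sqlite3 module. The table name ProjectLocation, its columns and the sample values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Project (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    -- One row per (project, location) pair replaces the list-valued attribute.
    CREATE TABLE ProjectLocation (
        ProjectID INTEGER NOT NULL REFERENCES Project(ID),
        Location  TEXT NOT NULL,
        PRIMARY KEY (ProjectID, Location)
    );
""")
conn.execute("INSERT INTO Project VALUES (1, 'Apollo')")
conn.executemany(
    "INSERT INTO ProjectLocation VALUES (?, ?)",
    [(1, "Montreal"), (1, "Toronto")],
)

# Every table cell now holds exactly one value.
locations = [
    row[0]
    for row in conn.execute(
        "SELECT Location FROM ProjectLocation WHERE ProjectID = ? ORDER BY Location",
        (1,),
    )
]
print(locations)  # ['Montreal', 'Toronto']
```

Unlike the first solution, this design puts no fixed limit on the number of locations per project.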

Multivalued dependency
Functional dependency: A → B
Multivalued dependency: A →→ B, C
A determines B and A determines C, but B and C have
nothing to do with each other
E.g. Project.ID →→ Project.EmployeeID, Project.Location
Since a project might have multiple locations and multiple
employees might work on it, the EmployeeID and Location in
the same row might have nothing to do with each other

Usually indicates that one or more multivalued attributes were not handled properly
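
A small sketch of why such a table can be split, assuming hypothetical rows of (Project.ID, EmployeeID, Location). When employees and locations are independent, the table stores every combination, and projecting it into two smaller tables loses nothing.

```python
# Hypothetical rows: project 1 has two employees and three locations,
# so the single table needs every employee/location combination.
rows = {
    (1, "E10", "Montreal"), (1, "E10", "Toronto"), (1, "E10", "Ottawa"),
    (1, "E20", "Montreal"), (1, "E20", "Toronto"), (1, "E20", "Ottawa"),
}

# Split into two tables, one per independent multivalued fact.
project_employee = {(p, e) for p, e, _ in rows}
project_location = {(p, loc) for p, _, loc in rows}

# Joining the two tables on the project ID reproduces the original rows.
rejoined = {
    (p, e, loc)
    for p, e in project_employee
    for p2, loc in project_location
    if p == p2
}

print(rejoined == rows)
print(len(rows), len(project_employee) + len(project_location))
```

The split tables hold five rows instead of six, and the gap widens as either list grows.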

Partial and transitive dependencies


Partial dependency: (A,B) → (C,D) and B → C
(A,B) is a candidate key (e.g. primary key)
C doesn't need both A and B to determine it; it only needs B
E.g. (Project.ID, Project.ManagerID) → Project.Name
and Project.ID → Project.Name

Transitive dependency: A → (B,C) and B → C
A is a candidate key
Technically speaking, a transitive dependency requires that B and C not be part of
any candidate key. However, if you extend the definition to cover cases where they
are part of a key, then you will automatically avoid problems with BCNF
A determines C, but so does B, even though B is not a candidate key
E.g. Project.ID → (Project.Client, Project.Location)
and Project.Client → Project.Location
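
The two definitions above can be sketched as a simple classifier over a list of declared dependencies. The function name classify and its labels are assumptions for illustration. It is a simplification: a determinant that mixes key and non-key attributes is also labelled transitive here.

```python
def classify(primary_key, dependencies):
    """Label each (determinant, dependents) pair relative to the primary key."""
    key = set(primary_key)
    labels = []
    for lhs, _rhs in dependencies:
        determinant = set(lhs)
        if determinant >= key:
            labels.append("key")         # the whole key is the determinant
        elif determinant < key:
            labels.append("partial")     # only part of the key is needed
        else:
            labels.append("transitive")  # a non-key attribute is a determinant
    return labels


# Partial dependency example from the slide.
partial_example = classify(
    ["ID", "ManagerID"],
    [(["ID", "ManagerID"], ["Name"]), (["ID"], ["Name"])],
)

# Transitive dependency example from the slide.
transitive_example = classify(
    ["ID"],
    [(["ID"], ["Client", "Location"]), (["Client"], ["Location"])],
)

print(partial_example)     # ['key', 'partial']
print(transitive_example)  # ['key', 'transitive']
```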

The normal forms

Normalization of relations: https://youtu.be/NwcVv1cxflk (note 0:34 and 1:50)

Summary of attaining normal forms

1NF: Primary key identified and no multivalued attributes


Legitimate primary key selected (unique identifying key)
Only one value per table cell; no lists/arrays (multivalued attributes) in any table cell
If you split multivalued attributes off to separate tables, then you avoid 4NF violations

2NF: 1NF minus partial dependencies
All candidate key dependencies are fully functional
(A,B) → C where A ↛ C and B ↛ C

3NF/BCNF: 2NF minus transitive dependencies
Only a candidate key determines any attribute
If A → (B,C), then B ↛ C
There is a technical distinction between 3NF and BCNF, but if you keep this rule, then you
take care of both 3NF and BCNF

4NF: BCNF minus multivalued dependencies


Each row strictly describes just one entity
If you split multivalued attributes into separate tables to attain 1NF, then you also avoid
4NF violations

DKNF, 5NF, 6NF
Relatively rare and often not worth the trouble of normalizing, even if applicable
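
A rough sketch of the 3NF/BCNF rule that only a candidate key determines any attribute, checked with attribute closure. The function names closure and bcnf_violations are assumptions for illustration. A determinant whose closure does not cover the whole table is not a key, so its dependency is flagged. Trivial dependencies are not handled.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by the given set of attributes."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attributes, dependencies):
    """Return the dependencies whose determinant does not determine the whole table."""
    return [
        (lhs, rhs)
        for lhs, rhs in dependencies
        if closure(lhs, dependencies) != set(all_attributes)
    ]


attributes = ["ID", "Client", "Location"]
dependencies = [
    (("ID",), ("Client", "Location")),
    (("Client",), ("Location",)),
]

violations = bcnf_violations(attributes, dependencies)
print(violations)  # [(('Client',), ('Location',))]
```

The flagged dependency is the transitive one, which suggests moving Client and Location into a table of their own.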

Dependency diagram:
Basic tool for normalization
Depicts all dependencies found in a given table structure
Gives a bird's-eye view of all relationships among a table's
attributes
Makes it less likely that you will overlook an important
dependency

3NF vs BCNF

BCNF is only an issue when a poor primary key was selected at the 1NF step
Regardless, resolving all dependencies brings the table into BCNF

Fixing 4NF problem


The only reason a table might be in 3NF/BCNF but not in 4NF is because two originally multivalued attributes existed at 1NF stage
Multivalued attributes should always be placed in separate tables, or be split into multiple attributes
If you do this in the first step to resolve 1NF, you will never have problems with 4NF

Denormalization

Although normalization is important, processing speed
and efficiency are also important in database design
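
As an illustration of the trade-off, here is a minimal sqlite3 sketch. The table names Client, Project and ProjectReport and the sample data are assumptions, not from the textbook. It builds a denormalized copy that repeats each client's location on every project row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: each client's location is stored once.
    CREATE TABLE Client (
        ClientID INTEGER PRIMARY KEY,
        Location TEXT NOT NULL
    );
    CREATE TABLE Project (
        ID       INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        ClientID INTEGER NOT NULL REFERENCES Client(ClientID)
    );
    INSERT INTO Client VALUES (100, 'Montreal');
    INSERT INTO Project VALUES (1, 'Apollo', 100), (2, 'Borealis', 100);

    -- Denormalized copy: Location is repeated on every project row,
    -- so reads need no join, but an update must touch several rows.
    CREATE TABLE ProjectReport AS
        SELECT p.ID, p.Name, p.ClientID, c.Location
        FROM Project AS p
        JOIN Client AS c ON c.ClientID = p.ClientID;
""")

report = conn.execute(
    "SELECT ID, Name, ClientID, Location FROM ProjectReport ORDER BY ID"
).fetchall()
print(report)
```

The redundancy that normalization removed is reintroduced on purpose: it buys simpler reads at the cost of the update anomalies described earlier.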

Summary of Chapter 6:
Normalization of Database Tables
Correctly identifying dependencies from the very
beginning is critical to properly normalize tables.
The most important normal forms are 1NF, 2NF, 3NF,
BCNF and 4NF.
Although normalization to 4NF is usually important, a
designer might sometimes want to denormalize a
table to a lower normal form.

Sources
Most of the slides are adapted from Database Systems: Design, Implementation and Management by Carlos Coronel and Steven Morris. 11th edition (2015), published by Cengage Learning. ISBN 13: 978-1-285-19614-5
Other sources are noted on the slides themselves
