You are on page 1of 35

Normalization

Intro
Good database design must be matched with good table structures E-R diagrams help use with overall design, but a good conceptual design does not necessarily lead to a good table structures Tables are the basic building blocks of the database We wish to avoid data anomalies/redundancies by controlling the table structure logically The process of identifying and eliminating data anomalies and redundancies is called normalization

Redundancy & Anomalies


Data redundancy = data stored in several places
Too much data redundancy causes problems--which value is correct? Data integrity and consistency suffer

Data anomaly = abnormal data relationships


Insertion anomaly - Cant add data because dont know entire primary key value, e.g., primary key based on first, middle, and last name Deletion anomaly - Deletions result in too many fields being removed unintentionally, e.g., delete an employee but lose transaction data Update anomaly - Change requires many updates, e.g., if you store customer names in transaction tables

Normalization
Step-by-step process for eliminating data redundancies and anomalies Enables us to recognize bad table structures Enables us to create good table structures Stages are called normal forms each better than previous (less anomalies/redundancy)
First normal form (1NF) Second normal form (2NF) Third normal form (3NF)

Normalization
There are fourth and fifth normal forms that are seldom used Highest level not always the most desirable In general, the higher the level the slower the database response because of underlying pointer movement Most professionally designed databases reach third normal form

Need for Normalization


To recognize good design, first look at bad one Example, construction company manages several projects and whose charges are dependent on employees position

Desired Report
Proj No Proj Name 1 Hurricane Emp No 101 102 104 Emp Name John News David Senior Anne Ramoras Job Class Elec Eng Comm Tech Comm Tech Chg/Hr Hrs Billed 65 13 60 16 60 19 Sub Tot 65 55 15 17 Sub Tot 18 14 Sub Tot Total Tot Chg 845 960 1,140 2,945 975 935 1,910 1,080 840 1,920 6,775

2 Coast

101 John News 103 June Arbough

Elec Eng Biol Eng

3 Satellite

104 Anne Ramoras 102 David Senior

Comm Tech Comm Tech

60 60

Table
P_No P_Name 1 Hurricane E_No 101 102 104 101 103 104 102 E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14

2 Coast 3 Satellite

Another View of Table


Group 1
P_No 1 2 3 P_Name Hurricane Coast Satellite E_No1 101 101 104 E_Name1 John News John News Anne Ramoras Job_Class1 Elec Eng Elec Eng Comm Tech

Group 2
Chg_Hr1 Hrs1 E_No2 E_Name2 65 13 102 David Senior 65 15 103 June Arbough 60 18 102 David Senior

Etc.
Job_Class2 Chg_Hr2 Hrs2 Comm Tech 60 16 Biol Eng 55 17 Comm Tech 60 14

Problems
P_No intended to be primary key but contains null values Data redundancies
Invites data inconsistencies (Elect Eng & EE)

Anomalies
Update anomaly modify Job_Class for E_No 101 requires many alterations Insert anomaly to add a project row we need an employee Deletion anomaly delete E_No 101, we delete other vital data too

Problems
Date redundancy
If add new employee to project 2 must type:
2 C oas t 104 A nne Ram oras Com m Tec h 60 19

Wastes data entry time Wastes storage space Leads to data inconsistency
Huricane or Hurricane or Hurracaine

Conversion to 1NF
Table above has repeating groups Each P_No has a group of entries
P_No P_Name 1 Hurricane E_No 101 102 104 E_Name John News David Senior Anne Ramoras Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Hrs 13 16 19

1NF
Eliminate repeating groups By adding entries in primary key column (at least)
P_No 1 1 1 2 2 3 3 P_Name Hurricane Hurricane Hurricane Coast Coast Satellite Satellite E_No 101 102 104 101 103 104 102 E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14

Problems
Primary key P_No does not uniquely identify all attributes in row Must create composite key made up of P_No & E_No

Dependency Diagram
Helps us to discover relationships between entity attributes Upper arrows implies dependency on P_No & E_No Lower arrows implies dependency on only one attribute

P_No

P_Name

E_No E_Name

Job_Class

Chg_Hr

Hrs

Dependencies
Upper arrows If you know P_No & E_No you can determine the other row values Lower arrows Partial dependencies based on only part of key P_Name only dependent on P_No E_Name, Job_Class, Chg_Hr only dependent on E_No Dependency diagram may be written: P_No, E_No P_Name, E_Name, Job_Class, Chg_Hr, Hrs P_No P_Name E_No E_Name, Job_Class, Chg_Hr

New Table
Composite primary key P_No & E_No
Charges Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 P_Name Hurricane Hurricane Hurricane Coast Coast Satellite Satellite E_Name John News David Senior Anne Ramoras John News June Arbough Anne Ramoras David Senior Job_Class Chg_Hr Elec Eng 65 Comm Tech 60 Comm Tech 60 Elec Eng 65 Biol Eng 55 Comm Tech 60 Comm Tech 60 Hrs 13 16 19 15 17 18 14

1NF Definition
1. All the key attributes are defined Any attribute that is part of the primary key 1. There are no repeating groups in the table Each cell can contain one and only one value, rather than set 1. All attributes are dependent on the primary key

Problems
Contains partial dependencies
Dependencies base on only part of the primary key

This makes table subject to data redundancies and hence to data anomalies Redundancy caused by fact that every row entry requires duplicate data
E.g., suppose E_No 105 is entered 20 times, must also enter E_Name, Job_Class, Chg_Hr

Anomalies caused by redundancy


E.g., employee name may be spelled Dave Senior or D. Senior or David Senior

Conversion to 2NF
1. Starting with 1NF write each of the key components on separate lines, then write the original key on the last line P_No E_No P_No E_No Each will become key in a new table Original table split into three tables

Conversion to 2NF
2. Write the dependent attributes after each of the new keys using the dependency diagram P_No P_Name E_No E_Name, Job_Class, Chg_Hr P_No E_No Hrs

Three New Tables


Project Table P_No P_Name 1 Hurricane 2 Coast 3 Satellite Employee Table
E_No 101 102 103 104 E_Name John News David Senior June Arbough Anne Ramoras Job_Class Elec Eng Comm Tech Biol Eng Comm Tech Chg_Hr 65 60 55 60

Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14

2NF Definition
1. Table is in 1NF and 2. It includes no partial dependencies (no attribute is dependent on only a portion of the primary key) Note: Since partial dependencies can exist only if there is a composite key, a table with a single attribute as primary key is automatically in 2NF if it is in 1NF

Problem - Transitive Dependency


Note that Chg_Hr is dependent on Job_Class, but neither Chg_Hr nor Job_Class is part of the primary key This is called transitive dependency
A condition in which an attribute is functionally dependent on non-key attributes (another attribute that is not part of the primary key)

Transitive dependency yields data anomalies

Conversion to 3NF
Break off the pieces that are identified by the transitive dependency arrows (lower arrows) in the dependency diagram Store them in a separate table
P_No P_Name E_No E_Name, Job_Class P_No E_No Hrs Job_Class Chg_Hr

Note: Job_Class must be retained in Employee table to establish a link to the newly created Job table

New Tables
Project Table P_No P_Name 1 Hurricane 2 Coast 3 Satellite Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14

Employee Table
E_No 101 102 103 104 E_Name John News David Senior June Arbough Anne Ramoras Job_Class Elec Eng Comm Tech Biol Eng Comm Tech

Job Table

Job_Class Biol Eng Comm Tech Elec Eng

Chg_Hr 55 60 65

3NF Definition
1. Table is in 2NF and 2. It contains no transitive dependencies

Problem
Although the four tables are in 3NF, we have a potential problem The Job_Class is entered for each new employee in the Employee table For example, too easy to enter Electrical Engr, or EE, or El Eng

Problem
Employee Table

E_No 101 102 103 104 104 105 106

E_Name John News David Senior June Arbough Anne Ramoras John Smith Alice White Bob Jones

Job_Class Elec Eng Comm Tech Biol Eng Comm Tech Comm Tech Biol Eng Elec Eng

New Attribute
Create a Job_Code attribute to serve as primary key in the Job table and as a foreign key in the Employee table

Changed Tables
Project Table P_No P_Name 1 Hurricane 2 Coast 3 Satellite Assign Table
P_No 1 1 1 2 2 3 3 E_No 101 102 104 101 103 104 102 Hrs 13 16 19 15 17 18 14

Employee Table
E_No 101 102 103 104 E_Name John News David Senior June Arbough Anne Ramoras Job_Code 502 501 500 501

Job Table

Job_Code 500 501 502

Job_Class Biol Eng Comm Tech Elec Eng

Chg_Hr 55 60 65

3NF Version
Vast improvement over original design No data anomalies
In the Job table each job code has single job class and charge per hour entry
No opportunities to use different values describing same object

Similarly for Employee & Project tables only one entry for each attribute Also, Assign table has only what is needed

Data redundancy has been minimized


Keys are redundant but these are small Assign table is very active but requires only the P_No, E_No, and hours

Other Normal Forms


3NF is the most appropriate form for most applications There are other normal forms 4NF Isolate independent multiple relationships 5NF Isolate semantically related multiple relationships These are advanced and go beyond the scope of the course

Summary
1NF Eliminate repeating groups 2NF Eliminate partial dependencies 3NF Eliminate transitive dependencies Tables are the critical building blocks for the database It is important to design their structure well Normalization is a formal process for doing this that reduces data redundancy and anomalies

End

References: New Perspectives on Microsoft Access 2000, Introductory, by Adamski, Hommel, and Finnegan, Course Technology, 1999. Access Database Design and Programming, Third Edition, by Roman, OReilly, 2002. Database Systems: Design, Implementation, and Management, by Rob & Coronel, Boyd & Fraser, 1995.

You might also like