and
Normalization
Chapter 14
Designing a database
So far, we have designed our base tables from an E-R diagram or by common sense.
We still need some formal measure of why one group of attributes is a better base
table than another group.
This chapter discusses some of the theory of what a good design is
Gathering the data
Identify the data requirements of the users.
This sounds simple, and it can be, but usually it is not, because users can rarely
describe clearly what they need.
In addition, the data processing staff, who know well how the system
functions, often have tunnel vision, which inhibits creative thought.
Informal discussion of some of the criteria for good and bad relational
schemas:
1. The meaning (semantics) of attributes
2. Redundant Information and Update Anomalies
3. Avoiding NULL values
4. Avoiding generation of spurious tuples
1. The meaning (semantics) of attributes
EMP_PROJ
SSN PNUMBER hours ename pname plocation
The second guideline is consistent with, and in a way a restatement of, the first.
We begin to see a need for a more formal approach to help us evaluate whether our
design meets these guidelines.
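A small sketch of the kind of problem these guidelines are meant to prevent, using the EMP_PROJ layout above. The rows and the helper names_for are made up for illustration; they are not data from the chapter. Because ename is repeated in every row for the same SSN, changing it in only one row leaves the table contradicting itself.

```python
# Hypothetical EMP_PROJ rows: (ssn, pnumber, hours, ename, pname, plocation)
emp_proj = [
    ("111", 1, 20.0, "Smith", "ProductX", "Bellaire"),
    ("111", 2, 10.0, "Smith", "ProductY", "Sugarland"),
    ("222", 1, 15.0, "Wong", "ProductX", "Bellaire"),
]

def names_for(rows, ssn):
    """Return the set of distinct employee names stored for one SSN."""
    return {row[3] for row in rows if row[0] == ssn}

# Update anomaly: the name is changed in only one of the employee's rows.
updated = list(emp_proj)
updated[0] = ("111", 1, 20.0, "Smyth", "ProductX", "Bellaire")

consistent_before = len(names_for(emp_proj, "111")) == 1
consistent_after = len(names_for(updated, "111")) == 1

print(consistent_before, consistent_after)
```

If ename were stored once per employee in its own table, the partial update could not happen.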
3. Avoiding NULL values
In some base tables we may group many attributes together into a "fat" relation.
If many of the attributes do not apply to all tuples in the relation, we end up with many
NULLs in the relation.
This can waste storage space, and may also make the meaning of the attributes
harder to understand.
NULLs can have multiple interpretations:
The attribute does not apply to this tuple
The attribute value for this tuple is unknown
The value is known but absent (has not been recorded yet)
Functional Dependency
This is a more formal, more objective design criterion.
Functional dependency is the single most important concept in relational
database design.
A FD is a property of the meaning or semantics of the attributes of a relation.
Before I give a formal definition, let's look at dependency informally first.
Figure out which attributes are dependent on other attributes.
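One informal way to test a guess is to look at sample data: X determines Y only if no two rows agree on X and disagree on Y. The sketch below assumes rows are stored as dictionaries; the function fd_holds and the sample rows are hypothetical. Note that sample data can only disprove a dependency. Since an FD is a property of the meaning of the attributes, a sample that happens to satisfy it proves nothing.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not contradicted by these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first right-hand value seen for each left-hand value.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical sample rows.
rows = [
    {"empID": 1, "empName": "Ann", "empPhone": "555-0101", "deptName": "Sales"},
    {"empID": 2, "empName": "Bob", "empPhone": "555-0102", "deptName": "Sales"},
    {"empID": 3, "empName": "Ann", "empPhone": "555-0103", "deptName": "IT"},
]

id_to_name = fd_holds(rows, ["empID"], ["empName", "deptName"])
name_to_id = fd_holds(rows, ["empName"], ["empID"])    # two employees named Ann
phone_to_id = fd_holds(rows, ["empPhone"], ["empID"])  # holds here, but only by accident of the sample
```

In this sample empPhone appears to determine empID, yet whether that is a real FD depends on whether two employees may ever share a phone.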
Inferred FDs
Suppose F is the set of functional dependencies specified on relational schema R.
Typically, the schema designer specifies the FDs that are semantically obvious.
But there are other dependencies that can be inferred or deduced from the FDs in F.
The closure F+ of F is the set of all functional dependencies that can be inferred from F.
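Listing all of F+ is rarely practical, because it grows very quickly. A common shortcut, sketched here as an assumption about how one might do it rather than as the chapter's method, is to compute the closure of a set of attributes: everything those attributes determine under F. An FD X -> Y is then in F+ exactly when Y is contained in the closure of X. The attribute names A, B, C, D are placeholders.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be inferred from fds, i.e. it is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

# Placeholder FDs: A -> B and B -> C.
F = [(("A",), ("B",)), (("B",), ("C",))]

a_plus = closure({"A"}, F)
a_to_c = implies(F, ("A",), ("C",))   # inferred by transitivity
c_to_a = implies(F, ("C",), ("A",))   # cannot be inferred
```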
empID  empName  empPhone  deptName  deptPhone  deptMgr  skillID  skillName  skillDate  skillLevel
Semantics
An employee works for a single department, but may have multiple skills
Skills have an ID and a name; an employee takes a test on a certain date to establish
a skill level
Definition: Determinant
Any attribute (or set of attributes) on which some other attribute is functionally dependent
Do you agree with these FDs?
empID → {empName, empPhone, deptName}
What about empPhone → {empID}?
deptName → {deptPhone, deptMgr}
skillID → {skillName}
What about skillName → {skillID}?
{empID, skillID} → {skillDate, skillLevel}
If you had a different set of FDs, were they equivalent?
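Two sets of FDs are equivalent when each one can be inferred from the other. The sketch below checks this with attribute closures, reading each line of the list above as "left side determines right side" and leaving out the two "What about" questions. The alternative sets G and H are hypothetical answers a student might give, not part of the chapter.

```python
def closure(attrs, fds):
    """Return all attributes determined by attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """True if f and g have the same closure."""
    return covers(f, g) and covers(g, f)

F = [
    (("empID",), ("empName", "empPhone", "deptName")),
    (("deptName",), ("deptPhone", "deptMgr")),
    (("skillID",), ("skillName",)),
    (("empID", "skillID"), ("skillDate", "skillLevel")),
]

# Hypothetical answer G: one attribute per right side, plus a redundant FD.
G = [
    (("empID",), ("empName",)),
    (("empID",), ("empPhone",)),
    (("empID",), ("deptName",)),
    (("empID",), ("deptMgr",)),   # redundant: follows via deptName
    (("deptName",), ("deptPhone",)),
    (("deptName",), ("deptMgr",)),
    (("skillID",), ("skillName",)),
    (("empID", "skillID"), ("skillDate",)),
    (("empID", "skillID"), ("skillLevel",)),
]

# Hypothetical answer H: also claims that skill names are unique.
H = F + [(("skillName",), ("skillID",))]

same_fg = equivalent(F, G)
same_fh = equivalent(F, H)
key_closure = closure({"empID", "skillID"}, F)
```

G looks different but says the same thing as F. H is not equivalent, because skillName → skillID is a new claim about the data. The closure of {empID, skillID} contains all ten attributes, so that pair is a key of the table.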
Normalization overview
Originally, E.F. Codd proposed three normal forms, called first, second, and third
normal forms
1NF, 2NF and 3NF are based on the functional dependencies among the
attributes of a relation
Later a stronger definition of 3NF was proposed by Boyce and Codd and is known
as Boyce-Codd normal form.
Later, 4th and 5th normal forms were proposed, based on multivalued dependencies and join dependencies, respectively
In the normalization process, you start with a universal relation (one table with all the
attributes) and functional dependencies
Then you apply the normal form restrictions, and decompose the tables as they
specify.
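The process can be sketched in code. This is one possible version under stated assumptions, not the chapter's procedure: the target is BCNF, the universal relation and FDs are the employee/skill example read as "left side determines right side", and the function names are made up. A set X violates BCNF in a table when it determines something beyond itself but not the whole table; the table is then split on X.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes determined by attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, closure of X within rel) for some X violating BCNF, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            x_plus = frozenset(closure(x, fds)) & rel
            # X determines more than itself, yet is not a superkey of rel.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None

def bcnf_decompose(rel, fds):
    """Split rel recursively until no table has a BCNF violation."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    # One table holds X with what it determines; the other keeps X and the rest.
    return bcnf_decompose(x_plus, fds) + bcnf_decompose(x | (rel - x_plus), fds)

universal = {"empID", "empName", "empPhone", "deptName", "deptPhone",
             "deptMgr", "skillID", "skillName", "skillDate", "skillLevel"}
F = [
    (("empID",), ("empName", "empPhone", "deptName")),
    (("deptName",), ("deptPhone", "deptMgr")),
    (("skillID",), ("skillName",)),
    (("empID", "skillID"), ("skillDate", "skillLevel")),
]

tables = bcnf_decompose(universal, F)
for t in tables:
    print(sorted(t))
```

The search tries every subset of attributes, which is fine for a ten-attribute classroom example but would be too slow for a wide table.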
Normalization of data
This is a process of analyzing the relation schemas based on functional dependencies
(FDs) and primary keys (PK) to achieve two desirable properties:
1) Minimizing data redundancy
2) Minimizing insertion, deletion and update anomalies
Using normal forms
When a test fails, the relation violating that test must be decomposed
All these normal forms are based on the FDs among the attributes of a relation
Disclaimers to use of normal forms:
Normal forms, when considered by themselves, do not guarantee good database design.
Sometimes the best design may not be the highest normal form, for performance
reasons.
Summary
In general, it is best to have a relational schema in BCNF
If that is not possible, 3NF will do
2NF and 1NF are not considered good relation schema designs.
They allow too much data redundancy which leads to update anomalies
Summary of Normal forms
A relation is in 1NF if the domains of its attributes include only atomic
values
A relation is in 2NF if it is in 1NF and every nonkey attribute is fully functionally
dependent on every candidate key.
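The 2NF test can be sketched as a search for partial dependencies: nonkey attributes that are determined by only part of the key. The sketch assumes the table has a single candidate key and uses the employee/skill FDs from earlier in the chapter, read as "left side determines right side". The function name partial_dependencies is made up for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes determined by attrs under fds (list of (lhs, rhs) tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(attrs, key, fds):
    """List (part_of_key, attribute) pairs where a nonkey attribute
    depends on a proper subset of the key. Empty list means the 2NF test passes."""
    nonkey = set(attrs) - set(key)
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & nonkey):
                found.append((part, attr))
    return found

F = [
    (("empID",), ("empName", "empPhone", "deptName")),
    (("deptName",), ("deptPhone", "deptMgr")),
    (("skillID",), ("skillName",)),
    (("empID", "skillID"), ("skillDate", "skillLevel")),
]
key = ("empID", "skillID")

universal = {"empID", "empName", "empPhone", "deptName", "deptPhone",
             "deptMgr", "skillID", "skillName", "skillDate", "skillLevel"}
emp_skill = {"empID", "skillID", "skillDate", "skillLevel"}

violations_universal = partial_dependencies(universal, key, F)
violations_emp_skill = partial_dependencies(emp_skill, key, F)
```

The single universal table fails the test, since employee and department attributes depend on empID alone and skillName depends on skillID alone. The smaller table with only skillDate and skillLevel passes.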