
Database Design: Logical Models: Normalization and The Relational Model

University of California, Berkeley School of Information IS 257: Database Management

IS 257 Fall 2006

2006.09.14 - SLIDE 1

Announcements
I will be away next week. Instead, we will have an informal workshop to work on issues of choosing and designing your personal databases.


Lecture Outline
Review
Conceptual Model and UML
Logical Model for the Diveshop database
Normalization
Relational Advantages and Disadvantages


DiveShop ER Diagram
[Figure: ER diagram of the DiveShop database. Entities: DiveCust, DiveOrds, DiveItem, DiveStok, ShipVia, Dest, Sites, BioSite, BioLife, ShipWrck. Linking attributes: Customer No (DiveCust, DiveOrds), Order No (DiveOrds, DiveItem), Item No (DiveItem, DiveStok), ShipVia (DiveOrds, ShipVia), Destination / Destination Name (DiveOrds, Dest), Destination no (Dest, Sites), Site No (Sites, BioSite, ShipWrck), Species No (BioSite, BioLife).]


Database Design Process
[Figure: the conceptual requirements of Applications 1 through 4 are combined into the Conceptual Model, which is mapped to the Logical Model and then to the Internal Model; each application sees the database through its own External Model.]

Logical Model: Mapping to a Relational Model


Each entity in the ER diagram becomes a relation.
A properly normalized (next time) ER diagram will indicate where intersection relations for many-to-many mappings are needed.
Relationships are indicated by common columns (or domains) in tables that are related.
We will examine the tables for the Diveshop derived from the ER diagram.
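The mapping rules above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the lowercase table and column names and the trimmed column lists are chosen for the example, not taken from the course database. DIVEITEM plays the role of the intersection relation between orders and stock items, and the relationships are followed through the common columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE divestok (item_no INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE diveords (order_no INTEGER PRIMARY KEY, customer_no INTEGER);
-- Intersection relation: one row per (order, item) pair resolves the
-- many-to-many mapping between orders and stock items.
CREATE TABLE diveitem (
    order_no INTEGER REFERENCES diveords(order_no),
    item_no  INTEGER REFERENCES divestok(item_no),
    qty      INTEGER,
    PRIMARY KEY (order_no, item_no)
);
""")
conn.executemany("INSERT INTO divestok VALUES (?, ?)",
                 [(90010, "Shotgun 2 Snorkel - Clear"),
                  (90020, "Tri-Vent Mask - Clear")])
conn.execute("INSERT INTO diveords VALUES (307, 1480)")
conn.executemany("INSERT INTO diveitem VALUES (?, ?, ?)",
                 [(307, 90010, 4), (307, 90020, 1)])

# The common columns order_no and item_no carry the relationships.
rows = conn.execute("""
    SELECT o.order_no, s.description, i.qty
    FROM diveords o
    JOIN diveitem i ON i.order_no = o.order_no
    JOIN divestok s ON s.item_no = i.item_no
    ORDER BY s.item_no
""").fetchall()
print(rows)
```

Each row of the result pairs one order with one stock item, which is exactly what the intersection relation stores.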



Customer = DIVECUST
Customer No | Name | Street | City | State/Prov | Zip/Postal Code | Country | Phone | First Contact
1480 | Louis Jazdzewski | 2501 O'Connor | New Orleans | LA | 60332 | U.S.A. | (902) 555-8888 | 1/29/95
1481 | Barbara Wright | 6344 W. Freeway | San Francisco | CA | 95031 | U.S.A. | (415) 555-4321 | 2/2/93
1909 | Stephen Bredenburg | 559 N.E. 167 Place | Indianapolis | IN | 46241 | U.S.A. | (317) 555-3644 | 1/5/93
1913 | Phillip Davoust | 123 First Street | Berkeley | CA | 94704 | U.S.A. | (415) 555-9184 | 3/9/98
1969 | David Burgett | 320 Montgomery Street | Seattle | WA | 98105 | U.S.A. | (206) 555-7580 | 3/12/99
2001 | Mary Rioux | 1701 Gateway Blvd. #385 | Pueblo | CO | 81002 | U.S.A. | (719) 555-2010 | 3/15/97
2306 | Kim Lopez | 14134 Nottingham Lane | Honolulu | HI | 96826 | U.S.A. | (808) 555-5050 | 1/29/99
2589 | Hiram Marley | 7233 Mill Run Drive | San Francisco | CA | 94123 | U.S.A. | (415) 555-6430 | 2/18/99
3154 | Tanya Kulesa | 505 S. Flower, Mail Stop 48943 | New York | NY | 10032 | U.S.A. | (212) 555-6750 | 1/30/99
3333 | Charles Sekaron | 110 East Park Avenue, Box 8 | Miller | SD | 57362 | U.S.A. | (613) 555-4333 | 3/16/98
3684 | Lowell Lutz | 915 E. Fesler | Dallas | TX | 75043 | U.S.A. | (214) 555-2722 | 2/15/99
4158 | Keith Lucas | 56 South Euclid | Chicago | IL | 60542 | U.S.A. | (312) 555-4310 | 3/17/98
4175 | Karen Ng | 2134 Elmhill Pike | Klamath Falls | OR | 97603 | U.S.A. | (503) 555-4700 | 3/20/99
5510 | Ken Soule | 58 Sansome Street | Aurora | CO | 89022 | U.S.A. | (303) 555-6695 | 2/5/99

Dive Order = DIVEORDS


Order No | Customer No | Sale Date | Ship Via | PaymentMethod | CcNumber | CcExpDate | No Of People | Depart Date | Return Date | Destination | VacationCost
307 | 1480 | 9/1/99 | UPS | Visa | 12345 678 90 | 1/1/01 | 2 | 11/8/00 | 11/15/00 | Fiji | 10000
310 | 1481 | 9/1/99 | FedEx | Check | | | 1 | 4/4/00 | 4/18/00 | Santa Barbara | 6000
313 | 1909 | 9/1/99 | Walk In | Visa | 456456456 | 9/11/00 | 4 | 6/27/00 | 7/11/00 | Cozumel | 8000
314 | 1913 | 9/1/99 | FedEx | Check | | | 3 | 2/7/00 | 2/14/00 | Monterey | 6000
317 | 1969 | 9/1/99 | FedEx | AmEx | 432432432 | 12/31/02 | 4 | 5/9/00 | 5/16/00 | Fiji | 20000
320 | 2001 | 9/1/99 | Walk In | Cash | | | 1 | 10/10/00 | 10/17/00 | Santa Barbara | 3000
321 | 2306 | 9/1/99 | Emery | Master Card | 112223334 | 8/12/00 1 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
325 | 2589 | 9/1/99 | Emery | AmEx | 332332332 | 12/10/99 | 1 | 3/15/00 | 4/12/00 | New Jersey | 8000
326 | 3333 | 9/1/99 | FedEx | Money Order | | | 2 | 2/10/00 | 2/17/00 | Monterey | 4000
327 | 3684 | 9/1/99 | DHL | Master Card | 22122321 | 11/9/99 1 | 4 | 3/10/00 | 3/23/00 | Florida | 24000
329 | 4158 | 9/1/99 | Walk In | Cash | | | 1 | 5/4/00 | 5/15/00 | Cozumel | 1571
330 | 4175 | 9/1/99 | FedEx | Check | | | 2 | 7/3/00 | 7/10/00 | Florida | 6000
331 | 5510 | 9/1/99 | FedEx | Money Order | | | 6 | 6/20/00 | 6/30/00 | Santa Barbara | 36000
333 | 5926 | 9/1/99 | DHL | Discover | 123123123 | 12/21/00 | 2 | 6/10/00 | 6/17/00 | Fiji | 10000
336 | 5719 | 9/1/99 | FedEx | Cash | | | 10 | 4/2/00 | 4/24/00 | Great Barrier Reef | 200000

Line item = DIVEITEM


Order No | Item No | Rental/Sale | Qty
307 | 90010 | Rental | 4
307 | 90020 | Rental | 1
307 | 90021 | Rental | 1
307 | 90030 | Rental | 2
307 | 90051 | Rental | 2
310 | 90011 | Rental | 1
310 | 90045 | Rental | 1
310 | 90059 | Rental | 1
310 | 90074 | Rental | 1
310 | 90078 | Rental | 1
313 | 90127 | Sale | 1
314 | 90072 | Rental | 3
314 | 90094 | Rental | 3
314 | 90100 | Rental | 3
317 | 90012 | Sale | 2

Line Note values shown on the slide (their rows are not recoverable): "This is our most popular mask." "These are our best selling fins." "A good weight belt for beginners" "Holds 10 cubic feet of cargo."

Shipping information = SHIPVIA

Ship Via | Ship Cost
DHL | 8
Emery | 11
FedEx | 12
UPS | 10
US Mail | 6

Dive Equipment Stock = DIVESTOK

Item No | Description | Equipment Class | On Hand | Reorder Point | Cost | Sale Price | Rental Price
90010 | Shotgun 2 - Clear | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90011 | Shotgun 2 - Red | Snorkel | 12 | 2 | $18.00 | $30.00 | $2.00
90012 | Shotgun 2 - Teal | Snorkel | 11 | 2 | $18.00 | $30.00 | $2.00
90020 | Tri-Vent Mask - Clear | Mask | 14 | 2 | $62.50 | $100.00 | $5.00
90021 | Tri-Vent Mask - Red | Mask | 10 | 2 | $62.50 | $100.00 | $5.00
90022 | Tri-Vent Mask - Teal | Mask | 14 | 2 | $62.50 | $100.00 | $7.00
90023 | Quad Vision Mask - Clear | Mask | 11 | 2 | $48.25 | $80.00 | $7.00
90024 | Quad Vision Mask - Red | Mask | 13 | 2 | $48.25 | $80.00 | $7.00
90025 | Quad Vision Mask - Teal | Mask | 10 | 2 | $48.25 | $80.00 | $10.00
90030 | Sea Wing Fins - Clear | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90031 | Sea Wing Fins - Red | Fins | 11 | 2 | $60.00 | $100.00 | $12.00
90032 | Sea Wing Fins - Teal | Fins | 12 | 2 | $60.00 | $100.00 | $12.00
90033 | Jet Fin - Black | Fins | 14 | 2 | $30.00 | $60.00 | $10.00
90040 | D350 Second Stage | Regulator | 11 | 1 | $162.50 | $270.00 | $20.00
90041 | G250 Second Stage | Regulator | 13 | 1 | $144.50 | $240.00 | $20.00
90042 | G200 Second Stage | Regulator | 12 | 1 | $105.25 | $175.00 | $20.00

Dive Locations = DEST

Destination No | Destination Name | Avg Temp (F) | Avg Temp (C) | Spring Temp (F) | Spring Temp (C) | Summer Temp (F) | Summer Temp (C) | Fall Temp (F) | Fall Temp (C) | Winter Temp (F) | Winter Temp (C) | Accommodations | Night Life | Body of Water | Travel Cost
1 | Cozumel | 78 | 25.556 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 74 | 23.333 | Cheap | Sleepy | Caribbean | 1000
2 | Great Barrier Reef | 80 | 26.667 | 76 | 24.444 | 84 | 28.889 | 78 | 25.556 | 76 | 24.444 | Moderate | Pleasant | Coral Sea | 5000
3 | Monterey | 60 | 15.556 | 62 | 16.667 | 64 | 17.778 | 64 | 17.778 | 58 | 14.444 | Expensive | Wild | Pacific | 2000
4 | Santa Barbara | 75 | 23.889 | 73 | 22.777 | 78 | 25.556 | 72 | 22.222 | 70 | 21.111 | Expensive | Wild | Pacific | 3000
5 | Florida | 77 | 25 | 75 | 23.889 | 85 | 29.444 | 78 | 25.556 | 70 | 21.111 | Moderate | Pleasant | Caribbean | 3000
6 | Fiji | 75 | 23.889 | 76 | 24.444 | 80 | 26.667 | 74 | 23.333 | 70 | 21.111 | Expensive | Sleepy | South Pacific | 5000
7 | New Jersey | 57 | 13.889 | 57 | 13.89 | 60 | 15.556 | 58 | 14.444 | 53 | 11.667 | Expensive | Pleasant | Atlantic | 2000

Dive Sites = SITE

Site No | Destination No | Site Name | Site Highlight | Distance from Town (mi) | Distance from Town (km) | Depth (ft) | Depth (m) | Visibility (ft) | Visibility (m) | Current | Skill Level
1001 | 1 | Palancar Reef | Reef | 10 | 16.09 | 100 | 30.48 | 150 | 45.72 | Strong | Intermediate
1002 | 1 | Santa Rosa Reef | Reef | 8 | 12.87 | 80 | 24.384 | 150 | 45.72 | Strong | Intermediate
1003 | 1 | Chancanab Reef | Reef | 4 | 6.437 | 60 | 18.288 | 100 | 30.48 | Mild | Beginning
1004 | 1 | Punta Sur | Reef | 13 | 20.92 | 120 | 36.576 | 175 | 53.34 | Strong | Advanced
1005 | 1 | Yocab Reef | Reef | 6 | 9.656 | 50 | 15.24 | 100 | 30.48 | Mild | Beginning
2001 | 2 | Heron Island | Reef | 50 | 80.47 | 90 | 27.432 | 150 | 45.72 | Mild | Intermediate
2002 | 2 | Cod Hole | Fish | 45 | 72.42 | 50 | 15.24 | 150 | 45.72 | Mild | Beginning
2003 | 2 | Butterfly Bay | Caves | 20 | 32.19 | 70 | 21.336 | 70 | 21.336 | None | Advanced
2004 | 2 | Wheeler Reef | Marine Life | 30 | 48.28 | 50 | 15.24 | 125 | 38.1 | Mild | Beginning
2005 | 2 | Watanabe | Marine Life | 130 | 209.2 | 150 | 45.72 | 200 | 60.96 | None | Intermediate
3001 | 3 | Point Lobos | Marine Life | 3 | 4.828 | 60 | 18.288 | 75 | 22.86 | None | Beginning
3002 | 3 | Macabee Beach | Marine Life | 0.1 | 0.161 | 40 | 12.192 | 40 | 12.192 | None | Beginning
3003 | 3 | Pinnacles | Pinnacle | 1 | 1.609 | 60 | 18.288 | 50 | 15.24 | Mild | Beginning
3004 | 3 | Monastery Beach | Marine Life | 3 | 4.828 | 50 | 15.24 | 40 | 12.192 | Surge | Beginning

The Site Notes column is empty in the rows shown.

Sea Life = BIOLIFE


Species No | Category | Common Name | Species Name | Length (cm) | Length (in)
90020 | Triggerfish | Clown Triggerfish | Ballistoides conspicillum | 50 | 19.685
90030 | Snapper | Red Emperor | Lutjanus sebae | 60 | 23.622
90050 | Wrasse | Giant Maori Wrasse | Cheilinus undulatus | 229 | 90.157
90070 | Angelfish | Blue Angelfish | Pomacanthus nauarchus | 30 | 11.811
90080 | Cod | Lunartail Rockcod | Variola louti | 80 | 31.496
90090 | Scorpionfish | Firefish | Pterois volitans | 38 | 14.961
90100 | Butterflyfish | Ornate Butterflyfish | Chaetodon Ornatissimus | 19 | 7.4803
90110 | Shark | Swell Shark | Cephaloscyllium ventriosum | 102 | 40.157
90120 | Ray | Bat Ray | Myliobatis californica | 56 | 22.047
90130 | Eel | California Moray | Gymnothorax mordax | 150 | 59.055
90140 | Cod | Lingcod | Ophiodon elongatus | 150 | 59.055

The Notes and Graphic columns are empty in the rows shown.

BIOSITE -- linking relation


Species No | Site No
90010 | 2001
90010 | 2002
90010 | 2003
90010 | 2004
90010 | 2005
90010 | 6001
90010 | 6003
90010 | 6004
90010 | 6005
90020 | 2001
90020 | 2002

Shipwrecks = SHIPWRK

Ship Name | Site No | Category | Type | Interest | Tonnage | Length (ft) | Length (m) | Beam (ft) | Beam (m)
Delaware | 7007 | Commercial | Steam Freighter | Treasure | 1646 | 252 | 76.8096 | 37 | 11.2776
F.S.Loop | 4004 | Commercial | Steam Schooner | Machinery | 794 | 193 | 58.8264 | 39 | 11.8872
Gosford | 4001 | Commercial | Barque Rigged Sail | Fixture | 2250 | 280 | 85.344 | 42 | 12.8016
Great Isaac | 7002 | Commercial | Seagoing Tug | Fixture | 1117 | 185 | 56.388 | 37 | 11.2776
Lizzie D | 7001 | Commercial | Tug/Rumrunner | Treasure | 122 | 84 | 25.6032 | 21 | 6.4008
Mohawk | 7004 | Passenger | Ocean Liner | Treasure | 8140 | 402 | 122.5296 | 54 | 16.4592
R.P. Resor | 7006 | Commercial | Oil Tanker | Treasure | 7450 | 435 | 132.588 | 66.8 | 20.36064
Star of Scotland | 4002 | Passenger | British Q-Boat | Treasure | 1250 | 263 | 80.1624 | 35 | 10.668
Tolten | 7008 | Commercial | Freighter | Fixture | 1858 | 280 | 85.344 | 43 | 13.1064
USS Moody | 4006 | Military | WWI Destroyer | Treasure | 1308 | 314 | 95.7072 | 31 | 9.4488
Valiant | 4003 | Passenger | Luxury Motor Yacht | Treasure | 444 | 162.4 | 49.49952 | 26 | 7.9248

Cause Date Sunk Comments Survivors Passengers/Crew Condition Graph Fire 66 66 Broken Deliberate 1/1/47 0 Scattered Fire Intact Collision 4/16/47 27 27 Intact Unknown 10/19/22 8 0 Intact Collision 1/25/35 163 118 Scattered Military 2/28/42 50 2 Broken Weather 1/22/42 5 4 Broken Military 3/13/42 28 1 Intact Deliberate 1/1/33 0 Intact Fire 12/17/30 25 25 Intact


Mapping to Other Models


Hierarchical
Need to make decisions about access paths

Network
Need to pre-specify all of the links and sets

Object-Oriented
What are the objects, datatypes, their methods and the access points for them

Object-Relational
Same as relational, but what new datatypes might be needed or useful (more on OR later)


Normalization
Normalization theory is based on the observation that relations with certain properties are more effective in inserting, updating and deleting data than other sets of relations containing the same data.
Normalization is a multi-step process beginning with an unnormalized relation.
Hospital example from Atre, S. Data Base: Structured Techniques for Design, Performance, and Management.

Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF)

Normalization

First Normal Form: Functional dependency of nonkey attributes on the primary key; atomic values only
Second Normal Form: Full functional dependency of nonkey attributes on the primary key
Third Normal Form: No transitive dependency between nonkey attributes
Boyce-Codd and higher: All determinants are candidate keys; single multivalued dependency

Unnormalized Relations
First step in normalization is to convert the data into a two-dimensional table.
In unnormalized relations data can repeat within a column.

Unnormalized Relation
Patient # | Surgeon # | Surg. date | Patient Name | Patient Addr | Surgeon | Surgery | Postop drug | Drug side effects
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St. New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St. Rye, NY | Charles Field; Patricia Gold | Eye Cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement; Eye cataract removal | Tetracycline; none | Fever; none

First Normal Form


To be in First Normal Form, a relation must contain only atomic values at each row and column.
No repeating groups.
A column or set of columns is called a Candidate Key when its values can uniquely identify the row in the relation.
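As a rough illustration of removing repeating groups, the following Python sketch flattens one unnormalized hospital record into atomic rows. The dictionary layout and the helper name `to_first_normal_form` are assumptions made for this example; the values come from the hospital relation.

```python
# One unnormalized record: the surgeon, date and surgery columns repeat.
unnormalized = {
    "patient_no": 1111,
    "patient_name": "John White",
    "surgeon_no": [145, 311],
    "surgery_date": ["Jan 1, 1995", "June 12, 1995"],
    "surgery": ["Gallstones removal", "Kidney stones removal"],
}


def to_first_normal_form(record, repeating):
    """Emit one flat row per position in the repeating group."""
    fixed = {k: v for k, v in record.items() if k not in repeating}
    groups = zip(*(record[k] for k in repeating))
    return [{**fixed, **dict(zip(repeating, group))} for group in groups]


rows = to_first_normal_form(
    unnormalized, ["surgeon_no", "surgery_date", "surgery"]
)
for row in rows:
    print(row)
```

Every value in the resulting rows is atomic; the non-repeating patient columns are simply copied into each row, which is the redundancy the later normal forms remove.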


First Normal Form


Patient # | Surgeon # | Surgery Date | Patient Name | Patient Addr | Surgeon Name | Surgery | Drug admin | Side Effects
1111 | 145 | 01-Jan-95 | John White | 15 New St. New York, NY | Beth Little | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | John White | 15 New St. New York, NY | Michael Diamond | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Mary Jones | 10 Main St. Rye, NY | Charles Field | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Mary Jones | 10 Main St. Rye, NY | Patricia Gold | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Charles Brown | Dogwood Lane Harrison, NY | David Rosen | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Paul Kosher | Blind Brook Mamaroneck, NY | Beth Little | Gallstones Removal | none | none
6845 | 243 | 05-Apr-94 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye Cornea Replacement | Tetracycline | Fever
6845 | 243 | 15-Dec-84 | Ann Hood | Hilton Road Larchmont, NY | Charles Field | Eye cataract removal | none | none

1NF Storage Anomalies


Insertion: A new patient has not yet undergone surgery -- hence no surgeon # -- Since surgeon # is part of the key we can't insert.
Insertion: If a surgeon is newly hired and hasn't operated yet -- there will be no way to include that person in the database.
Update: If a patient comes in for a new procedure, and has moved, we need to change multiple address entries.
Deletion (type 1): Deleting a patient record may also delete all info about a surgeon.
Deletion (type 2): When there are functional dependencies (like side effects and drug) changing one item eliminates other information.
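The first insertion anomaly can be reproduced in a small sqlite3 sketch. The table name `surgery_1nf`, the column names, the choice to include the surgery date in the key, and patient 7000 are all assumptions for illustration. The key columns are declared NOT NULL explicitly, because SQLite does not enforce that on its own for composite primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE surgery_1nf (
        patient_no   INTEGER NOT NULL,
        surgeon_no   INTEGER NOT NULL,
        surgery_date TEXT    NOT NULL,
        patient_name TEXT,
        surgeon_name TEXT,
        PRIMARY KEY (patient_no, surgeon_no, surgery_date)
    )
""")

# A hypothetical new patient who has not had surgery yet: there is no
# surgeon number to put in the key, so the row cannot be stored.
try:
    conn.execute(
        "INSERT INTO surgery_1nf (patient_no, patient_name) VALUES (?, ?)",
        (7000, "New Patient"),
    )
    inserted = True
except sqlite3.IntegrityError:
    inserted = False

print(inserted)
```

The insert is rejected because part of the key is missing, which is the situation the slide describes.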

Second Normal Form


A relation is said to be in Second Normal Form when every nonkey attribute is fully functionally dependent on the primary key.
That is, every nonkey attribute needs the full primary key for unique identification
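One way to see a partial dependency is to test candidate functional dependencies against sample rows. The helper `determines` below is a hypothetical name for this sketch, and the shortened column names are assumptions. Note that sample data can only refute a dependency; it cannot prove that one holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# A few rows of the 1NF hospital relation.
rows = [
    {"patient": 1111, "surgeon": 145, "date": "01-Jan-95",
     "patient_name": "John White", "surgery": "Gallstones removal"},
    {"patient": 1111, "surgeon": 311, "date": "12-Jun-95",
     "patient_name": "John White", "surgery": "Kidney stones removal"},
    {"patient": 1234, "surgeon": 243, "date": "05-Apr-94",
     "patient_name": "Mary Jones", "surgery": "Eye Cataract removal"},
]

# Patient name depends on only part of the key: a 2NF violation.
partial = determines(rows, ["patient"], ["patient_name"])
# Surgery is not determined by the patient alone ...
patient_alone = determines(rows, ["patient"], ["surgery"])
# ... but it is consistent with depending on the full key.
full_key = determines(rows, ["patient", "surgeon", "date"], ["surgery"])
print(partial, patient_alone, full_key)
```

Attributes like the patient name, which are determined by part of the key, are the ones that move to their own relation in the Second Normal Form tables.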


Second Normal Form


Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

Second Normal Form


Surgeon # | Surgeon Name
145 | Beth Little
189 | David Rosen
243 | Charles Field
311 | Michael Diamond
467 | Patricia Gold

Second Normal Form


Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin | Side Effects
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin | rash
1111 | 311 | 12-Jun-95 | Kidney stones removal | none | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline | Fever
1234 | 467 | 10-May-95 | Thrombosis removal | none | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin | none
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin | none
5123 | 145 | 10-May-95 | Gallstones Removal | none | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline | Fever

1NF Storage Anomalies Removed


Insertion: Can now enter new patients without surgery.
Insertion: Can now enter Surgeons who haven't operated.
Deletion (type 1): If Charles Brown dies the corresponding tuples from Patient and Surgery tables can be deleted without losing information on David Rosen.
Update: If John White comes in for a third time, and has moved, we only need to change the Patient table.

2NF Storage Anomalies


Insertion: Cannot enter the fact that a particular drug has a particular side effect unless it is given to a patient.
Deletion: If John White receives some other drug because of the penicillin rash, and a new drug and side effect are entered, we lose the information that penicillin can cause a rash.
Update: If drug side effects change (a new formula) we have to update multiple occurrences of side effects.

Third Normal Form


A relation is said to be in Third Normal Form if there is no transitive functional dependency between nonkey attributes
When one nonkey attribute can be determined by one or more other nonkey attributes, there is said to be a transitive functional dependency.

The side effect column in the Surgery table is determined by the drug administered
Side effect is transitively functionally dependent on drug so Surgery is not 3NF
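A minimal sqlite3 sketch of removing this transitive dependency follows. The lowercase table and column names are assumptions, and the updated side effect text 'Fever, nausea' is invented purely to show the update; it is not from the hospital example. Side effects are stored once per drug, and a join brings them back to each surgery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE drug (drug_admin TEXT PRIMARY KEY, side_effects TEXT);
CREATE TABLE surgery (
    patient_no   INTEGER,
    surgeon_no   INTEGER,
    surgery_date TEXT,
    surgery      TEXT,
    drug_admin   TEXT REFERENCES drug(drug_admin),
    PRIMARY KEY (patient_no, surgeon_no, surgery_date)
);
""")
conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [("Penicillin", "rash"), ("Tetracycline", "Fever"),
                  ("none", "none")])
conn.executemany("INSERT INTO surgery VALUES (?, ?, ?, ?, ?)", [
    (1111, 145, "01-Jan-95", "Gallstones removal", "Penicillin"),
    (1234, 243, "05-Apr-94", "Eye Cataract removal", "Tetracycline"),
    (6845, 243, "05-Apr-94", "Eye Cornea Replacement", "Tetracycline"),
])

# A change to a drug's side effects is now a single-row update
# (the new text is hypothetical).
conn.execute("UPDATE drug SET side_effects = 'Fever, nausea' "
             "WHERE drug_admin = 'Tetracycline'")

effects = conn.execute("""
    SELECT s.patient_no, d.side_effects
    FROM surgery s JOIN drug d ON d.drug_admin = s.drug_admin
    WHERE s.drug_admin = 'Tetracycline'
    ORDER BY s.patient_no
""").fetchall()
print(effects)
```

Both Tetracycline surgeries see the new text after one update, because the side effect is no longer repeated in the Surgery rows.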


Third Normal Form


Patient # | Surgeon # | Surgery Date | Surgery | Drug Admin
1111 | 145 | 01-Jan-95 | Gallstones removal | Penicillin
1111 | 311 | 12-Jun-95 | Kidney stones removal | none
1234 | 243 | 05-Apr-94 | Eye Cataract removal | Tetracycline
1234 | 467 | 10-May-95 | Thrombosis removal | none
2345 | 189 | 08-Jan-96 | Open Heart Surgery | Cephalosporin
4876 | 145 | 05-Nov-95 | Cholecystectomy | Demicillin
5123 | 145 | 10-May-95 | Gallstones Removal | none
6845 | 243 | 15-Dec-84 | Eye cataract removal | none
6845 | 243 | 05-Apr-94 | Eye Cornea Replacement | Tetracycline

Third Normal Form

Drug Admin | Side Effects
Cephalosporin | none
Demicillin | none
none | none
Penicillin | rash
Tetracycline | Fever

2NF Storage Anomalies Removed
Insertion: We can now enter the fact that a particular drug has a particular side effect in the Drug relation.
Deletion: If John White receives some other drug as a result of the rash from penicillin, the information on penicillin and rash is maintained.
Update: The side effects for each drug appear only once.

Boyce-Codd Normal Form


Most 3NF relations are also BCNF relations.
A 3NF relation can fail to be in BCNF only when all of the following hold:
Candidate keys in the relation are composite keys (they are not single attributes),
There is more than one candidate key in the relation, and
The keys are not disjoint, that is, some attributes in the keys are common.

Most 3NF Relations are also BCNF. Is this one?

Patient # | Patient Name | Patient Address
1111 | John White | 15 New St. New York, NY
1234 | Mary Jones | 10 Main St. Rye, NY
2345 | Charles Brown | Dogwood Lane Harrison, NY
4876 | Hal Kane | 55 Boston Post Road, Chester, CN
5123 | Paul Kosher | Blind Brook Mamaroneck, NY
6845 | Ann Hood | Hilton Road Larchmont, NY

BCNF Relations

Patient # | Patient Name
1111 | John White
1234 | Mary Jones
2345 | Charles Brown
4876 | Hal Kane
5123 | Paul Kosher
6845 | Ann Hood

Patient # | Patient Address
1111 | 15 New St. New York, NY
1234 | 10 Main St. Rye, NY
2345 | Dogwood Lane Harrison, NY
4876 | 55 Boston Post Road, Chester, CN
5123 | Blind Brook Mamaroneck, NY
6845 | Hilton Road Larchmont, NY

Fourth Normal Form


Any relation is in Fourth Normal Form if it is BCNF and any multivalued dependencies are trivial.
Eliminate non-trivial multivalued dependencies by projecting into simpler tables.

Fifth Normal Form


A relation is in 5NF if every join dependency in the relation is implied by the keys of the relation.
This implies that relations that have been decomposed in previous normal forms can be recombined via natural joins to recreate the original relation.
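The recombination property can be checked on a small sample in plain Python. The helpers `project` and `natural_join` are hypothetical names written for this sketch, the rows are a subset of the hospital data with shortened column names, and the check shows only that this particular decomposition loses nothing on this sample.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(unique)]


def natural_join(left, right):
    """Join on every attribute name the two relations share."""
    shared = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in shared)]


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


original = [
    {"patient": 1111, "name": "John White", "surgeon": 145, "date": "01-Jan-95"},
    {"patient": 1111, "name": "John White", "surgeon": 311, "date": "12-Jun-95"},
    {"patient": 1234, "name": "Mary Jones", "surgeon": 243, "date": "05-Apr-94"},
]

patients = project(original, ["patient", "name"])
surgeries = project(original, ["patient", "surgeon", "date"])
rejoined = natural_join(patients, surgeries)
lossless = as_set(rejoined) == as_set(original)
print(len(patients), len(surgeries), lossless)
```

The patient names are stored once per patient after projection, and the natural join on the shared `patient` column recreates the original rows.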


Effectiveness and Efficiency Issues for DBMS
Focus on the relational model.
Any column in a relational database can be searched for values.
To improve efficiency, indexes using storage structures such as B-trees and hashing are used.
But many useful functions are not indexable and require complete scans of the database.

Example: Text Fields


In a conventional RDBMS, when a text field is indexed, the index supports only exact matching of the text field contents (or greater-than and less-than comparisons).
Can search for individual words using pattern matching, but a full scan is required.

Text searching is still done best (and fastest) by specialized text search programs (search engines) that we will look at more later.
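The difference between an indexed exact match and a pattern match can be observed with SQLite's EXPLAIN QUERY PLAN, used here as a stand-in for a conventional RDBMS. The table layout, the index name `idx_common_name`, and the helper `plan` are assumptions for this sketch, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE biolife ("
             "species_no INTEGER PRIMARY KEY, common_name TEXT, notes TEXT)")
conn.execute("CREATE INDEX idx_common_name ON biolife (common_name)")


def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Exact match on the indexed column: the planner can use the index.
exact = plan("SELECT * FROM biolife WHERE common_name = 'Firefish'")
# A word inside the field: the leading wildcard forces a scan of every row.
pattern = plan("SELECT * FROM biolife WHERE common_name LIKE '%fish%'")

print(exact)
print(pattern)
```

The first plan mentions the index, while the second reports a scan of the table, which is why word searches inside text fields do not benefit from an ordinary index.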

Normalization
Normalization is performed to reduce or eliminate Insertion, Deletion or Update anomalies.
However, a completely normalized database may not be the most efficient or effective implementation.
Denormalization is sometimes used to improve efficiency.

Normalizing to death
Normalization splits database information across multiple tables. To retrieve complete information from a normalized database, the JOIN operation must be used. JOIN tends to be expensive in terms of processing time, and very large joins are very expensive.


Downward Denormalization
Before:
Customer: ID, Address, Name, Telephone
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID

After:
Customer: ID, Address, Name, Telephone
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name

Upward Denormalization
Before:
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name
Order Item: Order No, Item No, Item Price, Num Ordered

After:
Order: Order No, Date Taken, Date Dispatched, Date Invoiced, Cust ID, Cust Name, Order Price
Order Item: Order No, Item No, Item Price, Num Ordered

Denormalization
Usually driven by the need to improve query speed.
Query speed is improved at the expense of more complex or problematic DML (Data manipulation language) for updates, deletions and insertions.
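The trade-off can be sketched with sqlite3 using the downward denormalization of the customer name into the order table. Table and column names are assumptions, and the new name 'Mary Smith' is hypothetical. Reads avoid the join, but every name change now has to touch two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
-- Denormalized: cust_name is copied down from customer.
CREATE TABLE orders (order_no INTEGER PRIMARY KEY, cust_id INTEGER, cust_name TEXT);
INSERT INTO customer VALUES (1, 'Mary Rioux');
INSERT INTO orders VALUES (320, 1, 'Mary Rioux');
""")

# Read path: the customer name comes back without a JOIN.
fast = conn.execute("SELECT order_no, cust_name FROM orders").fetchall()

# Write path: a rename must update both copies, ideally in one transaction.
with conn:
    conn.execute("UPDATE customer SET name = 'Mary Smith' WHERE cust_id = 1")
    conn.execute("UPDATE orders SET cust_name = 'Mary Smith' WHERE cust_id = 1")

# Consistency check that the normalized design would never need.
mismatches = conn.execute("""
    SELECT COUNT(*) FROM orders o
    JOIN customer c ON c.cust_id = o.cust_id
    WHERE o.cust_name <> c.name
""").fetchone()[0]
print(fast, mismatches)
```

If the second UPDATE were forgotten, the mismatch count would become nonzero, which is the kind of update problem denormalization trades for faster queries.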


Using RDBMS to help normalize


Example database: Cookie
Database of books, libraries, publisher and holding information for a shared (union) catalog

Cookie relationships


Cookie BIBFILE relation


ACCNO A003 T082 C024 B006 B007 B005 B008 B010 B009 B012 B011 B014 B013 B016 B017 F047 B116 S102 B118 B018 C031 C032 C034 AUTHOR TITLE LOC PUBID DATE PRICE AMERICAN LIBRARY ASSOCIATION ALA BULLETIN CHICAGO 04 $3.00 ANDERSON, THEODOREHE TEACHING OF MODERN LANGUAGES T PARIS 53 1955 $10.95 AXT, RICHARD G. COLLEGE SELF STUDY : LECTURES ON INSTITU BOULDER, CO. 51 1960 $7.00 BALDERSTON, FREDERICK E. MANAGING TODAYS SAN FRANCISCO 27 UNIVERSITY 1975 $6.00 BARZUN, JACQUES TEACHER IN AMERICA GARDEN CITY 18 1954 $7.00 BARZUN, JACQUES THE AMERICAN UNIVERSITY : HOW IT 24 NEW YORK RUNS, W 1970 $5.00 BARZUN, JACQUES THE HOUSE OF INTELLECT NEW YORK 24 1961 $8.00 BELL, DANIEL THE COMING OF POST-INDUSTRIAL SOCIETY 1976 NEW YORK 09 : $10.00 BENSON, CHARLES S. IMPLEMENTING THE SAN FRANCISCO 27 LEARNING SOCIETY 1974 $9.00 BERG, IVAR EDUCATION AND JOBS : THE GREAT TRAINING BOSTON 10 1971 $12.00 BERSI, ROBERT M. RESTRUCTURING THE BACCALAUREATE WASHINGTON, D.C. 03 1973 $11.00 BEVERIDGE, WILLIAM I.THE ART OF SCIENTIFIC INVESTIGATION NEW YORK 58 1957 $14.00 BIRD, CAROLINE THE CASE AGAINST NEW YORK COLLEGE 08 1975 $13.00 BISSELL, CLAUDE T. THE STRENGTH OF THE UNIVERSITY 57 TORONTO 1968 $14.00 BLAIR, GLENN MYERS EDUCATIONAL PSYCHOLOGY NEW YORK 30 1962 $11.00 BLAKE, ELIAS, JR. THE FUTURE OF THECAMBRIDGE, MA.02 BLACK COLLEGES 1971 $14.25 BOLAND, R.J. CRITICAL ISSUES IN INFORMATION ENG. CHICHESTER, SYSTEMS 1987 63 R $30.95 BROWN, SANBORN C., SCIENTIFIC MANPOWER ED. CAMBRIDGE, MASS. 29 1971 $4.00 BUCKLAND, MICHAEL K. LIBRARY SERVICES ELMSFORD,AND CONTEXT IN THEORY NY 70 1983 $12.00 BUDIG, GENE A. ACADEMIC QUICKSAND : SOME TRENDS AND ISS LINCOLN, NEBRASKA 37 1973 $13.00 CALIFORNIA. DEPT. OF JUSTICE LAW IN THE SCHOOL MONTCLAIR, N.J. 35 1974 $0.50 CAMPBELL, MARGARET A. WOULD A GIRLOLD INTO MEDICINE? WHY GO WESTBURY, 48 N.Y. 1973 $1.50 CARNEGIE COMMISSION DIGEST OF REPORTS OF THE CARNEGIE COMM A ON HIGHER NEW YORK 30 1974 $3.50 PAGINATION ILL HEIGHT 63 V. ILL. 26 294 P. 22 X, 300 P. 
GRAPHS 28 XVI, 307 P. 24 280 P. 18 XII, 319 P. 20 VIII, 271 P. 21 XXVII, 507 P. 21 XVII, 147 P. 24 XX, 200 P. 21 IV, 160P. 23 XIV, 239 P. 18 XII, 308 P. 18 VII, 251 P. 21 678 P. 24 VIII, PP. 539 23 XV, 394 P. ILL. 24 X, 180 P. 26 XII, 201 P. ILL. 23 74 P. 23 IV, 87 P. 21 V, 114 P. 24 399 P. 24


How to Normalize?
Currently no way to have multiple authors for a given book, and there is duplicate data spread over the BIBFILE table.
Can we use the DBMS to help us normalize?
Access example
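The lecture's example uses Access; as a rough equivalent, the same idea can be sketched with sqlite3 by letting SELECT DISTINCT queries build the new tables. The trimmed BIBFILE columns, the new table names `author`, `book_author` and `publisher`, and the assumption that LOC depends on PUBID are all choices made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A cut-down BIBFILE: only the columns needed for this illustration.
conn.execute("CREATE TABLE bibfile ("
             "accno TEXT PRIMARY KEY, author TEXT, loc TEXT, pubid TEXT)")
conn.executemany("INSERT INTO bibfile VALUES (?, ?, ?, ?)", [
    ("B007", "BARZUN, JACQUES", "GARDEN CITY", "18"),
    ("B005", "BARZUN, JACQUES", "NEW YORK", "24"),
    ("B008", "BARZUN, JACQUES", "NEW YORK", "24"),
    ("B010", "BELL, DANIEL", "NEW YORK", "09"),
])

conn.executescript("""
-- Each distinct author is stored once.
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO author (name) SELECT DISTINCT author FROM bibfile ORDER BY author;

-- Linking relation: a book may now have any number of authors.
CREATE TABLE book_author (accno TEXT, author_id INTEGER,
                          PRIMARY KEY (accno, author_id));
INSERT INTO book_author
    SELECT b.accno, a.author_id FROM bibfile b JOIN author a ON a.name = b.author;

-- Publisher data pulled out of the repeated LOC/PUBID pairs.
CREATE TABLE publisher AS SELECT DISTINCT pubid, loc FROM bibfile;
""")

n_authors = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
n_publishers = conn.execute("SELECT COUNT(*) FROM publisher").fetchone()[0]
barzun_books = conn.execute("""
    SELECT COUNT(*) FROM book_author ba
    JOIN author a ON a.author_id = ba.author_id
    WHERE a.name = 'BARZUN, JACQUES'
""").fetchone()[0]
print(n_authors, n_publishers, barzun_books)
```

The repeated author string collapses to one row per author, and the linking table is where a second author for the same book could be added.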


Database Creation in Access


Simplest to use a design view
Wizards are available, but less flexible

Need to watch the default values
Helps to know what the primary key is, or if one is to be created automatically
Automatic creation is more complex in other RDBMS and ORDBMS

Need to make decisions about the physical storage of the data

Database Creation in Access


Some Simple Examples



Advantages of RDBMS
Relational Database Management Systems (RDBMS)
Possible to design complex data storage and retrieval systems with ease (and without conventional programming).
Support for ACID transactions:
Atomic
Consistent
Isolated
Durable
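Atomicity, the first of the ACID properties, can be illustrated with a short sqlite3 sketch. The trimmed DIVESTOK and DIVEITEM tables, the CHECK constraint, and the helper `place_line_item` are assumptions made for this example. Either both the line item and the stock decrement are stored, or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE divestok ("
             "item_no INTEGER PRIMARY KEY, "
             "on_hand INTEGER CHECK (on_hand >= 0))")
conn.execute("CREATE TABLE diveitem (order_no INTEGER, item_no INTEGER, qty INTEGER)")
conn.execute("INSERT INTO divestok VALUES (90010, 12)")
conn.commit()


def place_line_item(order_no, item_no, qty):
    """Record the line item and decrement stock as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO diveitem VALUES (?, ?, ?)",
                         (order_no, item_no, qty))
            conn.execute("UPDATE divestok SET on_hand = on_hand - ? "
                         "WHERE item_no = ?", (qty, item_no))
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_line_item(307, 90010, 4)         # enough stock on hand
too_many = place_line_item(310, 90010, 50)  # would drive on_hand negative

on_hand = conn.execute(
    "SELECT on_hand FROM divestok WHERE item_no = 90010").fetchone()[0]
line_items = conn.execute("SELECT COUNT(*) FROM diveitem").fetchone()[0]
print(ok, too_many, on_hand, line_items)
```

The failed order leaves no trace: its line item insert is rolled back together with the rejected stock update.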

Advantages of RDBMS
Support for very large databases
Automatic optimization of searching (when possible)
RDBMS have a simple view of the database that conforms to much of the data used in business
Standard query language (SQL)

Disadvantages of RDBMS
Until recently, no real support for complex objects such as documents, video, images, spatial or time-series data. (ORDBMS add, or make available, support for these)
Often poor support for storage of complex objects from OOP languages (Disassembling the car to park it in the garage)
Usually no efficient and effective integrated support for things like text searching within fields (MySQL does have simple keyword searching now with index support)

Next Week
Database Design Workshop

