You are on page 1of 150

Data Modeling and Database Design

Minder Chen, Ph.D. mchen@gmu.edu


Team Team number Specialty Customer Customer Customer Customer Customer Customer Customer number name address activity telephone fax

member

is a member of

Division Division number Division name Division address


belongs to

Employee Employee number First name Last name Employee function Employee salary

subcontract

staffed by

is assigned to

Project Project number Project name Project label Start date End date

Task
Task name Task cost
contains

Data Modeling and Database Design Course Outline


INTRODUCTION Introduction to Data Modeling Database Development Life Cycle Overview ENTITY AND RELATIONSHIP Develop the Subject Area Diagram Develop Preliminary Data Model: Entity & Relationship Identification ATTRIBUTES AND SUBTYPES Attributes Identification and Definition Develop Fully Attributed Data Model Identifiers Data Modeling Exercise Partitioning and Entity Subtypes NORMALIZATION Normalization Normalization Exercise De-normalization DATA MODEL EVALUATION AND MAPPING TO RELATIONAL DBMS Refine a Data Model: Analysis and Simplification Transform to Physical Data Base Design PowerDesigner: Data Architect Pysical DB Design and Data Warehouse DB Design
Minder Chen, 1993~2002 Data Modeling - 2 -

References
Data Modeling and Database Design 1. Batini, Ceri, Navathe, Conceptual Database Design, Redwood City, CA: The Benjamin/Cummings Publishing Company, Inc., 1992. 2. Teorey, T. J., Database Modeling and Design: The Entity-Relationship Approach, Morgan Kaufmann Publishers, Inc., 1990. 3. Thomas A. Bruce, Designing Quality Databases with IDEF1X Information Models, Dorset House Publishing, NY: New York, 1991. 4. Texas Instruments, A Guide to IE Using IEF, 2nd edition, Part No. 2739756-0001, 1990. 5. Martin, James, Information Engineering Book II: Planning and Analysis, Prentice-Hall Inc., 1989. 6. Dave Ensor, Ian Stevenson, Oracle Design, O'Reilly & Associates, 1997 7. Rob Gillette, etc., Physical Database Design for Sybase SQL Server, Prentice Hall, 1995. 8. Ralph Kimball, The Data Warehouse Toolkit, Wiley, 1996. JAD References 1. August, J. H.. Joint Application Design: The Group Session Approach to System Design. Englewood Cliffs, NY, Prentice Hall, Inc., 1991. 2. Wood, J. and Silver, D. Joint Application Design: How to Design Quality Systems in 40% Less Time. New York, NY, John Wiley & Sons, 1989. 3. Andrews, D. C. and Leventhal, N. S., Fusion: Integrating IE, CASE, and JAD: A Handbook for Reengineering the Systems Organization, Englewood Cliffs, NJ: Yourdon Press, 1993.
Minder Chen, 1993~2002 Data Modeling - 3 -

Data Modeling and Database Design: INTRODUCTION

Systems Development Life Cycle (SDLC) in a Client/Server Environment Introduction to Data Modeling Database Development Life Cycle Overview

Minder Chen, 1993~2002

Data Modeling - 4 -

Rationales for Data Modeling


Data is the foundation of modern information systems enabled by data base technologies. Data in an organization exist and can be described independently of how these data are used. Data should be managed as a corporate-wide resource. The types of data used in an organization do not change very much. Data have certain inherent properties which lead to correct structuring. If we structure data according to their inherent properties, the structure (i.e., data models) will be stable.
Minder Chen, 1993~2002 Data Modeling - 5 -

History of Data Modeling


Importance of Entity-Relationship Modeling Technique
Database Data modeling and enterprise-wide data Data quality Data updating and accessing tools and procedure Data sharing culture

ER modeling technique was first developed by Peter Chen in 1976


A conceptual/logical data modeling tool A user-oriented approach A graphic-based method

ER modeling technique is the major data modeling method in Information Engineering and is widely supported by most of CASE tools. Data modeling is the foundation of most database-centered transaction processing systems and data warehouse systems
Minder Chen, 1993~2002 Data Modeling - 6 -

CSC Development Strategies


HIGH RE-CREATE new business process & systems from scratch
Risk Long Term Reward Short Term Costs Degree of Change

RE-ENGINEER business process & systems RE-DESIGN current systems RE-HOST current systems RE-IMAGE current systems
Data Modeling - 7 -

LOW
Minder Chen, 1993~2002

Distribution of Business Function (Logic)


Presentation Space
Presentation Presentation Service Logic Function Logic Data Logic Data Service

Data Space

Client
Presentation logic Local input validation Output production logic Local peripheral drivers Performance critical processing

Server

Functions that access data on the server Functions that need input from multiple users Functions that coordinate the work of several user

Issues:

Minder Chen, 1993~2002

Distribution of data Platform-specific capabilities and interoperability Connectivity capabilities/platform Frequency of change to codes Configuration management
Data Modeling - 8 -

C/S Development Methodology


SDLC C/S Architecture User Interface Conceptual Analysis Work Flow rules=> Logical Design Form Sequences performance => Physical Design Forms, Screens

Application Logic

Process Flow

Object Interaction Model Database Schema

Programs, Procedures

Information & Data Base

Data Model

Tables, Indexes

Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993. Data Modeling - 9 Minder Chen, 1993~2002

Client/Server Application Development Methodology


Requirements

Information & Data Base

Processes Behavior

Workflow User Interface

Architecture

Application Design and Development


Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993. Minder Chen, 1993~2002 Data Modeling - 10 -

Data Modeling (Data Base Design) Process


Information Requirements Conceptual DB Design A conceptual DB schema is a highlevel description of the database, independent of the particular DBMS.

Conceptual (Enterprise) DB Schema Logical DB Design A logical DB schema is a description of the structure of the database that can be processed by a DBMS: relational, network, or hierarchical. A physical DB schema is a description of the implementation of the database in external memory; it describes the storage structures and access methods used in order to effectively access and maintain data.

Logical DB Schema
Physical DB Design

Physical DB Schema

Source: Batini, C., Ceri, S., and Navathe, S. B., Conceptual Database Design: An EntityRelationship Approach, The Benjamin/Cummings Publishing Company, Inc., 1992. Data Modeling - 11 Minder Chen, 1993~2002

Multiple Perspectives
We use this data

ONE BUSINESS

We do these things

DATA

ACTIVITY

HIRE EMPLOYEE

PAY EMPLOYEE

EMPLOYEE PROMOTE EMPLOYEE

...... ...... .... ....

Minder Chen, 1993~2002

FIRE EMPLOYEE

Data Modeling - 12 -

Data Model (Entity Relationship Diagram)


Member Order placed by; places is enrolled under; applies to

Member

Agreement

sells; is sold on

generates; generated by

established by; established

Product

is featured in; features

Promotion

sponsors; is sponsored by

Club

Minder Chen, 1993~2002

Data Modeling - 13 -

Entity Relationship Diagram: Subject Area and Entity Type

Subject Area and Subject Area Diagram Entity Types Entity Instances Finding Entity Types Evaluating Entity Types

Minder Chen, 1993~2002

Data Modeling - 14 -

Subject Area (Submodel)


A natural area of interest to the business that is centered on a major resource, inputs, outputs, or activity of the business. It contains a set of entity types. We start the data modeling in the ISP stage by identifying subject areas with names and descriptions. In BAA stage, subject areas are used to as high level grouping of entity types. Naming: a subject area is a noun in plural form and often has the name as the central entity type in the subject area. Examples: Projects
Project Member Task

Project
Minder Chen, 1993~2002 Data Modeling - 15 -

Subject Area Diagram

Customers

Products Orders

Raw-materials Suppliers Purchase Orders

Sales-persons Buyers
Legends

: Subject Area

: Association
Minder Chen, 1993~2002 Data Modeling - 16 -

Entity Types
Definition:
An entity is an object or event, real or abstract, about which we would like to store data. Entity is the abbreviation of entity type. It represent a set of entity instances which can be described by the same set of attribute types. The value of the same attribute for each entity instance may be different.

Identifying Entity Types


What information is required by the business? Things that are of interest to the business that need to be remembered in order to manage and track them. Things belong to the same entity type have common characteristics.
Minder Chen, 1993~2002 Data Modeling - 17 -

Naming Entity Types


The name of each entity is in singular form
a noun an adjective + a noun a noun + a noun => (noun string) an adjective + a noun + a noun

Examples
Customer, Customer Order, Product, Hourly Employee, Project, Department, Unfilled Customer Order

Be clear and concise Avoid abbreviation Be consist with users terminology Identify synonyms
Customer Product Supplier Teacher Client

Merchandise Vendor Faculty

Use one name as the official name and document others as aliases
Minder Chen, 1993~2002 Data Modeling - 18 -

Exercise: Entity Type Naming

Courses Department Customer Order PO

Minder Chen, 1993~2002

Data Modeling - 19 -

Properties of Entity Types Name Description Identifier Properties: Estimated number (Max., Min., Average) of entity instances Expected growth rate of entity instances Subject Area in which the Entity Type resides Attributes that describe the Entity Types Examples of entity type instances
Data Modeling - 20 -

Minder Chen, 1993~2002

Definition of an Entity Type A poor definition of Customer: Anyone that buys something from the company.
Can employees be a customer? Can a leasor be a customer? If the company sold a subsidiary to another company, does the new owner consider a customer?

Good definition should be:


Compatible Precise Concise Clear Complete
Data Modeling - 21 -

Minder Chen, 1993~2002

Good Definition
Compatible
Customer: An ORGANIZATION that purchase PRODUCTs for personal use. Distributor: An ORGANIZATION that purchase PRODUCTs for resale.

Precision:
With appropriate qualifiers Example: An ORGANIZATION is considered to have purchase a PRODUCT when we receive a valid PURCHASE ORDER from it.

Complete
ORGANIZATION, PRODUCT, PURCHASE ORDER need to be defined.

Concise and Clear


Use modular definition
Minder Chen, 1993~2002 Data Modeling - 22 -

Example of Entity Type Descriptions

Entity Type
Customer Product Raw-material Supplier Buyer

Description
Information about all persons or organizations who purchases

All goods manufactured and sold Components used to manufacture Products. Vendors of Raw Materials.
Company personnel responsible for purchasing Raw-Materials from Suppliers

Minder Chen, 1993~2002

Data Modeling - 23 -

Entity Type and Entity Instance (Occurrence) Entity Types Vendor Employee Course Department Entity Instance ABC Co. John Smith Intro. to IE
Marketing Department

Minder Chen, 1993~2002

Data Modeling - 24 -

Exercise: Entity Types or Entity Instances?

Maryland Organization Unit Customer President Bill Clinton Department of Commerce Address

Minder Chen, 1993~2002

Data Modeling - 25 -

Finding Entity Types Interviews with users JAD workshops Business forms Reports Computer files using reverse engineering Operation manuals

Minder Chen, 1993~2002

Data Modeling - 26 -

Where to Look for an Entity Type?


Tangible or Intangible Things
The nouns that are used to describe the problem domain will often correspond to the major Entity Types of the system, at least at a high level. Examples: Product, Sensor, and Employee, Department, and Sale Office.

Resources
Any resources that an organization needs to manage should be represented as an Entity Type. Information assists the efficient and effective use of other resources through improved decision. Examples: Inventory, Machine, Bank Account, and Customer.

Roles Played
Roles can be played by persons or organizational units. Examples: Customers, Managers, and Account representatives.

Events
Events are incidents that occur at points in time. An event often involved an interaction between two Entity Types or an action that changes the status of an Entity Type. Examples: Sale, Delivery, and Registration of a motor vehicle.
Minder Chen, 1993~2002 Data Modeling - 27 -

BIAIT:

Business Information Analysis and Integration Technique

Analysis of Orders Ordered entities can be a thing, a space, or a skill. View the order from supplier side. If an organization receives no orders, it has no reason for existing. An organization unit can receive multiple types of orders. 4 questions about the Supplier:
Billing (Cash)? Deliver Late (Immediate)? Profile customer? Negotiate price (Fixed)?

3 questions about the Ordered Entity:


Rented (Sold)? Tracked? Made to order (Stock)?
Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.
Minder Chen, 1993~2002 Data Modeling - 28 -

Criteria for Evaluating an Entity Type


Need to be remembered by the information system in order to be functional. Can be operated on: CREATE, READ, UPDATE, DELETE. Has a set of operations/services that always apply to change the status of each occurrence of an Entity Type. Carry a set of attributes that always apply to describe each occurrence of an Entity Type. Have at least one relationship with other entity type. Exist more than one entity occurrence (instance) in an Entity Type. Have at least a unique identifier. Domain-based requirements: Something that the system must have in order to operate. These may be clearly specified in the problem description or known from subject matter experts.
Minder Chen, 1993~2002 Data Modeling - 29 -

Entity Relationship Modeling and Diagramming

Relationships Entity Relationship Diagramming Notation Attributes Identifiers Partitioning and Entity Subtypes

Minder Chen, 1993~2002

Data Modeling - 30 -

Relationship (Type)
Definition
A Relationship Type is an association among Entity Types. It indicates that there is a business relationship between these Entity Types. Relationship Membership is the participation of an Entity Type in a Relationship. In IE, a Relationship Type can involve only two Entity Types (binary relationship). Some other modeling techniques allow n-ary relationships.

Examples
CUSTOMER places ORDER ORDER is placed by CUSTOMER EMPLOYEE works on PROJECT PROJECT has project member EMPLOYEE
Data Modeling - 31 -

Minder Chen, 1993~2002

Paring (Relationship Instance)


Relationship paring is a pair of Entity Instances of two Entity Types associated by a Relationship Type between these two Entity Types. Entity Types
Student

Entity Instance
Student#1 Student#2
Course#A Course#B Course#C Course#D

Course

Relationship
Student takes Course
Minder Chen, 1993~2002

Relationship Paring
Student#1 takes Course#A Student#1 takes Course#B Student#1 takes Course#D Student#2 takes Course#A Student#2 takes Course#C Student#2 takes Course#D
Data Modeling - 32 -

Relationship Instances Grouping

Definition: A collection of pairings of a Relationship Membership in which an Entity Instance is involved. Examples:
Student#1 takes Course#A, #B, and #D Student#2 takes Course#A, #C, and #D Course#A is taken by Student#1 and Student#2

Minder Chen, 1993~2002

Data Modeling - 33 -

Relationship Cardinality
One-to-One
1:1
E1

E2

One-to-Many
1:M

E1

E2

Many-to-Many
M:N

E1

E2

Minder Chen, 1993~2002

Data Modeling - 34 -

Relationship Cardinality
The number of Entity Instances involved in the Relationship Instances Grouping in a Relationship Type. Three Forms of Cardinality
1. One-to-one (1:1) DEPARTMENT has MANAGER Each DEPARTMENT has one and only one MANAGER Each MANAGER manages one and only one DEPARTMENT 2. One-to-many (1:m) CUSTOMER places ORDER Each CUSTOMER sometimes (95%) place one or more ORDERs Each ORDER always is placed by exactly one CUSTOMER 3. Many-to-many (m:n) INSTRUCTOR teaches COURSE Each INSTRUCTION teaches zero, one, or more COURSEs Each COURSE is taught by one or more INSTRUCTORs

Minder Chen, 1993~2002

Data Modeling - 35 -

Entity Relationship Diagram (ERD): Notations


Graphical Notations

Cardinality indicator
relationship-description

zero one many

Entity-X

reversed-relation-description

Entity-Y

min max Translate into two structured statements

Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y

Each Entity-Y reversed-relationship-description (zero-or-one) Entity-Y


Example

is-managed-by Department manages

Manager

Minder Chen, 1993~2002

Data Modeling - 36 -

Optionality of Relationship Memberships Whether all entity instances of both entity types need to participate in relationship pairing. Optionality:
Mandatory Optional

Example:
CUSTOMER membership is optional ORDER membership is mandatory
places

CUSTOMER
Minder Chen, 1993~2002

is placed by

ORDER
Data Modeling - 37 -

Relationship Statements
Cardinality indicator
Graphical Notations one one or more

places

CUSTOMER

is placed by

ORDER
zero (sometimes) one (always)

Optionality indicator

Each Entity X optionality relationship cardinality Entity Y Each CUSTOMER sometimes places one or more ORDER. Each ORDER always is placed by one CUSTOMER.
Data Modeling - 38 -

Minder Chen, 1993~2002

Defining Relationships Name Description Property


Cardinality volumes Optionality percentage: % of Entity Type X's instances pairing with Entity Type's Y's instances Transferability: A relationship is transferable if an entity instance can change its pairing within the same relationship.
TRANSFERABLE: An EMPLOYEE can change to a different DEPARTMENT. NON-TRANSFERABLE: An ORDER cannot be transferred to another CUSTOMER.
Minder Chen, 1993~2002 Data Modeling - 39 -

ERD: More Examples


(a) Customer places belongs-to is-contained-in contains manages Employee is-managed-by works-for Project Order

Product

(b)

Parallel Relationship

has-project-members (c) is-consists-of

Part

contained-in

Involuted or Looped Relationship

Minder Chen, 1993~2002

Data Modeling - 40 -

ERD: Alternative Notations


Customer places

Order
belongs-to

Alternative Notations: Customer places belongs-to Order places 1 belongs-to Order places M Order

Customer
Customer

Minder Chen, 1993~2002

Data Modeling - 41 -

Identifying Relationships

Association between entity types Entity types that are used on the same forms or documents. A description in a business document that has a verb that relates two entity types
has consists of uses
Minder Chen, 1993~2002 Data Modeling - 42 -

Attributes
Definition
Characteristics that could be used to describe Entity Types and Relationship Types. However, in IE, relationship types are not allowed to have attributes.

Naming Conventions:
Names that have business meaning Don't use abbreviation or possessive case, e.g., PN and Customer's name Don't include entity type name because IEF will prefix the attribute name with entity type name automatically Use standard format: Entity Type Name (Qualifiers) Domain Name Customer Name Employee Starting Date

Examples
Customer has customer name, address, and telephone number Product has quantity-on-hand, weight, volume, color, and name. Employee has SSN, salary, and birthday. Employee-works-for-project has percentage-of-time, starting-date.
Data Modeling - 43 -

Minder Chen, 1993~2002

Attributes: Notations
Student
Student ID Student Name
Employee Employee number First name Last name Employee function Employee salary

Birth date Student ID Course no. enrollment


Birth date

Student studentID

name phone

Student(Student ID, Student Name, Birth Date)


Finding Attributes: Attributes are identified progressively during BAA phase. Data Analysis Activity Analysis Interaction Analysis Current Systems Analysis
Minder Chen, 1993~2002 Data Modeling - 44 -

Attribute Value
Definition
Attribute Values are instances of Attributes used to describe specific Entity Instances

Examples
Customer Number: 011334 Customer Name: Minder Chen State: VA Order Total: $23,000 Sale tax: $250

An attribute of an entity type should have only one value at any given time. (No repeating group) Avoid using complex coding scheme for an attribute. For example: PART Number: X-XXX-XXX
Part Type
Minder Chen, 1993~2002

Material

Sequence Number
Data Modeling - 45 -

Type & Instance


OBJECT TYPE Entity Type Entity Entity Type Relationship (Type) Attribute (Type) OCCURRENCE Entity Instance Entity Instance Entity Pairing (Relationship Instance) (Attribute) Value

Minder Chen, 1993~2002

Data Modeling - 46 -

Attribute Source Categories


Basic
Definition: An Attribute Value that cannot be deduced or calculated. Examples: Student name and Birthday

Derived
Definition: The Attribute Value can be calculated or deduced from relationship Groupings or from the values of other Attributes. The value of a Derived Attribute changes constantly. Examples: Student Age, Account Balance, Number of courses taken.

Designed
Definition: The Attribute is created to overcome the system constraints. The value of a Designed Attribute does not change. Examples: Student ID, Course number.
Minder Chen, 1993~2002 Data Modeling - 47 -

Data Types

Minder Chen, 1993~2002

Data Modeling - 48 -

Properties of Attributes
Name Description Attribute Source Category: Basic, Derived, Designed Domain or data type: Text, Number, Date, Time, Timestamp Optionality: Mandatory or optional Length and/or precision Permitted Values (Legal Values)
Ranges A set of values (Code Table)

Default value or algorithm

Tools such as PowerBuilder has additional properties for tables columns called extended attributes
Validation Rule Editing Format Reporting Format
Minder Chen, 1993~2002

Column Heading Form Label Code Table


Data Modeling - 49 -

Composite Attribute
Definition: Example:
Telephone Number =

Area code + Exchange + Extension There is no support of composite attribute type most of CASE tools. In such case, an composite attribute must be stored as an entity type.

Minder Chen, 1993~2002

Data Modeling - 50 -

Domain
A collection of values which can be taken by one or more attributes. Date is the domain for Ordered Date, Student's Birthday, Employee Starting Date. A used defined domain can have customized validation rules and formats. CASE tools such as IEF only supports the following basic domains:
Text Number Date Time Timestamp
Data Modeling - 51 -

Minder Chen, 1993~2002

Identifiers The identifier of an entity type is a set of attributes and/or relationships whose values can uniquely identify an entity. Entity types should have one identifier. Identifiers may consist of
A single attribute: Student ID A set of attributes: Students ID + Course ID An attribute and a relationship membership (implemented as a foreign Key): Order Item No + Order Has Order Item

Minder Chen, 1993~2002

Data Modeling - 52 -

Identifying Relationship

customer
places

product
Symbol for Identifying Relationship
is ordered by

ORDERS
is placed by contains has

order
is part of

order item

Minder Chen, 1993~2002

Data Modeling - 53 -

Data Modeling Case Study


The following is description by a pharmacy owner:

"Jack Smith catches a cold and what he suspects is a flu virus. He makes an appointment with his family doctor who confirm his diagnosis. The doctor prescribes an antibiotic and nasal decongestant tablets. Jack leaves the doctor's office and drives to his local drug store. The pharmacist packages the medication and types the labels for pill bottles. The label includes information about customer, the doctor who prescribe the drug, the drug (e.g., Penicillin), when to take it, and how often, the content of the pill (250 mg), the number of refills, expiration date, and the date of purchase."
Please develop a data model for the entities and relationships within the context of pharmacy. Also develop a definition for "prescription". List all your underlying assumptions used in your data models.
Minder Chen, 1993~2002 Data Modeling - 54 -

Data Modeling Case Study


Given the following narrative description of entities and their relationships, prepare a draft entity relationship diagram (ERD). Be sure any reasonable assumptions that you are making.

Burger World Distribution Center serves as a supplier to 45 Burger World franchises. You are involved with a project to build a database system for distribution. Each franchise submits a day-by-day projection of sales for each of Burger World's menu products - the products listed on the menu at each restaurant - for the coming month. All menu product require ingredients and/or packaging items. Based on projected sales for the store, the system must generate a day-by-day and ingredients need and then collapse those needs into one-per-week purchase requisitions and shipments.
Minder Chen, 1993~2002 Data Modeling - 55 -

Data Modeling Process


List entity types Create relationships
Pick a central entity type Work around the neighborhood
Add entity types to the diagram Build relationships among them

Determine cardinalities of relationships

Find/Create identifiers for each entity type Add attributes to the entity type in the data model Analyze and revise the data model

Minder Chen, 1993~2002

Data Modeling - 56 -

Classifying Attribute and Partitioning


An Entity Subtype A collection of Entities of the same type to which a narrower definition and additional Attributes and Relationships apply. An Entity Subtype inherits (retains) all the Attributes and Relationships of its parent Entity Type. Classifying Attribute: An attribute of the Base Entity Type whose values partition the Entity Instances into Subtypes. Partitioning: A basis for subdividing one entity type into subtypes. The process of dividing an Entity Type into several Subtypes based on a Classifying Attribute is called Partitioning. The Classifying Attribute is recorded as a property of the Partitioning and it appears on the diagram.
Minder Chen, 1993~2002 Data Modeling - 57 -

Characteristics of Partitioning
Optionality:
Mandatory: Every Entity instances of the Entity Type must fall into one of the Subtype categories. Optional: Not every Entity instances of the Entity Type must fall into one of the Subtype categories.

Entity Life Cycle: The states through which an Entity Type can pass are used for Partitioning. Enumeration:
Fully enumerated Not fully enumerated

Classifying Attributes and Values


Classifying Attribute: Type D: Domestic Subtype F: Foreign Subtype
Minder Chen, 1993~2002 Data Modeling - 58 -

Partitioning and Entity Subtype: Notation


ATTRIBUTE: Employee ID Name Birthday

ATTRIBUTE: Teaching Quality Indicator

Employee
Type

Lecturer Staff

Teaches

Seminar

Status

Wage Hourly
Minder Chen, 1993~2002 Data Modeling - 59 -

Alternative Notations for Subtypes


IDEF1X
Complete Category All categories shown
employeeID name phone

PowerDesigner

Account Account Number Name

employee type

full-time-emp employeeID (FK) salary

part-time-emp employeeID (FK) hourly-rate

Savings Rate

Checking Fees

Minder Chen, 1993~2002

Data Modeling - 60 -

Entity Subtype Partitioning


Life Cycle Partitioning
Order
Order Status

Taken Scheduled Shipped Billed Paid

Minder Chen, 1993~2002

Data Modeling - 61 -

Normalization
A data base is a model or an image of the reality. Logical Data Base Design is a process of modeling and capturing the end-user views of an application domain and synthesis them into a data base structure. Normalization is a logical data base design method. The basis for normalization is the functional dependencies among attributes in a table.
Minder Chen, 1993~2002 Data Modeling - 62 -

SQL Terminology
Column

Product Table
p_no 101 Row 201 202 product_name Color TV B&W TV PC quantity 24 10 5 price 500 250 2000

Create a table in SQL

CREATE TABLES (p_no CHAR(5) NOT NULL, product_name CHAR(20), quantity SMALLINT, price DECIMAL(10, 2));
Minder Chen, 1993~2002 Data Modeling - 63 -

SQL Terminology
Set Theory Relation Attribute Relational DB Table Column File File Data item Example Product_table Product_name

Tuple
Domain

Row
Pool of legal values

Record
Data type

Product_101's info.
DATE

Minder Chen, 1993~2002

Data Modeling - 64 -

SQL Principles
The result of a SQL query is always a table (View or Dynamic Table) Rows in a table are considered to be unordered Dominate the markets since late 1980s Can be used in interactive programming environments Provide both data definition language (DDL) and data manipulation language (DML) A non-procedural language Can be embedded in 3GL:
Embedded SQL Dynamic SQL
Minder Chen, 1993~2002 Data Modeling - 65 -

SQL: Data Definition Language (DDL)


CREATE DROP TABLE VIEW INDEX DATABASE

ALTER

TABLE

Minder Chen, 1993~2002

Data Modeling - 66 -

SQL: Introduction
A relational data base is perceived by its users as a collection of tables E. F. Codd 1969

Dominate the markets since late 1980s


Strengths:
Simplicity End-user orientation Standardization Value-based instead of pointer-based Endorsed by major computer companies

Most CASE products support the development of relational data base centered applications
Minder Chen, 1993~2002 Data Modeling - 67 -

SQL: Data Manipulation Language (DML)


SELECT UPDATE INSERT DELETE p_no 101 201 202 product_name Color TV B&W TV PC quantity 24 10 5 price 500 250 2000

The Generic Form of the SELECT Statement


SELECT [DISTINCT] column(s) FROM table(s) [WHERE conditions] [GROUP BY column(s) [HAVING condition]] [ORDER BY column(s)]

Minder Chen, 1993~2002

Data Modeling - 68 -

Database Table
The following code retrieves only the Last Name and the Employee ID where the Employee ID is greater than 5. The records are retrieved in descending order.
SELECT LastName, EmployeeID FROM Employees WHERE EmployeeID > 5 ORDER BY EmployeeID DESC

Minder Chen, 1993~2002

Data Modeling - 69 -

WHERE Clause
WHERE: Use the Where clause to limit the selection. The # symbol indicates literal date values. SELECT * FROM Employees WHERE LastName = "Smith" SELECT Employees.LastName FROM Employees WHERE Employees.State in ('NY','WA') SELECT OrderID FROM Orders WHERE OrderDate BETWEEN #01/01/93# AND #01/31/93#
Minder Chen, 1993~2002 Data Modeling - 70 -

Keys
A key, also called identifier, is an Attribute or a Composite Attribute that can be used to uniquely identify an instance of an entity type. Examples: Entity Type Key
Warehouse Product Student Ship Stock of Product Warehouse Number Product Number Student ID or SSN Name and Port of Registration Product Number and Warehouse No.

Minder Chen, 1993~2002

Data Modeling - 71 -

Types of Key
Primary Key: A unique key is an attribute or a set of attributes that has been used by the DBMS as the identifier of a table. Candidate (Alternative) Key: An attribute or a set of attributes that could have been used as the primary key of a table. Secondary (Index) Key: An attribute or a set of attributes that has been used to construct the data retrieval index. Concatenated (Combined or Composite) Key: A set of attributes that has been used as the key. Foreign Key: An attribute or a set of attributes that is used as the primary key in another table.
Minder Chen, 1993~2002 Data Modeling - 72 -

Purposes of Normalization

Avoid maintenance problems such as Update . Insert: There may be no place to insert new information. Delete: Some important information will be lost by deletion. Update: Inconsistency may occur because of the existence of data redundancy. Provide maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.
Minder Chen, 1993~2002 Data Modeling - 73 -

A Common Sense Approach to Normalization

Don't rush to put all the information in one table. Create a table to correspond to a class of a simple object type that should exist by itself, i.e., "one fact in one place." Include common fields (links) as ways of joining information from several related tables. Avoid redundancy by using links to retrieve data from related tables.

Minder Chen, 1993~2002

Data Modeling - 74 -

Normalization Theory

Normalization is a process of systematically breaking a complex table into simpler ones. It is built around the concept of normal forms. A relation is in a particular normal form if it satisfies a specific set of constraints such as dependencies among attributes in the relation. For x is an integer and x > 1, if a relation is in x-NF than it is in (x-1)-NF. Higher order normal forms are usually more desirable than lower order normal forms. Normalization process usually starts from complex relations which are usually drawn from some existing documents such as business forms.
Minder Chen, 1993~2002 Data Modeling - 75 -

A Business Form

Minder Chen, 1993~2002

Data Modeling - 76 -

An Informal Example of Normalization


A CUSTOMER ORDER contains the following information:
OrderNo OrderDate CustNo CustAddress CustType Tax Total one or more than one Order-Item which has
ProductNo Description Quantity UnitPrice Subtotal.
Data Modeling - 77 -

Minder Chen, 1993~2002

Solution
Unnormalized table
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total, 1{ProductNo, Description, Quantity, UnitPrice,Subtotal}n)

Remove repeating group

1st NF
Remove partial FD

(OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)

2nd NF
Remove transitive FD

(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total)

(OrderNo, ProductNo, Quantity, UnitPrice, Subtotal) (ProductNo, Description, UnitPrice)

3rd NF
Minder Chen, 1993~2002

(OrderNo, OrderDate, CustNo, Tax, Total) (CustNo, CustAddress, CustType)


Data Modeling - 78 -

Unnormalized Form
A relation that has multi-valued attributes (repeating groups). Normalization Process: Remove Multi-value Attributes If an unnormalized relation R has a primary key K and a multi-value attribute M, the normalization process is:
The multi-value attribute M should be removed from R. A new relation will be created with (K,M) as the primary key of the relation. There may be some other attributes associated with this new relation. R will then be at least in 1NF.

Example: An Employee relation has an attribute language-spoken. For some employees there may be more than one language that they can speak.

EMP (employeeID, empName, empAddress, (language1, language2, ...)) EMP (employeeID, empName, empAddress) EMP-LANGUAGE (employeeID, language, skillLevel)
Minder Chen, 1993~2002 Data Modeling - 79 -

How Do You Remove the Repeating Groups?


CREATE TABLE MEM_CONDITION ( MEMBER# VARCHAR2(12) CASE# VARCHAR2(16) DIAG_ARRAY_1 VARCHAR2(6) DIAG_ARRAY_2 VARCHAR2(6) DIAG_ARRAY_3 VARCHAR2(6) DIAG_ARRAY_4 VARCHAR2(6) DIAG_ARRAY_5 VARCHAR2(6) DIAG_EX_ARRAY_1 VARCHAR2(2) DIAG_EX_ARRAY_2 VARCHAR2(2) DIAG_EX_ARRAY_3 VARCHAR2(2) DIAG_EX_ARRAY_4 VARCHAR2(2) DIAG_EX_ARRAY_5 VARCHAR2(2) DRUG_ARRAY_1 VARCHAR2(12) DRUG_ARRAY_2 VARCHAR2(12) DRUG_ARRAY_3 VARCHAR2(12) DRUG_ARRAY_4 VARCHAR2(12) DRUG_ARRAY_5 VARCHAR2(12) LC_ARRAY_1 VARCHAR2(4) LC_ARRAY_2 VARCHAR2(4) LC_ARRAY_3 VARCHAR2(4) LC_ARRAY_4 VARCHAR2(4) LC_ARRAY_5 VARCHAR2(4) MEM_REVIEW VARCHAR2(4) OP# VARCHAR2(4) Minder Chen, 1993~2002 NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NOT NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,

Data Modeling - 80 -

Functional Dependency

Notation: R.X => R.Y Definition: Attribute Y of Relation R is functionally dependent on the Attribute X of Relation R when there is each value of R.Y associated with no more than one value of R.X. R.X and R.Y may be composite attributes. Description:
R .Y is functionally dependent on R.X R.X functionally determines R.Y
Minder Chen, 1993~2002 Data Modeling - 81 -

Full & Partial Dependency

R.A => R.B If B is not functionally dependent on any subset of A (other than A itself), B is fully dependent on A in R. If B is functionally dependent on a subset of A (other than A itself), B is partially dependent on A in R.

Minder Chen, 1993~2002

Data Modeling - 82 -

First Normal Form (1NF)


A relation R is in the first normal form (1NF) if and only if all attributes of any tuple in R contain only atomic values. Normalization Process:
Remove Partial Functional Dependencies If R is in 1NF and has a composite primary key (K1,K2), an attribute P is functionally dependent on K1 (K1 => P) (i.e., P is partially dependent on (K1, K2)), the normalization process is: The attribute P should be removed from R and a new relation will be created with K1 as the primary key and P as a non-key attribute. A relation that is in 1NF and not in 2NF must have a composite primary key.

Example
Supplier-Part relation has attributes supplier#, part#, qty, city, distance, where (supplier#, part#) is the key. City is partially dependent on supplier#. SUPPLIER-PART (supplier#, part#, qty, city, distance) SUPPLIER-PART (supplier#, Part#, qty) SUPPLIER (supplier#, city, distance)
Minder Chen, 1993~2002 Data Modeling - 83 -

Non-loss Decomposition
Normalization is a reduction (decomposition) process that replaces a relation by suitable projections. Each of the projection is a new relation that is in a further normalized form than the original relation. The collection of projections is equivalent to the original relation. The original relation can always be recovered by taking the natural join of these projections. Any information that can be derived from the original relation can also be derived from the further normalized relations. The converse is not true. The process is reversible because no information is loss in the reduction process.
Minder Chen, 1993~2002 Data Modeling - 84 -

Transitive Dependency

In a relation R, if R.A =>R.B and R.B => R.C then attribute C is said to be transitively dependent on attribute A.

Minder Chen, 1993~2002

Data Modeling - 85 -

Second Normal Form (2NF)


A relation R is in the second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key. Normalization Process: Remove Transitive Dependencies If R is in 2NF and has two non-key attributes A1 and A2 where A2 is functionally dependent on A1 (A1 => A2). The A2 should be removed from R and a new relation will be created with A1 as the primary key and A2 as a non-key attribute. Example
Supplier relation has attributes supplier#, city, distance, where supplier# is the key and distance to a supplier can be determined by the city of the supplier. SUPPLIER (supplier#, city, distance, quality_level) SUPPLIER (Supplier#, city, quality_level) CITY-DISTANCE (city, distance)
Minder Chen, 1993~2002 Data Modeling - 86 -

Third Normal Form (3NF)


A relation R is in the third normal form (3NF) if and only if the non-key attributes (if there is any) are fully dependent on the primary key of R (i.e., R is in its 2NF) and are mutually independent. Heuristic to Check Whether a Relation Is in 3NF
All the non-key attributes (which are not multi-value attributes) are dependent on the (primary) key, the whole key, and nothing but the key.
Explanation

All the non-key attributes have atomic value and dependent on the key (1NF - No multi-value attribute), the whole key, (2NF - No Partially Functional Dependency) and nothing but the key (3NF - No Transitive Functional Dependency)
Minder Chen, 1993~2002 Data Modeling - 87 -

Normalization Process
Unnormalized Form

A
1NF

H
2NF

remove repeating groups

remove partial dependencies


3NF 3NF

remove transitive dependencies


3NF

G F H

3NF

D
B C D

Minder Chen, 1993~2002

Data Modeling - 88 -

Normalization: Pros and Cons


Pros
Reduce data redundancy & space required Enhance data consistency

Enforce data integrity


Reduce update cost Provide maximum flexibility in responding ad hoc queries

Cons
Many complex queries will be slower because joins have to be performed to retrieve relevant data from several normalized tables Programmers/users have to understand the underlying data model of an database application in order to perform proper joins among several tables The formulation of multiple-level queries is a nontrivial task.

Minder Chen, 1993~2002

Data Modeling - 89 -

Join Two Tables


SELECT Categories.CategoryName, Products.ProductName FROM Categories, Products WHERE Products.CategoryID = Categories.Category ID

Minder Chen, 1993~2002

Data Modeling - 90 -

Tables in Relational DB
Identify Primary Keys and Foreign Keys in the following Tables!!!

ID

ID

ID

Minder Chen, 1993~2002

Data Modeling - 91 -

Join Tables
SELECT Orders.OrderID, Orders.CustID, LastName, Firstname, Orders.ItemID, Description FROM Customer, Orders, Inventory WHERE Customer.CustID = Orders.CustID AND Orders.ItemID = Inventory.ItemID ORDER BY CustID, Orders.ItemID

ID

ID

Minder Chen, 1993~2002

Data Modeling - 92 -

Foreign Keys & Primary Keys in a Sample Access Database

Minder Chen, 1993~2002

Data Modeling - 93 -

An Example of a Complex Query


Please list name and phone number of customers who have ordered product number 007. SELECT customer_name, customer_phone FROM customer WHERE customer_number IN SELECT customer_number FROM order WHERE order_no IN SELECT order_no FROM orderItem WHERE product_number = 007

Minder Chen, 1993~2002

Data Modeling - 94 -

Denormalization
The process of intentionally backing away from normalization to improve performance. Denormalization should not be the first choice for improving performance and should only be used for fine tuning a database for a particular application. Requirements
Prior normalization Knowledge of data usage

Benefits
Minimize the need for joins Reduce number of tables Reduce number of foreign keys Reduce number of indices How often are two data items needed together How many rows are involved How volatile is denormalized data How important is visibility of data to users What is the minimum response time and frequency of an query
Data Modeling - 95 -

Knowledge of Data Usage

Minder Chen, 1993~2002

De-normalization: An Example
JOIN

R1

R2
Denormalization

R1 * R 2
Where:

R2

R1 (ProductNo, SupplierNo, Price) R2 (SupplierNo, Name, Address, Phone) R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)

R2 should be kept to prevent data loss. Data redundancy in R1*R2 and R2 could cause potential data inconsistency problems if the redundant data in these two tables are not maintained properly.
Minder Chen, 1993~2002 Data Modeling - 96 -

Data Model Refinement and Transformation


Data Model Refinement Associative Entity Type Removing Many-to-Many Relationships Keys Transformation to Relational Databases

Minder Chen, 1993~2002

Data Modeling - 97 -

Refinement of a Data Model: Analysis and Simplification

Isolated Entity Type Solitary Entity Type One-to-One Relationship Redundant Relationship Multi-Valued Attributes Attribute with Attributes Many-to-Many Relationship

Minder Chen, 1993~2002

Data Modeling - 98 -

Isolated Entity Type


An Entity Type that does not participate in a Relationship. Since every Entity Type should participate in at least one Relationship, there exist two alternatives: Identify a relevant Relationship Remove the Entity Type from the model

Minder Chen, 1993~2002

Data Modeling - 99 -

Solitary Entity Type


An Entity Type that has only one Entity Instance. Examples: Computer Center, Sales Tax, and Current Order Number. Solitary Entity Types may be too restrictive. Alternatives:
Introduce another Entity Type with a wider scope. Computer Center ==> Organization Unit Define it as an Attribute of an Entity Type. Sales Tax ==> Sales Tax of Order Define it as a data element in an parameter table. A parameter table has only one row. Current Order Number ==> Current Order Number of Parameter Table

Minder Chen, 1993~2002

Data Modeling - 100 -

Evaluate One-to-One Relationship


It may be an unnecessary relationship between two Entity Types if they have the same attribute and relationships (i.e., they are identical). It should be then combined into one Entity Type.
Maybe Incorrect
Purchase Request
becomes has request Purchase Order

Correct
Purchase Order

Minder Chen, 1993~2002

Data Modeling - 101 -

Redundant Relationship
Is this relationship redundant? has ordered

customer
places

product
is ordered by

ORDERS
is placed by contains has

order
is part of

order item

Differences in timing of an entity type in its life cycle:


Implemented as separate entity types or use subtypes Use value of attributes or additional attributes to differentiate them
Minder Chen, 1993~2002 Data Modeling - 102 -

Redundant Relationship
Redundant Product is held as Stock contains Non-redundant is held in stocks

Warehouse
holds

Product

is contained in contains

Order Line

is contained in contains

Order

is placed by is contained in is contained in contains places Customer

contains

Order History

Minder Chen, 1993~2002

Data Modeling - 103 -

Multi-Valued Attribute
Definition
An Attribute that may have more than one value at a time is called a multi-valued attribute.

Solution:

Create an Entity Type for the multi-valued attribute


Languages spoken by an Employee Employee(ID, Name, Phone, Languages) Employee(111, John Smith, 201-999-8888, (English, Chinese))

Example:

Employee(ID, Name, Phone) Employee(111, John Smith, 210-999-8888) Employee_language(ID, Language) Employee_language(111, English) Employee_language(111, Chinese)
Data Modeling - 104 -

Minder Chen, 1993~2002

Attribute with Attributes


An Attribute that can be described by other Attributes is called an attribute with attributes. Example:
College Degree by an Employee (John Smith has a College Degree in Computer Sciences from George Mason University)

Solution:
Create an Entity Type to avoid an Attribute with Attributes. Add new attributes to the existing Entity Type.
Minder Chen, 1993~2002 Data Modeling - 105 -

Associative Entity Type


An Associative Entity Type is an Entity Type whose existence is meaningful only if it participates in several (>=2) Relationship Types at the same time. Associative Entity Types are often introduced to represent additional information in many-tomany Relationships or to decompose a many-tomany Relationship into two one-to-many Relationships. Associative Entity Types are also used to represent n-ary Relationships in a binary data model.

Minder Chen, 1993~2002

Data Modeling - 106 -

Remove Many-to-Many Relationship


Given contains

Order

belongs-to

Product

Why?

There is no place to attach Attributes that are required to describe a many-to-many Relationship. It is difficult to translate many-to-many Relationships into relational tables automatically.
How?

A many-to-many relationship can be decomposed into two one-to-many Relationships by creating an Associative Entity Type between the existing two Entity Types.
contains has

Order

belongs to

Order Line
is contained in

Product

Minder Chen, 1993~2002

Data Modeling - 107 -

Remove Many-to-Many Relationships: Exercises


Remove the many-to-many relationship from the following ER diagrams
(a) Product has-sources offers Supplier

(b)

Student

takes
is-taken-by

Course

(c)

consists-of

Part

is-contained-in
Data Modeling - 108 -

Minder Chen, 1993~2002

Bills of Material
Part consists-of A

is-a-component-in

2
E D

1
F

Product Structure

Product-Structure(Parent Part No, Child Part No, Quantity) A A B B C C


Minder Chen, 1993~2002

B C D E D F

2 1 1 3 2 2
Data Modeling - 109 -

Using an Associative Entity Type to Represent an N-ary Relationship

involved in product usage Product

involved in product usage Project

involved in product usage Supplier

Product Usage is an Associative Entity Type for a 3-ary Relationship.

is used in Product Product Usage

uses Project

supplies Supplier
Minder Chen, 1993~2002 Data Modeling - 110 -

Translate Data Models to Relational Tables


Given
contains has

Order
belongs to

Order Line

is contained in

Product

Key: Order# Attribute: Order date Customer ID Sale Person ID

Key: Order#+Product# Attribute: Quantity Unit Price

Key: Product# Attribute: Description Qty-on-hand Unit Price

Relational Tables Created

CREATE TABLE ORDER (OrderNo CHAR(10) OrderDate DATE, CustomerID CHAR(10), SalePersonID CHAR(10));
Minder Chen, 1993~2002

NOT NULL,

Data Modeling - 111 -

Transformation of Data Models to Relational Database Tables

The entire, or part of, a data (entity-relationship) model can be translated into a normalized database design. Objects Created
At most one relational database One or more relations (tables) Data structures (DDL) representing the elements (attributes) and the primary key of each relation Data type of each data elements

Minder Chen, 1993~2002

Data Modeling - 112 -

Heuristics of Transformation
A table is created for each Entity Type in the ER diagram. A table is created for each multi-valued attribute. Relationship Types are implemented as tables or as foreign keys in other tables. Many-to-many relationship types are translated into tables. Foreign keys are used for implementing one-to-one and one-to-many Relationship Types. For one-to-many Relationship Types, the foreign key is placed in the table that represents the Entity Type on the "many" end of the Relationship Type. For identifying one-to-many Relationship Types, the PK of the "one" table migrate to the "many" table as a FK and the FK is also part of the PK of the "many" table. For non-identifying one-to-many Relationship Types, the PK of the "one" table migrate to the "many" table as a FK and the FK is a non-key attribute of the "many" table.
Minder Chen, 1993~2002 Data Modeling - 113 -

PowerDesign: Data Architect

Generation/Reverse Engineering: CDM, PDM

Generation & Reverse Engineering:

Generation & Reverse Engineering:

Database Structure Triggers & Stored Procedures

Extended Attributes Database Structure


Target 4GL Tool

http://www.powersoft.com/
Data Modeling - 114 -

Target DBMS
Minder Chen, 1993~2002

PowerDesigner

Minder Chen, 1993~2002

Data Modeling - 115 -

A Sample Conceptual Data Model


Division Division number Division name Division address Team Team number Speciality

Conceptual Data Model Project : Management Model : Project Management Author : User Version 6.x 7/21/98
Customer Customer number Customer name Customer address Customer activity Customer telephone Customer fax

Is member of supervises Employee Employee number First name Last name Employee function Employee salary Is manager of

Uses

Subcontract

Activity Start date End date

Material Material number Material name Material type Participate Start date End date

Project Project number Project name Project label

Task Task name Task cost

composes

composed of

Minder Chen, 1993~2002

Data Modeling - 116 -

Notations
Employee
Entity

Employee number First name Last name Employee function Employee salary

Relationship
Employ ee Division Division number Division name Division address Employ number ee First name Last name Employ function ee Employ salary ee

One-to-many
Minder Chen, 1993~2002 Data Modeling - 117 -

More on Relationships
Employee Employee number First name Last name Employee function Employee salary is a member of Team Team number Specialty

member
Many-to-many cardinality

Project Project number Project name Project label

Task Task name Task cost

A project 'contains one or more tasks, and a task's existence is dependent on the project.

Minder Chen, 1993~2002

Data Modeling - 118 -

Advanced Concepts
Material Material number Material name Material type
composes composed of

Account Account Number Name

Savings Rate

Checking Fees Employee

Subtype

Employee number First name Last name Employee function Employee salary

Reflexive relationship

Minder Chen, 1993~2002

Data Modeling - 119 -

Define Entities

Minder Chen, 1993~2002

Data Modeling - 120 -

Define Attributes

Minder Chen, 1993~2002

Data Modeling - 121 -

Check Parameters

Minder Chen, 1993~2002

Data Modeling - 122 -

Relationship Definition

Minder Chen, 1993~2002

Data Modeling - 123 -

Dependent (Identifying Relationship)


Check the box to indicate a dependent relationship. "One to many" and "mandatory" are automatically chosen as the cardinality and optionality. At the physical data model level, the parent entity type's primary key (PK) will become part of the dependent child entity type's PK. It is also a foreign key.

Minder Chen, 1993~2002

Data Modeling - 124 -

Inheritance (Super-Type and Sub-Type)

Minder Chen, 1993~2002

Data Modeling - 125 -

Generate Physical Data Model

Minder Chen, 1993~2002

Data Modeling - 126 -

Physical Data Model


Conceptual Data Model Transformation
Physical Data Model

Division Division number belongs to Division name Division address

Employee Employee number First name Last name Employee function Employee salary

Do not define FK as an attribute.

DIVISION DIVNUM <pk> DIVNAME DIVADDR

DIVNUM = DIVNUM

EMPLOYEE EMPNUM <pk> DIVNUM <fk> EMPFNAM EMPLNAM EMPFUNC EMPSAL

DIVNUM automatically migrates as a foreign key.


Minder Chen, 1993~2002 Data Modeling - 127 -

Dependent Relationship
Conceptual Data Model Transformation
Physical Data Model
Minder Chen, 1993~2002

Project Project number Project name Project label

Task Task name Task cost

PROJECT PRONUM <pk> CUSNUM <fk> EMPNUM <fk> ACTBEG ACTEND PRONAME PROLABL

PRONUM = PRONUM

TASK PRONUM <pk,fk> TSKNAME <pk> ACTBEG ACTEND TSKCOST

Data Modeling - 128 -

Physical Data Model


DIVISION DIVNUM <pk> DIVNAME DIVADDR TEAM TEANUM <pk> TEASPE CUSTOMER CUSNUM <pk> CUSNAME CUSADDR CUSACT CUSTEL CUSFAX

Physical Data Model Project:Management Model : Project Management Author :User Version6.x 7/21/98

TEANUM = TEANUM

MEMBER TEANUM <pk,fk> EMPNUM <pk,fk>


DIVNUM = DIVNUM

EMPLOYE_MATERIAL MATERIAL.MATNAME char(30) PROJ.EMPLOYEE.EMPNUM numeric(5) PROJ.EMPLOYEE.EMPFNAM char(30) PROJ.EMPLOYEE.EMPLNAM char(30) PROJ.EMPLOYEE.EMPFUNC char(30) MATERIAL PROJ.EMPLOYEE USED

CUSNUM = CUSNUM EMPNUM = EMPNUM

USED MATNUM <pk,fk> EMPNUM <pk,fk>


EMPNUM = EMPNUM MATNUM = MATNUM

EMPNUM = EMP_EMPNUM

MATERIAL MATNUM <pk> MATNAME MATTYPE


MATNUM = CPD_MATNUM

EMPLOYEE EMPNUM <pk> EMP_EMPNUM <fk> DIVNUM <fk> EMPFNAM <ak> EMPLNAM <ak> EMPFUNC <ak> EMPSAL

PROJECT PRONUM <pk> CUSNUM <fk> EMPNUM = EMPNUM EMPNUM <fk> ACTBEG ACTEND PRONAME PROLABL

EMPNUM = EMPNUM

PRONUM = PRONUM

TNUM = CPN_MATNUM

COMPOSE CPD_MATNUM <pk,fk> CPN_MATNUM <pk,fk>

PARTICIPATE PRONUM <pk,fk> PRONUM = PRONUM TSKNAME <pk,fk> TSKNAME = TSKNAME EMPNUM <pk,fk> PARBEG PAREND

TASK PRONUM <pk,fk> TSKNAME <pk> ACTBEG ACTEND TSKCOST

Minder Chen, 1993~2002

Data Modeling - 129 -

References (Relationships at the Physical Data Model)

Minder Chen, 1993~2002

Data Modeling - 130 -

Referential Integrity
The arrow is pointing from the table containing the foreign key to the table where the foreign key is used as a primary key.

Minder Chen, 1993~2002

Data Modeling - 131 -

Deletion Rules

Update Constraints Delete Constraints None Restrict Cascade Set null Set Default

Minder Chen, 1993~2002

Data Modeling - 132 -

Generation of Oracle SQL DLL


-- ============================================================ -- Database name: PROJECT -- DBMS name: ORACLE Version 8 -- Created on: 7/21/98 8:59 PM -- ============================================================ -- ============================================================ -- Table: DIVISION -- ============================================================ create table ADMIN.DIVISION ( DIVNUM numeric(5) not null constraint CKC_DIVNUM_DIVISION check (DIVNUM >= '1'), DIVNAME char(30) not null, DIVADDR char(80) null , constraint PK_DIVISION primary key (DIVNUM) ) /

Minder Chen, 1993~2002

Data Modeling - 133 -

Referential Integrity
alter table PROJ.EMPLOYEE add constraint FK_EMPLOYEE_CHIEF_EMPLOYEE foreign key (EMP_EMPNUM) references PROJ.EMPLOYEE (EMPNUM) / alter table PROJ.EMPLOYEE add constraint FK_EMPLOYEE_BELONGS_T_DIVISION foreign key (DIVNUM) references ADMIN.DIVISION (DIVNUM) / alter table PROJ.PROJECT add constraint FK_PROJECT_SUBCONTRA_CUSTOMER foreign key (CUSNUM) references PROJ.CUSTOMER (CUSNUM) / alter table PROJ.PROJECT add constraint FK_PROJECT_IS_RESPON_EMPLOYEE foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM) / alter table PROJ.TASK add constraint FK_TASK_BELONGS_T_PROJECT foreign key (PRONUM) references PROJ.PROJECT (PRONUM) /
Minder Chen, 1993~2002 Data Modeling - 134 -

Physical Database Design Activities


Define Tables & Columns

Define Keys
Identify Critical Transactions

Add Columns:
Redundant columns Derived data columns

Manipulate Tables:
Collapse tables Supertypes & subtypes

Add Tables:
Derived data tables

Handle Integrity Issues:


Row uniqueness & Domain restrictions Referential integrity & Generate sequence numbers Derived and redundant data

Controlling Access
Source: Gillete, Rob, etc., Physical Database Design for Sybase SQL Server, Prentice Hall, 1995.

Manage Objects:
Sizes Placement Data Modeling - 135 -

Minder Chen, 1993~2002

Architecture of Data Warehouse


Data Warehouse
Corporate Operational Database Metadata Info. Directory End User Access and OLAP frontend Tools

Summarized

Data Replication & Cleansing

Derived

Detailed Past

Informational Database

Projecte Current d

Data Bridging/ Transformation


Minder Chen, 1993~2002

Data extraction Data filtering Table joining Translation Re-Formatting

EIS DSS Report Writers Spreadsheets

Data Modeling - 136 -

Operational vs. Informational Databases


Characteristics
Data Content Data organizations Data Volatility Data normalization

Operational Database
Current value

Informational Database
Archival data, summarized data, calculated data Subject areas across enterprise Static until refreshed Joined views suitable for business analysis Low - Medium Access only; no direct update Highly unstructured, heuristic or analytical processing Several seconds to minutes
Data Modeling - 137 -

Application by application

Dynamic Fully normalized for transaction processing High Updated on a record and field basis Highly structured transaction processing Sub-second to 2-3 seconds

Access frequency Data Update


Usage

Response Time
Minder Chen, 1993~2002

Excel Pivot Table Wizard

Relational View

Multidimensional View
Minder Chen, 1993~2002 Data Modeling - 138 -

Dimensional Model
Product
Key Name Description Size Price

Sale
Product Key Market Key Promotion Key Time Key
Dollars Units Price Cost

Market Region
Key Description District Region Demographics

Time
Key Weekday Holiday Fiscal

Promotion
Key Description Discount Media

Region

Product

Minder Chen, 1993~2002

Time

Data Modeling - 139 -

Modeling a Data Warehouse


MDM: Multidimensional Modeling
A logical model of business information Easy to understand Applicable to relational and multidimensional databases Extremely useful for analysis A tried-and-tested techniques

Why?
An OLTP (On-Line Transaction Process) design of an order processing system may have dozens or hundreds of tables. It becomes difficult for business managers to understand the design in order to analyze the data.
Data Modeling - 140 -

Minder Chen, 1993~2002

Approach Designed around numeric data:


values counts weights occurrence

An example of a MDM problem statement:


"What is my profitability by customer over time, by organization?"

Minder Chen, 1993~2002

Data Modeling - 141 -

The Classic Star Schema


Market Dimension Market ID description region state district city Each dimension is described by its own table and the facts are arranged in a single large table with a concatenated primary key comprises the individual keys of each dimension. Fact Table Market ID Product ID Period ID dollars units price Period Dimension Period ID description year quarter month current flag resolution sequence

Product Dimension Product ID description supplier ID brand color size


Minder Chen, 1993~2002

Data Modeling - 142 -

Snow Flake Structure


Brand Brand identifier <pk> int Brand name char(30)
Brand identifier = Brand identifier

Product Product identifier <pk> Brand identifier <fk> Product description Product category Product unit price

int int char(80) char(30) int

Customer Customer identifier <pk> Customer name Customer address Customer activity Customer phone number Customer fax number

int char(30) char(80) char(80) char(12) char(12) Year identifier Year name Year <pk> int char(30)

Customer identifier = Customer identifier Year identifier = Year identifier Product identifier = Product identifier

Sale Time identifier Customer identifier Store identifier Product identifier Sale total Sale revenu <fk> <fk> <fk> <fk> int int int int real real

Quarter Quarter identifier <pk> int Year identifier <fk> int Quarter name char(10)
Quarter identifier = Quarter identifier

Country Country identifier <pk> int Country name char(80)

Country identifier = Country identifier

Month Month identifier <pk> int Quarter identifier <fk> int Month name char(10)
Month identifier = Month identifier

Region Region identifier <pk> int Country identifier <fk> int Region name char(30)
Region identifier = Region identifier

Store identifier = Store identifier Time identifier = Time identifier

Week Week identifier <pk> int Month identifier <fk> int Week name char(30) Week number in year int
Week identifier = Week identifier

Store Store identifier Region identifier Store name Store address Store manager Store phone number Store FAX number Store financial services type Store photo services type

<pk> int <fk> int char(50) char(80) char(30) char(20) char(20) char(10) char(10)

Day Week identifier <fk> Time identifier <pk> Date Day of week Day number in month

int int datetime char(30) int

Minder Chen, 1993~2002

Data Modeling - 143 -

Steps to Build MDM Pick a business subject area


Weekly sales reports, monthly financial statements, insurance claim costs.

Asking six fundamental questions:


What business process is being modeled? At what level of detail (granularity) is "active" analysis conducted? What do the measures have in common (the "dimensions")? What are the dimensions' attributes? Are the attributes stable or variable over time and is their "cardinality" bounded or unbounded?
Minder Chen, 1993~2002 Data Modeling - 144 -

Issues Active analysis


Mechanical manipulation: Pivoting, Drilling down, Graphing Agent-based manipulation: Alert reporting, exception reporting Workflow manipulation: Publishing, distributing documents.

Cardinality means "how many"


A relational database usually has "unbounded" cardinality A multidimensional database usually has "bounded" cardinality. Complete reorganization is needed to change cardinality.
Minder Chen, 1993~2002 Data Modeling - 145 -

dept_id = parent_id dept dept_id int parent_id int name varchar(255) description text date_changed datetime

A Data Model for an Electronic Commerce Application


dept_id = dept_id
pfid = pfid product_attribute pfid varchar(30) attribute_id tinyint attribute_index tinyint attribute_value varchar(20) pfid = pfid promo_cross pfid varchar(30) related_pfid varchar(30) description varchar(255) pfid = pfid promo_upsell pfid varchar(30) related_pfid varchar(30) description varchar(255)

pfid = pfid product_variant sku int pfid varchar(30) attribute0 tinyint attribute1 tinyint attribute2 tinyint attribute3 tinyint attribute4 tinyint

product_family pfid varchar(30) dept_id int manufacturer_id int name varchar(255) short_description varchar(255) long_description text image_filename varchar(255) intro_date datetime date_changed datetime list_price int monogramable tinyint

pfid = pfid

shopper sku = sku shopper_id char(32) order_id = order_id created datetime name varchar(235) password varchar(20) street varchar(50) city varchar(50)shopper_id = shopper_id state varchar(30) receipt zip varchar(15) order_id char(26) country varchar(20) shopper_id char(32) phone varchar(16) total int email varchar(50) status tinyint date_entered datetime date_changed datetime marshalled_receipt image shopper_id = shopper_id

receipt_item pfid varchar(30) sku int order_id char(26) row_id int quantity int adjusted_price int promo_price promo_name varchar(255) promo_type int promo_description text promo_rank int active int date_start datetime date_end datetime shopper_all int shopper_column varchar(64) shopper_op varchar(2) shopper_value varchar(64) cond_all int cond_column varchar(64) cond_op varchar(2) cond_value varchar(64) cond_basis char(1) cond_min int award_all int award_column varchar(64) award_op varchar(2) award_value varchar(64) award_max int disjoint_cond_award int disc_type char(1) disc_value realData Modeling

basket shopper_id char(32) date_changed datetime marshalled_order image

Minder Chen, 1993~2002

- 146 -

Attribute 0 of pfid 14 is size and the attribute value 1 is Grande and 2 is Tall and 3 is Short

Minder Chen, 1993~2002

Data Modeling - 147 -

Web-based Build-To-Order Application

Minder Chen, 1993~2002

Data Modeling - 148 -

Data Model for Build-To-Order Application

Minder Chen, 1993~2002

Data Modeling - 149 -

http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html

Auction Web Site's Data Model


Minder Chen, 1993~2002 Data Modeling - 150 -

You might also like