
Chapter 2

Data Models
Data Model:
A set of concepts used to describe the structure of a database and certain constraints that the database should obey. It is independent of hardware and software constraints. Rather than trying to represent the data as a database would see it, the data model focuses on representing the data as users use it in the real world. It works as a bridge between the concepts that make up real-world events and processes and the physical representation of those concepts in a database. The data model is one part of the conceptual design process. It focuses on what data should be stored in the database rather than on how the data is processed.

Categories of data models


Conceptual (high-level, semantic) data models: Provide concepts that are close to the way many users perceive data. (Also called entity-based or object-based data models.)
Physical (low-level, internal) data models: Provide concepts that describe details of how data is stored in the computer.
Implementation (representational) data models: Provide concepts that fall between the above two, balancing user views with some computer storage details.

History of Data Models


Relational Model: proposed in 1970 by E.F. Codd (IBM); the first commercial systems appeared in 1981-82. Now used in several commercial products (DB2, ORACLE, SQL Server, SYBASE, INFORMIX).

Network Model: the first model to be implemented, by Honeywell in 1964-65 (IDS System). Adopted heavily due to the support by CODASYL (CODASYL DBTG report of 1971). Later implemented in a large variety of systems: IDMS (Cullinet, now CA), DMS 1100 (Unisys), IMAGE (H.P.), VAX-DBMS (Digital Equipment Corp.).

Hierarchical Data Model: implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems. The most popular model. Another system based on this model is System 2K (SAS Inc.).

Object-oriented Data Model(s): several models have been proposed for implementation in a database system. One set comprises models of persistent O-O programming languages such as C++ (e.g., in OBJECTSTORE or VERSANT) and Smalltalk (e.g., in GEMSTONE). Additionally, there are systems such as O2, ORION (at MCC, then ITASCA) and IRIS (at H.P., used in Open OODB).

Object-Relational Models: the most recent trend. Started with Informix Universal Server. Exemplified in the latest versions of Oracle (10g), DB2, SQL Server and other systems.

Reasons for Creating a Data Model


It describes the exact information needs of the business.
It facilitates discussion.
It helps to prevent mistakes and misunderstandings.
It forms important system documentation.
It forms a sound basis for physical database design.
It is a widely recommended practice among practitioners.

Conceptual, Logical and Physical Data Model


A logical data model is sometimes incorrectly called a physical data model, which is not what ANSI intended. The physical design of a database makes deep use of a particular database management technology. For example, a table/column design could be implemented on a collection of computers located in different parts of the world; that is the domain of the physical model. Conceptual, logical and physical data models are very different in their objectives, goals and content. The key differences are noted below.

Conceptual Data Model (CDM)
o Includes high-level data constructs.
o Uses non-technical names, so that executives and managers at all levels can understand the data basis of the Architectural Description.
o Uses general high-level data constructs from which Architectural Descriptions are created in non-technical terms.
o May not be normalized.

Logical Data Model (LDM)
o Includes entities (tables), attributes (columns/fields) and relationships (keys).
o Uses business names for entities and attributes.
o Is independent of technology (platform, DBMS).
o Is normalized to fourth normal form (4NF).

Physical Data Model (PDM)
o Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints.
o Uses more specific, less generic names for tables and columns, such as abbreviated column names, limited by the database management system (DBMS) and any company-defined standards.
o Includes primary keys and indices for fast data access.
o May be de-normalized to meet performance requirements based on the nature of the database. If the database is used for Online Transaction Processing (OLTP) or as an Operational Data Store (ODS), it is usually not de-normalized. Denormalization is common in data warehouses.

Network Model
The model has three basic components: record types, data items and links.

ADVANTAGES:
The Network Model is able to model complex relationships and represents the semantics of add/delete on the relationships.
It can handle most modeling situations using record types and relationship types.
The language is navigational; it uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc.
Programmers can do optimal navigation through the database.

DISADVANTAGES:
Navigational and procedural nature of processing.
The database contains a complex array of pointers that thread through a set of records.
Little scope for automated "query optimization".
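To make the navigational style concrete, here is a minimal Python sketch. It assumes a single set type in which one owner record (a DEPARTMENT) is linked to a chain of member records (EMPLOYEEs) by pointers. The class `Record` and the helpers `connect`, `find_first_member`, `find_next_within_set` and `find_owner` are hypothetical names that only echo the FIND verbs; they are not a real DBMS API. The program, not the DBMS, chooses the access path, which is why there is little room for automated query optimization.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Record:
    """A record occurrence; the pointer fields model the links of ONE set type (hypothetical)."""
    record_type: str
    data: dict
    owner: Optional["Record"] = field(default=None, repr=False)         # link back to the set owner
    first_member: Optional["Record"] = field(default=None, repr=False)  # set when this record owns a set
    next_member: Optional["Record"] = field(default=None, repr=False)   # next member in the owner's chain


def connect(owner: Record, member: Record) -> None:
    """Append a member to the end of the owner's chain of members."""
    member.owner = owner
    if owner.first_member is None:
        owner.first_member = member
        return
    current = owner.first_member
    while current.next_member is not None:
        current = current.next_member
    current.next_member = member


# Hypothetical helpers named after DBTG-style FIND verbs (not a real API).
def find_first_member(owner: Record) -> Optional[Record]:
    return owner.first_member


def find_next_within_set(member: Record) -> Optional[Record]:
    return member.next_member


def find_owner(member: Record) -> Optional[Record]:
    return member.owner


# Usage: one DEPARTMENT owner with three EMPLOYEE members.
dept = Record("DEPARTMENT", {"Dname": "Research"})
for name in ["John Smith", "Franklin Wong", "Joyce English"]:
    connect(dept, Record("EMPLOYEE", {"Name": name}))

# The application program walks the pointer chain itself, one record at a time.
names: List[str] = []
emp = find_first_member(dept)
while emp is not None:
    names.append(emp.data["Name"])
    emp = find_next_within_set(emp)

print(names)  # ['John Smith', 'Franklin Wong', 'Joyce English']
print(find_owner(dept.first_member).data["Dname"])  # Research
```

A real network database lets a record type own and belong to many set types at once. The sketch keeps only one set so that the pointer threading stays visible.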

Hierarchical Model
In this data model, the relationships between logical record types have a hierarchical (tree) representation.

ADVANTAGES:
The Hierarchical Model is simple to construct and operate on.
It corresponds to a number of naturally hierarchically organized domains, e.g., assemblies in manufacturing and personnel organization in companies.
The language is simple; it uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc.

DISADVANTAGES:

Navigational and procedural nature of processing.
The database is visualized as a linear arrangement of records.
Little scope for "query optimization".
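The tree structure and the GET NEXT WITHIN PARENT style of access can be sketched in Python as follows. This is an illustration under stated assumptions, not IMS code: `Segment`, `hierarchical_sequence` and `get_next_within_parent` are hypothetical names, and the small DEPARTMENT > EMPLOYEE > DEPENDENT hierarchy is invented sample data. The pre-order walk also shows how the records can be visualized as one linear arrangement.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass(eq=False)
class Segment:
    """A record in the tree: one parent (except the root), any number of children."""
    segment_type: str
    data: dict
    children: List["Segment"] = field(default_factory=list)

    def add(self, child: "Segment") -> "Segment":
        self.children.append(child)
        return child


def hierarchical_sequence(root: Segment) -> Iterator[Segment]:
    """Pre-order walk: each parent before its children, siblings left to right.
    This yields the linear arrangement in which the records are visualized."""
    yield root
    for child in root.children:
        yield from hierarchical_sequence(child)


def get_next_within_parent(parent: Segment, segment_type: str) -> Iterator[Segment]:
    """Hypothetical analogue of GET NEXT WITHIN PARENT: successive records of one
    type that lie under the given parent, in hierarchical sequence."""
    for seg in hierarchical_sequence(parent):
        if seg is not parent and seg.segment_type == segment_type:
            yield seg


dept = Segment("DEPARTMENT", {"Dname": "Research"})
smith = dept.add(Segment("EMPLOYEE", {"Name": "John Smith"}))
smith.add(Segment("DEPENDENT", {"Name": "Alice"}))
smith.add(Segment("DEPENDENT", {"Name": "Michael"}))
wong = dept.add(Segment("EMPLOYEE", {"Name": "Franklin Wong"}))
wong.add(Segment("DEPENDENT", {"Name": "Theodore"}))

smith_deps = [s.data["Name"] for s in get_next_within_parent(smith, "DEPENDENT")]
all_deps = [s.data["Name"] for s in get_next_within_parent(dept, "DEPENDENT")]
print(smith_deps)  # ['Alice', 'Michael']
print(all_deps)    # ['Alice', 'Michael', 'Theodore']
```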

Relational Database Model


The relational database model was developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations, and integrity constraints. In such a database, the data and the relations between them are organized in tables. A table is a collection of records, and each record in a table contains the same fields.

Properties of Relational Tables:
Values are atomic.
Each row is unique.
Column values are of the same kind.
The sequence of columns is insignificant.
The sequence of rows is insignificant.
Each column has a unique name.
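Several of these properties can be observed directly in a relational DBMS. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The EMPLOYEE table and its columns are assumptions chosen for illustration. SQLite applies flexible type affinity, so "column values are of the same kind" is a declared intent here rather than something SQLite strictly enforces, unless a STRICT table is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn    TEXT PRIMARY KEY,  -- a key makes each row unique
        Name   TEXT NOT NULL,     -- one atomic value per cell, not a list of names
        Salary INTEGER            -- declared type for every value in this column
    )
""")
conn.executemany(
    "INSERT INTO EMPLOYEE (Ssn, Name, Salary) VALUES (?, ?, ?)",
    [("123456789", "John Smith", 30000), ("333445555", "Franklin Wong", 40000)],
)

# "Each row is unique": inserting a row with an existing key is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith', 30000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Row and column order are insignificant, so a query names the columns it wants
# and states any ordering it relies on with ORDER BY.
rows = conn.execute("SELECT Name, Ssn FROM EMPLOYEE ORDER BY Name").fetchall()
print(duplicate_rejected)  # True
print(rows)  # [('Franklin Wong', '333445555'), ('John Smith', '123456789')]
```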

Entity Relationship Modeling


It is widely used for relational database modeling. Simply stated, ER modeling is a conceptual data model that views the real world as a collection of entities and relationships.

Entity Relationship Diagram


The diagram used to model entities and relationships when designing a relational database.

Example: COMPANY Database Requirements (oversimplified for illustrative purposes)

The company is organized into DEPARTMENTs. Each department has a name, a number and an employee who manages the department. We keep track of the start date of the department manager. Each department controls a number of PROJECTs. Each project has a name and a number and is located at a single location. We store each EMPLOYEE's social security number, address, salary, sex, and birthdate. Each employee works for one department but may work on several projects. We keep track of the number of hours per week that an employee currently works on each project. We also keep track of the direct supervisor of each employee. Each employee may have a number of DEPENDENTs. For each dependent, we keep track of their name, sex, birthdate, and relationship to the employee.

ER Model Concepts: Entities and Attributes


Entities are specific objects or things in the mini-world that are represented in the database. For example, the EMPLOYEE John Smith, the Research DEPARTMENT, the ProductX PROJECT.
Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may have a Name, SSN, Address, Sex and BirthDate.
A specific entity will have a value for each of its attributes. For example, a specific employee entity may have Name='John Smith', SSN='123456789', Address='731 Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.
Each attribute has a value set (or data type) associated with it, e.g., integer, string, subrange, enumerated type, etc.

Types of Attributes
1. Simple
Each entity has a single atomic value for the attribute. For example, SSN or Sex.

2. Composite
The attribute may be composed of several components. For example, Address (Apt#, House#, Street, City, State, ZipCode, Country) or Name (FirstName, MiddleName, LastName). Composition may form a hierarchy where some components are themselves composite.

3. Multi-valued
An entity may have multiple values for that attribute. For example, Color of a CAR or PreviousDegrees of a STUDENT. Denoted as {Color} or {PreviousDegrees}.

4. Complex (nested)
In general, composite and multi-valued attributes may be nested arbitrarily to any number of levels, although this is rare. For example, PreviousDegrees of a STUDENT is a composite multi-valued attribute denoted by {PreviousDegrees (College, Year, Degree, Field)}.
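One way to see the difference between these attribute types is to model them as Python data classes. This is a sketch with hypothetical class and field names. Simple attributes map to plain fields, composite attributes to nested objects, multi-valued attributes to collections, and a composite multi-valued attribute to a list of nested objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Name:
    # Composite attribute: Name(FirstName, MiddleName, LastName)
    first: str
    middle: Optional[str]
    last: str


@dataclass
class Degree:
    # Components of the composite multi-valued attribute {PreviousDegrees(College, Year, Degree, Field)}
    college: str
    year: int
    degree: str
    field_of_study: str  # "Field" in the text; renamed here for readability


@dataclass
class Student:
    ssn: str   # simple (atomic) attribute
    sex: str   # simple (atomic) attribute
    name: Name  # composite attribute
    previous_degrees: List[Degree] = field(default_factory=list)  # composite + multi-valued


@dataclass
class Car:
    license_plate: str  # simple attribute (hypothetical)
    colors: Set[str] = field(default_factory=set)  # multi-valued attribute {Color}


s = Student(
    "123456789", "M", Name("John", "B", "Smith"),
    [Degree("College A", 1985, "BS", "CS"), Degree("College B", 1988, "MS", "CS")],
)
car = Car("ABC-123", {"red", "black"})
print(s.name.last, len(s.previous_degrees))  # Smith 2
```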

Summary of ER-Diagram Notation for ER Schemas

Example of an ER diagram. The entity types are: EMPLOYEE, DEPARTMENT, PROJECT and DEPENDENT.

Weak Entity Types

A weak entity type is an entity type that does not have a key attribute of its own.
A weak entity must participate in an identifying relationship type with an owner (identifying) entity type.
Weak entities are identified by the combination of:
o a partial key of the weak entity type, and
o the particular entity they are related to in the identifying entity type.
Example: Suppose that a DEPENDENT entity is identified by the dependent's first name and birth date, and the specific EMPLOYEE that the dependent is related to. DEPENDENT is then a weak entity type with EMPLOYEE as its identifying entity type via the identifying relationship type DEPENDENT_OF.
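In a relational implementation, a common way to capture a weak entity type is to borrow the owner's key. The following sketch uses Python's `sqlite3` module with hypothetical column names (`Essn`, `First_name`, `Birth_date`) and sample rows. DEPENDENT gets a primary key made of the owner's key plus the partial key. A foreign key to EMPLOYEE stands in for the identifying relationship DEPENDENT_OF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    Ssn  TEXT PRIMARY KEY,
    Name TEXT NOT NULL
);
-- (First_name, Birth_date) is only a partial key; the full key adds the owner's Ssn.
CREATE TABLE DEPENDENT (
    Essn         TEXT NOT NULL REFERENCES EMPLOYEE(Ssn) ON DELETE CASCADE,  -- DEPENDENT_OF
    First_name   TEXT NOT NULL,
    Birth_date   TEXT NOT NULL,
    Relationship TEXT,
    PRIMARY KEY (Essn, First_name, Birth_date)
);
INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith'), ('333445555', 'Franklin Wong');
-- Same partial key under two different owners: allowed, because the owner is part of the key.
INSERT INTO DEPENDENT VALUES ('123456789', 'Alice', '1988-12-30', 'Daughter');
INSERT INTO DEPENDENT VALUES ('333445555', 'Alice', '1988-12-30', 'Daughter');
""")

# A dependent cannot exist without an owning employee.
try:
    conn.execute("INSERT INTO DEPENDENT VALUES ('999999999', 'Joy', '1990-01-01', 'Spouse')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner removes its dependents (existence dependency).
conn.execute("DELETE FROM EMPLOYEE WHERE Ssn = '123456789'")
remaining = conn.execute("SELECT Essn FROM DEPENDENT").fetchall()
print(orphan_rejected, remaining)  # True [('333445555',)]
```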

Relationship Types: Relationship Degree


A relationship's degree indicates the number of associated entities or participants.

Unary relationship: exists when an association is maintained within a single entity.
Binary relationship: exists when two entities are associated.
Ternary relationship: exists when three entities are associated. More generally, an n-ary relationship associates n entities.
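The COMPANY requirements already contain a unary relationship: an employee's direct supervisor is also an employee. A minimal `sqlite3` sketch, assuming a hypothetical `Super_ssn` column and sample rows, implements it as a foreign key from EMPLOYEE back to EMPLOYEE and reads it with a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Ssn       TEXT PRIMARY KEY,
        Name      TEXT NOT NULL,
        Super_ssn TEXT REFERENCES EMPLOYEE(Ssn)  -- unary: EMPLOYEE related to EMPLOYEE
    )
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    ("888665555", "James Borg", None),             # top of the hierarchy: no supervisor
    ("333445555", "Franklin Wong", "888665555"),
    ("123456789", "John Smith", "333445555"),
])

# Self-join: the same table plays both roles (supervisee e, supervisor s).
pairs = conn.execute("""
    SELECT e.Name, s.Name
    FROM EMPLOYEE AS e JOIN EMPLOYEE AS s ON e.Super_ssn = s.Ssn
    ORDER BY e.Name
""").fetchall()
print(pairs)  # [('Franklin Wong', 'James Borg'), ('John Smith', 'Franklin Wong')]
```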

Cardinality Mapping
The cardinality mapping of a relationship (also known as cardinality) specifies how many occurrences in one entity can be associated (or linked) with occurrences in another entity.

There are three degrees of relationship (cardinality mapping), known as:
1. one-to-one (1:1)
2. one-to-many (1:M)
3. many-to-many (M:N)

One-to-one (1:1)
This is where one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice, but it can; when it does, you may consider combining the two entities into one. For example, an employee is allocated a company car, which can only be driven by that employee. Therefore, there is a one-to-one relationship between employee and company car.
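The employee and company-car example can be enforced in a relational schema with a UNIQUE constraint on the foreign key. This `sqlite3` sketch assumes hypothetical table and column names (`COMPANY_CAR`, `Reg_no`, `Essn`). Each car row points to exactly one employee, and allocating a second car to the same employee violates the constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT NOT NULL);
-- NOT NULL + foreign key: every car belongs to exactly one employee.
-- UNIQUE: no employee appears on more than one car, which makes the relationship 1:1.
CREATE TABLE COMPANY_CAR (
    Reg_no TEXT PRIMARY KEY,
    Essn   TEXT NOT NULL UNIQUE REFERENCES EMPLOYEE(Ssn)
);
INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith');
INSERT INTO COMPANY_CAR VALUES ('CAR-001', '123456789');
""")

try:
    conn.execute("INSERT INTO COMPANY_CAR VALUES ('CAR-002', '123456789')")
    second_car_rejected = False
except sqlite3.IntegrityError:
    second_car_rejected = True
print(second_car_rejected)  # True
```

Because each side has at most one partner, the car's columns could instead be folded into EMPLOYEE, matching the suggestion to consider combining the two entities.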

One-to-Many (1:M)
This is where one occurrence in an entity relates to many occurrences in another entity. For example, in the COMPANY example an employee works in one department, but a department has many employees. Therefore, there is a one-to-many relationship between department and employee.
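In a relational schema, a one-to-many relationship is usually implemented by placing a foreign key on the "many" side. The following `sqlite3` sketch has hypothetical column names (`Dnumber`, `Dno`) and sample rows. It stores the department number in each EMPLOYEE row and then counts employees per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT NOT NULL);
-- The foreign key sits on the "many" side: each employee row names exactly one department.
CREATE TABLE EMPLOYEE (
    Ssn  TEXT PRIMARY KEY,
    Name TEXT NOT NULL,
    Dno  INTEGER NOT NULL REFERENCES DEPARTMENT(Dnumber)
);
INSERT INTO DEPARTMENT VALUES (5, 'Research'), (4, 'Administration');
INSERT INTO EMPLOYEE VALUES
    ('123456789', 'John Smith', 5),
    ('333445555', 'Franklin Wong', 5),
    ('999887777', 'Alicia Zelaya', 4);
""")

counts = conn.execute("""
    SELECT d.Dname, COUNT(*)
    FROM DEPARTMENT AS d JOIN EMPLOYEE AS e ON e.Dno = d.Dnumber
    GROUP BY d.Dname
    ORDER BY d.Dname
""").fetchall()
print(counts)  # [('Administration', 1), ('Research', 2)]
```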

Many-to-Many (M:N)

This is where many occurrences in an entity relate to many occurrences in another entity. The normalization process discussed earlier would prevent any such relationships, but the definition is included here for completeness. As with one-to-one relationships, many-to-many relationships rarely exist; normally they occur because an entity has been missed. For example, an employee may work on several projects at the same time, and a project has a team of many employees. Therefore, there is a many-to-many relationship between employee and project.

However, in the normalization process this many-to-many relationship is resolved by introducing the entity Project Team.
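Resolving the many-to-many relationship through Project Team can be sketched in `sqlite3` as an associative table with two foreign keys. EMPLOYEE and PROJECT then each have a one-to-many relationship with it. The table name `PROJECT_TEAM`, its columns and the sample rows are assumptions. The `Hours` column reflects the COMPANY requirement of recording hours per week on each project, a fact that belongs to the pairing rather than to either entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PROJECT  (Pnumber INTEGER PRIMARY KEY, Pname TEXT NOT NULL);
-- Each row is one employee-on-project pairing:
-- EMPLOYEE 1:M PROJECT_TEAM and PROJECT 1:M PROJECT_TEAM.
CREATE TABLE PROJECT_TEAM (
    Essn  TEXT    NOT NULL REFERENCES EMPLOYEE(Ssn),
    Pno   INTEGER NOT NULL REFERENCES PROJECT(Pnumber),
    Hours REAL,                 -- hours per week: a property of the pairing itself
    PRIMARY KEY (Essn, Pno)     -- an employee appears at most once per project
);
INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith'), ('453453453', 'Joyce English');
INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO PROJECT_TEAM VALUES
    ('123456789', 1, 32.5), ('123456789', 2, 7.5),
    ('453453453', 1, 20.0), ('453453453', 2, 20.0);
""")

projects_of_smith = conn.execute("""
    SELECT p.Pname FROM PROJECT AS p JOIN PROJECT_TEAM AS t ON t.Pno = p.Pnumber
    WHERE t.Essn = '123456789'
    ORDER BY p.Pname
""").fetchall()
team_size = conn.execute("SELECT COUNT(*) FROM PROJECT_TEAM WHERE Pno = 1").fetchone()[0]
print(projects_of_smith, team_size)  # [('ProductX',), ('ProductY',)] 2
```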

Refining Relationship in Entity-Relationship Diagram


Every entity must participate in a relationship; any isolated entity in the conceptual schema is removed, since it has no importance in the system.
A one-to-one relationship can be merged to make one entity.
One-to-many relationships are generally preferred.
A many-to-many relationship must be resolved into one-to-many relationships by treating the many-to-many relationship as a relation and establishing one-to-many relationships, as discussed in class.
Unary and binary relationships are preferred.
A ternary relationship is resolved by treating it as a relation and establishing one-to-many relationships, as discussed in class.
A redundant relationship is a relationship between two entities that is equivalent in meaning to another relationship between the same two entities. Such redundant relationships are discarded.

Schemas versus Instances

Database Schema: The description of a database. Includes descriptions of the database structure and the constraints that should hold on the database.
Schema Diagram: A diagrammatic display of (some aspects of) a database schema.
Schema Construct: A component of the schema or an object within the schema, e.g., STUDENT, COURSE.
Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).

Database Schema Vs. Database State


Database State: Refers to the content of a database at a moment in time.
Initial Database State: Refers to the database when it is first loaded.
Valid State: A state that satisfies the structure and constraints of the database.

Distinction
o The database schema changes very infrequently. The database state changes every time the database is updated.
o The schema is also called the intension, whereas the state is called the extension.
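The distinction between schema (intension) and state (extension) can be observed in SQLite, which stores the schema description in its `sqlite_master` catalog table. In this sketch, which assumes a hypothetical STUDENT table, the stored schema text stays the same while the state changes with each insert. An insert that would violate the key constraint is rejected, so the database remains in a valid state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT NOT NULL)")


def schema(c: sqlite3.Connection) -> str:
    # Intension: the stored description of the table's structure and constraints.
    return c.execute("SELECT sql FROM sqlite_master WHERE name = 'STUDENT'").fetchone()[0]


def state(c: sqlite3.Connection) -> list:
    # Extension: the rows present at this moment.
    return c.execute("SELECT * FROM STUDENT ORDER BY Student_number").fetchall()


schema_before = schema(conn)
empty_state = state(conn)  # no data yet, right after the schema is defined
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")
conn.execute("INSERT INTO STUDENT VALUES (8, 'Brown')")
state_after = state(conn)

# An update that would break the key constraint is rejected, keeping the state valid.
try:
    conn.execute("INSERT INTO STUDENT VALUES (17, 'Jones')")
except sqlite3.IntegrityError:
    pass

print(empty_state, state_after)  # [] [(8, 'Brown'), (17, 'Smith')]
print(schema(conn) == schema_before)  # True
```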

Data Independence
Logical Data Independence: The capacity to change the conceptual schema without having to change the external schemas and their application programs.
Physical Data Independence: The capacity to change the internal schema without having to change the conceptual schema.
When a schema at a lower level is changed, only the mappings between this schema and higher-level schemas need to be changed in a DBMS that fully supports data independence. The higher-level schemas themselves are unchanged. Hence, the application programs need not be changed, since they refer to the external schemas.
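A small `sqlite3` experiment illustrates both kinds of independence. It assumes that a view plays the role of an external schema and an index stands for an internal-level change; SQLite does not formally separate the three schema levels. The view `EMP_DIRECTORY`, the index `idx_emp_dno` and the added `Phone` column are hypothetical. The application query returns the same result before and after the changes because it refers only to the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Name TEXT NOT NULL, Salary INTEGER, Dno INTEGER);
INSERT INTO EMPLOYEE VALUES ('123456789', 'John Smith', 30000, 5),
                            ('333445555', 'Franklin Wong', 40000, 5);
-- External schema: a view exposing only what one user group needs.
CREATE VIEW EMP_DIRECTORY AS SELECT Name, Dno FROM EMPLOYEE;
""")


def application_query(c: sqlite3.Connection) -> list:
    # The "application program" depends only on the external schema (the view).
    return c.execute("SELECT Name FROM EMP_DIRECTORY WHERE Dno = 5 ORDER BY Name").fetchall()


before = application_query(conn)

# Internal-level change (new access path): physical data independence.
conn.execute("CREATE INDEX idx_emp_dno ON EMPLOYEE(Dno)")
# Conceptual-level change (new attribute): logical data independence for this view.
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN Phone TEXT")

after = application_query(conn)
print(before == after, after)  # True [('Franklin Wong',), ('John Smith',)]
```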

DBMS Languages
Data Definition Language (DDL): Used by the DBA and database designers to specify the conceptual schema of a database. In many DBMSs, the DDL is also used to define internal and external schemas (views). In some DBMSs, a separate storage definition language (SDL) and view definition language (VDL) are used to define internal and external schemas.
Data Manipulation Language (DML):
o Used to specify database retrievals and updates.
o DML commands (data sublanguage) can be embedded in a general-purpose programming language (host language), such as COBOL, C or an assembly language.
o Alternatively, stand-alone DML commands can be applied directly (query language).
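The sketch below uses Python as the host language and the standard `sqlite3` module as the call interface. DDL defines the schema, and DML statements are passed as strings, with `?` placeholders binding host-language variables. The COURSE table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("""
    CREATE TABLE COURSE (
        Course_number TEXT PRIMARY KEY,
        Course_name   TEXT NOT NULL,
        Credit_hours  INTEGER
    )
""")

# DML embedded in the host language: host variables are bound to ? placeholders.
courses = [
    ("CS1310", "Intro to Computer Science", 4),
    ("CS3320", "Data Structures", 4),
    ("MATH2410", "Discrete Mathematics", 3),
]
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)", courses)
conn.execute("UPDATE COURSE SET Credit_hours = ? WHERE Course_number = ?", (3, "CS3320"))

# Retrieval: the host program iterates over the result set one row at a time.
four_credit = [
    name
    for (name,) in conn.execute(
        "SELECT Course_name FROM COURSE WHERE Credit_hours = 4 ORDER BY Course_name"
    )
]
print(four_credit)  # ['Intro to Computer Science']
```

Binding values through placeholders, rather than pasting them into the SQL string, also keeps host data from being interpreted as SQL.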

DBMS Interfaces
Stand-alone query language interfaces.
Programmer interfaces for embedding DML in programming languages:
o Pre-compiler approach
o Procedure (subroutine) call approach
User-friendly interfaces:
o Menu-based, popular for browsing on the web
o Forms-based, designed for naïve users
o Graphics-based (point and click, drag and drop, etc.)
o Natural language: requests in written English
o Combinations of the above

Other DBMS Interfaces


Speech as input (?) and output.
Web browser as an interface.
Parametric interfaces (e.g., bank tellers) using function keys.
Interfaces for the DBA:
o Creating accounts, granting authorizations
o Setting system parameters
o Changing schemas or access paths

Data dictionary / repository:


Used to store schema descriptions and other information such as design decisions, application program descriptions, user information, usage standards, etc.
An active data dictionary is accessed by the DBMS software as well as by users/DBA.
A passive data dictionary is accessed by users/DBA only.
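SQLite keeps a simple system catalog that its own engine consults. That makes it a rough example of the active case, though it records only schema descriptions, not design decisions or usage standards. This sketch, assuming hypothetical DEPARTMENT and DEPT_NAMES objects, reads the catalog through the `sqlite_master` table and the `PRAGMA table_info` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT NOT NULL);
CREATE VIEW DEPT_NAMES AS SELECT Dname FROM DEPARTMENT;
""")

# The catalog lists every schema object the DBMS knows about.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(DEPARTMENT)")]

print(catalog)  # [('table', 'DEPARTMENT'), ('view', 'DEPT_NAMES')]
print(columns)  # [('Dnumber', 'INTEGER'), ('Dname', 'TEXT')]
```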
