
DATABASE MANAGEMENT

CSYS2404

LECTURE NOTES
Mrs. Gaye Campbell 2010


Database Management

TABLE OF CONTENTS

SYLLABUS/COURSE OUTLINE
  UNIT I - Introduction to Database Concepts
  UNIT II - Database Design
  UNIT III - Introduction to Relational Algebra and SQL
  UNIT IV - Distributed Databases
  UNIT V - Security Issues
UNIT I: INTRODUCTION TO DATABASE CONCEPTS
  The need for File Systems and Databases
  Basic Concepts
    Sample Payroll Database Structure
  The traditional/file oriented approach
    Problems with the Traditional approach
  The database approach
    DBMS (Database management systems)
    Functions common to most databases
  Advantages of databases
  Disadvantages of databases
  Components of a DBMS
  The different types of databases/Database Models
    Hierarchical
    Network
    Relational
    Object-Oriented
    Object-Relational
    Multidimensional
UNIT II: DATABASE DESIGN
  Introduction to the Database System Life Cycle (DBLC)
    Analysis and design phase
    Database implementation and operation phase
  Roles of database personnel
    Data modellers
    Business Analysts

Copyright G. Campbell 2010

    Database Designers
    Systems Analysts [see Business Systems course]
    Programmers
    Database Administrators
  Database Design - Conceptual, Logical, Physical
    Conceptual design
    Logical Design
    Physical Design
  Database Schema or Levels of abstraction in specifying a database structure
    Definition of database schema
    Explanation of the four database schema
  Entity-Relationship Diagrams
    Types of relationships
    The symbols used in an ERD
    Sample ERDs
    Example of Creating the ERD
    Entity and Referential Integrity
    ERD Exercises
  Functional Dependencies
  Computation of Closures
    Algorithm for finding the closure of a set of attributes
    Closure Exercises
  Armstrong's Axioms
    Reflexivity
    Augmentation
    Transitivity
    Examples
    EXERCISE
  Covers and their role in determining redundant FDs
    Algorithm to find redundant FDs.
    Exercises - Find the redundant FDs in the following sets:
  1st, 2nd, 3rd Normal Forms
    Definition - A relation is in first normal form (1NF) if:
    Definition - A relation is in second normal form (2NF) if:

    Definition - A relation is in 3rd normal form (3NF) if:
    Comprehensive example (1NF to 3NF)
    Another example of the process.
    Normalization Exercises to 3NF.
  Assessment of file layouts as they affect the functioning of a database.
  Physical and logical data organization.
UNIT III: INTRODUCTION TO RELATIONAL ALGEBRA AND SQL
  The languages used in database systems
  The role of Relational DMLs and DDLs.
  The difference between relational algebra and relational calculus.
  Relational algebra
    Simple projection
    Selection
    Difference (or Set Difference)
    Renaming
    Union
    Intersection
    Division
    Join (natural, equi, inner, outer)
    Cartesian product.
  Relational Algebra Exercises
  SQL Commands - LAB PORTION
    Brief Summary of Commands
    CREATE TABLE (using constraints - primary key, foreign key)
    ALTER TABLE
    INSERT
    SELECT (using WHERE, GROUP BY, ORDER BY, HAVING, aggregate functions, logical operators, comparison operators)
    SELECT sub queries
    Operations on Result Sets
    UPDATE
    DELETE
    CREATE VIEW
    CREATE INDEX

    DROP TABLE
    DROP VIEW
    DROP INDEX
    GRANT and REVOKE
    COMMIT and ROLLBACK
  SQL EXERCISES
    EXERCISE 1 - CREATE TABLE AND ALTER TABLE STATEMENTS
    EXERCISE 2 - INSERT, UPDATE, DELETE, SELECT USING UNION
    EXERCISE 3 - SELECT STATEMENT
    EXERCISE 4 - SELECT STATEMENT USING MORE THAN ONE TABLE
    EXERCISE 5 - DISTINCT, WILDCARD cont'd, SUB QUERY, CREATE INDEX, DROP TABLE, DROP INDEX
    EXERCISE 6 - REVIEW OF ALL COMMANDS
UNIT IV: DISTRIBUTED DATABASES
  Characteristics of a distributed database
  Definition of logical database, local and global application, global intelligence
  Assessment of a distributed database versus a loose connection of independent sites
  Terms and concepts used in distributed databases
  Advantages and disadvantages of a distributed database
    Advantages
    Disadvantages
  Practice Questions
  Data warehouse
  Differences between data warehouse and operational database
  Data mart
  On-line analytical processing
  Data mining
  Transactions - Atomic, Consistent, Isolated, Durable (ACID)
  Concurrency control
UNIT V: SECURITY ISSUES
  The role of the Data Dictionary
  What is data security?
  What are Security Risks?
    Security risks and their effects
  Database protection methods - backup and restore methods
  Integrity Preservation - keys (primary and foreign), data validation, authority levels
    Keys
    Data Validation

    Authority Levels
  Security Control - unauthorized access and use, encryption, anti-virus, firewall, SQL views
SAMPLE SQL CODE FOR RECREATING DATABASE
REFERENCES


SYLLABUS/COURSE OUTLINE
THE COUNCIL OF COMMUNITY COLLEGES OF JAMAICA
COURSE NAME: Database Management
COURSE CODE: CSYS2404
CREDITS:
CONTACT HOURS: 45 (45 hours theory)
PRE-REQUISITE(S): None
CO-REQUISITE(S): None
SEMESTER:
COURSE DESCRIPTION:
This course is designed to ensure that the student completes a study of Database Management
Systems. Students will be exposed to database concepts including functional dependencies, SQL
and normalization. Emphasis will be placed on the creation and manipulation of databases using
Oracle, but this can be extended to any available DBMS.
GENERAL OBJECTIVES:
Upon successful completion of this course, students should:
1. understand various terms used in Database Management
2. appreciate the advantages of the database approach
3. understand key components of a database management system
4. appreciate the historical transformation of database models and DBMS
5. know the steps in the Database System Life Cycle
6. appreciate the differences between Logical and Physical Database Design and
organization
7. understand functional dependencies
8. understand how to normalize up to 3NF
9. use SQL commands
10. understand how to create reports using ad-hoc SQL commands
11. understand how to solve relational Algebra problems
12. understand distributed database concepts
13. appreciate the importance of maintaining data integrity and security
14. understand the application of Entity Relationship Diagrams

UNIT I - Introduction to Database Concepts
Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. define key terms associated with database management
2. discuss the file oriented versus the database approach
3. discuss advantages associated with database approach as opposed to file-oriented
approach
4. identify hardware, software and DBMS components
5. describe features of hierarchical, network, relational, object-oriented and object-relational
models
Content:
1. Basic Concepts - character, field, record, table/file, database, Database Management
System, primary key, foreign key, secondary key, composite key, super key, candidate
key
2. The traditional/file oriented approach
3. The database approach
4. Advantages of databases
5. Components of a DBMS - DDL, DML, Query Language, Report Generator
6. The different types of databases - hierarchical, network, relational, object-oriented,
object-relational
UNIT II - Database Design
Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. define the Database System Life Cycle
2. identify the Phases in the Database System Life Cycle
3. identify the roles of database personnel
4. discuss conceptual, logical and physical data design
5. discuss the concept of database schema
6. utilize ERDs to capture data requirements
7. discuss concepts of entity and referential integrity
8. discuss Functional Dependencies (FDs)
9. find redundant FDs in a set
10. normalize to 3NF
11. assess file layouts as they affect the functioning of databases
12. discuss the differences between physical and logical data organization
Content:

1. The Database Management System Life Cycle - Database Analysis, Database Design,
Database Implementation, Database Testing and Evaluation, Operation, Database
Maintenance
2. Roles of database personnel - Data modelers, Business Analysts, Database Designers,
Systems Analysts, Programmers and Database Administrators.
3. Database Design - Conceptual, Logical, Physical
4. Database Schema
5. Entity-Relationship Diagrams
6. Entity and Referential Integrity
7. Functional Dependencies
8. Computation of Closures
9. Armstrong's Axioms
10. Covers and their role in determining redundant FDs
11. 1st, 2nd, 3rd Normal Forms
12. Assessment of file layouts as they affect the functioning of a database.
13. Physical and logical data organization.
UNIT III - Introduction to Relational Algebra and SQL
Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. discuss and identify the role of Relational DMLs and DDLs
2. differentiate between relational algebra and relational calculus
3. solve Relational Algebra problems
4. utilize SQL commands

Content:
1. The role of Relational DMLs and DDLs.
2. The difference between relational algebra and relational calculus.
3. Introduction to Relational algebra - simple projection, selection, difference, renaming,
union, intersection, division, join (natural, equi, inner, outer) and Cartesian product.
4. SQL Commands - CREATE TABLE (using constraints - primary key, foreign key),
ALTER TABLE, INSERT, SELECT (using WHERE, GROUP BY, ORDER BY,
HAVING, aggregate functions, logical operators, comparison operators), SELECT sub
queries, UPDATE, DELETE, CREATE VIEW, CREATE INDEX, DROP TABLE,
DROP VIEW, DROP INDEX, GRANT and REVOKE, COMMIT and ROLLBACK.
UNIT IV - Distributed Databases
Specific Objectives:
Upon successful completion of this unit, students should be able to:

1. define characteristics of Distributed Databases
2. assess a distributed database versus a loose connection of independent sites
3. define terms and concepts used in the distributed database environment
4. identify advantages and disadvantages of distributed databases
5. discuss data warehousing
6. differentiate between a data warehouse and a data mart
7. differentiate between a data warehouse and an operational database
8. discuss On-line analytical processing (OLAP)
9. discuss the concept of data mining
10. discuss the concept of transactions and concurrency control
Content:
1. Characteristics of a distributed database
2. Definition of logical database, local and global application, global intelligence
3. Assessment of a distributed database versus a loose connection of independent sites
4. Terms and concepts used in distributed databases - transparency, homogeneous versus
heterogeneous distribution, fragmentation (vertical/horizontal), replication, and allocation
5. Advantages and disadvantages of a distributed database
6. Data mart
7. Data warehouse
8. Differences between data warehouse and operational database
9. On-line analytical processing
10. Data mining
11. Transactions - Atomic, Consistent, Isolated, Durable (ACID)
12. Concurrency control
UNIT V - Security Issues
Specific Objectives:
Upon successful completion of this unit, students should be able to:
1. identify the role of the Data Dictionary/Directory
2. identify methods used in database protection
3. discuss methods used in integrity preservation
4. identify and discuss security control techniques

Content:
1. The role of the Data Dictionary
2. Database protection methods - backup and restore methods
3. Integrity Preservation - keys (primary and foreign), data validation, authority levels
4. Security Control - unauthorized access and use, encryption, anti-virus, firewall, SQL
views

METHODS OF DELIVERY:
1. Lectures
2. Discussions
3. Lab
METHODS OF ASSESSMENT AND EVALUATION:
1. Common Coursework - 20%
2. Internal Tests - 20%
3. Final Examination - 60%

RESOURCE MATERIAL:
Prescribed:
Hoffer, J.A., Prescott, M. & Topi, H. (2008) Modern database management. (9th ed.).
NJ: Prentice Hall.
Recommended:
Date, C. J. (2003) An introduction to database systems. (8th ed.). NJ: Addison Wesley.
Shah, N. (2004) Database systems using Oracle. (2nd ed.). NJ: Prentice Hall.


UNIT I: INTRODUCTION TO DATABASE CONCEPTS


The need for File Systems and Databases
In order to be competitive in today's data-driven environment, business organizations have to be
concerned with data management. Data management is the process of identifying
effective and efficient methods of collecting, storing and retrieving data. Over the years, this
need has given rise to two distinct data management approaches: the file
approach and the database approach.
Before we look at the differences between the file approach and the database approach, we need
to be aware of some basic file/database concepts.
Basic Concepts
Term/Concept
Data
Information
Character

Field/Attribute/Column

Definition

Raw facts which are important to an organization


Organized-data. This means that what is information for someone may
be data for another.
One of a set of symbols, such as letters or numbers, that are arranged
to express information and belongs to a character set (e.g. ASCII
represented by 8 bits).
A single-unit of data in its simplest form. A field contains a specific
piece of information within a record. A field name uniquely identifies each
field. In the example employee table below the lastname field would
contain all of the last names of the employees in the table. It is an attribute or
characteristic of an entity.

Data type/Field type

Record/Row/Tuple

The physical representation of a data value. A data type is a unified set


of data values that is integrated with a set of operations that allows the
effective manipulation of each data value within the set. The data
type determines what kind of data may be stored in the field and it
also determines the operations, which may be performed on the stored
value.
A group of related fields. A record is defined as being a collection of
related data. These data item (values) are often stored in fields. Each
field is allowed to hold an atomic value, that the value is not
decomposable. In order to store information each field has to be
associated with a data type. A record contains information about a given
person, place, event or thing. A record in an employee table would contain
specific information about a particular employee.

Table/File/Relation

A group of records having, the same structure. A table is-a collection


of similar records, which means that all the records within a table must
have the same structure (physical and logical). It captures all of the
records of a particular type of entity. E.g. the employee table has all of the

Copyright G. Campbell 2010

12

Database Management

Entity

employee records. The structure of the table is described by the fields, that is,
the type of data that will be held in the table.
An entity is an object or event about which someone chooses to collect data.
It may be a person, place, event or thing. E.g. Student, car, library book,
employee, bank account etc.

Database
(db)/Information
Repository

A database is a collection of tables, which collectively stores and


provides the information needed by an organization.
Common types of databases in society include:Payroll, Employee data, Inventory management/Stock, Sales
Customer data, Supplier data, Library book management, Banking, Student
Registration

Database Management
Systems (DBMS)

Complex system software which constructs and maintains the database


in a controlled way. It allows creation, access, and management of a
database. A database system is essentially nothing more than a
computerized record-keeping system. The users will have the
following facilities: add new files, insert new data, retrieve data,
update data, delete data, and delete files.

Key
Primary key

Attribute(s) used to identify an entity

The primary key is one or more fields whose values uniquely identify
each record in a table. A primary key cannot allow Null values and
must always have a unique index. (Null values indicate that the field is
empty). A primary key is used to relate a table to foreign keys in other
tables.
Fields that could be used as primary keys include:TRN, Student id number, Employee id number, License plate number,
Passport number, NIS number, Chassis number, Engine number, Part
number, Reference number, ISBN on books, Bar code
Department id etc.

Secondary key - A set of attributes used for identifying records, but not uniquely
(e.g. Name).

Candidate key (minimal superkey) - An attribute that can serve as a primary key (an
alternate key). It can allow null values. E.g. on an employee table the TRN may be used
as the key, but the NIS No. is also unique.

Composite key - A primary key that consists of two or more attributes.

Foreign key - The primary key of one entity that is placed in a second entity for the
purpose of accessing the first entity.

Superkey - All keys are superkeys, but not all superkeys are keys. A superkey is a
collection of one or more fields whose collective value is unique. The importance of a
superkey is that it allows us to distinguish between the records stored in a table.

Index key - This is a field or a collection of fields whose collective value is used to
order the information in a database table. The main purpose of an index is to speed up
data retrieval.

Query - A question about the data stored in your tables, or a request to perform an
action on the data. It can bring together data from multiple tables to serve as the
source of data for a form, report, or data access page.

Form - A database object on which you place controls for taking actions or for
entering, displaying, and editing data in fields.

Report - A database object that prints information that is formatted and organized
according to your specifications.

Non prime attribute - An attribute that is not a part of the primary key.
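
The key types defined above can be tried out with a small sketch. It uses Python's
built-in sqlite3 module; the table names, column names and sample values (including the
TRN and NIS formats) are assumptions made up for illustration and do not come from the
course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys on request
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        trn     TEXT NOT NULL PRIMARY KEY,               -- primary key
        nis_no  TEXT UNIQUE,                             -- candidate (alternate) key
        name    TEXT,                                    -- secondary key: not unique
        dept_id INTEGER REFERENCES department(dept_id)   -- foreign key
    );
    CREATE INDEX idx_employee_name ON employee(name);    -- index to speed up retrieval
    INSERT INTO department VALUES (10, 'Payroll');
    INSERT INTO employee VALUES ('100-200-300', 'A123456', 'Brown', 10);
""")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a key rule rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_primary = try_insert(("100-200-300", "B999999", "Green", 10))
duplicate_candidate = try_insert(("400-500-600", "A123456", "Green", 10))
unknown_department = try_insert(("400-500-600", "B999999", "Green", 99))
same_name_allowed = try_insert(("400-500-600", "B999999", "Brown", 10))
```

In this sketch the first three inserts are rejected (repeated primary key, repeated
candidate key, and a foreign key with no matching department), while a second employee
named Brown is accepted because a secondary key does not have to be unique.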

Sample Payroll Database Structure


The traditional/file oriented approach

The file processing approach is an approach to storing and managing data where each department
within an organization typically has its own set of files. The focus is on procedures. Data flows
from program to program. Files are designed to meet the needs of a given program. The file
approach is often called the traditional approach. In this methodology, the process of data
management is handled in an unstructured and ad hoc (unplanned) manner. This means that
the data files and the programs which manipulate these files are created on a departmental basis
without due consideration of the needs of the other departments.
Can you use the above approach to answer this query?
Find the employees earning less than $23,000 who a) work in a warehouse with a floor area
larger than 30,000 square feet; b) have issued an order to supplier S6.
Problems with the Traditional approach
The problems created by this approach may be divided into two categories: data problems
and programming problems.
Data problems
These problems were brought about by differences in the format of the duplicated
data. These differences were typically seen in three areas:
• Typographical errors in the duplicated data
• Data type differences in the duplicated data
• Differences in the logical representation of the duplicated data
Programming problems
The programming languages that were available during this period were all
third-generation languages (3GLs), which are also known as procedural languages.
Procedural languages suffer from two deficiencies which make it difficult to write
programming routines that manipulate data within the data files. These two deficiencies
are known as structural dependence and data dependence.


Structural dependence
This is the situation in which a programmer needs to have knowledge of the
representation of the logical structure of a file in order to write programming routines
to manipulate the data within the file. The logical structure of a file is concerned with
the order in which data occurs within the file.
Data dependence
This is the situation in which a programmer needs to have knowledge of the physical
representation of the data within the file in order to write programming routines to
manipulate the data.
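
Both kinds of dependence can be seen in a short sketch of a file-oriented program. The
record layout below (a 4-byte id, a 10-byte name and an 8-byte salary) is a hypothetical
one chosen for illustration, and Python's struct module stands in for the record
handling of a procedural language.

```python
import struct

# Hypothetical payroll record: 4-byte employee id, 10-byte name, 8-byte salary.
LAYOUT_V1 = struct.Struct("<i10sd")

def write_v1(emp_id, name, salary):
    return LAYOUT_V1.pack(emp_id, name.encode("ascii"), salary)

def read_salary_v1(record):
    # The routine must know the order of the fields (structural dependence)
    # and how each field is physically stored (data dependence).
    _emp_id, _name, salary = LAYOUT_V1.unpack(record)
    return salary

record = write_v1(101, "Brown", 22500.0)
salary = read_salary_v1(record)

# Another department later adds a 2-byte department code after the id.
LAYOUT_V2 = struct.Struct("<ih10sd")
new_record = LAYOUT_V2.pack(101, 7, b"Brown", 22500.0)

try:
    read_salary_v1(new_record)
    old_program_still_works = True
except struct.error:
    old_program_still_works = False
```

The old routine fails on the new record even though the salary field itself did not
change, which is why file structure changes force existing programs to be rewritten.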
The problems are as follows:
• Application program dependence, e.g. Prog 1 cannot directly access files
designed for Prog 2 (files are often designed specifically for their particular
application)
• Separated and isolated data, resulting in difficulty accessing data stored in
different files
• Incompatible files
• Files must be pre-sorted
• Redundant data can arise as new programs are written (because the same fields are
stored in multiple places, the chance of errors is increased; there are also
typographical errors in the duplicated data)
• Inconsistent data arises when one program does an update and another does not
• File structure changes severely impact existing programs
• Poor data control: with no centralized control at the data element level, it is
common for the same data element to have multiple names
• Often difficult to understand
The database approach


In the database approach many programs and users share the data in the database. Users
access data using software called a Database Management System (DBMS). The focus is
on the data and not on procedures. The data resource is separate from the programs.
In the database approach data management is handled in a structured and planned
manner. The first step in the database approach is to perform a data requirements analysis
of the organization as a whole. In other words we are concerned with identifying the data
needs of the organization not just the data needs of the specific department. This results
in a pool of centralized data, which is then shared among the various organizational
departments.
The first step in the database approach is geared towards solving the data problems that
were present in the file approach. We have now eliminated the duplication of data by
ensuring that there is a centralized pool of data, which is accessed by the entire
organization. The result is that all the problems that were generated because of duplicated
data are now eliminated.
The second step in the database approach is the use of a fourth-generation language
(4GL), which is also known as a nonprocedural language. A nonprocedural language does
not suffer from the deficiencies of a third-generation language. In fact, a
fourth-generation language supports structural independence and data independence.
• Structural independence - the situation in which the logical representation
of a file structure is not needed in order to write programming routines for
manipulating the file contents.
• Data independence - the situation in which the physical representation of
data is not needed in order to write programming routines for
manipulating the data.
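
As a rough illustration of structural independence, the sketch below asks for data by
column name through SQL and then changes the table structure. It assumes Python's
built-in sqlite3 module; the employee table and its contents are invented for the
example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (101, 'Brown', 22500.0)")

def low_paid(connection, limit):
    # Nonprocedural request: it states WHAT is wanted, by column name, and says
    # nothing about field order or how the values are physically stored.
    rows = connection.execute(
        "SELECT name FROM employee WHERE salary < ? ORDER BY name", (limit,))
    return [name for (name,) in rows]

before = low_paid(conn, 23000)

# The structure changes: a new column is added to the table.
conn.execute("ALTER TABLE employee ADD COLUMN dept_code INTEGER")
conn.execute(
    "INSERT INTO employee (emp_id, name, salary, dept_code) "
    "VALUES (102, 'Allen', 21000.0, 7)")

after = low_paid(conn, 23000)  # the same routine runs unchanged
```

The routine written before the change keeps working after it, because the DBMS, not the
program, is responsible for locating the name and salary values.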
Database - An organized collection of data. A set of related files.
Formal Definition: A database is a single organized collection of structured data, stored
with a minimum of duplication of data items so as to provide a consistent and controlled
pool of data. This data is common to all users of the system but is independent of the
programs which use the data.
DBMS (Database management systems)
The DBMS is an item of complex system software which constructs and maintains the
database in a controlled way. It allows creation, access, and management of a
database.
It consists of a collection of interrelated data and a collection of programs to access
that data. The data describe one particular enterprise. A DBMS is usually purchased
from a software vendor and is the means by which an application program or end-user
views and manipulates data in a database.
It also provides the interface between the user and the data. (The user is unaware of
the structure of the database. The DBMS provides the user with the services needed and
handles the technicalities of maintaining and using the data.)
The DBMS also allocates storage to data.
It maintains indices so that any required data can be retrieved, and so that separate
items of data can be cross-referenced. (Research: Look up hashing)
The DBMS also has the function of providing security for the data. The main aspects
of this are: protecting data against unauthorized access, safeguarding data against
corruption, and providing recovery and restart facilities after a hardware/software failure.
The DBMS keeps statistics of the use made of the data. This allows redundant data to
be removed.
It also allows data which is frequently used to be kept in a readily accessible form so
that time is saved.
Functions common to most databases
Data Dictionary (DD)
o Is sometimes called a repository
o Contains data about each file in the DB and each field within the files
o Should only be updated by skilled personnel
o Is used to perform validation checks
o Allows users to specify a default value for a field
File retrieval and maintenance
o Many tools provided
o Involves adding new records, updating existing records and deleting
unwanted records
Query Language
o Allows users to specify data to be displayed, printed or stored
o Consists of simple English-like statements
o Each has its own grammar and vocabulary
o Usually quickly learned by non-programmers
Form
o A window used to enter and change data
o When well designed, validates data as it is entered, reducing data entry
errors
Report Generator
o Also called report writer
o Allows users to design a report on the screen
o Normally used only to retrieve data
Data Security
o A DBMS provides means to ensure that only authorized users access
data at permitted times
o Most DBMSs allow different levels of access privileges
Backup and Recovery
o A DBMS provides a variety of techniques to restore a damaged or
destroyed database to usable form.
o A Backup or copy of the entire database should be made on a regular
basis
o Some DBMSs maintain a log of activities
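
Two of the functions listed above, the data dictionary and validation checks with
default values, can be sketched with Python's built-in sqlite3 module. SQLite is only
one possible DBMS, and the employee table, its status column and the salary rule are
assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        status TEXT DEFAULT 'active',           -- default value for a field
        salary REAL CHECK (salary >= 0)         -- validation check
    )
""")

# SQLite exposes its data dictionary through PRAGMA table_info, which returns
# one row per field: (cid, name, type, notnull, dflt_value, pk).
dictionary = {row[1]: {"type": row[2], "default": row[4]}
              for row in conn.execute("PRAGMA table_info(employee)")}

conn.execute("INSERT INTO employee (emp_id, name, salary) VALUES (1, 'Brown', 22500)")
default_used = conn.execute(
    "SELECT status FROM employee WHERE emp_id = 1").fetchone()[0]

try:
    conn.execute("INSERT INTO employee (emp_id, name, salary) VALUES (2, 'Allen', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here the dictionary describes each field of the table, the missing status is filled in
from the default, and the negative salary is refused by the validation check.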
Advantages of databases
• Data is managed by the DBMS
• Program independent
• Information supplied to managers is more valuable because it is based on a
comprehensive collection of data instead of files which contain only the data needed
for one application. (Total availability). [Data is centralized and integrated]
• Shared data
o Data belong to, and are shared by, the entire organization, usually over a
network.
o Security settings are usually used to define who has access and at what level.
• As well as routine reports, it is possible to obtain ad hoc reports to meet particular
requirements.
• Easier access: non-technical users can access and maintain data if afforded the
necessary privileges. [Better service to the users]
• There is an economic advantage in not duplicating data. In addition, errors due to
discrepancies between two files are eliminated.
• The amount of input preparation needed is minimized by the single input principle.
(This means that there is little duplication of data; one transaction will cause the
necessary changes to be made to the data.) (Reduced data redundancy: most data
items are stored in only one file, which greatly reduces duplicate data.)
• Improved data integrity: data modification is accomplished by changing only one
file, reducing the probability of introducing inconsistencies and redundancies.
• A great deal of programming time is saved because the DBMS handles the
construction and processing of the files and the retrieval of data. (Reduced
development time)
• The integration of different business systems is greatly facilitated.
• Data definition and documentation are standardized.
Disadvantages of databases

• Requires more memory, storage and processing power
• Data are more vulnerable than in file processing systems

[Research The History of databases, 1970 E.F. Codd]


Components of a DBMS
The components of a Database System are as follows:

• Database
• Software: DDL, DML, Query Language, Report Generator/Writer (see Unit III for more details)
• Hardware
• Users


The different types of databases/Database Models


Every database and DBMS is based on a specific data model. The data model consists of
the rules that define how the database organizes data and how users view the organization
of data. Databases are classified according to the approaches taken to database
organization. The classes are:
Relational
Network
Hierarchical
Object Oriented
Multidimensional
A data model is a representation of data and its interrelationships which describe ideas
about the real world.
The hierarchical and network database models store their data as a series of records,
each of which has a set of field values attached to it. They collect all the instances of
a specific record together as a record type. These record types are the equivalent of
tables in the relational model, with the individual records being the equivalent of rows.
Links between the record types are created using parent-child relationships.
Hierarchical
A hierarchical system is one that is organized in the shape of a pyramid, with each row of
objects linked to objects directly beneath it. Hierarchical systems pervade everyday life.
Examples of hierarchical systems in society are:
• The army, which has generals at the top and privates at the bottom
• The classification of plants and animals according to species, family, genus etc.
Examples of hierarchical systems in computers are:
• File system: a hierarchy of folders and sub-folders in which files are placed.
• Menu-driven system: a system of main menus with sub-menus below. (E.g. when
you click on File another menu comes up under it).
The hierarchical model is the oldest of the database models, and unlike the network,
relational and object oriented models, does not have a well documented history of its
conception and initial release. It is derived from the Information Management Systems of
the 1950's and 60's. It was adopted by many banks and insurance companies who are still
running it as a legacy system to this day. Hierarchical database systems can also be found
in inventory and accounting systems used by government departments and hospitals.
The hierarchical model is a tree structured model and consists of many record types
with one being the root. The root record type exists at the top of the tree. All data must
be accessed through the root. One-to-many relationships exist between records in the
hierarchy with one being the parent and the other the child. Each child has a unique
parent and a parent can have many children. This child/parent rule assures that data is
systematically accessible. To get to a low-level table, you start at the root and work your
way down through the tree until you reach your target. Of course, as you might imagine,
one problem with this system is that the user must know how the tree is structured in
order to find anything.
For example, in the diagram below, the root record type is customer, the parent of order
is customer, the parent of parts is order. In order to access an order, you must first access
the customer (e.g. by knowing the customer#). Order has two children which are parts
and salesman. In order to access the parts, you must first access the customer then the
order. The path to the parts record type is therefore Customer, Order, Parts.
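
The access path described above can be sketched with nested Python dictionaries standing
in for the hierarchy. The customer, order and part numbers are hypothetical, and a real
hierarchical DBMS would use its own navigation commands rather than dictionary lookups.

```python
# Hypothetical hierarchical data: Customer (root) -> Order -> Parts / Salesman
database = {
    "C001": {                                  # customer#
        "name": "Smith",
        "orders": {
            "O-17": {                          # order#
                "parts": ["P-204", "P-310"],
                "salesman": ["Jones"],
            }
        },
    }
}

def find_parts(db, customer_no, order_no):
    """Follow the only available path: Customer, Order, Parts."""
    customer = db[customer_no]                 # step 1: start at the root
    order = customer["orders"][order_no]       # step 2: descend to the order
    return order["parts"]                      # step 3: descend to the parts

def customers_for_part(db, part_no):
    # There is no direct path starting from a part, so every branch of the
    # tree has to be walked from the root.
    return [cust_no
            for cust_no, cust in db.items()
            for order in cust["orders"].values()
            if part_no in order["parts"]]

parts = find_parts(database, "C001", "O-17")
owners = customers_for_part(database, "P-310")
```

Finding the parts of a known order is a direct descent, whereas a question that starts
from the bottom of the tree needs a full search, which shows why some questions are easy
and others difficult in this model.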
Hierarchical structures were widely used in the first mainframe database management
systems. However, due to their restrictions, they often cannot be used to relate structures
that exist in the real world. Hierarchical relationships between different types of data can
make it very easy to answer some questions, but very difficult to answer others. If a
one-to-many relationship is violated (e.g., a patient can have more than one physician)
then the hierarchy becomes a network.
The hierarchical model is no longer used as the basis for current commercially produced
systems; however, there are a large number of legacy (old) installations. These legacy
systems are likely to be phased out over time, as the number of qualified staff declines
due to retirement and retraining.
Examples of hierarchical databases include:
IMS - Information Management System by IBM
System 2000 by MRI Systems Corp.
Adabas
GT.M
Caché
Multidimensional hierarchical toolkit
MUMPS compiler


Advantages of the Hierarchical Model
• Data is unified since all records stem from the root
• Easier to secure the database since you can access data through only one
path
• Good for large volumes of one-to-many relationships
• Adding, updating, and deleting records is more efficient and accurate than in
the network model
Disadvantages of the Hierarchical Model
• Software dependence (changes to the database structure require
modification of all programs which access the database)
• You cannot add a record to a child table until it has already been
incorporated into the parent table. This might be troublesome if, for
example, you wanted to add a student who had not yet signed up for any
courses. In the diagram above, you cannot add a new salesperson until
there is a customer and an order.
• Many-to-many relationships are difficult or impossible to show
• One-to-many relationships can result in redundant data
• Not flexible enough to support ad hoc queries
• Data can only be accessed through the right path
• It is not user friendly, as users have to know the structure in order to access
data through the right path
Network
The network model is a database model conceived as a flexible way of representing
objects and their relationships. Its original inventor was Charles Bachman, and it was
developed into a standard specification published in 1969 by the Conference on Data
Systems Languages (CODASYL) Consortium. In many ways, the Network Database
model was designed to solve some of the problems with the Hierarchical Database
Model.
Where the hierarchical model structures data as a tree of record types, with each record
type having one parent record and many children, the network model allows each
record type to have multiple parent and child records, forming a lattice structure. This
allows the model to support many-to-many relationships. There is no root record type.
Data can therefore be accessed through more than one path. For example, in the diagram
below, an order can be accessed through either the salesperson or the customer as order
has salesperson and customer as its parents. Another way of saying it is that the child of
salesperson and customer is order. The path to Parts is either Salesperson, Order, Parts or
Customer, Order, Parts. You can therefore access parts by either knowing who the
salesperson is or through the order by knowing for example, the order #.
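
A minimal sketch of the multiple access paths, using plain Python dictionaries in place
of a real network DBMS. The record keys are hypothetical; the point is only that Order
has two parents, so Parts can be reached from either of them or from the order itself.

```python
# Hypothetical records. Order has two parents (Salesperson and Customer),
# so there is no single root record type.
orders = {"O-17": {"parts": ["P-204", "P-310"]}}
customers = {"C001": {"orders": ["O-17"]}}
salespersons = {"Jones": {"orders": ["O-17"]}}

def parts_via(owner_records, owner_key):
    """Navigate Owner -> Order -> Parts, starting from either kind of parent."""
    found = []
    for order_no in owner_records[owner_key]["orders"]:
        found.extend(orders[order_no]["parts"])
    return found

via_customer = parts_via(customers, "C001")         # Customer, Order, Parts
via_salesperson = parts_via(salespersons, "Jones")  # Salesperson, Order, Parts
via_order = orders["O-17"]["parts"]                 # directly, knowing the order #
```

All three paths reach the same parts, but the program still has to know how the records
are linked in order to navigate them.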
The chief argument in favour of the network model, in comparison to the hierarchical
model, was that it allowed a more natural modeling of relationships between entities.
Although the model was widely implemented and used, it failed to become dominant for
two main reasons. Firstly, IBM chose to stick to the hierarchical model in their
established products such as IMS and DL/I. Secondly, it was eventually displaced by the
relational model, which offered a higher-level, more declarative interface.
Examples of network databases include:
Codasyl
Total
VAX-DBMS
IMAGE of Hewlett Packard
DMS-1100 of UNIVAC
SUPRA of Cincom


Advantages of the Network Model
• Many-to-many relationships are easily represented
• It is more flexible, as you can access data through more than one path
• Represents redundancy more efficiently than the hierarchical model
Disadvantages of the Network Model
• Software dependence (changes to the database structure require
modification of all programs which access the database)
• Uses more processing time than the hierarchical structure
• Users must have knowledge of the structure of the database in order to
navigate
• Hard to design, use and maintain

Relational
Relational databases consist of tables called relations. Relations are made up of tuples
and attributes. The rows/records are called tuples. The columns/fields are called
attributes. Relationships between relations are implicit in the overlapping attributes. All
relations have the same simple format, making them easy to set out under column headings.
Each row normally has a unique identifying key. Most relational databases include
Structured Query Language (SQL), a query language that allows users to manage, update and
retrieve data (e.g. Oracle, MySQL, Ingres, DB2, Sybase, Access, Visual FoxPro).



• A relational DB developer calls a file a relation, a record a tuple, and a field an
attribute
• A relational DB user calls a file a table, a record a row, and a field a column

[Figure: sample relations. Recoverable column headings: CustName, Salesperson; Order-No,
Salesperson; Part-No, Order-no]

Advantages of the Relational Model
• Structural independence (i.e. changes to the database structure do not
require modification of all programs which access the database)
• Powerful and flexible query mechanism that makes ad hoc queries
possible
• Easy representation of all types of relationships
• Unification of data that minimizes redundancy and maximizes security
Disadvantages of the Relational Model
• Requires more space and processing power
• Requires more planning if the database structure is to be designed properly
Entity Table/Relationship Table
An entity table is a table structure which allows us to store a set of similar entities. A
relationship table, on the other hand, is a table structure that enables us to show the
associations which exist among elements in related entity tables.
Standard Notation
Standard Notation is a format for writing database tables so that their logical structure
may be understood. In standard notation, each table is given a unique name. The name of
the table is then written in capital letters. Following the table name is a list of all the
fields which are found in the table. These fields are enclosed in brackets. The primary key
field for the table is then underlined.
For example, let us assume that we want to store the following information about a
student: id, Fname, Lname and sex. Let us also assume that we want to store the
following information about a subject: subid, subName, sublength. Finally, we want to
show the relationship between each student and the subjects taken in another table called
takes. If we assume that id is the primary key for the student table and that
subid is the primary key for the subject table, we will end up with the following table
structures in standard notation.
STUDENT(id, Fname, Lname, sex)
SUBJECT(subid, subName, sublength)
TAKES(id, subid)
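
The three table structures above can be tried out with Python's built-in sqlite3 module.
This is only a sketch: the column types, the sample rows, and the choice of (id, subid)
as a composite primary key for TAKES are assumptions, since the notes do not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Entity tables
    CREATE TABLE student (id INTEGER PRIMARY KEY, Fname TEXT, Lname TEXT, sex TEXT);
    CREATE TABLE subject (subid TEXT PRIMARY KEY, subName TEXT, sublength INTEGER);
    -- Relationship table linking the two entity tables
    CREATE TABLE takes (
        id    INTEGER REFERENCES student(id),
        subid TEXT    REFERENCES subject(subid),
        PRIMARY KEY (id, subid)          -- assumed composite key
    );
    INSERT INTO student VALUES (1, 'Ann', 'Brown', 'F');
    INSERT INTO student VALUES (2, 'Roy', 'Allen', 'M');
    INSERT INTO subject VALUES ('DB1', 'Database Management', 45);
    INSERT INTO takes VALUES (1, 'DB1');
""")

# The relationship is implicit in the overlapping attributes id and subid.
rows = conn.execute("""
    SELECT s.Fname, s.Lname, sub.subName
    FROM takes t
    JOIN student s   ON s.id = t.id
    JOIN subject sub ON sub.subid = t.subid
""").fetchall()
```

The query lists each student together with the subjects taken; Roy Allen does not appear
because no TAKES row links him to a subject.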

Object-Oriented
Summary
o Stores data in objects (An object contains data plus the actions that process the
data)
o Can usually store more types of data than Relational databases
o Can usually access data faster than the Relational DB
o Stores unstructured data more efficiently than the Relational DB
o Examples: FastObjects, GemStone
What is an Object?
An object generally is any item that can be individually selected and manipulated. This
can include shapes and pictures that appear on a screen as well as less tangible software
entities. In object-oriented programming an object is a self-contained entity that consists
of both data and procedures to manipulate the data. In other words, an object is an item that
contains data, as well as the actions that read or process the data.
Real-world objects share two characteristics: They all have state and behavior. For
example, dogs have state (name, color, breed, hungry) and behavior (barking, fetching,
wagging tail). Bicycles have state (current gear, current pedal cadence, two wheels,
number of gears) and behavior (braking, accelerating, slowing down, changing gears).
Software objects are modeled after real-world objects in that they too have state and
behavior. You might want to represent real-world dogs as software objects in an
animation program or a real-world bicycle as a software object in the program that
controls an electronic exercise bike. You can also use software objects to model abstract
concepts.
What is a Class?
A class is a category of objects. For example, there might be a class called shape that
contains objects which are circles, rectangles, and triangles. The class defines all the
common properties (characteristics) of the different objects that belong to it.
A class is a special programming construct that allows us to create objects. In other
words, a class provides the blueprint for the creation of an object. The class must
specify a description of the data that is stored and a description of the operations that the
object can provide.
As indicated above, each object must have a state and a set of methods, which are
encapsulated (contained) inside the object. The state refers to the data that is stored inside
the object, while the methods/behaviours refer to the set of operations/functions, which
the object can perform. For example, a user can click on a button, put the mouse over the
button, right click or double click on the button. Click, double click, right click, mouse
over etc are therefore examples of methods. When the user clicks on the button, the
relevant code for the particular user action is executed. Each object must have a set of
well-defined public interfaces, which a client may use to get the object to perform a
specific operation.
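
The ideas of class, object, state and methods can be sketched in Python using the
bicycle mentioned earlier. The attribute and method names are illustrative choices, not
part of any particular database product.

```python
class Bicycle:
    """Blueprint for bicycle objects: state plus behaviour."""

    def __init__(self, number_of_gears):
        # State: the data stored inside each object.
        self._number_of_gears = number_of_gears
        self._current_gear = 1
        self._speed = 0

    # Methods: the public interface a client uses to make the object act.
    def change_gear(self, gear):
        if not 1 <= gear <= self._number_of_gears:
            raise ValueError("no such gear")
        self._current_gear = gear

    def accelerate(self, amount):
        self._speed += amount

    def brake(self, amount):
        self._speed = max(0, self._speed - amount)

    def state(self):
        return {"gear": self._current_gear, "speed": self._speed}

# Two objects created from the same class, each with its own state.
bike_a = Bicycle(number_of_gears=5)
bike_b = Bicycle(number_of_gears=3)
bike_a.change_gear(3)
bike_a.accelerate(10)
bike_a.brake(4)
```

After these calls bike_a is in gear 3 at speed 6, while bike_b is unchanged, showing
that the class is the blueprint and each object keeps its own copy of the state.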
Examples of objects.
An object-oriented database can contain many classes of objects. These include:
Command buttons
List boxes
Data windows
Windows
Menus
Text boxes
Pictures
Audio clips
Video clips (animation)
Students
Courses
Employees

What is an object-oriented database (OODB)?


Object-oriented databases or object database management systems grew out of research
during the early to mid-1980s into having intrinsic database management support for
graph-structured objects. The term "object-oriented database system" first appeared
around 1985.
An object-oriented database stores data in objects. The most significant characteristic of
object-oriented database technology is that it combines object-oriented programming
with database technology to provide an integrated application development system.
Object-oriented databases are designed to work well with object-oriented programming
languages such as Java, C#, and C++.
An object contains data, as well as actions that read or process the data. A Member object, for
example, might contain data about a member such as Member ID, First Name, Last Name,
Address, and so on. It also could contain instructions on how to print the member record or
the formula required to calculate a member's balance due. A record in a relational database,
by contrast, would contain only data about a member.
Object-oriented databases have several advantages compared with relational databases.
They can store more types of data, access this data faster, and allow programmers
to reuse objects. An object-oriented database stores unstructured data more efficiently
than a relational database. Unstructured data includes photographs, video clips, audio
clips, and documents. When users query an object-oriented database, the results often
display more quickly than the same query of a relational database.
If an object already exists, programmers can reuse it instead of creating a new object,
saving program development time. For example, if a Close button exists on each
screen, the programmer only needs to write the code once, then place the same button on
each screen. This is called inheritance as discussed below.
The following are features of an object-oriented database:

Inheritance - the ability to create new objects by allowing them to automatically
obtain the data members and the data operations of an existing class without rewriting
the code that is present in the existing class.

Polymorphism (many forms) - the ability to have multiple classes of objects using the
same interfaces although the implementation details may vary from object to object.
For example, you can have a function/subroutine that calculates the area of an object.
The way it calculates area depends on the type of object that called the function. This
is because the formula for area is different for circle, rectangle, triangle etc. In other
words, there is one function called CALCULATE_AREA and multiple objects will
call this function, but the function behaves differently from object to object.

Encapsulation - the ability of an object to hide its internal representation from the
program that uses it. This is accomplished by defining public interfaces and by
specifying that these public interfaces must be used when accessing the internal data.

Information-hiding - an object has a public interface that other objects can use to
communicate with it. The object can maintain private information and methods that
can be changed at any time without affecting other objects that depend on it. You
don't need to understand a bike's gear mechanism to use it.
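
The CALCULATE_AREA example can be sketched in Python to show these features together.
The class names and the describe method are assumptions made for illustration.

```python
import math

class Shape:
    def __init__(self, label):
        self._label = label              # internal data, reached through describe()

    def calculate_area(self):
        raise NotImplementedError        # each kind of shape supplies its own formula

    def describe(self):
        # Written once here and inherited by every subclass.
        return f"{self._label}: {self.calculate_area():.2f}"

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def calculate_area(self):
        return math.pi * self._radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width, self._height = width, height

    def calculate_area(self):
        return self._width * self._height

class Triangle(Shape):
    def __init__(self, base, height):
        super().__init__("triangle")
        self._base, self._height = base, height

    def calculate_area(self):
        return 0.5 * self._base * self._height

shapes = [Circle(1), Rectangle(2, 3), Triangle(4, 5)]
areas = [shape.calculate_area() for shape in shapes]   # same call, different formulas
descriptions = [shape.describe() for shape in shapes]
```

Each subclass inherits describe from Shape (inheritance), the single call
calculate_area behaves differently for each object (polymorphism), and callers use the
public methods rather than the underscore-prefixed data (encapsulation and
information-hiding, which Python supports by convention rather than enforcement).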

Examples of object oriented databases include:

FastObjects
GemStone
KE Texpress
ObjectStore
Versant

Examples of applications appropriate for an object-oriented database include the
following:

A multimedia database stores images, audio clips, and/or video clips. For example, a
geographic information system (GIS) database stores maps. A voice mail system
database stores audio messages. A television news station database stores audio and
video clips.
A groupware database stores documents such as schedules, calendars, manuals, memos,
and reports. Users perform queries to search the document contents. For example, you
can search people's schedules for available meeting times.
A computer-aided design (CAD) database stores data about engineering, architectural,
and scientific designs. Data in the database includes a list of components of the item
being designed, the relationship among the components, and previous versions of the
design drafts.
A hypertext database contains text links to other types of documents. A hypermedia
database contains text, graphics, video, and sound. The Web contains a variety of
hypertext and hypermedia databases. You can search these databases for items such as
documents, graphics, audio and video clips, and links to Web pages.
A Web database links to an e-form on a Web page. The Web browser sends and
receives data between the form and the database.

OODBs add database functionality to object programming languages. A major benefit is
the unification of the application and database development into a seamless data model
and language environment. As a result, applications require less code, use more natural
data modeling, and code bases are easier to maintain. Object developers can write
complete database applications with a modest amount of additional effort.
According to Rao (1994), "The object-oriented database (OODB) paradigm is the
combination of object-oriented programming language (OOPL) systems and persistent
systems. The power of the OODB comes from the seamless treatment of both persistent
data, as found in databases, and transient data, as found in executing programs." Data in a
database is said to be persistent (constant) because you can read a record at one point in
time and read the record at another point in time and the record is still there. In other
words, the record is not transient (temporary).
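
The difference between transient and persistent data can be illustrated with a small
Python sketch. It is not an object-oriented database: a JSON file stands in for the
persistent store, and the Member class with its charges and payments fields is a
hypothetical example. An OODB would perform the save and load steps for the programmer.

```python
import json
import os
import tempfile

class Member:
    """Hypothetical Member object: data plus an action that processes it."""

    def __init__(self, member_id, first_name, last_name, charges, payments):
        self.member_id = member_id
        self.first_name = first_name
        self.last_name = last_name
        self.charges = charges
        self.payments = payments

    def balance_due(self):
        return self.charges - self.payments

def save(member, path):
    # Write the object's state to storage so that it outlives the running program.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vars(member), f)

def load(path):
    with open(path, encoding="utf-8") as f:
        return Member(**json.load(f))

path = os.path.join(tempfile.mkdtemp(), "member.json")
transient = Member(7, "Ann", "Brown", charges=120.0, payments=45.0)
save(transient, path)
del transient              # the in-memory (transient) object is gone
restored = load(path)      # the stored (persistent) state is still there
```

The restored object carries the same data and can still calculate the balance due, which
is the behaviour an OODB aims to provide without the explicit save and load code.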
In contrast to a relational DBMS where a complex data structure must be flattened out to
fit into tables or joined together from those tables to form the in-memory structure,
OODBs have no performance overhead to store or retrieve a web or hierarchy of
interrelated objects. This one-to-one mapping of object programming language objects to
database objects has two benefits over other storage approaches: it provides higher
performance management of objects, and it enables better management of the complex
interrelationships between objects. This makes object DBMSs better suited to support
applications such as financial portfolio risk analysis systems, telecommunications service
applications, world wide web document structures, design and manufacturing systems,
and hospital patient record systems, which have complex relationships between data.

Representation of an object oriented database.
In the sample website below, the object-oriented database contains buttons and a map. When the
user clicks on a particular area of the map, information on that area will appear. When the user
clicks on a button, there is a link to another web page. When the user puts their mouse over a
button, a description of the button appears.

Object-Relational
What is a hybrid object-relational database (ORD)?
An object-relational database (ORD) or object-relational database management
system (ORDBMS) combines features of the relational and object-oriented data models.
It is a relational database management system that allows developers to integrate the
database with their own custom data types and methods. The term object-relational
database is sometimes used to describe external software products running over
traditional DBMSs to provide similar features; these systems are more correctly referred
to as object-relational mapping systems.
Whereas RDBMS or SQL-DBMS products focused on the efficient management of data
drawn from a limited set of data types (defined by the relevant language standards), an
object-relational DBMS allows software developers to integrate their own types and the
methods that apply to them into the DBMS. The goal of ORDBMS technology is to allow
developers to raise the level of abstraction at which they view the problem domain.
Object-relational database management systems (ORDBMSs) add new object storage
capabilities to the relational systems at the core of modern information systems. These
new facilities integrate management of traditional fielded data, complex objects such as
time-series and geospatial data and diverse binary media such as audio, video, images,
and applets. An applet is an application that has limited features, requires limited memory
resources, and is usually portable between operating systems.
By encapsulating methods with data structures, an ORDBMS server can execute complex
analytical and data manipulation operations to search and transform multimedia and other
complex objects. As an evolutionary technology, the object-relational (OR) approach has
inherited the robust transaction- and performance-management features of its relational
ancestor and the flexibility of its object-oriented cousin. Database designers can work
with familiar tabular structures while assimilating new object-management possibilities.
Examples of Object-relational databases include:
DB2
JDataStore
Oracle
Polyhedra
PostgreSQL
What is Object Definition Language (ODL)?
Object-oriented and object-relational databases often use a query language called object
query language (OQL) to manipulate and retrieve data. These databases also have an
object definition language (ODL). ODL is used to define and manipulate the objects in
the database. ODL must specify a description of the data that is stored in objects as well
as a description of the operations that the object can provide.
For example, an object could be defined as being a command button. Code could be
written to manipulate the button in various ways such as: raise the button, move its
location, bring it into focus, enlarge it etc.
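The command button example can be sketched in code. The sketch below is written in Python rather than ODL, and every name in it (CommandButton, raise_button, move, focus, enlarge) is hypothetical. It only illustrates the idea that an object definition describes both the data stored in the object and the operations the object provides.

```python
from dataclasses import dataclass


@dataclass
class CommandButton:
    """Hypothetical object type: stored data plus the operations it offers."""
    # Data stored in the object
    label: str
    x: int = 0
    y: int = 0
    width: int = 80
    height: int = 24
    raised: bool = False
    focused: bool = False

    # Operations the object can provide
    def raise_button(self) -> None:
        self.raised = True

    def move(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def focus(self) -> None:
        self.focused = True

    def enlarge(self, factor: float) -> None:
        self.width = int(self.width * factor)
        self.height = int(self.height * factor)


button = CommandButton(label="Submit")
button.move(10, 20)
button.enlarge(2)
button.raise_button()
button.focus()
```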

Multidimensional
o Stores data in dimensions.
o The number of dimensions varies
o Most have a time dimension
o Examples: D3, Oracle Express
The following shows the difference between the relational view of sales data and
the multidimensional view of sales data.

Relational View

INVOICE Table
Number  Date     Customer  Amount
2034    15/5/96  Dartonik  $3500
2035    15/5/96  INC       $1800
2036    16/5/96  Dartonik  $2000
2037    16/5/96  INC       $800

LINE Table
Number  Product   Price  Quantity
2034    Mouse     $150   20
2034    Diskette  $50    10

Multidimensional View
                    Time Dimension
Customer Dimension  15/5/96  16/5/96  Totals
Dartonik            $3500    $2000    $5500
INC                 $1800    $800     $2600
Totals              $5300    $2800    $8100
Sales figures occur at the intersection of a customer row and time column
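As a rough illustration, the multidimensional view can be derived from the INVOICE rows by totalling amounts for each (customer, date) intersection. The Python sketch below uses the figures from the tables above. The variable names are illustrative only, and a real multidimensional DBMS would store and index the data quite differently.

```python
from collections import defaultdict

# Rows of the INVOICE table: (number, date, customer, amount).
invoices = [
    (2034, "15/5/96", "Dartonik", 3500),
    (2035, "15/5/96", "INC", 1800),
    (2036, "16/5/96", "Dartonik", 2000),
    (2037, "16/5/96", "INC", 800),
]

# Each cell of the "cube" sits at the intersection of a customer and a date.
cube = defaultdict(int)
for _number, date, customer, amount in invoices:
    cube[(customer, date)] += amount

# Totals along each dimension, plus the grand total.
customer_totals = defaultdict(int)
date_totals = defaultdict(int)
for (customer, date), amount in cube.items():
    customer_totals[customer] += amount
    date_totals[date] += amount
grand_total = sum(cube.values())
```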

[Extra Research: semi-structured model, associative model]


UNIT II: DATABASE DESIGN


Introduction to the Database System Life Cycle (DBLC)
The DBLC is made up of the following phases:

Database Analysis
Database Design
Database Implementation
Database Testing and Evaluation
Database Operation
Database Maintenance

A database passes through each phase of this cycle in turn. The phases can be further broken
down as follows:
Analysis and design phase
    Requirements formulation and analysis
    Logical design
    Implementation design
    Physical design
Database implementation and operation phase
    Database implementation
    Operation and monitoring
    Modification and adaptation

Database Analysis
This phase is carried out during the analysis phase of the SDLC. The main aim of database analysis
is to perform the following functions:

Analyse the current situation of the company (initial study)
Define the problems being experienced
Define organizational objectives and business rules (for validation rules etc.)
Define the scope and the boundaries of the project

Database Design
This phase is concerned with performing the following functions:
Conceptual Design - how data relates to each other (models the real world) (e.g.
ERD)
Logical Design - the information content of the database (tables/objects and
links)
Physical Design - layout on secondary storage (indexing, data types, access
methods etc.)

[Research the various access methods: Indexed, Sequential, Random/Direct]
Database Implementation
This phase is concerned with the actual creation of the database with respect to the
database design that was constructed above. In addition this phase is also concerned with
the implementation of security routines, business rules, concurrency control etc. We are
also concerned with the population of the database. [In other words this is where we
create the database structure using SQL commands]
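A minimal sketch of this phase is shown below, using Python's built-in sqlite3 module to issue the SQL commands. The table names, columns and the rule that an invoice amount cannot be negative are assumptions made for illustration. The sketch creates the structure, attaches a business rule as a constraint, populates the database and confirms that the rule is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Create the database structure from the design.
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE invoice (
        number      INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      REAL NOT NULL CHECK (amount >= 0)  -- assumed business rule
    )
""")

# Populate the database.
conn.execute("INSERT INTO customer VALUES (1, 'Dartonik')")
conn.execute("INSERT INTO invoice VALUES (2034, 1, 3500)")

# A row that breaks the business rule is rejected by the DBMS.
try:
    conn.execute("INSERT INTO invoice VALUES (2035, 1, -10)")
    rule_enforced = False
except sqlite3.IntegrityError:
    rule_enforced = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```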
Database Testing and Evaluation
This phase is concerned with running tests to ensure that the database will meet the needs
of the organization. This involves verifying that the appropriate business rules are being
called, that the security of the database is indeed intact etc. The failure of evaluation
criteria may signal changes in the conceptual, logical or physical layers.
This phase also involves testing of the programs that will use the database to ensure that
the interface works.
Database Operation
In this step users are actually using the database (e.g. adding records) through the relevant
application software.
Database Maintenance
This phase is concerned with ensuring that the database is functional and reliable. This
includes making modifications (e.g. adding new fields, increasing field sizes etc.).
Maintenance is often attained by performing the following activities:

Preventive maintenance (e.g. backup)
Corrective maintenance (recovery from failure)
Adaptive maintenance (adding new entities, enhancing performance etc.)
Performing periodic security audit checks


Roles of database personnel


Data modellers
"Database design seeks to design the logical and physical structure of one or more databases to
accommodate the information needs of the users in an organization for a defined set of
applications". The design process roughly follows five steps:
1. planning and analysis
2. conceptual design
3. logical design
4. physical design
5. implementation

The data model is one part of the conceptual design process. The most widely used form of data
modelling is the Entity-Relationship (ER) approach. The role of the data modeller therefore is to create
the data model or to carry out conceptual database design.

Business Analysts
This person has both business and computer knowledge. The term Business Analyst (BA) is
used to describe a person who practices the discipline of business analysis. A business analyst or
"BA" is responsible for analyzing the business needs of clients to help identify business
problems and propose solutions. Within the systems development life cycle domain, the business
analyst typically performs a liaison function between the business side of an enterprise and the
providers of services to the enterprise. Common alternative titles are systems analyst, and
functional analyst, although some organizations may differentiate between these titles and
corresponding responsibilities.
The International Institute of Business Analysis has the following definition of the role: "A
business analyst works as a liaison among stakeholders in order to elicit, analyze, communicate
and validate requirements for changes to business processes, policies and information systems.
The business analyst understands business problems and opportunities in the context of the
requirements and recommends solutions that enable the organization to achieve its goals."
The British Computer Society proposes the following definition of a business analyst: "An
internal consultancy role that has responsibility for investigating business systems, identifying
options for improving business systems and bridging the needs of the business with the use of
IT."
This person critically evaluates the information gathered. He/She should have strong analytical
skills in order to translate business needs into requirements. He/She also needs good
communication skills and must be able to challenge business units.
Database Designers
The process of designing a database generally consists of a number of steps which will be carried
out by the database designer. Usually, the database designer must:

Determine the data to be stored in the database
Determine the relationships between the different data elements
Superimpose a logical structure upon the data on the basis of these relationships.

[See the Database Design section for more details]


Systems Analysts [see Business Systems course]
The systems analyst analyses and designs systems that meet the computer requirements of an
organization. He/She uses computer technology to solve problems. A systems analyst is responsible
for researching, planning, coordinating and recommending software and system choices to meet
an organization's business requirements. The systems analyst plays a vital role in the systems
development process. A successful systems analyst must acquire four skills: analytical, technical,
managerial, and interpersonal. Analytical skills enable systems analysts to understand the
organization and its functions, which helps them to identify opportunities and to analyze and
solve problems. Technical skills help systems analysts understand the potential and the
limitations of information technology. The systems analyst must be able to work with various
programming languages, operating systems, and computer hardware platforms. Management
skills help systems analysts manage projects, resources, risk, and change. Interpersonal skills
help systems analysts work with end users as well as with analysts, programmers, and other
systems professionals.
Because they must translate user requests into technical specifications, systems analysts are the
liaisons between vendors and the IT professionals of the organization they represent. They may
be responsible for developing cost analyses, design considerations, and implementation timelines.
They may also be responsible for feasibility studies of a computer system before making
recommendations to senior management.
They are called Systems Architects in some companies.
Basically, a systems analyst performs the following tasks:

Interact with the customers to learn their requirements
Interact with designers to convey the possible interface of the software
Interact with and guide the coders/developers to keep track of system development
Perform system testing with sample/live data with the help of testers
Implement the new system
Prepare documentation

Many systems analysts have morphed into business analysts.


Programmers
The programmer writes, tests and modifies computer programs. This person must be able to communicate
effectively, write documentation and user manuals, conduct training, and consult with users, engineers etc.
[Please note that programming languages are numerous and
change from time to time. Programmers should therefore have the ability to learn new languages on their
own as technology changes.]

Database Administrators
A database administrator (DBA) is a person who is responsible for the environmental aspects of
a database. Managing a company's database requires a great deal of coordination. The role of
coordinating the use of the database belongs to the database administrator (DBA). The duties
of a database administrator vary depending on job description, corporate and IT policies and
the technical features and capabilities of the database management systems (DBMSes) being
administered. They nearly always include disaster recovery (backups and testing of backups),
performance analysis and tuning, and some database design or assistance thereof.
Database administrators work with database management systems software and determine ways
to organize and store data. They identify user requirements, set up computer databases, and test
and coordinate modifications to the computer database systems. An organization's database
administrator ensures the performance of the system, understands the platform on which the
database runs, and adds new users to the system. Because they also may design and implement
system security, database administrators often plan and coordinate security measures. With the
volume of sensitive data generated every second growing rapidly, data integrity, backup systems,
and database security have become increasingly important aspects of the job of database
administrators. Their salaries range from $65,000US to $86,000US depending on qualifications
and experience.
The administrative controls carried out by the DBA therefore include the following:
Select and implement the DBMS
Develop database models (e.g. Entity relationship diagrams)
Create and maintain the data dictionary [1]. This includes documentation of the data
dictionary.
Ensure that the database structure is documented
Provide manuals describing the facilities the database offers and how to make use of
these facilities. Ensure that the facilities for retrieving data and for structuring reports are
appropriate to the needs of the organization
Manage and evaluate security of the database (includes backup and recovery)
Verify database integrity
Monitor performance of the database
Recoverability - check backup and recovery/restore procedures
Perform archiving (backup and remove historical data from current files)
Appraise the performance of the database and take corrective action if performance
degrades.
Periodically appraise the data to ensure it is complete, accurate and not duplicated.
Availability - ensure that the database is running when necessary
Use query languages to obtain reports of the information in the database

[1] A data dictionary (also called a repository) is a DBMS element that contains data about each
table in a database and each field within those tables.
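As one illustration of "data about each table and each field", the sketch below reads SQLite's own catalogue through Python's sqlite3 module. The employee table is a made-up example, and other DBMSes expose their data dictionary through different catalogue tables or views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# sqlite_master is SQLite's catalogue of the objects in the database.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

dictionary = {}
for table in tables:
    # PRAGMA table_info returns one row per field:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    dictionary[table] = [
        (name, col_type, bool(pk))
        for _cid, name, col_type, _notnull, _default, pk in columns
    ]
```

Each entry in dictionary lists a field name, its declared data type and whether it is part of the primary key.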

Although not strictly part of a database administrator's duties, logical and physical design of
databases is sometimes part of the job. These functions are traditionally thought of as being the
duties of a database analyst or database designer.
[Research Salaries of the above job titles in various companies]


Database Design - Conceptual, Logical, Physical


Database Design is the process of developing a database structure from user requirements. It is
the process of producing a detailed data model of a database. This logical data model contains all
the needed logical and physical design choices and physical storage parameters needed to
generate a design in a Data Definition Language, which can then be used to create a database. A
fully attributed data model contains detailed attributes for each entity. The term database design
can be used to describe many different parts of the design of an overall database system.
Principally, and most correctly, it can be thought of as the logical design of the base data
structures used to store the data. In the relational model these are the tables and views. In an
object database the entities and relationships map directly to object classes and named
relationships. However, the term database design could also be used to apply to the overall
process of designing, not just the base data structures, but also the forms and queries used as part
of the overall database application within the database management system (DBMS).
The Database Design Process
The process of designing a database generally consists of a number of steps which will be carried
out by the database designer. Not all of these steps will be necessary in all cases. Usually, the
designer must:

Determine the data to be stored in the database
Determine the relationships between the different data elements
Superimpose a logical structure upon the data on the basis of these relationships.

Determining data to be stored


In a majority of cases, the person who is doing the design of a database is a person with expertise
in the area of database design, rather than expertise in the domain from which the data to be
stored is drawn e.g. financial information, biological information etc. Therefore the data to be
stored in the database must be determined in cooperation with a person who does have expertise
in that domain, and who is aware of what data must be stored within the system.
This process is one which is generally considered part of requirements analysis, and requires
skill on the part of the database designer to elicit the needed information from those with the
domain knowledge. This is because those with the necessary domain knowledge frequently
cannot express clearly what their system requirements for the database are as they are
unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be
stored can be determined by Requirement Specification.

Conceptual design
Once a database designer is aware of the data which is to be stored within the database,
they must then determine how the various pieces of that data relate to one another. When
performing this step, the designer is generally looking out for the dependencies in the data,
where one piece of information is dependent upon another i.e. when one piece of
information changes, the other will also. For example, in a list of names and addresses,
assuming the normal situation where two people can have the same address, but one person
cannot have two addresses, the address is dependent upon the name: each name is associated
with exactly one address, so two entries with different addresses must have different names.
However, the inverse is not necessarily true, i.e. two different names may share the same
address.
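A dependency of this kind can be checked mechanically against sample data. The Python sketch below uses a hypothetical helper, determines, and made-up names and addresses. It tests whether every value in one column is always paired with the same value in another column.

```python
def determines(rows, lhs, rhs):
    """Return True if each lhs value is always paired with the same rhs value."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault records the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Two people share an address; nobody has two addresses.
people = [
    {"name": "Alice", "address": "12 Hope Road"},
    {"name": "Boris", "address": "12 Hope Road"},
    {"name": "Jane", "address": "3 King Street"},
]

name_determines_address = determines(people, "name", "address")
address_determines_name = determines(people, "address", "name")
```

With this sample data a name identifies exactly one address, but an address does not identify exactly one name.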
Logical Design
This involves the design of the entire information content of the database. It is the
consolidation of all user requirements into a DBMS-independent information structure
(conceptual schema). The conceptual schema accurately models the real world organization
and its important data elements and relationships. The conceptual schema normally used is
the ERD. Once the relationships and dependencies amongst the various pieces of
information have been determined, it is possible to arrange the data into a logical structure
which can then be mapped into the storage objects supported by the database management
system.
In the case of relational databases the storage objects are normalized tables which store
data in rows and columns. Each table may represent an implementation of either a logical
object or a relationship joining one or more instances of one or more logical objects.
Relationships between tables may then be stored as links connecting child tables with
parents. Since complex logical relationships are themselves tables they will probably have
links to more than one parent.
In an Object database the storage objects correspond directly to the objects used by the
Object-oriented programming language used to write the applications that will manage and
access the data. The relationships may be defined as attributes of the object classes
involved or as methods that operate on the object classes.
Logical design results in the logical database structure.
Physical Design
This results in a physical database structure which is developed from the logical structure.
This determines the layout or configuration on secondary storage.
In other words the physical design of the database specifies the physical configuration of
the database on the storage media. This includes detailed specification of data elements,
data types, indexing options, and other parameters residing in the DBMS data dictionary. It
is the detailed design of a system, covering its modules and the hardware and software
specifications for the database.


Physical design can be roughly divided into 3 steps:


Stored record format design - concerned with the problem of formatting stored data by
analysis of the characteristics of data item types, the distribution of data item values, and
their usage by various applications.
Stored record clustering - physical allocation of stored records. Record clustering
places the same or different record types together in blocks on the storage device.
Access method design - provide storage and retrieval capabilities for data stored on
physical devices.


Database Schema or Levels of abstraction in specifying a database structure

Look at the diagram above. Which is easier to see, the rabbit or the duck?
• Just as different persons perceive the illusion in different ways, different users will view the
data in different ways.
• Database schema is therefore based on how one views the data. E.g. Data can be viewed as
entities with attributes or it can be viewed as groups of bits.

Definition of database schema


Database schema defines a database's structure, its tables, relationships, domains, and
business rules. Database schema is a design, the foundation on which the database and the
application are built.
Explanation of the four database schemas
Conceptual schema - consists of attributes, entities, relationships
The conceptual schema is also called the logical model, and is the basic database model, which
deals with organizational structures that are used to define database structures such as tables and
constraints.
This represents a global view of the data. It is an enterprise-wide representation of data as
viewed by high-level managers. This model is the basis for the identification and description of
the main data objects, avoiding details. The most widely used conceptual model is the entity
relationship (E-R) model. Using the E-R model yields the conceptual schema, which is, in effect
the basic database blueprint. In other words, this schema is used to design the database structure.
Conceptual schema provides a relatively easily understood bird's-eye view of the data
environment. The conceptual schema is independent of both software and hardware. Software
independence means that the model does not depend on the DBMS software used to implement
the model. Hardware independence means that the model does not depend on the hardware used
in the implementation of the model. Therefore, changes in either the hardware or the DBMS
software will have no effect on the database design at the conceptual level.
Internal schema - physical view - what analyst/programmer sees
Once a specific DBMS has been selected, the internal model adapts the conceptual model to a
specific DBMS. In other words, the internal model requires the database designer to match the
conceptual model's characteristics and constraints to those of the selected database model. The
database designer will, for example, see the specific tables in the database and know which fields
are on which table. Because the internal model depends on the existence of specific database
software, it is said to be software-dependent. Therefore a change in the DBMS requires that the
internal model be changed to fit the DBMS's characteristics and requirements. [e.g. a currency
data type is not available in all DBMSes]
The development of a detailed internal model is especially important to database designers who
work with certain database models that require very precise specification of data storage location
and data access paths. In contrast, the relational database model requires less detail in its internal
model because most RDBMSes handle data access path definition transparently; that is, the
designer need not be aware of the data access path details. Nevertheless, even relational database
software usually requires data storage location specification, especially in a mainframe
environment. For example, DB2 requires that the data storage group, the location of the database
within the group and the location of the tables within the database be specified.
The internal model is still hardware-independent because it is unaffected by the choice of the
computer on which the software is installed. Therefore a change in storage devices or even a
change in operating system will not affect the internal model's design requirements.
External schema - applications programmer or end user view
This is based on the internal model. It is the end user's view of the data environment or the
application's interface. It deals with methods through which users may access the data (e.g.
through the use of a data input form). By end users we mean the people who use the application
programs as well as those who designed and implemented them. Whereas the database designer
will know that fields are located on different tables, an end user may see every field on one
screen (form). This user therefore views the fields as if they were on one table. The end user will
not need to know that the data is separated into different tables. Some fields may also be missing
from the user's screen. The user does not necessarily need to know about these fields in order to
perform his/her tasks.
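One common way to provide such an external view in a relational DBMS is an SQL view. The sketch below uses Python's sqlite3 module. The tables, the view name invoice_screen and the hidden credit_limit field are all assumptions made for illustration. The user of the view sees fields from two tables as if they were on one table, and one field is left out altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- What the database designer sees: two separate tables.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL);
    CREATE TABLE invoice (number INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Dartonik', 10000);
    INSERT INTO invoice VALUES (2034, 1, 3500);

    -- What the end user sees: one "table"; credit_limit is not shown.
    CREATE VIEW invoice_screen AS
        SELECT i.number, c.name AS customer, i.amount
        FROM invoice AS i
        JOIN customer AS c ON c.customer_id = i.customer_id;
""")

cursor = conn.execute("SELECT * FROM invoice_screen")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```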
Physical schema - way data is stored on secondary storage
The lowest level of abstraction describes the way data is saved on storage media such as disks or
tapes. This model requires the definition of both the physical storage devices and the physical
access methods required to reach the data within those storage devices. It is both software and
hardware dependent. A change to either the DBMS software or hardware would require a change
to the database model.
Attributes of storage media
o Tracks
Bits (0s and 1s) are the smallest unit of data. The bits are commonly stored on tracks. A 0
is recorded as a non-magnetized spot on magnetic storage devices; on optical storage devices,
bits are recorded as pits (holes) burnt in the surface.
o Sectors
Data can be grouped in blocks called sectors. A sector on magnetic disk for example is in the
shape of a pizza slice/wedge. The block is read or written to at once.


File Organization and Access Methods

When data are stored on secondary storage devices, the method of organization chosen will
determine how the data can be accessed. In turn, this will affect the types of applications that
can use the data, as well as the time and cost necessary to do so.
a) Sequential
With sequential file organization records are stored physically in order in the file. This
can be in alphabetical or numerical order. Processing begins at the first logical record and
proceeds through each record in the file until the final record has been read or written.
Records cannot be inserted in the middle of the file. In order to modify a file the original
file (master) is changed by creating transactions in a transaction file. The transaction file
is processed and a new master file is created based on the transactions. Any type of
storage device can access sequential files. Magnetic tape is a sequential access device and
can only use sequential files.
Advantages
• This method can use magnetic tape which is the least expensive method of
storage.
• It is the most efficient form of organization when the entire file, or most of it,
must be processed at once.
• Transaction and old master files act as a backup, should the new master file be
damaged or destroyed.
Disadvantages
• This method can be slow when trying to locate a record near to the end of the file.
• The entire file must be processed and a new master file created even if only one
record requires maintenance or updating.
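The master/transaction update described above can be sketched as a single pass over two sorted files. In the Python sketch below the record layout, the action names ("add", "change", "delete") and the function name update_master are assumptions. Both lists stand in for sequential files that are already sorted by key.

```python
def update_master(old_master, transactions):
    """Merge a sorted transaction file into a sorted master file.

    The old master is never modified; a new master list is built instead.
    """
    new_master = []
    pending = iter(transactions)
    txn = next(pending, None)
    for key, data in old_master:
        # New records whose keys sort before the current master record.
        while txn is not None and txn[0] < key:
            if txn[1] == "add":
                new_master.append((txn[0], txn[2]))
            txn = next(pending, None)
        # Changes or deletions that apply to the current master record.
        keep = True
        while txn is not None and txn[0] == key:
            if txn[1] == "change":
                data = txn[2]
            elif txn[1] == "delete":
                keep = False
            txn = next(pending, None)
        if keep:
            new_master.append((key, data))
    # Transactions with keys beyond the last master record.
    while txn is not None:
        if txn[1] == "add":
            new_master.append((txn[0], txn[2]))
        txn = next(pending, None)
    return new_master


old_master = [(101, "Alice"), (103, "Jane"), (105, "Peter")]
transactions = [
    (102, "add", "Boris"),
    (103, "change", "Janet"),
    (105, "delete", None),
    (107, "add", "Zoe"),
]
new_master = update_master(old_master, transactions)
```

Because old_master and transactions are left untouched, they can serve as the backup mentioned in the advantages above.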
b) Indexed
Each record has a unique key, and a pointer associated with that key is used to access the record. The
pointers exist in an index file (separate from the data file) and direct you to the next logical
record. The records are not physically in logical order. In order to access the file
sequentially, you follow the sequence of the pointers. Files can be ordered in many ways
by using more than one set of pointers.
For example, Alice is the first logical record, the next record logically (alphabetically) is
Boris. The pointer after Alice therefore says 4. In other words, if you want to know what
the record is after Alice, go to record number 4 to find it.

Physical Rec#   Data    Index/Pointer
1               Mary    5
2               Alice   4
3               Jane    1
4               Boris   3
5               Peter   2
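Following the pointers in the table above can be sketched in Python as shown below. The sketch assumes that the pointer of the last logical record (Peter) leads back to the first logical record (Alice, physical record 2), which is how the table reads. The function name is illustrative.

```python
# Physical record number -> (data, pointer to the next logical record).
records = {
    1: ("Mary", 5),
    2: ("Alice", 4),
    3: ("Jane", 1),
    4: ("Boris", 3),
    5: ("Peter", 2),
}


def read_in_logical_order(records, first):
    """Follow the pointer chain from the first logical record."""
    ordered = []
    current = first
    while True:
        data, pointer = records[current]
        ordered.append(data)
        if pointer == first:  # the chain has wrapped around to the start
            break
        current = pointer
    return ordered


names = read_in_logical_order(records, first=2)
```

Starting at physical record 2, the chain visits records 2, 4, 3, 1 and 5, giving the names in alphabetical order even though they are not stored that way.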

Advantages
• Data can be accessed sequentially or directly.
• No transaction files are maintained.
• If an index is lost, the data still exists.
Disadvantages
• Indexes must be stored and kept up to date, which lowers efficiency.
• Indexes can be damaged, in which case the sequencing is lost.
• There is no backup of the master file. Procedures must be established to ensure
the regular creation of backup files.
c) Direct (Random)
The data in this method may be organized in such a way that they are scattered
throughout the disk in what may appear to be a random order. Direct access permits
access to any record without the necessity to read other records in the file. To accomplish
this each record is uniquely identified by a key. The key is used to calculate an address
for the record. This method is known as hashing. Hashing is a method used for
determining the physical location of a record. In this method, the primary key is
processed mathematically and another number is computed that represents the location
where the record will be stored. When a user retrieves the record, its key is entered, and
the hashing routine is used to determine where the record can be found. The problem with
hashing however is that different keys processed can sometimes result in the same
number or the same storage locations, leading to collisions. The second record must
then be stored in an overflow area. This reduces the efficiency of the retrieval process,
because the search for the right record becomes more complex through the use of
overflow areas and thus becomes more time-consuming.
Once accessed, a record can be read or updated. This method requires the use of direct
access devices such as magnetic disk.
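The hashing scheme described above can be sketched as follows. The hash function (key modulo the number of storage locations), the file size of 7 locations and the class name DirectFile are assumptions chosen to keep the example small. Real systems use more elaborate hash functions and overflow handling.

```python
NUM_SLOTS = 7  # assumed number of storage locations in the file


def hash_address(key):
    """Compute a storage location from the primary key."""
    return key % NUM_SLOTS


class DirectFile:
    def __init__(self):
        self.slots = [None] * NUM_SLOTS  # primary storage area
        self.overflow = []               # records displaced by collisions

    def store(self, key, data):
        address = hash_address(key)
        occupant = self.slots[address]
        if occupant is None or occupant[0] == key:
            self.slots[address] = (key, data)
            return
        # Collision: another key already occupies this location.
        for i, (overflow_key, _data) in enumerate(self.overflow):
            if overflow_key == key:
                self.overflow[i] = (key, data)  # update an existing overflow record
                return
        self.overflow.append((key, data))

    def fetch(self, key):
        record = self.slots[hash_address(key)]
        if record is not None and record[0] == key:
            return record[1]              # found directly, with no searching
        for overflow_key, data in self.overflow:
            if overflow_key == key:       # slower search of the overflow area
                return data
        return None


f = DirectFile()
f.store(10, "Mary")
f.store(17, "Alice")  # 10 and 17 both hash to location 3, so 17 goes to overflow
f.store(5, "Jane")
```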
Advantages
• Data can be accessed directly and quickly.
• Files can also be processed sequentially.
• Data is easily kept up-to-date.
Disadvantages
• This is more expensive than sequential.
[Research - The levels of the ANSI/SPARC database architecture]


Entity-Relationship Diagrams


A data model is a pictorial abstraction of the contents of a database. The major function of the
data model is to provide a simplified view of the database contents in a form that is easily
understood by the client, the end-user, the application programmer and the database designer.
The most popular diagramming technique that is used to create the data model is known as the
Entity - Relationship diagram.
An Entity-Relationship Diagram (ERD), also known as the Entity Relationship Model, is a
specialized graphic that shows the interrelationships between entities in a database.
Entity-Relationship diagrams emerged in the 1970s from work by Dr. Peter Chen and others,
who were looking for a means to simplify the representation of large and complex data storage
concepts. The purpose of an ERD is to design a database structure. ERDs can also be used with
clients to discuss business rules.
An entity is an object or event about which someone chooses to collect data. It may be a person,
place or thing etc. Examples are: student, car, employee, song, customer, library book, product,
patient. Entities can be thought of (roughly) as nouns. Entities are drawn as rectangles.
[Research: Weak entity, cardinality, existence-dependent, supertype entity, subtype entity]
An entity has certain attributes. An attribute is a characteristic of an entity or it can be defined
as the data collected about the entity. Examples are: name, address, sex, date of birth, eye color,
title, product code, blood type etc. (The attributes equate to the fields/data items). A record
would form a collection of these data items. The attribute that would uniquely identify a
particular entity would be the primary key field. This field is the data item that uniquely
identifies the record.
Types of relationships
A relationship is an association between entities. A relationship captures how two or
more entities are related to one another. Relationships can be thought of (again, roughly)
as verbs. Examples: an "owns" relationship between a company and a computer, a "supervises"
relationship between an employee and a department, a "performs" relationship between an
artist and a song.
There are three types of relationships that can exist between entities. These are:
One-to-one (1:1), i.e. each entity A has only one entity B, and each B has only one A. E.g. a
product can have only one package.
One-to-many (1:m), i.e. each entity A has many entity Bs, but each B has only one A. E.g.
a teacher of this subject can have many students, but a student has only one teacher in
the subject.
Many-to-many (m:n), i.e. each entity A has many entity Bs, and each B has many As. E.g.
a doctor can have many patients, and a patient can have many doctors.
Note: A weak entity is one that cannot be uniquely identified by its own attributes alone.
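As a rough illustration of the three relationship types, the following Python sketch classifies a list of (A, B) pairs by counting how many partners each side has. The function name and the sample data are hypothetical. Sample data can only suggest a relationship type: the true type comes from the business rules, not from whatever rows happen to exist.

```python
from collections import defaultdict


def relationship_type(pairs):
    """Classify (a, b) pairs as '1:1', '1:m', 'm:1' or 'm:n'."""
    bs_per_a = defaultdict(set)
    as_per_b = defaultdict(set)
    for a, b in pairs:
        bs_per_a[a].add(b)
        as_per_b[b].add(a)
    a_has_many = any(len(bs) > 1 for bs in bs_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in as_per_b.values())
    if a_has_many and b_has_many:
        return "m:n"
    if a_has_many:
        return "1:m"
    if b_has_many:
        return "m:1"
    return "1:1"


# Hypothetical sample data for each case.
packaging = [("Soap", "Box1"), ("Pen", "Box2")]
teaches = [("Campbell", "Ann"), ("Campbell", "Bob")]
treats = [("Dr A", "P1"), ("Dr A", "P2"), ("Dr B", "P1")]
```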
The symbols used in an ERD
Entity - represented by a rectangle
Relationship - represented by a diamond or a line, depending on convention
Type of relationship - represented by a diamond with a number, or by lines, depending on
convention
Attribute - represented by ovals outside the entity or listed inside the entity,
depending on convention

Sample ERDs

Convention 1 - Chen

In this convention, entities are represented by rectangles, relationships are represented by
diamonds and attributes by ovals. The name of the relationship is written in the diamond.
The name of the attribute is written in the oval. The oval is attached to the entity with a
line. The type of relationship is represented by 1, m or n. For example, based on the 3
diagrams above there is a one-to-many relationship between Dept and Emp; there is a
many-to-many relationship between Salesman and City; there is a one-to-one relationship
between Office and Emp. Dept has an attribute called manager.

Convention 2 - Martin

In this convention, entities are represented by labelled rectangles. The label is the name
of the entity. Entity names should be singular nouns. Relationships are represented by a
solid line connecting two entities. The name of the relationship is written above the line.
Relationship names should be verbs. Attributes, when included, are listed inside the
entity rectangle (e.g. DeptID and ProjectID). Attributes which are identifiers are
underlined. Attribute names should be singular nouns. A "one" is represented by a single
line attached to the entity and a "many" is indicated by a crow's foot or three lines. The
above diagram shows a one-to-many relationship (one department to many projects).
Mandatory existence is represented by placing a perpendicular bar on the line next to the
mandatory entity. Optional existence is represented by placing a circle on the line next to
the optional entity. The diagram shows that Departments are mandatory but Projects are
optional.

Example of Creating the ERD
Consider a hospital:
Patients are treated in a single ward by the doctors assigned to them. Usually each patient will be
assigned a single doctor, but in rare cases they will have two.
Healthcare assistants also attend to the patients; a number of these are associated with each ward.
Initially the system will be concerned solely with drug treatment. Each patient is required to take a
variety of drugs a certain number of times per day and for varying lengths of time.
The system must record details concerning patient treatment and staff payment. Some staff are paid
part time and doctors and care assistants work varying amounts of overtime at varying rates (subject to
grade).
The system will also need to track what treatments are required for which patients and when and it
should be capable of calculating the cost of treatment per week for each patient (though it is currently
unclear to what use this information will be put).

How do we start the ERD?


1. Define Entities: these are usually nouns used in descriptions of the system, in the discussion of business
rules, or in documentation; identified in the narrative (see highlighted items above).
2. Define Relationships: these are usually verbs used in descriptions of the system or in discussion of the
business rules (entity ______ entity); identified in the narrative (see highlighted items above).
3. Add attributes to the relations; these are determined by the queries, and may also suggest new entities,
e.g. grade; or they may suggest the need for keys or identifiers.
4. What questions can we ask?
a. Which doctors work in which wards?
b. How much will be spent in a ward in a given week?
c. How much will a patient cost to treat?
d. How much does a doctor cost per week?
e. Which assistants can a patient expect to see?
f. Which drugs are being used?
5. Describe the type of relationship between the entities
a. A many-to-many relationship must be resolved into two one-to-many relationships with an additional entity.
b. This usually happens automatically.
c. Sometimes it involves the introduction of a link entity (whose fields will all be foreign keys). Example:
Patient-Drug
6. This flexibility allows us to consider a variety of questions such as:
a. Which beds are free?
b. Which assistants work for Dr. X?
c. What is the least expensive prescription?
d. How many doctors are there in the hospital?
e. Which patients are family related?
7. Represent that information with symbols
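Step 5 above, resolving the many-to-many Patient-Drug relationship with a link entity, can be sketched as follows. The identifiers, names and the prescriptions structure are hypothetical sample data. The two extra fields (times per day and number of days) are an assumption taken from the hospital narrative, so this link entity is not made up purely of foreign keys.

```python
# Hypothetical sample data for the hospital narrative.
patients = {"P1": "Karen Brown", "P2": "Ann Green"}
drugs = {"D1": "Tegretol", "D2": "Ventolin"}

# Link entity: each row is (patient_id, drug_id, times_per_day, days).
# patient_id and drug_id are foreign keys into the two dictionaries above,
# so each side now has a one-to-many relationship with the link entity.
prescriptions = [
    ("P1", "D1", 3, 10),
    ("P1", "D2", 2, 5),
    ("P2", "D2", 1, 7),
]


def drugs_for(patient_id):
    """Names of all drugs prescribed to one patient."""
    return sorted(drugs[d] for p, d, _, _ in prescriptions if p == patient_id)


def patients_on(drug_id):
    """Names of all patients taking one drug."""
    return sorted(patients[p] for p, d, _, _ in prescriptions if d == drug_id)
```

With the link entity in place, a question such as "Which drugs are being used?" can be answered from either direction.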


Entity and Referential Integrity


As a database designer you will discover that database integrity rules are essential if you are to
create a good database design. Although some Relational DBMS automatically enforce these
rules, you still need to be aware of them.
Entity integrity - this states that all records must have a primary key and that the primary key
value must never be null or undefined. The purpose of this rule is to ensure that each
record within a table has a unique identity.
Referential integrity - this states that a foreign key must either have a null value or have a
matching primary key value in the table to which it is related. The purpose of this rule is to
ensure that there are no illegal entries within the relationship tables. It also prevents us from
deleting records whose primary key value has a corresponding match in a relationship table.
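Both rules can be seen in action with Python's built-in sqlite3 module. This is a minimal sketch: the dept and emp tables and their columns are hypothetical, and SQLite is just one DBMS. Note that SQLite only enforces foreign keys after the PRAGMA shown below is issued, and the primary key is declared NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (dept_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute(
    "CREATE TABLE emp ("
    " emp_id TEXT PRIMARY KEY NOT NULL,"
    " dept_id TEXT REFERENCES dept(dept_id))"  # nullable foreign key
)
conn.execute("INSERT INTO dept VALUES ('D1', 'Sales')")
conn.execute("INSERT INTO emp VALUES ('E1', 'D1')")  # matching primary key: allowed
conn.execute("INSERT INTO emp VALUES ('E2', NULL)")  # null foreign key: allowed


def violates(sql):
    """Return True if the DBMS rejects the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Entity integrity: a null or duplicate primary key is rejected.
null_key = violates("INSERT INTO dept VALUES (NULL, 'Ghost')")
duplicate_key = violates("INSERT INTO dept VALUES ('D1', 'Duplicate')")
# Referential integrity: no orphan foreign keys, no deleting a referenced row.
orphan = violates("INSERT INTO emp VALUES ('E3', 'D9')")
delete_parent = violates("DELETE FROM dept WHERE dept_id = 'D1'")
```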

ERD Exercises
Exercise 1
Man is married to wife
Manager manages employee
Lecturer teaches student
Student studies course
Exercise 2
An artist belongs to a band. The artist can make a CD if he wishes. A CD contains one or more
tracks on it.
Exercise 3
Consider a construction firm: Employees belong to a particular department. Each employee is
employed to perform a particular task. The system should capture the employee's name, TRN,
address and other contact information. The system should also capture information about the
department such as department name, location, and supervisor. The system should capture
information about the task performed by an employee such as description, date assigned,
deadline date, and hourly rate.
Exercise 4
A Sales Rep serves none, one or more customers at a time. A customer can place as many orders
as he would like to. An order lists one or many products. Products that are available are stored in
the company warehouse.
Exercise 5
A company has several departments. Each department has a supervisor and at least one
employee. Employees must be assigned to at least one, but possibly more departments. At least
one employee is assigned to a project, but an employee may be on vacation and not assigned to
any projects. The important data fields are the names of the departments, projects, supervisors
and employees, as well as the supervisor and employee number and a unique project number.
Exercise 6
A Metropolitan Bus Company owns a number of buses. Each bus is allocated to a particular route,
although some routes may have several buses. Each route passes through a number of towns.
One or more drivers are allocated to each stage of a route, which corresponds to a journey
through some or all of the towns on a route. Some of the towns have a garage where buses are
kept and each bus is identified by the registration number and can carry different numbers of
passengers, since the vehicles vary in size and can be single or double-decked. Each route is
identified by a route number and information is available on the average number of passengers
carried per day for each route. Drivers have an employee number, name, address, and
sometimes a telephone number.
[Entities: Bus, Route, Town, Driver, Stage. Relationships: bus-route - is serviced by / route-stage - comprises / driver-stage - is allocated / stage-town - passes through /
route-town - passes through / garage-town - is situated / garage-bus - is garaged]
[Research Martin & Chen]


Functional Dependencies
Definition - Let R(A1, A2, ..., An) be a relational schema (i.e. a relation/table with
attributes A1 etc.), and let X and Y be subsets of {A1, A2, ..., An} - we allow for the case
where X and Y are composite. We say that X functionally determines Y (or Y is
functionally dependent on X), written as X --> Y, if for each value of X there exists
exactly one value of Y.
A functional dependency allows us to use the value of one attribute and predict the value
of another attribute.
Example
SUPPLIERS (name, address, item, price)
Here are 2 FDs
name --> address - Given a particular value of name there exists precisely one
corresponding value of address.
name + item --> price.
NB. If X is the primary key (or a candidate key) then all attributes Y of relation R must be
functionally dependent on X.
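The definition can be checked against actual rows with a short Python sketch. The function name fd_holds and the supplier rows are hypothetical. A table instance can only disprove a functional dependency: a dependency is really a rule about all possible data, so a passing check on sample rows does not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each value of lhs maps to exactly one value of rhs in rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault records y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical rows of SUPPLIERS (name, address, item, price).
suppliers = [
    {"name": "Acme", "address": "Kingston", "item": "nails", "price": 5},
    {"name": "Acme", "address": "Kingston", "item": "bolts", "price": 8},
    {"name": "Best", "address": "Mandeville", "item": "nails", "price": 6},
]
```

Here name --> address and name + item --> price are both satisfied, while item --> price is not, because nails appears with two different prices.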
Computation of Closures
Definition - Let F be the set of functional dependencies for relation R, and let X --> Y be
a given functional dependency. Then F logically implies X --> Y (written F |= X --> Y)
if every relation (instance r of R) that satisfies the dependencies in F also satisfies X -->
Y.
Example
{ A --> B, B --> C } |= A --> C.
Definition - The closure of F, F+, is the set of FDs that are logically implied by F, that is,
F+ = {X --> Y : F |= X --> Y}. In other words, given a set of FDs, its closure is the set of all
FDs that it implies.
Closure can be used to find keys of a relation.
Definition - Consider the relational schema R(A1, A2, ..., An) and the set of FDs F, and
let X be a subset of {A1, A2, ..., An}. Then X is a (candidate) key if:
(a) X --> {A1, A2, ..., An} is in F+.
(b) X is minimal, that is, for no proper subset Y ⊂ X is Y --> {A1, A2, ..., An} in
F+.
Example
Let R(A, B, C) and F = {A --> B, B --> C}. What is the key?
F |= A --> C, since A --> B and B --> C together mean that every value of A gives exactly one
value of C.
It therefore follows that A --> B, A --> C and A --> A, that is, A -->
{A, B, C}. Since A is a single attribute, it has no non-empty proper subsets. Hence A is a key.
Algorithm for finding the closure of a set of attributes
Given a set of attributes U, a set of FDs F, and a set X ⊆ U. To find X+, the closure of X.
Method
1. X(0) is X
2. X(i+1) is X(i) plus (i.e. unioned with) the set of attributes A such that there is some
dependency Y --> Z in F, such that A is in Z and Y ⊆ X(i). (i.e. X(i) --> Z, so X(i+1) = X(i) U Z)
Note: We will eventually reach i such that X(i) = X(i+1). There is no need to compute
beyond X(i) once we discover that X(i) = X(i+1). The process also terminates if X(i) = U. If
X+ = U then X determines every attribute, so X is a key (provided no proper subset of X also has closure U).
Example
Given relation R (city, st, zip) and nontrivial FDs city + st --> zip and zip --> city. To
show that city + st is a key.
Let X = {city, st}.
Using the above algorithm, we have
X(0) = X = {city, st}
we now look for dependencies of the form city --> Q1, st --> Q2, or city, st --> Q3. (city
and st are subsets of X(0)).
If all 3 exist, then
X(1) = X(0) U Q1 U Q2 U Q3.
There is one such dependency, namely, city, st --> zip
Hence, X(1) = X(0) U zip = {city, st} U zip = {city, st, zip}, But {city, st, zip} = U
Hence X+ = U, and X = {city, st} is a key.
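The closure algorithm above translates almost directly into Python. This is a minimal sketch: the function name closure and the representation of each FD as a (left side, right side) pair of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Compute X+ for the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)                  # X(0) = X
    changed = True
    while changed:                       # stop once X(i+1) = X(i)
        changed = False
        for lhs, rhs in fds:
            # If Y is a subset of X(i), add every attribute of Z.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


U = {"city", "st", "zip"}
F = [({"city", "st"}, {"zip"}), ({"zip"}, {"city"})]

x_plus = closure({"city", "st"}, F)      # the worked example above
```

Since x_plus equals U, the sketch agrees with the worked example that city + st determines every attribute of R.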
Closure Exercises
Exercise 1: By computing its closure, show that (i) st + zip is a key, (ii) city + zip is not a
key.
Exercise 2: Given Supplier (name, address, item, price) and F = {name --> address, name
+ item --> price}. Show that name + item is a key. Determine whether address + price is a key.
Exercise 3: Given R(name, job, dept) with job + dept --> name and name --> dept. Determine
whether job + dept, dept + name or job + name is a key.


Armstrong's Axioms
These are rules used to determine/generate dependencies from other dependencies.
Note: Armstrong's axioms are sound and complete. They are sound because they do not
generate any incorrect dependencies. They are complete because all FDs implied by F can be
derived from F using the axioms.
Given a relational scheme R, a set of attributes U and a set of functional dependencies F, the
axioms are as follows:
Reflexivity
If Y ⊆ X ⊆ U, then X --> Y is logically implied by F.
e.g. if X = name + item and Y = item (i.e. Y ⊆ X) then name + item --> item.
Augmentation
If X --> Y holds, and Z ⊆ U, then XZ --> YZ.
e.g. if item --> price then item + name --> price + name
Transitivity
If X --> Y and Y --> Z, then X --> Z.
Examples
Given the relation R (city, st, zip) and nontrivial FDs
city + st --> zip
zip --> city
to show that both city + st and st + zip are keys for R.
(a) zip --> city (given)
(b) zip st --> city st (augmentation using (a) )
(c) city st --> zip (given)
(d) city st --> city st zip (augmentation using (c) )
(e) st zip --> city st zip (transitivity using (b) and (d) ).
Hence from (d) and (e) both city st and st zip are keys for R. [Both determine all fields]
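Steps (a) to (e) can be replayed mechanically. In this sketch each FD is a pair of frozensets, and the helper names fd, reflexivity, augmentation and transitivity are hypothetical. Each function applies one axiom and refuses to run if its precondition is not met.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def reflexivity(x, y):
    """If Y is a subset of X, then X --> Y."""
    if not y <= x:
        raise ValueError("reflexivity needs Y to be a subset of X")
    return (x, y)


def augmentation(dep, z):
    """If X --> Y, then XZ --> YZ."""
    lhs, rhs = dep
    return (lhs | z, rhs | z)


def transitivity(dep1, dep2):
    """If X --> Y and Y --> Z, then X --> Z."""
    (x, y1), (y2, z) = dep1, dep2
    if y1 != y2:
        raise ValueError("transitivity needs the middle attribute sets to match")
    return (x, z)


a = fd("zip", "city")                            # (a) given
b = augmentation(a, frozenset({"st"}))           # (b) zip st --> city st
c = fd("city st", "zip")                         # (c) given
d = augmentation(c, frozenset({"city", "st"}))   # (d) city st --> city st zip
e = transitivity(b, d)                           # (e) st zip --> city st zip
```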
EXERCISE
Given R(TRN, Name, Age, Addr, Year) and FDs
TRN, addr -> Age
Age -> Year
Use Armstrong's Axioms to come up with other FDs.


Covers and their role in determining redundant FDs


If F and G are sets of dependencies, then F is equivalent to G if F+ = G+. In that case we say that
F covers G (and G covers F).
To test whether F and G are equivalent, we must show that every dependency in F is in G+ and
that every dependency in G is in F+.
In designing databases we ensure that the set of functional dependencies for a given schema is
minimal, that is, that there are no redundant dependencies. We say that a set of dependencies F
is a minimal cover, Fm, if:
1. Every right hand side of a dependency in F is a single attribute. If any r.h.s. has more than
1 attribute, then split it.
2. For no X --> A in F is the set F - {X --> A} equivalent to F. That is, no dependency in
F is redundant.
3. For no X --> A in F and proper subset Z of X is (F - {X --> A}) U {Z --> A} equivalent to
F. That is, no attribute on the l.h.s. of any FD in F is redundant.
Note: Every set of dependencies F is equivalent to a set Fm that is minimal.
Example
Consider the set F = {A-->B, B-->A, B-->C, A-->C, C-->A}. A minimal cover, found by
eliminating the dependencies B-->A and A-->C, is given by Fm = {A-->B, B-->C, C-->A}.
Algorithm to find redundant FDs.
1. Choose an FD, say X --> Y, and remove it from the set of FDs.
2. result = X;
   while (result changes and Y is not contained in result) do
       for each FD, A --> B, remaining in the reduced set of FDs
           if A is a subset of result then
               result = result U B
   end
3. if Y is a subset of result then
       FD X --> Y is redundant.
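The algorithm above can be written as a short Python function. This is a sketch under assumptions: the names is_redundant and S are hypothetical, each FD is a pair of frozensets, and attributes are single letters so that S("A") builds a one-attribute set. The data is the set F from the minimal cover example earlier in this section.

```python
def is_redundant(fd, fds):
    """True if fd = (X, Y) is implied by the other dependencies in fds."""
    x, y = fd
    remaining = [d for d in fds if d != fd]    # step 1: remove X --> Y
    result = set(x)                            # step 2: result = X
    changed = True
    while changed and not y <= result:
        changed = False
        for a, b in remaining:
            if a <= result and not b <= result:
                result |= b
                changed = True
    return y <= result                         # step 3


def S(text):
    """Build an attribute set from single-letter attribute names."""
    return frozenset(text)


F = [(S("A"), S("B")), (S("B"), S("A")), (S("B"), S("C")),
     (S("A"), S("C")), (S("C"), S("A"))]

# Dropping B --> A and A --> C, as in the minimal cover example, leaves Fm.
Fm = [(S("A"), S("B")), (S("B"), S("C")), (S("C"), S("A"))]
```

Note that redundancy must be retested after each removal, because removing one FD can make another one necessary.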
Exercises - Find the redundant FDs in the following sets:
a) Colour --> Density,
Density --> Elasticity,
Colour --> Elasticity
b) Lime --> Melon,
Lime --> Naseberry,
Lime Melon --> Naseberry,
Naseberry Melon --> Orange
c) Name --> Addr,
name, item --> price,
name --> price
d) Name --> id,
Id --> name,
Id --> dept,
Name, id --> dept
e) Flight# --> destination,
destination, arrival time --> flight#,
Origin, destination --> flight#,
origin, flight# --> origin

1st, 2nd, 3rd Normal Forms


Definition - An attribute of relation R is a prime (sometimes called key) attribute if it participates
in a key.
Example : If A + B + C is a key for a relation R, then attributes A, B, and C are prime attributes.
Definition - Normalization is a process of obtaining stable groupings of attributes into relations.
It is a process of decomposing a table into smaller, simpler tables. In addition to being simpler
and more stable, normalized tables are more easily maintained. Normalization is the process of
eliminating data redundancies and data anomalies from table structures by applying various rules
called normal forms.
Normalization organizes a database into one of several forms to remove ambiguous relationships
between data and minimize data redundancy. In zero normal form (0NF), the database is
completely non-normalized/unnormalized, and all of the data fields are included in one relation
or table. The table has large rows due to the repeating groups and wastes disk space. There is
also at least one value that is not atomic (that is, it can be decomposed further).
Note: When you break down a table into simpler tables always ensure that there is a common
field that you will be able to use to join the tables back together for queries.
The normalization process starts with unnormalized relations - where at least one value is not
atomic (that is, it can be decomposed further).
Example
S#    PQ
      P#    QTY
S1    P1    300
      P2    200
      P3    400
      P4    200
S2    P1    300
      P2    400
S3    P2    200

The field PQ can be broken down into P# (part number) and QTY (quantity).
Other examples include:
NAME can be broken down into FIRSTNAME, MIDDLENAME, LASTNAME.
ADDRESS can be broken down into STREET#, STREET, CITY, ZIPCODE etc.
Definition - A relation (table) is in first normal form (1NF) if:
every attribute (field) is a simple (atomic) attribute, that is, one that cannot be broken
down further, and
it contains no repeating groups.
Note: Every normalized table is in 1NF. 1NF violations cause data redundancy, which
may lead to data inconsistencies, poor data integrity, wastage of space, data
anomalies etc. To convert to 1NF you should break down fields to their simplest form
and remove any repeating groups into another table.
Example: We can convert the above to 1NF as follows:
S#    P#    QTY
S1    P1    300
S1    P2    200
S1    P3    400
S1    P4    200
S2    P1    300
S2    P2    400
S3    P2    200
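The conversion shown in the two tables above can be sketched in Python. The nested list is one assumed way of representing the unnormalized table, and to_1nf is a hypothetical helper name.

```python
# Unnormalized form: each supplier row carries a repeating group PQ of (P#, QTY).
unnormalized = [
    ("S1", [("P1", 300), ("P2", 200), ("P3", 400), ("P4", 200)]),
    ("S2", [("P1", 300), ("P2", 400)]),
    ("S3", [("P2", 200)]),
]


def to_1nf(rows):
    """Emit one (S#, P#, QTY) row per element of the repeating group."""
    return [(s, p, qty) for s, group in rows for p, qty in group]


first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```

Every value in first_nf is atomic, and S# is repeated on each row so that no row contains a group.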

Definition - A relation is in second normal form (2NF) if:
a) it is in 1NF and
b) it has no partial dependencies of nonprime (nonkey) attributes on keys. That is,
every nonprime attribute is fully dependent on the primary key. [Primary key -->
all attributes.]
[NB. An attribute of relation R is a prime (sometimes called key) attribute
if it participates in a key.
Example: If A + B + C is a key for a table R, then attributes A, B, and C
are prime attributes.]
Example
Cars (model, cylinder#, origin, tax, fee). Key is model + cylinder#, and FD is
model --> origin.
Table Cars is not in 2NF because origin is nonprime and depends on model alone, so it is
not fully dependent on the key (model + cylinder#).
Model and cylinder# are prime; origin, tax and fee are nonprime.
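A partial dependency can be detected mechanically: look for an FD whose left side is a proper subset of the key and whose right side contains a nonprime attribute. The sketch below assumes a single candidate key, so the prime attributes are exactly the key attributes. The function name is hypothetical, and the second FD simply restates that the key determines tax and fee.

```python
def partial_dependencies(key, fds):
    """Return (lhs, nonprime attributes) for every FD that violates 2NF."""
    key = set(key)
    found = []
    for lhs, rhs in fds:
        nonprime = set(rhs) - key
        # set(lhs) < key is True only for a proper subset of the key.
        if set(lhs) < key and nonprime:
            found.append((set(lhs), nonprime))
    return found


cars_key = {"model", "cylinder#"}
cars_fds = [
    ({"model"}, {"origin"}),                     # FD given in the example
    ({"model", "cylinder#"}, {"tax", "fee"}),    # implied by the key
]

violations = partial_dependencies(cars_key, cars_fds)
```

For the Cars table the only violation reported is model --> origin, which matches the explanation above.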
Definition - A relation is in 3rd normal form (3NF) if:
a) it is in 2NF and
b) it has no transitive dependencies of nonprime attributes on keys (i.e. A --> B,
B --> C means A --> C).
Example
Employee (emp#, dept, location). Suppose emp# is the key. Employee is not in 3NF
because there is a transitive dependency of location on the key, emp#:
emp# --> dept, dept --> location, and therefore emp# --> location.
In order to convert to 3NF we need to remove or break up the transitive
dependencies.
Example: if X --> Y and Y --> Z then X and Y will remain on one table with X
being the key and Y and Z would be on the other table with Y being the key. All
fields dependent on X would be on one table and all fields dependent on Y would
be on another table. Y will be the common field that will be used to join the tables
for the running of queries.
From the Employee table above, we would therefore place emp# and dept on one
table (key emp#) and dept and location on another table (key dept).
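The Employee decomposition can be tried out on a few rows. The employee data below is hypothetical. The sketch splits the table into the two 3NF tables and then joins them back on the common field dept to show that nothing is lost.

```python
# Hypothetical rows of Employee (emp#, dept, location).
employee = [
    ("E1", "Sales", "Kingston"),
    ("E2", "Sales", "Kingston"),
    ("E3", "Accounts", "Montego Bay"),
]

# Table 1: emp# and dept (key emp#).
emp_dept = sorted({(emp, dept) for emp, dept, _ in employee})
# Table 2: dept and location (key dept). Each location is now stored once per department.
dept_loc = sorted({(dept, loc) for _, dept, loc in employee})

# Joining on the common field dept recovers the original table.
location_of = dict(dept_loc)
rejoined = [(emp, dept, location_of[dept]) for emp, dept in emp_dept]
```

If the Sales department moves, only one row of dept_loc has to change, instead of one row per employee.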
[Research 4NF, BCNF]

Comprehensive example (1NF to 3NF)


1NF
S#   Status  City    P#   Qty
S1   20      LONDON  P1   300
S1   20      LONDON  P2   200
S1   20      LONDON  P3   400
S1   20      LONDON  P4   200
S1   20      LONDON  P5   100
S1   20      LONDON  P6   100
S2   10      PARIS   P1   300
S2   10      PARIS   P2   400
S3   10      PARIS   P2   200
S4   20      LONDON  P2   200
S4   20      LONDON  P4   300
S4   20      LONDON  P5   400
S# + P# is a key, S# --> status, S# --> city, city --> status
Problems
• We cannot insert the fact that a supplier is in a given city until the supplier supplies at
least one part.
• Deletion of the only row for a given supplier also destroys the information about that
supplier's status and city.
• Redundancy can result in long searches and inconsistency (a change made to one row must
also be made in every other row for that supplier). Example: Suppose supplier S1 changes
status to 30; then all 6 rows would have to be modified.
2NF
To change to 2NF we ensure that everything is fully dependent on the key. Fields that are
not fully dependent on the key should be moved to a separate table. Only fields fully
dependent on the key should remain in the original table. The fields status and city should
therefore be placed in their own table and the key for that table is the field that they are
functionally dependent on (which is S#).

S#   Status  City
S1   20      LONDON
S2   10      PARIS
S3   10      PARIS
S4   20      LONDON
S5   30      ATHENS

S# P# Qty
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400
Note - We can now enter the fact that supplier S5 is located in Athens.
Problems
We cannot enter the fact that a given city has a given status until a supplier is located
in that city.
If we delete the only row for a city we destroy the fact that a city has a given status
value.
Status value occurs many times. Hence search and consistency problems.
3NF
Remove transitive dependencies S# --> city and city --> status
S#   P#   Qty
S1   P1   300
S1   P2   200
S1   P3   400
S1   P4   200
S1   P5   100
S1   P6   100
S2   P1   300
S2   P2   400
S3   P2   200
S4   P2   200
S4   P4   300
S4   P5   400

S#   City
S1   LONDON
S2   PARIS
S3   PARIS
S4   LONDON
S5   ATHENS

City    Status
ATHENS  30
LONDON  20
PARIS   10

Another example of the process.
Repeating groups are listed in parentheses (part a). The table has large rows due to the repeating
groups and wastes disk space when an order has only one item.
How do you identify repeating groups? Consider the Order table. For every order there will only
be one order number and date. Other items will be repeated. For example, for a particular order
we may have more than one product number, product name, quantity ordered etc. This is because
you are able to order more than one thing. Thus the fields Product # to Vendor Name are shown
twice to accommodate two products.
To normalize the data from 0NF to 1NF (first normal form), you remove the repeating groups
(fields 3 through 7 and 8 through 12) and place them in a second table (part b). You then assign a
primary key to the second table (Line Item), by combining the primary key of the nonrepeating
group (Order #) with the primary key of the repeating group (Product #). Primary keys are
underlined to distinguish them from other fields.
To further normalize the database from 1NF to 2NF (second normal form), you remove partial
dependencies. A partial dependency exists when fields in the table depend on only part of the
primary key. In the Line Item Table (part b), Product Name is dependent on Product #, which is
only part of the primary key. Second normal form requires you to place the product information
in a separate Product table to remove the partial dependency (part c).
To move from 2NF to 3NF (third normal form), you remove transitive dependencies. A
transitive dependency exists when a nonprimary key field depends on another nonprimary key field.
As shown in part c, Vendor Name is dependent on Vendor #, both of which are nonprimary key
fields. If Vendor Name is left in the Line Item table, the database will store redundant data each
time a product is ordered from the same vendor.
Third normal form requires Vendor Name to be placed in a separate Vendor table, with Vendor #
as the primary key. The field that is the primary key in the new table - in this case, Vendor # - also
remains in the original table as a foreign key and is identified by a dotted underline (part d).
In 3NF, the database is now well organized into four separate tables and is easier to maintain.
For instance, to add, delete, or change a Vendor or Product Name, you make the change in just
one table.
Order Table
Order #  Order Date  Product #  Product Name      Qty Ordered  Vendor #  Vendor Name  Product #  Product Name    Qty Ordered  Vendor #  Vendor Name
1001     6/8/2004    605        White Copy Paper  2            321       Hammermill   203        CD Jewel Cases  5            110       Fellowes
1002     6/10/2004   751        Ballpoint pens    6            166       Pilot
1003     6/10/2004   321        Ring Binder       12           450       Globe
1004     6/11/2004   605        White Copy Paper  2            321       Hammermill   102        File Folders    2            450       Globe


Order
Order #  Order Date
1001     6/8/2004
1002     6/10/2004
1003     6/10/2004
1004     6/11/2004

Line Item
Order #  Product #  Qty Ordered  Vendor #
1001     605        2            321
1001     203        5            110
1002     751        6            166
1003     321        12           450
1004     605        2            321
1004     102        2            450

Product
Product #  Product Name
102        File Folders
203        CD Jewel Cases
321        Ring Binder
605        White Copy Paper
751        Ballpoint pens

Vendor
Vendor #  Vendor Name
110       Fellowes
166       Pilot
321       Hammermill
450       Globe

a) Zero Normal Form (0NF)


(Order #, Order Date, (Product #, Product Name, Quantity Ordered, Vendor #, Vendor Name))
b) First Normal Form (1NF)
Order (Order #, Order Date)
Line Item (Order # + Product #, Product Name, Quantity Ordered, Vendor #, Vendor Name)
c) Second Normal Form (2NF)
Order (Order #, Order Date)
Line Item (Order # + Product #, Quantity Ordered, Vendor #, Vendor Name)
Product (Product #, Product Name)
d) Third Normal Form (3NF)
Order (Order #, Order Date)
Line Item (Order# +Product #, Quantity Ordered, Vendor #)
Product (Product #, Product Name)
Vendor (Vendor #, Vendor Name)
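The 3NF design in part d can be tried out with Python's built-in sqlite3 module. This is a sketch only: the table and column names are assumptions (the table is called orders because ORDER is a reserved word in SQL), and only order 1001 is loaded. The join at the end shows how the common fields bring the four tables back together for a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders    (order_no INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE product   (product_no INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE vendor    (vendor_no INTEGER PRIMARY KEY, vendor_name TEXT);
CREATE TABLE line_item (order_no INTEGER REFERENCES orders(order_no),
                        product_no INTEGER REFERENCES product(product_no),
                        qty_ordered INTEGER,
                        vendor_no INTEGER REFERENCES vendor(vendor_no),
                        PRIMARY KEY (order_no, product_no));
INSERT INTO orders VALUES (1001, '6/8/2004');
INSERT INTO product VALUES (605, 'White Copy Paper'), (203, 'CD Jewel Cases');
INSERT INTO vendor VALUES (321, 'Hammermill'), (110, 'Fellowes');
INSERT INTO line_item VALUES (1001, 605, 2, 321), (1001, 203, 5, 110);
""")

# Rebuild the original order rows by joining on the common fields.
rows = conn.execute("""
    SELECT o.order_no, o.order_date, p.product_name, l.qty_ordered, v.vendor_name
    FROM line_item l
    JOIN orders  o ON o.order_no   = l.order_no
    JOIN product p ON p.product_no = l.product_no
    JOIN vendor  v ON v.vendor_no  = l.vendor_no
    ORDER BY p.product_no
""").fetchall()
```

Each product name and vendor name is stored once, yet the query still returns one full row per line item.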

Normalization Exercises to 3NF.
Exercise 1 - PatientDrug Table Structure
PatientI
D
9876765

Drug

7654433

Patient
Name
Brown,
Karen
Green, Ann

9876567

Dunn, Mary

Clidets

8768888

Allen, Oscar

Ventolin

9877771

Jones, Bob

6512334

Harris, Kay

Panadein
e
Tavegyl

Tricepta
n
Tavegyl

Trade
Name
Tegretol

Formulat
ion
Tablets

Size

Dose

Frequency

Side Effect

Drug

100mg

30mg

Once a day

Hatce
ptan

Antihista
mine
Cyomisti
n
Inhalado
r
Panadol
ET
Antihista
mine

Liquid

200ml

10ml

Ointmen
t
Gas

100ml

2ml

20oz

1oz

Twice a
day
Every two
hours
Once a day

Stomach
Cramps
Headache

Tablets

100mg

5mg

Liquid

200ml

10ml

Twice a
day
Twice a
day

Trade
Name
Smithcline

Formulatio
n
Capsules

Size

Dose

Frequency

200mg

30mg

Once a day

PanadolET

Tablets

100mg

5mg

Twice a day

Side
Effect

Kidney
damage
Panad
eine

Indigestio
n

Indigestion
Headache

The key is PatientID & Drug
The FDs are:
PatientID --> PatientName
PatientID, Drug --> Frequency
PatientID, Drug --> Dose
Drug --> TradeName
Drug --> Formulation
Drug --> Size
Drug --> SideEffect
Size --> SideEffect

Exercise 2 Sales Table
Salesman#  Salesman Name     Sales Area  Customer#  Customer Name   Warehouse#  Warehouse Location  Sales Amount
3462       Walters, Kevin    West        18765      Delta Services  4           Fargo               13,540
3462       Walters, Kevin    West        18830      Levy & Sons                 Bismarck            10,600
4578       Allen, Ian        East        32112      Johnsons        5           Goshen              14,800
1111       Matthews, Joan    West        98787      Facey           4           Fargo               45,890
1111       Matthews, Joan    West        98799      Websters Inc                Portsmouth          34,877
6765       Brown, Johnathan  North       87889      Taino Limited               Ferry               40,000


The key is Salesman# and Customer#
The FDs are:
Salesman# --> Salesman Name
Salesman# --> Sales Area
Customer# --> Customer Name
Customer# --> Warehouse#
Customer# --> Warehouse Location
Salesman#, Customer# --> Sales Amount
Warehouse# --> Warehouse Location

Assessment of file layouts as they affect the functioning of a database.
It is important to evaluate the performance characteristics of the physical model
before implementing the database. Once the database is installed it is difficult or
impossible to redesign it. The performance parameters normally used are the space
estimates and time estimates. Both of these parameters are predictable. The database
designer should therefore try to optimize the physical model for space and time
considerations.
Note the trade-offs between space and time: I/O can be reduced (saving time) if some redundant
data is carried, whereas eliminating redundant data saves space but can cost more time.
Physical and logical data organization.
Logical
1. Simplicity is important.
2. Data independence is of prime importance. (This gives the DBA the freedom to change both
the physical and logical aspects of the database system without disturbing the applications
built on the database.)
3. Application program requests correspond to the logical data structure. The program does
not care about the physical layout of data.
4. Efficient use of storage is of little concern. E.g. 1 file vs 2 files etc.
5. A high level of redundancy often exists between logical files.
6. Logical organization must be stable so that programs do not have to be rewritten.
7. The means of finding/addressing data does not have a major effect on logical structures.
8. E.g. Name, id#, address, subject, grade

Physical
1. Complex organizations may be important. Software hides the complexity.
2. Data independence is of little concern if facilities are provided for restructuring the
physical data.
3. Application program requests are usually unrelated to data storage.
4. Efficient use of storage is of major concern.
5. Elimination of redundancy is an objective of physical organization.
6. Physical layout may be changeable, designed for periodic reorganization.
7. Addressing techniques have a major effect on physical storage layout. Methods of locating
data depend on how data is physically laid out.
8. E.g. Name, Id#, address in one file; Id#, subject, grade in another file

UNIT III: INTRODUCTION TO RELATIONAL ALGEBRA AND SQL


The languages used in database systems
A 4GL (4th generation language) is a product that aids the development of new
systems. 4GLs are so called because they work at a higher level than ordinary
high-level languages such as COBOL or Pascal. Most 4GLs make use of relational
databases, which themselves have query languages that perform operations at a
very high level. Some 4GLs are actually a combination of a database query
language and other facilities.
Features of a 4GL
Define data
Define what processing must be performed on the data
Define report or screen format
Define input data and validation checks
Handle user queries
The role of Relational DMLs and DDLs.
Some databases have their own computer languages associated with them, which
allow the user to access and retrieve data. Other databases are only accessed via third
generation languages.
Data descriptions must be standardized. For this reason a Data Definition Language
(DDL), also called a Data Description Language, is provided, which must be used to
specify the data in the database. Similarly, a Data Manipulation Language (DML) is
provided, which must be used to access the data. The combination of the DDL and DML
is often called a Data Sub-Language (DSL) or a query language.
Data Definition Language - The DDL is that portion of the DBMS which allows
us to create and modify the structure of the database and the database tables. The
functions of a DDL may therefore include:
- Creating database structures
- Creating table structures
- Associating fields with table structures
- Associating data types with field structures, etc.
Data Manipulation Language - The DML is that portion of the DBMS which
allows us to store, modify, and retrieve data from the database. There are two types
of DML: procedural and nonprocedural.

Procedural DMLs require that the user specify both the data that is needed
from the database and how to obtain it.

Procedural DMLs are more difficult to use since they require that the user be
proficient in using the language commands to manipulate the structure and the
contents of the data file. On the other hand they are more flexible since they
allow the user to determine the method that is used for accessing and
manipulating the structure and contents of a file.

Nonprocedural DMLs require that the user specify the data that is
needed from the database, but do not allow the user to say how to
obtain it.

Nonprocedural DMLs are easier to use since they do not require a detailed
knowledge of the language commands, which are needed to manipulate the
structure and the contents of a data file. On the other hand they lack flexibility
since the programmer has no way of determining the method for accessing and
manipulating the contents of the data file. Please note that it is the
nonprocedural DML of a 4th Generation Language that allows it to exhibit
structural and data independence.
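The contrast between the two kinds of DML can be sketched as follows. This is only an illustration: the employee rows, the table name emp and the use of Python with SQLite are assumptions made for the example, not part of the course material. The loop plays the role of a procedural DML (the program says how to find the rows), while the SQL statement is nonprocedural (it says only what is wanted).

```python
import sqlite3

# Hypothetical employee data, used only for illustration.
rows = [("Adams", 10, 1200.0), ("Henry", 20, 900.0), ("Gregg", 10, 2100.0)]

# Procedural style: the program states HOW to find the data,
# visiting every record and testing it in turn.
procedural = []
for name, dept, salary in rows:
    if dept == 10 and salary > 1000:
        procedural.append(name)

# Nonprocedural style: the statement states only WHAT is wanted;
# the DBMS chooses the access method.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept INTEGER, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
nonprocedural = [r[0] for r in conn.execute(
    "SELECT name FROM emp WHERE dept = 10 AND salary > 1000 ORDER BY name")]

print(procedural)
print(nonprocedural)
```

Both approaches return the same names; only the first one fixes the access method in the program itself.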
Query Language
The implementation of a query language is vital for a DBMS. The query
language allows the end user to generate ad hoc queries, which are
answered immediately. In most languages the DML and the query language
are one and the same.
Today, many DBMSs also provide support for a standardized query language
that may be different from the DBMS's own DML. This is known as the
Structured Query Language (SQL).
The difference between relational algebra and relational calculus.
Query languages can roughly be divided into two types:
Relational algebra - allows the user to explicitly describe how to find the
answer to the query. It uses specific operators that are applied to tables. The
operators include join, projection, selection, union and set difference.
Relational calculus - queries describe a desired set of tuples by specifying a
predicate the tuples must satisfy. The user describes the answer but does
not give the algorithm for finding it. The calculus is a notation for formulating
the definition of that desired relation.

Relational algebra
Relational Algebra is:
the formal description of how a relational database operates
an interface to the data stored in the database itself
the mathematics which underpin SQL operations
This section uses the sample tables below along with others to demonstrate how to
solve relational algebra problems.

Table A
a b c
d a f
c b d

Table B
b e d
d a f

Table R
a x
q y
a z

Table S
x
z

Simple projection
π x,y (A) = Produces output showing only certain attributes (x, y) of table A.
Selection
σ x = 7 (A) = Produces a subset of rows that satisfy a criterion (field x = 7).
Please note that projection and selection can be combined.
π x,y ( σ x = 7 (A) )
OR
σ x = 7 ( π x,y (A) )
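As a rough sketch, projection and selection can be written as two small Python functions. The table contents below are hypothetical (a table A with attributes x, y and z), and the function names project and select are chosen for this example only.

```python
# Hypothetical table A with attributes x, y, z (each row is a dict).
A = [
    {"x": 7, "y": "p", "z": 1},
    {"x": 3, "y": "q", "z": 2},
    {"x": 7, "y": "r", "z": 3},
]

def project(rows, attrs):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def select(rows, predicate):
    """Selection: keep only the rows for which the predicate is true."""
    return [row for row in rows if predicate(row)]

# Projection of a selection, and selection of a projection.
first = project(select(A, lambda r: r["x"] == 7), ["x", "y"])
second = select(project(A, ["x", "y"]), lambda r: r["x"] == 7)

print(first)
print(second)
```

The two orders give the same result here because the selection condition uses x, which the projection keeps. If the projection dropped x, the selection could no longer be applied afterwards.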
Difference (or Set Difference)
A - B = rows in A but not in B
abc
cbd
Renaming
A rename is a unary operation written as ρ a/b (R), where the result is identical to
R except that the b field in all tuples is renamed to an a field. This is simply used
to rename an attribute of a relation or the relation itself.
Union
For relations with the same arity (number of attributes).
A ∪ B = all rows appearing in A or in B (or in both), without repeating duplicates.
abc
daf
cbd
bed
Intersection
A ∩ B = Builds a relation consisting of all tuples appearing in both relations.
daf
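The three set operations above map directly onto Python's built-in set type. The sketch below uses the rows of sample tables A and B as they appear in the results listed above; representing each row as a tuple is an assumption made for the example.

```python
# Rows of the sample tables, one tuple per row.
A = {("a", "b", "c"), ("d", "a", "f"), ("c", "b", "d")}
B = {("b", "e", "d"), ("d", "a", "f")}

difference = A - B      # rows in A but not in B
union = A | B           # rows in A or B, duplicates kept once
intersection = A & B    # rows in both A and B

print(sorted(difference))
print(sorted(union))
print(sorted(intersection))
```

Because a set cannot hold the same tuple twice, the row daf appears only once in the union even though it occurs in both tables.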
Division
Takes 2 relations, one binary and one unary, and builds a relation consisting of all
values of one attribute of the binary relation that match (in the other attribute)
all values in the unary relation.
R ÷ S: in R, a is paired with x and with z, which covers every value in S, so the
answer is a (taken from the other field). q is paired only with y, so it is excluded.
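Division is the least intuitive operator, so a small sketch may help. It uses the sample relations R and S from this section; the function name divide is an assumption made for the example.

```python
# Sample relations: R is binary, S is unary.
R = {("a", "x"), ("q", "y"), ("a", "z")}
S = {"x", "z"}

def divide(binary, unary):
    """Return the left-hand values that are paired with EVERY value in unary."""
    candidates = {left for (left, _) in binary}
    return {c for c in candidates
            if all((c, value) in binary for value in unary)}

result = divide(R, S)
print(result)
```

The value a qualifies because both (a, x) and (a, z) are in R; q does not because (q, x) and (q, z) are missing.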

Another Example of Division

Join (natural, equi, inner, outer)
A ⋈ B = Builds a relation consisting of all possible concatenated pairs of
tuples, one from each of the 2 relations, such that each pair satisfies the join
condition.

Natural join - match on the common field and do not repeat it in the result.
The opposite of the natural join is the equi-join, which keeps the common field
from both relations.
θ-join (theta-join) - join using a condition, which may be any comparison (=, <, >, etc.).
Outer Join - include rows in table A with no match.
There are three forms of the outer join, depending on which data is to be
kept.
o LEFT OUTER JOIN - keep data from the left-hand table
o RIGHT OUTER JOIN - keep data from the right-hand table
o FULL OUTER JOIN - keep data from both tables
The opposite of the outer join is the regular (inner) join, which keeps only the
rows that have a match.
The semi-join, written as R ⋉ S where R and S are relations, is similar to the
natural join, but the result of the semi-join is only the set of all tuples in R for
which there is a tuple in S that is equal on their common attribute names.
The antijoin, written as R ▷ S where R and S are relations, is similar to the
natural join, but the result of an antijoin is only those tuples in R for
which there is NOT a tuple in S that is equal on their common attribute
names.
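The join variants described above can be compared side by side with a short sketch. The two relations, their attribute names and the helper function names are all hypothetical and chosen only for this illustration; rows are Python dicts and a missing value is shown as None.

```python
# Hypothetical relations that share the attribute "dept".
R = [{"name": "Harry", "dept": "Finance"},
     {"name": "Sally", "dept": "Sales"},
     {"name": "Tim", "dept": "Audit"}]
S = [{"dept": "Finance", "manager": "George"},
     {"dept": "Sales", "manager": "Harriet"},
     {"dept": "Production", "manager": "Charles"}]

def common(r_rows, s_rows):
    """Attribute names that appear in both relations."""
    return [a for a in r_rows[0] if a in s_rows[0]]

def matches(r, s, attrs):
    """True when the two rows agree on every common attribute."""
    return all(r[a] == s[a] for a in attrs)

def natural_join(r_rows, s_rows):
    attrs = common(r_rows, s_rows)
    # Merging the dicts means the common field appears only once.
    return [{**r, **s} for r in r_rows for s in s_rows if matches(r, s, attrs)]

def semijoin(r_rows, s_rows):
    attrs = common(r_rows, s_rows)
    return [r for r in r_rows if any(matches(r, s, attrs) for s in s_rows)]

def antijoin(r_rows, s_rows):
    attrs = common(r_rows, s_rows)
    return [r for r in r_rows if not any(matches(r, s, attrs) for s in s_rows)]

def left_outer_join(r_rows, s_rows):
    attrs = common(r_rows, s_rows)
    extra = [a for a in s_rows[0] if a not in attrs]
    # Unmatched rows of R are kept and padded with None.
    padding = [{**r, **{a: None for a in extra}} for r in antijoin(r_rows, s_rows)]
    return natural_join(r_rows, s_rows) + padding

for row in left_outer_join(R, S):
    print(row)
```

A right outer join is the same idea with the roles of R and S swapped, and a full outer join keeps the unmatched rows of both relations.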

Example of Natural Join

Example of θ-join

Consider tables Car and Boat, which list models of cars and boats and their respective
prices. Suppose a customer wants to buy a car and a boat, but she doesn't want to
spend more money for the boat than for the car. The θ-join on the condition CarPrice ≥
BoatPrice produces a table with all the possible options.
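The car and boat scenario can be tried out with a few lines of Python. The model names and prices below are invented for the illustration; only the join condition (car price at least the boat price) comes from the text.

```python
# Hypothetical Car and Boat tables: (model, price).
cars = [("CarA", 20000), ("CarB", 30000), ("CarC", 50000)]
boats = [("Boat1", 10000), ("Boat2", 40000), ("Boat3", 60000)]

# Theta-join: pair every car with every boat, keeping only the pairs
# that satisfy the condition CarPrice >= BoatPrice.
options = [(car, car_price, boat, boat_price)
           for (car, car_price) in cars
           for (boat, boat_price) in boats
           if car_price >= boat_price]

for option in options:
    print(option)
```

With these figures four of the nine possible pairs survive, and no boat costing more than the most expensive car appears at all.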

Example of a semijoin

Example of Left Outer Join

Example of Right Outer Join

Example of Full Outer Join

Example of an antijoin

Cartesian product.
The Cartesian Product is also an operator which works on two sets. It is sometimes
called the CROSS PRODUCT or CROSS JOIN. It combines the tuples of one relation
with all the tuples of the other relation.
Cartesian Product Example
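A minimal sketch of the Cartesian product, assuming two small hypothetical relations R and S whose rows are tuples. It relies on itertools.product from the Python standard library to generate every pairing.

```python
from itertools import product

# Hypothetical relations; each row is a tuple.
R = [("a", 1), ("b", 2)]
S = [("x",), ("y",), ("z",)]

# Every row of R is concatenated with every row of S.
cross = [r + s for r, s in product(R, S)]

for row in cross:
    print(row)
```

The result has 2 x 3 = 6 rows, which is why a Cartesian product of large tables grows very quickly.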

Relational Algebra Exercises
Exercise 1

Key Club Table
IdNumber  Firstname  Lastname  Age  Sex
452145    John       Jones     18   M
785475    Heather    Coombs    22   F
745874    Michelle   Gentles   20   F
745888    Keith      Smith     25   M
888999    Ingrid     Harris    30   F

Student Council Table
IdNumber  Firstname  Lastname  Age  Sex
785475    Heather    Coombs    22   F
745874    Michelle   Gentles   20   F
362121    Philip     Cameron   19   M

Math Grades Table
IdNumber  Grade
452145    56
785475    99
745874    82
745888    65
888999    70

Scholarship Grades Table
Grade
99
82

i) Key Club Student Council
ii) Key Club Student Council
iii) Key Club - Student Council
iv) π Firstname, Age (Key Club)
v) π Firstname, Lastname (σ Age < 21 (Student Council))
vi) Math Grades Scholarship Grades

Exercise 2 - Dec 2001 Past Paper Question 5
Given the files below, give the results for the relational algebra. [20 marks]

ICEPStudents
Idnumber  Name            ClassCode
5         Karen Henry     MIS
9         Crystal Adobe   CSS
16        Donna Building  CSO

ComputerStudents
Idnumber  Name            ClassCode
4         Ellen Albright  2S
9         Crystal Adobe   CSS
22        Peter Rock      1D

Classes
ClassCode  ClassName
CSS        Cert in Computing
1D         Year 1 Comp Major
3D         Year 3 Comp Major

FinalYearClasses
ClassCode
3D

a) ICEPStudents ComputerStudents
b) ICEPStudents ComputerStudents
c) ICEPStudents ComputerStudents
d) Classes FinalYearClasses
e) π Name, ClassCode (ComputerStudents)
f) σ Idnumber > 6 (ICEPStudents)
g) π Name, ClassCode (σ Idnumber > 6 (ComputerStudents))
h) σ Idnumber > 6 (π Name (ICEPStudents))
i) ICEPStudents ⋈ Classes (Equi, Regular)
j) ICEPStudents ⋈ Classes (Outer, Natural)

Exercise 3

Employees
EMPNO  NAME   JOBNO
111    Adams  34
234    Henry  23
456    Gregg  23
121    Brown  78

Jobs
JOBNO  JOBTITLE
12     Mason
23     Carpenter
34     Plumber

Retired Employees
EMPNO  NAME    JOBNO
456    Gregg   23
789    Jones   12
369    Wilson  56

Insured Jobs
JOBNO
23

a) π NAME, JOBNO (Retired Employees) [3 marks]
b) π JOBNO, EMPNO (σ JOBNO > 30 (Employees)) [3 marks]
c) Employees Retired Employees [3 marks]
d) Retired Employees Employees [3 marks]
e) Employees Retired Employees [3 marks]
f) σ EMPNO > 200 (Employees) [3 marks]
g) Jobs Insured Jobs [3 marks]
h) Employees ⋈ Jobs (Outer, Natural) [4 marks]

Exercise 4
a) Which relational algebra operation is unary?
b) If a Cartesian product is done from one table to itself, how would you prevent
duplicate field names?

SQL Commands - LAB PORTION
What is SQL?
SQL is an abbreviation of Structured Query Language, and is pronounced either
"see-kwell" or as separate letters. SQL is a standardized query language for requesting
information from a database. The original version, called SEQUEL (Structured English
Query Language), was designed at an IBM research center in 1974 and 1975. SQL was first
introduced in a commercial database system in 1979 by Oracle Corporation.
Historically, SQL has been the favorite query language for database management
systems running on minicomputers and mainframes. Increasingly, however, SQL is
being supported by PC database systems because it supports distributed databases
(databases that are spread out over several computer systems). This enables several
users on a local-area network to access the same database simultaneously.
Although there are different dialects of SQL, it is nevertheless the closest thing to a
standard query language that currently exists. In 1986, ANSI approved a rudimentary
version of SQL as the official standard, but most versions of SQL since then have
included many extensions to the ANSI standard. In 1991, ANSI updated the standard.
The new standard is known as SAG SQL.
Please note that SQL command syntax varies slightly from one DBMS to the other.
Please note that even though SQL is done in the lab, you are required to know the
syntax by heart for the written final exam.

Oracle command
At the command line type CONNECT
User Name: SYSTEM
Password: ADMIN
MySQL command
Start, Run, cmd <enter>
mysql -u gcampbell -p -h exedvhost1 [can use 10.10.5.141 instead of host name]
Pwd: gcampbell

Brief Summary of Commands

1. Data Manipulation
Projection and Selection
SELECT [field(s)] FROM [file(s)] WHERE [condition] GROUP BY [field]
HAVING [condition] ORDER BY [field(s)]
[fields]
* (all fields)
field1, field2, ..., fieldn
count(*)
count(distinct dept)
sum(salary) (also avg, min, max)
substr(field, 1, 4)
distinct(dept)
amount * 10
[files]
Join
SELECT a.field, b.field OR SELECT file1.field, file2.field
WHERE a.field = b.field
Union
SELECT stmt 1 UNION ALL SELECT stmt 2
Modification
UPDATE file SET field1 = value, field2 = field2 + 20 WHERE [condition]
DELETE FROM file WHERE [condition]
INSERT INTO file VALUES (x, y, z)
INSERT INTO file SELECT stmt
WHERE Clause
Field IN ('A', 'B', 'C')
Dept LIKE 'A%'
Dept [NOT] LIKE 'E_'
Dept BETWEEN 'A' AND 'C'
Salary < 200 OR/AND sex = 'F' (comparison operators: >, <, <>, =, >=, <=)
ORDER BY Clause
ORDER BY name DESC, age ASC OR ORDER BY 2 (i.e. second field)
HAVING Clause
Used with a GROUP BY. Sets conditions for summary (grouped) data.
E.g. HAVING count(*) > 3
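The difference between WHERE (which filters rows) and HAVING (which filters groups) is easiest to see by running a query. The sketch below uses SQLite through Python's sqlite3 module; the staff table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", [
    ("Ann", "A", 100), ("Bob", "A", 150), ("Cy", "A", 200), ("Di", "A", 250),
    ("Ed", "B", 300), ("Flo", "B", 350),
])

# One result row per department, but only for departments with more
# than 3 members: the HAVING condition is applied to the grouped data.
rows = conn.execute(
    "SELECT dept, COUNT(*), SUM(salary) FROM staff "
    "GROUP BY dept HAVING COUNT(*) > 3 ORDER BY dept").fetchall()

print(rows)
```

Department B has only two members, so its group is discarded by the HAVING clause even though none of its individual rows was filtered out.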

2. Data Definition
CREATE TABLE file (field1 CHAR(5) NOT NULL, field2 INT, field3 DEC(5,2))
CREATE [UNIQUE] INDEX indexname ON file (field1 ASC, field2 DESC)
CREATE VIEW viewname (field1, field2, field3) AS SELECT stmt
ALTER TABLE file ADD field CHAR(5)
DROP TABLE file
DROP INDEX indexname ON tablename
DROP VIEW viewname
3. Control
GRANT SELECT ON file TO PUBLIC
REVOKE SELECT ON file FROM PUBLIC
COMMIT
ROLLBACK
MySQL data types
Auto_increment
Char
Boolean
Date
Dec/Decimal
Double
Double precision
Float
Int/Integer

CREATE TABLE (using constraints primary key, foreign key)
The SQL command for creating an empty table has the following form:
create table <table> (
<column 1> <data type> [not null] [unique] [<column constraint>],
.........
<column n> <data type> [not null] [unique] [<column constraint>],
[<table constraint(s)>] );
For each column, a name and a data type must be specified, and the column name
must be unique within the table definition. Column definitions are separated by
commas. There is no difference between names in lower case letters and names in
upper case letters. In fact, the only place where upper and lower case letters matter
is in string comparisons. A not null constraint is specified directly after the data
type of the column and requires a defined attribute value (different from null) for
that column. The keyword unique specifies that no two records can have the
same attribute value for this column. Unless the condition not null is also specified for
this column, the attribute value null is allowed, and two tuples having the attribute
value null for this column do not violate the constraint.
Example: The create table statement for the EMP table has the form
create table EMP (
EMPNO number(4) not null,
ENAME varchar2(30) not null,
JOB varchar2(10),
MGR number(4),
HIREDATE date,
SAL number(7,2),
DEPTNO number(2) );
NB: Except for the columns EMPNO and ENAME null values are allowed.
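The behaviour of not null and unique can be checked with a small experiment. This sketch uses SQLite through Python rather than Oracle, so the data types differ from the EMP example; the email column and the helper function rejected are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emp (
    empno INTEGER NOT NULL,
    ename TEXT NOT NULL,
    email TEXT UNIQUE)""")

def rejected(sql):
    """Run one statement; report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Two null values in a unique column do not violate the constraint.
conn.execute("INSERT INTO emp VALUES (1, 'KING', NULL)")
conn.execute("INSERT INTO emp VALUES (2, 'BLAKE', NULL)")
conn.execute("INSERT INTO emp VALUES (3, 'CLARK', 'c@x.org')")

null_name = rejected("INSERT INTO emp VALUES (4, NULL, NULL)")          # breaks not null
duplicate = rejected("INSERT INTO emp VALUES (5, 'FORD', 'c@x.org')")   # breaks unique
count = conn.execute("SELECT COUNT(*) FROM emp").fetchone()[0]

print(null_name, duplicate, count)
```

Only the three valid rows are stored; the two rejected statements leave the table unchanged.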
Oracle offers the following basic data types:
char(n): Fixed-length character data (string), n characters long. The
maximum size for n is 255 bytes (2000 in Oracle8). Note that a string of
type char is always padded on the right with blanks to the full length of n
(this can be memory consuming). Example: char(40)
varchar2(n): Variable-length character string. The maximum size for n is
2000 (4000 in Oracle8). Only the bytes used for a string require storage.
Example: varchar2(80)
number(o, d): Numeric data type for integers and reals. o = overall number
of digits, d = number of digits to the right of the decimal point.
Maximum values: o = 38, d = -84 to +127. Examples: number(8),
number(5,2)
Note that, e.g., number(5,2) cannot contain anything larger than 999.99
without resulting in an error. Data types derived from number are
int[eger], dec[imal], smallint and real.
date: Date data type for storing date and time.
The default format for a date is DD-MON-YY. Examples: 13-OCT-94,
07-JAN-98

long: Character data up to a length of 2GB. Only one long column is
allowed per table.

It should be noted that data types vary from one database to another.

The definition of a table may include the specification of integrity constraints.


Basically two types of constraints are provided: column constraints are associated
with a single column whereas table constraints are typically associated with more than
one column. However, any column constraint can also be formulated as a table
constraint.
The specification of a (simple) constraint has the following form:
[constraint <name>] primary key | unique | not null
A constraint can be named. It is advisable to name a constraint in order to get more
meaningful information when this constraint is violated due to, e.g., an insertion of a
record that violates the constraint. If no name is specified for the constraint, Oracle
automatically generates a name of the pattern SYS_C<number>.
The two simplest types of constraints have already been discussed: not null and
unique. Probably the most important type of integrity constraint in a database is the
primary key constraint. A primary key constraint enables a unique identification of
each record in a table.
Based on a primary key, the database system ensures that no duplicates appear in a
table.
Example:
create table EMP (
EMPNO number(4) constraint pk_emp primary key, . . . );
For our EMP table, this specification defines the attribute EMPNO as the primary key
for the table. Each value for the attribute EMPNO thus must appear only once in the
table EMP. A table, of course, may only have one primary key. Note that in contrast to
a unique constraint, null values are not allowed.
Example:
We want to create a table called PROJECT to store information about projects. For
each project, we want to store the number and the name of the project, the employee
number of the project's manager, the budget and the number of persons working on
the project, and the start date and end date of the project. Furthermore, we have the
following conditions:
- a project is identified by its project number,
- the name of a project must be unique,
- the manager and the budget must be defined.
Table definition:
create table PROJECT (
PNO number(3) constraint prj_pk primary key,
PNAME varchar2(60) unique,
PMGR number(4) not null,
PERSONS number(5),
BUDGET number(8,2) not null,
PSTART date,
PEND date);
A unique constraint can include more than one attribute. In this case the pattern
unique(<column i>, . . . , <column j>) is used. If it is required, for example, that no
two projects have the same start and end date, we have to add the table constraint
constraint no_same_dates unique(PEND, PSTART)
This constraint has to be defined in the create table command after both columns
PEND and PSTART have been defined. A primary key constraint that includes more
than one column can be specified in an analogous way. Instead of a not null
constraint it is sometimes useful to specify a default value for an attribute if no value
is given, e.g., when a tuple is inserted. For this, we use the default clause.
Example:
If no start date is given when inserting a tuple into the table PROJECT, the project
start date should be set to January 1st, 1995:
PSTART date default('01-JAN-95')
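The effect of a default clause can be seen in the following sketch. It uses SQLite through Python, so the date is stored as text in the form YYYY-MM-DD rather than as an Oracle date; the simplified project table is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE project (
    pno INTEGER PRIMARY KEY,
    pname TEXT,
    pstart TEXT DEFAULT '1995-01-01')""")

# No start date supplied: the default value is stored.
conn.execute("INSERT INTO project (pno, pname) VALUES (1, 'DBS')")
# Start date supplied: the default is not used.
conn.execute("INSERT INTO project (pno, pname, pstart) VALUES (2, 'NET', '1996-06-30')")

starts = conn.execute("SELECT pno, pstart FROM project ORDER BY pno").fetchall()
print(starts)
```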
Examples:
Create table Employee (empno int, empname char(40), deptcode char(3), salary
number(6,2), dateofbirth date, primary key (empno), constraint EmpC foreign key
(deptcode) references DeptTable);
CREATE TABLE SUPPLIERS ( SNO CHAR(5), SNAME CHAR(20) NOT NULL,
STATUS DEC(3), CITY CHAR(15), PRIMARY KEY ( SNO) )
CREATE TABLE PARTS ( PNO CHAR(6), PNAME CHAR(20), COLOR
CHAR(6), WEIGHT DEC(3), CITY CHAR(15), PRIMARY KEY ( PNO ) )
CREATE TABLE INVENTORY ( SNO CHAR(5), PNO CHAR(6), QTY DEC(5),
PRIMARY KEY ( SNO, PNO ), FOREIGN KEY ( SNO ) REFERENCES
SUPPLIERS, CONSTRAINT FKC FOREIGN KEY ( PNO ) REFERENCES PARTS
)
NB. FKC is the name of the constraint
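To see primary key and foreign key constraints being enforced, the suppliers and parts tables can be recreated in a cut-down form. This is a sketch in SQLite through Python, not the Oracle or MySQL lab environment: the column lists are shortened, the sample rows are invented, and SQLite only checks foreign keys after the PRAGMA shown below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE suppliers (sno TEXT PRIMARY KEY, sname TEXT NOT NULL)")
conn.execute("CREATE TABLE parts (pno TEXT PRIMARY KEY, pname TEXT)")
conn.execute("""CREATE TABLE inventory (
    sno TEXT, pno TEXT, qty INTEGER,
    PRIMARY KEY (sno, pno),
    FOREIGN KEY (sno) REFERENCES suppliers (sno),
    CONSTRAINT fkc FOREIGN KEY (pno) REFERENCES parts (pno))""")

def rejected(sql):
    """Run one statement; report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn.execute("INSERT INTO suppliers VALUES ('S1', 'Smith')")
conn.execute("INSERT INTO parts VALUES ('P1', 'Nut')")
conn.execute("INSERT INTO inventory VALUES ('S1', 'P1', 300)")

duplicate_key = rejected("INSERT INTO inventory VALUES ('S1', 'P1', 50)")  # same primary key
unknown_part = rejected("INSERT INTO inventory VALUES ('S1', 'P9', 10)")   # no such part

print(duplicate_key, unknown_part)
```

The second insert into inventory fails because the pair (S1, P1) already exists, and the third fails because part P9 is not in the parts table.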
ALTER TABLE
It is possible to modify the structure of a table (the relation schema) even if records
have already been inserted into this table. A column can be added using the alter table
command
alter table <table>
add(<column> <data type> [default <value>] [<column constraint>]);
Example:
Alter table Employees add column nisno char(6);

If more than one column should be added at one time, the respective column
definitions inside the add clause need to be separated by commas.
A table constraint can be added to a table using
alter table <table> add (<table constraint>);
Note that a column constraint is a table constraint, too. not null and primary key
constraints can only be added to a table if none of the specified columns contains a
null value. Table definitions can be modified in an analogous way. This is useful, e.g.,
when the size of strings that can be stored needs to be increased. The syntax of the
command for modifying a column is
alter table <table>
modify(<column> [<data type>] [default <value>] [<column constraint>]);
Example:
Alter table Employees modify lastname char(35);
[NB. Use alter instead of modify for some DBMSs]
A column can be removed using the following:
Alter table <table>
Drop column <column>;
Examples:
Alter table Employees drop column Address3;
ALTER TABLE SUPPLIERS ADD COLUMN STATE CHAR(15)
ALTER TABLE SUPPLIERS DROP COLUMN CITY
ALTER TABLE SUPPLIERS ADD TRN INT
ALTER TABLE PARTS ADD DISCOUNT SMALLINT
ALTER TABLE PARTS ALTER COLUMN COLOR CHAR(10) [in SQL Server]
ALTER TABLE PARTS MODIFY COLOR CHAR(10) [in Oracle and MySQL]
ALTER TABLE <table> DROP CONSTRAINT FKC
ALTER TABLE STUDENTS ADD CONSTRAINT FKC FOREIGN KEY
(DEPTID) REFERENCES DEPARTMENTS
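The point that a column can be added after records have been inserted can be demonstrated as follows. The sketch assumes SQLite through Python, a simplified suppliers table and an invented row; PRAGMA table_info is SQLite's way of listing the columns of a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (sno TEXT PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO suppliers VALUES ('S1', 'Kingston')")

# Add a column to a table that already contains a record.
conn.execute("ALTER TABLE suppliers ADD COLUMN state TEXT DEFAULT 'n/a'")

columns = [row[1] for row in conn.execute("PRAGMA table_info(suppliers)")]
existing = conn.execute("SELECT sno, city, state FROM suppliers").fetchall()

print(columns)
print(existing)
```

The existing record picks up the default value for the new column; without a default clause it would contain null.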
INSERT
The simplest way to insert a record into a table is to use the insert statement
insert into <table> [(<column i, . . . , column j>)]
values (<value i, . . . , value j>);
For each of the listed columns, a corresponding (matching) value must be specified.
Therefore an insertion does not necessarily have to follow the order of the attributes
as specified in the create table statement. If a column is omitted, the value null is
inserted instead. If no column list is given, however, a value must be given for each
column as defined in the create table statement.
Examples:
insert into PROJECT(PNO, PNAME, PERSONS, BUDGET, PSTART)
values(313, 'DBS', 4, 150000.42, '10-OCT-94');

or
insert into PROJECT
values(313, 'DBS', 7411, null, 150000.42, '10-OCT-94', null);
If some data already exists in other tables, it can be used for insertions into
a new table. For this, we write a query whose result is a set of records to be inserted.
Such an insert statement has the form
insert into <table> [(<column i, . . . , column j>)] <query>
Example: Suppose we have defined the following table:
create table OLDEMP (
ENO number(4) not null,
HDATE date);
We now can use the table EMP to insert records into this new relation:
insert into OLDEMP (ENO, HDATE)
select EMPNO, HIREDATE from EMP
where HIREDATE < '31-DEC-60';
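The insert with a query can be tried out on a small scale. This sketch assumes SQLite through Python, hypothetical employee rows, and dates stored as text in the form YYYY-MM-DD so that they compare correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER NOT NULL, ename TEXT, hiredate TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (7369, "SMITH", "1959-12-17"),
    (7499, "ALLEN", "1981-02-20"),
    (7521, "WARD", "1960-02-22"),
])
conn.execute("CREATE TABLE oldemp (eno INTEGER NOT NULL, hdate TEXT)")

# The rows returned by the query become the rows inserted into oldemp.
conn.execute("""INSERT INTO oldemp (eno, hdate)
                SELECT empno, hiredate FROM emp
                WHERE hiredate < '1960-12-31'""")

copied = conn.execute("SELECT eno FROM oldemp ORDER BY eno").fetchall()
print(copied)
```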
SELECT (using WHERE, GROUP BY, ORDER BY, HAVING,
aggregate functions, logical operators, comparison
operators)
In order to retrieve the information stored in the database, the SQL query language is
used.
In SQL a query has the following (simplified) form (components in brackets [ ] are
optional):
select [distinct] <column(s)>
from <table>
[ where <condition> ]
[ order by <column(s) [asc|desc]> ]
Selecting Columns
The columns to be selected from a table are specified after the keyword select. This
operation is also called projection. For example, the query
select LOC, DEPTNO from DEPT;
lists only the number and the location for each tuple from the relation DEPT. If all
columns should be selected, the asterisk symbol * can be used to denote all
attributes. The query
select * from EMP;

retrieves all records with all columns from the table EMP. Instead of an attribute
name, the select clause may also contain arithmetic expressions involving arithmetic
operators etc.
select ENAME, DEPTNO, SAL * 1.55 from EMP;
For the different data types supported in Oracle, several operators and functions are
provided:
for numbers: abs, cos, sin, exp, log, power, mod, sqrt, +, -, *, /, . . .
for strings: chr, concat(string1, string2), lower, upper, replace(string, search string,
replacement string), translate, substr(string, m, n), length, to_date, . . .
for the date data type: add_months, months_between, next_day, to_char, . . .
Consider the query
select DEPTNO from EMP;
which retrieves the department number for each record. Typically, some numbers will
appear more than only once in the query result, that is, duplicate result records are not
automatically eliminated. Inserting the keyword distinct after the keyword select,
however, forces the elimination of duplicates from the query result.
It is also possible to specify a sorting order in which the result records of a query are
displayed. For this the order by clause is used, which takes as parameters one or more
of the attributes listed in the select clause. desc specifies a descending order and asc
specifies an ascending order (this is also the default order).
For example, the query
select ENAME, DEPTNO, HIREDATE
from EMP
order by DEPTNO [asc], HIREDATE desc;
displays the result in an ascending order by the attribute DEPTNO. If two records
have the same attribute value for DEPTNO, the sorting criterion is a descending order
by the attribute values of HIREDATE.
For the above query, we would get the following output:
ENAME  DEPTNO  HIREDATE
FORD   10      03-DEC-81
SMITH  20      17-DEC-80
BLAKE  30      01-MAY-81
WARD   30      22-FEB-81
ALLEN  30      20-FEB-81
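The effect of distinct and order by can be reproduced with the five rows shown in the output above. The sketch assumes SQLite through Python and stores the hire dates as text in the form YYYY-MM-DD, since dates written as DD-MON-YY would not sort chronologically as plain strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, hiredate TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("ALLEN", 30, "1981-02-20"),
    ("SMITH", 20, "1980-12-17"),
    ("WARD", 30, "1981-02-22"),
    ("FORD", 10, "1981-12-03"),
    ("BLAKE", 30, "1981-05-01"),
])

# Without distinct, department 30 would be listed three times.
departments = [r[0] for r in conn.execute(
    "SELECT DISTINCT deptno FROM emp ORDER BY deptno")]

# Ascending by department, then most recent hire date first.
names = [r[0] for r in conn.execute(
    "SELECT ename, deptno, hiredate FROM emp "
    "ORDER BY deptno ASC, hiredate DESC")]

print(departments)
print(names)
```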

Selection of Records
Up to now we have only focused on selecting (some) attributes of all records from a
table. If one is interested in records that satisfy certain conditions, the where clause is
used. In a where clause simple conditions based on comparison operators can be
combined using the logical connectives and, or, and not to form complex conditions.
Conditions may also include pattern matching operations and even subqueries.

Example: List the job title and the salary of those employees whose manager has the
number 7698 or 7566 and who earn more than 1500:
select JOB, SAL
from EMP
where (MGR = 7698 or MGR = 7566) and SAL > 1500;
For all data types, the comparison operators =, != or <>, <, >, <=, >= are allowed in the
conditions of a where clause.
Further comparison operators are:
Set Conditions: <column> [not] in (<list of values>)
Example: select * from DEPT where DEPTNO in (20,30);
Null value: <column> is [not] null,
i.e., for a tuple to be selected there must (not) exist a defined value for this
column.
Example: select * from EMP where MGR is not null;
Note: the operations = null and != null are not defined!
Domain conditions: <column> [not] between <lower bound> and <upper bound>
Examples:
select EMPNO, ENAME, SAL from EMP
where SAL between 1500 and 2500;
select ENAME from EMP
where HIREDATE between '02-APR-81' and '08-SEP-81';
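The set, null and domain conditions can be tested against a handful of rows. The sketch assumes SQLite through Python and a cut-down, hypothetical version of the EMP table. It also shows why = null is of no use: the comparison is never true, so no rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, mgr INTEGER, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?)", [
    (7839, "KING", None, 5000),   # no manager: mgr is null
    (7698, "BLAKE", 7839, 2850),
    (7499, "ALLEN", 7698, 1600),
    (7521, "WARD", 7698, 1250),
])

def names(where):
    """Return the employee names that satisfy the given where condition."""
    sql = "SELECT ename FROM emp WHERE " + where + " ORDER BY ename"
    return [r[0] for r in conn.execute(sql)]

in_list = names("mgr IN (7698, 7566)")
is_null = names("mgr IS NULL")
equals_null = names("mgr = NULL")          # never true, even for KING
between = names("sal BETWEEN 1500 AND 2500")

print(in_list, is_null, equals_null, between)
```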
String Operations
In order to compare an attribute with a string, it is required to surround the string by
apostrophes, e.g., where LOCATION = 'DALLAS'. A powerful operator for pattern matching is
the like operator. Together with this operator, two special characters are used: the
percent sign % (also called wild card), and the underline _ (also called position marker).
For example, if one is interested in all records of the table DEPT that contain two Cs
in the name of the department, the condition would be where DNAME like
'%C%C%'. The percent sign means that any (sub)string is allowed there, even the
empty string. In contrast, the underline stands for exactly one character. Thus the
condition where DNAME like '%C_C%' would require that exactly one character
appears between the two Cs. To test for inequality, the not clause is used.
Further string operations are:
upper(<string>) takes a string and converts any letters in it to uppercase, e.g.,
DNAME = upper(DNAME) (The name of a department must consist only of upper
case letters.)
lower(<string>) converts any letter to lowercase.
initcap(<string>) converts the initial letter of every word in <string> to uppercase.
length(<string>) returns the length of the string.
substr(<string>, n [, m]) clips out an m-character piece of <string>, starting at position
n. If m is not specified, the end of the string is assumed. E.g. substr('DATABASE
SYSTEMS', 10, 7) returns the string 'SYSTEMS'.
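The two wild cards and some of the string functions can be tried out as follows. The sketch assumes SQLite through Python and three hypothetical department names. SQLite provides upper, lower, length and substr but not initcap, so that function is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dname TEXT)")
conn.executemany("INSERT INTO dept VALUES (?)",
                 [("ACCOUNTING",), ("COCOA",), ("SALES",)])

def names(pattern):
    """Return the department names that match the given like pattern."""
    return [r[0] for r in conn.execute(
        "SELECT dname FROM dept WHERE dname LIKE ? ORDER BY dname", (pattern,))]

two_cs = names("%C%C%")        # two Cs anywhere in the name
one_between = names("%C_C%")   # exactly one character between the two Cs

piece, shouted, size = conn.execute(
    "SELECT substr('DATABASE SYSTEMS', 10, 7), upper('sales'), length('SALES')"
).fetchone()

print(two_cs, one_between, piece, shouted, size)
```

ACCOUNTING matches the first pattern only, because its two Cs are adjacent and the underline demands exactly one character between them.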
Aggregate Functions

Aggregate functions are statistical functions such as count, min, max etc. They are
used to compute a single value from a set of attribute values of a column:

count - Counting rows
Example: How many records are stored in the relation EMP?
select count(*) from EMP;
Example: How many different job titles are stored in the relation EMP?
select count(distinct JOB) from EMP;

max - Maximum value for a column
min - Minimum value for a column
Example: List the minimum and maximum salary.
select min(SAL), max(SAL) from EMP;
Example: Compute the difference between the minimum and maximum salary.
select max(SAL) - min(SAL) from EMP;

sum - Computes the sum of values (only applicable to the data type number)
Example: Sum of all salaries of employees working in the department 30.
select sum(SAL) from EMP
where DEPTNO = 30;

avg - Computes the average value for a column (only applicable to the data type number)

Note: avg, min, max and sum ignore tuples that have a null value for the specified
attribute. count(*) counts every row, including rows that contain null values, whereas
count(<column>) counts only the rows in which that column is not null.
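How the aggregate functions treat null values can be checked with three rows, one of which has no salary. The sketch assumes SQLite through Python and hypothetical data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("ADAMS", 1000), ("BROWN", None), ("CLARK", 3000)])

all_rows, with_salary, lowest, highest, total, average = conn.execute(
    "SELECT count(*), count(sal), min(sal), max(sal), sum(sal), avg(sal) FROM emp"
).fetchone()

print(all_rows, with_salary, lowest, highest, total, average)
```

The average is 2000 rather than about 1333 because the row with the null salary is left out of both the sum and the number of values.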
Joining Tables

Thus far we have only focused on queries that refer to exactly one table. Furthermore,
conditions in a where clause were restricted to simple comparisons. A major feature of
relational databases, however, is to combine (join) records stored in different tables in
order to display more meaningful and complete information. In SQL the select
statement is used for this kind of query, joining relations:
select [distinct] [<alias ak>.]<column i>, . . . , [<alias al>.]<column j>
from <table 1> [<alias a1>], . . . , <table n> [<alias an>]
[where <condition>]
The specification of table aliases in the from clause is necessary to refer to columns
that have the same name in different tables. For example, the column DEPTNO
occurs in both EMP and DEPT. If we want to refer to either of these columns in the
where or select clause, a table alias has to be specified and put in the front of the
column name. Instead of a table alias also the complete relation name can be put in
front of the column such as DEPT.DEPTNO, but this sometimes can lead to rather
lengthy query formulations.

Comparisons in the where clause are used to combine rows from the tables listed in
the from clause.
Example: In the table EMP only the numbers of the departments are stored, not their
names. For each salesman, we now want to retrieve the name as well as the number
and the name of the department where he is working:
select ENAME, E.DEPTNO, DNAME
from EMP E, DEPT D
where E.DEPTNO = D.DEPTNO
and JOB = 'SALESMAN';
Any number of tables can be combined in a select statement.
Example: For each project, retrieve its name, the name of its manager, and the name of
the department where the manager is working:
select ENAME, DNAME, PNAME
from EMP E, DEPT D, PROJECT P
where E.EMPNO = P.PMGR
and D.DEPTNO = E.DEPTNO;
It is even possible to join a table with itself:
Example: List the names of all employees together with the name of their manager:
select E1.ENAME, E2.ENAME
from EMP E1, EMP E2
where E1.MGR = E2.EMPNO;
Explanation: The join columns are MGR for the table E1 and EMPNO for the table
E2.
The equijoin comparison is E1.MGR = E2.EMPNO.
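Both the two-table join and the self-join can be run on a few rows to see which records are paired. The sketch assumes SQLite through Python; the tables are cut-down versions of EMP and DEPT with hypothetical contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER, dname TEXT)")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, mgr INTEGER, "
             "deptno INTEGER, job TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)",
                 [(10, "ACCOUNTING"), (30, "SALES")])
conn.executemany("INSERT INTO emp VALUES (?, ?, ?, ?, ?)", [
    (7839, "KING", None, 10, "PRESIDENT"),
    (7698, "BLAKE", 7839, 30, "MANAGER"),
    (7499, "ALLEN", 7698, 30, "SALESMAN"),
])

# Join of two tables: the aliases E and D tell the two DEPTNO columns apart.
salesmen = conn.execute(
    "SELECT ename, e.deptno, dname FROM emp e, dept d "
    "WHERE e.deptno = d.deptno AND job = 'SALESMAN'").fetchall()

# Self-join: the same table is used twice under two different aliases.
bosses = conn.execute(
    "SELECT e1.ename, e2.ename FROM emp e1, emp e2 "
    "WHERE e1.mgr = e2.empno ORDER BY e1.ename").fetchall()

print(salesmen)
print(bosses)
```

KING does not appear in the second result because his MGR value is null, so the comparison E1.MGR = E2.EMPNO is never true for his row.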
SELECT sub queries
Up to now we have only concentrated on simple comparison conditions in a where
clause, i.e., we have compared a column with a constant or we have compared two
columns. As we have already seen for the insert statement, queries can be used for
assignments to columns. A query result can also be used in a condition of a where
clause. In such a case the query is called a subquery and the complete select statement
is called a nested query.
A respective condition in the where clause then can have one of the following forms:
1. Set-valued subqueries
<expression> [not] in (<subquery>)
<expression> <comparison operator> [any|all] (<subquery>)
An <expression> can either be a column or a computed value.
2. Test for (non)existence
[not] exists (<subquery>)

In a where clause conditions using subqueries can be combined arbitrarily by using
the logical connectives and and or.
Example: List the name and salary of employees of the department 20 who are
leading a project that started before December 31, 1990:
select ENAME, SAL from EMP
where EMPNO in
(select PMGR from PROJECT
where PSTART < '31-DEC-90')
and DEPTNO = 20;
Explanation: The subquery retrieves the set of those employees who manage a project
that started before December 31, 1990. If the employee working in department 20 is
contained in this set (in operator), this record belongs to the query result set.
Example: List all employees who are working in a department located in BOSTON:
select * from EMP
where DEPTNO in
(select DEPTNO from DEPT
where LOC = 'BOSTON');
The subquery retrieves only one value (the number of the department located in
Boston). Thus it is possible to use = instead of in. As long as the result of a
subquery is not known in advance, i.e., whether it is a single value or a set, it is
advisable to use the in operator.
A subquery may in turn use a subquery in its where clause. Thus conditions can be
nested arbitrarily. An important class of subqueries are those that refer to their
surrounding (sub)query and the tables listed in its from clause. Such
queries are called correlated subqueries.
Example: List all those employees who are working in the same department as their
manager (note that components in [ ] are optional):
select * from EMP E1
where DEPTNO in
(select DEPTNO from EMP [E]
where [E.]EMPNO = E1.MGR);
Explanation: The subquery in this example is related to its surrounding query since it
refers to the column E1.MGR. A record is selected from the table EMP (E1) for the
query result if the value for the column DEPTNO occurs in the set of values selected in
the subquery. One can think of the evaluation of this query as follows: For each tuple
in the table E1, the subquery is evaluated individually. If the condition where
DEPTNO in . . . evaluates to true, this tuple is selected.
Note that an alias for the table EMP in the subquery is not necessary since columns
without a preceding alias listed there always refer to the innermost query and tables.

Conditions of the form <expression> <comparison operator> [any|all] <subquery> are
used to compare a given <expression> with each value selected by <subquery>.
For the clause any, the condition evaluates to true if there exists at least one row
selected by the subquery for which the comparison holds. If the subquery yields an
empty result set, the condition is not satisfied.
For the clause all, in contrast, the condition evaluates to true if for all rows selected
by the subquery the comparison holds. In this case the condition evaluates to true if
the subquery does not yield any row or value.
Example: Retrieve all employees who are working in department 10 and who earn at
least as much as any (i.e., at least one) employee working in department 30:
select * from EMP
where SAL >= any
(select SAL from EMP
where DEPTNO = 30)
and DEPTNO = 10;
Note: Also in this subquery no aliases are necessary since the columns refer to the
innermost from clause.
Example: List all employees who are not working in department 30 and who earn
more than all employees working in department 30:
select * from EMP
where SAL > all
(select SAL from EMP
where DEPTNO = 30)
and DEPTNO <> 30;
For all and any, the following equivalences hold:
in is equivalent to = any
not in is equivalent to <> all (or != all)
Often a query result depends on whether certain rows do (not) exist in (other) tables.
Such type of queries is formulated using the exists operator.
Example: List all departments that have no employees:
select * from DEPT
where not exists
(select * from EMP
where DEPTNO = DEPT.DEPTNO);
Explanation: For each tuple from the table DEPT, the condition is checked whether
there exists a record in the table EMP that has the same department number
(DEPT.DEPTNO). In case no such record exists, the condition is satisfied for the
tuple under consideration and it is selected. If there exists a corresponding record in
the table EMP, the record is not selected.
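The not exists query above can be checked on a small data set. This is a sketch only, assuming SQLite via Python's sqlite3 module in place of Oracle; the department and employee rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DEPT (DEPTNO integer primary key, DNAME text);
    create table EMP  (EMPNO integer primary key, ENAME text, DEPTNO integer);
    insert into DEPT values (10, 'ACCOUNTING'), (20, 'RESEARCH'), (40, 'OPERATIONS');
    insert into EMP  values (7839, 'KING', 10), (7566, 'JONES', 20);
""")

# Correlated subquery: for each DEPT row the inner query looks for
# employees with the same department number. The row is kept only
# when the inner query returns nothing.
empty_depts = conn.execute("""
    select DNAME from DEPT
    where not exists
      (select * from EMP where EMP.DEPTNO = DEPT.DEPTNO)
""").fetchall()

print(empty_depts)  # [('OPERATIONS',)]
```

The inner query mentions DEPT.DEPTNO, a column of the outer query, which is what makes it a correlated subquery.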
Example: List workers who receive a higher rate than the average hourly rate.

Select empname from employee
Where hrly_rate >
(select avg(hrly_rate) from employee);
Example: List workers who get an hourly rate higher than the average of those workers
reporting to the same supervisor:
Select a.name from worker a
Where a.hrly_rate >
(select avg(b.hrly_rate)
From worker b
Where b.supv_id = a.supv_id);

Operations on Result Sets


Sometimes it is useful to combine query results from two or more queries into a single
result.
SQL supports three set operators which have the pattern:
<query 1> <set operator> <query 2>
The set operators are:
union [all] returns a table consisting of all rows either appearing in the result of
<query1> or in the result of <query 2>. Duplicates are automatically eliminated unless
the clause all is used.
intersect returns all rows that appear in both results <query 1> and <query 2>.
minus returns those rows that appear in the result of <query 1> but not in the result
of <query 2>.
Example: Assume that we have a table EMP2 that has the same structure and columns
as the table EMP:
All employee numbers and names from both tables:
select EMPNO, ENAME from EMP
union
select EMPNO, ENAME from EMP2;
Employees who are listed in both EMP and EMP2:
select * from EMP
intersect
select * from EMP2;
Employees who are only listed in EMP:
select * from EMP
minus
select * from EMP2;
[NB. In other DBMSs use EXCEPT instead of MINUS]
Each operator requires that both tables have the same data types for the columns to
which the operator is applied.
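The three set operators can be compared side by side on two small tables. The sketch below assumes SQLite via Python's sqlite3 module; SQLite is one of the systems that spells the difference operator except rather than minus. The rows and the helper function run are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP  (EMPNO integer, ENAME text);
    create table EMP2 (EMPNO integer, ENAME text);
    insert into EMP  values (1, 'KING'),  (2, 'JONES');
    insert into EMP2 values (2, 'JONES'), (3, 'SCOTT');
""")

def run(sql):
    """Hypothetical helper: execute a query and return all rows."""
    return conn.execute(sql).fetchall()

# union removes the duplicate JONES row, union all keeps it.
union_rows = run("select * from EMP union select * from EMP2 order by EMPNO")
union_all_rows = run("select * from EMP union all select * from EMP2 order by EMPNO")
# intersect keeps rows found in both tables.
intersect_rows = run("select * from EMP intersect select * from EMP2")
# except (minus in Oracle) keeps rows found only in the first table.
except_rows = run("select * from EMP except select * from EMP2")

print(union_rows)      # [(1, 'KING'), (2, 'JONES'), (3, 'SCOTT')]
print(union_all_rows)  # [(1, 'KING'), (2, 'JONES'), (2, 'JONES'), (3, 'SCOTT')]
print(intersect_rows)  # [(2, 'JONES')]
print(except_rows)     # [(1, 'KING')]
```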
Grouping
In previous sections we have seen how aggregate functions can be used to compute a
single value for a column. Often applications require grouping rows that have certain
properties and then applying an aggregate function on one column for each group

separately. For this, SQL provides the clause group by <group column(s)>. This
clause appears after the where clause and must refer to columns of tables listed in the
from clause.
select <column(s)>
from <table(s)>
where <condition>
group by <group column(s)>
[having <group condition(s)>];
Those rows satisfying the where condition that have the same value(s) for <group
column(s)> are grouped. Aggregations specified in the select clause are then applied
to each group separately. It is important that only those columns that appear in the
<group column(s)> clause can be listed without an aggregate function in the select
clause!
Example: For each department, we want to retrieve the minimum and maximum
salary.
select DEPTNO, min(SAL), max(SAL)
from EMP
group by DEPTNO;
Rows from the table EMP are grouped such that all rows in a group have the same
department number. The aggregate functions are then applied to each such group. We
thus get the following query result:
DEPTNO  MIN(SAL)  MAX(SAL)
10      1300      5000
20      800       3000
30      950       2850
Rows to form a group can be restricted in the where clause. For example, if we add
the condition where JOB = 'CLERK', only rows for clerks are grouped. The query
then would retrieve the minimum and maximum salary of all clerks for each
department. Note that it is not allowed to specify any column other than DEPTNO
without an aggregate function in the select clause since this is the only column listed
in the group by clause (it is also easy to see that other columns would not make any
sense).
Once groups have been formed, certain groups can be eliminated based on their
properties, e.g., if a group contains fewer than three rows. This type of condition is
specified using the having clause. As in the select clause, only
<group column(s)> and aggregations can be used in a having clause.
Example: Retrieve the minimum and maximum salary of clerks for each department
having more than three clerks.
select DEPTNO, min(SAL), max(SAL)
from EMP
where JOB = 'CLERK'
group by DEPTNO
having count(*) > 3;
Note that it is even possible to specify a subquery in a having clause. In the above
query, for example, instead of the constant 3, a subquery can be specified.
A query containing a group by clause is processed in the following way:
1. Select all rows that satisfy the condition specified in the where clause.
2. From these rows form groups according to the group by clause.
3. Discard all groups that do not satisfy the condition in the having clause.
4. Apply aggregate functions to each group.
5. Retrieve values for the columns and aggregations listed in the select clause.
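The processing steps above can be followed on a small example. This sketch assumes SQLite via Python's sqlite3 module instead of Oracle, uses invented EMP rows, and lowers the having threshold to "more than one clerk" so that the tiny data set still produces a result.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table EMP (ENAME text, JOB text, SAL integer, DEPTNO integer);
    insert into EMP values
        ('MILLER', 'CLERK',   1300, 10),
        ('CLARK',  'MANAGER', 2450, 10),
        ('SMITH',  'CLERK',    800, 20),
        ('ADAMS',  'CLERK',   1100, 20),
        ('SCOTT',  'ANALYST', 3000, 20),
        ('JAMES',  'CLERK',    950, 30);
""")

# Steps 1 and 2: the where clause keeps only clerks, then the
# remaining rows are grouped by department.
per_dept = conn.execute("""
    select DEPTNO, min(SAL), max(SAL)
    from EMP
    where JOB = 'CLERK'
    group by DEPTNO
    order by DEPTNO
""").fetchall()

# Step 3: having discards the groups with only one clerk.
big_groups = conn.execute("""
    select DEPTNO, min(SAL), max(SAL)
    from EMP
    where JOB = 'CLERK'
    group by DEPTNO
    having count(*) > 1
    order by DEPTNO
""").fetchall()

print(per_dept)    # [(10, 1300, 1300), (20, 800, 1100), (30, 950, 950)]
print(big_groups)  # [(20, 800, 1100)]
```

Note that the manager and the analyst never reach the grouping step, because the where clause removes them before any group is formed.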
UPDATE
For modifying attribute values of (some) records in a table, we use the update
statement:
update <table> set
<column i> = <expression i>, . . . , <column j> = <expression j>
[where <condition>];
An expression consists of either a constant (new value), an arithmetic or string
operation, or an SQL query. Note that the new value to assign to <column i> must have
a matching data type.
An update statement without a where clause changes the respective attributes
of all tuples in the specified table. Typically, however, only a (small) portion
of the table requires an update.
Examples:
The employee JONES is transferred to the department 20 as a manager and his
salary is increased by 1000:
update EMP set
JOB = 'MANAGER', DEPTNO = 20, SAL = SAL + 1000
where ENAME = 'JONES';
All employees working in the departments 10 and 30 get a 15% salary increase.
update EMP set
SAL = SAL * 1.15 where DEPTNO in (10,30);
Analogous to the insert statement, other tables can be used to retrieve data that are
used as new values. In such a case we have a <query> instead of an <expression>.
Example: All salesmen working in the department 20 get the same salary as the
manager who has the lowest salary among all managers.
update EMP set
SAL = (select min(SAL) from EMP
where JOB = 'MANAGER')
where JOB = 'SALESMAN' and DEPTNO = 20;

Explanation: The query retrieves the minimum salary of all managers. This value then
is assigned to all salesmen working in department 20.
It is also possible to specify a query that retrieves more than one value (but still
only one record!). In this case the set clause has the form
set (<column i>, . . . , <column j>) = (<query>).
It is important that the order of data types and values of the selected row exactly
corresponds to the list of columns in the set clause.
DELETE
All or selected records can be deleted from a table using the delete statement:
delete from <table> [where <condition>];
If the where clause is omitted, all records are deleted from the table. An alternative
command for deleting all records from a table is the truncate table <table> command.
However, in this case, the deletions cannot be undone.
Example:
Delete all projects (tuples) that have been finished before the current date (system
date):
delete from PROJECT where PEND < sysdate;
sysdate is a function in SQL that returns the system date. Another important SQL
function is user, which returns the name of the user logged into the current Oracle
session.
CREATE VIEW
NB. Not all DBMSs (e.g. MS-Access) have this command.

In Oracle the SQL command to create a view (virtual table) has the form
create [or replace] view <view-name> [(<column(s)>)] as
<select-statement> [with check option [constraint <name>]];
The optional clause or replace re-creates the view if it already exists. <column(s)>
names the columns of the view. If <column(s)> is not specified in the view definition,
the columns of the view get the same names as the attributes listed in the select
statement (if possible).
Example: The following view contains the name, job title and the annual salary of
employees working in the department 20:
Create view DEPT20 as
select ENAME, JOB, SAL*12 ANNUAL_SALARY from EMP
where DEPTNO = 20;
In the select statement the column alias ANNUAL_SALARY is specified for the
expression SAL*12 and this alias is taken by the view. An alternative formulation of
the above view definition is

Create view DEPT20 (ENAME, JOB, ANNUAL_SALARY) as
select ENAME, JOB, SAL*12 from EMP
where DEPTNO = 20;
A view can be used in the same way as a table, that is, records can be retrieved from a
view (although the respective records are not physically stored, but derived on the basis of the
select statement in the view definition), or records can even be modified. A view is
evaluated again each time it is accessed. In Oracle SQL no insert, update, or delete
modifications on views are allowed that use one of the following constructs in the
view definition:
Joins
Aggregate function such as sum, min, max etc.
set-valued subqueries (in, any, all) or test for existence (exists)
group by clause or distinct clause
In combination with the clause with check option any update or insertion of a row into
the view is rejected if the new/modified row does not meet the view definition, i.e.,
these rows would not be selected based on the select statement. A with check option
can be named using the constraint clause.
CREATE INDEX
Create [UNIQUE] INDEX <indexname> on <table> (field [ASC/DESC] [, field
[ASC/DESC], ...]) [WITH {primary | disallow null | ignore null }]
Example:
Create UNIQUE index Custid on Customers (CustomerID) with disallow null;

DROP TABLE
A table and its records can be deleted by issuing the command drop table <table>
[cascade constraints];
DROP VIEW
A view can be deleted using the command drop view <view-name>. [NB. Oracle uses
drop, not delete, to remove a view.]
DROP INDEX
Drop index Custid on Customers;

GRANT and REVOKE


Grant <privilege1, ... privilegen> on <table> to <username>
Revoke < privilege > on <table> from <username>;
GRANT SELECT ON file to PUBLIC
REVOKE SELECT ON file FROM PUBLIC
Examples of privileges to be granted
SELECT, DELETE, INSERT, UPDATE, DROP, CREATE

COMMIT and ROLLBACK
A sequence of database modifications, i.e., a sequence of insert, update, and delete
statements, is called a transaction. Modifications of records are temporarily stored in
the database system. They become permanent only after the commit command has
been issued.
As long as the user has not issued the commit statement, it is possible to undo all
modifications since the last commit. To undo modifications, one has to issue the
rollback command.
It is advisable to complete each modification of the database with a commit (as long
as the modification has the expected effect). Note that any data definition command
such as create table results in an internal commit. A commit is also implicitly
executed when the user terminates an Oracle session.
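The effect of commit and rollback can be observed in a few lines. This is a minimal sketch that assumes SQLite through Python's sqlite3 module with its default transaction handling, where a transaction is opened implicitly before an insert, update or delete. Oracle's implicit commits (for example on data definition commands) are not reproduced here, and the EMP row is made up.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("create table EMP (ENAME text, SAL integer)")
conn.execute("insert into EMP values ('JONES', 2975)")
conn.commit()  # make the initial row permanent

def jones_salary():
    """Hypothetical helper: read the current salary of JONES."""
    return conn.execute("select SAL from EMP where ENAME = 'JONES'").fetchone()[0]

# First transaction: the change is visible inside the session ...
conn.execute("update EMP set SAL = SAL + 1000 where ENAME = 'JONES'")
sal_in_transaction = jones_salary()
# ... but rollback undoes everything since the last commit.
conn.rollback()
sal_after_rollback = jones_salary()

# Second transaction: once committed, a later rollback has nothing to undo.
conn.execute("update EMP set SAL = SAL + 1000 where ENAME = 'JONES'")
conn.commit()
conn.rollback()
sal_after_commit = jones_salary()

print(sal_in_transaction, sal_after_rollback, sal_after_commit)  # 3975 2975 3975
```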

SQL EXERCISES
EXERCISE 1 - CREATE TABLE AND ALTER TABLE STATEMENTS
1. Create a table called DEPARTMENTS with the following fields:- department 4
characters, deptname 50 characters, depthead 50 characters. The primary key of the table
is department. Please note that the deptname field is a compulsory field.
2. Create a table called MorantBayDepts. It has the same structure as
Departments.
3. Create a table called STUDENTS with the following fields:- idnum numeric, firstname,
lastname each 20 characters, address with 50 characters, telephone long integer, sex 1
character, maritalstatus 1 character, department 4 characters, DOB date, schoolfee
currency. The primary key is idnum, the field department should be used to link this table
to the departments table. Please also note that the firstname field is a compulsory field.
Please name the link so that you can delete it later.
4. You forgot the status field, please add it to the table, it is 10 characters long.
5. You no longer need the field maritalstatus, remove it from the table.
6. You have realized that 20 characters is not enough for the lastname, increase it to 25.
7. Remove the link between the two tables.
8. Add back the link between the two tables.
9. Add back the field maritalstatus.

EXERCISE 2 - INSERT, UPDATE, DELETE, SELECT USING UNION
1. Use the insert command to add data to the departments table
2. Use the insert command to add data to the students table
3. Student with idnum 4 changed address to 9 Brentwood Rd
4. The School board made a ruling that the minimum school fee for all programs is
$10,000. Change the schoolfee to $10,000 for all students whose school fee is less
than $10,000.
5. Add $4,000 to the schoolfee of all TVED students.
6. Student with idnumber 12 got married, change her last name to Gordon and her
maritalstatus to M.
7. Add a new record to the Departments table. department is IT, deptname is
Information Technology, depthead is Mr. Davis.
8. Delete all students whose status says GRADUATED
9. Add 3 records to the MorantBayDepts table.
10. Display all records from both Departments and MorantBayDepts
11. Display all records in the departments table that start with the letter C as well as
all records in the MorantBayDepts table.
12. Copy all of the records in the MorantBayDepts to the Departments table.
13. Delete all records from the MorantBayDepts table.
EXERCISE 3 - SELECT STATEMENT
1. All records and all fields in the Students table.
2. All records and all fields in the Departments table.
3. All fields in the students table for those who are in CS department
4. The idnum, firstname and lastname of all students.
5. The idnum, firstname and lastname of all students sorted by lastname

6. The idnum, firstname and lastname of all students sorted by lastname in descending
order.
7. The idnum, firstname, lastname and sex of all students
8. The idnum, firstname, lastname and sex of all female students
9. The idnum, firstname, lastname , sex and maritalstatus of all female married students
10. The firstname, lastname, maritalstatus of single and divorced students
11. The firstname, lastname, maritalstatus of those who are not single or divorced students
12. The idnum, firstname, lastname, schoolfee of all female students sorted by schoolfee
13. The lastname, firstname, schoolfee of students with schoolfee greater than $30,000 sorted
by lastname and firstname
14. The lastname, firstname, maritalstatus of students with lastname starting with the letter C
15. The lastname, firstname, maritalstatus of students with lastname not starting with the
letter C
16. The total schoolfee
17. The total schoolfee for each department
18. The total schoolfee for each department where totals exceed 30000
19. The total number of students
20. The average schoolfee

EXERCISE 4 - SELECT STATEMENT USING MORE THAN ONE TABLE
1. All fields and records in both tables
2. Firstname, lastname, department, deptname, depthead for all Students.
3. Firstname, lastname, department, deptname, depthead for all students in the CS, BA and
HET departments.
4. Firstname, lastname, depthead, maritalstatus of all married students.
5. Firstname, Lastname, deptname of all students whose lastname ends with the letter E.
6. Firstname, lastname, deptname, schoolfee of all students with schoolfee between $50,000
and $80,000
7. Average schoolfee per deptname
8. Average schoolfee per deptname where the average is between $25,000 and $50,000.
9. Total number of students in each deptname
10. Total number of students in each deptname where the department has more than 2
students

EXERCISE 5 - DISTINCT, WILDCARD contd, SUB QUERY, CREATE INDEX, DROP TABLE, DROP INDEX
1. Display the departments in the students table. Display each one only once.
2. Display the lastnames of those with a as the second letter.
3. Display the names of all students whose schoolfee is more than the average schoolfee.
4. Display the names of the students whose schoolfee is more than the average schoolfee of
those in the same department.
5. Display the names of the students who are below the average age.
6. Create an index called NAMEIDX on the students table. The index should be on lastname
and firstname. Why would you need to do this?
7. Create a unique index called SEXIDX on the students table. The index should be on sex.
Why do you get an error message?
8. Remove the index
9. Delete the table MorantBayDepts.

EXERCISE 6 REVIEW OF ALL COMMANDS

WRITE DOWN THE SQL COMMANDS FOR THE FOLLOWING THEN EXECUTE
THEM IN ORACLE/MYSQL. Writing the commands before executing them is good practice
as you will not have the computer before you in the final examination.
(NB. Please prefix all tablenames, viewnames and indexnames with your initials. E.g.
GCMOVIETYPES)
DATABASE FOR A VIDEO CLUB
1. Create a table called MOVIETYPES with the following fields:- typecode 3 characters,
typename 25 characters. The primary key of the table is typecode.
2. Create a table called OTHERMTYPES with the same structure as MOVIETYPES.
3. Create a table called MOVIES with the following fields:- movienum integer, movietitle,
20 characters, typecode 3 characters, producer 20 characters, rating 2 characters, cost 6
numbers with 2 decimal places, datepurchased date. The primary key is movienum, the
field typecode should be the foreign key to the table called MOVIETYPES.
4. You forgot the director field, please add it to the MOVIES table, it is 25 characters long.
5. You no longer need the field producer, remove it from the MOVIES table.
6. You have realized that 20 characters is not enough for the movietitle, increase it to 30.
7. Add the following data to the MOVIETYPES table: [COM, Comedy], [HOR, Horror],
[DRA, Drama], [TRA, Tragedy], [CAR, Cartoon].
8. Add the following data to the OTHERMTYPES table: [MUS, Musical], [COM,
Comedy], [DOC, Documentary].
9. Add the following data to the MOVIES table. [123, Finding Nemo, CAR, G, 1500,
01-JAN-2005, DisneyPixar], [456, Incredibles, CAR, G, 1300, 03-MAR-2006, Pixar], [789,
Pursuit of Happyness, DRA, M, 1000, 02-JAN-2007, Steven Speilberg], [111, Free Willy,
DRA, G, 900, 01-JAN-1980, John Holt], [222, Dancing with wolves, DRA, R, 1300,
04-OCT-1990, Perry Mason].
10. Add 6 more of your own records to the MOVIES table.
11. Display all records and all fields in the MOVIES table.
12. Display all records and all fields in the MOVIETYPES table.
13. Display all fields in the MOVIES table for those records who are rated G.
14. Display the movienum, movietitle of all movies.
15. Display the first 5 letters of the movietitle of all movies.
16. Display the movietitle, cost, and cost * 10 of all movies.
17. Display the movienum, movietitle of all movies sorted by rating.
18. Display the movienum, movietitle of all movies sorted by rating in descending order.
19. Display the movietitles that end with the letter S.
20. Display the movienum, movietitle of all movietitles that start with the letter F.
21. Display the movienum, movietitle of all movietitles that start with the letter F and cost
less than $2000.
22. Display the movienum, movietitle of all movietitles that start with the letter F or cost less
than $2000.
23. Display the movietitle, cost of all movies that cost between $1200 and $1400.
24. Display the total cost of the movies.
25. Display the average cost of the movies.
26. Display the highest and lowest cost of the movies.
27. Display the total cost for each movie rating.
28. Display the total cost for each movie rating where totals exceed $4000
29. Display the total number of movies.
30. Display the typecodes in the MOVIES table. Display each typecode only once.
31. Display the movietitles of the movies whose cost is more than the average cost.
32. Display all fields and records in both tables.
33. Display the movietitle, typecode and typename of all movies.
34. Display the movietitle, typecode and typename of all movies with typecodes CAR, COM
and HOR.

35. Display the movietitle, typecode and typename of all movies with typecodes CAR, COM
and HOR. Include the typecodes from the MOVIETYPES table that did not have a match
as well.
36. Change the director of movienum 111 to Robin Givens.
37. Change the price of the movienum 123 to $2500.
38. Increase the price of all movies to $1200 if the price is less than $1200.
39. Delete all movies that are rated R.
40. Display all records from both MOVIETYPES and OTHERMTYPES.
41. Display all records that are common to both MOVIETYPES and OTHERMTYPES.
42. Display the result of MOVIETYPES minus OTHERMTYPES.
43. Create an index called MTITLES on the MOVIES table. The index should be on
movietitle.
44. Remove the index called MTITLES.
45. Remove the table called OTHERMTYPES
46. Create a view called MOVIEV on the MOVIES table. It should only contain movietitle
and rating.
47. Display all of the data in MOVIEV.
48. Remove the view called MOVIEV.
49. Create another user. Give this user SELECT access to your tables.
50. Login as this user and display all fields and records in the tables.


UNIT IV: DISTRIBUTED DATABASES


Characteristics of a distributed database
A centralized system is one in which all of the data is located in a single database at a
single site. Users can log in from any location to access the database. A distributed
database is a database that is spread across a network of computers that are
geographically dispersed and connected via communication lines. The database
must have a single logical data model. A distributed database is
under the control of a central database management system or distributed database
management system (DDBMS), and its storage devices are not all attached to a
common CPU. It can also be stored on multiple computers located in the same
physical location. Examples are: SDD-1 by Computer Corporation of America, R* (based
on System R) by IBM Research, and Distributed Ingres by the University of California at Berkeley.
Definition of logical database, local and global application, global intelligence
Logical database

Logical databases are programs that read data from database tables.
Users access the distributed database through:
Local applications - applications which do not require data from other sites.
Global applications - applications which do require data from other sites.
Global Intelligence
This is a DBMS that manages the distributed database.
A distributed database works by using database links. A database link is a pointer that
defines a one-way communication path from a database server to another database
server. The link pointer is actually defined as an entry in a data dictionary table. To
access the link, you must be connected to the local database that contains the data
dictionary entry.
A database link connection is one-way in the sense that a client connected to local
database A can use a link stored in database A to access information in remote
database B, but users connected to database B cannot use the same link to access data
in database A. If local users on database B want to access data on database A, then
they must define a link that is stored in the data dictionary of database B.
A database link connection allows local users to access data on a remote database. For
this connection to occur, each database in the distributed system must have a unique
global database name in the network domain. The global database name uniquely
identifies a database server in a distributed system.
Database server
Database servers are responsible for processing SQL queries that have been generated
by the client process, and for returning the results of these queries back to the client
process that made the request.

Client-server
A client-server architecture in a distributed database is a network architecture in
which each computer or process on the network is either a client or a server or both.
Database servers are powerful computers. Clients are PCs or workstations on which
users run applications. Clients rely on database servers to process their queries. The
user will therefore use his client application to run queries. The queries will be sent to
the database server, which returns the result to the client.
Assessment of a distributed database versus a loose connection of
independent sites
1. Data that makes up the logical database is stored at multiple sites connected
by a network.
2. At least one application takes a global view of the data.
3. The global application accesses all sites at least once.
4. A global intelligence (i.e. a DBMS) exists over and above all the local
intelligence (i.e. DBMSs). Its job is to manage the distributed database as a
whole.
Terms and concepts used in distributed databases
Transparency - Does a user access all of the files in a system in the same manner,
regardless of where they reside? Care must be taken with a distributed database to
ensure that the distribution is transparent. In other words, users must be able to
interact with the system as if it were one logical system. This applies to the system's
performance and methods of access, amongst other things. The users should not
need to know at which site any given piece of data is stored. In other words, a
distributed system should look like a centralized system to the user. Transactions
are also transparent - each transaction must maintain database integrity across multiple
databases. Transactions must also be divided into sub-transactions, each
sub-transaction affecting one database system.
A DDBMS must provide certain transparency features, which serve to hide
the complexities of the distributed database from the end user. In other words, the
DDBMS should make the user think that he/she is working with a centralized
database. These transparency features are listed below:
Distribution transparency - the user does not need to know that the data is
partitioned, that it is replicated or where it is located.
Transaction transparency - this enables a transaction to update data at several
locations. In addition, if not all of the locations can be updated, the transaction
is cancelled and the data reverts to its original state.
Failure transparency - if one machine fails, the system should still continue to
operate without the user being aware that something has gone wrong.
Performance transparency - the performance of the system should not suffer
because of the distributed design (in terms of network congestion etc.).
Heterogeneity transparency - the system should allow the integration of
various DBMSs without the user being aware of all these issues.
Homogeneous distributed database - All of the sites use the same DBMS (e.g.
Oracle).

Heterogeneous distributed database - Uses multiple DBMSs. In other words, the
different sites do not have to use the same DBMS (e.g. Oracle and MS-SQL and
PostgreSQL).
The data may be distributed in several ways using the following database
concepts:
Fragmentation - Describes how a single table/file is divided among network sites.
There are three types of fragmentation, these are as follows:
a) Horizontal - contains all the attributes/fields and a subset of the
tuples/rows/records
b) Vertical - contains a subset of the columns/fields/attributes and all
the rows/records
c) Mixed - the table is fragmented both horizontally and vertically (in
other words, subsets of rows and columns).
Table Replication - Determines the distribution of tables around the network.
Some tables exist at only one site, while others are duplicated at various
sites (e.g. frequently used files that are basically static, such as a code file).
Reasons for replication
a) To maximize local availability of data
b) To provide backup copies of tables in case a particular site or part of the network fails.
Replication can introduce integrity problems. For example, data can be
changed at one site while the duplicate at another site has not yet been changed. For
frequently updated tables, replication degrades database performance because all
copies of the table must be updated regularly to maintain integrity.
Three replication conditions exist: full replication, partial replication or
no replication.
Full replication - all database fragments are replicated.
Partial replication - only some of the database fragments are
replicated.
No replication - each database fragment is stored at a single
location only.
Allocation - decides at which site each fragment (and each copy of it) is stored; it therefore combines fragmentation and replication.
Advantages and disadvantages of a distributed database
Advantages
Reflects organizational structure - database fragments are located
in the departments they relate to.
Local Processing and Autonomy - Allows local groups
(departments) to have control over their own data. Certain
processing can go on at one site and other processing at other sites,
thereby speeding up processing (parallel processing).
Cost Reduction/Economics - Less data is transmitted, so
communication costs go down because data is stored closer to the locations where
it originates. It also costs less to create a network of smaller computers
with the power of a single large computer.
Data and load sharing - Each site does its own processing rather
than overloading one site. This leads to improved performance:
data is located near the site of greatest demand, and the database
systems themselves are parallelized, allowing load on the databases
to be balanced among servers. (A high load on one module of the
database won't affect other modules of the database in a distributed
database.)
Improved Availability and Reliability - If one site fails, data may be
available on another site. A fault in one database system will only affect one
fragment, instead of the entire database.
Security - If a site is lost to fire or sabotage, the data is still available on another site (provided it has been replicated there).
Capacity and incremental growth - No single machine has to
hold all of the data. If it becomes necessary to expand the system
then it is easier to add a new computer than to upgrade one computer.
Efficiency and flexibility - If data is stored close to its normal point
of use then response times and communication costs will be reduced.
Modularity - systems can be modified, added and removed from the
distributed database without affecting other modules (systems).

Disadvantages
Distributed execution - The distributed DBMS needs to synchronize
and control processes on the various computers on the network. It is
difficult to maintain integrity because enforcing integrity over a
network may require too many network resources to be feasible.
Distributed transaction management is hard to control - There is a need for
concurrency control and recovery mechanisms to process updates
across the network and restore consistency after a crash. It is
harder to recover from backups. A difficulty may arise if one site
holding a copy is not available at the time of the update. One
solution is to designate one copy as the primary copy. The site holding it is
responsible for broadcasting the updates.
Catalog management is more difficult - The database catalog
consists of metadata in which definitions of database objects such as
tables, views (virtual tables), indexes, and user groups are stored.
Distributed DBMS schema management is very difficult - A
distributed DBMS needs data about the distributed database to
manage it. Such schemas must themselves be stored and managed in a
distributed fashion.
Complexity - Extra work must be done by the database
administrator (DBA) to ensure that the distributed nature of the
system is transparent. Extra work must also be done to maintain
multiple disparate systems, instead of one big one. Extra database
design work must also be done to account for the disconnected
nature of the database; for example, joins can become prohibitively
expensive when performed across multiple systems.
Economics - Increased complexity and a more extensive
infrastructure mean extra labour costs.
Security - Remote database fragments must be secured, and because they
are not centralized the remote sites must be secured as well. The
infrastructure must also be secured (e.g. by encrypting the network
links between remote sites).

Inexperience - distributed databases are difficult to work with,
and because the field is young there is not much readily available experience
of proper practice.

Practice Questions
1. Hewlett Limited has a distributed database. One of their sites burnt to the
ground. What advantage does Hewlett Limited have in this case?
2. PQHG Limited has millions and millions of records in their database. These
records need to be processed. Do you think it is better to place all of the
records on one computer to be processed or is it better to let several computers
share the load?
3. Geo Systems Limited has a table that contains the fields TRN, name, address,
gender and date of birth. The table is duplicated on two different sites. Mary
got married and changed her last name. Karen changed her address. Mary's
name change was made at Site A by the site manager, but he could not make
the update on Site B because of a network problem. Karen's address was
changed at Site B by the site manager but he could not make the same change
on Site A because of the same network problem. That night, both sites did a
backup. The next morning both systems crashed. The database administrator
now needs to do a restore. Which version of the table is the correct one?
4. Osbourne Inc has 2 sites, one in Kingston and the other in Montego Bay. The
distributed database has a table with the fields TRN, name, address, gender,
occupation and salary. The fields TRN, name, address and gender are located
in Kingston while TRN, name, occupation and salary are located in Montego
Bay. The payroll officer, Mr. Brown, who deals with salaries is located in
Montego Bay. He needs to create 2 queries. Query 1 shows names and
addresses of employees and Query 2 shows names and salaries of employees.
Which query does Mr. Brown need to use the network for? Which query
allows Mr. Brown to access files locally? Should there be a difference in the
way he runs or accesses either query? What is transparency? Mr. Brown
executes Query 2 very often and Query 1 very rarely. Would you redistribute
the fields or do you feel that the existing location is fine?
5. Which do you think is cheaper, a distributed database or a non-distributed
(centralized) database? Give reasons for your answer.
6. What are the advantages of a distributed database?
7. What are the disadvantages of a distributed database?
8. How does a distributed database work?

Data warehouse
The need for data analysis.
Organizations tend to grow and prosper as they gain a better understanding of their
environment. Typically, business managers must be able to track daily transactions to
evaluate how the business is performing. By tapping into the operational database,
management can develop strategies to meet organizational goals. In addition, data analysis
can provide information about short-term tactical evaluations and strategies such as: are our
sales promotions working? What market percentage are we controlling? Are we attracting
new customers? Tactical and strategic decisions are also shaped by constant pressure from
external and internal forces, including globalization, the cultural and legal environment and,
perhaps most important, technology.
Given the many and varied competitive pressures, managers are always looking for
competitive advantages through product development, service, marketing and so on.
Managers understand that their business climate is very dynamic, thus mandating their prompt
reaction to change in order to remain competitive. In other words, the decision making cycle
time is reduced. In addition, the modern business climate requires managers to approach
increasingly complex problems based on a rapidly growing number of internal and external
variables. There is therefore growing interest in creating support systems, dedicated to
facilitating quick decision making in a complex environment.
Different managerial levels have different decision support needs. For example, transaction
processing systems, based on operational databases, are tailored to serve the information
needs of people who deal with short term inventory, accounts payable or purchasing. Middle
level managers, general managers, vice-presidents and presidents focus on strategic and
tactical decision making. Such managers require detailed information designed to help them
make decisions in a complex data and analysis environment.

Data warehousing
Downloading does move data closer to the user and thereby increase its potential utility.
Unfortunately, while one or two download sites can be managed without a problem, if every
department wants to have its own source of downloaded data, the management problems
become immense. Accordingly, organizations began to look for some means of providing a
standardized service for moving data to the user and making them more useful. That
service is called data warehousing.

What is a data warehouse?


A data warehouse (DW) is a huge database that stores and manages the data required to
analyze historical and current transactions. A data warehouse contains a wide variety of
data that present a coherent picture of business conditions at a single point in time. A data
warehouse includes not only data but also tools, procedures, training, personnel and other
resources that make access to the data easier and more relevant to decision makers. The goal
of the data warehouse is to increase the value of the organization's data assets. It typically has
a user-friendly interface so users can easily interact with its data. It is designed to support
management decision making. Through a data warehouse, managers and other users access
transactions and summaries of transactions quickly and efficiently. The databases in a data
warehouse usually are quite large. Development of a data warehouse includes development of
systems to extract data from operational systems plus installation of a warehouse database
system that provides managers with flexible access to the data.


Figure 1 - A Data Warehouse (DW)

The role of the data warehouse is to store extracts from operational data and make
them available to users in a useful format. The data can be extracts from databases
and files, but can also be document images, recordings, photos and other non-scalar
data. The source data could also be purchased from other organizations. The data
warehouse stores the extracted data and also combines it, aggregates [3] it, transforms it
and makes it available to users via tools that are designed for analysis and decision
making such as OLAP (see the section "What is On-line analytical processing (OLAP)?"
below).
Evolution of the data warehouse
The origins of today's data warehouses can be traced to the reporting systems that were
popular in the 1980s. These reporting systems provided some basic answers to the end users'
questions, although the format wasn't always the most appropriate. The reporting systems that formed
the foundation of basic decision support required direct access to the operational data through
a menu interface to yield predefined report structures. Typically, the reporting system was
front-ended by a text-only presentation tool.
The next development stage produced a more sophisticated form of decision support by supplying
lightly summarized data extracted from the operational database. Such lightly summarized
data were usually stored in an RDBMS and were accessed through SQL statements via a
query tool. The SQL-based query tool provided some predefined reports and, better yet, some
ad hoc query capability. Unfortunately, to use the queries the end user had to know the details
of the underlying data structure. The presentation tool was similar to the one used by the
original reporting system, but it did provide additional customization options for ad hoc
reports. A variation on this theme of greater end user empowerment was the use of
spreadsheets or statistical packages to analyze operational data. End users used their own
desktop tools to access and manipulate data in order to support their decision making process.
Primitive as they were by current standards, these reporting systems and their extensions gave
IS departments the first major tools with which to solve decision support problems. Given
advances in hardware and software in the late 1980s and early to mid-1990s, the explosion of
available operational data, and the growing sophistication of decision support systems, data
warehouse developments were almost inevitable.

[3] A collection of, or the total of, disparate elements.

Differences between data warehouse and operational database


Characteristic: Integrated
Operational database data: Similar data can have different representations or meanings.
Data warehouse data: Provide a unified view of all data elements with a common definition
and representation for all departments.

Characteristic: Subject-Oriented
Operational database data: Data are stored with a functional or process orientation (for
example, invoices, credits, debits etc.).
Data warehouse data: Data are stored with a subject orientation that facilitates multiple
views for data and decision making (e.g. sales, products, sales by products etc.).

Characteristic: Time-Variant
Operational database data: Data represent current transactions (e.g. the sales of a
product on a given date).
Data warehouse data: Data are historic in nature. A time dimension is added to facilitate
data analysis and time comparisons.

Characteristic: Non-volatile
Operational database data: Data updates and deletes are very common.
Data warehouse data: Data cannot be changed. Data are only added periodically from
operational systems. Once data are stored, no changes are allowed.

Components of a data warehouse

Data extraction tools
Extracted data
Metadata [4] of warehouse contents
Warehouse DBMS(s) and OLAP (online analytical processing) servers
Warehouse data management tools
Data delivery programs
End-user analysis tools
User training courses and materials
Warehouse consultants

The source of the warehouse is operational data, that is, data generated by routine
transaction processing systems such as sales, registration of a student, payroll,
banking deposits/withdrawals etc. The data warehouse therefore needs tools for
extracting the data and storing them. These data, however, are not useful without
metadata, which describe the nature of the data, their origins, their format, limits on their
use and other characteristics of the data that influence the way they can and should be
used.

[4] Data about the data (such as field names, field types, validation rules etc.).

Potentially, the data warehouse contains billions of bytes of data in many different
formats. Accordingly, it needs DBMS and OLAP servers of its own to store and
process the data. In fact, several DBMS and OLAP products may be used, and the
features and functions of these may be augmented by additional in-house developed
software that reformats, aggregates [5], integrates and transfers data from one processor
to another within the data warehouse. Programs may also be needed to store and process
non-scalar data like graphics and animations.
Because the purpose of the data warehouse is to make organizational data more
available, the warehouse must include tools not only to deliver the data to the users
but also to transform the data for analysis, query and reporting, and OLAP tools for user-specified aggregation and dis-aggregation.
The data warehouse provides an important but complicated set of resources and
services. Hence the warehouse needs to include training courses, training materials,
on-line help utilities and other similar training products to make it easy for users
to take advantage of the warehouse resources. Finally, the data warehouse includes
knowledgeable personnel who can serve as consultants.
User requirements for a data warehouse
The requirements for a data warehouse are different from the requirements for a
traditional database application. For one, in a typical database application, the structure
of reports and queries is standardized. While the data in a report or query may vary
from month to month, for instance, the structure of the report or query stays the same.
Data warehouse users, on the other hand, often need to change the structure of
queries and reports.
Another difference is that users want to do their own data aggregation [6]. For
example, a user who wants to investigate the impact of different marketing campaigns
may want to aggregate product sales according to package color at one time;
according to marketing program at another time; and according to package color within
marketing program at a third time. The analyst wants the same data in each report, but
simply presented differently.
Data warehouse users also want to dis-aggregate data in their own terms, or drill
down into their data. For example, a user may be presented with a screen that shows total
product sales for a given year. The user may then want to be able to click on the data
and have them explode into sales by month; to click again and have the data explode
into sales by product by month or sales by region by product by month.
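Aggregation and drill-down as described above can be sketched in a few lines. The sales figures, the field names and the aggregate function are invented for illustration; an actual warehouse would do this work in its OLAP server rather than in a script.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, month, amount). Invented data.
sales = [
    ("North", "Mouse", "Jan", 300),
    ("North", "Mouse", "Feb", 200),
    ("North", "Diskette", "Jan", 100),
    ("South", "Mouse", "Jan", 400),
    ("South", "Diskette", "Feb", 150),
]
FIELDS = ("region", "product", "month")


def aggregate(facts, by):
    """Total the amounts, grouped by the chosen fields (in the given order)."""
    positions = [FIELDS.index(name) for name in by]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[p] for p in positions)
        totals[key] += fact[3]
    return dict(totals)


# The same data, presented at three levels of detail.
grand_total = aggregate(sales, by=[])                          # one overall figure
by_month = aggregate(sales, by=["month"])                      # drill down once
by_product_month = aggregate(sales, by=["product", "month"])   # drill down again

print(grand_total)
print(by_month)
print(by_product_month)
```

Each drill-down step simply adds one more field to the grouping key, which is why the user sees the same data "explode" into finer detail.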
Graphical output is another common requirement. Users want to see results of
geographic data in geographic form. Sales by state and province should be shown on a
map. A reshuffling of employees and offices should be shown on a diagram of office
space. These requirements are more difficult because they vary from user to user and
from task to task.
Many users of data warehouse facilities want to import warehouse data into domain-specific programs. For example, financial analysts want to import data into their
spreadsheet models and into more sophisticated financial analysis programs. Portfolio
managers want to import data into portfolio management programs, and oil drilling
engineers want to import data into seismic analysis programs. All of this importing
usually means that the warehouse data needs to be formatted in specific ways.

[5] A collection of, or the total of, disparate elements.
[6] To collect or total disparate elements.
Rules for defining a data warehouse.
The following list is made up of 12 rules that define a data warehouse. This list was created
by William H. Inmon and Chuck Kelley in 1994.
1. The data warehouse and operational environments are separated.
2. The data warehouse data are integrated.
3. The data warehouse contains historical data over a long time horizon.
4. The data warehouse data are snapshot data captured at a given point in time.
5. The data warehouse data are subject-oriented.
6. The data warehouse data are mainly read-only, with periodic batch updates from
operational data. No online updates are allowed.
7. The data warehouse development life cycle differs from classical systems
development. The data warehouse development is data driven; the classical approach
is process driven.
8. The data warehouse contains data with several levels of detail: current detail data,
old detail data, lightly summarized, and highly summarized data.
9. The data warehouse environment is characterized by read-only transactions to very
large data sets. The operational environment is characterized by numerous update
transactions to a few data entities at a time.
10. The data warehouse environment has a system that traces data sources,
transformations and storage.
11. The data warehouse's metadata [7] are a critical component of this environment. The
metadata identify and define all data elements. The metadata provide the source,
transformation, integration, storage, usage, relationships, and history of each data
element.
12. The data warehouse contains a charge-back mechanism for resource usage that
enforces optimal use of the data by end users.
The 12 rules capture the data warehouse life cycle, from its introduction as an entity separate
from the operational data store, to its components, functionality, and management processes.
The current generation of specialized decision support systems provides a comprehensive
infrastructure to design, develop, implement and use decision support systems within an
organization.

Data mart
Some organizations decide to limit the scope of the warehouse to more manageable chunks. A
data mart is a smaller version of a data warehouse, containing a database that helps a
specific group or department make decisions. Marketing and sales departments may have
their own separate data marts. Individual groups or departments often extract data from the
data warehouse to create their data marts.

[7] Data about the data (such as field names, field types, validation rules etc.).
Restricting a data mart to a particular type of data makes the management of the data
warehouse simpler and probably means that an off-the-shelf DBMS product can be used to
manage the data warehouse. Metadata [8] is also simpler and easier to maintain.
A data mart that is restricted to a particular business function, such as marketing analysis,
may have many types of data and metadata to maintain, but all of those data serve the same
type of users. Tools for managing the data warehouse and for providing data to the users can
be written with an eye toward the requirements that marketing analysts are likely to have.
A data mart that is restricted to a particular business unit or geographical area may have many
types of input and many types of users, but the amount of data to be managed is less than for
the entire company. There will also be fewer requests for service, so the data warehouse
resources can be allocated to fewer users.
The following diagram summarizes the scope of alternatives for sharing data. Data
downloading is the smallest and easiest alternative. Data are extracted from operational
systems and delivered to particular users for specific purposes. The downloaded data are
provided on a regular and recurring basis, so the structure of the application is fixed, the users
are well trained, and problems such as timing and domain inconsistencies are unlikely to
occur because users gain experience working with the same data. At the other extreme, a data
warehouse provides extensive types of data and services for both recurring and ad hoc
requests. Data marts fall in the middle. As we move from left to right, the alternatives become
more powerful but also more expensive and difficult to create.

[Diagram: alternatives arranged from left (easier) to right (more difficult):
Data Downloading; Data Marts (limited to particular data inputs, particular
business functions, or a particular business unit or geographical region);
Data Warehouse]
Figure 2 - Continuum of Enterprise Data Sharing

On-line analytical processing


What is On-line analytical processing (OLAP)?
OLAP refers to an advanced data analysis environment that supports decision making,
business modelling, and operations research activities. OLAP systems share four major
characteristics:
1. Use multidimensional data analysis techniques
2. Provide advanced database support
3. Provide easy-to-use end user interfaces
4. Support client/server architecture

OLAP is an approach to quickly answering multi-dimensional analytical queries. OLAP
is part of the broader category of business intelligence, which also encompasses
relational reporting and data mining. The typical applications of OLAP are in business
reporting for sales, marketing, management reporting, business process management
(BPM), budgeting and forecasting, financial reporting and similar areas. The term
OLAP was created as a slight modification of the traditional database term OLTP
(Online Transaction Processing).
Databases configured for OLAP use a multidimensional data model, allowing for
complex analytical and ad-hoc queries with a rapid execution time. They borrow
aspects of navigational databases and hierarchical databases that are faster than
relational databases.

[8] Data about the data (such as field names, field types, validation rules etc.).
The following shows the difference between the operational view of sales data and the
multidimensional view of sales data.
Operational View

INVOICE Table
Number  Date     Customer  Amount
2034    15/5/96  Dartonik  $3500
2035    15/5/96  INC       $1800
2036    16/5/96  Dartonik  $2000
2037    16/5/96  INC       $800

LINE Table
Number  Product   Price  Quantity
2034    Mouse     $150   20
2034    Diskette  $50    10

Multidimensional View

                    Time Dimension
Customer Dimension  15/5/96  16/5/96  Totals
Dartonik            $3500    $2000    $5500
INC                 $1800    $800     $2600
Totals              $5300    $2800    $8100

Sales figures occur at the intersection of a customer row and time column
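As a rough sketch of how the multidimensional view is derived from the operational view, the four INVOICE rows above can be summed into cells keyed by customer and date. The function and variable names are invented for illustration, and a real OLAP server would build and store such a cube itself.

```python
from collections import defaultdict

# Rows of the INVOICE table: (number, date, customer, amount).
invoices = [
    (2034, "15/5/96", "Dartonik", 3500),
    (2035, "15/5/96", "INC", 1800),
    (2036, "16/5/96", "Dartonik", 2000),
    (2037, "16/5/96", "INC", 800),
]


def build_cube(rows):
    """Sum invoice amounts into cells keyed by (customer, date)."""
    cube = defaultdict(int)
    for _number, date, customer, amount in rows:
        cube[(customer, date)] += amount
    return cube


cube = build_cube(invoices)
customers = sorted({customer for customer, _date in cube})
dates = sorted({date for _customer, date in cube})

# Totals along each dimension, as in the Totals row and column of the view.
row_totals = {c: sum(cube[(c, d)] for d in dates) for c in customers}
column_totals = {d: sum(cube[(c, d)] for c in customers) for d in dates}
grand_total = sum(row_totals.values())

for c in customers:
    print(c, [cube[(c, d)] for d in dates], row_totals[c])
print("Totals", [column_totals[d] for d in dates], grand_total)
```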

Practice Questions
1. What is the difference between operational data and a data warehouse?
2. Explain the components of a data warehouse.
3. What is OLAP?
4. Draw an example of a Multidimensional View of the data in the Education data
warehouse.

Data mining
Often, the database is distributed. Data warehouses often use a process called data mining
to find patterns and relationships among data. For example, a state government could mine its data to check whether the
number of births has a relationship to income level. Many e-commerce sites use data mining
to determine customer preferences.
Examples of data mining findings can be:
65% of customers who did not use their credit card in the last six months are 88% likely
to cancel their account.
82% of customers who bought a new TV 27" or larger are 90% likely to buy an
entertainment center within the next four weeks.
If age < 30 and income <= 25000 and credit rating < 3 and credit amount > 25000 then
the minimum loan term is 10 years.

Transactions - Atomic, Consistent, Isolated, Durable (ACID)

An understanding of transactions is essential to the database designer, especially if
he/she is designing a multiuser database. A transaction may be defined as a
group of data modifications that must be performed entirely or not at all. All
transactions must pass the ACID test:
Atomic - this property states that the transaction must be completed in its
entirety or not at all.
Consistent - this property states that the transaction should never leave the
database in an inconsistent state. This property ensures that the integrity rules
and business rules are not violated.
Isolated - this property states that the data being used by a transaction is
not accessible to other transactions until the transaction has been completed.
Durable - this property states that the data modification is permanent once the
transaction has been completed and, if the transaction is not completed, then the
system should remain in its original state.
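The "entirely or not at all" behaviour can be seen in a minimal sketch using Python's built-in sqlite3 module. SQLite is only a convenient stand-in here; the account table, the CHECK rule and the transfer function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint is the integrity rule (no negative balance).
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, 1, 2, 30)       # both updates succeed and are committed
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)  # the second update breaks the CHECK rule
after_failed = balances(conn)       # the first update has been undone as well

print(ok, after_ok)
print(failed, after_failed)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected, so the rollback is what keeps the database consistent.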
Concurrency control
The concept of concurrency control is very important when designing multiuser
databases. Concurrency control is the process of coordinating the simultaneous
executions of transactions within a multiuser environment.
The simultaneous execution of transactions becomes problematic only if the
transactions are attempting to access or modify the same data. If concurrency
control is not enforced at this point, then data inconsistencies may occur during
the process of data modification. The concept of isolation is what makes
concurrency control possible. Remember, isolation states that a transaction has
exclusive rights to the data being modified.
Conflict Table
Transaction1  Transaction2  Result
Read          Read          No conflict
Read          Write         Conflict
Write         Read          Conflict
Write         Write         Conflict
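The conflict table reduces to one rule: two operations on the same data item conflict if at least one of them is a write. A minimal sketch (the function name is invented for illustration):

```python
def conflicts(op1, op2):
    """Return True if two operations on the same data item conflict."""
    allowed = {"read", "write"}
    if op1 not in allowed or op2 not in allowed:
        raise ValueError("operations must be 'read' or 'write'")
    # Only two reads can safely run at the same time.
    return op1 == "write" or op2 == "write"


# Rebuild the conflict table from the rule.
table = {(a, b): conflicts(a, b) for a in ("read", "write") for b in ("read", "write")}
for (a, b), result in table.items():
    print(a, b, "Conflict" if result else "No conflict")
```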

Lock Level
In order to accomplish isolation the DBMS makes it possible to place a lock on a
data item. A lock is a mechanism that guarantees exclusive use of a data item. Locks
can be applied at several levels:

Database locks - all the tables within the database are exclusive to the current
transaction.

Table locks - all the rows and columns within a table are exclusive to the
current transaction.

Row locks - the selected rows are exclusive to the current transaction.

Column locks - the selected columns are exclusive to the current transaction.

Lock Type
Irrespective of the lock level, the DBMS may impose different lock types on the
data item. The simplest is the binary lock, which has only two states: locked or
unlocked. With this method each transaction must impose a lock on the data item
being accessed and must release the lock once the transaction has been completed.
The most common lock types, exclusive locks and shared locks, refine this idea by
distinguishing between reading and writing.
* An exclusive lock exists when the data item is available only to a single
transaction. The problem with using only exclusive locks is that the DBMS will not allow
two or more transactions to access the same data item for reading at the same time.
* A shared lock is one that allows two or more transactions to access the same data
item for reading purposes.
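The difference between the two lock types can be shown with a toy lock table. The LockManager class, the transaction names and the item name are assumptions made for illustration. A real DBMS would make a blocked transaction wait instead of simply returning False.

```python
class LockManager:
    """Toy lock table: an item is unlocked, share-locked or exclusively locked."""

    def __init__(self):
        self._shared = {}     # item -> set of transactions holding a shared lock
        self._exclusive = {}  # item -> the single transaction holding an exclusive lock

    def acquire_shared(self, txn, item):
        holder = self._exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction holds the item exclusively
        self._shared.setdefault(item, set()).add(txn)
        return True

    def acquire_exclusive(self, txn, item):
        holder = self._exclusive.get(item)
        if holder is not None and holder != txn:
            return False  # already exclusively locked by someone else
        if self._shared.get(item, set()) - {txn}:
            return False  # other transactions are still reading the item
        self._exclusive[item] = txn
        return True

    def release_all(self, txn):
        """Release every lock held by a transaction (done when it completes)."""
        for readers in self._shared.values():
            readers.discard(txn)
        for item in [i for i, t in self._exclusive.items() if t == txn]:
            del self._exclusive[item]


locks = LockManager()
read_1 = locks.acquire_shared("T1", "row 7")
read_2 = locks.acquire_shared("T2", "row 7")          # readers can share the item
write_3 = locks.acquire_exclusive("T3", "row 7")      # refused while readers remain
locks.release_all("T1")
locks.release_all("T2")
write_3_retry = locks.acquire_exclusive("T3", "row 7")
read_1_again = locks.acquire_shared("T1", "row 7")    # refused while T3 is writing
print(read_1, read_2, write_3, write_3_retry, read_1_again)
```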
Transaction Logs
The DBMS uses a transaction log to keep track of all the data modifications
performed by each transaction. The DBMS then uses this information to
ensure that each transaction is durable (made permanent). A typical transaction log
will store the following pieces of information:

The start of a transaction
The name of the table being modified
The primary key of the record being modified
The field that is being modified
The before and after values of the field being modified
The end of the transaction

When a system failure occurs the transaction log is checked to see which transactions
were completed and which were not. If a transaction was completed
then the DBMS ensures the durability of the system by making the after
values permanent. If the transaction was not completed then the DBMS
restores the before values, so that the data reverts to its original state.
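The recovery step described above can be sketched as follows. The log record layout, the sample employee data and the recover function are assumptions made for illustration; real DBMS logs are considerably more elaborate.

```python
# Hypothetical log records, following the pieces of information listed above:
# ("start", txn), ("update", txn, table, key, field, before, after), ("end", txn)
log = [
    ("start", "T1"),
    ("update", "T1", "employee", 101, "salary", 50000, 55000),
    ("end", "T1"),
    ("start", "T2"),
    ("update", "T2", "employee", 102, "salary", 42000, 99000),
    # system failure here: T2 never wrote its end record
]


def recover(database, log):
    """Apply after values of completed transactions and before values of the rest."""
    completed = {record[1] for record in log if record[0] == "end"}
    # Redo completed transactions in log order.
    for record in log:
        if record[0] == "update" and record[1] in completed:
            _kind, _txn, table, key, field, _before, after = record
            database[table][key][field] = after
    # Undo incomplete transactions in reverse order.
    for record in reversed(log):
        if record[0] == "update" and record[1] not in completed:
            _kind, _txn, table, key, field, before, _after = record
            database[table][key][field] = before
    return database


# Assumed state on disk at the crash: T1's change was not yet written,
# while T2's unfinished change was.
database = {"employee": {101: {"salary": 50000}, 102: {"salary": 99000}}}
recover(database, log)
print(database)
```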


UNIT V: SECURITY ISSUES


The role of the Data Dictionary
The DBMS makes use of descriptions of data items provided by the DDL (data definition language). This is data
about data (metadata). Metadata describes the structure and format of the data and the
overall database.

System tables store metadata. Contents include:
number of tables and table names, number of fields and field names, field types, field
lengths, key fields, field descriptions, files, cross references, error checks (e.g. range
checks) etc.
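As a small illustration of metadata held in system tables, the sketch below uses Python's built-in sqlite3 module. SQLite is only a convenient stand-in, other DBMSs expose their data dictionary through different system tables or views, and the employee table is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The DDL statement below supplies the descriptions that the DBMS stores as metadata.
conn.execute(
    "CREATE TABLE employee ("
    " trn INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL CHECK (salary >= 0))"
)

# sqlite_master is SQLite's system table listing the objects in the database.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, dflt_value, pk)
fields = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(employee)")
}

print(table_names)
for field_name, description in fields.items():
    print(field_name, description)
```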
The DD helps a database user in:
Communicating with other users
Controlling data elements (add fields, change descriptions, formatting).
Maintaining standards.
Determining the impact of changes to data elements on the total database
Centralizing the control of data elements as an aid in database design and in
expanding the design.
Data validation
What is data security?
In the computer industry, data security refers to techniques for ensuring that data
stored in a computer cannot be read or compromised by any individuals without
authorization. Most security measures involve data encryption and passwords. Data
encryption is the translation of data into a form that is unintelligible without a
deciphering mechanism. A password is a secret word or phrase that gives a user
access to a particular program or system. [Research Protection vs Security]
What are Security Risks?
A computer or data security risk is any event or action that could cause a loss of or
damage to computer hardware, software, data, information, or processing capability.
Security risks fall into six main categories:
• Human error
• Technical error
• Virus, worm, Trojan horse
• Natural disasters etc.
• Unauthorized use and access
• Theft and vandalism
Sources of incorrect data:
• Accidents - mistyping input or programming errors
• Malicious use of the database
• System problems - disk crash etc.

Database protection involves:
• Integrity preservation - concerned with non-malicious errors and their prevention.
• Security (access control) - concerned with restricting users so that they are
  allowed to access and/or modify only a subset of the database.
Security risks and their effects
1. Human error
Humans make mistakes. Examples of mistakes made include:
Deleting a file by accident
Formatting a hard drive
Adding data twice
Entering incorrect data
Misuse of the computer by someone who is not adequately trained or
experienced (e.g. a young child)
The effects of human error include:
Loss of data
Reduced data integrity (incorrect data), so incorrect information will be
retrieved
Physical damage to the computer due to improper use
2. Technical error
A technical error is a system failure. The failure could be caused by
hardware, software, or both. Examples include:
Hard disk crashing
Missing or corrupted files (e.g. due to not shutting down properly)
Computer not booting
Drives (diskette, CD) not working (e.g. due to dust)
The effects of technical error include:
Loss of data
Loss of time in having to re-enter data
The inability to use certain devices
3. Virus
A virus is a computer program that is designed to replicate itself by copying
itself into the other programs stored in a computer. It may be benign or have a
negative effect, such as causing a program to operate incorrectly or corrupting
a computer's memory. In addition to replication, some computer viruses share
another commonality: a damage routine that delivers the virus payload. A
virus payload is the action the virus performs on the infected computer.
The effects of viruses include:
The computer cannot boot because a boot sector virus has corrupted the boot
sector
Files are erased by the virus
Hard drive is formatted (all files are therefore lost)
Files are corrupted by the virus
Consumption of storage space and memory
Degrading performance of the computer
It's important to remember that most viruses aren't programmed with
destructive intentions. Most simply reproduce without any destructive attack.
However, these viruses can cause damage to your files, particularly since
many of the viruses are poorly written programs that can cause unintended
software conflicts. At the very least, viruses are intrusive applications that
steal storage and CPU cycles without your permission. Most people's worst
virus fear is having their hard drive erased, but those who regularly create
back-up versions of important data could recover within a few hours. Viruses
that subtly corrupt data are potentially much more destructive - computer users
may not notice their presence until a great deal of data has been ruined. Some
viruses insert random numbers in spreadsheet applications or system files, or
add typos to word processing documents. One particularly nasty virus posted
confidential documents in the user's name to Internet newsgroups. [Research
the different types of viruses]
4. Natural disasters etc
Disasters can cause physical damage to computers, thereby causing loss of the
data on the computers.
Examples of disasters (natural and otherwise) include:
Earthquake
Hurricane
Fire
Flood
Lightning
Power surge, low voltage
Rats, roaches, insects etc.
The effects of disasters include:
Physical damage to computer
Loss of data
Repair bills
5. Unauthorized access and use
Unauthorized access is the use of a computer or network without permission.
Unauthorized access includes:
Hacker/cracker - Hacker is a slang term for a computer enthusiast, i.e., a
person who enjoys learning programming languages and computer systems
and can often be considered an expert on the subject(s). Depending on how
it is used, the term can be either complimentary or derogatory, although it is
developing an increasingly derogatory connotation. The pejorative sense
of hacker is becoming more prominent largely because the popular press
has co-opted the term to refer to individuals who gain unauthorized access
to computer systems for the purpose of stealing and corrupting data.
Hackers maintain that the proper term for such individuals is cracker.
A person accessing someone else's bank account, email, medical records,
etc. without permission.

Unauthorized use is the use of a computer or its data for unapproved or
possibly illegal or unethical activities.
Unauthorized use includes:
Employees deliberately modifying data, for example to give themselves
a raise
Taking money from someone's account
Checking personal email or playing computer games on company time
Software piracy - the unauthorized copying of software.
The effects of unauthorized access and use are as follows:
Loss of sales due to piracy
A competing entity could use the data against your company
Loss of time
Identity theft
Theft of intellectual property, theft of marketing information (e.g., customer
lists, pricing data, or marketing plans), or blackmail based on information
gained from computerized files (e.g., medical information, personal history,
or sexual preference). Intellectual property refers to the category of
intangible (non-physical) property comprising primarily copyright, moral
rights related to copyrighted materials, trademark, patent and industrial design.
6. Theft and vandalism
A computer can be physically stolen or destroyed. This also causes loss of
data.
The effects of theft and vandalism include:
Loss of computer and data (and time to re-enter etc.)
Illegal access to files
Loss of income due to software piracy.
Database protection methods - backup and restore methods
Backup is the key - the ultimate safeguard
Regardless of the precautions that you take, things can still go wrong. Backup is
therefore the main risk management solution. A backup is a duplicate of a file or disk
that can be used if the original is lost, damaged, or destroyed. If your computer fails,
you can restore from the backup. The following describes the different types of
backup.

• Full backup - copies all of the files in a computer (also called archival
  backup)

• Incremental backup - copies only the files that have changed since the last
  full or last incremental backup
• Differential backup - copies only the files that have changed since the last full
  backup
• Selective backup - allows a user to choose specific files to back up,
  regardless of whether or not the files have changed since the last backup
• Grandfather, Father, Son (or three-generation) backup - a method in which
  you recycle 3 sets of backups. The oldest backup is called the grandfather, the
  middle backup is the father and the latest backup is called the son. Each time that
  you back up, you reuse the oldest backup medium. The father then becomes the
  grandfather, the son becomes the father and the new backup becomes the son.
  This method allows you to have the last 3 backups at all times.
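
The difference between the backup types is easiest to see by listing which files each one would copy. The following Python sketch is illustrative only: the function name, the file names and the use of plain numbers as times are all assumptions made for the example. The last few lines model the grandfather-father-son rotation with a queue that holds at most three backups.

```python
from collections import deque


def files_to_back_up(kind, modified_at, last_full, last_backup):
    """Return the files a backup of the given kind would copy.

    modified_at maps file name -> time of last change. last_full is the time
    of the last full backup; last_backup is the time of the most recent full
    or incremental backup. All times are plain numbers in this sketch.
    """
    if kind == "full":
        return sorted(modified_at)
    if kind == "incremental":      # changed since the last full OR incremental backup
        cutoff = last_backup
    elif kind == "differential":   # changed since the last full backup
        cutoff = last_full
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, changed in modified_at.items() if changed > cutoff)


# Hypothetical timeline: full backup at time 10, incremental backup at time 20.
modified_at = {"payroll.dbf": 5, "students.dbf": 15, "grades.dbf": 25}
full = files_to_back_up("full", modified_at, last_full=10, last_backup=20)
incremental = files_to_back_up("incremental", modified_at, last_full=10, last_backup=20)
differential = files_to_back_up("differential", modified_at, last_full=10, last_backup=20)

# Grandfather-father-son: keep three generations; a fourth backup reuses the
# medium of the oldest one, so the grandfather drops out.
generations = deque(maxlen=3)      # grandfather on the left, son on the right
for label in ["backup 1", "backup 2", "backup 3", "backup 4"]:
    generations.append(label)
```

Here the incremental backup copies only grades.dbf, while the differential backup also copies students.dbf because that file changed after the last full backup.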

Integrity Preservation - keys (primary and foreign), data validation,
authority levels
Keys
Since primary keys do not allow null or duplicate values, they prevent the data
entry person from entering the same record more than once or from entering a
record with no unique identifier. Since the DBMS automatically creates an index
on the primary key, it can also locate records faster.
The power of a database system comes from its ability to quickly find and bring
together information stored in separate tables using queries, forms, and reports.
In order to do this, each table should include a field or set of fields that uniquely
identifies each record stored in the table.

Uniqueness of key - This prevents duplication, e.g. no two students should
have the same id number.
Referential integrity (must match foreign key) - Ensures that related records
in separate tables have a match on the common field.
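
The two rules above can be watched in action with a short Python sketch. It reuses two table names from the sample database in these notes but runs them on SQLite (through the sqlite3 module) rather than Oracle, which is an assumption made so the example is self-contained. The rejected helper and the branch name 'Portmore' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE dmot_branch ("
             " branch_name VARCHAR(20) PRIMARY KEY,"
             " branch_city VARCHAR(20))")
conn.execute("CREATE TABLE dmot_account ("
             " account_number CHAR(5) PRIMARY KEY,"
             " branch_name VARCHAR(20) REFERENCES dmot_branch (branch_name))")
conn.execute("INSERT INTO dmot_branch VALUES ('Jamaica', 'Jamaica')")
conn.execute("INSERT INTO dmot_account VALUES ('A-102', 'Jamaica')")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Uniqueness of key: a second account numbered A-102 is refused.
duplicate_key_rejected = rejected("INSERT INTO dmot_account VALUES ('A-102', 'Jamaica')")
# Referential integrity: no branch called 'Portmore' exists, so the account is refused.
orphan_rejected = rejected("INSERT INTO dmot_account VALUES ('A-999', 'Portmore')")
# A row with a new key and an existing branch is still accepted.
valid_rejected = rejected("INSERT INTO dmot_account VALUES ('A-107', 'Jamaica')")
```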
Data Validation
What is data validation? Data validation is the process of comparing data
with a set of rules or values to find out if the data is correct.
What is a validation rule? Validation rules, also called validity checks, are
checks performed on the data to ensure that the user is entering the correct
data.
What is the purpose of a validation rule? Validation rules reduce data entry
errors. They do this by limiting what the user is allowed to enter in a particular
field.
The various types of validity checks include:

Valid values List

The data in the field is limited to a certain list of values. For example, sex can
only be male or female, marital status can only be single, married, widowed or
divorced.

Range check
A range check determines whether a number is within a specified range (e.g.
5 to 9).

Alphabetic/numeric check (Data type check)
Alphabetic check - Ensures that users enter only alphabetic data into a field.
Numeric check - Ensures that users enter only numeric data into a field.

Field size check
Data that is entered into a field can also be limited by size. For example,
your student id number is made up of 6 characters. The user should therefore
not be allowed to enter a student id number that has more than 6 characters.

Consistency check
This tests the data in two or more associated fields to ensure that the
relationship is logical. For example, the value in a Training_Date field cannot
occur earlier in time than the value in the Date_Joined field.

Completeness check
Verifies that a required field contains data. For example, every student must
have a first and last name entered.

Check Digit
A number or character that is appended to or inserted into a primary key value.
A check digit often confirms the accuracy of a primary key value. Bank
account, credit card and other identification numbers often include one or
more check digits.
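
The validity checks listed above can be sketched as one Python function that returns the errors found in a record. Everything specific here is an assumption made for illustration: the field names, the rule that a student id is exactly 6 digits, and the use of a "level" field for the 5 to 9 range. The check digit test uses the Luhn algorithm, which is one common scheme (credit card numbers use it); other identification numbers may use different schemes.

```python
import re
from datetime import date

MARITAL_STATUSES = {"single", "married", "widowed", "divorced"}


def luhn_valid(number):
    """Check-digit test (Luhn algorithm) for a string made up of digits."""
    total = 0
    # Starting from the rightmost digit, double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def validate_student(record):
    """Return a list of validation errors for one hypothetical student record."""
    errors = []
    # Completeness check: required fields must contain data.
    for field in ("first_name", "last_name"):
        if not record.get(field):
            errors.append(f"{field} is required")
    # Alphabetic check (data type check).
    if record.get("last_name") and not record["last_name"].isalpha():
        errors.append("last_name must be alphabetic")
    # Field size check combined with a numeric check: exactly 6 digits.
    if not re.fullmatch(r"\d{6}", record.get("student_id", "")):
        errors.append("student_id must be 6 digits")
    # Valid values list.
    if record.get("marital_status") not in MARITAL_STATUSES:
        errors.append("marital_status is not in the list of valid values")
    # Range check.
    if not 5 <= record.get("level", 0) <= 9:
        errors.append("level must be between 5 and 9")
    # Consistency check: training cannot take place before the person joined.
    if record["training_date"] < record["date_joined"]:
        errors.append("training_date is earlier than date_joined")
    return errors


good = {
    "first_name": "Susan", "last_name": "Brown", "student_id": "123456",
    "marital_status": "single", "level": 7,
    "date_joined": date(2009, 9, 1), "training_date": date(2010, 1, 15),
}
bad = dict(good, student_id="1234567", marital_status="unknown",
           level=12, training_date=date(2009, 1, 1))

good_errors = validate_student(good)
bad_errors = validate_student(bad)
```

The second record breaks four rules (field size, valid values, range and consistency), so four error messages are returned for it and none for the first.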
Authority Levels
Authority levels are used to limit access (only certain users can perform
certain tasks). This is done for example through login ids and passwords. One
user may have Add/Change authority while another has Delete authority.
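
A minimal sketch of authority levels, assuming a simple lookup from user to permitted operations. The user names and the is_authorized function are hypothetical; a real DBMS stores this information in its own catalog after the user has logged in with an id and password.

```python
# Hypothetical users and the operations each one is authorized to perform.
AUTHORITY = {
    "clerk": {"add", "change"},
    "manager": {"delete"},
}


def is_authorized(user, operation):
    """Return True only if the user has been given authority for the operation."""
    return operation in AUTHORITY.get(user, set())


clerk_can_add = is_authorized("clerk", "add")
clerk_can_delete = is_authorized("clerk", "delete")
stranger_can_add = is_authorized("guest", "add")   # unknown users get no authority
```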

Security Control - unauthorized access and use, encryption, anti-virus,
firewall, SQL views
A security control is an action taken to either prevent a data security risk from
happening or to reduce its effects. Security controls help to preserve the integrity of
data.

Unauthorized access is the use of a computer or network without permission.
Unauthorized use is the use of a computer or its data for unapproved or possibly
illegal activities (e.g. playing games or surfing the net on company time).

Security controls include:
• Data validation
• Reduction of human interaction (because humans make mistakes). In other words,
  automate as many processes as possible. For example, use a bar code reader to
  scan in the items rather than have the cashier type in the item code.
• Training of users so that human error is reduced.
• Supervision of children and inexperienced users.

• Separation of duties (e.g. a cashier enters the data and another person is needed
  to change it). This helps to prevent employees from making mistakes, committing
  fraud or stealing from the company.
• Backup - in case the hardware fails, or a virus or other problem causes loss of
  files. An offsite backup, which is kept at a different location from the computer,
  protects in cases of disaster. You can also use mirrored disks, in which data is
  saved to more than one disk; if one disk crashes the other takes over.
  [Research RAID]
• Buy quality hardware from a reputable dealer to reduce the likelihood of hardware
  failure.
• Get a warranty when purchasing a computer, so that a computer with a technical
  error can be fixed free of cost during the warranty period.
• Air conditioning to keep the computer cool
• Plastic dust covers to keep dust out of diskette drives etc.
• Proper (sturdy) desk on which to place the computer
• Proper diskette care (e.g. keep diskettes away from magnets and do not open the
  shutter) to prevent data from being erased
• Proper maintenance (care), e.g. defragmenting the disk, cleaning the computer
• Regular testing of hardware and software
• Virus protection (e.g. McAfee, Norton Antivirus). Anti-virus software detects and
  removes viruses. The software must, however, be updated regularly as new viruses
  are created each day. Write-protect diskettes when only reading from them (not
  saving), so that a virus cannot be written to them.
• Limit software downloads to reduce the likelihood of getting a virus.
• Use only authorized media for loading data and software.
• Do not open unknown email and attachments, to avoid getting a virus.
• Use a firewall - a program and/or hardware that filters the data coming through the
  internet to prevent unauthorized access. Some firewalls also protect systems from
  viruses and junk email (spam). (e.g. Black Ice, Zone Alarm)
• Place the computer site in a good location (e.g. not on a hillside or near the sea)
• Strong, weatherproof facilities (no windows, fireproof)
• No food or drink around the computer, to avoid insects, spills on the keyboard etc.
• Raised (false) floors - similar to a false ceiling except that it is below your feet. It
  is used for earthquake protection as it works as a shock absorber. Raised floors also
  allow you to hide cables below.
• UPS (Uninterruptible Power Supply) - this has a battery which charges while
  there is power. It gives you time to shut down the computer properly when there is
  a power cut. This is different from a generator, which runs on gas and is used during
  a power cut; a generator allows you to continue using the computer for as long as
  there is gas. The UPS is important because improper shutdown can corrupt files. The
  UPS also provides protection from power surges.
• Surge protectors to protect against low voltage, power surges/spikes, lightning etc.
• Lightning rod to protect the building and all electrical devices within the building
  from lightning storms.
• Fire extinguishers suitable for electrical equipment (e.g. carbon dioxide). These will
  not damage the computers, whereas water would cause damage.
• Insurance of equipment in order to re-purchase if your computer is destroyed.
• Access codes and passwords to prevent unauthorized access and use. Biometric
  devices can also be used, e.g. retinal scan, fingerprint scan, voice recognition.

• Intrusion detection software - detects if you enter the wrong password more than
  3 times and locks you out. (Consider what happens when you try to enter a false
  telephone card number, or the wrong PIN for your debit card at the ATM.)
• Audit trails and logs - audit trails keep track of what a user does while on the
  system, while log systems keep track of user sign on/off.
• Physical security, e.g. locks, guards, grills etc. Physical isolation of data.
• Encryption of data - encoding data so that it means nothing to hackers if they get
  into the system.
• Time and location controls - a user can only use the system at certain times and in
  certain locations (users cannot hide and do wrong things).
• Proper distribution and disposal - reports should be distributed to the correct users;
  this reduces unauthorized access and use. Shred reports; do not just throw them
  in the garbage (e.g. do not throw away credit card statements). This prevents persons
  from going through your garbage and getting your private information.
• Go to reputable web sites so that your credit card number will not be stolen. Go to
  secure sites (shown by a lock at the bottom of the screen).
• Copyright and license agreements, so that you have the right to sue persons who
  steal your software/data. (Patents/Trademarks)
• Auditing the programs that are written, in case an unscrupulous employee has
  deliberately put in code for his own benefit.
• Callback systems - the user can connect to the computer only after the computer
  calls the user back at a previously established telephone number.
• Metal detectors to prevent hardware theft
• Lock the computer to the desk
• Low profile facilities (no overt disclosure of the high-value nature of the site; in
  other words, do not display a sign to let persons know where your computer
  facilities are)
• Mark your computers in a secret place so that you can identify them if the police
  find them. (Keep the receipt/invoice as proof of purchase and to have a record of
  the serial number.)
• Views/virtual tables - the user is able to see only certain fields/records.
• Grant and Revoke - allow users to have only certain types of privileges, e.g. update,
  select, delete.
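
The last two controls in the list can be illustrated with a view that hides the balance field and shows the records of one branch only. The sketch runs on SQLite through Python's sqlite3 module, which is an assumption made to keep it self-contained; the view name jamaica_accounts is hypothetical. SQLite has no GRANT or REVOKE statements, so that part is not shown in code. In a DBMS such as Oracle, a user would typically be given a privilege on the view rather than on the underlying table, with a statement of the form GRANT SELECT ON jamaica_accounts TO some_user, and the privilege would be withdrawn with the matching REVOKE ... FROM some_user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dmot_account ("
             " account_number CHAR(5) PRIMARY KEY,"
             " branch_name VARCHAR(20),"
             " balance NUMERIC)")
conn.executemany("INSERT INTO dmot_account VALUES (?, ?, ?)",
                 [("A-101", "East Village", 500000),
                  ("A-102", "Jamaica", 200000),
                  ("A-107", "Jamaica", 100000)])

# The view exposes only two fields, and only the records of one branch.
conn.execute("CREATE VIEW jamaica_accounts AS"
             " SELECT account_number, branch_name"
             " FROM dmot_account WHERE branch_name = 'Jamaica'")

cursor = conn.execute("SELECT * FROM jamaica_accounts ORDER BY account_number")
visible_fields = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()
```

A user who queries the view sees the account numbers and branch name for the Jamaica branch but never the balances or the East Village account.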

Copyright G. Campbell 2010

120

Database Management

SAMPLE SQL CODE FOR RECREATING DATABASE


-- Remove any existing copies of the tables. Tables that hold foreign keys
-- are dropped before the tables they reference.
DROP TABLE dmot_depositor;
DROP TABLE dmot_borrower;
DROP TABLE dmot_account;
DROP TABLE dmot_loan;
DROP TABLE dmot_branch;
DROP TABLE dmot_customer;

-- Parent tables
CREATE TABLE dmot_branch (
    branch_name     varchar2(20),
    branch_city     varchar2(20),
    assets          number,
    primary key (branch_name));
CREATE TABLE dmot_customer (
    customer_name   varchar2(20),
    customer_street varchar2(20),
    customer_city   varchar2(20),
    primary key (customer_name));

-- Accounts and loans each belong to a branch
CREATE TABLE dmot_account (
    account_number  char(5),
    branch_name     varchar2(20),
    balance         number,
    primary key (account_number),
    foreign key (branch_name) references dmot_branch);
CREATE TABLE dmot_loan (
    loan_number     char(5),
    branch_name     varchar2(20),
    amount          number,
    primary key (loan_number),
    foreign key (branch_name) references dmot_branch);

-- Linking tables: which customer holds which account or loan
CREATE TABLE dmot_depositor (
    account_number  char(5),
    customer_name   varchar2(20),
    primary key (customer_name, account_number),
    foreign key (customer_name) references dmot_customer,
    foreign key (account_number) references dmot_account);
CREATE TABLE dmot_borrower (
    loan_number     char(5),
    customer_name   varchar2(20),
    primary key (customer_name, loan_number),
    foreign key (customer_name) references dmot_customer,
    foreign key (loan_number) references dmot_loan);

-- Sample data: parent rows are inserted before the rows that reference them
INSERT INTO dmot_branch VALUES ('Brooklyn Heights', 'Brooklyn', 200000000);
INSERT INTO dmot_branch VALUES ('Park Slope', 'Brooklyn', 150000000);
INSERT INTO dmot_branch VALUES ('East Village', 'New York', 300000000);
INSERT INTO dmot_branch VALUES ('Jamaica', 'Jamaica', 180000000);
INSERT INTO dmot_branch VALUES ('SOHO', 'New York', 220000000);
INSERT INTO dmot_customer VALUES ('Adams', 'Jay St', 'Brooklyn');
INSERT INTO dmot_customer VALUES ('Bob', '112th St', 'Jamaica');
INSERT INTO dmot_customer VALUES ('Christina', '7th Ave', 'Brooklyn');
INSERT INTO dmot_customer VALUES ('Johnson', 'Broadway', 'New York');
INSERT INTO dmot_customer VALUES ('Joe', 'Park Ave', 'New York');
INSERT INTO dmot_customer VALUES ('Susan', 'Canal St', 'New York');
INSERT INTO dmot_account VALUES ('A-101', 'East Village', 500000);
INSERT INTO dmot_account VALUES ('A-102', 'Jamaica', 200000);
INSERT INTO dmot_account VALUES ('A-103', 'East Village', 150000);
INSERT INTO dmot_account VALUES ('A-104', 'Park Slope', 450000);
INSERT INTO dmot_account VALUES ('A-105', 'East Village', 350000);

INSERT INTO dmot_account VALUES ('A-106', 'Brooklyn Heights', 50000);
INSERT INTO dmot_account VALUES ('A-107', 'Jamaica', 100000);
INSERT INTO dmot_account VALUES ('A-108', 'Park Slope', 220000);
INSERT INTO dmot_loan VALUES ('L-101', 'Park Slope', 120000);
INSERT INTO dmot_loan VALUES ('L-102', 'SOHO', 200000);
INSERT INTO dmot_loan VALUES ('L-103', 'Jamaica', 100000);
INSERT INTO dmot_loan VALUES ('L-104', 'Park Slope', 180000);
INSERT INTO dmot_loan VALUES ('L-105', 'East Village', 100000);
INSERT INTO dmot_loan VALUES ('L-106', 'Jamaica', 150000);
INSERT INTO dmot_depositor VALUES ('A-101', 'Susan');
INSERT INTO dmot_depositor VALUES ('A-102', 'Adams');
INSERT INTO dmot_depositor VALUES ('A-103', 'Joe');
INSERT INTO dmot_depositor VALUES ('A-104', 'Bob');
INSERT INTO dmot_depositor VALUES ('A-105', 'Susan');
INSERT INTO dmot_depositor VALUES ('A-106', 'Johnson');
INSERT INTO dmot_depositor VALUES ('A-107', 'Susan');
INSERT INTO dmot_depositor VALUES ('A-108', 'Bob');
INSERT INTO dmot_borrower VALUES ('L-101', 'Joe');
INSERT INTO dmot_borrower VALUES ('L-102', 'Christina');
INSERT INTO dmot_borrower VALUES ('L-103', 'Johnson');
INSERT INTO dmot_borrower VALUES ('L-104', 'Bob');
INSERT INTO dmot_borrower VALUES ('L-105', 'Adams');
INSERT INTO dmot_borrower VALUES ('L-106', 'Bob');

NB. You will need to create a similar text file and execute it each time you need to
recreate your tables and data quickly.
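
In Oracle's SQL*Plus, a saved script is normally run by typing @ followed by the file name. As a self-contained illustration of the same idea, the Python sketch below saves a shortened version of the script to a text file and executes it against SQLite. The file name recreate.sql, the recreate function and the shortened script are hypothetical, and DROP TABLE IF EXISTS is SQLite syntax used here so the script can be run repeatedly without errors.

```python
import os
import sqlite3
import tempfile

# A shortened, hypothetical version of the script, to be saved as a text file.
SCRIPT = """
DROP TABLE IF EXISTS dmot_account;
DROP TABLE IF EXISTS dmot_branch;
CREATE TABLE dmot_branch (
    branch_name varchar2(20), branch_city varchar2(20),
    assets number, primary key (branch_name));
CREATE TABLE dmot_account (
    account_number char(5), branch_name varchar2(20),
    balance number, primary key (account_number),
    foreign key (branch_name) references dmot_branch);
INSERT INTO dmot_branch VALUES ('Jamaica', 'Jamaica', 180000000);
INSERT INTO dmot_account VALUES ('A-102', 'Jamaica', 200000);
"""


def recreate(conn, script_path):
    """Run every statement in the script file, rebuilding tables and data."""
    with open(script_path, encoding="utf-8") as script_file:
        conn.executescript(script_file.read())


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "recreate.sql")
    with open(path, "w", encoding="utf-8") as out:
        out.write(SCRIPT)
    conn = sqlite3.connect(":memory:")
    recreate(conn, path)
    recreate(conn, path)   # running it again rebuilds the same starting state
    account_count = conn.execute("SELECT COUNT(*) FROM dmot_account").fetchone()[0]
```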


