
UNIT I

1.Database system

A database system is essentially a computer-based record-keeping system, that is, a system whose overall purpose is to record and maintain information. The information concerned can be anything that is deemed to be of significance to the organization the system serves, and it may be used in the decision-making processes involved in managing that organization.

A database system involves four major components: data, hardware, software and users.

Fig: Simplified picture of a database system (users, either application programs or end users, access the database through the database management system)

Data
The data stored in the system is partitioned into one or more databases. A database is a repository for stored data; in general, it is both integrated and shared.
Integrated: By integrated we mean that the database can be thought of as a unification of several otherwise distinct files, with any redundancy among those files eliminated.
Example: A single database holding both the EMPLOYEE and the ENROLLMENT data files.
Shared: By shared we mean that individual pieces of data in the database can be shared among different users, that is, many users can have access to the same piece of data.
Example: The department information in the EMPLOYEE file would be shared by users in the personnel department, the education department, etc.

Hardware

The hardware consists of the secondary storage devices (disks, drums, etc.) on which the database resides, together with the associated devices, control units, channels and so forth.

Software

Between the physical database and the users of the system is a layer of software, usually called the DBMS. All requests from users for access to the database are handled by the DBMS. One general function provided by the DBMS is thus to shield database users from hardware-level details. The DBMS provides a view of the database that is elevated somewhat above the hardware level, and it supports user operations that are expressed in terms of that higher-level view.

Users

We consider three broad categories of database users:

*application programmers
*end-users
*DBA

1.Application programmers
The application programmer is responsible for writing application programs that use the database. These programs operate on the data in all the usual ways: retrieving information, creating new information, and deleting or changing existing information.

2.End-users
End-users access the database from a terminal. An end-user may employ a
query language provided as an integral part of the system or may invoke a user-
written application program that accepts commands from the terminal and in turn
issues requests to the DBMS on the end-user’s behalf.

3.Database Administrator

One of the main reasons for using a DBMS is to have central control of both the data and the programs that access those data. The person who has such control over the system is called the DBA. The main functions of the DBA are
*Schema definition
*Storage structure and access-method definition
*Schema and physical-organization modification
*Granting of authorization for data access
*Integrity-constraint specification
These are the various components of a database system.

2.Operational data

A database is a collection of stored operational data used by the application systems of some particular enterprise, where "enterprise" is a convenient generic term for any reasonably self-contained commercial, scientific, technical or other organization.
Examples: manufacturing company, bank, hospital, university, government department, etc.
An enterprise necessarily maintains a lot of data about its operation. The operational data for the enterprises quoted above would include, respectively, product data, account data, patient data, student data and planning data.

Example for the illustration of operational data

Consider a manufacturing company. The enterprise will wish to retain information about the projects it has on hand; the parts used in those projects; the suppliers who supply the parts; the warehouses in which the parts are stored; the employees who work on the projects; and so on. These are the basic entities about which data is recorded in the database (an entity is any distinguishable object). In general, there will also be associations, or relationships, linking the basic entities together.

For example, there is an association between suppliers and parts: each supplier supplies certain parts and, conversely, each part is supplied by certain suppliers.

Fig: An example of operational data (entities: suppliers, parts, projects, warehouses, employees, departments, locations)


The figure illustrates the following points.

1.An association may involve two entities or more than two. For example, the arrow connecting suppliers, parts and projects represents a three-way association: supplier S2 supplies part P4 to project J3.

2.One arrow involves only one type of entity (parts): some parts are components of other parts (for example, a screw is a component of a hinge assembly or a chair).

3.Some entities may be associated in more than one relationship. For example, projects and employees are linked in two relationships:
a. the employee works on the project
b. the employee is the manager of the project
This example illustrates operational data and the associations within it.

3.Data Independence

The ability to modify a schema definition at one level without affecting a schema definition at the next higher level is called data independence.
Applications written without a DBMS are typically data-dependent. This means that the way in which the data is organized in secondary storage and the way in which it is accessed are both dictated by the requirements of the application, and moreover that knowledge of the data organization and access technique is built into the application logic.
For example, if a file is stored in indexed sequential form, the application must know that the index exists and how the file is sequenced. Such an application is data-dependent: changing the storage structure requires the application program to be rewritten.
In a database system, the data is independent of the applications, so a modification at the physical or conceptual level need not affect the application programs.

There are two types of data independence:

1.Physical data independence

Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten. Modifications at the physical level are occasionally necessary to improve performance.
Example: creating a new index on a file, or changing the storage organization of a file.

2.Logical data independence

Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten.
Example: adding a new field (column) to a record type in the database.
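Both kinds of data independence can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module as a stand-in DBMS (SQLite is an assumption; the notes do not name a product), and the supplier table and the function suppliers_in are hypothetical. Creating an index is a physical-level change and adding a column is a logical-level change; the application function returns the same result after both.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (sno TEXT PRIMARY KEY, sname TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [("S1", "Smith", "London"), ("S2", "Jones", "Paris"), ("S3", "Blake", "Paris")],
)


def suppliers_in(city):
    # The "application program": it names only the columns it needs
    # and says nothing about how the data is stored or accessed.
    rows = conn.execute(
        "SELECT sno, sname FROM supplier WHERE city = ? ORDER BY sno", (city,)
    )
    return rows.fetchall()


before = suppliers_in("Paris")

# Physical change: add a new access path (an index). The application is unchanged.
conn.execute("CREATE INDEX supplier_city_idx ON supplier (city)")
after_index = suppliers_in("Paris")

# Logical change: add a new field to the record type. The application is still unchanged.
conn.execute("ALTER TABLE supplier ADD COLUMN status INTEGER")
after_column = suppliers_in("Paris")

print(before == after_index == after_column, after_column)
# True [('S2', 'Jones'), ('S3', 'Blake')]
```

The application keeps working because it depends only on the columns it names, not on the existence of indexes or on the complete list of columns.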
Most of these modifications are made by the DBA, and the types of change the DBA may wish to make can be explained with the help of the following definitions:

Stored field: Stored field is the smallest unit of data stored in the database.

Ex., a database containing information about parts would probably include a stored field type called part number.

Stored record: Stored record is a named collection of associated stored fields.


Stored file: Stored file is the collection of all occurrences of one type of stored record.

Similarly, changes to the way a stored field is represented are also made by the DBA. The stored data may be changed in any of the following respects.

1.Representation of numeric data


Data may be stored in internal arithmetic form or as a character string.

2.Representation of character data


A character field may be stored in any of several character codes (e.g., EBCDIC, ASCII).

3.Units for numeric data


The units in a numeric field may change, for example from inches to centimeters.

4.Data coding
In some situations it may be desirable to represent data in storage by coded values. Ex., the part color RED might be stored as the code 1, where 1 = 'RED'.

5.Structure of stored records


Two existing types of stored record may be combined into one. For ex., the record types (part number, color) and (part number, weight) may be integrated to give (part number, color, weight).
Conversely, a single type of stored record may be split into two. For ex., (part number, color, weight) may be broken down into (part number, color) and (part number, weight).
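The combining and splitting of stored record types can be sketched in Python as follows. The dictionaries keyed by part number stand in for stored files, and the sample values and function names are hypothetical. In a DBMS, such a restructuring would be absorbed by the conceptual/internal mapping, so applications would keep seeing the same conceptual records.

```python
# Two existing stored record types, each keyed by part number (hypothetical values).
part_color = {"P1": "Red", "P2": "Green", "P3": "Blue"}
part_weight = {"P1": 12, "P2": 17, "P3": 17}


def combine(colors, weights):
    """Integrate (part number, color) and (part number, weight) into (part number, color, weight)."""
    return {pno: (colors[pno], weights[pno]) for pno in colors if pno in weights}


def split(combined):
    """Break (part number, color, weight) back into (part number, color) and (part number, weight)."""
    colors = {pno: color for pno, (color, _) in combined.items()}
    weights = {pno: weight for pno, (_, weight) in combined.items()}
    return colors, weights


combined = combine(part_color, part_weight)
print(combined["P1"])                                # ('Red', 12)
print(split(combined) == (part_color, part_weight))  # True
```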

6.Structure of stored files

A given stored file may be physically implemented in storage in a wide variety of ways.
For ex., the file may be stored on a single storage volume or spread across several volumes.
Because such changes to the storage structure can be made without affecting existing applications, the database is able to grow without impairing those applications.

4.Architecture for a Database system

The architecture is divided into three general levels: internal, conceptual and external.

Fig: Three levels of architecture (external level: individual user views; conceptual level: community user view; internal level: storage view)

*Internal level(Physical level)

This level is the one closest to the physical storage. It is a low-level representation of the entire database; it consists of many occurrences of each of many types of internal record. The storage view is described by means of the internal schema, which not only defines the various stored record types but also specifies what indexes exist, how stored fields are represented, what physical sequence the stored records are in, and so on.

*Conceptual level (Community logical level)

This level is the representation of the entire information content of the database. It
consists of many occurrences of each of many types of conceptual record. Also this is a
level of indirection between the other two levels.

*External level(user logical level)

This level is closest to the users and is concerned with the way the data is seen by the individual users. The users may be application programmers, end-users or the DBA. Each user has a language at his or her disposal for interacting with the database.
For the application programmer, that language will be a conventional programming language such as C++ or Java.
For the end user, it will be either a query language or some special-purpose language. In either case, the language includes a data sublanguage (DSL), which is the subset of the total language that is concerned with database objects and operations. The DSL is embedded within the corresponding host language. A given system might support any number of host languages and any number of data sublanguages; however, one particular data sublanguage that is supported by almost all current systems is SQL.

Any given data sublanguage is a combination of at least two subordinate languages: a data definition language (DDL) and a data manipulation language (DML). The DDL portion consists of declarative constructs and the DML portion consists of executable statements.
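The split between the two sublanguages can be illustrated with SQL embedded in Python through the standard sqlite3 module: Python plays the role of the host language and SQL the data sublanguage. The part table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: a declarative statement that defines a database object.
conn.execute("""
    CREATE TABLE part (
        pno    TEXT PRIMARY KEY,
        color  TEXT,
        weight INTEGER
    )
""")

# DML: executable statements that insert, update, delete and retrieve data.
conn.execute("INSERT INTO part VALUES ('P1', 'Red', 12)")
conn.execute("INSERT INTO part VALUES ('P2', 'Green', 17)")
conn.execute("UPDATE part SET weight = 14 WHERE pno = 'P1'")
conn.execute("DELETE FROM part WHERE pno = 'P2'")
rows = conn.execute("SELECT pno, color, weight FROM part").fetchall()
print(rows)  # [('P1', 'Red', 14)]
```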
The individual user will generally be interested only in some portion of the total database; moreover, that user's view of that portion will generally be somewhat abstract when compared with the way the data is physically stored. The term for an individual user's view is an external view. An external view is thus the content of the database as seen by some particular user.

For example,
A user in the Personnel Department might see employee and department details and nothing else.
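An external view can be sketched as an SQL view. In the example below (SQLite via Python's sqlite3 module, with hypothetical emp and dept tables), the base tables stand for the conceptual level, and personnel_view gives the Personnel user employee and department names only, hiding salaries and budgets. In SQL, the view definition roughly plays the role of both the external schema and the external/conceptual mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the community view of the data (hypothetical tables).
    CREATE TABLE dept (dno TEXT PRIMARY KEY, dname TEXT, budget INTEGER);
    CREATE TABLE emp (eno TEXT PRIMARY KEY, ename TEXT, salary INTEGER,
                      dno TEXT REFERENCES dept (dno));
    INSERT INTO dept VALUES ('D1', 'Marketing', 100000), ('D2', 'Research', 250000);
    INSERT INTO emp VALUES ('E1', 'Lopez', 40000, 'D1'), ('E2', 'Cheng', 42000, 'D2');

    -- External level: the Personnel user's view of employees and their departments.
    CREATE VIEW personnel_view AS
        SELECT e.eno AS eno, e.ename AS ename, d.dname AS dname
        FROM emp AS e JOIN dept AS d ON e.dno = d.dno;
""")

# The user queries the view exactly as if it were a stored table.
rows = conn.execute("SELECT * FROM personnel_view ORDER BY eno").fetchall()
print(rows)  # [('E1', 'Lopez', 'Marketing'), ('E2', 'Cheng', 'Research')]
```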

Detailed System architecture

Fig: Database system architecture (users A1, A2, B1 and B2 each work in a host language plus a DSL through the user interface; external schemas A and B define external views A and B; external/conceptual mappings A and B relate those views to the conceptual view, which is defined by the conceptual schema; the conceptual/internal mapping relates the conceptual view to the stored database (internal level), which is defined by the storage structure definition (internal schema); all of this is managed by the database management system (DBMS))

Mappings
The architecture involves two levels of mapping: the conceptual/internal mapping and the external/conceptual mappings.
The conceptual/internal mapping defines the correspondence between the conceptual view and the stored database; it specifies how conceptual records and fields are represented at the internal level. If the structure of the stored database is changed, the conceptual/internal mapping must be changed accordingly, so that the conceptual schema can remain invariant. In other words, the effects of such changes must be isolated below the conceptual level in order to preserve physical data independence.
The external/conceptual mapping defines the correspondence between a particular external view and the conceptual view.

Database administrator(DBA)

The data administrator (DA) is the person who makes the strategic and policy decisions regarding the data of the enterprise, and the DBA is the person who provides the necessary technical support for implementing those decisions. Thus the DBA is responsible for the overall control of the system at a technical level. The major tasks of the DBA are
*defining the conceptual schema (schema definition)
*storage structure and access-method definition
*schema and physical-organization modification
*granting of authorization for data access
*integrity-constraint specification

DBMS

The DBMS is the software that handles all access to the database. Conceptually, an access request is processed as follows:
• A user issues an access request using some particular data sublanguage.
• The DBMS intercepts that request and analyses it.
• The DBMS inspects, in turn, the external schema for that user, the corresponding external/conceptual mapping, the conceptual schema, the conceptual/internal mapping, and the storage structure definition.
• The DBMS executes the necessary operations on the stored database.
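The sequence of steps above can be imitated with a toy Python sketch. Everything here is hypothetical: the dictionaries stand in for the external schema, the two mappings, the storage structure definition and the stored database, and the request is passed already parsed instead of as DSL text. The point is that the user's request never mentions storage details.

```python
# Toy model of how a DBMS processes one request. All structures are hypothetical.

# Internal level: stored records are positional tuples; salary is kept in cents.
STORED_DATABASE = {"emp": [("E1", "Lopez", 4000000), ("E2", "Cheng", 4200000)]}

# Storage structure definition: which slot of a stored record holds which stored field.
STORAGE_DEFINITION = {"emp": {"eno": 0, "ename": 1, "sal_cents": 2}}

# Conceptual/internal mapping: conceptual field -> (stored field, conversion function).
CONCEPTUAL_INTERNAL = {
    "emp": {
        "eno": ("eno", lambda v: v),
        "ename": ("ename", lambda v: v),
        "salary": ("sal_cents", lambda v: v // 100),
    }
}

# External schema plus external/conceptual mapping for one user:
# external record type -> (conceptual record type, {external field: conceptual field}).
EXTERNAL_SCHEMAS = {
    "personnel": {"EMPLOYEE": ("emp", {"NAME": "ename", "PAY": "salary"})}
}


def handle_request(user, record_type, fields):
    """Process an already-parsed request such as 'GET NAME, PAY FROM EMPLOYEE'."""
    # 1. Look up the user's external schema (a KeyError means the type is not visible).
    conceptual_type, field_map = EXTERNAL_SCHEMAS[user][record_type]
    # 2. External/conceptual mapping: external field names -> conceptual field names.
    conceptual_fields = [field_map[f] for f in fields]
    # 3. Conceptual/internal mapping plus storage definition: locate each stored field.
    layout = STORAGE_DEFINITION[conceptual_type]
    plan = []
    for cf in conceptual_fields:
        stored_field, convert = CONCEPTUAL_INTERNAL[conceptual_type][cf]
        plan.append((layout[stored_field], convert))
    # 4. Execute against the stored database and build the external records.
    return [
        {f: convert(stored[slot]) for f, (slot, convert) in zip(fields, plan)}
        for stored in STORED_DATABASE[conceptual_type]
    ]


print(handle_request("personnel", "EMPLOYEE", ["NAME", "PAY"]))
# [{'NAME': 'Lopez', 'PAY': 40000}, {'NAME': 'Cheng', 'PAY': 42000}]
```

If the stored format changed, for example if salaries were stored in whole units instead of cents, only CONCEPTUAL_INTERNAL and STORAGE_DEFINITION would need to change; handle_request and the user's request would stay the same, which is physical data independence at work.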

The major functions of the DBMS and its components are represented diagrammatically below.

Fig: Major functions and components of the DBMS (source schemas and mappings are handled by the DDL processors; planned DML requests by the DML processor; unplanned DML requests by the query language processor; compiled requests pass through a component that enforces security and integrity constraints and an optimizer; optimized requests are executed by the run-time manager against the database; source and object schemas and mappings are held as metadata in the data dictionary)

5.Distributed databases

The key objective of a distributed system is that it should look like a centralized system to its users. Distributed processing means that distinct machines can be connected together into a communication network such as the Internet, so that a single data-processing task can span several machines in the network.
A distributed database is typically a database that is not stored in its entirety at a single physical location, but rather is spread across a network of computers that are geographically dispersed and connected through communication links.
For example, consider a banking system in which the customer accounts database is distributed across the bank's branch offices, such that each individual customer account record is stored at the customer's local branch. In other words, the data is stored at the location where it is most frequently used, but it is still available through the communication network to users at other locations (for example, users at the bank's central office).
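The banking example can be sketched in Python. The SITES dictionary is a hypothetical stand-in for the branch databases, and a real system would send remote lookups over the communication network instead of reading a local dictionary. The caller names only the account number, not the site that stores it, which is what makes the distributed database look centralized to its users.

```python
# Each branch site stores the accounts of its local customers (hypothetical data).
SITES = {
    "Downtown": {"A-101": 500},
    "Perryridge": {"A-102": 400},
    "Brighton": {"A-201": 900, "A-217": 750},
}


def get_balance(account_no, local_site):
    """Return (balance, site holding the record), trying the local site first."""
    local = SITES.get(local_site, {})  # e.g. the central office may hold no accounts
    if account_no in local:
        return local[account_no], local_site
    for site, accounts in SITES.items():
        if site != local_site and account_no in accounts:
            # In a real system this lookup would be a request over the network.
            return accounts[account_no], site
    raise KeyError(account_no)


print(get_balance("A-101", "Downtown"))        # (500, 'Downtown')
print(get_balance("A-217", "Central office"))  # (750, 'Brighton')
```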

Fig: A distributed database (client and server machines at several sites, connected by a communication network, together holding the database)

Advantages
• Efficiency of local processing
• Data sharing

Disadvantages
• Overhead may be quite high
• Technical difficulties

6.Storage structures and their purposes

Data is maintained for future reference, so it has to be stored, and various techniques such as sequential and direct access exist for storing and accessing it. Once the data is stored at the internal level (physical storage), it is accessed through DML operations expressed in terms of external records, and these operations must in turn be converted to operations at the actual hardware level, that is, to operations on physical records or blocks. The component responsible for this internal/physical conversion is called an access method. The access method consists of a set of routines whose function is to conceal all device-dependent details from the DBMS and to present the DBMS with a stored record interface.

Fig: The stored record interface (the user works with external record occurrences across the user interface; the DBMS works with stored record occurrences across the stored record interface; the access method works with physical record occurrences across the physical record interface)

The stored record interface thus corresponds to the internal level, just as the user interface corresponds to the external level. The stored record interface allows the DBMS to view the storage structure as a collection of stored files, each consisting of all occurrences of one type of stored record. The DBMS knows
*what stored files exist
*the structure of the corresponding stored records
*the stored fields on which each file is sequenced
*the stored fields that can be used for direct access, etc.
This information is specified as part of the storage structure definition.
The DBMS does not know
a) anything about physical records
b) how sequencing is performed
c) how direct access is performed
This information is known to the access method, not to the DBMS.

Also, when a new stored record occurrence is first created and entered into the database, the access method is responsible for assigning it a unique stored record address (SRA). This value distinguishes that occurrence from all other stored record occurrences in the database. The SRA for a particular occurrence is returned to the DBMS by the access method when the occurrence is first created, and the DBMS may use it for subsequent direct access to that occurrence. The SRA for a given occurrence does not change until the occurrence is physically moved as part of a database reorganization.
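A toy access method that hands out SRAs might look like the following Python sketch. The (page number, slot number) form of the SRA and the class and method names are assumptions for illustration; real access methods use their own address formats. The DBMS side sees only the returned SRAs and never the page layout, mirroring the division of knowledge described above.

```python
class AccessMethod:
    """Toy access method: stores records in fixed-size pages and hands out SRAs.

    Here an SRA is a hypothetical (page number, slot number) pair.
    """

    def __init__(self, records_per_page=2):
        self.records_per_page = records_per_page
        self.pages = [[]]  # stand-in for physical blocks on disk

    def insert(self, record):
        """Store a new record occurrence and return its SRA to the DBMS."""
        if len(self.pages[-1]) == self.records_per_page:
            self.pages.append([])  # current page is full: start a new one
        page = self.pages[-1]
        page.append(record)
        return (len(self.pages) - 1, len(page) - 1)

    def fetch(self, sra):
        """Direct access to a record occurrence by its SRA."""
        page_no, slot = sra
        return self.pages[page_no][slot]


am = AccessMethod()
sras = [am.insert(r) for r in [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake")]]
print(sras)               # [(0, 0), (0, 1), (1, 0)]
print(am.fetch(sras[2]))  # ('S3', 'Blake')
```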

7.How data are stored in the physical storage?

There are various possible representations of data within the memory and some of
them are explained here. Consider the following example.
S# Sname Status City
S1 Smith 20 London
S2 Jones 10 Paris
S3 Blake 30 Paris
S4 Clark 20 London
S5 Adams 30 Athens

The table contains information about five suppliers: for each supplier, a supplier number, a supplier name, a status value and a location (city) are recorded. The supplier number is unique for each supplier, and the records are sequenced on that primary key.
This is the simplest form of data representation, containing only five record occurrences, each with a unique supplier number. If there were 10,000 suppliers rather than five, located in only 10 different cities, storage would be wasted by repeating those 10 city names across 10,000 records. Instead, the city attribute can be separated into a file of its own, with a pointer from each supplier record to the appropriate city record.

The following is another form of data representation:

Supplier file
S# Sname Status City-ptr
S1 Smith 20 → London
S2 Jones 10 → Paris
S3 Blake 30 → Paris
S4 Clark 20 → London
S5 Adams 30 → Athens

City file
City
Athens
London
Paris

In this representation, the pointers from the supplier file to the city file are SRAs (stored record addresses). The advantage of this representation over the previous one is that storage space is saved, because each city name is stored only once.
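The pointer-based representation can be sketched in Python, with list positions standing in for SRAs. The names city_file, supplier_file and supplier_with_city are hypothetical.

```python
# City file: each distinct city is stored once; its list position acts as its SRA.
city_file = ["Athens", "London", "Paris"]

# Supplier file: the city field is replaced by a pointer (position in city_file).
supplier_file = [
    ("S1", "Smith", 20, 1),
    ("S2", "Jones", 10, 2),
    ("S3", "Blake", 30, 2),
    ("S4", "Clark", 20, 1),
    ("S5", "Adams", 30, 0),
]


def supplier_with_city(record):
    """Follow the city pointer to rebuild the full (S#, Sname, Status, City) record."""
    sno, sname, status, city_ptr = record
    return (sno, sname, status, city_file[city_ptr])


print([supplier_with_city(r) for r in supplier_file][:2])
# [('S1', 'Smith', 20, 'London'), ('S2', 'Jones', 10, 'Paris')]
```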

The third form of data representation is indexing. If a file is indexed on one of its attributes (typically one that is frequently used to access it), accessing the file via that attribute becomes much easier. The representation can be

City index (indexed on city)
City Supplier ptr
Athens → S5
London → S1, S4
Paris → S2, S3

Supplier file
S# Sname Status
S1 Smith 20
S2 Jones 10
S3 Blake 30
S4 Clark 20
S5 Adams 30

For example, when the query "Find all suppliers in a given city" is issued, the result can be retrieved quite easily from the database if the data is represented in this indexed form.

The purpose of indexing is to provide an access path to the file. An index is a file in which each entry (record) consists of a data value together with one or more pointers. The data value is a value for some field of the indexed file, and the pointers identify records in the indexed file having that value for that field. An index can be used in two ways: first, for sequential access to the indexed file in the order of that field's values; and second, for direct access to individual records in the indexed file on the basis of a given value for that same field.
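Both uses of an index can be sketched in Python for the supplier example. The list position of each supplier record stands in for its SRA, and city_index, suppliers_in and in_city_order are hypothetical names. suppliers_in answers "Find all suppliers in a given city" by direct access; in_city_order reads the file sequentially in city order through the index.

```python
from collections import defaultdict

# Supplier file, in S# (primary key) order; the list position stands in for the SRA.
supplier_file = [
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
]

# Build the index: each entry is a city value plus pointers to the records having it.
city_index = defaultdict(list)
for sra, (_, _, _, city) in enumerate(supplier_file):
    city_index[city].append(sra)


def suppliers_in(city):
    """Direct access: find all suppliers in a given city."""
    return [supplier_file[sra][0] for sra in city_index.get(city, [])]


def in_city_order():
    """Sequential access to the file in city order, via the index."""
    return [supplier_file[sra][0] for city in sorted(city_index) for sra in city_index[city]]


print(suppliers_in("Paris"))  # ['S2', 'S3']
print(in_city_order())        # ['S5', 'S1', 'S4', 'S2', 'S3']
```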
Another form of data representation is the multilist organization.

8.DATA STRUCTURES AND CORRESPONDING OPERATORS

The range of data structures supported at the user level is a factor that critically affects many components of the system. It dictates the design of the corresponding data manipulation language, since DML operations must be defined in terms of their effect on those data structures. Database systems may be categorized according to the approach they take, and the best-known approaches are

Relational approach
Hierarchical approach
Network approach

The relational approach

The relational approach uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns and each column has a
unique name.

Sample relational database

Bank customer
Customer-name Social-security-no. Customer-street Customer-city Account-no.
Johnson 192-83-7465 Alma Palo Alto A-101
Smith 019-28-3746 North Rye A-215
Hayes 677-28-9011 Main Harrison A-102
Turner 182-73-6091 Putnam Stamford A-305
Johnson 192-83-7465 Alma Palo Alto A-201
Jones 321-12-3123 Main Harrison A-217
Lindsay 336-66-9999 Park Pittsfield A-222
Smith 019-28-3746 North Rye A-201

Accounts

account-no balance
A-101 500
A-215 700
A-102 400
A-305 350
A-201 900
A-217 750
A-222 700

For example, customer Johnson, whose social-security number is 192-83-7465, lives on Alma in Palo Alto and has two accounts: A-101 with balance 500 and A-201 with balance 900. Smith and Johnson also share account A-201.
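The sample relational database can be loaded into SQLite (through Python's sqlite3 module) and queried. Only a subset of the rows is loaded, and the hyphenated attribute names are written with underscores because hyphens are not allowed in unquoted SQL identifiers. The relationship between a customer and an account is represented purely by matching account numbers, not by pointers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_name TEXT, social_security_no TEXT, customer_street TEXT,
        customer_city TEXT, account_no TEXT
    );
    CREATE TABLE account (account_no TEXT PRIMARY KEY, balance INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-101"),
    ("Smith", "019-28-3746", "North", "Rye", "A-215"),
    ("Johnson", "192-83-7465", "Alma", "Palo Alto", "A-201"),
    ("Smith", "019-28-3746", "North", "Rye", "A-201"),
])
conn.executemany("INSERT INTO account VALUES (?, ?)", [
    ("A-101", 500), ("A-215", 700), ("A-201", 900),
])

# Johnson's accounts: the two tables are related through equal account_no values.
johnson = conn.execute("""
    SELECT c.account_no, a.balance
    FROM customer AS c JOIN account AS a ON c.account_no = a.account_no
    WHERE c.customer_name = 'Johnson'
    ORDER BY c.account_no
""").fetchall()

# Customers who share account A-201.
sharers = conn.execute(
    "SELECT customer_name FROM customer WHERE account_no = 'A-201' ORDER BY customer_name"
).fetchall()

print(johnson)  # [('A-101', 500), ('A-201', 900)]
print(sharers)  # [('Johnson',), ('Smith',)]
```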

Network model
Data in the network model are represented by collections of records, and relationships among data are represented by links, which can be viewed as pointers.

Sample network database
Johnson 192-83-7465 Alma Palo Alto → A-101 500
Smith 019-28-3746 North Rye → A-215 700

Hierarchical Model

The hierarchical model is similar to the network model in the sense that data and relationships among data are represented by records and links, respectively. It differs from the network model in that the records are organized as collections of trees rather than arbitrary graphs.
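The difference between a graph of records and a collection of trees can be sketched with Python dictionaries, where object references play the role of links. This is a simplified, hypothetical model, not the interface of any real network or hierarchical DBMS: in the network version the shared account A-201 is a single record reachable from both customers, while in this plain tree it has to be stored twice.

```python
# Network model: records plus links. A record may be linked from several owners,
# so the single A-201 account record is shared by both customers (a graph).
a101 = {"account_no": "A-101", "balance": 500}
a201 = {"account_no": "A-201", "balance": 900}
a215 = {"account_no": "A-215", "balance": 700}
network = {
    "Johnson": [a101, a201],
    "Smith": [a215, a201],  # the same object as Johnson's A-201
}

# Hierarchical model: each record has exactly one parent (one tree per customer),
# so in this plain tree the shared account appears as two separate copies.
hierarchy = {
    "Johnson": [{"account_no": "A-101", "balance": 500},
                {"account_no": "A-201", "balance": 900}],
    "Smith": [{"account_no": "A-215", "balance": 700},
              {"account_no": "A-201", "balance": 900}],
}

# A deposit made through Johnson is visible through Smith only in the network version.
network["Johnson"][1]["balance"] += 100
hierarchy["Johnson"][1]["balance"] += 100
print(network["Smith"][1]["balance"])    # 1000
print(hierarchy["Smith"][1]["balance"])  # 900
```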

9.Advantages of using DBMS

Many enterprises choose to store their operational data in an integrated database because it provides the enterprise with centralized control of its operational data, which is highly valuable.

The DBA has central responsibility for the operational data.

The advantages of storing data under centralized control are as follows.

1.Redundancy can be reduced


In a non-database system, each application has its own private files, which may cause considerable redundancy in the stored data. By integrating the files, this redundancy can be reduced.

2.Inconsistency can be avoided (to some extent)


Suppose the fact that employee E3 works in department D8 is represented by two distinct entries in the database, and the system is not aware of this duplication. If only one of the two entries is updated, they will no longer agree, and the database will be in an inconsistent state.
If the redundancy is controlled, however, the system can guarantee that the database is never inconsistent as seen by the user, by ensuring that any change made to either of the two entries is automatically made to the other. This process is known as propagating updates.

3.The data can be shared


Existing applications can share the data in the database, and new applications can also be developed to operate on that same stored data.

4.Security restrictions can be applied.


Users can access the database only if they hold the required permissions. These permissions are granted by the DBA, who can therefore make sure that access to the data is secure.

5.Integrity can be maintained


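Integrity means ensuring that the data in the database is accurate. With centralized control, the DBA can specify integrity constraints that are checked whenever data is entered or updated.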

10.Database Administrator

One of the main reasons for using a DBMS is to have central control of both the data and the programs that access those data. The person who has such central control over the system is called the database administrator (DBA). The functions of the DBA include the following.

Schema definition: The DBA creates the original database schema by writing a set of definitions that is translated by the DDL compiler to a set of tables that is stored permanently in the data dictionary.

Storage structure and access-method definition: The DBA creates appropriate storage
structures and access methods by writing a set of definitions, which is translated by the
data-storage and data-definition-language compiler.

Schema and physical-organization modification: The DBA carries out the relatively rare modifications either to the database schema or to the description of the physical storage organization by writing a set of definitions that is used by either the DDL compiler or the data-storage and data-definition-language compiler.

Granting of authorization for data access: Granting different types of authorization allows the DBA to regulate which parts of the database various users can access.

Integrity-constraint specification: The DBA specifies constraints (conditions) that data must satisfy when it is entered into the database. For ex., the balance of an account should be at least 500.
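The minimum-balance rule can be declared as an integrity constraint so that the DBMS itself rejects bad data. The sketch below uses an SQL CHECK constraint in SQLite through Python's sqlite3 module; the table layout and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint is declared once, with the schema, and enforced by the DBMS
# on every insert or update of the table.
conn.execute("""
    CREATE TABLE account (
        account_no TEXT PRIMARY KEY,
        balance    INTEGER NOT NULL CHECK (balance >= 500)
    )
""")


def try_insert(account_no, balance):
    """Return True if the DBMS accepts the new row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO account VALUES (?, ?)", (account_no, balance))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("A-101", 500))  # True
print(try_insert("A-102", 400))  # False: below the minimum balance
print(conn.execute("SELECT * FROM account").fetchall())  # [('A-101', 500)]
```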

DATABASE MANAGEMENT SYSTEM
UNIT I
Objective questions

1.Database is
a) Computer-based billing system
b) Computer-based record keeping system
c) Computer-based animation system
2.The software used for access to the database is
a) BASIC b) PASCAL c) DBMS
3.The end-users access the database from the terminal using
a) Query language b) English language c) C language
4.DBA stands for
a) Data Base Administrator b) Data base Access c) Data Batch Administration
5.Which of the following is not operational data
a) Product data b) Account data c) two numbers
6.The database system provides the enterprise with ___________ control of its
operational data
a) Centralized b) Single c) Shared
7.The ability to modify the schema definition in one level without affecting the schema in
the other level is called
a) Data dependence b) data independence c) data abstraction
8.Which of the following is not a level of database architecture
a) External b) logical c) super d) conceptual
9.Data sub language is a combination of
a) DDL and DML b) DDL and TCL c) C and C++
10.A database that is not stored in a single physical location in its entirety and spread
across the network is
a) Centralized database b) Distributed database c) Shared database
11.DBMS is
a) A software that handles all access to the database
b) A hardware
c) An interface between end-user and computer
12.The component responsible for internal/physical conversion is called
a) Access method b) internal conversion c) a hardware
13. SRA is
a) Stored Record Array b) Stored Record Access c) Stored Record Address
14.Primary key is the key which
a) Avoids duplication of data b) supports duplication of data c) allows null values
15.The data is represented in terms of
1) Relational approach 2) hierarchical approach 3) network approach

a) 1,2 b) 1,2,3 c) none of the above

16.The representation of data in relational approach


1) Tables 2) tuples 3) relations
Ans: a) 1 b) 1,2 c) 1,2,3 d) none
17.The data represented in network approach is through
a) Records and links b) tables c) trees
18.The ___________permits the DBMS to view the storage structure as a collection of
stored files.
a) Stored record interface b) Stored record address c) Access method
19.Entity is
a) Any distinguishable real world object
b) Not an object
c) Incident
20.DBMS stands for
a) Data Base Management System b) Database Multimedia system
c) Data Base Management Standards

Short questions

1.What are the basic components of database system?


2.Explain the components of a database system with the simplified diagram.
3.What is an operational data?
4.Explain operational data with example.
5.Explain data independence.
6.Why are database systems adopted rather than file systems? (Or: write down the advantages of a database system.)
7.Distinguish between input, output, and operational data
8.Explain three levels of database system in brief.
9.What is the role of DBA?
10.What are the functions of DBMS?
11.Explain in brief distributed databases.
12.Relate distributed databases with client server architecture.
13.Explain access method, SRA, SRI.
14.Differentiate relational, network, hierarchical approaches.
15.Explain any one form of data representation.

Elaborate questions

1.Role of the DBA, with a detailed explanation of any one function
2.DBMS and its functions, advantages, disadvantages
3.Database systems are widely used nowadays. Justify.
4.Explain the architecture of database system.
5.Explain database system with simplified structure.
6.Explain storage structures with at least any one representation.
7.Explain various data structures used to represent data in database system.

Course : B.Com CA
Semester : III

Subject : Data Base Management System

Unit : Two
Unit II

Syllabus
Relational approach: Relational data structure: relation, domain, attributes, keys
Relational algebra: Introduction, traditional set operation, attribute names for
derived relations, special relational operations.

Books for Reference:

Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan

An Introduction to Database Systems - C. J. Date

Principles of Database Systems - J. D. Ullman

An Introduction to Database Systems - Bipin C. Desai

Relational Approach

Introduction:
The relational model has established itself as the primary data model for
commercial data-processing applications. The first database systems were based on
either the network model or the hierarchical model. The relational model is now
being used in numerous applications outside the domain of traditional data
processing.

Structure of relational databases.

A relational database consists of a collection of tables, each of which is assigned a unique name. A row in a table represents a relationship among a set of values. The rows are termed tuples and the columns are termed attributes. Since a table is a collection of such relationships, there is a close correspondence between the concept of a table and the mathematical concept of a relation, from which the relational data model takes its name.

The account relation below has three column headers: branch-name, account-number and balance. These are its attributes (columns are referred to as attributes). For each attribute there is a set of permitted values, called the domain of that attribute. For the attribute branch-name, for example, the domain is the set of all branch names.

The account relation

Branch-name Account-number Balance
Downtown A-101 500
Mianus A-215 700
Perryridge A-102 400
Round Hill A-305 350
Brighton A-201 900
Redwood A-222 700
Brighton A-217 750

Let D1 denote the set of all branch names, D2 the set of all account numbers, and D3 the set of all balances. Each tuple of the account relation is a 3-tuple (v1, v2, v3), where v1 is a branch name (that is, v1 is in D1), v2 is an account number (v2 is in D2) and v3 is a balance (v3 is in D3). In general, account will contain only a subset of the set of all possible rows. It can therefore be represented as

$$account \subseteq D_1 \times D_2 \times D_3$$

In general, a table of n attributes must be a subset of

$$D_1 \times D_2 \times \cdots \times D_{n-1} \times D_n$$

That is, a relation is a subset of a Cartesian product of a list of domains.
Tables are relations, and the mathematical terms relation and tuple are used in place of the terms table and row, respectively. In the account relation above there are seven tuples. Let the tuple variable t refer to the first tuple of the relation. We use the notation t[branch-name] to denote the value of t on the branch-name attribute. Thus, t[branch-name] = "Downtown" and t[balance] = 500. Since a relation is a set of tuples, we use the mathematical notation $t \in r$ to denote that tuple t is in relation r.
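The definitions above can be checked on a small scale in Python. The domains here are tiny hypothetical sets so that the Cartesian product can be listed, and the value helper is an assumed stand-in for the t[attribute] notation.

```python
from itertools import product

# Small, hypothetical domains (real domains are much larger).
D1 = {"Downtown", "Brighton"}     # branch names
D2 = {"A-101", "A-201", "A-217"}  # account numbers
D3 = {500, 750, 900}              # balances

# A relation is a set of tuples; each tuple is one row of the account table.
account = {
    ("Downtown", "A-101", 500),
    ("Brighton", "A-201", 900),
    ("Brighton", "A-217", 750),
}

all_rows = set(product(D1, D2, D3))  # D1 x D2 x D3
print(len(all_rows))                 # 18
print(account <= all_rows)           # True: account is a subset of D1 x D2 x D3

# t[attr]: the value of tuple t on an attribute, found by the attribute's position.
ATTRS = ("branch-name", "account-number", "balance")


def value(t, attr):
    return t[ATTRS.index(attr)]


t = ("Downtown", "A-101", 500)
print(value(t, "branch-name"), value(t, "balance"))  # Downtown 500
print(t in account)                                   # True, i.e. t is in account
```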

Domain: A domain is a pool of values.

A domain is atomic if its elements are considered to be indivisible units. For example, the set of integers is an atomic domain, but the set of all sets of integers is a nonatomic domain. The distinction is that we do not normally consider integers to have subparts, but we do consider sets of integers to have subparts, namely the integers composing the set.

The customer relation

Customer-name Customer-street Customer-city
Jones Main Harrison
Smith North Rye
Hayes Main Harrison
Curry North Rye
Lindsay Park Pittsfield
Turner Putnam Stamford
Williams Nassau Princeton
Adams Spring Pittsfield
Johnson Alma Palo Alto
Glenn Sand Hill Woodside
Brooks Senator Brooklyn
Green Walnut Stamford

It is possible for several attributes to have the same domain. For example,
suppose that we have a relation customer that has the three attributes
customer-name, customer-street and customer-city, and a relation employee that
includes the attribute employee-name. It is possible that the attributes customer-name
and employee-name will have the same domain: the set of all person names. The
domains of balance and branch-name are certainly distinct. It is perhaps less clear
whether customer-name and branch-name should have the same domain. At the
physical level, both customer names and branch names are character strings. However,
at the logical level, we may want customer-name and branch-name to have distinct
domains.

Relation:

Definition of a relation (mathematical):

Given a collection of sets D1, D2, …, Dn (not necessarily distinct), R is a relation on
those n sets if it is a set of ordered n-tuples <d1, d2, …, dn> such that d1 belongs to
D1, d2 belongs to D2, …, dn belongs to Dn. The sets D1, D2, …, Dn are the domains of
R. The value of n is the degree of R.

The concept of a relation corresponds to the programming-language notion of a
variable. The concept of a relation schema corresponds to the programming-language
notion of a type definition. It is convenient to give a name to a relation
schema, just as we give names to type definitions in programming languages. We
adopt the convention of using lowercase names for relations, and names beginning
with an uppercase letter for relation schemas. For example,

Account-schema=(branch-name, account-number, balance)

Relations can also be described diagrammatically with the help of E-R diagrams.
Before discussing E-R diagrams, the common terms used in these diagrams are
explained below.

Entity: An entity is a thing or object in the real world that is distinguishable from all
other objects. For example, each person in an enterprise is an entity. An entity has a
set of properties, and the values for some set of properties may uniquely identify an
entity. For example, the social-security number 677-89-9011 (employee number
1111) uniquely identifies one particular person in the enterprise.

Entity Set: An entity set is a set of entities of the same type that share the same
properties or attributes. The set of all persons who are customers at a given bank,
for example, can be defined as the entity set customer.

Attributes: An entity is represented by a set of attributes. Attributes are descriptive
properties possessed by each member of an entity set. Possible attributes of the
customer entity set are customer-number, customer-street and customer-city. The
following attribute types, as used in the E-R model, can characterize an attribute.

• Simple and composite attributes: Attributes that cannot be divided into
subparts are simple attributes; attributes that can be divided into subparts
are composite attributes. For example, name is a composite attribute made
up of first-name, middle-name and last-name.

• Single-valued and multivalued attributes: The attributes that we have
specified in our examples all have a single value for a particular entity.
For instance, the loan-number attribute for a specific loan entity refers to
only one loan number. Such attributes are said to be single-valued. There
may be instances where an attribute has a set of values for a specific
entity; such attributes are said to be multivalued. For example, an
employee may have several phone numbers.

• Null attributes: A null value is used when an entity does not have a value
for an attribute.

• Derived attributes: The value of this type of attribute can be derived from
the values of other related attributes or entities. For instance, suppose
that the customer entity set has an attribute loans-held, which represents
how many loans a customer has from the bank. We can derive the value
of this attribute by counting the number of loan entities associated with
that customer.

Relationship sets

Consider the relation loan.

Branch-name    Loan-number    Amount
Downtown       L-17           1000
Redwood        L-23           2000
Perryridge     L-15           1500
Downtown       L-14           1500
Mianus         L-93           500
Round Hill     L-11           900
Perryridge     L-16           1300

A relationship is an association among several entities. For example, we can define
a relationship that associates customer Hayes with loan number L-15. This
relationship specifies that Hayes is a customer with loan number L-15.

A relationship set is a set of relationships of the same type. Formally, it is a
mathematical relation on n ≥ 2 (possibly nondistinct) entity sets. If E1, E2, …, En
are entity sets, then a relationship set R is a subset of
{(e1, e2, …, en) | e1 ∈ E1, e2 ∈ E2, …, en ∈ En}
where (e1, e2, …, en) is a relationship.

Consider the two entity sets customer and loan. We can define the relationship set
borrower to denote the association between customers and the bank loans that the
customers have. As another example, consider the two entity sets loan and branch.
We can define the relationship set loan-branch to denote the association between a
bank loan and the branch in which that loan is maintained.

Each row of a table represents one n-tuple of the relation. The number of tuples
in a relation is called the cardinality of the relation. For example, the cardinality of the
relation loan is 7.

Relations may be unary, binary, ternary, n-ary, etc., depending on their degree.

Unary: Relations of degree one are unary.

For example, the query "Find the name of the branch that issued the loan with
number L-17" produces the output

Branch-name
Downtown

Binary: Relations of degree two are binary.

For example, the query "Find the branch-name and amount for loan-number L-17
from the loan relation" produces the output

Branch-name    Amount
Downtown       1000

Ternary: Relations of degree three are ternary.

N-ary: Relations of degree n are n-ary.

Mapping cardinalities: Mapping cardinalities, or cardinality ratios, express the
number of entities to which another entity can be associated via a relationship set.
Mapping cardinalities are most useful in describing binary relationship sets,
although occasionally they contribute to the description of relationship sets that
involve more than two entity sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality
must be one of the following:

One to one: An entity in A is associated with at most one entity in B, and an entity in B
is associated with at most one entity in A.

One to many: An entity in A is associated with any number of entities in B. An
entity in B, however, can be associated with at most one entity in A.

Many to one: An entity in A is associated with at most one entity in B. An entity in
B, however, can be associated with any number of entities in A.

Many to many: An entity in A is associated with any number of entities in B, and
an entity in B is associated with any number of entities in A.
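The four cases can be checked against a given set of relationships. The sketch below is an illustration under stated assumptions: `mapping_cardinality` is a hypothetical helper, and because it inspects only one instance of the relationship set, it reports what that data is consistent with, whereas a real mapping cardinality is a design constraint on every possible instance.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set R between entity sets A and B.

    pairs is a collection of (a, b) relationships. Only this instance is
    inspected, so the answer describes what the given data satisfies.
    """
    pairs = set(pairs)  # a relationship set contains no duplicates
    per_a = Counter(a for a, _ in pairs)  # number of B-entities per A-entity
    per_b = Counter(b for _, b in pairs)  # number of A-entities per B-entity
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"
    if b_has_many:
        return "many to one"
    return "one to one"


# The borrower relationship set between customer (A) and loan (B).
borrower = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
]
print(mapping_cardinality(borrower))  # many to many
```

Borrower is classified as many to many because Smith holds two loans (L-23 and L-11) and loan L-17 is shared by Jones and Williams.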

Keys:

In a relation there is usually an attribute (or set of attributes) whose values are
unique within the relation and can thus be used to identify the tuples of that relation.

For example, in the loan relation above, loan-number can be considered a key: it is
unique and can be used to distinguish each tuple from all other tuples in that relation.
Before discussing the various keys, let us take a brief look at integrity constraints.

Integrity constraints:

An integrity constraint is a mechanism used by Oracle to prevent invalid
data from being entered into a table. It is a rule enforced on one or more
columns of a table. The following are the various types of integrity constraints:

*Domain integrity constraints

Maintain values according to a specification, such as a 'not null'
condition, which forces the user to enter a value for the column on which it is
specified.
'Not null' and 'Check' constraints fall under this category.

*Entity integrity constraint

Maintains uniqueness in a record.

*Referential integrity constraint

Enforces relationship between tables

To establish a 'parent-child' or a 'master-detail' relationship
between two tables having a common column, we use referential
integrity constraints. To implement this, we define the column in the
parent table as a primary key and the same column in the child table as a
foreign key referring to the corresponding parent entry.

A constraint can be defined either at the table level or at the column level. If it is
defined at the table level, it can be applied to any number of columns in the
table. On the other hand, if it is defined at the column level, it holds good
only for the column on which it is defined.
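A minimal sketch of these constraints, assuming Python's built-in sqlite3 module with an in-memory SQLite database instead of Oracle. The table and column names (underscores instead of hyphens) and the rejected rows, such as the hypothetical loan L-99, are illustrative assumptions. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Parent table. Column-level constraints: PRIMARY KEY (entity integrity),
# NOT NULL and CHECK (domain integrity).
conn.execute("""
    CREATE TABLE branch (
        branch_name TEXT PRIMARY KEY,
        branch_city TEXT NOT NULL,
        assets      INTEGER CHECK (assets >= 0)
    )""")

# Child table. The PRIMARY KEY and FOREIGN KEY (referential integrity)
# constraints are written at table level here.
conn.execute("""
    CREATE TABLE loan (
        loan_number TEXT,
        branch_name TEXT,
        amount      INTEGER NOT NULL,
        PRIMARY KEY (loan_number),
        FOREIGN KEY (branch_name) REFERENCES branch (branch_name)
    )""")

conn.execute("INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 9000000)")
conn.execute("INSERT INTO loan VALUES ('L-17', 'Downtown', 1000)")


def try_insert(sql):
    """Run one INSERT and report whether an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO branch VALUES ('Mianus', NULL, 400000)"))       # violates NOT NULL
print(try_insert("INSERT INTO branch VALUES ('Pownal', 'Bennington', -1)"))   # violates CHECK
print(try_insert("INSERT INTO loan VALUES ('L-17', 'Downtown', 500)"))        # duplicate primary key
print(try_insert("INSERT INTO loan VALUES ('L-99', 'North Town', 500)"))      # no such parent row
print(try_insert("INSERT INTO branch VALUES ('Redwood', 'Palo Alto', 2100000)"))  # valid row
```

The first four inserts are rejected by the DBMS itself, so no application code has to repeat these checks.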

Various keys related to the relational approach are:

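Primary Key: A primary key is a set of one or more attributes that, taken
collectively, allow us to identify uniquely an entity in the entity set.

Ex. 1) Loan-number in the loan relation.
2) The combination of branch-name and loan-number also identifies each tuple
uniquely, but because loan-number alone already does so, this combination is a
superkey rather than a minimal key, and it would not normally be chosen as the
primary key.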

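Candidate Key: Several distinct sets of attributes could each serve as a key. Each
such minimal set of attributes is a candidate key, and one of the candidate keys is
chosen as the primary key.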

Referenced key: A unique or primary key defined on a column belonging
to the parent table.

Foreign Key: A column or combination of columns included in the
definition of a referential integrity constraint, which refers to a referenced key.

Child table: A table that depends upon the values present in the referenced
key of the parent table, which it refers to through a foreign key.

Parent table: A table that determines whether insertion or update of data
can be done in the child table. This table is referred to by the child table's
foreign key.

On delete cascade clause

If a row in the parent table is deleted, then all rows in the child table whose
foreign key refers to that row's referenced key are also deleted automatically.
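A similar sketch of ON DELETE CASCADE, again assuming SQLite through Python's sqlite3 module rather than Oracle, with hypothetical underscore-style table and column names. Deleting loan L-17 from the parent table removes both borrower rows that refer to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys

# Parent table.
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER)")
# Child table: each borrower row refers to a parent loan row.
conn.execute("""
    CREATE TABLE borrower (
        customer_name TEXT,
        loan_number   TEXT REFERENCES loan (loan_number) ON DELETE CASCADE
    )""")

conn.executemany("INSERT INTO loan VALUES (?, ?)", [("L-17", 1000), ("L-23", 2000)])
conn.executemany("INSERT INTO borrower VALUES (?, ?)",
                 [("Jones", "L-17"), ("Williams", "L-17"), ("Smith", "L-23")])

# Deleting the parent row L-17 also deletes every child row that refers to it.
conn.execute("DELETE FROM loan WHERE loan_number = 'L-17'")
remaining = conn.execute("SELECT customer_name, loan_number FROM borrower").fetchall()
print(remaining)  # [('Smith', 'L-23')]
```

Without the clause, the same DELETE would instead be rejected, because the borrower rows for L-17 would be left referring to a missing parent.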

Entity-Relationship Diagrams:

An E-R diagram can express the overall logical structure of a database
graphically. Such a diagram consists of the following major components:

The symbol used to represent an entity set is a rectangle.

The symbol used to represent an attribute is an ellipse.

The symbol used to represent links is a line.

The symbol used to represent a relationship set is a diamond.

The symbol used to represent multivalued attributes is a double ellipse.

The symbol used to represent derived attributes is a dashed ellipse.

The symbol used to represent the total participation of an entity set in a
relationship set is a double line.

Fig: E-R diagram for a banking enterprise (entity sets customer, account, loan and
branch; relationship sets depositor, borrower, account-branch and loan-branch;
attributes customer-name, customer-street, customer-city, account-number, balance,
loan-number, amount, branch-name, branch-city and assets)

Various relations used in the discussion of this chapter are:

1. Account relation

Branch-name    Account-number    Balance
Downtown       A-101             500
Mianus         A-215             700
Perryridge     A-102             400
Round Hill     A-305             350
Brighton       A-201             900
Redwood        A-222             700
Brighton       A-217             750

2. Loan relation

Branch-name    Loan-number    Amount
Downtown       L-17           1000
Redwood        L-23           2000
Perryridge     L-15           1500
Downtown       L-14           1500
Mianus         L-93           500
Round Hill     L-11           900
Perryridge     L-16           1300

3. Branch relation

Branch-name    Branch-city    Assets
Downtown       Brooklyn       9000000
Redwood        Palo Alto      2100000
Perryridge     Horseneck      1200000
Mianus         Horseneck      400000
Round Hill     Horseneck      8000000
Pownal         Bennington     300000
North Town     Rye            3700000
Brighton       Brooklyn       7100000

4. Customer relation

Customer-name  Customer-street  Customer-city
Jones          Main             Harrison
Smith          North            Rye
Hayes          Main             Harrison
Curry          North            Rye
Lindsay        Park             Pittsfield
Turner         Putnam           Stamford
Williams       Nassau           Princeton
Adams          Spring           Pittsfield
Johnson        Alma             Palo Alto
Glenn          Sand Hill        Woodside
Brooks         Senator          Brooklyn
Green          Walnut           Stamford

5. Depositor relation

Customer-name  Account-number
Johnson        A-101
Smith          A-215
Hayes          A-102
Turner         A-305
Johnson        A-201
Jones          A-217
Lindsay        A-222

6. Borrower relation

Customer-name  Loan-number
Jones          L-17
Smith          L-23
Hayes          L-15
Jackson        L-14
Curry          L-93
Smith          L-11
Williams       L-17
Adams          L-16

Relational Algebra

Note: Query languages

A query language is a language in which a user requests information from the
database. These languages are typically of a higher level than a standard
programming language. Query languages can be categorized as either
procedural or non-procedural. In a procedural language, the user instructs the system
to perform a sequence of operations on the database to compute the desired result.
In a non-procedural language, the user describes the information desired without
giving a specific procedure for obtaining that information.

Introduction

Relational algebra is a collection of operations on relations. It is a procedural
query language: it consists of a set of operations that take one or two relations as
input and produce a new relation as their result.

The fundamental operations (traditional set operations) available in relational
algebra are select, project, union, set difference, Cartesian product and rename. In
addition to the fundamental operations, there are several other operations, namely set
intersection, natural join, division and assignment. These operations will be defined in
terms of the fundamental operations. The selection, projection, join and division
operations are also referred to as special relational operators.

Fundamental operations

The select, project and rename operations are called unary
operations, because they operate on one relation. The other three operations, union,
set difference and Cartesian product, operate on pairs of relations and are therefore
called binary operations.

The select operation

The select operation selects tuples that satisfy a given predicate. The
lowercase Greek letter sigma (σ) is used to denote selection. The predicate appears
as a subscript to σ, and the argument relation is given in parentheses following the σ.

Example:
1. Select those tuples of the loan relation where the branch is "Perryridge".
σ branch-name="Perryridge"(loan)
The result of the query is

Branch-name    Loan-number    Amount
Perryridge     L-15           1500
Perryridge     L-16           1300

2. Find all tuples in which the amount lent is more than $1200.
σ amount>1200(loan)
We allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate. We can also
combine several predicates into a larger predicate using the connectives and (∧) and or (∨).

3. Find those tuples pertaining to loans of more than $1200 made by the Perryridge
branch.
σ branch-name="Perryridge" ∧ amount>1200(loan)
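The select operation can be sketched in Python by filtering a set of tuples with a predicate. In this illustration (not a standard API), `select` and `SCHEMA` are hypothetical names; the predicate receives each tuple as a dictionary keyed by attribute name, and Python's `and` stands in for ∧.

```python
# loan(branch-name, loan-number, amount)
SCHEMA = ("branch-name", "loan-number", "amount")
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}


def select(predicate, relation, schema=SCHEMA):
    """σ predicate(relation): keep the tuples for which predicate holds.

    The predicate receives each tuple as a dict keyed by attribute name.
    """
    return {t for t in relation if predicate(dict(zip(schema, t)))}


# 1. σ branch-name="Perryridge"(loan)
q1 = select(lambda t: t["branch-name"] == "Perryridge", loan)
# 2. σ amount>1200(loan)
q2 = select(lambda t: t["amount"] > 1200, loan)
# 3. σ branch-name="Perryridge" ∧ amount>1200(loan)
q3 = select(lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200, loan)

print(sorted(q1))  # [('Perryridge', 'L-15', 1500), ('Perryridge', 'L-16', 1300)]
print(len(q2))     # 4
```

On this data query 3 returns the same two tuples as query 1, since both Perryridge loans exceed $1200.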

The project operation

Suppose we want to list all loan numbers and the amounts of the loans, but do not
care about the branch name. The project operation allows us to produce this relation.
The project operation is a unary operation that returns its argument relation with
certain attributes left out. Since a relation is a set, any duplicate rows are eliminated.
Projection is denoted by the Greek letter pi (π). We list the attributes that we wish to
appear in the result as a subscript to π. The argument relation follows in parentheses.

Example:
1. List all loan numbers and the amount of each loan. The corresponding query is
π loan-number,amount(loan)
The relation that results from this query is

Loan-number    Amount
L-17           1000
L-23           2000
L-15           1500
L-14           1500
L-93           500
L-11           900
L-16           1300
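A corresponding sketch of projection, under the same assumptions as the selection sketch (a hypothetical helper `project`, relations stored as Python sets). Because the result is a set, duplicate rows disappear automatically: the amount 1500 occurs twice in loan but only once when loan is projected on amount alone.

```python
# loan(branch-name, loan-number, amount)
SCHEMA = ("branch-name", "loan-number", "amount")
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}


def project(attributes, relation, schema=SCHEMA):
    """π attributes(relation): keep only the listed attributes.

    The result is a Python set, so duplicate rows are eliminated.
    """
    positions = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in relation}


q = project(["loan-number", "amount"], loan)  # π loan-number,amount(loan)
print(len(q))  # 7
print(sorted(project(["amount"], loan)))
# [(500,), (900,), (1000,), (1300,), (1500,), (2000,)]
```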

The set difference operation

The set-difference operation, denoted by -, allows us to find tuples that are in one relation
but not in another. The expression r - s results in a relation containing those tuples in r
but not in s.

Example:
1. Find all customers of the bank who have an account but not a loan.
π customer-name(depositor) - π customer-name(borrower)
The result will be

Customer-name
Johnson
Turner
Lindsay

For a set-difference operation r - s to be valid, we require that the relations r and s be of the
same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same.
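A sketch of set difference on relations stored as Python sets. The helper `difference` is hypothetical and checks only the arity condition, not that the attribute domains match.

```python
# depositor(customer-name, account-number) and borrower(customer-name, loan-number)
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
}
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}


def difference(r, s):
    """r - s, after checking that all tuples of r and s have the same arity.

    Checking that the attribute domains match is left out of this sketch.
    """
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("r and s must have the same arity")
    return r - s


has_account = {(name,) for name, _ in depositor}  # π customer-name(depositor)
has_loan = {(name,) for name, _ in borrower}      # π customer-name(borrower)
result = difference(has_account, has_loan)
print(sorted(result))  # [('Johnson',), ('Lindsay',), ('Turner',)]
```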

The Cartesian-product operation

The Cartesian-product operation, denoted by a cross (X), allows us to combine
information from any two relations. We write the Cartesian product of relations r1 and
r2 as r1 X r2. Since the same attribute name may appear in both r1 and r2, we need to
devise a naming schema to distinguish between these attributes. We do so here by
attaching to an attribute the name of the relation from which the attribute originally
came. For example, the relation schema for r = borrower X loan is
(borrower.customer-name, borrower.loan-number, loan.branch-name, loan.loan-number, loan.amount)
So now we can distinguish borrower.loan-number from loan.loan-number. For those
attributes that appear in only one of the two schemas, we shall usually drop the
relation-name prefix. We can write the relation schema for r as
(customer-name, borrower.loan-number, branch-name, loan.loan-number, amount)
This naming convention requires that the relations that are arguments of the
Cartesian-product operation have distinct names.

Assume that we have n1 tuples in borrower and n2 tuples in loan. Then there are n1 * n2
ways of choosing a pair of tuples, one tuple from each relation, so there are n1 * n2 tuples in
r. In particular, note that for some tuples t in r, it may be that
t[borrower.loan-number] ≠ t[loan.loan-number].
In general, if we have relations r1(R1) and r2(R2), then r1 X r2 is a relation whose
schema is the concatenation of R1 and R2. Relation r contains all tuples t for which there is
a tuple t1 in r1 and a tuple t2 in r2 for which t[R1] = t1[R1] and t[R2] = t2[R2].

For example:

1. Suppose we want to find the names of all customers who have a loan at the Perryridge
branch. We need the information in both the loan relation and the borrower relation to do
so. The Cartesian product borrower X loan pairs every tuple of borrower with every tuple of loan:

Customer-name  Borrower.loan-number  Branch-name  Loan.loan-number  Amount
Jones          L-17                  Downtown     L-17              1000
Jones          L-17                  Redwood      L-23              2000
...            ...                   ...          ...               ...
Adams          L-16                  Round Hill   L-11              900
Adams          L-16                  Perryridge   L-16              1300
Table: Result of borrower X loan

If we write
σ branch-name="Perryridge"(borrower X loan)
the output of the query is

Customer-name  Borrower.loan-number  Branch-name  Loan.loan-number  Amount
Jones          L-17                  Perryridge   L-15              1500
Jones          L-17                  Perryridge   L-16              1300
Smith          L-23                  Perryridge   L-15              1500
Smith          L-23                  Perryridge   L-16              1300
Hayes          L-15                  Perryridge   L-15              1500
Hayes          L-15                  Perryridge   L-16              1300
Jackson        L-14                  Perryridge   L-15              1500
Jackson        L-14                  Perryridge   L-16              1300
Curry          L-93                  Perryridge   L-15              1500
Curry          L-93                  Perryridge   L-16              1300
Smith          L-11                  Perryridge   L-15              1500
Smith          L-11                  Perryridge   L-16              1300
Williams       L-17                  Perryridge   L-15              1500
Williams       L-17                  Perryridge   L-16              1300
Adams          L-16                  Perryridge   L-15              1500
Adams          L-16                  Perryridge   L-16              1300
Table: Result of σ branch-name="Perryridge"(borrower X loan)

This relation describes the details relating to the Perryridge branch alone. However, the
customer-name column may contain customers who do not have a loan at the Perryridge
branch, because the Cartesian product associates every loan with every borrower. We
therefore keep only those tuples in which the two loan-number attributes agree, so the
query can be rewritten as
σ borrower.loan-number=loan.loan-number
(σ branch-name="Perryridge"(borrower X loan))
In order to retrieve only the customer-name, we add a projection:
π customer-name(σ borrower.loan-number=loan.loan-number
(σ branch-name="Perryridge"(borrower X loan)))

The result is as shown below

Customer-name
Hayes
Adams
Table: Result of π customer-name(σ borrower.loan-number=loan.loan-number
(σ branch-name="Perryridge"(borrower X loan)))
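The three steps of this query can be sketched in Python, using `itertools.product` for the Cartesian product. Positions in each combined tuple follow the schema (customer-name, borrower.loan-number, branch-name, loan.loan-number, amount); the variable names are illustrative.

```python
from itertools import product

# borrower(customer-name, loan-number)
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}
# loan(branch-name, loan-number, amount)
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}

# borrower X loan: concatenate every borrower tuple with every loan tuple.
# Positions: 0 customer-name, 1 borrower.loan-number, 2 branch-name,
#            3 loan.loan-number, 4 amount
cross = {b + l for b, l in product(borrower, loan)}
print(len(cross))  # 56 = 8 * 7

# σ branch-name="Perryridge"
perryridge = {t for t in cross if t[2] == "Perryridge"}
print(len(perryridge))  # 16
# σ borrower.loan-number=loan.loan-number
matching = {t for t in perryridge if t[1] == t[3]}
# π customer-name
print(sorted((t[0],) for t in matching))  # [('Adams',), ('Hayes',)]
```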

The rename operation

Unlike relations in the database, the results of relational-algebra expressions do
not have a name that we can use to refer to them. It is useful to be able to give them
names; the rename operator, denoted by the lowercase Greek letter rho (ρ), lets us
perform this task.

Given a relational-algebra expression E, the expression
ρ x(E)
returns the result of expression E under the name x.

A relation r by itself is considered to be a trivial relational-algebra expression. Thus,
we can also apply the rename operation to a relation r to get the same relation under a
new name.

A second form of the rename operation is as follows. Assume that a relational-algebra
expression E has arity n. Then the expression
ρ x(A1,A2,…,An)(E)
returns the result of expression E under the name x, and with the attributes renamed to
A1, A2, …, An.

For example,

1. Find the largest balance in the bank.

The steps involved are:
• Compute a temporary relation consisting of those balances that are not the
largest.
• Take the set difference between the relation π balance(account) and the
temporary relation just computed, to obtain the result.

The query for the temporary relation is
π account.balance(σ account.balance < d.balance(account X ρ d(account)))

This expression gives those balances in the account relation for which a
larger balance appears somewhere in the account relation (renamed as
d). The result contains all balances except the largest one:

Balance
500
700
400
350
750

The query to find the largest account balance in the bank can be written as follows:
π balance(account) -
π account.balance(σ account.balance < d.balance(account X ρ d(account)))
The result of this query is

Balance
900
Fig: Largest account balance in the bank

2. Find the names of all customers who live on the same street and in the same city as Smith.
The street and city of Smith can be obtained by writing
π customer-street,customer-city(σ customer-name="Smith"(customer))

In order to find other customers with this street and city, we must reference the
customer relation a second time. In the following query, we use the rename operation
on the preceding expression to give its result the name smith-addr, and to rename its
attributes to street and city, instead of customer-street and customer-city:

π customer.customer-name
(σ customer.customer-street=smith-addr.street ∧ customer.customer-city=smith-addr.city
(customer X ρ smith-addr(street,city)
(π customer-street,customer-city(σ customer-name="Smith"(customer)))))

The result of this query is as shown below

Customer-name
Smith
Curry
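Both rename examples can be sketched in Python, where binding a second name (`d`, `smith_addr`) to a relation plays the part of ρ: it lets an expression refer to two copies of the same relation separately. This is an illustrative analogy under that assumption, not a general implementation of the rename operator.

```python
from itertools import product

# account(branch-name, account-number, balance)
account = {
    ("Downtown", "A-101", 500), ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400), ("Round Hill", "A-305", 350),
    ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
}
# customer(customer-name, customer-street, customer-city)
customer = {
    ("Jones", "Main", "Harrison"), ("Smith", "North", "Rye"),
    ("Hayes", "Main", "Harrison"), ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"), ("Turner", "Putnam", "Stamford"),
    ("Williams", "Nassau", "Princeton"), ("Adams", "Spring", "Pittsfield"),
    ("Johnson", "Alma", "Palo Alto"), ("Glenn", "Sand Hill", "Woodside"),
    ("Brooks", "Senator", "Brooklyn"), ("Green", "Walnut", "Stamford"),
}

# Example 1: account X ρ d(account). Each pair (a, d) takes one tuple from account
# and one from its renamed copy d; keep a's balance when d's balance is larger.
not_largest = {(a[2],) for a, d in product(account, account) if a[2] < d[2]}
largest = {(a[2],) for a in account} - not_largest  # π balance(account) minus the above
print(sorted(not_largest))  # [(350,), (400,), (500,), (700,), (750,)]
print(largest)              # {(900,)}

# Example 2: smith_addr stands for ρ smith-addr(street,city)(...).
smith_addr = {(street, city) for name, street, city in customer if name == "Smith"}
same_address = {
    (name,)
    for (name, street, city), (s_street, s_city) in product(customer, smith_addr)
    if street == s_street and city == s_city
}
print(sorted(same_address))  # [('Curry',), ('Smith',)]
```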

Additional operations or special relational operations

1. The set-intersection operation

The set-intersection operation, denoted by ∩, finds the tuples that are in both of two
relations.

Example:
1. Find all customers who have both a loan and an account.
The query is
π customer-name(borrower) ∩ π customer-name(depositor)
The result will be

Customer-name
Hayes
Jones
Smith
Table: Customers with both an account and a loan at the bank

The intersection operation can be rewritten using the set-difference operation as
r ∩ s = r - (r - s)
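A sketch of intersection with Python sets, which also checks the identity r ∩ s = r - (r - s) on this data (the variable names are illustrative).

```python
# borrower(customer-name, loan-number) and depositor(customer-name, account-number)
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
}

r = {(name,) for name, _ in borrower}   # π customer-name(borrower)
s = {(name,) for name, _ in depositor}  # π customer-name(depositor)
both = r & s                            # r ∩ s
print(sorted(both))          # [('Hayes',), ('Jones',), ('Smith',)]
print(both == r - (r - s))   # True
```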

The union operation

The union operation, denoted by ∪, retrieves the tuples that are present in either of
two relations (or in both).

For example:

1. Find the names of all bank customers who have an account, a loan, or both.
The customer relation does not contain this information, since a customer does not need
to have either an account or a loan at the bank. To answer this query we need the
information in the depositor relation and in the borrower relation.
* To find the names of all customers with a loan at the bank, we write
π customer-name(borrower)
* To find the names of all customers with an account at the bank, we write
π customer-name(depositor)
To find the customers with an account, a loan, or both, we take the union of these two:
π customer-name(borrower) ∪ π customer-name(depositor)
The result of this query is

Customer-name
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Adams

For a union operation r ∪ s to be valid, we require two conditions to hold:

1. The relations r and s must be of the same arity. That is, they must have the same
number of attributes.
2. The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
Here r and s can, in general, be temporary relations that are the result of
relational-algebra expressions.
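A sketch of union with an arity check. The helper `union` is hypothetical and, like a minimal check would, it does not compare attribute domains.

```python
# borrower(customer-name, loan-number) and depositor(customer-name, account-number)
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}
depositor = {
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
}


def union(r, s):
    """r ∪ s, rejecting operands whose tuples have different arities.

    Only arity is checked; comparing attribute domains is left out of this sketch.
    """
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("r and s must have the same arity")
    return r | s


names = union({(n,) for n, _ in borrower}, {(n,) for n, _ in depositor})
print(len(names))  # 10: customers appearing in both relations are listed once
```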

The natural-join operation

It is often desirable to simplify certain queries that require a Cartesian product.
Usually, a query that involves a Cartesian product includes a selection operation on
the result of the Cartesian product.

Consider the query:
Find the names of all customers who have a loan at the bank, and find the
amount of the loan.
Steps:
1. Form the Cartesian product of the borrower and loan relations.
2. Select those tuples in which borrower.loan-number equals loan.loan-number.
3. Project the customer-name, loan-number and amount.
π customer-name,loan.loan-number,amount
(σ borrower.loan-number=loan.loan-number(borrower X loan))

The natural join is a binary operation that allows us to combine certain selections and a
Cartesian product into one operation. It is denoted by the join symbol ⋈. The natural-join
operation forms a Cartesian product of its two arguments, performs a selection forcing
equality on those attributes that appear in both relation schemas, and finally removes
duplicate attributes.

For example:
1. Find the names of all customers who have a loan at the bank, and find the amount of
the loan.
π customer-name,loan-number,amount(borrower ⋈ loan)
The result of the query is

Customer-name  Loan-number  Amount
Jones          L-17         1000
Smith          L-23         2000
Hayes          L-15         1500
Jackson        L-14         1500
Curry          L-93         500
Smith          L-11         900
Williams       L-17         1000
Adams          L-16         1300

2. Find the names of all branches with customers who have an account in the bank and who
live in Harrison.
π branch-name(σ customer-city="Harrison"(customer ⋈ account ⋈ depositor))
The result of the query is

Branch-name
Brighton
Perryridge
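The natural join can be sketched as a nested loop that pairs tuples agreeing on every common attribute and keeps each common attribute only once. `natural_join`, `project` and the schema constants are hypothetical names, and a real DBMS would use far more efficient join algorithms. The sketch reproduces example 1.

```python
def natural_join(r_schema, r, s_schema, s):
    """r ⋈ s: pair tuples that agree on every common attribute.

    Relations are sets of tuples and schemas are tuples of attribute names.
    Returns (result_schema, result_relation); common attributes appear once.
    """
    common = [a for a in r_schema if a in s_schema]
    extra = [a for a in s_schema if a not in r_schema]
    result = set()
    for t in r:
        tr = dict(zip(r_schema, t))
        for u in s:
            us = dict(zip(s_schema, u))
            if all(tr[a] == us[a] for a in common):
                result.add(t + tuple(us[a] for a in extra))
    return tuple(r_schema) + tuple(extra), result


def project(attributes, schema, relation):
    """π attributes(relation) for a relation with the given schema."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in relation}


BORROWER = ("customer-name", "loan-number")
LOAN = ("branch-name", "loan-number", "amount")
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}
loan = {
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000),
    ("Perryridge", "L-15", 1500), ("Downtown", "L-14", 1500),
    ("Mianus", "L-93", 500), ("Round Hill", "L-11", 900),
    ("Perryridge", "L-16", 1300),
}

schema, joined = natural_join(BORROWER, borrower, LOAN, loan)
print(schema)  # ('customer-name', 'loan-number', 'branch-name', 'amount')
result = project(["customer-name", "loan-number", "amount"], schema, joined)
print(len(result))  # 8
```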

The division operation

The division operation, denoted by ÷, is suited to queries that include the phrase "for all".

Example:
1. Find all customers who have an account at all the branches located in Brooklyn.
Steps:
1. All branches in Brooklyn can be obtained as
r1 = π branch-name(σ branch-city="Brooklyn"(branch))

The result is
Branch-name
Brighton
Downtown

2. We can find all (customer-name, branch-name) pairs for which the customer has an
account at a branch by writing
r2 = π customer-name,branch-name(depositor ⋈ account)

Customer-name  Branch-name
Johnson        Downtown
Smith          Mianus
Hayes          Perryridge
Turner         Round Hill
Lindsay        Redwood
Johnson        Brighton
Jones          Brighton
Table: Result of π customer-name,branch-name(depositor ⋈ account)

We need to find those customers who appear in r2 with every branch name
in r1. We formulate the query by writing
π customer-name,branch-name(depositor ⋈ account)
÷ π branch-name(σ branch-city="Brooklyn"(branch))
The result contains the single customer Johnson, who has an account at both Brighton
and Downtown.
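Division can be sketched for the two-attribute case used here: a customer is kept when the pair (customer, branch) is in r2 for every branch in r1. The helper `divide` is hypothetical and assumes r has schema (A, B) and s has schema (B).

```python
def divide(r, s):
    """r ÷ s, where r has schema (A, B) and s has schema (B).

    Returns the A-values that appear in r together with every B-value of s.
    """
    candidates = {a for a, _ in r}
    return {(a,) for a in candidates if all((a, b) in r for (b,) in s)}


# r1 = π branch-name(σ branch-city="Brooklyn"(branch))
r1 = {("Brighton",), ("Downtown",)}
# r2 = π customer-name,branch-name(depositor ⋈ account)
r2 = {
    ("Johnson", "Downtown"), ("Smith", "Mianus"), ("Hayes", "Perryridge"),
    ("Turner", "Round Hill"), ("Lindsay", "Redwood"), ("Johnson", "Brighton"),
    ("Jones", "Brighton"),
}
print(divide(r2, r1))  # {('Johnson',)}
```

Jones is excluded because he has an account at Brighton but not at Downtown.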

Extended relational-algebra operations

The basic relational-algebra operations have been extended in several ways.
A simple extension is to allow arithmetic operations as part of projection. An
important extension is to allow aggregate operations, such as computing the sum
of the elements of a set, or their average. Another important extension is the
outer-join operation, which allows relational-algebra expressions to deal with null
values, which model missing information.

Generalized Projection
The generalized-projection operation extends the projection operation
by allowing arithmetic functions to be used in the projection list. The
generalized projection has the form
π F1,F2,…,Fn(E)
where E is any relational-algebra expression, and each of F1, F2, …, Fn is an
arithmetic expression involving constants and attributes in the schema of
E. As a special case, the arithmetic expression may be simply an attribute or
a constant. The following example demonstrates the use of the
generalized-projection operation. Suppose we have a relation credit-info, as
shown, which lists the credit limit and the expenses so far. If we want to find how
much more each person can spend, we can write the following expression:
π customer-name, limit - credit-balance(credit-info)

Customer-name  Limit  Credit-balance
Jones          6000   700
Smith          2000   400
Hayes          1500   1500
Curry          2000   1750
Table: The credit-info relation

Customer-name  Limit - credit-balance
Jones          5300
Smith          1600
Hayes          0
Curry          250
Table: The result of π customer-name, limit - credit-balance(credit-info)
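Generalized projection amounts to evaluating an expression on each tuple. A minimal Python sketch of the credit-info example (the variable names are illustrative):

```python
# credit-info(customer-name, limit, credit-balance)
credit_info = {
    ("Jones", 6000, 700), ("Smith", 2000, 400),
    ("Hayes", 1500, 1500), ("Curry", 2000, 1750),
}

# π customer-name, limit - credit-balance(credit-info)
available = {(name, limit - balance) for name, limit, balance in credit_info}
print(sorted(available))
# [('Curry', 250), ('Hayes', 0), ('Jones', 5300), ('Smith', 1600)]
```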

Outer join

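The outer-join operation is an extension of the join operation to deal with missing
information. It computes the join and then also includes the tuples of one relation (left
outer join ⟕), of the other relation (right outer join ⟖) or of both relations (full outer
join ⟗) that do not match any tuple in the other relation, padding them with null values.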
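The notes give no outer-join example, so the following is an illustrative sketch under stated assumptions: it computes a left outer join (⟕) of the customer names with borrower, using Python's `None` to model null. `left_outer_join` is a hypothetical helper specialized to this two-attribute case.

```python
# π customer-name(customer): all twelve customers.
customer_names = {
    "Jones", "Smith", "Hayes", "Curry", "Lindsay", "Turner",
    "Williams", "Adams", "Johnson", "Glenn", "Brooks", "Green",
}
# borrower(customer-name, loan-number)
borrower = {
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
}


def left_outer_join(names, pairs):
    """Left outer join of a one-attribute relation with (name, value) pairs.

    Matching pairs are kept as in a natural join; every name without a match
    is padded with None, which stands in for the null value.
    """
    result = set()
    for name in names:
        matches = {(n, v) for n, v in pairs if n == name}
        result |= matches or {(name, None)}
    return result


joined = left_outer_join(customer_names, borrower)
print(sorted(name for name, loan in joined if loan is None))
# ['Brooks', 'Glenn', 'Green', 'Johnson', 'Lindsay', 'Turner']
```

Jackson appears in borrower but not in customer, so a left outer join drops him; a right outer join would keep him.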

Aggregate functions

Aggregate functions are functions that take a collection of values and return a
single value as a result. For example, the aggregate function sum takes a collection of
values and returns the sum of the values.

The function sum applied on the collection
<1, 1, 3, 4, 4, 11>
returns the value 24.

The function avg returns the average of the values. So the average of the above
collection is 4.

The function count returns the number of elements in the collection and would
return 6 on the preceding collection.

The functions min and max return the minimum and maximum values in a
collection; on the preceding collection they return 1 and 11 respectively.

Examples:

1. Find the total sum of salaries of all part-time employees in the bank.

The query is
𝒢 sum(salary)(pt-works)
The result of this query is a relation with a single attribute, containing a single row
with a numerical value corresponding to the sum of all the salaries of all employees
working part-time in the bank.
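The aggregate values quoted above for the collection <1, 1, 3, 4, 4, 11> can be reproduced with Python built-ins. The second part, an illustration added here, uses the balance column of the account relation to show why aggregates work on collections that keep duplicates: removing duplicates first changes the sum.

```python
values = [1, 1, 3, 4, 4, 11]  # the collection <1, 1, 3, 4, 4, 11>

total = sum(values)      # sum
count = len(values)      # count
average = total / count  # avg
print(total, average, count, min(values), max(values))  # 24 4.0 6 1 11

# Balance column of the account relation, duplicates kept (700 occurs twice).
balances = [500, 700, 400, 350, 900, 700, 750]
print(sum(balances))       # 4300
print(sum(set(balances)))  # 3600: eliminating duplicates first changes the result
```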

For further details of aggregate functions, refer to the text

1. Database System Concepts
- Abraham Silberschatz, Henry F. Korth

2. For the relational approach, refer to 'An Introduction to Database Systems', chapter 4
- Bipin C. Desai

Short questions:

1. What is the relational approach?
2. What is relational algebra?
3. Write the definition of relational algebra.
4. What are the fundamental operations of relational algebra?
5. What is an entity, relation, entity set, relationship, relationship set and attribute?
6. Briefly explain mapping cardinalities.
7. Draw the entity-relationship diagram for a banking enterprise.
8. Explain the selection and projection operations with examples.
9. Explain aggregate functions in brief.
10. Explain set operations.
11. Explain binary, unary, ternary and n-ary relations.
12. What are the various symbols used in an entity-relationship diagram?
13. What is a constraint?
14. Write a note on integrity rules.
15. What is a key?

Elaborate questions:

1. Write the definition of a key and explain the various keys with examples.
2. Explain the structure of relational databases with an example.
3. Explain the referential integrity constraint (rule) with an example.
4. Explain all the fundamental operations of relational algebra (traditional set
operations) with examples.
5. List all the aggregate functions and explain them in detail with examples.
6. What are extended relational operations? Explain all the available operations.

STUDY MATERIAL

Course: B.Com CA
Semester: III

Subject: Data Base Management System

Unit: Three

_______________________________________________________________________

Unit III
Syllabus
Embedded SQL: Introduction, operations not involving cursors, operations involving
cursors, dynamic statements. Query by Example: retrieval operations, built-in functions,
update operations, QBE dictionary. Normalization: functional dependency, first,
second and third normal forms, relations with more than one candidate key, good and
bad decomposition.

Books for Reference:

An Introduction to Database Systems - C. J. Date

Database System Concepts - Abraham Silberschatz, Henry F. Korth,
S. Sudarshan

Principles of Database Systems - Jeffrey D. Ullman

Embedded SQL

SQL provides a powerful declarative query language; writing queries in
SQL is typically much easier than coding the same queries in a general-purpose
programming language. However, a database must often be accessed from a
general-purpose programming language, for the following two reasons:
1. Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language. That is, there exist queries that can be
expressed in a language such as Pascal, C, COBOL or FORTRAN that cannot be
expressed in SQL. To write such queries, we can embed SQL within a more powerful language.
2. Nondeclarative actions, such as printing a report, interacting with a user, or sending
the results of a query to a graphical user interface, cannot be done from within SQL.
A language in which SQL queries are embedded is referred to as a host language,
and the SQL structures permitted in the host language constitute embedded SQL.
Host languages such as PL/I, however, are not well equipped to handle more than one
record at a time, whereas SQL operates on whole sets of records. It is therefore necessary
to provide some form of bridge between the two functional levels, and embedded SQL
provides such a bridge by means of a new type of object called a cursor.

Operations not involving cursors


The DML statements that do not need cursors are as follows:

• Singleton SELECT
• UPDATE
• INSERT
• DELETE

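Singleton SELECT

We use the term “singleton SELECT” to mean a SELECT statement for which the retrieved table contains at most one row, so that its result can be placed directly into host variables.
Example: a SELECT that retrieves the status of one supplier identified by its primary key S#.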

UPDATE

This statement is executed to change existing values in the database.
Example: the SQL UPDATE statement.

INSERT

This statement is used to add a new row of information to a table.
Example: the SQL INSERT statement.

DELETE

This statement is used to delete information from the database.
Example: the SQL DELETE statement.
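The four non-cursor operations can be sketched with Python's built-in sqlite3 module playing the part of the host language. This is an illustration under assumptions, not the PL/I or PL/SQL environment described here: the table s(sno, sname, status, city) is a hypothetical stand-in for the supplier relation, and the `?` placeholders play the role of host variables.

```python
import sqlite3

# In-memory database standing in for the shared database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s (sno TEXT PRIMARY KEY, sname TEXT, status INTEGER, city TEXT)")

# INSERT: add new rows; host-language values are passed through ? placeholders.
conn.execute("INSERT INTO s VALUES (?, ?, ?, ?)", ("S1", "Smith", 20, "London"))
conn.execute("INSERT INTO s VALUES (?, ?, ?, ?)", ("S2", "Jones", 10, "Paris"))

# Singleton SELECT: the primary key in the WHERE clause guarantees at most one row,
# so fetchone() is enough and no cursor loop is needed.
given_sno = "S1"
rank, city = conn.execute("SELECT status, city FROM s WHERE sno = ?", (given_sno,)).fetchone()

# UPDATE: change an existing value.
conn.execute("UPDATE s SET status = status + 10 WHERE sno = ?", (given_sno,))

# DELETE: remove information from the database.
conn.execute("DELETE FROM s WHERE sno = ?", ("S2",))
conn.commit()

remaining = conn.execute("SELECT sno, status FROM s").fetchall()
print(rank, city, remaining)  # 20 London [('S1', 30)]
```

Because a singleton SELECT returns at most one row, the host program can read it directly; the multi-row case is what needs the cursor machinery described next.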

Operations involving cursors


Consider the case of a SELECT that selects a whole set of records, not just one. What is needed is a mechanism for accessing the records in the set one by one, and cursors provide such a mechanism. Explicitly defined cursors are constructs that enable the user to name an area of memory that holds the result of a specific statement for access at a later time.
The programmer defines explicit cursors to process a multiple-row active set one record at a time. The following are the steps for using explicitly defined cursors within PL/SQL.

1. Declare the cursor

* Name the cursor
* Associate a query with the cursor

Syntax
CURSOR cursor-name IS select-statement;
Example
DECLARE
  CURSOR c_names IS
    SELECT branch_name FROM branch WHERE branch_city = 'Brooklyn';

2. Open the cursor

Opening the cursor executes the query and identifies the active set. OPEN also initializes the cursor pointer to just before the first row of the active set.

Syntax
OPEN cursor-name;

3. Fetch from the cursor

Retrieving data from the cursor is accomplished with the FETCH command. FETCH retrieves the rows of the active set one row at a time into the variables listed after INTO.

Syntax
FETCH cursor-name INTO variable-list;

4. Close the cursor

The CLOSE statement closes, or deactivates, the previously opened cursor and makes the active set undefined. Oracle implicitly closes a cursor when the user's program or session is terminated. After a cursor is closed, we cannot perform any operation on it.

Syntax
CLOSE cursor-name;

Attributes involved in cursors

• %ISOPEN returns TRUE if the cursor is already open.
• %FOUND returns TRUE if the last FETCH returned a row, and FALSE if the last FETCH failed to return a row.
• %NOTFOUND is the logical opposite of %FOUND.
• %ROWCOUNT yields the number of rows fetched so far.
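As a rough analogy (not PL/SQL semantics), a Python sqlite3 cursor goes through the same declare, open, fetch and close cycle. In this sketch executing the query opens the active set, `fetchone()` returning `None` corresponds to the %NOTFOUND case, and a counter plays the role of %ROWCOUNT; the emp table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Adams", "CLERK", 1100), ("Ford", "ANALYST", 3000), ("Miller", "CLERK", 1300)],
)

# 1. Declare: associate a query with the cursor.
query = "SELECT ename, salary FROM emp WHERE job = 'CLERK' ORDER BY ename"
c4 = conn.cursor()

# 2. Open: executing the query identifies the active set.
c4.execute(query)

# 3. Fetch one row at a time until no row is returned.
rowcount = 0            # plays the role of c4%ROWCOUNT
clerks = []
while True:
    row = c4.fetchone()
    if row is None:     # the last fetch returned no row (the %NOTFOUND case)
        break
    rowcount += 1
    clerks.append(row)

# 4. Close: the cursor cannot be used after this.
c4.close()
print(rowcount, clerks)  # 2 [('Adams', 1100), ('Miller', 1300)]
```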

Examples to illustrate cursors

1) Testing %ISOPEN before opening a cursor.

DECLARE
  CURSOR c4 IS SELECT salary, job FROM emp WHERE job = 'CLERK';
BEGIN
  IF c4%ISOPEN THEN
    DBMS_OUTPUT.PUT_LINE('This message will not be displayed');
  ELSE
    OPEN c4;
    DBMS_OUTPUT.PUT_LINE('Cursor was not open; it has now been opened');
  END IF;
  CLOSE c4;
END;

2) A PL/SQL block that updates each student's record by finding the total, the average and the result.

DECLARE
  st stu%ROWTYPE;
  CURSOR c1 IS SELECT * FROM stu;
BEGIN
  OPEN c1;
  LOOP
    FETCH c1 INTO st;
    EXIT WHEN c1%NOTFOUND;
    st.total   := st.m1 + st.m2 + st.m3;
    st.average := st.total / 3;
    IF st.m1 >= 50 AND st.m2 >= 50 AND st.m3 >= 50 THEN
      st.result := 'PASS';
    ELSE
      st.result := 'FAIL';
    END IF;
    UPDATE stu
       SET total = st.total, average = st.average, result = st.result
     WHERE regno = st.regno;
  END LOOP;
  CLOSE c1;
  COMMIT;
END;
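The same total, average and result logic can be sketched in Python with sqlite3 as the host environment. The stu columns (regno, m1, m2, m3, total, average, result) follow the PL/SQL block above; the two sample students are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stu (regno INTEGER PRIMARY KEY, m1 INTEGER, m2 INTEGER, m3 INTEGER,"
    " total INTEGER, average REAL, result TEXT)"
)
conn.executemany(
    "INSERT INTO stu (regno, m1, m2, m3) VALUES (?, ?, ?, ?)",
    [(101, 70, 65, 80), (102, 45, 90, 60)],
)

# Read the active set first, then update row by row.
rows = conn.execute("SELECT regno, m1, m2, m3 FROM stu").fetchall()
for regno, m1, m2, m3 in rows:
    total = m1 + m2 + m3
    average = total / 3
    # PASS only if every mark is at least 50, as in the PL/SQL IF statement.
    result = "PASS" if m1 >= 50 and m2 >= 50 and m3 >= 50 else "FAIL"
    conn.execute(
        "UPDATE stu SET total = ?, average = ?, result = ? WHERE regno = ?",
        (total, average, result, regno),
    )
conn.commit()

print(conn.execute("SELECT regno, total, result FROM stu ORDER BY regno").fetchall())
# [(101, 215, 'PASS'), (102, 195, 'FAIL')]
```

Fetching all rows before updating is a deliberate choice in this sketch: it avoids modifying the table while another cursor is still reading it.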

Dynamic Statements

Embedded SQL provides certain features to facilitate the writing of on-line application programs, that is, programs that support on-line access to the database from an end user at a terminal. Such a program typically has to:

1. accept a command from the terminal
2. analyze the command
3. issue the appropriate SQL statements
4. return a message and/or results to the terminal

The precompiler is a compiler for the SQL language. Suppose the application programmer has written a program P that includes some embedded SQL statements. Precompilation proceeds as follows.

• The precompiler scans the source program P and locates the embedded SQL statements.
• For each statement it finds, the precompiler decides on a strategy for implementing that statement in terms of RSI operations. This process is referred to as optimization.
• The precompiler replaces each of the original embedded SQL statements by an ordinary PL/I statement.

The dynamic SQL component of SQL-92 allows programs to construct and submit SQL queries at run time. In contrast, with (static) embedded SQL each statement must be completely present at compile time and is compiled by the embedded SQL preprocessor.
Using dynamic SQL, programs can create SQL queries as strings at run time (based on input from the user) and can either have them executed immediately or have them prepared for subsequent use.
The two principal dynamic statements are PREPARE and EXECUTE. For example:

DCL SQLSOURCE CHAR (256) VARYING;
SQLSOURCE = 'DELETE FROM BRANCH WHERE BRANCH_CITY = ''PERRYRIDGE''';
$PREPARE SQLOBJ FROM SQLSOURCE;
$EXECUTE SQLOBJ;

The PREPARE statement passes the SQLSOURCE string to the RDS precompiler, which goes through its normal process of parsing, optimization and code generation, and builds a machine-language version of the statement called SQLOBJ. The EXECUTE statement then causes this machine-language routine to be executed and thus causes the actual deletions to occur.
Once PREPAREd, a given dynamically generated SQL statement can be EXECUTEd many times. The generated statement can be replaced by another one by issuing PREPARE again with the same target and a different source.
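A rough Python analogue of PREPARE and EXECUTE, assuming sqlite3 as the engine: the statement text is assembled as a string at run time and then executed repeatedly with different parameter values. sqlite3 has no explicit PREPARE statement (it caches compiled statements internally), so this only mirrors the idea; `build_delete` is a hypothetical helper and the branch rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branch VALUES (?, ?, ?)",
    [("Perryridge", "Horseneck", 1700000), ("Brighton", "Brooklyn", 7100000),
     ("Downtown", "Brooklyn", 9000000), ("Redwood", "Palo Alto", 2100000)],
)

def build_delete(table, column):
    """Assemble SQL source at run time (e.g. from terminal input).

    Identifiers cannot be bound as parameters, so they are checked against a whitelist;
    the comparison value stays a ? placeholder.
    """
    allowed = {"branch": {"branch_name", "branch_city"}}
    if column not in allowed.get(table, set()):
        raise ValueError("unknown table or column")
    return f"DELETE FROM {table} WHERE {column} = ?"

sqlsource = build_delete("branch", "branch_city")   # roughly: PREPARE SQLOBJ FROM SQLSOURCE
for city in ("Brooklyn", "Palo Alto"):               # roughly: EXECUTE SQLOBJ, several times
    conn.execute(sqlsource, (city,))

print(conn.execute("SELECT branch_name FROM branch").fetchall())  # [('Perryridge',)]
```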

QUERY-BY-EXAMPLE

Query-by-example (QBE) is the name of both a data-manipulation language and the database system that included this language. The QBE database system was developed at the IBM T. J. Watson Research Center in the early 1970s. Today, some database systems for personal computers support variants of the QBE language. QBE has two distinctive features:
1. Unlike most query languages and programming languages, QBE has a two-dimensional syntax: queries look like tables. A query in a one-dimensional language can be written on one line; a two-dimensional language requires two dimensions for its expression.
2. QBE queries are expressed “by example”. Instead of giving a procedure for obtaining the desired answer, the user gives an example of what is desired. The system generalizes this example to compute the answer to the query.

We express queries in QBE using skeleton tables. These tables show the relation schema, as shown below.

Example: the skeleton table for the branch relation

branch | branch-name | branch-city | assets
       |             |             |

Retrieval operations

Queries on one relation

Examples:

1. Find all loan numbers at the Perryridge branch.

loan | branch-name | loan-number | amount
     | Perryridge  | P._x        |

The preceding query causes the system to look for tuples in loan that have “Perryridge” as the value for the branch-name attribute. For each such tuple, the value of the loan-number attribute is assigned to the variable x. The value of the variable x is “printed”, because the command P. appears in the loan-number column next to the variable x. QBE assumes that a blank position in a row contains a unique variable. As a result, if a variable does not appear more than once in a query, it may be omitted.

Thus the previous query can be rewritten as

loan | branch-name | loan-number | amount
     | Perryridge  | P.          |

QBE performs duplicate elimination automatically. To suppress duplicate elimination, we insert the command ALL. after the P. command:

loan | branch-name | loan-number | amount
     | Perryridge  | P.ALL.      |

To display the entire loan relation, we can create a single row consisting of P. in every field, or, as a shorthand, place a single P. in the column headed by the relation name:

loan | branch-name | loan-number | amount
P.   |             |             |

QBE allows queries that involve arithmetic comparisons.

Examples

1. Find the loan numbers of all loans with a loan amount of more than $700.

loan | branch-name | loan-number | amount
     |             | P.          | >700

The comparison operators that QBE supports are =, <, ≤, >, ≥ and ¬ (not).

2. Find the names of all branches that are not located in Brooklyn.

branch | branch-name | branch-city | assets
       | P.          | ¬Brooklyn   |
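To show how a single-row skeleton "generalizes" into a query, here is a toy translator from one QBE row to SQL. `qbe_row_to_sql` is a hypothetical helper, not part of any QBE product, and it handles only what the single-relation examples above use: constants, P. (print, with duplicates eliminated), P.ALL. (print, keeping duplicates), comparison prefixes such as >700, and ¬ for "not equal".

```python
import re

def qbe_row_to_sql(relation, row):
    """Translate one QBE skeleton row into SQL (toy subset).

    row maps column names to entries such as 'P.', 'P.ALL.', 'Perryridge',
    '>700', '¬Brooklyn' or 'P.>700'. Blank columns are simply left out.
    Values are pasted into the SQL text unescaped, so this is for illustration only.
    """
    printed, conditions, keep_duplicates = [], [], False
    for column, entry in row.items():
        if entry.startswith("P."):
            printed.append(column)
            entry = entry[2:]
            if entry.startswith("ALL"):          # P.ALL. keeps duplicates
                keep_duplicates = True
                entry = entry[3:].lstrip(".")
        if not entry:
            continue
        # Optional comparison operator followed by a constant.
        op, value = re.match(r"(¬|<=|>=|≤|≥|<|>|=)?(.*)", entry).groups()
        op = {"¬": "<>", "≤": "<=", "≥": ">=", None: "="}.get(op, op)
        literal = value if value.replace(".", "", 1).isdigit() else f"'{value}'"
        conditions.append(f"{column} {op} {literal}")
    sql = f"SELECT {'' if keep_duplicates else 'DISTINCT '}{', '.join(printed)} FROM {relation}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql

print(qbe_row_to_sql("loan", {"branch_name": "Perryridge", "loan_number": "P."}))
# SELECT DISTINCT loan_number FROM loan WHERE branch_name = 'Perryridge'
print(qbe_row_to_sql("loan", {"loan_number": "P.", "amount": ">700"}))
# SELECT DISTINCT loan_number FROM loan WHERE amount > 700
print(qbe_row_to_sql("branch", {"branch_name": "P.", "branch_city": "¬Brooklyn"}))
# SELECT DISTINCT branch_name FROM branch WHERE branch_city <> 'Brooklyn'
```

The DISTINCT in the output reflects QBE's automatic duplicate elimination; ALL. simply drops it.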

3. Find the loan numbers of all loans made jointly to Smith and Jones.

borrower | customer-name | loan-number
         | Smith         | P._x
         | Jones         | _x

4. Find the loan numbers of all loans made to Smith, to Jones, or to both jointly.

borrower | customer-name | loan-number
         | Smith         | P._x
         | Jones         | P._y

5. Find all customers who live in the same city as Jones.

customer | customer-name | customer-street | customer-city
         | P._x          |                 | _y
         | Jones         |                 | _y
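A variable shared between rows, such as _y here, forces those rows to agree on a value; in SQL that becomes a join condition. This sqlite3 sketch runs the SQL counterpart of the "same city as Jones" query on a few made-up customer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_street TEXT, customer_city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("Jones", "Main", "Harrison"), ("Hayes", "Main", "Harrison"),
     ("Smith", "North", "Rye"), ("Curry", "North", "Rye")],
)

# The two skeleton rows become two copies (c1, c2) of customer,
# joined on the shared variable _y = customer_city.
same_city = conn.execute(
    """SELECT DISTINCT c1.customer_name
       FROM customer AS c1, customer AS c2
       WHERE c2.customer_name = 'Jones' AND c1.customer_city = c2.customer_city
       ORDER BY c1.customer_name"""
).fetchall()
print(same_city)  # [('Hayes',), ('Jones',)]
```

As in the QBE version, Jones appears in the answer too, since Jones trivially lives in the same city as Jones.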

Queries on several relations

QBE allows queries that span several different relations. The connections among the various relations are achieved through variables that force certain tuples to have the same value on certain attributes.

Examples

1. Find the names of all customers who have a loan from the Perryridge branch.

loan | branch-name | loan-number | amount
     | Perryridge  | _x          |

borrower | customer-name | loan-number
         | P._y          | _x

2. Find the names of all customers who have both an account and a loan at the bank.

depositor | customer-name | account-number
          | P._x          |

borrower | customer-name | loan-number
         | _x            |

3. Find the names of all customers who have an account at the bank, but who do not have a loan from the bank.

depositor | customer-name | account-number
          | P._x          |

borrower | customer-name | loan-number
¬        | _x            |

4. Find all customers who have at least two accounts.

depositor | customer-name | account-number
          | P._x          | _y
          | _x            | ¬_y

The condition box

It is not convenient to express all the constraints on the domain variables within the skeleton tables. To overcome this, QBE includes a condition box feature that allows the expression of general constraints over any of the domain variables.

Examples:

1. Find all customers who are not named Jones and who have at least two accounts.

depositor | customer-name | account-number
          | P._x          | _y
          | _x            | ¬_y

Conditions
_x ¬= Jones

2. Find all account numbers with a balance between $1300 and $1500:

account | branch-name | account-number | balance
        |             | P.             | _x

Conditions
_x ≥ 1300
_x ≤ 1500

3. Find all branches that have assets greater than those of at least one branch located in Brooklyn.

branch | branch-name | branch-city | assets
       | P._x        |             | _y
       |             | Brooklyn    | _z

Conditions
_y > _z
Options available with the condition box

1. QBE allows complex arithmetic expressions to appear in a condition box.
Example:
Find all branches that have assets that are at least twice as large as the assets of one of the branches located in Brooklyn.

branch | branch-name | branch-city | assets
       | P._x        |             | _y
       |             | Brooklyn    | _z

Conditions
_y ≥ 2 * _z

2. QBE allows logical expressions to appear in a condition box. The operators used are and (&), or (|) and not (¬).

Example

Find all account numbers with a balance between $1300 and $2000 but not exactly $1500.

account | branch-name | account-number | balance
        |             | P.             | _x

Conditions
_x = (≥1300 and ≤2000 and ¬1500)
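Condition-box constraints map naturally onto a SQL WHERE clause. The sketch below, using sqlite3 and made-up account rows, runs the two balance examples: the range $1300 to $1500, and $1300 to $2000 excluding exactly $1500.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, account_number TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("Perryridge", "A-102", 1300), ("Brighton", "A-201", 1500),
                  ("Downtown", "A-101", 1800), ("Mianus", "A-215", 2500)])

# Balance between $1300 and $1500: both conditions on _x must hold.
between = [r[0] for r in conn.execute(
    "SELECT account_number FROM account WHERE balance >= 1300 AND balance <= 1500 ORDER BY 1")]

# (>=1300 and <=2000 and not 1500) on the same variable.
not_1500 = [r[0] for r in conn.execute(
    "SELECT account_number FROM account"
    " WHERE balance >= 1300 AND balance <= 2000 AND NOT balance = 1500 ORDER BY 1")]
print(between, not_1500)  # ['A-102', 'A-201'] ['A-101', 'A-102']
```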

The result relation

If the result of a query includes attributes from several relation schemas, we need a mechanism to display the desired result in a single table.
Example
1. Find the customer-name, account-number and balance for all accounts at the Perryridge branch.
In relational algebra we would:
1. join the depositor and account relations
2. project customer-name, account-number and balance.
In QBE we:
1. create a skeleton table called result, with attributes customer-name, account-number and balance, and
2. write the query, placing P. in the result row.

account | branch-name | account-number | balance
        | Perryridge  | _y             | _z

depositor | customer-name | account-number
          | _x            | _y

result | customer-name | account-number | balance
P.     | _x            | _y             | _z
Ordering of the display of tuples

By using the commands AO. (ascending order) and DO. (descending order) we can control the order in which tuples are displayed.

Example

1. List all customers in descending alphabetical order.

depositor | customer-name | account-number
          | P.DO.         |
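In SQL terms, the result skeleton corresponds to a join followed by a projection, and DO. corresponds to ORDER BY ... DESC. This sqlite3 sketch runs both on made-up account and depositor rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, account_number TEXT, balance INTEGER)")
conn.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("Perryridge", "A-102", 400), ("Perryridge", "A-201", 900), ("Mianus", "A-215", 700)])
conn.executemany("INSERT INTO depositor VALUES (?, ?)",
                 [("Hayes", "A-102"), ("Johnson", "A-201"), ("Smith", "A-215")])

# result(customer-name, account-number, balance): join on the shared account number,
# restrict to Perryridge, and keep the three attributes.
result = conn.execute(
    """SELECT d.customer_name, a.account_number, a.balance
       FROM depositor AS d JOIN account AS a ON d.account_number = a.account_number
       WHERE a.branch_name = 'Perryridge'
       ORDER BY d.customer_name""").fetchall()

# P.DO. on customer-name: descending alphabetical order, duplicates removed.
desc = [r[0] for r in conn.execute(
    "SELECT DISTINCT customer_name FROM depositor ORDER BY customer_name DESC")]
print(result)  # [('Hayes', 'A-102', 400), ('Johnson', 'A-201', 900)]
print(desc)    # ['Smith', 'Johnson', 'Hayes']
```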

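Aggregate functions [Built-in functions]
QBE includes the aggregate operators AVG, MAX, MIN, SUM and CNT. We must postfix these operators with ALL. to create a multiset on which the aggregate operation is evaluated. ALL. ensures that duplicates are not eliminated; UNQ. can be added to eliminate them, as in example 2 below.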

Examples

1. Find the total balance of all the accounts maintained at the Perryridge branch.

account | branch-name | account-number | balance
        | Perryridge  |                | P.SUM.ALL.

2. Find the total number of customers who have an account at the bank.

depositor | customer-name  | account-number
          | P.CNT.UNQ.ALL. |

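3. Find the name, street and city of all customers who have more than one account at the bank.

customer | customer-name | customer-street | customer-city
P.       | _x            |                 |

depositor | customer-name | account-number
          | G._x          | CNT.ALL._y

Conditions
CNT.ALL._y > 1

Here G. groups the depositor tuples by customer name, and the condition box keeps only the groups that contain more than one account.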
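The three aggregate examples correspond to SUM, COUNT(DISTINCT ...) and GROUP BY ... HAVING in SQL. This sqlite3 sketch uses made-up rows and, to stay short, returns only the customer names for the third query; fetching street and city as well would add a join with a customer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, account_number TEXT, balance INTEGER)")
conn.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [("Perryridge", "A-102", 400), ("Perryridge", "A-201", 900), ("Mianus", "A-215", 700)])
conn.executemany("INSERT INTO depositor VALUES (?, ?)",
                 [("Hayes", "A-102"), ("Hayes", "A-215"), ("Johnson", "A-201")])

# SUM over the multiset of Perryridge balances (duplicates kept, as with ALL.).
total = conn.execute(
    "SELECT SUM(balance) FROM account WHERE branch_name = 'Perryridge'").fetchone()[0]

# CNT.UNQ.ALL.: count distinct customer names.
customers = conn.execute("SELECT COUNT(DISTINCT customer_name) FROM depositor").fetchone()[0]

# G. groups by customer; the condition keeps groups with more than one account.
multi = [r[0] for r in conn.execute(
    """SELECT customer_name FROM depositor
       GROUP BY customer_name HAVING COUNT(account_number) > 1""")]
print(total, customers, multi)  # 1300 2 ['Hayes']
```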

Update operations / Modification of the database
This section deals with how to add, remove or change information using QBE.

Deletion
Deletion of tuples from a relation is expressed in much the same way as a query. The major difference is the use of D. in place of P. In QBE we can delete whole tuples, as well as values in selected columns. To delete information in only some of the columns, null values, specified by -, are inserted.

D. operates on only one relation. To delete tuples from several relations, we must use one D. operator for each relation.

* Delete customer Smith.

customer | customer-name | customer-street | customer-city
D.       | Smith         |                 |

* Delete the branch-city value of the branch whose name is Perryridge.

branch | branch-name | branch-city | assets
       | Perryridge  | D.          |

* Delete all loans with a loan amount between $1300 and $1500.

loan | branch-name | loan-number | amount
D.   |             | _y          | _x

borrower | customer-name | loan-number
D.       |               | _y

Conditions
_x = (≥1300 and ≤1500)

* Delete all accounts at all branches located in Brooklyn.

account | branch-name | account-number | balance
D.      | _x          | _y             |

depositor | customer-name | account-number
D.        |               | _y

branch | branch-name | branch-city | assets
       | _x          | Brooklyn    |
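Because each D. acts on one relation, the SQL equivalent of the Brooklyn deletion is one DELETE per table. This sqlite3 sketch (made-up rows) removes the depositor rows first, while the matching account rows can still be located, and also shows that deleting a single column value, as in the Perryridge branch-city example, amounts to setting it to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER);
CREATE TABLE account (branch_name TEXT, account_number TEXT, balance INTEGER);
CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
INSERT INTO branch VALUES ('Brighton', 'Brooklyn', 7100000), ('Perryridge', 'Horseneck', 1700000);
INSERT INTO account VALUES ('Brighton', 'A-201', 900), ('Perryridge', 'A-102', 400);
INSERT INTO depositor VALUES ('Johnson', 'A-201'), ('Hayes', 'A-102');
""")

# One D. per relation means one DELETE per table; depositor rows go first.
conn.execute("""DELETE FROM depositor WHERE account_number IN (
                  SELECT a.account_number FROM account AS a
                  JOIN branch AS b ON a.branch_name = b.branch_name
                  WHERE b.branch_city = 'Brooklyn')""")
conn.execute("""DELETE FROM account WHERE branch_name IN (
                  SELECT branch_name FROM branch WHERE branch_city = 'Brooklyn')""")

# D. under a single column deletes only that value, i.e. it becomes null.
conn.execute("UPDATE branch SET branch_city = NULL WHERE branch_name = 'Perryridge'")
conn.commit()

print(conn.execute("SELECT account_number FROM account").fetchall())   # [('A-102',)]
print(conn.execute("SELECT customer_name FROM depositor").fetchall())  # [('Hayes',)]
```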

Insertion

We insert tuples by placing the I. operator in the query expression. The attribute values for inserted tuples must be members of the attribute's domain.

Examples

* To insert into the branch relation information about a new branch with name “Capital” and city “Queens”, but with a null asset value, we write

branch | branch-name | branch-city | assets
I.     | Capital     | Queens      |

* To insert the fact that account A-9732 at the Perryridge branch has a balance of $700, we write

account | branch-name | account-number | balance
I.      | Perryridge  | A-9732         | 700

Updates

If we want to change one value in a tuple without changing all values in the tuple, we use the update facility; the operator used is U. QBE does not allow users to update the primary key fields.

• Update the asset value of the Perryridge branch to $10,000,000.

branch | branch-name | branch-city | assets
       | Perryridge  |             | U.10000000

The query updates the assets of the Perryridge branch to $10,000,000 regardless of the old value. If we want to update a value using the previous value, we must express the request using two rows: one specifying the old tuples that need to be updated, and the other indicating the new updated tuples to be inserted in the database.

• Interest payments are being made, and all balances are to be increased by 5%.

account | branch-name | account-number | balance
U.      |             |                | _x * 1.05
        |             |                | _x
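The I. and U. examples correspond to SQL INSERT and UPDATE. In this sqlite3 sketch (sample rows only), a column left blank in the insertion becomes NULL, one update sets a constant regardless of the old value, and the 5% interest update computes the new balance from the old one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, branch_city TEXT, assets INTEGER)")
conn.execute("CREATE TABLE account (branch_name TEXT, account_number TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700000)")

# Insertion with the assets column left blank: the omitted value becomes NULL.
conn.execute("INSERT INTO branch (branch_name, branch_city) VALUES ('Capital', 'Queens')")
conn.execute("INSERT INTO account VALUES ('Perryridge', 'A-9732', 700)")

# Set Perryridge's assets to $10,000,000 regardless of the old value.
conn.execute("UPDATE branch SET assets = 10000000 WHERE branch_name = 'Perryridge'")

# New value computed from the old one: 5% interest on every balance.
conn.execute("UPDATE account SET balance = balance * 1.05")
conn.commit()

print(conn.execute("SELECT * FROM branch ORDER BY branch_name").fetchall())
# [('Capital', 'Queens', None), ('Perryridge', 'Horseneck', 10000000)]
balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(round(balance, 2))  # 735.0
```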

QBE Dictionary

QBE has a built-in dictionary that is presented to the user as a collection of tables. The dictionary includes, for example, a TABLE table and a DOMAIN table, giving details of all tables and all domains currently known to the system. The dictionary tables can be interrogated using the ordinary retrieval operations of the DML.

Retrieval of table-names

Get the names of all tables known to the system.

P.

Instead of having to build a skeleton for the TABLE table and entering “P.” in the NAME column of that skeleton, the user can formulate this query by simply entering “P.” in the table-name position of a blank table.

Retrieval of column-name for a given table

Get the names of all columns in table S.

S   P.

The user enters the table name (S) followed by “P.” against the row of (blank) column names.

Creation of a new table

1. Create table branch

I. branch   I.   branch-name   branch-city   branch-street

The first I. creates a dictionary entry for table branch; the second I. creates dictionary entries for the columns of the table branch. The information for each column must also be specified: the name of the underlying domain and, if that domain is not already known to QBE, the data type of the domain.

Dropping a table

Drop table branch.

A table can be dropped only if it is currently empty.

1) Delete all branch details.

branch | branch-name | branch-city | branch-street
D.     |             |             |

2) Drop the table.

D. branch | branch-name | branch-city | branch-street

Expanding a table

Add an assets column to the table branch.

QBE does not directly support the dynamic addition of a new column to an existing table unless that table is currently empty. So the following steps should be followed.

1) Define a new table of the same shape as the existing table plus the new column.
2) Load the new table from the old one using a multiple-record insert.
3) Delete all data from the old table.
4) Drop the old table.
5) Change the name of the new table to that of the old table.

Normalization

Introduction

Normalization theory is built around the concept of normal forms. A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints. For example, a relation is said to be in first normal form if and only if it satisfies the constraint that it contains atomic values only. Various normal forms are First Normal Form, Second Normal Form, Third Normal Form, Boyce-Codd Normal Form (BCNF), Domain-Key Normal Form (DKNF), etc. The need for normalization arises when we want to design a relational database without unnecessary redundancy and with easy retrieval of information; to design such a database, we use normalization.

For the description of normalization, we shall consider the suppliers-and-parts database. Its relations are as follows:

S
S#  SNAME  STATUS  CITY
S1  Smith  20      London
S2  Jones  10      Paris
S3  Blake  30      Paris
S4  Clark  20      London
S5  Adams  30      Athens

P
P#  PNAME  COLOR  WEIGHT  CITY
P1  Nut    Red    12      London
P2  Bolt   Green  17      Paris
P3  Screw  Blue   17      Rome
P4  Screw  Red    14      London
P5  Cam    Blue   12      Paris
P6  Cog    Red    19      London

SP
S#  P#  QTY
S1  P1  300
S1  P2  200
S1  P3  400
S1  P4  200
S1  P5  100
S1  P6  100
S2  P1  300
S2  P2  400
S3  P2  200
S4  P2  200
S4  P4  300
S4  P5  400

FIG:1

Functional Dependency

Definition:

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if each X-value in R has associated with it precisely one Y-value in R.

In the suppliers-and-parts database, the attributes SNAME, STATUS and CITY of relation S are each functionally dependent on attribute S#: for a particular value of S# there exists precisely one corresponding value for each of SNAME, STATUS and CITY.

S.S# → S.SNAME
S.S# → S.STATUS
S.S# → S.CITY
or, more compactly,
S.S# → S.(SNAME, STATUS, CITY)

The statement S.S# → S.CITY is read as “attribute S.CITY is functionally dependent on attribute S.S#”, or “attribute S.S# functionally determines attribute S.CITY”.

Alternate definition of functional dependence

Given a relation R, attribute Y of R is functionally dependent on attribute X of R if and only if, whenever two tuples of R agree on their X-value, they also agree on their Y-value.

S# P# Qty Status
S1 P1 300 20
S1 P2 200 20
S1 P3 400 20
S1 P4 100 20

Fig: Partial tabulation of relation SP'.

For example, in this relation SP'

SP'.S# → SP'.STATUS
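The alternate definition translates directly into a check on a relation instance: remember the Y-value seen for each X-value and report a violation if the same X-value ever appears with a different Y-value. `holds` is a hypothetical helper written for this sketch. Keep in mind that one instance can only disprove an FD; whether the FD is a real constraint depends on every legal extension of the relation, as the next paragraph explains.

```python
def holds(relation, X, Y):
    """Return True if the FD X -> Y is satisfied by this relation instance.

    relation is a list of dicts (one per tuple); X and Y are tuples of attribute names.
    Alternate definition: any two tuples that agree on X must also agree on Y.
    """
    seen = {}
    for t in relation:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False   # same X-value, different Y-value
    return True

sp_prime = [
    {"S#": "S1", "P#": "P1", "QTY": 300, "STATUS": 20},
    {"S#": "S1", "P#": "P2", "QTY": 200, "STATUS": 20},
    {"S#": "S1", "P#": "P3", "QTY": 400, "STATUS": 20},
    {"S#": "S1", "P#": "P4", "QTY": 100, "STATUS": 20},
]
print(holds(sp_prime, ("S#",), ("STATUS",)))      # True
print(holds(sp_prime, ("S#",), ("QTY",)))         # False: S1 has several quantities
print(holds(sp_prime, ("S#", "P#"), ("QTY",)))    # True
```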

A functional dependence is a special form of integrity constraint. For example, if relation S is subject to the FD S.S# → S.CITY, then every legal extension (that is, every permitted set of values) of that relation must satisfy that constraint.
It is convenient to represent the FDs in a given set of relations by means of a functional dependency diagram.

Example:

S:  S# → SNAME, S# → STATUS, S# → CITY
P:  P# → PNAME, P# → COLOR, P# → WEIGHT, P# → CITY
SP: (S#, P#) → QTY

Fig: Functional dependencies in relations S, P, SP.

Various Normal Forms

Brief description of normal forms

First Normal Form

• Eliminates repeating groups, that is, converts each data value to its atomic form
• No two rows should be identical
• Each table entry should be single-valued
• Every table has a primary key, which is a unique label or identifier for each row

Second Normal Form

• Requires taking out data that is dependent on only a part of the key
• Each non-key attribute is functionally dependent on the entire key

Third Normal Form

• Involves getting rid of anything in the tables that does not depend solely on the primary key
• 3NF is sometimes characterized as “the key, the whole key, and nothing but the key”

First Normal Form

Definition:

A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.
A relation that is only in first normal form has a structure that is undesirable for a number of reasons.

For example:
Let us assume that information concerning suppliers and shipments, rather than being split into two separate relations (S and SP), is combined into a single relation named FIRST with attributes (S#, STATUS, CITY, P#, QTY).

Here S# represents the supplier number, STATUS the supplier's status value, CITY the city where the supplier is located, P# the part number and QTY the quantity supplied.

We also introduce the constraint that STATUS is functionally dependent on CITY. The meaning of this constraint is that a supplier's status is determined by the corresponding location; e.g., all London suppliers must have a status of 20. We ignore the attribute SNAME for simplicity. The primary key of FIRST is the combination (S#, P#). The following is the functional dependency diagram for this relation.

S# → CITY, CITY → STATUS, S# → STATUS
(S#, P#) → QTY

Fig: Functional dependencies in the relation FIRST

In the diagram:
i) STATUS and CITY are not fully functionally dependent on the primary key (S#, P#); they depend on S# alone.
ii) STATUS and CITY are not mutually independent, since STATUS depends on CITY.

Certain difficulties arise with the FIRST relation during update operations. They are explained below.

Insert: We cannot enter the fact that a particular supplier is located in a particular city until that supplier supplies at least one part. The following is the tabulation of FIRST.

S#  STATUS  CITY    P#  QTY
S1  20      London  P1  300
S1  20      London  P2  200
S1  20      London  P3  400
S1  20      London  P4  200
S1  20      London  P5  100
S1  20      London  P6  100
S2  10      Paris   P1  300
S2  10      Paris   P2  400
S3  10      Paris   P2  200
S4  20      London  P2  200
S4  20      London  P4  300
S4  20      London  P5  400

Table: FIRST

The FIRST relation does not show that supplier S5 is located in Athens, because until S5 supplies some part we have no appropriate primary key value.
Delete: If we delete the only FIRST tuple for a particular supplier, we destroy not only the shipment connecting that supplier to some part but also the information that the supplier is located in a particular city.
For example, if we delete the FIRST tuple with S# value S3 and P# value P2, we lose the information that S3 is located in Paris.

Update: The city value for a given supplier appears in FIRST many times. This redundancy causes update problems.

For example, if supplier S1 moves from London to Amsterdam, we are faced with either the problem of searching the FIRST relation to find every tuple connecting S1 and London, or the possibility of producing an inconsistent result (the city for S1 could be given as Amsterdam in one tuple and London in another). The solution to these problems is to replace the relation FIRST by the two relations SECOND (S#, STATUS, CITY) and SP (S#, P#, QTY). The functional dependency diagrams for these two relations are as follows.

SECOND: S# → CITY, CITY → STATUS, S# → STATUS
SP:     (S#, P#) → QTY

Fig: Functional dependencies in the relations SECOND and SP.

The following tables show sample tabulations corresponding to the data values of FIG:1, except that the information for supplier S5 has been included in SECOND but not in SP.

SECOND
S# Status City
S1 20 London
S2 10 Paris
S3 10 Paris
S4 20 London
S5 30 Athens

SP
S# P# QTY
S1 P1 300
S1 P2 200
S1 P3 400
S1 P4 200
S1 P5 100
S1 P6 100
S2 P1 300
S2 P2 400
S3 P2 200
S4 P2 200
S4 P4 300
S4 P5 400

Fig: Sample tabulations of SECOND and SP.


After building the tables as shown, we overcome the difficulties of the FIRST relation, and update operations on the tables can now be carried out easily. This completes the discussion of first normal form.

SECOND NORMAL FORM:

DEFINITION: A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.

Relations SECOND and SP are both in 2NF (their primary keys are S# and the combination (S#, P#), respectively). Relation FIRST is not in 2NF. A relation that is in first normal form and not in second can always be reduced to an equivalent collection of 2NF relations. The reduction consists of replacing the relation by suitable projections; the collection of these projections is equivalent to the original relation, in the sense that the original relation can always be recovered by taking the natural join of these projections, so no information is lost in the process. In other words, the process is reversible.
In our example, SECOND and SP are projections of FIRST, and FIRST is the natural join of SECOND and SP over S#.

The reduction of FIRST to the pair (SECOND, SP) is an example of nonloss decomposition. In general, given a relation R with possibly composite attributes A, B, C satisfying the FD R.A → R.B, R can always be “nonloss-decomposed” into its projections R1 (A, B) and R2 (A, C). Since no information is lost in the reduction process, any information that can be derived from the original structure can also be derived from the new structure. The converse is not true, however: the new structure may contain information (such as the fact that S5 is located in Athens) that could not be represented in the original. In this sense the new structure is a slightly more faithful reflection of the real world.
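A small Python sketch can confirm the nonloss claim on the FIRST rows tabulated earlier: project FIRST onto SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY), join them again over S#, and compare. `project` and `natural_join` are hypothetical helpers written for this illustration. For contrast, a split whose common attribute is CITY (not determining the rest of either part) produces spurious tuples.

```python
# Relation FIRST, one dict per tuple.
COLS = ("S#", "STATUS", "CITY", "P#", "QTY")
rows = [
    ("S1", 20, "London", "P1", 300), ("S1", 20, "London", "P2", 200),
    ("S1", 20, "London", "P3", 400), ("S1", 20, "London", "P4", 200),
    ("S1", 20, "London", "P5", 100), ("S1", 20, "London", "P6", 100),
    ("S2", 10, "Paris", "P1", 300), ("S2", 10, "Paris", "P2", 400),
    ("S3", 10, "Paris", "P2", 200), ("S4", 20, "London", "P2", 200),
    ("S4", 20, "London", "P4", 300), ("S4", 20, "London", "P5", 400),
]
FIRST = [dict(zip(COLS, r)) for r in rows]

def project(rel, attrs):
    """Projection: keep only attrs; using a set removes duplicate tuples."""
    return {tuple((a, t[a]) for a in attrs) for t in rel}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

original = {tuple(sorted(t.items())) for t in FIRST}
SECOND = project(FIRST, ("S#", "STATUS", "CITY"))
SP = project(FIRST, ("S#", "P#", "QTY"))
print(len(SECOND), len(SP), natural_join(SECOND, SP) == original)  # 4 12 True

# A bad split: joining over CITY creates spurious tuples, e.g. S4 shipping P1.
lossy = natural_join(project(FIRST, ("S#", "STATUS", "CITY")), project(FIRST, ("CITY", "P#", "QTY")))
print(lossy > original)  # True: a strict superset, so the original cannot be recovered
```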

The SECOND/SP structure still causes problems, however. Relation SP is satisfactory; as a matter of fact, relation SP is already in third normal form, and we shall ignore it for the remainder of this section. Relation SECOND, on the other hand, still suffers from a lack of mutual independence among its nonkey attributes. The dependency diagram for SECOND is still more complex than a 3NF diagram. To be specific, the dependency of STATUS on S#, though it is functional, is transitive (via CITY): each S# value determines a CITY value, and this in turn determines the STATUS value. This transitivity leads, once again, to difficulties over update operations. (We now concentrate on the association between cities and status values, i.e., on the functional dependency of STATUS on CITY.)

INSERTING: We cannot enter the fact that a particular city has a particular status value (for example, we cannot state that any supplier in Rome must have a status of 50) until we have some supplier located in that city. The reason is, again, that until such a supplier exists we have no appropriate primary key value.
DELETING: If we delete the only SECOND tuple for a particular city, we destroy not only the information for the supplier concerned but also the information that the city has that particular status value. For example, if we delete the SECOND tuple for S5, we lose the information that the status for Athens is 30.

UPDATING: The status value for a given city appears in SECOND many times. Thus, if we need to change the status value for London from 20 to 30, we are faced with either the problem of searching the SECOND relation to find every tuple for London or the possibility of producing an inconsistent result.

The solution to these problems is to replace the original relation (SECOND) by two projections SC (S#, CITY) and CS (CITY, STATUS). The corresponding functional dependency diagrams are:

SC: S# → CITY
CS: CITY → STATUS

The tabulations corresponding to these are:

SC
S#  City
S1  London
S2  Paris
S3  Paris
S4  London
S5  Athens

CS
City    Status
Athens  30
London  20
Paris   10

Fig:2 Sample tabulations of SC and CS.
It should be clear that this new structure overcomes all the problems over
update operations concerning the CITY-STATUS association.
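The effect on update operations can be seen concretely. This sqlite3 sketch builds SECOND from the sample rows, derives SC and CS from it, and counts how many rows change when London's status goes from 20 to 30 in each design; it also records a status for Rome, which has no supplier yet. The table name second_rel is an arbitrary choice for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE second_rel (sno TEXT PRIMARY KEY, status INTEGER, city TEXT)")
conn.execute("CREATE TABLE sc (sno TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE cs (city TEXT PRIMARY KEY, status INTEGER)")
conn.executemany(
    "INSERT INTO second_rel VALUES (?, ?, ?)",
    [("S1", 20, "London"), ("S2", 10, "Paris"), ("S3", 10, "Paris"),
     ("S4", 20, "London"), ("S5", 30, "Athens")],
)
# SC and CS are projections of SECOND; OR IGNORE drops repeated (city, status) pairs.
conn.execute("INSERT INTO sc SELECT sno, city FROM second_rel")
conn.execute("INSERT OR IGNORE INTO cs SELECT city, status FROM second_rel")

# Change London's status from 20 to 30 in each design and count the rows touched.
n_second = conn.execute("UPDATE second_rel SET status = 30 WHERE city = 'London'").rowcount
n_cs = conn.execute("UPDATE cs SET status = 30 WHERE city = 'London'").rowcount
print(n_second, n_cs)  # 2 1

# CS can record a city's status before any supplier is located there.
conn.execute("INSERT INTO cs VALUES ('Rome', 50)")

# Joining SC and CS over CITY still gives back the (updated) SECOND.
rebuilt = conn.execute(
    "SELECT sc.sno, cs.status, sc.city FROM sc JOIN cs ON sc.city = cs.city ORDER BY sc.sno"
).fetchall()
print(rebuilt == conn.execute("SELECT * FROM second_rel ORDER BY sno").fetchall())  # True
```

With more London suppliers the count for SECOND grows, while CS always needs exactly one change.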

Third Normal Form

Definition: A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key.

Relations SC and CS (shown in Fig:2) are both in 3NF; relation SECOND (shown earlier) is not in 3NF. A relation that is in second normal form and not in third can always be reduced to an equivalent collection of 3NF relations.

Relations with more than one candidate key or BCNF (Boyce-Codd normal form)

Definition:

A relation R is in BCNF if and only if every determinant is a candidate key. (A determinant is an attribute, possibly composite, on which some other attribute is fully functionally dependent.)

The objective of BCNF is to handle a relation having two or more composite and overlapping candidate keys. Although BCNF is stronger than 3NF, it is still true that any relation can be decomposed in a nonloss way into an equivalent collection of BCNF relations.

Relation FIRST has three determinants: S#, CITY and the combination (S#, P#). Of these, only (S#, P#) is a candidate key; hence FIRST is not in BCNF.

Relation SECOND is also not in BCNF, because the determinant CITY is not a candidate key.

Relations SP, SC and CS are in BCNF, because in each case the primary key is the only determinant in the relation.
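The BCNF test "every determinant is a candidate key" can be sketched with attribute closure, a standard technique that this text does not itself describe. `closure` and `bcnf_violations` are hypothetical helper names, and the FDs are those stated for the running example (S# → CITY, CITY → STATUS, (S#, P#) → QTY). Because the check only looks at these given FDs rather than every FD implied on each projection, it suits this example but is not a general algorithm.

```python
def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Determinants inside this relation whose closure does not cover the relation.

    Simplification: only the given FDs whose attributes all lie in the relation are checked.
    """
    bad = []
    for lhs, rhs in fds:
        if (lhs | rhs) <= attributes and not rhs <= lhs:
            if not closure(lhs, fds) >= attributes:
                bad.append(set(lhs))
    return bad

def fs(*names):
    return frozenset(names)

# FDs of the running example.
fds = [(fs("S#"), fs("CITY")), (fs("CITY"), fs("STATUS")), (fs("S#", "P#"), fs("QTY"))]
relations = {
    "FIRST": fs("S#", "STATUS", "CITY", "P#", "QTY"),
    "SECOND": fs("S#", "STATUS", "CITY"),
    "SP": fs("S#", "P#", "QTY"),
    "SC": fs("S#", "CITY"),
    "CS": fs("CITY", "STATUS"),
}
for name, attrs in relations.items():
    print(name, bcnf_violations(attrs, fds))
# FIRST [{'S#'}, {'CITY'}]
# SECOND [{'CITY'}]
# SP []
# SC []
# CS []
```

The output matches the text: S# and CITY are determinants of FIRST that are not candidate keys, CITY is the offending determinant in SECOND, and SP, SC and CS pass.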

Example involving two disjoint (non-overlapping) candidate keys.

Let us consider relation S (S#, SNAME, STATUS, CITY) and suppose that supplier names are unique, so that S# and SNAME are both candidate keys. The relation S is in BCNF. However, it is desirable to specify both keys in the definition of the relation:
a) to inform the DBMS, so that it may enforce the constraints implied by the two-way dependency between the two keys, namely that corresponding to each supplier number there exists a unique supplier name, and conversely;
b) to inform the users, since the uniqueness of the two attributes is an aspect of the semantics of the relation and is therefore of interest to people using it.

Example where the candidate keys overlap.

Two candidate keys overlap if they involve two or more attributes each and have an attribute in common.

1) We suppose that supplier names are unique, and we consider the relation SSP (S#, SNAME, P#, QTY). The keys are (S#, P#) and (SNAME, P#). This relation is not in BCNF, because it has two determinants, S# and SNAME, which are not keys for the relation (S# determines SNAME, and conversely). But the relation is in 3NF under the definition given earlier: a relation R is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key. That definition does not require an attribute to be fully dependent on the primary key if it is itself a component of some other key in the relation, so the fact that SNAME is not fully dependent on (S#, P#) does not violate 3NF. But this fact leads to redundancy and hence to update problems in the relation SSP. For example, changing the name of supplier S1 from Smith to Robinson leads either to search problems or to possibly inconsistent results. The solution to the problems, as usual, is to decompose the relation SSP into two projections, in this case SS (S#, SNAME) and SP (S#, P#, QTY), or alternatively SS (S#, SNAME) and SP (SNAME, P#, QTY). These projections are both in BCNF.
2) Second example:
Consider the relation SJT with attributes S (student), J (subject) and T (teacher). The meaning of an SJT tuple is that the specified student is taught the specified subject by the specified teacher. The semantic rules are as follows:
1. For each subject, each student of that subject is taught by only one teacher.
2. Each teacher teaches only one subject.
3. Each subject is taught by several teachers.
The sample tabulation of this relation is as follows.

SJT

S      J        T
Smith  Math     Prof. White
Smith  Physics  Prof. Green
Jones  Math     Prof. White
Jones  Physics  Prof. Brown

The functional dependencies of SJT are:

From the first semantic rule, we have a functional dependency of T on the composite attribute (S, J).
From the second semantic rule, we have a functional dependency of J on T.
From the third semantic rule, it is understood that there is no functional dependency of T on J.
So the diagram is as follows:

(S, J) → T
T → J

Fig: Functional dependencies in the relation SJT.

Here again we have two overlapping candidate keys: the combination (S, J) and the combination (S, T). Once again the relation is in 3NF but not in BCNF, and once again it suffers from certain anomalies in connection with update operations. For example, if we wish to delete the information that Jones is studying physics, we cannot do so without at the same time losing the information that Professor Brown teaches physics.
The difficulties are caused by the fact that T is a determinant but not a candidate key. Again we can get over the problem by replacing the original relation by two BCNF projections, in this case ST (S, T) and TJ (T, J).
Finally, the concept of BCNF eliminates certain problem cases that could occur under the old definition of 3NF. Moreover, BCNF is conceptually simpler than 3NF, in that it involves no reference to the concepts of primary key, transitive dependence and full dependence. The reference to candidate keys can also be replaced by a reference to the more fundamental notion of functional dependence.
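The BCNF condition (every determinant is a candidate key) can be checked mechanically. The following Python sketch is illustrative only: it assumes the FDs derived above for SJT, (S, J) → T and T → J, and its helper names `closure` and `bcnf_violations` are hypothetical. It tests every proper subset of attributes, which is exponential in general but harmless for a three-attribute relation, and it evaluates each projection against the FDs of the original relation.

```python
from itertools import combinations

# Each FD is a pair (lhs, rhs) of frozensets of attribute names.
SJT_FDS = [
    (frozenset({"S", "J"}), frozenset({"T"})),  # rule 1: (S, J) -> T
    (frozenset({"T"}), frozenset({"J"})),       # rule 2: T -> J
]


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return (determinant, dependents) pairs, restricted to `relation`,
    whose determinant is not a superkey of `relation`."""
    relation = set(relation)
    violations = []
    # Proper subsets only: the full attribute set is trivially a superkey.
    for size in range(1, len(relation)):
        for lhs in combinations(sorted(relation), size):
            lhs = set(lhs)
            lhs_closure = closure(lhs, fds)
            implied = (lhs_closure & relation) - lhs
            if implied and not relation <= lhs_closure:
                violations.append((frozenset(lhs), frozenset(implied)))
    return violations


print(bcnf_violations(["S", "J", "T"], SJT_FDS))  # SJT itself
print(bcnf_violations(["S", "T"], SJT_FDS))       # projection ST
print(bcnf_violations(["T", "J"], SJT_FDS))       # projection TJ
```

For SJT the only violation reported is T → J, matching the observation that T is a determinant but not a candidate key; the projections ST and TJ report none.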

Good and Bad decompositions

During the reduction process it is frequently the case that a given relation can be decomposed in a variety of different ways. Consider the relation SECOND (S#, STATUS, CITY) with the functional dependencies (FDs)

SECOND.S# → SECOND.CITY
SECOND.CITY → SECOND.STATUS

and therefore, by transitivity,

SECOND.S# → SECOND.STATUS

The representation of the SECOND relation is shown in the following figure:

[Figure: dependency diagrams over the attributes S#, SNAME, STATUS, CITY; P#, PNAME, COLOR, WEIGHT, CITY; and S#, P#, QTY]
Fig: Functional dependencies in relations S, P, SP

The update problems encountered with SECOND can be overcome by replacing it by its decomposition into the two 3NF projections

A: SC (S#, CITY) and CS (CITY, STATUS)

Let this decomposition be A. An alternative decomposition is

B: SC (S#, CITY) and SS (S#, STATUS)

Decomposition B is also nonloss, and its two projections are again in BCNF. But decomposition B is less satisfactory than decomposition A.
For example, it is still not possible (in B) to insert the fact that a particular city has a particular status value unless some supplier is located in that city. The explanation is as follows.
In decomposition A the two projections are independent of each other, in the sense that updates can be made to either one without regard for the other; provided only that each update is legal within its own projection, joining the two projections afterwards will never violate the FD constraints on SECOND.
In decomposition B, updates to either of the two projections must be monitored to ensure that the FD SECOND.CITY → SECOND.STATUS is not violated. Thus projections SC and SS are not independent of each other.
A relation that cannot be decomposed into independent components is said to be atomic.
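The difference between A and B can be made concrete with the usual dependency-preservation test: an FD X → Y is preserved by a decomposition if Y can be reached from X using only what each projection can determine on its own. The Python sketch below is a minimal illustration; it assumes the two FDs of SECOND given above, treats attribute names as plain strings, and uses hypothetical helper names (`closure`, `is_preserved`).

```python
SECOND_FDS = [
    (frozenset({"S#"}), frozenset({"CITY"})),      # S# -> CITY
    (frozenset({"CITY"}), frozenset({"STATUS"})),  # CITY -> STATUS
]


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """True if fd = (lhs, rhs) can be enforced through the projections alone."""
    lhs, rhs = fd
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for projection in decomposition:
            # What the attributes reached so far determine *within* this projection.
            gained = closure(reached & projection, fds) & projection
            if not gained <= reached:
                reached |= gained
                changed = True
    return rhs <= reached


DECOMP_A = [{"S#", "CITY"}, {"CITY", "STATUS"}]  # SC and CS
DECOMP_B = [{"S#", "CITY"}, {"S#", "STATUS"}]    # SC and SS

for name, decomposition in [("A", DECOMP_A), ("B", DECOMP_B)]:
    ok = all(is_preserved(fd, decomposition, SECOND_FDS) for fd in SECOND_FDS)
    print(name, "preserves all FDs of SECOND:", ok)
```

Under these assumptions A preserves both FDs, while B loses CITY → STATUS: neither SC nor SS contains both CITY and STATUS, so that dependency can only be checked by inspecting both projections together.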

Questions:

1. What is embedded SQL?
2. Define QBE.
3. Explain operations involving cursors and operations not involving cursors.
4. What do you mean by dynamic statements?
5. Explain retrieval operations of QBE.
6. Explain update operations of QBE.
7. Explain built-in functions of QBE.
8. Define normalization.
9. What are the various forms of normalization?
10. What do you mean by the QBE dictionary?
11. Explain first, second and third normal forms.
12. Explain relations with more than one candidate key (BCNF).
13. What do you mean by good and bad decomposition?
14. What are QBE aggregate functions?
15. What is functional dependency?

STUDY MATERIAL

Course: B.Com CA
Subject: Data base management system
Semester:III

Unit: Four

Unit IV
Syllabus
Hierarchical approach: IMS data structure, physical database, database description, hierarchical sequence. External level of IMS: logical databases, the program communication block. IMS data manipulation: defining the program communication block, DL/I examples.

Books for Reference:

An Introduction to Database Systems - C. J. Date

Database System Concepts - Abraham Silberschatz, Henry F. Korth, S. Sudarshan

Principles of Database Systems - J. D. Ullman

IMS data structure (IMS: Information Management System)

A physical database (PDB) is an ordered set, the elements of which consist of all occurrences of one type of physical database record (PDBR). A PDBR occurrence in turn consists of a hierarchical arrangement of fixed-length segment occurrences, and a segment occurrence consists of a set of associated fixed-length field occurrences.
As an example, consider a PDB that contains information about the internal education system of a large industrial company. The hierarchical structure of this PDB, that is, the PDBR type, is shown here:

COURSE: Course#, Title, Description
    PREREQ: Course#, Title
    OFFERING: Date, Location, Format
        TEACHER: Emp#, Name
        STUDENT: Emp#, Name, Grade

Fig: PDBR type for the education database.

In this example we assume that the company maintains an education department whose function is to run a number of training courses. Each course is offered at a number of different locations within the company. The PDB contains details both of offerings already given and of offerings scheduled to be given in the future. The details are as follows:

• For each course: course number (unique), course title, course description, details of prerequisite courses (if any), and details of all offerings.
• For each prerequisite course for a given course: course number and title.
• For each offering of a given course: date, location, format, details of all teachers and details of all students.
• For each teacher of a given offering: employee number and name.
• For each student of a given offering: employee number, name and grade.

In the PDBR structure shown, there are five segment types, COURSE, PREREQ, OFFERING, TEACHER and STUDENT, each one consisting of the field types indicated.

COURSE is the root segment type and the others are dependent segment types. Each dependent has a parent; for example, the parent of TEACHER is OFFERING. Similarly, each parent has at least one child; for example, COURSE has two children. For one occurrence of any given segment type there may be any number of occurrences (possibly zero) of each of its child segment types.
COURSE    M23     Dynamics  ...
    PREREQ    M16     Trigonometry
    PREREQ    M19     Calculus
    OFFERING  730813  Madrid  F3
        TEACHER   421633  Sharp.R
        STUDENT   102141  Byrd.W     B
        STUDENT   183009  Gibbons.O  A
        STUDENT   761620  Tallis.T   B
    OFFERING  750106  Oslo    F2
    OFFERING  751104  Dublin  F3

Fig: Sample PDBR occurrence for the education database.

The Database Description

Each physical database is defined, together with its mapping to storage, by a database description (DBD). The source form of the DBD is written using special System/370 Assembler language macro statements. Once written, the DBD is assembled, and the object form is stored away in a system library, from which it may be extracted when required by the IMS control program. The DBD for the education database is as follows.

1  DBD    NAME=EDUCPDBD
2  SEGM   NAME=COURSE,BYTES=256
3  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
4  FIELD  NAME=TITLE,BYTES=33,START=4
5  FIELD  NAME=DESCRIPN,BYTES=220,START=37
6  SEGM   NAME=PREREQ,PARENT=COURSE,BYTES=36
7  FIELD  NAME=(COURSE#,SEQ),BYTES=3,START=1
8  FIELD  NAME=TITLE,BYTES=33,START=4
9  SEGM   NAME=OFFERING,PARENT=COURSE,BYTES=20
10 FIELD  NAME=(DATE,SEQ,M),BYTES=6,START=1
11 FIELD  NAME=LOCATION,BYTES=12,START=7
12 FIELD  NAME=FORMAT,BYTES=2,START=19
13 SEGM   NAME=TEACHER,PARENT=OFFERING,BYTES=24
14 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
15 FIELD  NAME=NAME,BYTES=18,START=7
16 SEGM   NAME=STUDENT,PARENT=OFFERING,BYTES=25
17 FIELD  NAME=(EMP#,SEQ),BYTES=6,START=1
18 FIELD  NAME=NAME,BYTES=18,START=7
19 FIELD  NAME=GRADE,BYTES=1,START=25

FIG: DBD for the education PDB.

Explanation

• Statement 1: Assigns the name EDUCPDBD ("education physical database description") to the DBD. All names in IMS are limited to a maximum length of eight characters.

• Statement 2: Defines the root segment type, named COURSE, with a total length of 256 bytes.

• Statements 3-5: Define the field types that make up COURSE. Each is given a name, a length in bytes, and a start position within the segment. The first field, COURSE#, is defined to be the sequence field for the segment, so PDBR occurrences will be sequenced in ascending course-number order.

• Statement 6: Defines PREREQ as a 36-byte segment type that is a child of COURSE.

• Statements 7-8: Define the fields of PREREQ.

• Statement 9: Defines OFFERING as a child of COURSE.

• Statements 10-12: Define the fields of OFFERING. DATE is defined as the sequence field for OFFERING. The specification M (multiple) means that twin OFFERING occurrences (occurrences with the same parent) may contain the same date value.

• Statements 13-15: Define the TEACHER segment and its fields.

• Statements 16-19: Define the STUDENT segment and its fields.

The sequence of statements in the DBD is significant. Specifically, SEGM statements must appear in a sequence that reflects the hierarchical structure, and each SEGM statement must be immediately followed by the appropriate FIELD statements.
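Because every field has a fixed length and start position, the DBD layout can be sanity-checked by simple arithmetic. The Python sketch below transcribes the education DBD into tuples (a hypothetical representation, not an IMS format), taking DATE as 6 bytes at position 1, LOCATION as 12 bytes at position 7, and EMP# as 6 bytes at position 1, which is what the declared segment lengths require. It applies a deliberately strict rule (fields contiguous and exactly filling the segment) purely to catch transcription errors; IMS itself does not necessarily impose that rule.

```python
# (segment name, parent, segment bytes, [(field name, bytes, start), ...])
EDUC_DBD = [
    ("COURSE", None, 256, [("COURSE#", 3, 1), ("TITLE", 33, 4), ("DESCRIPN", 220, 37)]),
    ("PREREQ", "COURSE", 36, [("COURSE#", 3, 1), ("TITLE", 33, 4)]),
    ("OFFERING", "COURSE", 20, [("DATE", 6, 1), ("LOCATION", 12, 7), ("FORMAT", 2, 19)]),
    ("TEACHER", "OFFERING", 24, [("EMP#", 6, 1), ("NAME", 18, 7)]),
    ("STUDENT", "OFFERING", 25, [("EMP#", 6, 1), ("NAME", 18, 7), ("GRADE", 1, 25)]),
]


def layout_errors(dbd):
    """Return human-readable problems found in a transcribed DBD."""
    errors = []
    defined = set()
    for segment, parent, seg_bytes, fields in dbd:
        if len(segment) > 8:
            errors.append(f"segment name {segment} exceeds 8 characters")
        # SEGM statements follow the hierarchy, so a parent must come first.
        if parent is not None and parent not in defined:
            errors.append(f"{segment}: parent {parent} not defined earlier")
        defined.add(segment)
        next_start = 1  # start positions are 1-based, as in the DBD
        for field, field_bytes, start in sorted(fields, key=lambda f: f[2]):
            if len(field) > 8:
                errors.append(f"field name {field} exceeds 8 characters")
            if start != next_start:
                errors.append(f"{segment}.{field} starts at {start}, expected {next_start}")
            next_start = start + field_bytes
        if next_start - 1 != seg_bytes:
            errors.append(f"{segment}: fields cover {next_start - 1} bytes, not {seg_bytes}")
    return errors


print(layout_errors(EDUC_DBD) or "layout is consistent")
```

A transcription such as DATE with BYTES=12 at START=1 followed by LOCATION at START=19 would be reported, because it leaves LOCATION and FORMAT overlapping at position 19.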

Hierarchical Sequence

The concept of hierarchical sequence within a database is a very important one in IMS. It is defined as follows.

For each segment occurrence, we define the "hierarchical sequence key value" to consist of the sequence field value for that segment, prefixed with the type code for that segment, prefixed in turn with the hierarchical sequence key value of its parent, if any. For example, the hierarchical sequence key value for the STUDENT occurrence for "Byrd,W." is

1M2337308135102141

Here 1 is the type code for COURSE, M23 is the COURSE#, 3 is the type code for OFFERING, 730813 is the DATE of the OFFERING, 5 is the type code for STUDENT, and 102141 is the EMP# of the STUDENT.

The hierarchical sequence for an IMS database is then the sequence of segment occurrences defined by ascending values of the hierarchical sequence key. This notion is important because IMS databases are stored in hierarchical sequence.
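The recursive definition translates directly into code. The Python sketch below is illustrative: each occurrence is a small dict with a hypothetical `parent` reference, and the type codes are assumed to be 1 to 5 in the order COURSE, PREREQ, OFFERING, TEACHER, STUDENT, which agrees with the codes 1, 3 and 5 used in the example. Sorting by the key yields the hierarchical sequence; plain string comparison works here only because type codes are single digits and each segment type's sequence values have a fixed length.

```python
# Assumed type codes: position of each segment type in the hierarchy.
TYPE_CODE = {"COURSE": 1, "PREREQ": 2, "OFFERING": 3, "TEACHER": 4, "STUDENT": 5}


def occurrence(seg_type, seq_value, parent=None):
    """A minimal segment occurrence: type, sequence field value and parent."""
    return {"type": seg_type, "seq": seq_value, "parent": parent}


def hseq_key(occ):
    """Parent's key, then this segment's type code, then its sequence value."""
    prefix = hseq_key(occ["parent"]) if occ["parent"] is not None else ""
    return prefix + str(TYPE_CODE[occ["type"]]) + occ["seq"]


course = occurrence("COURSE", "M23")
madrid = occurrence("OFFERING", "730813", course)
byrd = occurrence("STUDENT", "102141", madrid)

# Deliberately listed out of order; sorting by key restores hierarchical sequence.
occurrences = [
    occurrence("OFFERING", "751104", course),
    byrd,
    occurrence("PREREQ", "M19", course),
    madrid,
    course,
    occurrence("PREREQ", "M16", course),
    occurrence("OFFERING", "750106", course),
]

print(hseq_key(byrd))
for occ in sorted(occurrences, key=hseq_key):
    print(occ["type"], occ["seq"])
```

The key printed for Byrd is the 1M2337308135102141 given above, and the sorted listing shows a parent always preceding its children, because a parent's key is a prefix of each child's key.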

External Level of IMS

Logical databases:
In the IMS architecture, the user's external view is defined as a subset of the corresponding physical database. An LDB (logical database) is an ordered set, the elements of which consist of all occurrences of one type of LDBR (logical database record). An LDBR type is a hierarchical arrangement of segment types, derived from the corresponding PDBR hierarchy in accordance with the following rules.

• Any segment type of the PDBR hierarchy, together with all its dependents, can be omitted from the LDBR hierarchy.
• The fields of an LDBR segment type can be a subset of those of the corresponding PDBR segment type, and can be rearranged within that LDBR segment type.

Example:

COURSE: Course#, Title, Description
    OFFERING: Date, Location, Format
        STUDENT: Emp#, Name, Grade

Fig: Sample LDBR type for the education database.

Sensitive Segments:

Segments of the PDB that are included in an LDB are said to be sensitive segments. In the above example COURSE, OFFERING and STUDENT are sensitive segments. The user of this LDB will not be aware of the existence of any other segments.
For example, the DL/I "get next" operation, which in general is used for sequential retrieval, will simply skip over any segments that are not sensitive for the user. However, if the user deletes a sensitive segment occurrence, all children of that occurrence will be deleted regardless of whether they are sensitive. So a user should not be given authority to delete a segment if doing so would also delete hidden segments.
The sensitive-segment concept also protects the user from certain kinds of growth in the PDB: a new segment type can be added without affecting existing users, provided the addition does not alter any existing parent-child relationship.
Finally, the sensitive-segment concept provides a degree of control over data security, inasmuch as users can be prevented from accessing particular segment types by the omission of those segments from the LDB.
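The effect of segment sensitivity on sequential retrieval can be pictured as a filter over the hierarchical sequence. The Python sketch below is a simplification, not how IMS is implemented: it holds sample occurrences as (segment type, sequence value) pairs already in hierarchical sequence (TEACHER omitted for brevity), and a hypothetical `get_next_all` walks them while silently skipping every segment type that is not sensitive for the LDB.

```python
# Sample PDB occurrences in hierarchical sequence: (segment type, sequence value).
PDB_SEQUENCE = [
    ("COURSE", "M23"),
    ("PREREQ", "M16"),
    ("PREREQ", "M19"),
    ("OFFERING", "730813"),
    ("STUDENT", "102141"),
    ("STUDENT", "183009"),
    ("STUDENT", "761620"),
    ("OFFERING", "750106"),
    ("OFFERING", "751104"),
]


def get_next_all(sequence, sensitive):
    """Yield occurrences in hierarchical sequence, skipping non-sensitive types."""
    for seg_type, value in sequence:
        if seg_type in sensitive:
            yield seg_type, value


LDB_SENSITIVE = {"COURSE", "OFFERING", "STUDENT"}
for seg_type, value in get_next_all(PDB_SEQUENCE, LDB_SENSITIVE):
    print(seg_type, value)
```

The PREREQ occurrences never reach the program, which is exactly why the user of this LDB is unaware that PREREQ exists.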

Sensitive fields

Sensitive fields are those fields of the PDB that are included in the LDB. Every sensitive field must be contained within a sensitive segment. A given LDB may include or exclude any combination of fields from the PDB, except that if the program intends to insert new occurrences of a given segment type, it must be "sensitive to" the sequence field for that segment type.

Field sensitivity, like segment sensitivity, protects the user from certain types of growth in the database and provides a simple level of data security.

The program communication block (PCB)

Each LDB is defined by a program communication block (PCB). The PCB includes the specification of the mapping between the LDB and the corresponding PDB. Like a DBD (database description), a PCB is written using special System/370 Assembler language macro statements. These statements constitute the "external DDL" for IMS. The set of all PCBs for a given user forms that user's program specification block (PSB); the object form of the PSB is stored in a system library, from which it may be extracted when required by the IMS control program.

Example:

1 PCB    TYPE=DB,DBDNAME=EDUCPDBD,KEYLEN=15
2 SENSEG NAME=COURSE,PROCOPT=G
3 SENSEG NAME=OFFERING,PARENT=COURSE,PROCOPT=G
4 SENSEG NAME=STUDENT,PARENT=OFFERING,PROCOPT=G

Fig: PCB for the LDB

Explanation

• Statement 1: Specifies that this is a database PCB (TYPE=DB), that the underlying DBD is EDUCPDBD, and that the key feedback area is 15 bytes long.

Key feedback: When the user accesses an LDB, the corresponding PCB is held in storage and acts as a communication area between the user's program and IMS. One of the fields in the PCB is the key feedback area. When the user retrieves a segment from the LDB, IMS not only fetches the requested segment but also places a "fully concatenated key" into the key feedback area. The fully concatenated key consists of the concatenation of the sequence field values of all segments in the hierarchical path from the root down to the retrieved segment.

For example, if the user retrieves the STUDENT occurrence for Byrd.W., IMS will place the value

M23730813102141

in the key feedback area. The fully concatenated key of a segment is not quite the same as its hierarchical sequence key, since it does not include segment type codes.

• Statement 2: Specifies the first sensitive segment in the LDB. The name of the sensitive segment must be the same as the name assigned to the segment in the DBD. The PROCOPT ("processing options") entry specifies the types of operation that the user will be permitted to perform on this segment. In this example the entry is G ("get"), indicating retrieval only. Other options are I ("insert"), R ("replace") and D ("delete").

• Statement 3: Defines the next sensitive segment in the LDB.

• Statement 4: Defines the last sensitive segment. In our example, statements 3 and 4 are very similar. The PROCOPT entry is the same for each of the three sensitive segments; in such a situation we may specify PROCOPT once in the PCB statement instead of in each SENSEG statement.
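The difference between the fully concatenated key and the hierarchical sequence key is easiest to see side by side. In this Python sketch the path from the root to the retrieved segment is a list of (segment type, sequence value) pairs; the helper names are hypothetical, the type codes are assumed to match the hierarchical sequence example, and the sequence-field lengths (COURSE# 3 bytes, DATE 6 bytes, EMP# 6 bytes) are assumptions consistent with KEYLEN=15 and the key value M23730813102141.

```python
# Assumed sequence-field lengths along the deepest LDB path.
SEQ_FIELD_BYTES = {"COURSE": 3, "OFFERING": 6, "STUDENT": 6}
# Assumed type codes, consistent with the hierarchical sequence example.
TYPE_CODE = {"COURSE": 1, "OFFERING": 3, "STUDENT": 5}

# Path from the root down to the STUDENT occurrence for Byrd.
path = [("COURSE", "M23"), ("OFFERING", "730813"), ("STUDENT", "102141")]


def fully_concatenated_key(path):
    """Sequence field values from root to target, with no type codes."""
    return "".join(value for _, value in path)


def hierarchical_sequence_key(path):
    """The same values, each prefixed by its segment type code."""
    return "".join(str(TYPE_CODE[seg]) + value for seg, value in path)


key = fully_concatenated_key(path)
print(key, len(key))
print(hierarchical_sequence_key(path))
# The key feedback area must hold the longest fully concatenated key in the LDB.
print(sum(SEQ_FIELD_BYTES.values()))
```

The fully concatenated key is 15 characters long, the same as the sum of the three sequence-field lengths, which is why KEYLEN=15 is sufficient for this LDB.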

If PROCOPT=K is specified in the SENSEG statement for OFFERING, the user may largely ignore the presence of OFFERINGs in the hierarchy. The effect is as if the LDBR type were as follows:

COURSE: Course#, Title, Description
    STUDENT: Emp#, Name, Grade

Fig: Effect of specifying PROCOPT=K for OFFERING

The main difference is that when a STUDENT occurrence is retrieved, the fully concatenated key in the key feedback area still includes the date value from the parent OFFERING.

The LDB defined by the PCB shown above is sensitive to all fields in segments COURSE, OFFERING and STUDENT of the underlying PDB. Suppose we wish to exclude the LOCATION field of the OFFERING segment from the LDB while still remaining sensitive to all other fields. The SENSEG statement for OFFERING would then be followed by:

SENFLD NAME=FORMAT,START=1
SENFLD NAME=DATE,START=3

These statements specify the fields to be included in the LDB segment and their start positions within that segment (FORMAT occupies bytes 1-2, so DATE starts at byte 3). If no SENFLD statement is given for a particular SENSEG statement, then by default that segment is taken to be identical to the underlying PDB segment.
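SENFLD processing amounts to copying selected byte ranges of the PDB segment into new positions in the LDB segment. The Python sketch below assumes the OFFERING layout DATE at bytes 1-6, LOCATION at 7-18 and FORMAT at 19-20, and SENFLD entries placing FORMAT at position 1 and DATE at position 3; `build_ldb_segment` is a hypothetical helper, since IMS performs this mapping internally.

```python
# PDB layout of OFFERING: field -> (start, length), 1-based as in the DBD.
PDB_OFFERING = {"DATE": (1, 6), "LOCATION": (7, 12), "FORMAT": (19, 2)}

# SENFLD entries: field name -> start position in the LDB segment (assumed).
SENFLDS = {"FORMAT": 1, "DATE": 3}


def build_ldb_segment(pdb_bytes, pdb_layout, senflds):
    """Project a PDB segment onto the fields named in SENFLD entries."""
    size = max(start + pdb_layout[name][1] - 1 for name, start in senflds.items())
    ldb = [" "] * size
    for name, ldb_start in senflds.items():
        pdb_start, length = pdb_layout[name]
        value = pdb_bytes[pdb_start - 1:pdb_start - 1 + length]
        ldb[ldb_start - 1:ldb_start - 1 + length] = value  # copy char by char
    return "".join(ldb)


pdb_segment = "730813" + "MADRID".ljust(12) + "F3"  # one 20-byte OFFERING occurrence
print(build_ldb_segment(pdb_segment, PDB_OFFERING, SENFLDS))
```

The LOCATION bytes never appear in the 8-byte LDB segment, and FORMAT now precedes DATE, illustrating both field exclusion and field rearrangement.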

IMS Data Manipulation

Defining the Program Communication Block (PCB)

The IMS data manipulation language (DL/I) is invoked from the host language (here PL/I) by means of ordinary subroutine calls. When an application program is operating on a particular logical database (LDB), the PCB for that LDB is kept in storage to serve as a communication area between the program and IMS; in fact, when the program calls DL/I, it has to quote the storage address of the appropriate PCB to tell DL/I which LDB it is to operate on.

The PCB address is supplied to the program by IMS when the program is first entered. What actually happens is this: when a database application is to be run, IMS is given control first. IMS determines which PSB and DBD(s) are required, fetches them from their respective libraries and loads them into storage. IMS then fetches the application program and gives it control, passing it the PCB address as a parameter.

In order for the application program to be able to access the information in the PCB for a particular LDB, it must contain a definition of that PCB.

DLITPLI: PROCEDURE (COSPCB_ADDR) OPTIONS (MAIN);
   DECLARE COSPCB_ADDR POINTER;   /* PCB address passed in by IMS */
   .
   .
   .
   DECLARE 1 COSPCB BASED(COSPCB_ADDR),
             2 DBDNAME   CHARACTER(8),
             2 SEGLEVEL  CHARACTER(2),
             2 STATUS    CHARACTER(2),
             2 PROCOPT   CHARACTER(4),
             2 RESERVED  FIXED BINARY(31),
             2 SEGNAME   CHARACTER(8),
             2 KEYFBLEN  FIXED BINARY(31),
             2 #SENSEGS  FIXED BINARY(31),
             2 KEYFBAREA CHARACTER(15);

Fig A: Example of program entry and PCB definition (PL/I).

Explanation:

The PROCEDURE statement (labelled DLITPLI) is the program entry point. The expression in parentheses following the keyword PROCEDURE represents the parameters passed to the program by IMS; here it consists of a pointer giving the address of the PCB. The rest of Fig A consists of a DECLARE statement that defines a structure representing the single PCB used in the application.
The DBDNAME field contains the name of the underlying DBD throughout the execution of the program.
The SEGLEVEL field is set after each DL/I operation to contain the level number of the segment just accessed.
The STATUS field is the most important field in the PCB. After each DL/I call, a two-character value is placed in this field to indicate the success or otherwise of the requested operation. A blank value indicates that the operation was completed satisfactorily; any other value represents an exceptional or error condition.
The PROCOPT field contains the PROCOPT value specified in the PCB statement when the PCB was originally defined.
The SEGNAME field contains the name of the segment last accessed.
The KEYFBLEN field contains the length of the fully concatenated key.
The #SENSEGS field contains a count of the number of sensitive segments.
The KEYFBAREA field is the key feedback area; it contains the fully concatenated key.
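The PL/I declaration fixes a byte layout, which can be mirrored with Python's standard `struct` module to decode a PCB. This is a sketch under stated assumptions: big-endian fullword integers (as on System/370), no padding (every FIXED BINARY(31) field in this declaration already falls on a 4-byte boundary), ASCII instead of EBCDIC for readability, and illustrative sample values chosen to match a successful retrieval of Byrd's STUDENT occurrence. The field `#SENSEGS` is renamed `nsensegs` because `#` is not valid in a Python identifier.

```python
import struct
from collections import namedtuple

# Field order and sizes follow the PL/I COSPCB declaration.
PCB_FORMAT = ">8s2s2s4si8sii15s"  # big-endian, no alignment padding
PCB = namedtuple(
    "PCB",
    "dbdname seglevel status procopt reserved segname keyfblen nsensegs keyfbarea",
)


def decode_pcb(raw):
    """Unpack a raw PCB and decode its character fields (ASCII for illustration)."""
    fields = struct.unpack(PCB_FORMAT, raw)
    return PCB(*(f.decode("ascii") if isinstance(f, bytes) else f for f in fields))


# Illustrative PCB contents after retrieving the STUDENT occurrence for Byrd.
raw = struct.pack(
    PCB_FORMAT, b"EDUCPDBD", b"03", b"  ", b"G   ", 0,
    b"STUDENT ", 15, 3, b"M23730813102141",
)
pcb = decode_pcb(raw)
print(struct.calcsize(PCB_FORMAT), repr(pcb.status), pcb.keyfbarea[: pcb.keyfblen])
```

A program would test `pcb.status` after every call, treating the blank value as success, exactly as the STATUS description above requires.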

DL/I Examples

Get unique (GU)                     Direct retrieval
Get next (GN)                       Sequential retrieval
Get next within parent (GNP)        Sequential retrieval under current parent
Get hold (GHU, GHN, GHNP)           Allows subsequent DLET/REPL
Insert (ISRT)                       Add new segment occurrence
Delete (DLET)                       Delete existing segment occurrence
Replace (REPL)                      Replace existing segment occurrence

Tab: DL/I Operations

Direct retrieval:
Get the first OFFERING occurrence where the location is Stockholm.

GU COURSE
   OFFERING (LOCATION='STOCKHOLM')

Sequential retrieval with an SSA:
Get all STUDENT occurrences in the LDB, starting with the first student for the first offering in Stockholm.

   GU  COURSE
       OFFERING (LOCATION='STOCKHOLM')
       STUDENT
NS GN  STUDENT
   GOTO NS

Sequential retrieval with an SSA within a parent:
Get all students for the offering on 13 August 1973 of course M23.

   GU  COURSE (COURSE#='M23')
       OFFERING (DATE='730813')
NP GNP STUDENT
   GOTO NP
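The GU/GNP loop can be simulated to make the role of parentage and the status code concrete. This Python sketch is a toy model, not the DL/I interface: the database is a list of (type, value, parent index) entries in hierarchical sequence, the class `ToyDLI` and its methods are hypothetical, GU here accepts only fully qualified SSAs, and the status "GE" for "not found" is an assumption of the sketch (the text only fixes blank as success).

```python
# Toy database in hierarchical sequence: (segment type, sequence value, parent index).
DB = [
    ("COURSE", "M23", None),     # 0
    ("OFFERING", "730813", 0),   # 1
    ("STUDENT", "102141", 1),    # 2
    ("STUDENT", "183009", 1),    # 3
    ("STUDENT", "761620", 1),    # 4
    ("OFFERING", "750106", 0),   # 5
]

OK, NOT_FOUND = "  ", "GE"  # blank = success; "GE" assumed for "not found"


class ToyDLI:
    def __init__(self, db):
        self.db = db
        self.position = None  # index of the segment last retrieved
        self.parent = None    # parentage established by the last GU

    def gu(self, ssas):
        """GU with fully qualified SSAs: a list of (segment type, value) pairs."""
        parent = None
        for seg_type, value in ssas:
            match = next(
                (i for i, (t, v, p) in enumerate(self.db)
                 if t == seg_type and v == value and p == parent),
                None,
            )
            if match is None:
                return NOT_FOUND
            parent = match
        self.position = self.parent = parent
        return OK

    def _under_parent(self, i):
        """True if occurrence i lies somewhere below the current parent."""
        p = self.db[i][2]
        while p is not None:
            if p == self.parent:
                return True
            p = self.db[p][2]
        return False

    def gnp(self, seg_type):
        """GNP: next occurrence of seg_type below the current parent."""
        for i in range(self.position + 1, len(self.db)):
            if not self._under_parent(i):
                break  # the hierarchical sequence has left the parent's subtree
            if self.db[i][0] == seg_type:
                self.position = i
                return OK, self.db[i][1]
        return NOT_FOUND, None


dli = ToyDLI(DB)
students = []
if dli.gu([("COURSE", "M23"), ("OFFERING", "730813")]) == OK:
    status, emp = dli.gnp("STUDENT")
    while status == OK:  # the NP loop: repeat GNP until the status is not blank
        students.append(emp)
        status, emp = dli.gnp("STUDENT")
print(students)
```

Because occurrences are kept in hierarchical sequence, GNP can stop as soon as it meets the first occurrence outside the parent's subtree (here the 750106 OFFERING).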

Segment occurrence insertion:
Add a new STUDENT occurrence for the offering on 13 August 1973 of course M23 (the new segment is first built in the program's I/O area).

ISRT COURSE (COURSE#='M23')
     OFFERING (DATE='730813')
     STUDENT

Segment deletion:
Delete the 13 August 1973 offering of course M23 (its TEACHER and STUDENT dependents are deleted with it).

GHU COURSE (COURSE#='M23')
    OFFERING (DATE='730813')
DLET

Segment replacement:
Change the location of the 13 August 1973 offering of course M23 to Helsinki.

GHU COURSE (COURSE#='M23')
    OFFERING (DATE='730813')
(change LOCATION in the I/O area to 'HELSINKI')
REPL
Questions.
1. Explain physical and logical database of hierarchical approach with example.
2. Explain DataBase Description (DBD) with example.
3. Explain Hierarchical sequence key value.
4. Explain Program communication block (PCB).
5. Discuss DL/I operations with some examples.
STUDY MATERIAL
Course : B.Com CA
Subject : Data base management system
Semester :III

Unit : Five

UNIT-V

Syllabus

Network approach: architecture of the DBTG system. DBTG data structure: the set construct, singular sets, sample schema, and the external level of DBTG. DBTG data manipulation.

Books for reference:

1: Database System Concepts
   Abraham Silberschatz and Henry F. Korth

2: An Introduction to Database Systems
   C. J. Date

Basic concepts:

A network database consists of a collection of records, which are connected to


one another through links. A record is in many respects similar to an entity in the entity-
relationship model. Each record is a collection of fields (attributes), each of which
contains only one value. A link can be viewed as a restricted (binary) form of
relationship in the sense of the E-R model.
To illustrate, consider a database representing a customer-account relationship in
a banking system. There are two record types, customer and account. As we saw earlier,
the customer record type can be defined, using Pascal-like notation, as follows:

type customer = record


name: string;
street: string;

city: string;
end

The account record type can be defined as follows:

type account = record


number: integer;
balance: integer;
end

The sample database in Fig:1 shows that Lowman has account 305, Camp has accounts 226 and 177, and Kahn has account 155.

Lowman  Square     Dallas   ----  305  500
Camp    Downridge  Garland  ----  226  336
                            ----  177  205
Kahn    ...                 ----  155   62

Fig:1 Sample database

Data-structure diagrams: [Architecture of network model]
A data-structure diagram is a schema representing the design of a network database. Such a diagram consists of two basic components:

*Boxes, which correspond to record types.
*Lines, which correspond to links.

A data-structure diagram serves the same purpose as an entity-relationship diagram; namely, it specifies the overall logical structure of the database. We shall consider how binary, ternary and other relationships of entity-relationship diagrams are represented.

Binary relationship

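The entity-relationship diagram for the banking example is shown as follows:

[(a) E-R diagram: entity sets customer (name, street, city) and account (number, balance), related by the relationship custacct.
 (b) Data-structure diagram: record types customer (name, street, city) and account (number, balance), connected by the link custacct.]

FIG:2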
Diagram (a) above is the entity-relationship diagram. It consists of two entity sets, customer and account, related through a binary many-to-many relationship custacct with no descriptive attributes.
The diagram shows that a customer may have several accounts and that an account may belong to several different customers. The corresponding data-structure diagram is shown in figure (b). Here the record type customer corresponds to the entity set customer. It includes three fields: name, street and city.
Similarly, account is the record type corresponding to the account entity set and includes the fields number and balance. Since the custacct relationship in the E-R diagram is many-to-many, we draw no arrows on the link custacct. If the relationship custacct were one-to-many from customer to account, then the link custacct would have an arrow pointing to the customer record type. The two representations are shown as follows:

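(a) customer (name, street, city) ----- account (number, balance)   [custacct many-to-many: no arrow]
(b) customer (name, street, city) <---- account (number, balance)   [custacct one-to-many from customer to account: arrow points to customer]

FIG:3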
A sample database corresponding to the data-structure diagram of FIG:3a is shown here. Since the relationship is many-to-many, it can show that Katz has accounts 256 and 347 and that account 347 is owned by both Katz and Doner.

[Customer records Beck (Maple, San Francisco), Katz (North, San Jose) and Doner (Sidehill, Palo Alto), linked to account records 200, 256, 347 and 301.]

Fig:4 Sample database corresponding to the diagram of FIG:3a

For the data-structure diagram of FIG:3b, the relationship is one-to-many from customer to account, so a customer may have more than one account, as is the case with Camp, who owns both 226 and 177. An account, however, cannot belong to more than one customer, as is indeed observed in the sample database of FIG:1, which corresponds to the data-structure diagram of FIG:3b.

How is the E-R diagram shown in FIG:2a transformed if a descriptive attribute has to be included?

The transformation is more complicated, because a link cannot contain any data value. So a new record type has to be created and links need to be established, as follows.
For example, consider the E-R diagram shown in FIG:2a and suppose we add the descriptive attribute date to the custacct relationship, to denote the last time the customer accessed the account. The newly derived E-R diagram is shown here.

To transform this diagram into a data-structure diagram we need to:

1: Replace the entity sets customer and account with record types customer and account.
2: Create a new record type date with a single field to represent the date.
3: Create the following many-to-one links:
*custdate from the date record type to the customer record type
*acctdate from the date record type to the account record type

The DBTG CODASYL Model
The first database standard specification, called the CODASYL DBTG 1971 report, was written in the late 1960s by the Database Task Group. Since then, a number of changes have been suggested to that report, the last official one in 1978. The main rules of the DBTG model are:
Link restriction
DBTG sets
Repeating groups

Link Restriction
In the DBTG model, only many-to-one links can be used. Many-to-many links are disallowed in order to simplify the implementation. One-to-one links are represented using a many-to-one link. Let us illustrate this with the help of an example.

Consider a binary relationship that is either one-to-many or one-to-one. For our customer-account database, the data-structure diagrams for a one-to-many custacct relationship, first with no descriptive attributes and then with the descriptive attribute date, are shown in the following figure:

[Two data-structure diagrams for record types customer (name, street, city) and account (number, balance): one without a descriptive attribute, and one with the descriptive attribute date.]

Fig: Two data-structure diagrams

If the custacct relationship is many-to-many, then our transformation algorithm must be refined as follows. If the relationship has no descriptive attributes, then the following algorithm must be employed:

1: Replace the entity sets customer and account with record types customer and account.
2: Create a new dummy record type Rlink that may either have no fields or have a single field containing an externally defined unique identifier.
3: Create the following two many-to-one links:
*custrlink from the Rlink record type to the customer record type
*acctlink from the Rlink record type to the account record type

[Figure: record types customer (name, street, city) and account (number, balance), linked through custacct]
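The Rlink construction can be sketched in a few lines. The Python sketch below assumes the many-to-many sample described earlier (Katz owns accounts 256 and 347, and account 347 is also owned by Doner); each Rlink record holds only a hypothetical generated identifier plus its two many-to-one links, and `to_rlinks` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rlink:
    """Dummy record: an identifier plus one link to each side (both many-to-one)."""
    rlink_id: int
    customer: str  # custrlink: Rlink -> customer
    account: int   # acctlink: Rlink -> account


def to_rlinks(pairs):
    """Replace each (customer, account) pair of a many-to-many link by one Rlink."""
    return [Rlink(i, cust, acct) for i, (cust, acct) in enumerate(pairs, start=1)]


CUSTACCT = [("Katz", 256), ("Katz", 347), ("Doner", 347)]
rlinks = to_rlinks(CUSTACCT)

# Every Rlink points to exactly one customer and one account, so both links are
# many-to-one, yet the original many-to-many facts remain recoverable:
owners_of_347 = sorted(r.customer for r in rlinks if r.account == 347)
accounts_of_katz = sorted(r.account for r in rlinks if r.customer == "Katz")
print(owners_of_347, accounts_of_katz)
```

The design choice mirrors the DBTG restriction: no record ever needs a link to more than one record of the same type, because the fan-out is carried by the Rlink records instead.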

DBTG sets

Given that only many-to-one links can be used in the DBTG model, a data-structure diagram consisting of two record types that are linked together has the general form of the following figure:

A (owner, e.g. customer: name, street, city) <---- B (member, e.g. account: number, balance)

Fig:A

The structure shown above is referred to in the DBTG model as a DBTG-set. The name of the set is usually chosen to be the same as the name of the link connecting the two record types.
In each such DBTG-set, the record type A is called the owner (or parent) of the set, and the record type B is called the member (or child) of the set. Each DBTG-set can have any number of set occurrences, that is, actual instances of linked records.
For example, the figure shows three occurrences corresponding to the DBTG-set of Fig:A.
Since many-to-many links are disallowed, each set occurrence has precisely one owner and zero or more member records. In addition, no member record of a set can participate in more than one occurrence of that set at any point; a member record can, however, participate simultaneously in occurrences of several different DBTG-sets.
To illustrate, consider the data-structure diagram shown here. There are two DBTG-sets.
• custacct, having customer as the owner of the DBTG-set and account as the member of the DBTG-set.
• brncacct, having branch as the owner of the DBTG-set and account as the member of the DBTG-set.

The set custacct may be defined as follows:

Set name is custacct
Owner is customer
Member is account

The set brncacct may be defined similarly:

Set name is brncacct
Owner is branch
Member is account

An instance of the database is shown here.

Five set occurrences are shown: three of set custacct and two of set brncacct.

1: Owner is customer record Lowman, with a single member account record 305.
2: Owner is customer record Camp, with two member account records 177 and 226.
3: Owner is customer record Kahn, with three member account records 155, 402 and 408.
4: Owner is branch record Hillside, with three member account records 305, 226 and 155.
5: Owner is branch record Valleyview, with three member account records 177, 402 and 408.

Note that an account record cannot appear in more than one set occurrence of any one set type. This is because an account can belong to exactly one customer, and can be associated with only one bank branch. An account can, however, appear in two set occurrences of different set types. For example, account 305 is a member of set occurrence 1 of type custacct and is also a member of set occurrence 4 of type brncacct.
The member records of a set occurrence may be ordered in a variety of ways.
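The membership rule can be checked mechanically over the five set occurrences listed above. In this Python sketch a set occurrence is a hypothetical (set type, owner, members) tuple; `membership_violations` reports any member that appears in more than one occurrence of the same set type, and `sets_containing` lists every occurrence a member belongs to.

```python
from collections import defaultdict

OCCURRENCES = [
    ("custacct", "Lowman", [305]),
    ("custacct", "Camp", [177, 226]),
    ("custacct", "Kahn", [155, 402, 408]),
    ("brncacct", "Hillside", [305, 226, 155]),
    ("brncacct", "Valleyview", [177, 402, 408]),
]


def membership_violations(occurrences):
    """Return (set type, member, owners) for members owned twice in one set type."""
    owners = defaultdict(list)
    for set_type, owner, members in occurrences:
        for member in members:
            owners[(set_type, member)].append(owner)
    return [(s, m, o) for (s, m), o in owners.items() if len(o) > 1]


def sets_containing(member, occurrences):
    """List (set type, owner) for every set occurrence the member belongs to."""
    return [(s, owner) for s, owner, members in occurrences if member in members]


print(membership_violations(OCCURRENCES))
print(sets_containing(305, OCCURRENCES))
```

The instance has no violations, and account 305 is reported in one custacct occurrence (Lowman) and one brncacct occurrence (Hillside), which is permitted because the two set types differ.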

Repeating Groups:

The DBTG model provides a mechanism for a field to have a set of values, rather than one single value.
For example, suppose that a customer can have several addresses. In this case, the (street, city) pair of fields of the customer record type is defined as a repeating group. The customer record for Kahn is shown here.

The repeating groups construct is another way of representing the notion of weak entities in the E-R model. To illustrate, we shall split the entity set customer into two sets:

*Customer, with descriptive attribute name
*Address, with descriptive attributes street and city.

The address entity set is a weak entity set, since it depends on the strong entity set customer.

DBTG data retrieval facility

The data manipulation language of the DBTG proposal consists of a number of commands that are embedded in a host language. The commands are explained as follows.

The Find and Get commands

The two most frequently used DBTG commands are

*find, which locates a record in the database and sets the appropriate currency pointers
*get, which copies the record to which the current of run-unit points from the database to the appropriate program work-area template.
Access of individual records:

The find command has a number of forms. There are two different find commands for locating individual records in the database. The simplest command has the form:

Find any <record type> using <record-field>

Purpose: Locates a record of type <record type> whose <record-field> value is the same as the value of <record-field> in the <record type> template in the program work area. The following currency pointers are set to point to that record:

*The currency of run-unit pointer
*The record-type currency pointer for <record type>
*For each set to which that record belongs, the appropriate set currency pointer

For example, construct the DBTG query that prints the street address of Lowman:

customer.name := "Lowman";
Find any customer using name;
Get customer;
Print (customer.street);

To locate further records with the same field value, the command is

Find duplicate <record type> using <record-field>

which locates the next record that matches the value of <record-field> in the template.

Example: Construct the DBTG query that prints the names of all the customers who live in Dallas:

customer.city := "Dallas";
Find any customer using city;
While DB-status = 0 do
Begin
   Get customer;
   Print (customer.name);
   Find duplicate customer using city;
End;
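The currency-driven loop can be mimicked in Python to show how `find duplicate` resumes from the current of run-unit. This is a toy model under stated assumptions: records live in a plain list, the current of run-unit is an index into it, DB-status 0 means found and 1 (an arbitrary nonzero value) means not found, and the class and method names are hypothetical. Kahn's street and city are not legible in the sample figure and are left as None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    name: str
    street: Optional[str]
    city: Optional[str]


class RunUnit:
    """Toy DBTG run-unit: a record list plus a current-of-run-unit index."""

    def __init__(self, records):
        self.records = records
        self.current = None  # currency of run-unit (index into records)
        self.db_status = 0

    def _find_from(self, start, field, value):
        for i in range(start, len(self.records)):
            if getattr(self.records[i], field) == value:
                self.current, self.db_status = i, 0
                return
        self.db_status = 1  # not found; currency is left unchanged

    def find_any(self, field, template):
        self._find_from(0, field, getattr(template, field))

    def find_duplicate(self, field, template):
        self._find_from(self.current + 1, field, getattr(template, field))

    def get(self):
        return self.records[self.current]


def names_in_city(run_unit, city):
    template = Customer(name="", street=None, city=city)  # program work-area template
    names = []
    run_unit.find_any("city", template)
    while run_unit.db_status == 0:
        names.append(run_unit.get().name)
        run_unit.find_duplicate("city", template)
    return names


db = RunUnit([
    Customer("Lowman", "Square", "Dallas"),
    Customer("Camp", "Downridge", "Garland"),
    Customer("Kahn", None, None),
])
print(names_in_city(db, "Dallas"))
```

The loop structure is the same as the DBTG query: one `find any` to establish currency, then `get` and `find duplicate` until DB-status becomes nonzero.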

Access of records within a set

Purpose: Locate records in a particular DBTG-set.

There are three different types of commands.

The basic find command is

Find first <record type> within <set-type>

which locates the first database record of type <record type> belonging to the current
occurrence of <set-type>.

To locate the other members of a set the command is

Find next <record-type> within <set-type>

which locates the next member record of type <record-type> in the current occurrence of <set-type>.

Example: Construct the DBTG query that prints the total balance of all accounts
belonging to Lowman.

Sum:=0;
Customer.name:="Lowman";
Find any customer using name;
Find first account within custacct;
While DB-status=0 do
Begin
Get account;
Sum:=sum + account.balance;
Find next account within custacct;
End;
Print(sum);
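Iterating over the members of one set occurrence with find first / find next can be sketched in Python. The set occurrence is modelled, as an assumption, as an ordered list of member records keyed by the owner's name; the `SetScan` class and the balances are hypothetical.

```python
# Hypothetical custacct occurrences: owner (customer name) -> ordered member accounts.
custacct = {
    "Lowman": [{"number": 101, "balance": 500}, {"number": 215, "balance": 700}],
    "Kahn": [{"number": 402, "balance": 1000}],
}


class SetScan:
    """Emulates 'find first' / 'find next' within the current set occurrence."""

    def __init__(self, members):
        self.members = members
        self.pos = None
        self.db_status = 0

    def find_first(self):
        self.pos = 0
        self.db_status = 0 if self.members else 1

    def find_next(self):
        self.pos += 1
        self.db_status = 0 if self.pos < len(self.members) else 1

    def get(self):
        return self.members[self.pos]


# Finding customer Lowman makes Lowman's custacct occurrence the current one.
scan = SetScan(custacct["Lowman"])
total = 0
scan.find_first()
while scan.db_status == 0:
    total += scan.get()["balance"]    # Get account; Sum:=sum + account.balance
    scan.find_next()
print(total)  # 1200
```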

To locate the owner of the current occurrence of a DBTG set, the command is

Find owner within <set-type>

Example: Construct the DBTG query that prints all the customers of the Hillside
branch:

Branch.name:="Hillside";
Find any branch using name;
Find first account within brncacct;
While DB-status=0 do
Begin
Find owner within custacct;
Get customer;
Print(customer.name);
Find next account within brncacct;
End;
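The Hillside query walks one set (brncacct) and, for each member, jumps to its owner in another set (custacct). A minimal Python sketch, assuming each account simply records the owner of both sets it belongs to; the data and helper names are hypothetical.

```python
# Hypothetical accounts: each is a member of custacct (owner: customer)
# and of brncacct (owner: branch), so we store both owners.
accounts = [
    {"number": 101, "customer": "Lowman", "branch": "Hillside"},
    {"number": 215, "customer": "Lowman", "branch": "Valley View"},
    {"number": 305, "customer": "Camp", "branch": "Hillside"},
]


def members(set_name, owner):
    """Members of one set occurrence, in stored order."""
    key = "customer" if set_name == "custacct" else "branch"
    return [a for a in accounts if a[key] == owner]


def find_owner(account, set_name):
    """'Find owner within <set-type>': follow the member's link to its owner."""
    return account["customer"] if set_name == "custacct" else account["branch"]


printed = []
for account in members("brncacct", "Hillside"):      # find first / next within brncacct
    printed.append(find_owner(account, "custacct"))    # find owner within custacct
print(printed)  # ['Lowman', 'Camp']
```

As in the DBTG program, a customer holding two Hillside accounts would be printed twice, since the owner is printed once per account.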

DBTG update facility

Creating new records

To create a new record of type <record type>, we insert the appropriate values in
the corresponding <record type> template and then execute the command

Store <record type>

Example: Construct the DBTG query to add a new customer Jackson to the
database.

Customer.name:="Jackson";
Customer.street:="Old road";
Customer.city:="Richardson";
Store customer;
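Store can be pictured as copying the filled-in work-area template into a new database record. In this hypothetical Python sketch the names `database`, `work_area` and `store` are illustrative, and we assume (as is usual in DBTG) that the newly stored record becomes the current of run-unit.

```python
# Hypothetical in-memory sketch of "Store <record type>".
database = []
work_area = {"customer": {"name": None, "street": None, "city": None}}
current_of_run_unit = None


def store(record_type):
    """Create a new record from the work-area template; it becomes the current of run-unit."""
    global current_of_run_unit
    # Copy the template so later edits to the work area do not alter the stored record.
    record = {"type": record_type, **work_area[record_type]}
    database.append(record)
    current_of_run_unit = len(database) - 1


work_area["customer"].update(name="Jackson", street="Old road", city="Richardson")
store("customer")
print(database[current_of_run_unit])
# {'type': 'customer', 'name': 'Jackson', 'street': 'Old road', 'city': 'Richardson'}
```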

Modifying an existing record

In order to modify an existing record of type <record type>, we must find the
record in the database, get that record into memory, and then change the desired
fields in the template of <record type>. Once this is accomplished, we reflect the
changes in the record to which the currency pointer of <record type> points by
executing the command:

Modify <record type>

The DBTG model requires that the find command executed before modifying a
record carry the additional clause "for update", so that the system knows the
record is to be modified.

Example:
Construct the DBTG program to change the street address of Kahn to North Loop.

Customer.name:="Kahn";
Find for update any customer using name;
Get customer;
Customer.street:="North Loop";
Modify customer;
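The find-for-update / get / modify sequence can be sketched in Python. The `Database` class, its methods and the old street value are hypothetical; the point is that modify writes the edited template back only if the record was located "for update".

```python
class Database:
    """Hypothetical in-memory sketch of find-for-update / get / modify."""

    def __init__(self, records):
        self.records = records
        self.current = None       # record-type currency pointer (index)
        self.for_update = False   # was the current record found "for update"?

    def find_any(self, field, value, for_update=False):
        for i, rec in enumerate(self.records):
            if rec[field] == value:
                self.current, self.for_update = i, for_update
                return True
        return False

    def get(self):
        # Copy the current record into a work-area template.
        return dict(self.records[self.current])

    def modify(self, template):
        # Reflect the edited template back into the current record.
        if not self.for_update:
            raise RuntimeError("record was not located with 'for update'")
        self.records[self.current] = dict(template)


db = Database([{"name": "Kahn", "street": "Main", "city": "Dallas"}])
db.find_any("name", "Kahn", for_update=True)   # Find for update any customer using name
work = db.get()                                # Get customer
work["street"] = "North Loop"                  # change the template
db.modify(work)                                # Modify customer
print(db.records[0])  # {'name': 'Kahn', 'street': 'North Loop', 'city': 'Dallas'}
```

Calling `modify` after a plain `find_any` (without `for_update=True`) raises `RuntimeError` in this sketch.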

Deleting a record

To delete an existing record of type <record type> we use the command:

Erase <record type>

Example:
Construct the DBTG program to delete account 402 belonging to Kahn:

Finish:=false;
Customer.name:="Kahn";
Find any customer using name;
Find for update first account within custacct;
While DB-status=0 and not finish do
Begin
Get account;
If account.number=402 then
Begin
Erase account;
Finish:=true;
End
Else
Find for update next account within custacct;
End;
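The search-then-erase loop can be sketched in Python over one set occurrence. The data and the `erase_account` helper are hypothetical; returning from the function plays the role of the finish flag.

```python
# Hypothetical data: Kahn's custacct occurrence with two member accounts.
custacct = {"Kahn": [{"number": 217, "balance": 50}, {"number": 402, "balance": 1000}]}


def erase_account(owner, number):
    """Scan the owner's occurrence and erase the first account with this number."""
    members = custacct[owner]          # Find any customer using name
    pos = 0                            # Find for update first account within custacct
    while pos < len(members):          # While DB-status=0 (and not finish)
        if members[pos]["number"] == number:
            del members[pos]           # Erase account
            return True                # Finish:=true ends the loop
        pos += 1                       # Find for update next account within custacct
    return False                       # no such account in this occurrence


print(erase_account("Kahn", 402))                 # True
print([a["number"] for a in custacct["Kahn"]])    # [217]
```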

It is possible to delete an entire set occurrence by finding the owner of the set (say, a
record of type <record type>) and executing

Erase all <record type>

This deletes the owner of the set as well as all of its members. If a member of
the set is an owner of another set, the members of that set are also deleted. That is, the
erase all operation is recursive.

Example:
Consider the DBTG program to delete customer "Camp" and all of her accounts.
Customer.name:="Camp";
Find for update any customer using name;
Erase all customer;
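The recursive behaviour of erase all can be sketched in Python as a post-order walk: every member of every set a record owns is erased first, then the record itself. The `Rec` class is hypothetical, and to show the recursion we assume Camp's accounts in turn own made-up transaction records.

```python
from dataclasses import dataclass, field


@dataclass
class Rec:
    name: str
    # Members of every set occurrence this record owns (all set types flattened).
    owned: list = field(default_factory=list)


def erase_all(record, erased):
    """Delete record and, recursively, every member of every set it owns."""
    for member in record.owned:
        erase_all(member, erased)
    erased.append(record.name)


camp = Rec("Camp", [Rec("acct-305", [Rec("txn-1"), Rec("txn-2")]), Rec("acct-310")])
erased = []
erase_all(camp, erased)
print(erased)  # ['txn-1', 'txn-2', 'acct-305', 'acct-310', 'Camp']
```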

DBTG set-processing facility

This facility is concerned mainly with the mechanisms for inserting records into and
removing records from a particular set occurrence.

The connect statement

To insert a new record of type <record type> into a particular occurrence of <set-type>,
we must first insert the record into the database, then set the currency pointers of
<record type> and <set-type> to point to the appropriate record and set occurrence.

The command used is

Connect <record type> to <set-type>

A new record can be inserted as follows:

1: Create a new record of type <record type>.
2: Find the appropriate owner of the set <set-type>.
3: Insert the new record into the set by executing the connect statement.

Example:

Create the DBTG query for creating new account 267 which belongs to Jackson:

Account.number:=267;
Account.balance:=0;
Store account;
Customer.name:="Jackson";
Find any customer using name;
Connect account to custacct;
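The three steps (store, find the owner, connect) can be sketched in Python. The dictionaries and helper functions are hypothetical; the sketch shows that after store the account exists in the database but belongs to no occurrence until connect runs.

```python
# Hypothetical in-memory sketch of Store followed by Connect (not a DBTG API).
database = []                                  # every stored record
custacct = {"Jackson": [], "Kahn": []}         # custacct occurrences, keyed by owner
current = {"account": None, "custacct": None}  # currency pointers


def store(record):
    # The new record goes into the database; it is not yet a member of any set.
    database.append(record)
    current["account"] = record


def find_any_customer(name):
    # Finding the owner makes its custacct occurrence the current one.
    if name in custacct:
        current["custacct"] = name


def connect_account():
    # Insert the current account into the current custacct occurrence.
    custacct[current["custacct"]].append(current["account"])


store({"number": 267, "balance": 0})   # step 1: create the record
find_any_customer("Jackson")           # step 2: find the owner
connect_account()                      # step 3: connect
print(custacct["Jackson"])  # [{'number': 267, 'balance': 0}]
```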

The Disconnect statement

In order to remove a record of type <record type> from a set occurrence of <set-type>,
we need to set the currency pointers of <record type> and <set-type> to point to the
appropriate record and set occurrence. Once this is accomplished, the record can be
removed from the set by executing

Disconnect <record-type> from <set-type>

Example: To remove account 177 from its set occurrence of type custacct:

Account.number :=177;
Find for update any account using number;
Get account;
Find owner within custacct;
Disconnect account from custacct;
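Disconnect removes membership only; the record itself stays in the database. A hypothetical Python sketch (data and helper names are illustrative) that compares records by identity so only the chosen record leaves the occurrence:

```python
# Hypothetical data: two accounts, both members of Kahn's custacct occurrence.
database = [{"number": 177, "balance": 300}, {"number": 402, "balance": 1000}]
custacct = {"Kahn": [database[0], database[1]]}


def find_owner(record, occurrences):
    """Return the owner whose occurrence contains `record` (None if in no occurrence)."""
    for owner, members in occurrences.items():
        if any(m is record for m in members):
            return owner
    return None


def disconnect(record, occurrences):
    owner = find_owner(record, occurrences)
    if owner is None:
        raise LookupError("record is not a member of any occurrence of this set type")
    members = occurrences[owner]
    members[:] = [m for m in members if m is not record]


account = next(r for r in database if r["number"] == 177)  # find ... using number
disconnect(account, custacct)
print([m["number"] for m in custacct["Kahn"]])  # [402]
print(len(database))  # 2
```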

The reconnect statement

In order to move a record of type <record-type> from one set occurrence to
another set occurrence of type <set-type>, we need to find the appropriate record and the
owner of the set occurrence to which the record is to be moved. Once this is done, we can
move the record by executing:

Reconnect <record-type> to <set-type>

Consider the DBTG program to move all accounts of Lowman that are currently
at the Hillside branch to the Valley View branch.

Customer.name:="Lowman";
Find any customer using name;
Find first account within custacct;
While DB-status=0 do
Begin
Find owner within brncacct;
Get branch;
If branch.name="Hillside" then
Begin
Branch.name:="Valley View";
Find any branch using name;
Reconnect account to brncacct;
End;
Find next account within custacct;
End;
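The reconnect loop can be sketched in Python with the two sets held as owner-to-members maps of account numbers. The data is hypothetical, and appending the moved account at the end of the new occurrence assumes "last" ordering for brncacct.

```python
# Hypothetical set occurrences, holding account numbers.
custacct = {"Lowman": [101, 215, 330]}
brncacct = {"Hillside": [101, 330, 500], "Valley View": [215]}


def find_owner_branch(number):
    """'Find owner within brncacct' for the given account."""
    for branch, members in brncacct.items():
        if number in members:
            return branch
    return None


def reconnect(number, new_branch):
    """Move the account from its current brncacct occurrence to new_branch's."""
    brncacct[find_owner_branch(number)].remove(number)
    brncacct[new_branch].append(number)   # assumes "last" ordering


for number in custacct["Lowman"]:                 # find first / next within custacct
    if find_owner_branch(number) == "Hillside":
        reconnect(number, "Valley View")
print(brncacct)  # {'Hillside': [500], 'Valley View': [215, 101, 330]}
```

Account 500 belongs to another customer, so it stays at Hillside; only Lowman's accounts move.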

Set Insertion and Retention


When a new set is defined, we must specify how member records are to be
inserted. In addition, we must specify the conditions under which a record must be
retained in the set occurrence in which it was initially inserted.

Set Insertion
A newly created member record of type <record type> of a set type <set-type> can be
added to a set occurrence either explicitly (manually) or implicitly (automatically).
This distinction is specified at set definition time via

Insertion is <insert mode>

where <insert mode> can take one of two forms:

♣ Manual: The new record is inserted into the set manually (explicitly) by
executing

Connect <record type> to <set-type>

♣ Automatic: The new record is inserted into the set automatically (implicitly)
when it is created, that is, when we execute

Store <record type>


In either case, just prior to insertion, the <set-type> currency pointer must point to
the set occurrence into which the insertion is to be made.
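The difference between the two insertion modes can be sketched in Python. Here an account record type is assumed, purely for illustration, to be an automatic member of custacct and a manual member of brncacct; the `SetType` class and `store` function are hypothetical.

```python
class SetType:
    """Hypothetical model of one DBTG set type and its insertion mode."""

    def __init__(self, name, insertion):
        if insertion not in ("manual", "automatic"):
            raise ValueError("insertion must be 'manual' or 'automatic'")
        self.name = name
        self.insertion = insertion
        self.occurrences = {}      # owner -> ordered list of member records
        self.current_owner = None  # set currency pointer

    def find_owner_record(self, owner):
        # Finding an owner makes its occurrence current (created empty if new).
        self.occurrences.setdefault(owner, [])
        self.current_owner = owner

    def connect(self, record):
        # Explicit insertion into the current occurrence.
        self.occurrences[self.current_owner].append(record)


def store(record, database, member_of):
    """Store a record; every AUTOMATIC set type it belongs to receives it at once."""
    database.append(record)
    for set_type in member_of:
        if set_type.insertion == "automatic":
            set_type.connect(record)


database = []
custacct = SetType("custacct", "automatic")   # assumed mode, for illustration
brncacct = SetType("brncacct", "manual")      # assumed mode, for illustration
custacct.find_owner_record("Jackson")         # currency must point to the target occurrence
brncacct.find_owner_record("Hillside")

store("account 267", database, [custacct, brncacct])
print(custacct.occurrences["Jackson"])   # ['account 267']
print(brncacct.occurrences["Hillside"])  # []

brncacct.connect("account 267")          # manual insertion needs an explicit connect
print(brncacct.occurrences["Hillside"])  # ['account 267']
```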

Set Retention
There are various restrictions on how and when a member record can be removed
from a set occurrence into which it has been inserted previously. These restrictions are
specified at set definition time via

Retention is <retention-mode>

where <retention-mode> can take one of three forms:

♣ Fixed: Once a member record has been inserted into a particular set
occurrence, it cannot be removed from that set. If retention is fixed, then to
reconnect a record to another set, we must first erase that record, re-create it,
and then insert it into the new set occurrence.
♣ Mandatory: Once a member record has been inserted into a particular set
occurrence, it can be reconnected only to another set occurrence of type <set-type>.
It can neither be disconnected nor be reconnected to a set of another type.
♣ Optional: No restrictions are placed on how and when a member record can be
removed; it can be connected, disconnected, and reconnected at will.

The decision as to which option to choose depends on the application.
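The three retention modes can be summarised as a small Python check of whether a member may leave its current set occurrence. The function name and the string values for modes and operations are hypothetical.

```python
def removal_allowed(retention, operation):
    """Hypothetical check: may a member leave its set occurrence this way?

    operation: "disconnect" or "reconnect" (reconnect stays within the same set type).
    """
    if operation not in ("disconnect", "reconnect"):
        raise ValueError("operation must be 'disconnect' or 'reconnect'")
    if retention == "fixed":
        # Never removable; the record must be erased and re-created instead.
        return False
    if retention == "mandatory":
        # Must always belong to some occurrence of this set type.
        return operation == "reconnect"
    if retention == "optional":
        return True
    raise ValueError("unknown retention mode: " + retention)


for mode in ("fixed", "mandatory", "optional"):
    print(mode, removal_allowed(mode, "disconnect"), removal_allowed(mode, "reconnect"))
# fixed False False
# mandatory False True
# optional True True
```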

Deletion
When a record is deleted (erased) and that record is the owner of a set occurrence of
type <set-type>, the way this deletion is handled depends on the retention
specification of <set-type>:

♣ If the retention status is optional, then the record will be deleted and every
member of the set it owns will be disconnected. These records, however, are
kept in the database.
♣ If the retention status is fixed, then the record and all of its owned
members will be deleted. This follows from the fact that the fixed status
indicates that a member record cannot be removed from the set occurrence
without being deleted.
♣ If the retention status is mandatory, then the record cannot be erased. This is
because the mandatory status indicates that a member record must belong to
a set occurrence; it cannot be disconnected from that set.
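The three deletion cases can be sketched in Python for a single set type. This is a hypothetical illustration: `erase_owner` and the data are made up, and the recursive deletion of members' own sets under fixed retention is not modelled.

```python
def erase_owner(owner, occurrences, database, retention):
    """Hypothetical sketch: erase an owner record of one set type.

    occurrences maps each owner to the list of members in its set occurrence.
    """
    members = occurrences.get(owner, [])
    if retention == "mandatory":
        # Members may not be left without a set occurrence, so refuse the erase.
        raise PermissionError("retention is mandatory: owner cannot be erased")
    if retention == "fixed":
        # Members cannot leave the occurrence without being deleted, so delete them.
        for member in members:
            database.remove(member)
    elif retention != "optional":
        raise ValueError("unknown retention mode: " + retention)
    # The occurrence disappears; under optional retention its members remain stored.
    occurrences.pop(owner, None)
    database.remove(owner)


db = ["Camp", "account 305", "account 310"]
occ = {"Camp": ["account 305", "account 310"]}
erase_owner("Camp", occ, db, "optional")
print(db)  # ['account 305', 'account 310']

db = ["Camp", "account 305", "account 310"]
occ = {"Camp": ["account 305", "account 310"]}
erase_owner("Camp", occ, db, "fixed")
print(db)  # []

try:
    erase_owner("Camp", {"Camp": ["account 305"]}, ["Camp", "account 305"], "mandatory")
except PermissionError as err:
    print(err)  # retention is mandatory: owner cannot be erased
```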

Set Ordering
The members of a set occurrence of <set-type> may be ordered in a variety of
ways. A programmer specifies the order when the set is defined, via

Order is <order-mode>

where <order-mode> can be one of the following:
♣ First: When a new record is added to a set, it is inserted in the first position.
Thus, the set is in reverse chronological order.
♣ Last: When a new record is added to a set, it is inserted in the last position.
Thus, the set is in chronological order.
♣ Next: Suppose that the currency pointer of <set-type> points to record X. If X
is a member type, then when a new record is added to the set, it is
inserted in the position following X. If X is an owner type, then when a new
record is added, it is inserted in the first position.
♣ Prior: Suppose that the currency pointer of <set-type> points to record X. If X
is a member type, then when a new record is added to the set, it is
inserted in the position just prior to X. If X is an owner type, then
when a new record is added, it is inserted in the last position.
♣ System default: When a new record is added to a set, it is inserted in an
arbitrary position determined by the system.
♣ Sorted: When a new record is added to a set, it is inserted in a position that
ensures that the set will remain sorted. The sorting order is specified by a
particular key value when a programmer defines the set. The programmer must
specify whether members are ordered in ascending or descending order relative to
that key.
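The order modes can be sketched in Python as a function that picks the insertion index in one set occurrence. It treats the occurrence as a circular chain running from the owner through its members and back, so the position after the owner is the first slot and the position before it is the last. The function name and mode strings are hypothetical; "sorted" handles ascending keys only, and for "system" the sketch simply appends, since any position is permitted.

```python
import bisect


def insert_member(members, new, mode, current=None, key=None):
    """Insert `new` into one set occurrence according to <order-mode>; return its index.

    current: index of member X that the set currency pointer points to,
             or None when the pointer is on the owner.
    key:     function giving the sort key (used only by "sorted"; ascending only).
    """
    if mode == "first":
        pos = 0                                    # reverse chronological
    elif mode == "last":
        pos = len(members)                         # chronological
    elif mode == "next":
        # After member X; after the owner means the first position.
        pos = 0 if current is None else current + 1
    elif mode == "prior":
        # Before member X; before the owner means the last position.
        pos = len(members) if current is None else current
    elif mode == "sorted":
        keys = [key(m) for m in members]
        pos = bisect.bisect_right(keys, key(new))  # keeps the list sorted
    elif mode == "system":
        pos = len(members)                         # any position is allowed; we append
    else:
        raise ValueError("unknown order mode: " + mode)
    members.insert(pos, new)
    return pos


history = []
for name in ("A", "B", "C"):
    insert_member(history, name, "first")
print(history)  # ['C', 'B', 'A']
```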

REFER TO THE TEXTBOOK FOR FURTHER REFERENCE

Questions:
1. Explain the architecture of network model.
2. Write short notes on
a) Link restriction
b) DBTG Sets
c) Repeating Groups
3. Explain DBTG data retrieval facility.
4. Explain DBTG set-processing facility.
5. Explain DBTG update facility.
6. What are set insertion and set retention?
