MODULE I
Data Isolation
Data are scattered across different files, and those files may be in various formats, so it is difficult to extract the appropriate data.
Integrity problems
Data constraints are enforced through appropriate code in the application programs. So if we need to add or change a constraint, we have to change the code, which makes constraints difficult to maintain. The problem is compounded when a constraint involves data from several different files.
Atomicity problems
Suppose a failure occurs during the execution of a program. The execution then stops in the middle of the program, leaving the data in an inconsistent state, whereas execution should always end in a consistent state. In a traditional file system, a failure usually results in an inconsistent state.
And the department keeps another record for students, to track their marks and progress. Even though both the office and the department are interested in data about students, each maintains separate files, because each requires some data that is not available from the other.
Now what are the features of database approach?
Database system is
1. Self describing:
i.e., the database system contains not only the database itself but also a complete definition or description of the database structure. This definition is stored in a catalog, with type, storage format and constraints as mentioned earlier. The information stored in the catalog is called meta-data.
2. Data security
The DBMS can prevent unauthorized users from viewing or updating the
database. Using passwords, users are allowed access to the entire database
or a subset of it known as a "subschema." For example, in a student database, some users may be able to view payment details while others may view only the mark lists of students.
3. Data Integrity
The DBMS can ensure that no more than one user can update the same
record at the same time. It can keep duplicate records out of the database;
for example, no two customers with the same customer number can be
entered.
4. Interactive Query
Most DBMSs provide query languages and report writers that let users
interactively interrogate the database and analyze its data. This important
feature gives users access to all management information as needed; i.e., we can easily get all the details of each student at any time.
5. Interactive Data Entry and Updating
Many DBMSs provide a way to interactively enter and edit data, allowing
you to manage your own files and databases. However, interactive
operation does not leave an audit trail and does not provide the controls
necessary in a large organization. These controls must be programmed into
the data entry and update programs of the application.
6. Data Independence
With DBMSs, the details of the data structure are not stated in each
application program. The program asks the DBMS for data by field name;
for example, a coded equivalent of "give me customer name and balance
due" would be sent to the DBMS. Without a DBMS, the programmer must
reserve space for the full structure of the record in the program. Any
change in data structure requires changing all application programs.
Data:
Data stored in a database include numerical data, character data, and other typed values.
Standard operations:
Every DBMS provides commands for input, edit, analysis, output, reformatting, etc. Some degree of standardization has been achieved with SQL (Structured Query Language). A data definition language (DDL) is used to describe the contents of the database: for example, attribute names (field names), data types, and location in the database.
Programming tools:
Besides interactive commands and queries, the database should be accessible from application programs through a programming interface.
File structures:
Every DBMS has its own internal structures used to organize the
data although some common data models are used by most DBMS.
Abstraction
Each application program works with some data relevant to a particular task, and often needs a portion of the data that is also used by other programs. In the early days of computerization, each application programmer designed the file structure, the metadata of the file, and the access method for each record. That is, each application program carried its own details about the structure of the data, how to access it, and how to interpret it. Because the application programs were implemented independently, any change in storage media required changes to these structures and access methods. And because the files were structured for one application, it was difficult to reuse the data in these files for new applications requiring data from several files belonging to different existing applications.
Eg: Consider two application programs that require the data on an entity
set EMPLOYEE. The first application program involves the public relation
department sending each employee a news letter and related material. This
application program is interested in the record type EMPLOYEE, containing the values for the attributes EMPL_Name and EMPL_Address.
Fig 1.1: The three levels of abstraction. Several external views (View 1, View 2, View 3, View 4) are mapped onto a logical level defined by the DBA, which in turn is mapped, with optimization, onto the physical level.
adequate controls are needed over users updating data and over data quality. With an increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.
3. Data Integrity
Since a large number of users could be using a database
concurrently, we should have to ensure that data remain correct during
operation. The main threat to data integrity comes from several different
users attempting to update the same data at the same time. The database
therefore needs to be protected against accidental changes by the users.
4. Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database becomes a critical resource. The survival of the enterprise
may depend on reliable information being available from its database. The
enterprise therefore becomes vulnerable to the destruction of the database or
to unauthorized modification of the database.
5. The Cost of using a DBMS
Conventional data processing systems are typically designed to
run a number of well-defined, preplanned processes. Such systems are often
"tuned" to run efficiently for the processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new
applications may be difficult to implement and/or expensive to run, they are
usually very efficient for the applications they are designed for.
The database approach on the other hand provides a
flexible alternative where new applications can be developed relatively
inexpensively. The flexible approach is not without its costs and one of these
costs is the additional cost of running applications that
the conventional system was designed for. Using standardized software is
almost always less machine efficient than specialized software.
In some cases attribute values are related so that one can be derived from the other. Consider a person as an entity. The attributes age and DateOfBirth of a person are related: the age of a person can be derived from the current date and his DateOfBirth. The age attribute is therefore called a derived attribute, and DateOfBirth is called a stored attribute, from which the age of the person is calculated.
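As a small sketch of this idea (the function name and use of Python's standard datetime types are illustrative, not from any particular schema), a derived attribute like age is computed on demand from the stored DateOfBirth rather than stored itself:

```python
from datetime import date

def age_from_dob(date_of_birth: date, today: date) -> int:
    """Derive the 'age' attribute from the stored 'DateOfBirth' attribute."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not occurred yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_from_dob(date(1990, 6, 15), date(2024, 6, 14)))  # 33: birthday not reached
print(age_from_dob(date(1990, 6, 15), date(2024, 6, 15)))  # 34: birthday reached
```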
Entity set
An entity set is a set of entities of the same type that share the
same properties, or attributes. It is represented by a set of attributes. An attribute, as used in the E-R model, can be characterized by the following attribute types:
Simple and composite attributes
Single-valued and multivalued attributes
Null attributes
Derived attributes
primary key is used to denote the candidate key that is chosen by the
database designer to identify an entity from an entity set. A key (super,
candidate and primary) is a property of the entity set rather than the
individual entities.
Entity- Relationship (E-R) Diagram
The overall logical structure of a database can be expressed graphically by
an E-R diagram. The diagram consists of the following major components.
Rectangles: represent entity sets.
Ellipses: represent attributes.
Diamonds: represent relationship sets.
Lines: link attributes to entity sets and entity sets to relationship sets.
For eg: Consider an E-R diagram, which consists of two entity sets
customer and loan.
Fig 1.2: An E-R diagram in which the entity set Employee (attributes Emp_id, Designation, Salary, Addr) is linked through the relationship set WorksFor to the entity set Company (attributes Product, Location).
A data model is a plan for building a database. The model represents data conceptually, the way the user sees it, rather than how computers store it. Data models focus on required data elements and associations; most often they are expressed graphically using diagrams.
Fig 1.3: A hierarchical model relating Customer and Order records.
Advantages:
Hierarchical Model is simple to construct and operate on
Corresponds to a number of natural hierarchically organized domains e.g., assemblies in manufacturing, personnel organization in companies
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT,
GET NEXT WITHIN PARENT etc.
Disadvantages:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
Fig 1.4: A network model relating Customer, Product, and Order records.
Advantages:
Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and
relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc. Programmers can do optimal
navigation through the database.
Disadvantages:
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread through a set of
records.
Little scope for automated "query optimization"
1.11 Object-Oriented Model
Object DBMSs add database functionality to object programming
languages. They bring much more than persistent storage of
programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming
languages to provide full-featured database programming capability,
while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development
into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code
bases are easier to maintain. Object developers can write complete
database applications with a modest amount of additional effort.

Fig 1.5: Objects, such as Order, in an object-oriented database.
Fig 1.6: The Customer, Order, and Product record types.
MODULE 2
2.1 Basic Structure of relational model - The relational model for database
management is a data model based on predicate logic and set theory. It was invented
by Edgar Codd. The fundamental assumption of the relational model is that all data
are represented as mathematical n-ary relations, an n-ary relation being a subset of
the Cartesian product of n sets.
1) Relation The fundamental organizational structure for data in the relational model
is the relation. A relation is a two-dimensional table made up of rows and columns.
Each relation, also called a table, stores data about entities.
2) Tuples - The rows in a relation are called tuples. They represent specific
occurrences (or records) of an entity. Each row consists of a sequence of values, one
for each column in the table. In addition, each row (or record) in a table must be
unique. A tuple variable is a variable that stands for a tuple.
3) Attributes - The columns in a relation are called attributes. The attributes represent
characteristics of an entity.
4) Domain For each attribute there is a set of permitted values called domain of that
attribute. For all relations r, the domain of all attributes of r should be atomic. A
domain is said to be atomic if elements of the domain are considered to be indivisible
units.
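These four terms map directly onto SQL objects. A minimal sketch using Python's built-in sqlite3 module (the student table and its columns are purely illustrative): the table is the relation, its typed columns are the attributes with their domains, and each inserted row is a tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation (table) whose attributes (columns) each have a domain (type).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
# Each inserted row is a tuple: one atomic value per attribute.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Jones", 78), (2, "Smith", 85)])
for row in conn.execute("SELECT roll_no, name, marks FROM student ORDER BY roll_no"):
    print(row)
```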
2.3 Keys A key is the relational means of specifying uniqueness. The keys
applicable in relational model are primary key, candidate key and super key.
1.) Primary key - A primary key is an attribute (or set of attributes) whose value uniquely identifies a row in a table.
2.) Candidate key - A candidate key of a relation variable is a set of attributes of that
relation variable such that (1) at all times it holds in the relation assigned to that
variable that there are no two distinct tuples with the same values for these attributes
and (2) there is not a proper subset for which (1) holds.
3.) Super key - A superkey is defined in the relational model as a set of attributes of a
relation variable for which it holds that in all relations assigned to that variable there
are no two distinct tuples that have the same values for the attributes in this set.
4.) Foreign key - A foreign key is a field or group of fields in a database record that points to a key field or group of fields forming a key of another database record in some (usually different) table. A relation schema, r1, derived from an E-R schema
may include among its attributes the primary key of another relation schema, r2. This
attribute is the foreign key from r1, referencing r2. The relation r1 is called the referencing relation of the foreign key dependency, and r2 is called the referenced relation.
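A hedged sqlite3 sketch of a foreign key dependency (the branch/loan schema and column names here are invented for illustration, loosely following the banking examples later in this module): loan is the referencing relation r1 and branch is the referenced relation r2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, assets INTEGER)")
# loan (r1) references the primary key of branch (r2).
conn.execute("""CREATE TABLE loan (
    loan_no INTEGER PRIMARY KEY,
    branch_name TEXT REFERENCES branch(branch_name),
    amount INTEGER)""")
conn.execute("INSERT INTO branch VALUES ('Redwood', 21000000)")
conn.execute("INSERT INTO loan VALUES (13, 'Redwood', 1300)")  # OK: 'Redwood' exists
try:
    conn.execute("INSERT INTO loan VALUES (99, 'NoSuchBranch', 500)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the referenced branch does not exist
```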
2.4 Schema diagram A database schema, along with primary key and foreign
key dependencies, can be depicted pictorially by schema diagrams. Each relation in
the database schema is represented as a box, with the attributes listed inside it and the
relation name above it. If there are primary key attributes, a horizontal line crosses the
box, with the primary key attributes listed above the line. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing relation to the primary key attributes of the referenced relation.
Borrow relation (Table 2.1)

Branch name    Loan#    Customer name    Amount
Downtown       17       Jones            1000
Round Hill     23       Smith            2000
Redwood        13       Hayes            1300
Branch relation (Table 2.2)

Branch name    Branch city    Assets
Downtown       Brooklyn       9000000
Round Hill     Horseneck      21000000
Redwood        Palo Alto      17000000
To select tuples (rows) of the borrow relation where the branch is Redwood, we would write σ branch-name = "Redwood" (borrow). To project only the branch name and customer name of each tuple, we would write Π branch-name, customer-name (borrow).
3) Union operation - The union operation is a binary operation since it involves 2 relations. It is used to retrieve tuples appearing in either or both of the relations participating in the union. It is denoted ∪. For a union operation R ∪ S to be legal, we require that
o R and S must have the same number of attributes.
o The domains of the corresponding attributes must be the same.
4) Set difference - The set difference operation is a binary operation. Set difference is denoted by the minus sign (−). It finds tuples that are in one relation but not in another. Thus R − S results in a relation containing tuples that are in R but not in S.
5) Cartesian product - This is a binary operation involving 2 relations. It is used to obtain all possible combinations of tuples from two relations. The cartesian product of two relations is denoted by a cross (×), written R1 × R2 for relations R1 and R2. The result of R1 × R2 is a new relation with a tuple for each possible pairing of tuples from R1 and R2. To avoid ambiguity, the attribute names have attached to them the name of the relation from which they came. If no ambiguity will result, we drop the relation name. If R1 has n tuples and R2 has m tuples, then R = R1 × R2 will have m × n tuples.
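Because relations are sets of tuples, these three operations can be sketched directly with Python's set type (the one-attribute relations R and S below are toy data, purely for illustration):

```python
# Toy relations: sets of tuples over a single attribute.
R = {("a",), ("b",), ("c",)}
S = {("b",), ("c",), ("d",)}

union = R | S                               # R U S: tuples in either relation
difference = R - S                          # R - S: tuples in R but not in S
cartesian = {r + s for r in R for s in S}   # R x S: every pairing of tuples

print(len(union), len(difference), len(cartesian))  # 4 1 9
```

Note that |R × S| = 3 × 3 = 9, matching the m × n rule above.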
6) Rename The rename operation solves the problems that occur with naming
when performing the cartesian product of a relation with itself.
Suppose we want to find the names of all the customers who live on the same street
and in the same city as Smith.
Customer name    Customer street    Customer city
Jones            Main               Harrison
Smith            North              Rye
Hayes            Main               Harrison
To find other customers with the same information, we need to reference the customer relation again. The rename operation, denoted ρ x (E), returns the result of expression E under the name x. If we use it to rename one of the two customer relations we are using, the ambiguities will disappear.
Additional operations
1. Set Intersection - Set intersection is denoted by ∩. R ∩ S contains tuples that are in both of its argument relations. It does not add any power, since R ∩ S = R − (R − S).
Eg: Consider the depositor and borrower relations. If we want to find all customers who have both a loan and an account, we take the intersection of the two relations. It can be written as Π customer-name (borrower) ∩ Π customer-name (depositor).
2. Natural join - The natural join is written as R ⋈ S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. Consider R and S to be sets of attributes. We denote attributes appearing in both relations by R ∩ S, and attributes in either or both relations by R ∪ S. For two relations r(R) and s(S), the natural join of r and s, denoted by r ⋈ s, is a relation on schema R ∪ S in which tuples must agree on the common attributes R ∩ S = {A1, A2, …, An}.
For an example consider the tables Employee and Dept and their natural join:
Employee (Table 2.4)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales

Dept (Table 2.5)
DeptName      Manager
Finance       George
Sales         Harriet
Production    Charles

Employee ⋈ Dept (Table 2.6)
Name       EmpId    DeptName    Manager
Harry      3415     Finance     George
Sally      2241     Sales       Harriet
George     3401     Finance     George
Harriet    2202     Sales       Harriet
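The natural join above can be sketched as a small Python function over lists of dicts. This mirrors the Employee/Dept example and reproduces Table 2.6; it is a teaching sketch, not how a DBMS implements joins internally:

```python
def natural_join(r, s):
    """Join rows of r and s that agree on all shared attribute names."""
    out = []
    for x in r:
        for y in s:
            common = set(x) & set(y)  # shared attribute names
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})
    return out

employee = [{"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
            {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
            {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
            {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"}]
dept = [{"DeptName": "Finance", "Manager": "George"},
        {"DeptName": "Sales", "Manager": "Harriet"}]

for row in natural_join(employee, dept):
    print(row)
```

Production has no employees, so (Production, Charles) contributes nothing to the result, exactly as in Table 2.6.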
3. Equi-join - If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes, then it is convenient to have a more general form of join operator, the θ-join (or theta-join), in which the combining condition may be an arbitrary comparison θ between attributes. When θ is equality, the θ-join is called an equi-join.
4. Outer joins - Whereas the natural join discards tuples of one relation that have no match in the other, the outer joins retain such tuples, padding them with "fill" (null) values for each of the attributes of the other operand. Three outer join operators are defined: left outer join, right outer join, and full outer join.
Left Outer join - The left outer join is written as R =X S where R and S are relations.
The result of the left outer join is the set of all combinations of tuples in R and S that
are equal on their common attribute names, in addition to tuples in R that have no
matching tuples in S. For an example consider the tables Employee and Dept and their
left outer join:
Employee
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.9)
DeptName      Manager
Sales         Harriet
Production    Charles

The left outer join can be simulated using the natural join and set union: take the natural join R ⋈ S, then add, by union, the tuples of R that did not participate in it, padded with null values for the attributes of S.

Employee =X Dept (Table 2.10)
Name       EmpId    DeptName     Manager
Harry      3415     Finance      null
Sally      2241     Sales        Harriet
George     3401     Finance      null
Harriet    2202     Sales        Harriet
Tim        1123     Executive    null
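The "pad with nulls" behaviour can be sketched in Python, with None standing in for the null fill value (the data is a trimmed version of the Employee/Dept example; attribute sets are kept small for readability):

```python
def left_outer_join(r, s):
    """Rows of r joined with matching rows of s; unmatched r-rows padded with None."""
    s_attrs = set().union(*(set(y) for y in s))  # all attribute names of s
    out = []
    for x in r:
        matches = [y for y in s
                   if all(x[a] == y[a] for a in set(x) & set(y))]
        if matches:
            out.extend({**x, **y} for y in matches)
        else:
            # No match in s: keep x, fill the s-only attributes with None (null).
            out.append({**x, **{a: None for a in s_attrs - set(x)}})
    return out

employee = [{"Name": "Harry", "DeptName": "Finance"},
            {"Name": "Sally", "DeptName": "Sales"},
            {"Name": "Tim", "DeptName": "Executive"}]
dept = [{"DeptName": "Sales", "Manager": "Harriet"}]
print(left_outer_join(employee, dept))
```

Every Employee row survives; only Sally picks up a Manager, matching the pattern of Table 2.10.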
Right outer join - The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R. For the same Employee (Table 2.11) and Dept (Table 2.12) relations:

Employee (Table 2.11)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.12)
DeptName      Manager
Sales         Harriet
Production    Charles

Employee X= Dept (Table 2.13)
Name       EmpId    DeptName      Manager
Sally      2241     Sales         Harriet
Harriet    2202     Sales         Harriet
null       null     Production    Charles
Full outer join - The outer join or full outer join in effect combines the results of the
left and right outer joins. The full outer join is written as R =X= S where R and S are
relations. The result of the full outer join is the set of all combinations of tuples in R
and S that are equal on their common attribute names, in addition to tuples in S that
have no matching tuples in R and tuples in R that have no matching tuples in S in their
common attribute names.
For an example consider the tables Employee and Dept and their full outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value. Tuples in S which have no common values in common attribute names with tuples in R also take a null value.
Employee (Table 2.14)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.15)
DeptName      Manager
Sales         Harriet
Production    Charles

Employee =X= Dept (Table 2.16)
Name       EmpId    DeptName      Manager
Harry      3415     Finance       null
Sally      2241     Sales         Harriet
George     3401     Finance       null
Harriet    2202     Sales         Harriet
Tim        1123     Executive     null
null       null     Production    Charles
5. Division - The division is a binary operation written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R (i.e., in the header of R but not in the header of S) for which it holds that all their combinations with tuples in S are present in R. For an example consider the tables Completed (Table 2.17) and DBProject (Table 2.18) and their division:

Completed (Table 2.17)
Student    Task
Fred       Database1
Fred       Database2
Fred       Compiler1
Eugene     Database1
Eugene     Compiler1
Sara       Database1
Sara       Database2

DBProject (Table 2.18)
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sara

A Student value appears in the result if, for every Task in DBProject, the (Student, Task) portion of a tuple is in Completed.
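The division above can be sketched in a few lines of Python over the same Completed and DBProject data:

```python
completed = {("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
             ("Eugene", "Database1"), ("Eugene", "Compiler1"),
             ("Sara", "Database1"), ("Sara", "Database2")}
db_project = {"Database1", "Database2"}

# Completed / DBProject: students for whom (student, task) is in Completed
# for EVERY task in DBProject.
students = {s for (s, _) in completed}
result = {s for s in students
          if all((s, t) in completed for t in db_project)}
print(sorted(result))  # ['Fred', 'Sara']
```

Eugene is excluded because (Eugene, Database2) is not in Completed.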
6. Assignment operation - Sometimes it is useful to be able to write a relational algebra expression in parts, using a temporary relation variable. The assignment operation, denoted ←, works like assignment in a programming language; for example, temp ← Π customer-name (borrower). No extra relation is added to the database, but the relation variable created can be used in subsequent expressions. Assignment to a permanent relation would constitute a modification to the database.
2.6 Tuple Relational Calculus - The tuple calculus is a calculus that was
introduced by Edgar F. Codd as part of the relational model in order to give a
declarative database query language for this data model. The tuple relational calculus
is a nonprocedural language. (The relational algebra was procedural.) We must
provide a formal description of the information desired.
A query in the tuple relational calculus is expressed as { t | P(t) }, i.e., the set of tuples t for which predicate P is true. We also use the notation
o t[A] to indicate the value of tuple t on attribute A.
o t ∈ r to show that tuple t is in relation r.
Example Queries
For example, to find the branch-name, loan number, customer name and amount for loans over $1200:

{ t | t ∈ borrow ∧ t[amount] > 1200 }

This gives us all attributes, but suppose we only want the customer names. (We would use project in the algebra.) We need to write an expression for a relation on scheme (cname):

{ t | ∃ s ∈ borrow (t[cname] = s[cname] ∧ s[amount] > 1200) }

In English, we may read this as the set of all tuples t such that there exists a tuple s in the relation borrow for which the values of t and s for the cname attribute are equal, and the value of s for the amount attribute is greater than 1200.
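The declarative flavour of the calculus can be mimicked with a Python set comprehension: like { t | P(t) }, the comprehension states which tuples qualify, not how to find them. The borrow data below is the toy relation of Table 2.1, with shortened attribute names (bname, cname, etc.) invented for the sketch:

```python
# borrow relation as a list of dicts (branch name, loan#, amount, customer name).
borrow = [{"bname": "Downtown", "loan": 17, "amount": 1000, "cname": "Jones"},
          {"bname": "Round Hill", "loan": 23, "amount": 2000, "cname": "Smith"},
          {"bname": "Redwood", "loan": 13, "amount": 1300, "cname": "Hayes"}]

# { t | exists s in borrow (t[cname] = s[cname] and s[amount] > 1200) }
result = {s["cname"] for s in borrow if s["amount"] > 1200}
print(sorted(result))  # ['Hayes', 'Smith']
```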
The notation ∃ t ∈ r (Q(t)) means "there exists a tuple t in relation r such that predicate Q(t) is true". Consider another example: Find all customers having a loan from the SFU branch, and the cities in which they live:

{ t | ∃ s ∈ borrow (s[bname] = "SFU" ∧ s[cname] = t[cname] ∧ ∃ u ∈ customer (u[cname] = s[cname] ∧ t[ccity] = u[ccity])) }

In English, we might read this as the set of all (cname, ccity) tuples for which cname is a borrower at the SFU branch, and ccity is the city of cname. Tuple variable s ensures that the customer is a borrower at the SFU branch. Tuple variable u is restricted to pertain to the same customer as s, and also ensures that ccity is the city of the customer.
The logical connectives ∧ (AND) and ∨ (OR) are allowed, as well as ¬ (negation). We also use the existential quantifier ∃ and the universal quantifier ∀.
Formal Definition
A tuple relational calculus expression is of the form { t | P(t) } where P is a formula. Several tuple variables may appear in a formula.
Tuple variable: a tuple variable is said to be a free variable unless it is quantified by a ∃ or a ∀. If it is quantified by a ∃ or a ∀, it is said to be a bound variable.
Formula: a formula is built of atoms. An atom is one of the following forms:
o s ∈ r, where s is a tuple variable and r is a relation.
o s[x] Θ u[y], where s and u are tuple variables, x and y are attributes, and Θ is a comparison operator (<, ≤, =, ≠, >, ≥).
o s[x] Θ c, where c is a constant.
An atom is a formula.
If P is a formula, then so are ¬P and (P).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
If P(s) is a formula containing a free tuple variable s, then ∃ s ∈ r (P(s)) and ∀ s ∈ r (P(s)) are also formulae.
Safety of Expressions
A tuple relational calculus expression may generate an infinite relation, e.g. { t | ¬(t ∈ borrow) }. There are an infinite number of tuples that are not in borrow. Most of these tuples contain values that do not even appear in the database. So we have to restrict the relational calculus.
Safe Tuple Expressions
The domain of a formula P, denoted dom(P), is the set of all values referenced in P. We may say an expression { t | P(t) } is safe if all values that appear in the result are values from dom(P). A safe expression yields a finite number of tuples as its result. Otherwise, it is called unsafe. The tuple relational calculus restricted to safe expressions is equivalent in expressive power to the relational algebra.
2.7 Domain Relational Calculus - A domain relational calculus expression is of the form { <x1, x2, …, xn> | P(x1, x2, …, xn) } where each xi is a domain variable and P is a formula.
A formula is built of atoms. An atom is one of the following forms:
o <x1, x2, …, xn> ∈ r, where r is a relation on n attributes and each xi is a domain variable or a constant.
o x Θ y, where x and y are domain variables and Θ is a comparison operator.
o x Θ c, where c is a constant.
An atom is a formula.
If P is a formula, then so are ¬P and (P).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
If P(x) is a formula containing a free domain variable x, then ∃ x (P(x)) and ∀ x (P(x)) are also formulae.
Example queries:
Find all customers who have a loan for an amount > than $1200.
Find all customers having a loan from the SFU branch, and the city in which they live.
Find all customers having a loan, an account or both at the SFU branch.
Find all customers who have an account at all branches located in Brooklyn.
Safety of Expressions
We say that an expression { <x1, x2, …, xn> | P(x1, x2, …, xn) } is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P).
2. For every "there exists" subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value x in dom(P1) such that P1(x) is true.
3. For every "for all" subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
An expression such as { <b, l, a> | ¬(<b, l, a> ∈ loan) } is unsafe because it allows values in the result that are not in the domain of the expression.
All three of the following are equivalent in expressive power:
o the relational algebra
o the tuple relational calculus restricted to safe expressions
o the domain relational calculus restricted to safe expressions
2.8 SQL - SQL has become the standard relational database language. It has several
parts:
o Data definition language (DDL) - provides commands to
Define relation schemes.
Delete relations.
Create indices.
Modify schemes.
o Interactive data manipulation language (DML) - a query language based on
both relational algebra and tuple relational calculus, plus commands to insert,
delete and modify tuples.
o Embedded data manipulation language - for use within programming
languages like C, PL/1, Cobol, Pascal, etc.
o View Definition - commands for defining views
o Authorization - specifying access rights to relations and views.
o Integrity - a limited form of integrity checking.
o Transaction control - specifying beginning and end of transactions.
Basic Structure
Basic structure of an SQL expression consists of select, from and where clauses.
A typical SQL query has the form:
select A1, A2, …, An
from r1, r2, …, rm
where P
Each Ai represents an attribute, and each ri a relation. P is a predicate. This query is equivalent to the algebra expression
Π A1, A2, …, An (σ P (r1 × r2 × … × rm))
If the where clause is omitted, the predicate P is true. The list of attributes can be
replaced with a * to select all. The result of an SQL query is a relation.
The select clause - corresponds to the projection operation of the relational algebra. It
is used to list the attributes desired in the result of a query. If we want to remove
duplicates in a selection procedure, we use the keyword distinct after select. The
keyword all is used to specify explicitly that duplicates are not removed. select *
means select all the attributes. Select clause can also contain arithmetic expressions
involving operators (+, -, *, / ) and operating on constants or attributes of tuples.
Eg: 1. select branch-name
from loan
1. select branch-name, loan-number, amount*100
from loan
The where clause - corresponds to the selection predicate in relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause. SQL uses the logical connectives and, or and not, rather than the mathematical symbols ∧, ∨ and ¬, in the where clause. The operands of the logical connectives can be expressions involving the comparison operators <, >, <=, >=, = and <>. SQL includes a between comparison operator to simplify where clauses that specify that a value be greater than or equal to some value and less than or equal to some other value.
Eg: 1. select loan-number
from loan
where amount between 90000 and 100000
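This query runs essentially unchanged in sqlite3 (identifiers use underscores instead of hyphens; the loan data is a toy set for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number INTEGER, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(17, "Downtown", 95000), (23, "Round Hill", 2000),
                  (13, "Redwood", 100000)])
# between is shorthand for: amount >= 90000 and amount <= 100000 (both ends inclusive).
rows = conn.execute("""SELECT loan_number FROM loan
                       WHERE amount BETWEEN 90000 AND 100000
                       ORDER BY loan_number""").fetchall()
print(rows)  # [(13,), (17,)]
```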
The from clause - corresponds to Cartesian product of the relational algebra. It lists
the relations to be scanned in the evaluation of the expression.
The rename operation - SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form: old-name as new-name.
String operations - The most commonly used operation on strings is pattern
matching using the operator like. We describe patterns using two special characters:
Percent (%) The % character matches any substring.
Underscore ( _ ) The _ character matches any character.
Patterns are case-sensitive. The keyword escape is used to define the escape character.
We can use not like for string mismatching.
Ordering the display of tuples - SQL allows the user to control the order in which
tuples are displayed.
o The order by clause lists the attributes by which the result is sorted.
o desc specifies descending order and asc ascending order; ascending is the default.
o Ordering may be performed on multiple attributes.
Set operations - SQL has the set operations union, intersect and except. union
eliminates duplicates, being a set operation. If we want to retain duplicates, we may
use union all, similarly for intersect and except.
Not all implementations of SQL have these set operations. except in SQL-92 is called
minus in SQL-86.
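SQLite is one implementation that does support all three set operations; a minimal sketch of the duplicate-handling difference between union and union all (the depositor/borrower data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (cname TEXT);
CREATE TABLE borrower (cname TEXT);
INSERT INTO depositor VALUES ('Jones'), ('Smith');
INSERT INTO borrower VALUES ('Smith'), ('Hayes');
""")
# union eliminates the duplicate 'Smith'; union all keeps it.
u = conn.execute("SELECT cname FROM depositor UNION SELECT cname FROM borrower").fetchall()
ua = conn.execute("SELECT cname FROM depositor UNION ALL SELECT cname FROM borrower").fetchall()
i = conn.execute("SELECT cname FROM depositor INTERSECT SELECT cname FROM borrower").fetchall()
e = conn.execute("SELECT cname FROM depositor EXCEPT SELECT cname FROM borrower").fetchall()
print(len(u), len(ua), i, e)
```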
Aggregate functions - In SQL we can compute functions on groups of tuples using
the group by clause. Attributes given are used to form groups with the same values.
SQL can then compute
o avg (average value)
o min (minimum value)
o max (maximum value)
o sum (total of values)
o count (number of values)
These are called aggregate functions. They return a single value. The having clause is used to state conditions that apply to groups rather than to tuples. Predicates in the
having clause are applied after the formation of groups. If a where clause and a
having clause appear in the same query, the where clause predicate is applied first.
Tuples satisfying where clause are placed into groups by the group by clause. The
having clause is applied to each group. Groups satisfying the having clause are used
by the select clause to generate the result tuples. If no having clause is present, the
tuples satisfying the where clause are treated as a single group.
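The group by / having pipeline can be seen end to end in a short sqlite3 sketch (the account table and balances are toy data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Downtown", 500), ("Downtown", 700),
                  ("Redwood", 350), ("Redwood", 900), ("Round Hill", 200)])
# group by forms one group per branch; having then filters GROUPS, not rows.
rows = conn.execute("""SELECT branch_name, AVG(balance)
                       FROM account
                       GROUP BY branch_name
                       HAVING AVG(balance) > 500
                       ORDER BY branch_name""").fetchall()
print(rows)
```

Round Hill's group (average 200) is dropped by the having clause; Downtown (600) and Redwood (625) survive.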
Null values - The keyword null is used to test for a null value (absence of information about the value of an attribute).
2.9 Views in SQL - A view in SQL is defined using the create view command:
create view v as <query expression>
where <query expression> is any legal query expression. The view created is given
the name v. To create a view all-customer of all branches and their customers:
create view all-customer as
(select bname, cname
from depositor, account
where depositor.account# = account.account#)
union
(select bname, cname
from borrower, loan
where borrower.loan# = loan.loan#)
Having defined a view, we can now use it to refer to the virtual relation it creates.
View names can appear anywhere a relation name can.
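The all-customer view runs almost verbatim in sqlite3; column names are simplified here (account_no, loan_no) to avoid the '#' character, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (cname TEXT, account_no INTEGER);
CREATE TABLE account (bname TEXT, account_no INTEGER);
CREATE TABLE borrower (cname TEXT, loan_no INTEGER);
CREATE TABLE loan (bname TEXT, loan_no INTEGER);
INSERT INTO depositor VALUES ('Jones', 101);
INSERT INTO account VALUES ('Downtown', 101);
INSERT INTO borrower VALUES ('Hayes', 13);
INSERT INTO loan VALUES ('Redwood', 13);
CREATE VIEW all_customer AS
    SELECT bname, cname FROM depositor JOIN account USING (account_no)
    UNION
    SELECT bname, cname FROM borrower JOIN loan USING (loan_no);
""")
# The view name can now be used anywhere a relation name can.
print(conn.execute("SELECT * FROM all_customer ORDER BY bname").fetchall())
```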
It is important that we evaluate the select statement fully before carrying out any
insertion. If some insertions were carried out even as the select statement were being
evaluated, the insertion might insert an infinite number of tuples. Evaluating the select
statement completely before performing insertions avoids such problems. It is
possible for inserted tuples to be given values on only some attributes of the schema.
The remaining attributes are assigned a null value denoted by null. We can prohibit
the insertion of null values using the SQL DDL.
Delete The delete command removes tuples from a relation. Deletion is expressed
in much the same way as a query. Instead of displaying, the selected tuples are
removed from the database. We can only delete whole tuples. A deletion in SQL is of
the form delete from r where P. Tuples in r for which P is true are deleted. If
the where clause is omitted, all tuples are deleted. We may only delete tuples from one relation at a time, but we may reference any number of relations in a select-from-where clause embedded in the where clause of a delete. However, if the delete request contains an embedded select that references the relation from which tuples are to be deleted, ambiguities may result.
Update - Updating allows us to change some values in a tuple without necessarily
changing all. where clause of update statement may contain any construct legal in a
where clause of a select statement (including nesting). A nested select within an
update may reference the relation that is being updated. As before, all tuples in the
relation are first tested to see whether they should be updated, and the updates are
carried out afterwards.
Update of a view - The view update exists also in SQL. An example will illustrate:
Consider a clerk who needs to see all information in the loan relation except amount.
Let the view branch-loan be given to the clerk:
create view branch-loan as select bname, loan# from loan
Since SQL allows a view name to appear anywhere a relation name may appear, the clerk can write: insert into branch-loan values ('SFU', 'L-307'). This insertion is represented by an insertion into the actual relation loan, from which the view is constructed. However, we have no value for amount. This insertion results in ('SFU', 'L-307', null) being inserted into the loan relation.
MODULE 3
We have
transaction systems, which are systems that operate on very large databases, on which
several (sometimes hundreds of) users operate concurrently, i.e. they
manipulate the database through transactions. Several such systems are presently in
operation in our country: consider the railway reservation system,
wherein thousands of stations, each with multiple computers, operate on a
huge database containing the reservation details of all trains for the next
several days. There are many other such systems, like airline
reservation systems, distance banking systems, stock market systems etc. In all these
cases, apart from the accuracy and integrity of the data provided by the database (note
that money is involved in almost all the cases, either directly or indirectly), the
systems should provide instant availability and fast response to these hundreds of
concurrent users. In this block, we discuss the concept of a transaction, the problems
involved in controlling concurrently operated systems and several other related
concepts. We repeat: a transaction is a logical unit of operation on a database, and the
users intend to operate with these logical units, trying either to get information from
the database or, in some cases, to modify it.
Before discussing the problems of concurrency, we view the concept of multiuser
systems from another point of view: the view of the database designer.
which is used by one person at a time. (Note, however, that the same system can be
used by different persons at different periods of time.) Extending this
concept to a database, a multiuser database is one which can be accessed and
modified by a number of users simultaneously, whereas a single-user database is
one which can be used by only one person at a time. Note that a multiuser
database essentially implies multiprogramming, but the converse is not true:
several users may be operating on the system simultaneously without all of
them operating on the database simultaneously.
Now, before we see what problems can arise because of concurrency, we see
what operations can be done on the database. Such operations can be single-line
commands or can be a set of commands meant to be executed sequentially. These
operations are invariably delimited by the begin transaction and end transaction
statements, and the implication is that all operations between them belong to
a single transaction.
Another concept is the granularity of the transaction. Assume each field in a
database is named. The smallest such named item of the database can be called a field
of a record. The unit on which we operate can be one such grain or a number of
such grains collectively defining some data unit. However, in this course, unless
specified otherwise, we use single-grain operations, without loss of
generality. To facilitate discussions, we presume a database package in which the
following operations are available.
i) Read_tr(X): This operation reads the item X and stores it into an assigned
variable. The name of the variable into which it is read can be anything,
but we give it the same name X, so that confusion is avoided; i.e.
whenever this command is executed, the system reads the required element
from the database and stores it into a program variable called X.
ii) Write_tr(X): This writes the value of the program variable currently
stored in X into a database item called X.
Once the Read_tr(X) is encountered, the system will have to perform the
following operations:
1. Find the address of the block on the disk where X is stored.
Suppose two persons A and B are simultaneously trying to book seats, and each
performs a Read_tr(X), so that the value of X is copied onto the variable X of person
A (let us call it XA) and of person B (XB). So each of them knows that there are 10
seats available.
Suppose A wants to book 8 seats. Since the number of seats he wants (say
Y) is less than the available seats, the program can allot him the seats, change the
number of available seats (X) to X - Y, and even give him the seat numbers that
have been booked for him.
The problem is that a similar operation can be performed by B also. Suppose
he needs 7 seats. So he gets his seven seats, replaces the value of X with 3 (10 - 7) and
gets his reservation.
The problem is noticed only when these blocks are returned to main database
(the disk in the above case).
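The whole scenario can be re-enacted in a few lines (a single-threaded sketch; XA and XB play the private copies held by the two programs):

```python
# The booking race, replayed sequentially: both A and B copy the seat count
# X into private variables before either writes its result back.
X = 10        # seats available in the database

XA = X        # A reads: 10 seats free
XB = X        # B reads: 10 seats free

XA = XA - 8   # A books 8 seats ...
X = XA        # ... and writes back: database says 2

XB = XB - 7   # B, still believing 10 are free, books 7 ...
X = XB        # ... and writes back: database says 3, A's update is lost

print(X)      # 3, although 15 seats were "booked" out of 10
```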
Before we can analyse these problems, we look at the problem from a more
technical view.
1 The lost update problem: This problem occurs when two transactions that access
the same database items have their operations interleaved in such a way as to make
the value of some database item incorrect.

    TA                  TB
    Read_tr(X)
                        Read_tr(X)
    X = X - NA
                        X = X - NB
    Write_tr(X)
                        Write_tr(X)

Fig 1 (time runs downward)
Note that the problem occurred because the transaction TB failed to take into account
the update of TA, i.e. TB read X before TA had recorded its change. Since TB did its
writing later, it overwrote the value written by TA, and the update of TA was lost.
2 The temporary update (dirty read) problem: This occurs when a transaction TA
updates an item but later fails for some operational reason, or the system notices
that the operation should not have been done and cancels it; in either case, the
original value is restored.
But in the meanwhile, another transaction TB has accessed the data, and since it
has no indication of what happened later, it makes use of this data and goes
ahead. Once the original value is restored by TA, the values generated by TB are
obviously invalid.
    TA                  TB
    Read_tr(X)
    X = X - N
    Write_tr(X)
                        Read_tr(X)
                        X = X - N
                        Write_tr(X)
    Failure
    X = X + N
    Write_tr(X)

Fig 3 (time runs downward)
The value generated by TA in a transaction that is not sustained is dirty
data; TB, by reading it, produces an illegal value. Hence the problem is called the
dirty read problem.
3 The incorrect summary problem:

    TA                  TB
                        Sum = 0
                        Read_tr(A)
                        Sum = Sum + A
    Read_tr(X)
    X = X - N
    Write_tr(X)
                        Read_tr(X)
                        Sum = Sum + X
                        Read_tr(Y)
                        Sum = Sum + Y
    Read_tr(Y)
    Y = Y + N
    Write_tr(Y)

Fig 4
In the above example, TA updates both X and Y. But since it first
updates X and then Y, and the operations are so interleaved that the transaction TB
uses both of them in between the operations, TB ends up using the old value of Y
with the new value of X. In the process, the sum obtained refers neither to
the old set of values nor to the new set of values.
4 The unrepeatable read problem: This can happen when an item is read by a transaction twice
(in quick succession) but the item has been changed in the meanwhile, though the
transaction has no reason to expect such a change. Consider the case of a reservation
system, where a passenger gets a reservation detail, and before he decides on the
reservation the value is updated at the request of some other passenger at
another place.
ii)
iii)
iv)
v) The other reasons can be physical problems like theft, fire, etc., or system
problems like disk failure, viruses, etc. In all such cases of failure, a recovery
mechanism has to be in place.
While the transactions operating on databases comprise only the read and write
operations, the system needs several additional operations for its purposes,
such as the recovery discussed in the previous section. If the system were to recover
from a crash or any other catastrophe, it should first be able to keep track of
transactions: when they start, when they terminate and when they abort. Hence the
following operations come into the picture.
i) begin_transaction: marks the beginning of the execution of the transaction.
ii) end_transaction: indicates that the read and write operations of the transaction
have ended.
iii) commit_transaction: signals that the transaction has completed successfully and
its effects can be safely recorded (committed) in the database.
iv) abort (rollback): signals that the transaction has ended unsuccessfully and any
changes it has applied to the database must be undone.
Most systems also keep track of the present status of all the transactions at the present
instant of time (note that in a real multiprogramming environment, more than one
transaction may be in various stages of execution). The system should not only be
able to keep a tag on the present status of the transactions, but should also know what
the next possibilities for the transaction to proceed are and, in case of a failure, how to
roll it back. The whole concept is captured by a state transition diagram. A simple state
transition diagram, in view of what we have seen so far, can appear as follows:
Fig 5: State transition diagram of a transaction. Begin transaction takes it to the
Active state, where read/write operations are performed. End transaction moves it to
the Partially committed state; a Commit then moves it to the Committed state, and a
Terminate to the Terminated state. A Failure in the Active or Partially committed
state leads to Abort, and from there to the Terminated state.
The arrow marks indicate how a state of a transaction can change to the next
state. A transaction is in the active state immediately after the beginning of execution;
there it performs the read and write operations. When these end, it enters the
partially committed state, where system protocols ensure that a system failure at this
juncture does not make erroneous recordings on the database. Once this is done, the
system commits itself to the results and the transaction enters the committed state.
Once in the committed state, a transaction automatically proceeds to the terminated
state.
The transaction may also fail due to a variety of reasons discussed in a
previous section. Once it fails, the system may have to take up error-control exercises
like rolling back the effects of the previous write operations of the transaction. Once
this is completed, the transaction enters the terminated state and passes out of the
system. A failed transaction may be restarted later, either by the intervention of the
user or automatically.
The concept of the system log:
To be able to recover from failures of transaction operations, the
system needs to maintain a track record of all transaction operations that
are taking place and that are likely to affect the status of the database. This
information is called a system log (similar to the concept of log books) and
becomes useful when the system is trying to recover from failures. The log
information is kept on the disk, so that it is not likely to be affected by normal
system crashes, power failures etc. (otherwise, when the system crashes, if the log
also crashes, the entire concept fails). The log is also periodically backed up onto
removable devices (like tape) and kept in archives.
The question is, what type of data or information needs to be logged into the
system log?
Let T refer to a unique transaction-id, generated automatically whenever a
new transaction is encountered; it is used to uniquely identify the
transaction. Then the following entries are made with respect to the transaction T.
i) [start_tr, T]: denotes that the transaction T has started execution.
ii) [write_tr, T, X, old, new]: denotes that the transaction T has changed the
old value of the data item X to a new value.
iii) [read_tr, T, X]: denotes that the transaction T has read the value of X
from the database.
iv) [commit, T]: denotes that the transaction T has completed successfully and
its effects can be permanently recorded in the database.
v) [abort, T]: denotes that the transaction T has been aborted.
These entries are not complete. In some cases certain modification to their purpose
and format are made to suit special needs.
(Note that though we have been talking that the logs are primarily useful for recovery
from errors, they are almost universally used for other purposes like reporting,
auditing etc).
The two commonly used operations are undo and redo. In undo, if
the transaction fails before the permanent data can be written back into the database, the
log details can be used to sequentially trace back the updates and return the items to
their old values. Similarly, if the transaction fails just before the commit operation is
complete, one need not report a transaction failure: one can use the old and new values
of all write operations in the log and ensure that the same are entered into the database
(redo).
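As a sketch of how these two operations consume the log, the snippet below replays write entries shaped like the records above; the record layout and the recover name are illustrative choices, not from the text.

```python
# Sketch: undo/redo recovery driven by log records of the form
# ("write_tr", T, X, old, new) and ("commit", T).
def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo, in forward order, every write of a committed transaction.
    for rec in log:
        if rec[0] == "write_tr" and rec[1] in committed:
            _, t, x, old, new = rec
            db[x] = new
    # Undo, in reverse order, every write of an uncommitted transaction.
    for rec in reversed(log):
        if rec[0] == "write_tr" and rec[1] not in committed:
            _, t, x, old, new = rec
            db[x] = old
    return db

log = [("start_tr", "T1"), ("write_tr", "T1", "X", 10, 5), ("commit", "T1"),
       ("start_tr", "T2"), ("write_tr", "T2", "Y", 20, 8)]  # T2 never commits
state = recover(log, {"X": 10, "Y": 8})
print(state)  # {'X': 5, 'Y': 20}: T1's write is redone, T2's is undone
```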
A transaction reaches its commit point when all its operations that access the
database have been successfully executed and the effects of
all such operations have been included in the log. Once a transaction T reaches a
commit point, the transaction is said to be committed, i.e. the changes that the
transaction sought to make in the database are assumed to have been recorded
into the database. The transaction indicates this state by writing a [commit, T] record
into its log. At this point, the log contains a complete sequence of changes brought
about by the transaction and has the capacity both to undo them (in case
of a crash) and to redo them (if a doubt arises as to whether the modifications have
actually been recorded in the database).
Before we close this discussion on logs, one small clarification. The records
of the log are on the disk (secondary memory). When a log record is to be written, a
secondary-device access is to be made, which slows down the system operations. So
normally a copy of the most recent log records is kept in main memory and the
updates are made there. At regular intervals, these are copied back to the disk. In
case of a system crash, only those records that have been written onto the disk will
survive. Hence, before a transaction commits, all its log records up to the commit
entry are forcefully written back to the disk, and only then is the commit executed.
This concept is called forceful writing of the log file.
Every transaction is expected to satisfy four properties: atomicity, consistency
preservation, isolation and durability. Often, by combining their first letters, they are
called the ACID properties.
i)
Consistency preservation:
The atomicity concept is taken care of while designing and implementing the transaction.
If, however, a transaction fails even before it can complete its assigned task, the
recovery software should be able to undo the partial effects inflicted by the
transaction onto the database.
The preservation of consistency is normally considered the duty of the
database programmer. A consistent state of a database is a state which satisfies
the constraints specified by the schema, together with any further constraints
included to make the rules more effective. The database programmer writes his
programs in such a way that a transaction enters the database only when it is in a
consistent state and leaves it in the same or another consistent state.
This, of course, implies that no other transaction interferes with the action of the
transaction in question.
This leads us to the next concept, isolation: every transaction goes about
doing its job without being bogged down by any other transaction which may also
be working on the same database. One simple mechanism to ensure this is to make
sure that no transaction makes its partial updates available to other transactions
until the commit state is reached. This also eliminates the temporary update problem.
However, this has been found to be inadequate to take care of several other problems.
Most database transactions today come with several levels of isolation. A transaction
is said to have level 0 isolation if it does not overwrite the dirty reads of
higher-level transactions (level 0 is the lowest level of isolation). A transaction is
said to have level 1 isolation if it does not lose any updates. At level 2, the
transaction neither loses updates nor has any dirty reads. At level 3, the highest level
of isolation, a transaction has no lost updates, has no dirty
reads, and in addition has repeatable reads.
For the recovery and concurrency control operations, we concentrate mainly on the
read_tr and write_tr operations, because these operations actually effect changes to the
database. The other two (equally) important operations are commit and abort, since
they decide when the changes effected actually become permanent on the database.
Since listing each of these operations in full becomes a lengthy process, we adopt a
notation for describing a schedule: the operations read_tr, write_tr, commit and abort
are indicated by r, w, c and a, and each of them comes with a subscript indicating the
transaction number.
For example, SA : r1(x); r2(y); w2(y); r1(y); w1(x); a1
indicates the following operations in that order:

    Read_tr(x)   - transaction 1
    Read_tr(y)   - transaction 2
    Write_tr(y)  - transaction 2
    Read_tr(y)   - transaction 1
    Write_tr(x)  - transaction 1
    Abort        - transaction 1
ii)
iii)
But r1(x); w2(y) and r1(x); r2(x) do not conflict: in the first case the read
and write are on different data items, and in the second case both are trying to read the
same data item, which they can do without any conflict.
A Complete Schedule: A schedule S of n transactions T1, T2, ... Tn is said to be a
complete schedule if the following conditions are satisfied:
i) The operations in S are exactly those of T1, T2, ... Tn, including a commit or
abort operation as the last operation of each transaction.
ii) For any pair of operations belonging to the same transaction Ti, their order of
appearance in S is the same as their order in Ti.
iii) Whenever there are conflicting operations, one of the two will occur before
the other in the schedule.
A partial order of the schedule is said to occur if the first two conditions of the
complete schedule are satisfied, but whenever there are non-conflicting operations in
the schedule, they can occur without it being indicated which should appear first.
This is possible because non-conflicting operations can be executed in any
order without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete
schedules, because new transactions keep getting included into the schedule.
Hence, one often works with the committed projection C(S) of a schedule S. This set
includes only those operations in S that belong to committed transactions, i.e.
transactions Ti whose commit operation ci is in S.
Put in simpler terms, since uncommitted operations do not get reflected in the actual
outcome of the schedule, only those transactions that have completed their commit
operations contribute to the set, and this projection is good enough in most cases.
Schedules that allow recovery are called recoverable schedules, and those that do not
are called non-recoverable schedules. As a rule, non-recoverable schedules should not
be permitted.
Formally, a schedule S is recoverable if no transaction T which appears in S
commits until all transactions T' that have written an item which is read by T have
committed.
The concept is a simple one. Suppose the transaction T reads an item X from
the database, completes its operations (based on this and other values) and commits,
i.e. the output values of T become permanent values in the database.
But suppose this value of X was written by another transaction T' (before it was read
by T), and T' aborts after T has committed. What happens? The values committed by T
are no longer valid, because their basis (namely X) itself has been
changed. Obviously T also needs to be rolled back (if possible), leading to other
rollbacks and so on.
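A small sketch of this test, using an invented encoding of a schedule as (op, transaction, item) triples with None as the item for commit operations:

```python
# Sketch: a schedule is recoverable if no transaction commits before every
# transaction it has read a value from has committed.
def is_recoverable(schedule):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # transaction -> set of transactions it read from
    committed = set()
    for op, txn, item in schedule:
        if op == "w":
            last_writer[item] = txn
        elif op == "r" and item in last_writer and last_writer[item] != txn:
            reads_from.setdefault(txn, set()).add(last_writer[item])
        elif op == "c":
            if not reads_from.get(txn, set()) <= committed:
                return False   # txn commits before a transaction it read from
            committed.add(txn)
    return True

# T2 reads X written by T1 and commits before T1: not recoverable.
bad  = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T2", None), ("c", "T1", None)]
good = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T1", None), ("c", "T2", None)]
print(is_recoverable(bad), is_recoverable(good))  # False True
```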
The other aspect to note is that in a recoverable schedule, no committed
transaction needs to be rolled back. But it is possible that a cascading rollback
may have to be effected, in which an uncommitted transaction has to be rolled back.
3.6 Serializability
Given two transactions T1 and T2 to be scheduled, they can be scheduled in
a number of ways. The simplest way is to schedule them without bothering to
interleave them, i.e. schedule all operations of the transaction T1 followed by
all operations of T2, or alternatively schedule all operations of T2 followed by all
operations of T1.
    T1                  T2
    read_tr(X)
    X = X + N
    write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)
                        read_tr(X)
                        X = X + P
                        write_tr(X)

Fig 6: Non-interleaved (serial) schedule A (time runs downward)
    T1                  T2
                        read_tr(X)
                        X = X + P
                        write_tr(X)
    read_tr(X)
    X = X + N
    write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)

Fig 7: Non-interleaved (serial) schedule B
These can now be termed serial schedules, since the entire sequence of operations in
one transaction is completed before the next transaction is started.
In the interleaved mode, the operations of T1 are mixed with the operations of T2.
This can be done in a number of ways; two such sequences are given below:
    T1                  T2
    read_tr(X)
    X = X + N
                        read_tr(X)
                        X = X + P
    write_tr(X)
    read_tr(Y)
                        write_tr(X)
    Y = Y + N
    write_tr(Y)

Fig 8: Interleaved (non-serial) schedule C
    T1                  T2
    read_tr(X)
    X = X + N
    write_tr(X)
                        read_tr(X)
                        X = X + P
                        write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)

Fig 9: Interleaved (non-serial) schedule D
Formally, a schedule S is serial if, for every transaction T in the schedule, all
operations of T are executed consecutively; otherwise it is called non-serial. In such a
non-interleaved schedule, if the transactions are independent, one can also presume
that the schedule will be correct, since each transaction commits or aborts before the
next transaction begins. As long as the transactions individually are error-free, such a
sequence of events is guaranteed to give correct results.
The problem with such a situation is the wastage of resources. If, in a serial
schedule, one of the transactions is waiting for an I/O, the other transactions
cannot use the system resources either, and hence the entire arrangement is wasteful of
resources. If some transaction T is very long, the other transactions have to keep
waiting till it is completed. Moreover, in systems wherein hundreds of users operate
concurrently, serial scheduling becomes unthinkable. Hence, in general, the serial
scheduling concept is unacceptable in practice.
However, once the operations are interleaved so that the above-cited problems
are overcome, then unless the interleaving sequence is well thought out, all the
problems that we encountered at the beginning of this block can reappear. Hence, a
methodology is to be adopted to find out which of the interleaved schedules give
correct results and which do not.
A schedule S of n transactions is serializable if it is equivalent to some
serial schedule of the same n transactions. Note that there are n! different serial
schedules possible out of n transactions, and if one goes about interleaving
them, the number of possible combinations becomes unmanageably high. To ease our
operations, we form two disjoint groups of non-serial schedules: those non-serial
schedules that are equivalent to one or more serial schedules, which we call
serializable schedules, and those that are not equivalent to any serial schedule and
hence are not serializable. Once a non-serial schedule is serializable, it becomes
equivalent to a serial schedule and, by our previous definition of a serial schedule,
becomes a correct schedule. But how can one prove the equivalence of a non-serial
schedule to a serial schedule?
The simplest and most obvious method to conclude that two such
schedules are equivalent is to compare their results. If they produce the same results,
they can be considered equivalent; i.e. if two schedules are result-equivalent,
they can be considered equivalent. But such an oversimplification is full of
problems. Two sequences may produce the same results for one or even a large
number of initial values, and still not be equivalent. Consider the following two
sequences:
    S1                  S2
    read_tr(X)          read_tr(X)
    X = X + X           X = X * X
    write_tr(X)         write_tr(X)

Fig 10
For a value X = 2, both produce the same result. Can we conclude that they are
equivalent? Though this may look like a simplistic example, with some imagination
one can always come up with more sophisticated examples wherein the pitfalls of
treating the schedules as equivalent are less obvious. But the point still holds: result
equivalence cannot mean schedule equivalence. A more refined method of establishing
equivalence is available, called conflict equivalence. Two schedules are
said to be conflict-equivalent if the order of any two conflicting operations is the same
in both schedules. (Note that conflicting operations essentially belong to two
different transactions, access the same data item, and at least one of them is
a write_tr(X) operation.) If two such conflicting operations appear in different orders
in the two schedules, it is obvious that they may produce two different databases in
the end, and hence the schedules are not equivalent.
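The pitfall of result equivalence is easy to check numerically for the two sequences of Fig 10:

```python
# The two schedules of Fig 10, each viewed as a function of the initial X.
def s1(x):          # read_tr(X); X = X + X; write_tr(X)
    return x + x

def s2(x):          # read_tr(X); X = X * X; write_tr(X)
    return x * x

print(s1(2), s2(2))   # 4 4: result-equivalent for X = 2 ...
print(s1(3), s2(3))   # 6 9: ... but clearly not equivalent schedules
```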
1 Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability:
1. For each transaction Ti participating in the schedule S, create a node
labeled Ti in the precedence graph.
2. For each case where Tj executes a read_tr(X) after Ti executes a write_tr(X),
create an edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a write_tr(X) after Ti executes a read_tr(X),
create an edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(X) after Ti executes a
write_tr(X), create an edge from Ti to Tj in the graph.
5. The schedule S is conflict-serializable if and only if the precedence graph
has no cycles.
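The steps above can be sketched in Python; the encoding of a schedule as (op, transaction, item) triples is an illustrative convention, not from the text.

```python
# Sketch: build the precedence graph of a schedule and test it for cycles.
# A schedule is a list of (op, txn, item) triples with op "r" or "w".
def is_conflict_serializable(schedule):
    edges = set()
    for i, (op_i, t_i, x_i) in enumerate(schedule):
        for op_j, t_j, x_j in schedule[i + 1:]:
            # Conflicting pair: different transactions, same item, at least
            # one write -> an edge saying Ti must precede Tj.
            if t_i != t_j and x_i == x_j and "w" in (op_i, op_j):
                edges.add((t_i, t_j))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)

    def cyclic(node, stack, done):          # depth-first cycle search
        if node in stack:
            return True
        if node in done:
            return False
        stack.add(node)
        found = any(cyclic(m, stack, done) for m in adj.get(node, []))
        stack.discard(node)
        done.add(node)
        return found

    return not any(cyclic(v, set(), set()) for v in adj)

# Schedule D of Fig 9 (serializable) and schedule C of Fig 8 (not):
d = [("r", "T1", "X"), ("w", "T1", "X"), ("r", "T2", "X"), ("w", "T2", "X"),
     ("r", "T1", "Y"), ("w", "T1", "Y")]
c = [("r", "T1", "X"), ("r", "T2", "X"), ("w", "T1", "X"), ("r", "T1", "Y"),
     ("w", "T2", "X"), ("w", "T1", "Y")]
print(is_conflict_serializable(d), is_conflict_serializable(c))  # True False
```

Schedule D yields only edges T1 → T2 (no cycle, equivalent to the serial order T1, T2), while schedule C yields edges in both directions, i.e. a cycle.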
Fig 11: Precedence graphs for schedules A, B, C and D (nodes T1 and T2, with edges
labeled by the data item X that causes the conflict).
ii) For each read operation ri(X) of Ti in S, if the value of X read was written by a
write operation wj(X) of Tj, the same must hold for the corresponding read
operation of Ti in S'.
iii) The final write operation on each data item must be by the same transaction in
both schedules.
The idea behind view equivalence is that, as long as each read operation of a
transaction reads the result of the same write operation in both schedules, the write
operations of each transaction produce the same results; the read operations are then
said to see the same view in both schedules. It can easily be verified that when S and
S' operate independently on a database with the same initial state, they produce the
same end states. A schedule S is said to be view-serializable if it is view-equivalent to
a serial schedule.
It can also be verified that the definitions of conflict serializability and view
serializability are similar if a condition of constrained write holds on
all transactions of the schedule. This condition states that any write operation wi(X)
in Ti is preceded by an ri(X) in Ti, and that the value written by wi(X) depends
only on the value of X read by ri(X). This assumes that the computation of the new
value of X is a function f(X) of the old value of X read from the database. However,
the definition of view serializability is less restrictive than that of conflict
serializability under the unconstrained write assumption, where the value written by
the operation wi(X) in Ti can be independent of the old value of X in the database;
this is called a blind write.
But the main problem with view serializability is that it is computationally
extremely complex, and there is no efficient algorithm for testing it.
3. Uses of serializability:
If one proves the serializability of a schedule S, it is equivalent to saying that S
is correct; hence it guarantees that the schedule provides correct results. But being
serializable is not the same as being serial. A serial schedule is inefficient for
the reasons explained earlier, which leads to underutilization of the CPU and I/O
devices, and in some cases, like mass reservation systems, becomes untenable. On the
other hand, a serializable schedule combines the benefits of concurrent execution
(efficient system utilization, ability to cater to a larger number of concurrent users)
with the guarantee of correctness.
But all is not well yet. The scheduling process is done by operating system
routines after taking into account various factors like system load, time of transaction
submission, priority of the process with reference to other processes, and a large
number of other factors. Also, since a very large number of interleaving combinations
are possible, it is extremely difficult to determine beforehand the manner in which
the transactions will be interleaved. In other words, getting the various schedules itself
is difficult, let alone testing them for serializability.
Hence, instead of generating the schedules, checking them for serializability and then
using them, most DBMS protocols use a more practical method: they impose
restrictions on the transactions themselves.
concurrently. Such controls, when implemented properly, can overcome many of the
problems of concurrent operations listed earlier. However, the locks themselves may
create a few problems, which we shall see in some detail in subsequent sections.
The only restriction on the use of binary locks is that they should be
implemented as indivisible units (also called critical sections in operating systems
terminology). That means no interleaving operations should be allowed once a lock
or unlock operation is started, until that operation is completed. Otherwise, if a
transaction locks a unit and gets interleaved with many other transactions, the locked
unit may remain unavailable for a long time, with catastrophic results.
To make use of the binary lock scheme, every transaction should follow certain
protocols:
1. A transaction T must issue the operation lock_item(X) before issuing a
read_tr(X) or write_tr(X).
2. A transaction T must issue the operation unlock_item(X) after all read_tr(X)
and write_tr(X) operations on X are complete.
Shared/Exclusive locks
While the operation of the binary lock scheme appears satisfactory, it suffers
from a serious drawback: once a transaction holds a lock (even for a read
operation), no other transaction can access the data item. In large concurrent
systems, this can become a disadvantage. While it is obvious that more than one
transaction should not go on writing into X, and that while one transaction is writing
into it no other transaction should be reading it, no harm is done if several transactions
are allowed to read the item simultaneously. This would save the time of all these
transactions, without in any way affecting correctness.
This concept gave rise to the idea of shared/exclusive locks. When only read
operations are being performed, the data item can be shared by several transactions;
it is only when a transaction wants to write into the item that the lock must be
exclusive. Hence the shared/exclusive lock is also sometimes called a multiple-mode
lock. A read lock is a shared lock (which can be held by several transactions), whereas
a write lock is an exclusive lock. So we need three operations: readlock, writelock and
unlock. The algorithms can be as follows:
Read lock operation:

Readlock(X):
Start: if LOCK(X) = "unlocked"
       then { LOCK(X) = "read-locked";
              no_of_reads(X) = 1 }
       else if LOCK(X) = "read-locked"
       then no_of_reads(X) = no_of_reads(X) + 1
       else { wait (until LOCK(X) = "unlocked" and the lock manager
              wakes up the transaction);
              go to Start }
end.
The writelock operation:

Writelock(X):
Start: if LOCK(X) = "unlocked"
       then LOCK(X) = "write-locked"
       else { wait (until LOCK(X) = "unlocked" and the lock manager
              wakes up the transaction);
              go to Start }
end.

The unlock operation:

Unlock(X):
if LOCK(X) = "write-locked"
then { LOCK(X) = "unlocked";
       wakeup one of the waiting transactions, if any }
else if LOCK(X) = "read-locked"
then { no_of_reads(X) = no_of_reads(X) - 1;
       if no_of_reads(X) = 0
       then { LOCK(X) = "unlocked";
              wakeup one of the waiting transactions, if any } }
end.
The algorithms are fairly straightforward, except that during the unlocking
operation, if a number of read locks are held, then all of them are to be released
before the unit itself becomes unlocked.
To ensure smooth operation of the shared/exclusive locking system, the
system must enforce the following rules:
1. A transaction T must issue the operation readlock(X) or writelock(X)
before any read operation is performed on X.
2. A transaction T must issue the operation writelock(X) before any
write_tr(X) operation is performed on X.
3. A transaction T must issue the operation unlock(X) after all read_tr(X) and
write_tr(X) operations in T are completed.
4. A transaction T will not issue a readlock(X) operation if it already holds a
readlock or writelock on X.
5. A transaction T will not issue a writelock(X) operation if it already holds a
readlock or writelock on X.
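Assuming a threaded setting, the readlock/writelock/unlock trio above can be sketched with a condition variable standing in for the lock manager's wakeup step; the class and attribute names below are illustrative choices, not from the text.

```python
import threading

# A minimal sketch of a shared/exclusive lock for one data item.
class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # plays the role of no_of_reads(X)
        self._writer = False     # True when the item is write-locked

    def readlock(self):
        with self._cond:
            while self._writer:              # wait while write-locked
                self._cond.wait()
            self._readers += 1               # share with other readers

    def writelock(self):
        with self._cond:
            while self._writer or self._readers:   # wait until unlocked
                self._cond.wait()
            self._writer = True              # exclusive hold

    def unlock(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            if not self._writer and self._readers == 0:
                self._cond.notify_all()      # wake up waiting transactions

lock = SharedExclusiveLock()
lock.readlock(); lock.readlock()   # two readers share the lock
lock.unlock(); lock.unlock()
lock.writelock()                   # now exclusively held
lock.unlock()
```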
Conversion Locks
In some cases, it is desirable to allow lock conversion by relaxing
conditions (4) and (5) of the shared/exclusive lock mechanism, i.e. if a transaction T
already holds one type of lock on an item X, it may be allowed to convert it to the
other type. For example, if it is holding a readlock on X, it may be allowed to upgrade
it to a writelock; all the transaction does is issue a writelock(X) operation. If T is
the only transaction holding the readlock, it may be immediately allowed to upgrade
itself to a writelock; otherwise it has to wait till the other readlocks (of other
transactions) are released. Similarly, if it is holding a writelock, T may be allowed to
downgrade it to a readlock(X). The algorithms of the previous sections can be amended
to accommodate conversion locks, and this is left as an exercise to the students.
Before we close the section, it should be noted that the use of binary locks does
not by itself guarantee serializability. This is because, in certain
combinations of situations, a lock-holding transaction may end up unlocking the unit
too early. This can happen for a variety of reasons, including a situation
wherein a transaction feels it no longer needs a particular data unit and hence
unlocks it, but later indirectly writes into it (through some other
unit). This results in ineffective locking, and serializability is
lost. To guarantee serializability, the protocol of two-phase locking is to be
implemented, which we will see in the next section.
Phase I
writelock(X)
----------------------------------
Phase II
unlock(Y)
readtr(X)
X = X + Y
writetr(X)
unlock(X)
fig 12: A transaction obeying the two-phase locking protocol: all lock operations precede the first unlock (Phase I, the growing phase); all unlocks occur in Phase II, the shrinking phase.
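The two-phase property illustrated in fig 12 can be checked mechanically. A small sketch (the function name and operation encoding are ours, not from the text):

```python
# Sketch: check that a transaction's sequence of lock operations is two-phase,
# i.e. no lock is acquired after the first unlock (Phase I grows, Phase II
# shrinks). readtr/writetr operations are ignored by the phase test.
def is_two_phase(ops):
    """ops: sequence of (action, item) pairs, where action is one of
    'readlock', 'writelock', 'unlock', 'readtr', 'writetr'."""
    unlocked = False
    for action, _item in ops:
        if action == 'unlock':
            unlocked = True
        elif action in ('readlock', 'writelock') and unlocked:
            return False          # a lock acquired in the shrinking phase
    return True
```

The transaction of fig 12 passes this test; a transaction that unlocks X and only then locks Y fails it.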
5. Commutativity of ⋈ (and ×): The join and cross-product operations are commutative:
R ⋈c S = S ⋈c R
R × S = S × R
6. Commuting σ with ⋈ (and ×): If all the attributes in the selection condition c
involve only the attributes of one of the relations being joined, say R, the two
operations can be commuted as follows:
σc(R ⋈ S) = (σc(R)) ⋈ S
Alternatively, if the selection condition c can be written as c1 AND c2, where condition
c1 involves only the attributes of R and condition c2 involves only the attributes of S,
the operations commute as follows:
σc(R ⋈ S) = (σc1(R)) ⋈ (σc2(S))
The same rules apply if the ⋈ is replaced by a × operation.
7. Commuting π with ⋈ (and ×): Suppose the projection list is L = {A1, A2,
..., An, B1, B2, ..., Bm}, where A1, ..., An are attributes of R and B1, ...,
Bm are attributes of S. If the join condition c involves only attributes in L, the two
operations can be commuted as follows:
πL(R ⋈c S) = (πA1, ..., An(R)) ⋈c (πB1, ..., Bm(S))
If the join condition c contains additional attributes not in L, these must be added to
the projection lists, and a final π operation is needed. I.e. if attributes An+1, ..., An+k
of R and Bm+1, ..., Bm+p of S are involved in the join condition c but are not in the
projection list L, the operations commute as follows:
πL(R ⋈c S) = πL((πA1, ..., An, An+1, ..., An+k(R)) ⋈c (πB1, ..., Bm, Bm+1, ..., Bm+p(S)))
The same rules apply when the ⋈ is replaced by a × operation.
definition of most restrictive SELECT can mean either the ones that produce a
relation with the fewest tuples or with the smallest absolute size. Another
possibility is to define the most restrictive SELECT as the one with the
smallest selectivity. Second, make sure that the ordering of leaf nodes does not
cause CARTESIAN PRODUCT operations. For example, if the two relations with
the most restrictive SELECT do not have a direct join condition between
them, it may be desirable to change the order of leaf nodes to avoid Cartesian
products.
4. Using rule 12, combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a JOIN operation, if the
condition represents a join condition.
5. Using rules 3, 4, 7 and 11 concerning the cascading of PROJECT and the
commuting of PROJECT with other operations, break down and move lists of
projection attributes down the tree as far as possible by creating new
PROJECT operations as needed. Only those attributes needed in the query
result and in subsequent operations in the query tree should be kept after each
PROJECT operation.
6. Identify subtrees that represent groups of operations that can be executed by a
single algorithm.
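As a sketch of why these heuristic steps help, the following Python fragment (toy relations with invented attribute names, not real algebra-tree machinery) shows that pushing a SELECT below a JOIN preserves the result while shrinking the join input:

```python
# Sketch: applying a SELECT before the join yields the same answer as
# applying it after the join, but joins a smaller input relation.
def join(r, s, cond):
    return [{**a, **b} for a in r for b in s if cond(a, b)]

emp  = [{'eno': 1, 'dno': 10}, {'eno': 2, 'dno': 20}]
dept = [{'dno': 10, 'dname': 'CS'}, {'dno': 20, 'dname': 'EE'}]
on_dno = lambda a, b: a['dno'] == b['dno']

# sigma_{dname='CS'}(emp JOIN dept): selection applied after the join
late = [t for t in join(emp, dept, on_dno) if t['dname'] == 'CS']
# emp JOIN sigma_{dname='CS'}(dept): selection pushed below the join
early = join(emp, [d for d in dept if d['dname'] == 'CS'], on_dno)
assert late == early     # same answer, smaller intermediate result
```

Here the pushed-down plan joins only one dept tuple instead of two; on real tables the saving in intermediate-result size is what the heuristic exploits.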
2. Cost-Based Optimization
A query optimizer should not depend solely on heuristic rules; it should also
estimate and compare the costs of executing a query using different execution
strategies, and should choose the strategy with the lowest
cost estimate. This approach is more suitable for compiled queries, where the
optimization is done at compile time and the resulting execution strategy code is
stored and executed directly at run-time.
Cost Components for Query Execution
The cost of executing a query includes the following components:
1. Access cost to secondary storage: This is the cost of searching for, reading and
writing data blocks that reside on secondary storage, mainly on disk. The cost of
searching for records in a file depends on the type of access structures on that file,
such as ordering, hashing and primary or secondary indices. In addition, factors
such as whether the file blocks are allocated contiguously on the same disk
cylinder or scattered on the disk affect the access cost.
2. Storage cost: This is the cost of storing any intermediate files that are generated by
an execution strategy for the query.
3. Computation cost: This is the cost of performing in memory operations on the
data buffers during query execution. Such operations include searching for and
sorting records, merging records for a join and performing computations on field
values.
4. Memory usage cost: This is the cost pertaining to the number of memory buffers
needed during query execution.
5. Communication cost: This is the cost of shipping the query and its result from the
database site to the site or terminal where the query originated.
These components are combined in a cost function that is used to estimate query execution
cost. To estimate the costs of various execution strategies, we must keep track of
information that is needed for the cost functions. This information may be stored in
the DBMS catalog, where it is accessed by the query optimizer. First, we must know
the size of each file. For a file whose records are all of the same type, the number of
records (tuples), the (average) record size and the number of blocks are needed. The
blocking factor of the file may also be needed.
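These catalog statistics feed directly into simple cost formulas. The sketch below uses the standard textbook formulas for a full file scan and a primary-index lookup; the function names and the sample numbers are invented for illustration:

```python
# Sketch: estimating block-access costs from catalog statistics --
# number of records r, record size R, and disk block size B.
import math

def file_stats(r, record_size, block_size):
    bfr = block_size // record_size          # blocking factor
    b = math.ceil(r / bfr)                   # number of file blocks
    return bfr, b

def cost_full_scan(b):
    return b                                 # read every block once

def cost_primary_index(index_levels):
    return index_levels + 1                  # one block per level + data block

bfr, b = file_stats(r=10000, record_size=100, block_size=4096)
# bfr = 40 records/block, b = 250 blocks: a two-level primary index costs
# about 3 block accesses versus 250 for the full scan.
```

This is exactly the comparison a cost-based optimizer makes, only over many more access structures and operations.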
3.10 Assertions
An assertion is a predicate expressing a condition that we wish the database always to
satisfy. Domain constraints and referential-integrity constraints are special forms of
assertions. There are many constraints that we cannot express using only these special
forms. Examples of such constraints include
1. The sum of all loan amounts for each branch must be less than the sum of all
account balances at the branch.
2. Every loan has at least one customer who maintains an account with a minimum
balance of $1000.00
An assertion in SQL-92 takes the form
Create assertion <assertion-name> check <predicate>
The two constraints mentioned can be written as shown next. Since SQL does not
provide a "for all X, P(X)" construct (where P is a predicate), we are forced to
implement the construct using the equivalent "not exists X such that not P(X)"
construct, which can be written in SQL.
The first constraint can be written as:
create assertion sum-constraint check
(not exists (select * from branch
    where (select sum(amount) from loan
           where loan.branch-name = branch.branch-name)
        >= (select sum(balance) from account
            where account.branch-name = branch.branch-name)))
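The same "for all" to "not exists" rewriting can be mimicked outside SQL. A Python sketch of constraint 1 over toy in-memory tables (the table contents are invented):

```python
# Sketch: checking assertion 1 ("for every branch, the sum of loan amounts is
# less than the sum of account balances") phrased the way SQL forces us to --
# as NOT EXISTS a branch violating the condition.
loan    = [{'branch': 'Downtown', 'amount': 900},
           {'branch': 'Downtown', 'amount': 300}]
account = [{'branch': 'Downtown', 'balance': 2000}]

def assertion_holds(branches):
    def violates(b):
        loans    = sum(l['amount']  for l in loan    if l['branch'] == b)
        balances = sum(a['balance'] for a in account if a['branch'] == b)
        return not (loans < balances)
    # "for all b: P(b)" expressed as "not exists b: not P(b)"
    return not any(violates(b) for b in branches)
```

With the toy data the assertion holds (1200 < 2000); adding a further 1000 loan would violate it, which is precisely when a DBMS would reject the update.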
3.11 Triggers
A trigger is a statement that is executed automatically by the system as a side
effect of a modification to the database. To design a trigger mechanism, we must meet
two requirements:
1. Specify the conditions under which the trigger is to be executed.
2. Specify the actions to be taken when the trigger executes
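These two requirements can be sketched as a (condition, action) pair consulted on every modification. The following toy Python model illustrates the idea (it is not Oracle's trigger mechanism; all names are invented):

```python
# Sketch: a trigger is a (condition, action) pair attached to a table and
# consulted on every insert.
class Table:
    def __init__(self):
        self.rows = []
        self.triggers = []          # list of (condition, action) pairs

    def add_trigger(self, condition, action):
        self.triggers.append((condition, action))

    def insert(self, row):
        self.rows.append(row)
        for condition, action in self.triggers:
            if condition(row):      # requirement 1: when to execute
                action(row)         # requirement 2: what to do

log = []
accounts = Table()
accounts.add_trigger(lambda r: r['balance'] < 0,
                     lambda r: log.append(f"overdraft on {r['id']}"))
accounts.insert({'id': 'A-102', 'balance': -50})   # fires the trigger
```

Inserting a row with a negative balance fires the trigger as a side effect of the modification; ordinary inserts do not.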
2. Two or more log files called redo log files; these record all changes made to
data and are used in the process of recovering, if certain changes do not get
written to permanent storage.
3. One or more control files; these contain control information such as database
name, file names and locations and a database creation timestamp.
4. Trace files and an alert log; each background process has a trace file associated
with it, and the alert log records major database events.
The structure of an Oracle database consists of the definition of database in terms of
schema objects and one or more tablespaces. The schema objects contain definitions
of tables, views, sequences, stored procedures, indexes, clusters and database links.
Oracle instance: The set of processes that constitute an instance of the servers
operation is called an Oracle instance, which consists of a System Global Area and a
set of background processes.
System Global Area (SGA) : This area of memory is used for database
information shared by users. Oracle assigns an SGA area when an instance starts.
The SGA in turn is divided into several types of memory structures:
1. Database buffer cache: This keeps the most recently accessed data blocks from
the database. This helps in reducing the disk I/O activity.
2. Redo log buffer, which is the buffer for the redo log file and is used for
recovery purposes.
3. Shared pool, which contains shared memory constructs.
Program Global Area (PGA): This is a memory buffer that contains data and
control information for a server process.
Oracle Processes: Oracle creates server processes to handle requests from connected
user processes. The background processes are created for each instance of Oracle;
they perform I/O asynchronously and provide parallelism for better performance.
Oracle Startup and Shutdown: An Oracle database is not available to users until the
Oracle server has been started up and the database has been opened. Starting a
database and making it available system wide requires the following steps:
1. Starting an instance of the database: The SGA is allocated and background
processes are created in this step.
2. Mounting a database: This associates a previously started Oracle instance with a
database. Until then it is available only to administrators. The database
administrator chooses whether to run the database in exclusive or parallel mode.
When an Oracle instance mounts a database in exclusive mode, only that
instance can access the database. On the other hand, if the instance is started in
parallel (shared) mode, other instances that are started in parallel mode can also
mount the database.
3. Opening a database: Opening a database makes it available for normal database
operations by having Oracle open the on-line data files and log files.
The reverse of the above operations will shut down an Oracle instance as follows:
1. Close the database.
2. Dismount the database.
3. Shut down the Oracle instance.
The data dictionary stores information about the database, such as:
Names of users
Security information
Integrity constraints
Data Blocks: Data Block represents the smallest unit of I/O. A data block has the
following components:
Header: Contains general block information such as block address and type of
segment.
Table directory: Contains information about tables that have data in the data
block.
Free space: Space allocated for row updates and new rows.
Rollback segments: Each database must contain one or more rollback segments,
which are used for undoing transactions.
3.14 Programming in PL/SQL:
BLOCK PL/SQL STRUCTURE:
PL/SQL is a block-structured language. A PL/SQL block defines a unit of
processing, which can include its own local variables, SQL statements, cursors, and
exception handlers. The blocks can be nested. The simplest block structure is given
below.
DECLARE
    Variable declarations
BEGIN
    Program statements
EXCEPTION
    WHEN exception THEN
        Program statements
END;
In the above PL/SQL block, the block parts are logical. A block starts with the
DECLARE section, in which memory variables and other Oracle objects can
be declared. The next section contains SQL executable statements for
manipulating table data by using the variables and constants declared in the
DECLARE section. EXCEPTION is the last section of the PL/SQL block; it
contains SQL and/or PL/SQL code to handle errors that may crop up during the
execution of the above code block. The EXCEPTION section is optional.
Each block can contain other blocks, i.e. blocks can be nested. Blocks of
code cannot, however, be nested in the DECLARE section.
PL/SQL CHARACTER SET
PL/SQL uses the standard ASCII set. The basic character set includes the
following.
Uppercase alphabets
A to Z.
Lowercase alphabets
a to z.
Numbers
0 to 9
Symbols
( ) + - * / < > = ! ; : , . @ # $ ^ & _ \ { } ? [ ] %
Words used in PL/SQL blocks are called lexical units. We can freely insert
blank spaces between lexical units in a PL/SQL block. The spaces have no effect
on the PL/SQL block.
The ordinary symbols used in PL/SQL blocks are
( ) / < > = ;
and the compound symbols are
** || << >> != ~= ^= <= >= :=
VARIABLES
Variables may be used to store the result of a query or calculations. Variables
must be declared before being used. Variables in a PL/SQL block are named variables.
A variable name must begin with a letter and can be followed by a maximum of
29 other characters (so a variable name can be at most 30 characters long).
Reserved words cannot be used as variable names unless enclosed within the
double quotes. Variables must be separated from each other by at least one space or by
a punctuation mark.
The case (upper/lower) is insignificant when declaring variable names. Space
cannot be used in a variable name.
LITERALS
A literal is a numeric value or a character string used to represent itself. So, literals
can be classified into two types.
Numeric literals
String literals
Numeric literals:
Integers
25
43
437
-57
etc
Floats
6.34
25E-03
0.1
+17.1
etc
String literals:
'Hello world'
'EMPLOYEE NAME'
'*******'
'A'
'*'
We can represent single quote character itself in a non-numeric literal by writing it
twice.
Ex: 'Today''s date'
PL/SQL will also have literals, which are called as logical ( boolean) literals.
These are predetermined constants. The value it can take are TRUE, FALSE, and
NULL.
COMMENTS
A comment line begins with a double hyphen (--). In this case the entire
line will be treated as a comment.
Ex: -- this is a single-line comment
The comment line begins with a slash followed by an asterisk (/*) till the
occurrence of an asterisk followed by a slash (*/). In this case comment
lines can be extended to more than one lines.
Ex-1: /* this is a comment */
Ex-2: /* this comment extends
over more than one line */
CHAR: This data type stores fixed-length character data.
Syntax: variable_name CHAR (size)
Ex: CHAR (10) stores MASTERFILE
VARCHAR2: This data type stores variable-length character data.
Syntax: variable_name VARCHAR2 (size)
Ex: VARCHAR2 (20) stores TRANSACTIONFILE
NUMBER: This data type stores fixed-point or floating-point numbers.
Syntax: variable_name NUMBER (precision, scale)
Ex: NUMBER (10) stores 3289473348
NUMBER (6,2) stores 4234.60
DATE: This data type stores date values.
Syntax: variable_name DATE
BOOLEAN: This data type stores only TRUE, FALSE or NULL values.
Syntax: variable_name BOOLEAN
Ex: flag BOOLEAN;
%TYPE declares a variable or constant to have the same data type as that of a
previously defined variable or of a column in a table or in a view.
NOT NULL causes creation of a variable or a constant that cannot have a NULL
value. Attempting to assign the value NULL to a variable or a constant that has
been declared NOT NULL causes an error.
NOTE: As soon as a variable or constant has been declared as NOT NULL, it must be
assigned a value. Hence every NOT NULL declaration of a variable or constant needs
to be followed by PL/SQL expression that loads a value into the variable or constant
declared.
DECLARING VARIABLES
We can declare a variable of any data type either native to the ORACLE or native to
PL/SQL. Variables are declared in the DECLARE section of the PL/SQL block.
Declaration involves the name of the variable followed by its data type. All statements
must end with a semicolon (;), which is the delimiter in PL/SQL. To assign a value to
a variable, the assignment operator (:=) is used.
Ex: pay NUMBER (6,2);
in_stack BOOLEAN;
name VARCHAR2 (30);
room CHAR (2);
date_of_purchase DATE;
emp_name := 'SMITH';
DECLARING A CONSTANT:
Declaring a constant is similar to declaring a variable except that you have
to add
the key word CONSTANT and immediately assign a value to it. Thereafter, no further
assignment to the constants is possible.
Ex: pf_percent CONSTANT NUMBER (3,2) := 0.12;  -- illustrative value
Ex: current_sal employee.sal %TYPE;
In the above example, current_sal is a variable of the PL/SQL block. It gets the data
type and constraints of the column (field) sal belonging to the table Employee.
Declaring a variable with the %TYPE attribute has two advantages:
You do not need to know the data type of the table column
If you change the parameters of the table column, the variables parameters will
change as well.
PL/SQL allows you to use the %TYPE attribute in a nesting variable declaration.
The following example illustrates several variables defined on earlier %TYPE
declarations in a nesting fashion.
Ex:
Dept_sales
INTEGER;
Area_sales
dept_sales %TYPE;
Group_sales
area_sales %TYPE;
Regional_sales
area_sales %TYPE;
Corporate_sales
regional_sales %TYPE;
In case variables for the entire row of a table need to be declared, then instead
of declaring them individually, %ROWTYPE is used.
Ex: emp_row_var employee %ROWTYPE;
In such a declaration, the two identifiers are unique and any change in one does
not affect the other.
PL/SQL OPERATORS
Operators are the glue that holds expressions together. PL/SQL operators can be
divided into
the following categories.
Arithmetic operators
Comparison operators
Logical operators
String operators
PL/SQL operators are either unary (i.e. they act on one value/variable) or binary
(they act on two values/variables)
1) ARITHMETIC OPERATORS:
Arithmetic operators are used for mathematical computations. They are
+ Addition
- Subtraction
* Multiplication
/ Division
** Exponentiation
2) COMPARISON OPERATORS:
Comparison operators return a BOOLEAN result, either TRUE or FALSE.
They are
= Equality operator, e.g. 5 = 3
!= Inequality operator, e.g. a != b
<> Inequality operator, e.g. 5 <> 3
~= Inequality operator, e.g. john ~= johny
< Less than, e.g. a < b
> Greater than, e.g. a > b
<= Less than or equal to, e.g. a <= b
>= Greater than or equal to, e.g. a >= b
In addition to this PL/SQL also provides some other comparison operators like LIKE,
IN,
BETWEEN, IS NULL etc.
LIKE:
Pattern-matching operator.
Ex: name LIKE 'S%' evaluates to TRUE for names beginning with S.
IN:
Checks to see if a value lies within a specified list of values. The syntax is
Syntax: the_value [NOT] IN (value_1, value_2, ...)
Ex: 3 IN (1, 2, 3) returns TRUE, whereas 4 IN (1, 2, 3) returns FALSE.
3) LOGICAL OPERATORS.
PL/SQL implements 3 logical operations AND, OR and NOT. The NOT
operator is unary operator and is typically used to negate the result of a comparison
expression, where as the AND and OR operators are typically used to link together
multiple comparisons.
A AND B is true only if A returns TRUE and B returns TRUE else it is
FALSE.
A OR B is TRUE if either A or B is TRUE, and it is FALSE if both A and B
are FALSE.
NOT A returns TRUE if A is FALSE, and returns FALSE if A is TRUE.
4) STRING OPERATOR:
The concatenation operator (||) joins two strings:
String_1 || string_2
String_1 and string_2 both are strings and can be a string constants, string variables or
string expressions. The concatenation operator returns a resultant string consisting of
all the characters in string_1 followed by all the characters in string_2.
Ex: 'Chandra' || 'shekhar'
Returns 'Chandrashekhar'
A := 'Engineering';
B := 'College';
C VARCHAR2 (50);
C := A || ' ' || B;
Now C holds 'Engineering College'.
NOTE-1: PL/SQL string comparisons are always case sensitive, i.e. aaa not
equal to
AAA.
NOTE-2: ORACLE has some built in functions that are designed to convert
from one
data type to another data type.
To_date: converts a character string into a date.
Ex: To_date ('1/1/02', 'mm/dd/rr');
Returns the date 01-jan-2002.
To_number: converts a character string containing digits into a number.
Ex: To_number ('123.99');
Returns 123.99.
LOOP STATEMENTS:
PL/SQL provides the following iterative constructs: the WHILE loop, the FOR
loop and the EXIT statement for terminating a loop.
WHILE LOOP :
The WHILE loop enables you to evaluate a condition before a
sequence of statements are executed. This is different from the FOR loop
where you must execute the loop at least once. The syntax for the WHILE loop
is as follows:
Syntax:
WHILE condition LOOP
    Statements to execute
END LOOP;
DECLARE
    count NUMBER(2) := 0;
BEGIN
    WHILE count <= 10
    LOOP
        count := count + 1;
        Message('while loop executes');
    END LOOP;
END;
EXIT and EXIT WHEN:
The EXIT WHEN statement enables you to specify the condition required to exit the
execution of the loop. In this case no IF statement is required.
Ex-1: IF count >= 10 THEN
    EXIT;
END IF;
This is equivalent to: EXIT WHEN count >= 10;
GOTO:
The GOTO statement lets you branch unconditionally to a label elsewhere in the
PL/SQL block. The syntax is as follows:
Syntax:
GOTO <label name>;
The label is surrounded by double angle brackets (<< >>) and the label must not have a
semicolon after the label name. The label name does not contain a semicolon
because it is not a PL/SQL statement but rather an identifier of a block of PL/SQL
code. You must have at least one statement after the label, otherwise an error will
result. The GOTO destination must be in the same block, at the same level as or
higher than the GOTO statement itself.
Ex:
GOTO end_of_block;
...
<<end_of_block>>
NULL;
The entry point of the destination block is defined within << >> as
shown above, i.e. labels are written within the << >> symbols.
FOR LOOP:
The FOR loop allows you to execute a block of code repeatedly for a fixed range
of values. The syntax is as follows:
Syntax: FOR loop_index IN [REVERSE] low_value .. high_value LOOP
    Statements to execute
END LOOP;
The loop_index is defined by Oracle as a local variable of type integer.
REVERSE allows you to execute the loop in reverse order. low_value ..
high_value is the range over which the loop executes; these can be constants or
variables. The LOOP line must not be terminated with a semicolon. The
statements listed are executed repeatedly until the loop range is exhausted.
Ex: FOR v_count IN 1 .. 5 LOOP
Message ('for loop executes');
END LOOP;
In the above example the message 'for loop executes' is displayed five
times.
We can terminate the FOR loop permanently using EXIT statement
based on some BOOLEAN condition. Nesting of FOR loops is also
allowed in PL/SQL. The outer loop is executed once, then the inner loop is
executed as many times as its range indicates, and then control is returned
to the outer loop until its range expires.
Ex: FOR out_count IN 1..2 LOOP
FOR in_count IN 1..2 LOOP
Message ('nested for loop');
END LOOP;
END LOOP;
In the above example the message 'nested for loop' is displayed four
times.
Let us discuss some examples to understand how to write a
PL/SQL block. Here we assume that a table called "EMP" is created
and the data is already inserted into it.
Table name : EMP
Create table EMP
( emp_no NUMBER (3),
  name VARCHAR2 (15),
  salary NUMBER (6,2),
  dept VARCHAR2 (15),
  div VARCHAR2 (2) );
EXAMPLE-1:
DECLARE
    num NUMBER (3);
    sal emp.salary %TYPE;
    emp_name emp.name %TYPE;
    count NUMBER (2) := 1;
    starting_emp CONSTANT NUMBER (3) := 134;
BEGIN
    SELECT name, salary INTO emp_name, sal
    FROM EMP WHERE emp_no = starting_emp;
    WHILE sal < 4000
    LOOP
        count := count + 1;
        SELECT emp_no, name, salary INTO
        num, emp_name, sal FROM EMP
        WHERE emp_no > 2150;
    END LOOP;
    Commit;
END;
In the above example there are five statements in the declaration part.
num is of integer type; sal and emp_name take the data types of
the salary and name columns of the EMP table respectively. count is a variable
of type integer and takes the initial value 1. starting_emp is a constant of
integer type with the immediately assigned value 134.
Between BEGIN and END key words, there are some SQL executable
statements used for manipulating the table data. The SELECT statement
extracts the data stored in the name and salary columns of the EMP table corresponding
to the employee having employee number 134, and stores those values in the
variables emp_name and sal respectively.
If sal is less than 4000, the statements within the loop will be
executed. Within the loop there are two statements: the first one
increments the count value by 1 and the second statement is a SELECT
statement. The commit statement commits the changes made to that table. The
END statement terminates the PL/SQL block.
EXAMPLE-2:
This example assumes the existence of table accounts created by using
the following SQL statements.
Create table Accounts
( accnt_id NUMBER (3),
  name VARCHAR2 (25),
  bal NUMBER (6,2) );
PL/SQL block:
DECLARE
    acct_balance NUMBER (6,2);
    acct CONSTANT NUMBER (3) := 312;
    debit_amt CONSTANT NUMBER (6,2) := 100.00;  -- illustrative amount
BEGIN
    SELECT bal INTO acct_balance FROM Accounts
    WHERE accnt_id = acct;
    IF acct_balance >= debit_amt THEN
        UPDATE Accounts
        SET bal = bal - debit_amt WHERE accnt_id = acct;
    ELSE
        Message ('insufficient amount in account');
    END IF;
END;
The above example illustrates the use of IF .. THEN .. ELSE.. END IF
condition control statements.
Declaration part declares one variable and two constants. The
SELECT statement extracts the amount in the bal column of Accounts table
corresponding to account number 312, and stores that in a variable
acct_balance.
If statement checks acct_balance for sufficient amount before
debiting. It updates the table Accounts if it has sufficient amount in the
balance, else it displays a message intimating insufficient fund in the account
of specified accnt_id.
EXAMPLE-3:
This example assumes the existence of the tables Inventory and Purchase_record,
created by using the following SQL statements.
Create table Inventory
( prod_id NUMBER (6),
  product VARCHAR2 (15),
  quantity NUMBER (5) );
Create table Purchase_record
( mesg VARCHAR2 (50),
  d_ate DATE );
PL/SQL block :
DECLARE
num_in_stack
NUMBER(5);
BEGIN
SELECT quantity INTO num_in_stack
FROM Inventory WHERE product = 'gasket';
IF num_in_stack > 0 THEN
    UPDATE Inventory SET quantity = quantity - 1
    WHERE product = 'gasket';
    INSERT INTO Purchase_record
    VALUES ('One gasket purchased', sysdate);
ELSE
    INSERT INTO Purchase_record
    VALUES ('no gasket available', sysdate);
    Message ('there are no more gaskets in stock');
END IF;
Commit;
END;
The above block of PL/SQL code does the following;
It determines how many gaskets are left in stock.
If the number left in stock is greater than zero, it updates the inventory
to reflect the sale of a gasket.
The RAISE statement acts like the CALL statement of high level languages. It has the
general format
RAISE <name of exception>;
When a RAISE statement is executed, it stops the normal processing of the PL/SQL
block of code and control passes to an error handler block of code at the end
of the PL/SQL program block (the EXCEPTION section).
An exception declaration declares a name for a user defined error condition that
the PL/SQL code block recognizes. It can only appear in the DECLARE section of the
PL/SQL code, which precedes the key word BEGIN.
EXAMPLE:
DECLARE
    zero_commission EXCEPTION;
BEGIN
    IF commission = 0 THEN
        RAISE zero_commission;
    END IF;
EXCEPTION
    WHEN zero_commission THEN
        Process the error
END;
Exception handler (error handler block ) is written between the key words
EXCEPTION and END. The exception handling part of a PL/SQL code is
optional. This block of code specifies what action has to be taken when the named
exception condition occurs.
The naming convention for exception name are exactly the same as those for
variables or constants. All the rules for accessing an exception from PL/SQL
blocks are same as those for variables and constants. However, it should be noted
that exceptions cannot be passed as arguments to functions or procedures like
variables or constants.
NO_DATA_FOUND: a SELECT INTO statement returned no rows.
TOO_MANY_ROWS: a SELECT INTO statement returned more than one row.
VALUE_ERROR: an arithmetic, conversion, truncation or size-constraint error occurred.
INVALID_NUMBER: the conversion of a character string to a number failed.
ZERO_DIVIDE: an attempt was made to divide a number by zero.
PROGRAM_ERROR: PL/SQL encountered an internal problem.
STORAGE_ERROR: PL/SQL ran out of memory, or memory is corrupted.
DUP_VAL_ON_INDEX: an attempt was made to store duplicate values in a column
constrained by a unique index.
INVALID_CURSOR: an illegal cursor operation was attempted.
CURSOR_ALREADY_OPEN: an attempt was made to open a cursor that was
previously opened.
NOT_LOGGED_ON: a database call was issued without being connected to ORACLE.
LOGIN_DENIED: login to ORACLE failed because of an invalid username/password.
OTHERS: handles any exception not caught by a specific handler.
EXAMPLE-1:
This example writes PL/SQL code for validating accnt_id of the Accounts table
so that it must not be left blank; if it is blank, the cursor should not be allowed to
move to the next field.
DECLARE
    no_value EXCEPTION;
BEGIN
    IF :Accounts.accnt_id IS NULL THEN
        RAISE no_value;
    ELSE
        next_field;
    END IF;
EXCEPTION
    WHEN no_value THEN
        Message ('account id cannot be blank');
END;
EXAMPLE-2:
DECLARE
    balance Accounts.bal %TYPE;
    account_num Accounts.accnt_id %TYPE;
BEGIN
    SELECT accnt_id, bal INTO account_num, balance
    FROM Accounts WHERE accnt_id > 0000;
EXCEPTION
    WHEN no_data_found THEN
        Message ('empty table');
END;
In the above example a predefined internal PL/SQL exception
(NO_DATA_FOUND) is used in the PL/SQL block.
The syntax for declaring a function is:
FUNCTION name [(argument-list)] RETURN data-type {IS, AS}
    Variable-declarations
BEGIN
    Program-code
[EXCEPTION
    Error-handling-code]
END;
Here argument-list is the list of parameters passed to the function and data-type is
the data type of the value the function returns. Variable-declarations is where you
declare any variables that are local to the function. Program-code is where the work
of the function is done, and error-handling-code specifies what happens when errors
occur during execution of the function.
Notice that the function block is similar to the PL/SQL block that we discussed
earlier.
The keyword DECLARE has been replaced by the FUNCTION header, which names
the function, describes the parameters and indicates the return type.
Functions can be called by using name( argument list )
Example:
FUNCTION check (b_exp IN BOOLEAN,
                true_number IN NUMBER,
                false_number IN NUMBER)
RETURN NUMBER IS
BEGIN
    IF b_exp THEN
        RETURN true_number;
    ELSE
        RETURN false_number;
    END IF;
END;
The above function can be called as follows.
Check ( 2 > 1, 1 , 0)
Check (5 = 0, 1, 0)
PROCEDURES:
The declaration of procedures is almost identical to that of function
and the syntax
is given below.
PROCEDURE name [(argument list)] {IS,AS}
Variable declaration
BEGIN
Program code
[EXCEPTION
Error handling code ]
END;
Here name is the name that you want to give the procedure; everything else is
similar to a function declaration. A procedure declaration resembles a function
declaration except that there is no return data type and the key word PROCEDURE
is used instead of FUNCTION.
Ex: PROCEDURE swapn (A IN OUT NUMBER, B IN OUT NUMBER) IS
    temp_num NUMBER;
BEGIN
    temp_num := A;
    A := B;
    B := temp_num;
END;
The above procedure can be called as follows.
Swapn (3,4);
Swapn (-6,7);
DATABASE TRIGGERS :
PL/SQL can be used to write database triggers. Triggers are used to define code
that is executed/fired when certain actions or events occur. At the database level,
triggers can be defined for events such as inserting a record into a table, deleting a
record, and updating a record.
101
NUMBER;
maxsal
NUMBER;
102
BEGIN
SELECT min_sal, max_sal INTO minsal, maxsal
FROM salary-mast WHERE JOB = :new.job;
IF ( :new-sal < minsal or :new.sal > maxsal ) THEN
Message ( 'salary out of range' );
END IF;
END;
3.15 CURSOR IN PL/SQL:
PL/SQL cursors provide a way for your program to select multiple rows of data
from the database and then to process each row individually. Cursors are PL/SQL
constructs that enable you to process, one row at a time, the results of a multi row
query.
ORACLE uses work areas to execute SQL statements. PL/SQL allows the user to
name private work areas and access the stored information. The PL/SQL
construct that identifies each and every work area used by SQL is called a cursor.
There are 2 types of cursors.
Implicit cursors
Explicit cursors
Implicit cursors are declared by ORACLE for each UPDATE, DELETE and
INSERT SQL command. Explicit cursors are declared and used by the user to
process multiple rows returned by a SELECT statement.
The set of rows returned by a query is called the Active Set. Its size depends on
the number of rows that meet the search criteria of the SQL query. The data that is
stored in the cursor is called the Active Data Set.
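Conceptually, then, a cursor is an iterator over the active set. The following Python sketch (names and data invented; not Oracle's implementation) mimics OPEN building the active set and FETCH returning one row at a time, with a flag similar to the %FOUND/%NOTFOUND attributes discussed below:

```python
# Sketch: a cursor as an iterator over the active set of a query.
class Cursor:
    def __init__(self, rows, predicate):
        # OPEN: evaluate the query and materialize the active set
        self.active_set = [r for r in rows if predicate(r)]
        self.pos = 0
        self.found = None        # mimics %FOUND / %NOTFOUND
        self.rowcount = 0        # mimics %ROWCOUNT

    def fetch(self):
        if self.pos < len(self.active_set):
            row = self.active_set[self.pos]
            self.pos += 1
            self.rowcount += 1
            self.found = True
            return row
        self.found = False       # no more rows in the active set
        return None

emp = [{'name': 'Sharanu', 'dept': 'physics'},
       {'name': 'Bharath', 'dept': 'maths'}]
c = Cursor(emp, lambda r: r['dept'] == 'physics')
```

Each fetch advances through the active set; once it is exhausted, fetch signals failure, which is exactly the condition an EXIT WHEN %NOTFOUND loop tests.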
emp_code  emp_name        designation       salary
          A. N. Sharanu   Asst. Professor   22,000.00
1345      N. Bharath      Senior Lecturer   17,000.00
1400      M. Mala         Lab Incharge       9,000.00
Table 3.1
1) EXPLICIT CURSOR MANAGEMENT:
The following are the steps to using explicitly defined cursors within PL/SQL
Declare the cursor
Open the cursor
Fetch data from the cursor
Close the cursor
Declaring the cursor :
Declaring a cursor enables you to define the cursor and assign a name to it. It has
following syntax.
CURSOR cursor-name
IS SELECT statement
Ex: CURSOR c_name IS
    SELECT emp_name FROM Emp WHERE dept = 'physics';
Opening a cursor:
Opening a cursor executes the query and identifies the active set that contains
all the rows, which meet the query search criteria.
Syntax :
OPEN cursor_name
Ex:
OPEN c_name
The Open statement retrieves the records from the database and places them in the
cursor (private SQL area).
Fetching data from the cursor:
The FETCH statement retrieves the rows of the active set into the specified
variables, one row at a time.
Syntax: FETCH cursor_name INTO record-list;
Record-list is the list of variables that will receive the columns (fields) from the active set.
Ex: LOOP
    FETCH c_name INTO name;
END LOOP;
Closing a cursor:
The CLOSE statement deactivates the previously opened cursor and
makes the active set undefined. Once a cursor is closed, you cannot perform any
operations on it; however, the user can reopen the cursor by using the
OPEN statement.
Syntax: CLOSE cursor_name;
Ex: CLOSE c_name;
EXAMPLE-1 :
The HRD manager has decided to raise the salary of all the employees in
the physics department by 5% (0.05). Whenever any such raise is given to the employees, a
record of it is maintained in the emp_raise table (the table definitions are
given below). Write a PL/SQL block to update the salary of each employee and insert
a record in the emp_raise table.
Table: employee
emp_code varchar (10)
emp_name varchar (10)
dept varchar (15)
job varchar (15)
salary number (6,2)
Table: emp_raise
emp_code varchar (10)
raise_date date
raise_amt number (6,2)
Solution:
DECLARE
CURSOR c_emp IS
SELECT emp_code, salary FROM employee
WHERE dept = 'physics';
str_emp_code
employee.emp_code %TYPE;
num_salary
employee.salary %TYPE;
BEGIN
    OPEN c_emp;
    LOOP
        FETCH c_emp INTO str_emp_code, num_salary;
        EXIT WHEN c_emp %NOTFOUND;
        UPDATE employee SET salary = num_salary + (num_salary * 0.05)
        WHERE emp_code = str_emp_code;
        INSERT INTO emp_raise VALUES (str_emp_code, sysdate, num_salary * 0.05);
    END LOOP;
    COMMIT;
    CLOSE c_emp;
END;
%NOTFOUND:
Evaluates to TRUE if the last fetch failed, i.e. no more rows are left.
Syntax: cursor_name %NOTFOUND
%FOUND:
Evaluates to TRUE if the last fetch returned a row, otherwise FALSE.
Syntax: cursor_name %FOUND
%ISOPEN:
Evaluates to TRUE if the cursor is open, otherwise it evaluates to FALSE.
Syntax: cursor_name %ISOPEN
%ROWCOUNT:
Returns the number of rows fetched by the cursor so far.
Syntax: cursor_name %ROWCOUNT
EXAMPLE :
DECLARE
    v_emp_name varchar2 (32);
    v_salary_rate number (6,2);
    v_payroll_total number (9,2);
    v_pay_type char;
    not_opened EXCEPTION;
    CURSOR c_emp IS
        SELECT emp_name, pay_rate, pay_type FROM employee
        WHERE emp_dept = 'physics';
BEGIN
    IF c_emp %ISOPEN THEN
        RAISE not_opened;
    ELSE
        OPEN c_emp;
        LOOP
            FETCH c_emp INTO v_emp_name, v_salary_rate, v_pay_type;
            EXIT WHEN c_emp %NOTFOUND;
            IF v_pay_type = 'S' THEN
                v_payroll_total := (v_salary_rate * 1.25);
            ELSE
                v_payroll_total := (v_salary_rate * 40);
            END IF;
            INSERT INTO weekly_salary VALUES (v_payroll_total);
        END LOOP;
        CLOSE c_emp;
    END IF;
EXCEPTION
    WHEN not_opened THEN
        Message ('cursor is not opened');
END;
REFERENCES:
1. Teach Yourself PL/SQL in 21 Days - SAMS Publications.
2. ORACLE-7 - Ivan Bayross.
3. - Ivan Bayross.
4. - David McClanahan.
MODULE 4
4.1
Introduction
Measure Of Quality
We can discuss the goodness of relation schemas at two levels.
1. Logical Level
As you know, the logical level is the middle level in the three-level architecture
of a DBMS. The logical level describes how users interpret the relation schemas and
the meaning of their attributes. Having good relation schemas at this level enables
users to understand clearly the meaning of the data in the relations and hence to
formulate their queries correctly.
2. Implementation Level
This is the lowermost level in the DBMS architecture, which describes how the
tuples in the base relations are stored and updated. This level applies only to the
storage level of the database, whereas the former logical level applies to both the view
level and the logical level. The database is only as effective as its storage
scheme.
4.2
4.3 Constraints
constraint states that a tuple in one relation that refers to another relation must
refer to an existing tuple in that relation. To define the referential integrity
constraint, we first have to define the concept of a foreign key (FK).
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
the relation R2 if it satisfies the following two rules:
1. The attributes in FK have the same domain as the primary key (PK) attributes of
R2; the attributes of FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is null.
If t1[FK] = t2[PK], then we say that the tuple t1 refers to the tuple t2.
R1 is called the referencing relation and R2 the referenced relation.
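In SQL, these two rules are exactly what a FOREIGN KEY clause enforces. A minimal sketch using Python's sqlite3 (the department and employee tables and their columns are illustrative, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# R2: the referenced relation, with primary key dept_no
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dname TEXT)")
# R1: the referencing relation; dept_no is the foreign key FK
conn.execute("""CREATE TABLE employee (
    emp_code INTEGER PRIMARY KEY,
    dept_no  INTEGER REFERENCES department(dept_no))""")

conn.execute("INSERT INTO department VALUES (10, 'physics')")
conn.execute("INSERT INTO employee VALUES (1, 10)")    # rule 2: FK value exists in R2
conn.execute("INSERT INTO employee VALUES (2, NULL)")  # rule 2: null is also allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 99)")  # 99 is no department.dept_no
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the constraint blocks the dangling reference
```

The rejected insert is precisely a violation of rule 2: the FK value neither matches a PK value in the referenced relation nor is null.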
Key constraint
A relation is defined as a set of tuples, and by the definition of a set all the
tuples in a relation are distinct; i.e., no two tuples can have the same values for
all their attributes. There are some subsets of a relation schema R with the
property that no two tuples in any relation state r of R should have the same
values for these attributes; such a subset is called a super key.
A key K of a relation schema R is a super key of R with the additional property
that removing any attribute A from K leaves a set of attributes K - {A} that is no
longer a super key of R. Hence a key satisfies the following two constraints:
1. Two distinct tuples in any state of the relation cannot have identical values
for all the attributes in the key.
2. It is a minimal super key, i.e., we cannot remove any attribute from it and
still have the uniqueness constraint of the first condition hold.
Null Constraint
It specifies whether null values are permitted for an attribute in a database.
relation state r of R. The constraint is that for any two tuples t1 and t2 in r
that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]; i.e., the value of the
Y component of a tuple in r depends on the value of the X component, or the X
component determines the value of the Y component.
Note that:
1. If a constraint on R states that there cannot be more than one tuple with a
given X value in any relation state r(R) (that is, X is a candidate key of R),
then X -> Y for any subset of attributes Y of R.
2. X -> Y in R does not say whether or not Y -> X holds in R.
A functional dependency X -> Y is called trivial if Y is a subset of X.
Definition: A functional dependency, denoted by X -> Y, between two sets of
attributes X and Y that are subsets of the attributes of relation R, specifies that
the values in a tuple corresponding to the attributes in Y are uniquely determined
by the values corresponding to the attributes in X.
For example, the social security number uniquely determines a name:
SSN -> Name
Functional dependencies are determined by the semantics of the relation; in
general, they cannot be determined by inspection of an instance of the relation.
That is, a functional dependency is a constraint, not a property derived from a
relation.
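Although an FD cannot be inferred from data, a given relation instance can be checked for violations of a proposed FD. A small Python sketch (the relation instance and attribute names are made up for illustration):

```python
def holds(rows, X, Y):
    """Return True if X -> Y is not violated in this relation instance.

    rows: list of dicts (the tuples of the relation); X, Y: lists of attribute names.
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but differ on Y
        seen[x_val] = y_val
    return True

# Illustrative instance: SSN -> Name is not violated, Name -> SSN is.
r = [
    {"SSN": 1, "Name": "Ann"},
    {"SSN": 2, "Name": "Ann"},  # same name, different SSN
    {"SSN": 1, "Name": "Ann"},
]
print(holds(r, ["SSN"], ["Name"]))  # True
print(holds(r, ["Name"], ["SSN"]))  # False
```

Note the asymmetry the text emphasizes: a passing check only shows this instance does not violate the FD; it does not prove the dependency holds in the schema.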
Inference rules
Armstrong's axioms are sound and complete, i.e., they enable the computation of
every functional dependency implied by a given set of dependencies. The axioms
are:
1. Reflexivity - if the B's are a subset of the A's, then A -> B.
2. Augmentation - if A -> B, then A, C -> B, C.
3. Transitivity - if A -> B and B -> C, then A -> C.
Additional inference rules:
4. Decomposition - if A -> B, C then A -> B.
5. Union - if A -> B and A -> C, then A -> B, C.
6. Pseudotransitivity - if A -> B and C, B -> D, then C, A -> D.
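In practice these inference rules are applied through the attribute-closure algorithm: X+ is the set of all attributes functionally determined by X, and X -> Y is implied by a set of FDs F iff Y is a subset of X+. A Python sketch (the FDs used here are illustrative):

```python
def closure(X, fds):
    """Compute X+ under the FDs in fds (each FD is a (lhs, rhs) pair of sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply the FD if its left side is already determined
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [({"sno"}, {"sname"}), ({"cno"}, {"cname"})]
print(closure({"sno", "cno"}, F))  # sname and cname are determined as well
# X -> Y is implied by F iff Y <= closure(X, F):
print({"sname"} <= closure({"sno"}, F))  # True
```

The same routine also decides equivalence of two FD sets S and T: each FD of T must follow from S and vice versa.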
Equivalence of sets of functional dependencies
Two sets of functional dependencies S and T are equivalent iff S implies every
dependency in T and T implies every dependency in S (i.e., S+ = T+).
The dependency {A_1, ..., A_n} -> {B_1, ..., B_m}
4.8 Normalization
In relational database theory, normalization is the process of restructuring the
logical data model of a database to eliminate redundancy, organize data
efficiently, and reduce the potential for anomalies
during data operations. Data normalization also may improve data consistency
and simplify future extension of the logical data model. The formal
classifications used for describing a relational database's level of normalization
are called normal forms (NF).
A non-normalized database can suffer from data anomalies:
A non-normalized database may store data representing a particular referent in
multiple locations. An update to such data in some but not all of those locations
results in an update anomaly, yielding inconsistent data. A normalized database
prevents such an anomaly by storing such data (i.e. data other than primary
keys) in only one location.
A non-normalized database may have inappropriate dependencies, i.e.
relationships between data with no functional dependencies. Adding data to such
a database may require first adding the unrelated dependency. A normalized
database prevents such insertion anomalies by ensuring that database relations
mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion.
That is, deleting data from such databases may require deleting data from the
inappropriate dependency. A normalized database prevents such deletion
anomalies by ensuring that all records are uniquely identifiable and contain no
extraneous information.
4.9 Normal forms
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF).
The first normal form requires that tables be made up of a primary key and a
number of atomic fields, and the second and third deal with the relationship of
non-key fields to the primary key. These have been summarized as requiring that
all non-key fields be dependent on "the key, the whole key and nothing but the
key". In practice, most applications in 3NF are fully normalized. However,
research has identified potential update anomalies in 3NF databases. BCNF is a
further refinement of 3NF that attempts to eliminate such anomalies.
The fourth and fifth normal forms (4NF and 5NF) deal specifically with the
representation of many-many and one-many relationships. Sixth normal form
(6NF) only applies to temporal databases.
Table 4.1: a relation with attributes SSN, Name, Address, Age, and College_Degree.
Now we can analyze this relation by checking the possible values of each attribute.
Here SSN and Age have only one value for a person, but College_Degree can have
more than one value, and the Address and Name of a person can each be divided into
more than one attribute. Hence this relation is not in 1NF. Let us change this
relation schema into 1NF by dividing it into two relations.
Name is split into FName, MInit, and LName; Address into ApartmentNo and City.
Table 4.2: Person_Residence(SSN, FName, ...)
Table 4.3: College_Degree(SSN, UG, PG)
4.11 Second normal form (2NF)
First, the table must be in 1NF; in addition, every non-primary-key attribute
(field) must be fully functionally dependent upon the ENTIRE primary key for its
existence. This rule only applies when you have a multi-part (concatenated)
primary key (PK).
2NF requires that data stored in a table with a composite primary key must not
depend on only part of the table's primary key, and that the table meets all the
requirements of the first normal form.
Take each non-key field and ask this question: if I knew part of the PK, could I
tell what the non-key field would be?
Table 4.4: Inventory(Description, Supplier, Cost, Supplier_Address)
In this Inventory table, Description combined with Supplier is our PK, because we
can have the same product coming from different suppliers. There are two non-key
fields, so we can ask the questions:
If we know just Description, can we find out Cost? No, because we have more than
one supplier for the same product.
If we know just Supplier, can we find out Cost? No, because we need to know what
the item is as well.
Therefore, Cost is fully functionally dependent upon the ENTIRE PK
(Description, Supplier) for its existence.
If we know just Description, can we find out Supplier Address? No, because we
have more than one supplier for the same product.
If we know just Supplier, can we find out Supplier Address? Yes. The address does
not depend upon the Description of the item.
Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK
(Description, Supplier) for its existence.
We must remove Supplier Address from this table.
Table 4.5: Inventory(Description, Supplier, Cost)
Table 4.6: Supplier(Name, Supplier_Address)
At this point, since it is the "Supplier" table, we can rename the "Supplier"
field to "Name." Name is the PK for this new table.
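The effect of this decomposition can be sketched with Python's sqlite3 (the data values are invented): after the split, each supplier's address is stored exactly once, so the redundancy that causes update anomalies disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Before 2NF: Supplier_Address repeats for every product from the same supplier
conn.execute("""CREATE TABLE inventory_unnormalized (
    description TEXT, supplier TEXT, cost REAL, supplier_address TEXT,
    PRIMARY KEY (description, supplier))""")
rows = [("widget", "Acme", 1.50, "12 Main St"),
        ("gadget", "Acme", 2.75, "12 Main St"),   # address duplicated
        ("widget", "Bolt Co", 1.40, "9 Side Rd")]
conn.executemany("INSERT INTO inventory_unnormalized VALUES (?,?,?,?)", rows)

# After 2NF: Inventory(Description, Supplier, Cost) and Supplier(Name, Supplier_Address)
conn.execute("""CREATE TABLE inventory (
    description TEXT, supplier TEXT, cost REAL,
    PRIMARY KEY (description, supplier))""")
conn.execute("CREATE TABLE supplier (name TEXT PRIMARY KEY, supplier_address TEXT)")
conn.execute("""INSERT INTO inventory
                SELECT description, supplier, cost FROM inventory_unnormalized""")
conn.execute("""INSERT INTO supplier
                SELECT DISTINCT supplier, supplier_address FROM inventory_unnormalized""")

# Each supplier's address is now stored exactly once
print(conn.execute("SELECT COUNT(*) FROM supplier").fetchall())  # [(2,)]
```

Joining the two new tables on the supplier name reconstructs the original data, so the decomposition loses no information.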
General Definition:
A relation schema R is in second normal form (2NF) if every nonprime
attribute A in R is not partially dependent on any key of R.
Table 4.7: (Auth_Name, #Pages, Auth_Affil_No)
Again, just ask the questions:
If I know # of Pages, can I find out Author's Name? No. Can I find out Author's
affiliation No? No.
If I know Author's Name, can I find out # of Pages? No. Can I find out Author's
affiliation No? YES.
Table 4.8: (Auth_Name, #Pages)
Table 4.9: Author(Name, Auth_Affil_No)
General Definition:
A relation schema R is in 3NF if, whenever a nontrivial functional dependency
X -> A holds in R,
either (a) X is a superkey of R,
or (b) A is a prime attribute of R.
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from 3NF only when there is more than one candidate key and the
keys are composite and overlapping. Consider, for example, the relation
enrol (sno, sname, cno, cname, date-enrolled)
Let us assume that the relation has the following candidate keys:
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in
3NF but not in BCNF because there are dependencies
sno -> sname
cno -> cname
where attributes that are part of a candidate key are dependent on part of
another candidate key. Such dependencies indicate that although the relation is
about some entity or association that is identified by the candidate keys
e.g. (sno, cno), there are attributes that are not about the whole thing that the
keys identify. For example, the above relation is about an association
(enrolment) between students and subjects and therefore the relation needs to
include only one identifier to identify students and one identifier to identify
subjects. Providing two identifiers about students (sno, sname) and two keys
about subjects (cno, cname) means that some information about students and
subjects that is not needed is being provided. This provision of information
will result in repetition of information and the anomalies. If we wish to include
further information about students and courses in the database, it should not be
done by putting the information in the present relation but by creating new
relations that represent information about entities student and subject.
These difficulties may be overcome by decomposing the above relation in the
following three relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only
about subjects and the third only about enrolments. All the anomalies and
repetition of information have been removed.
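The BCNF test used above (every nontrivial FD must have a superkey on its left side) can be mechanized with the attribute-closure algorithm. A Python sketch applied to the enrol relation above:

```python
def closure(X, fds):
    """Attribute closure of X under fds (each FD is a (lhs, rhs) pair of sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose left side is not a superkey of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                       # nontrivial
            and closure(lhs, fds) != set(attrs)]    # lhs is not a superkey

attrs = {"sno", "sname", "cno", "cname", "date_enrolled"}
fds = [({"sno"}, {"sname"}), ({"cno"}, {"cname"}),
       ({"sno", "cno"}, {"date_enrolled"})]
for lhs, rhs in bcnf_violations(attrs, fds):
    print(sorted(lhs), "->", sorted(rhs))  # sno -> sname and cno -> cname violate BCNF
```

Only (sno, cno) -> date_enrolled passes, since (sno, cno) is a superkey; the two offending FDs are exactly the ones removed into the (sno, sname) and (cno, cname) relations.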
the other between employees and programming languages). Both the above
relationships are many-to-many i.e. one programmer could have several
qualifications and may know several programming languages. Also one
qualification may be obtained by several programmers and one programming
language may be known to many programmers.
Functional dependency A -> B relates one value of A to one value of B while
multivalued dependency A ->> B defines a relationship in which a set of
values of attribute B are determined by a single value of A.
Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if, whenever t1 and t2
are two tuples in R that have the same values for the attributes X (i.e.,
t1[X] = t2[X]), then R also contains tuples t3 and t4 (not necessarily distinct)
such that
t1[X] = t2[X] = t3[X] = t4[X]
t3[Y] = t1[Y] and t3[Z] = t2[Z]
t4[Y] = t2[Y] and t4[Z] = t1[Z]
In other words if t1 and t2 are given by
t1 = [X, Y1, Z1], and
t2 = [X, Y2, Z2]
then there must be tuples t3 and t4 such that
t3 = [X, Y1, Z2], and
t4 = [X, Y2, Z1]
We are therefore insisting that every value of Y appears with every value of Z
to keep the relation instances consistent. In other words, the above conditions
insist that X alone determines Y and Z and there is no relationship between Y
and Z since Y and Z appear in every possible pair and hence these pairings
present no information and are of no significance.
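The formal condition can be checked directly against a relation instance. A Python sketch (attribute positions rather than names are used, and the programmer data is invented):

```python
def mvd_holds(rows, X, Y, Z):
    """Check whether X ->> Y holds in this relation instance.

    rows: a set of tuples; X, Y, Z: lists of attribute positions partitioning the tuple.
    """
    def combine(tx, ty, tz):
        # build the tuple whose X part comes from tx, Y part from ty, Z part from tz
        out = [None] * (len(X) + len(Y) + len(Z))
        for i in X: out[i] = tx[i]
        for i in Y: out[i] = ty[i]
        for i in Z: out[i] = tz[i]
        return tuple(out)

    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in X):
                # the definition requires t3 = [X, Y1, Z2] to be present;
                # t4 = [X, Y2, Z1] is covered by the pair taken in the other order
                if combine(t1, t1, t2) not in rows:
                    return False
    return True

# programmer(name, qualification, language): positions 0, 1, 2
r = {("ann", "bsc", "c"), ("ann", "bsc", "java"),
     ("ann", "msc", "c"), ("ann", "msc", "java")}
print(mvd_holds(r, [0], [1], [2]))                             # True
print(mvd_holds(r - {("ann", "msc", "java")}, [0], [1], [2]))  # False
```

The True case shows exactly the "every value of Y with every value of Z" pairing the text describes; removing one pairing breaks the MVD.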
to be in the 5NF if and only if it is in 4NF and the candidate keys imply every
join dependency in it.
Repetition of information.
Loss of information.
Module 5
Database management systems developed using the above types of architectures are
termed parallel database management systems; rather than DDBMS they utilize
parallel processor technology. In another type of architecture called shared nothing
architecture, every processor has its own primary and secondary (disk) memory, no
common memory exists and the processors communicate over a high-speed
interconnection network. Although the shared nothing architecture resembles a
distributed database computing environment, major differences exist in the mode of
operation. In shared nothing architecture, there is symmetry and homogeneity of
nodes; this is not true of the distributed database environment where heterogeneity of
nodes is very common.

Advantages of Distributed Databases

1. Management of distributed data with different levels of transparency: Ideally,
a DBMS should be distribution transparent in the sense of hiding the details of where
each file is physically stored within the system. The following types of transparencies
are possible:
Distribution or network transparency: This refers to the freedom for the user
from the operational details of the network. It may be divided into location
transparency and naming transparency. Location transparency refers to the
fact that the command used to perform a task is independent of the location of
data and the location of the system where the command was issued. Naming
transparency implies that once a name is specified, the named objects can be
accessed unambiguously without additional specification.
software are distributed over several sites, one site may fail while other sites
continue to operate. Only the data and software that exist at the failed site
cannot be accessed. This improves both reliability and availability.
3. Improved performance: A distributed DBMS fragments the database by keeping
the data closer to where it is needed most. Data localization reduces the contention for
CPU and I/O services and simultaneously reduces access delays involved in wide area
networks. When a large database is distributed over multiple sites, smaller databases
exist at each site. As a result, local queries and transactions accessing data at a single
site have better performance because of the small local databases.
Moreover,
This is the process of breaking up the database into logical units called fragments,
which may be assigned for storage at the various sites. There are mainly two types of
fragmentation:
Horizontal fragmentation
Vertical fragmentation
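The two kinds of fragmentation correspond to relational selection and projection. A sketch with Python's sqlite3 (the table, column, and site names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_code INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)""")
conn.executemany("INSERT INTO employee VALUES (?,?,?,?)",
                 [(1, "ann", "physics", 900.0), (2, "ben", "maths", 850.0)])

# Horizontal fragment: a selection -- the subset of tuples stored at the physics site
conn.execute("""CREATE TABLE employee_physics AS
                SELECT * FROM employee WHERE dept = 'physics'""")

# Vertical fragment: a projection that keeps the key, so fragments can be rejoined
conn.execute("""CREATE TABLE employee_payroll AS
                SELECT emp_code, salary FROM employee""")

print(conn.execute("SELECT COUNT(*) FROM employee_physics").fetchone())  # (1,)
```

Keeping the primary key in every vertical fragment is what allows the original relation to be reconstructed by joining the fragments.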
Differences in query languages: Even with the same data model, the
languages and their versions vary. For example, SQL has multiple versions
like SQL-89, SQL-92 (SQL2), and SQL3, and each system has its own set of
data types, comparison operators, string manipulation features, and so on.
Semantic Heterogeneity
Semantic heterogeneity occurs when there are differences in the meaning,
interpretation, and intended use of the same or related data. Semantic heterogeneity
among component database systems (DBSs) creates the biggest hurdle in designing
global schemas of heterogeneous databases. The design autonomy of component
DBSs refers to their freedom of choosing the following design parameters, which in
turn affect the eventual complexity of the FDBS:
The universe of discourse from which the data is drawn: For example, two
customer accounts databases in the federation may be from the United States and
Japan, with entirely different sets of attributes about customer accounts
required by their accounting practices. Currency rate fluctuations would also
present a problem. Hence, relations in these two databases with identical names,
CUSTOMER or ACCOUNT, may have some common and some entirely distinct information.
Derivation of summaries: Aggregation, summarization, and other data-processing features and operations supported by the system.
strategy.
On the other hand, a DDBMS that supports full distribution, fragmentation, and
replication transparency allows the user to specify a query or update request on
the schema just as though the DBMS were centralized. For updates, the DDBMS
is responsible for maintaining consistency among replicated items by using one of
the distributed concurrency control algorithms. For queries, a query decomposition
module must break up or decompose a query into subqueries that can be
executed at the individual sites. In addition, a strategy for combining the results of
the subqueries to form the query result must be generated. Whenever the DDBMS
determines that an item referenced in the query is replicated, it must choose or
materialize a particular replica during query execution.
To determine which replicas include the data items referenced in a query, the
DDBMS refers to the fragmentation, replication, and distribution information
stored in the DDBMS catalog. For vertical fragmentation, the attribute list for
each fragment is kept in the catalog. For horizontal fragmentation, a condition,
sometimes called a guard, is kept for each fragment. This is basically a selection
condition that specifies which tuples exist in the fragment; it is called a guard
because only tuples that satisfy this condition are permitted to be stored in the
fragment. For mixed fragments, both the attribute list and the guard condition are
kept in the catalog.
Dealing with multiple copies of the data items: The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other
copies if the site on which the copy is stored fails and recovers later.
Failure of individual sites: The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up to date with the rest of the
sites before it rejoins the system.
Failure of communication links: The system must be able to deal with failure
of one or more of the communication links that connect the sites. An extreme
case of this problem is that network partitioning may occur. This breaks up the
sites into two or more partitions, where the sites within each partition can
communicate only with one another and not with sites in other partitions.
References
1. Fundamentals of Database Systems - Elmasri and Navathe (3rd Edition), Pearson Education Asia.
2. Database System Concepts - Silberschatz, Korth, and Sudarshan.
3. An Introduction to Database Systems - C.J. Date (7th Edition), Pearson Education Asia.
4. Database Principles, Programming and Performance - Patrick O'Neil, Elizabeth O'Neil.
5. An Introduction to Database Systems - Bipin C. Desai.
6. Teach Yourself PL/SQL in 21 Days - SAMS Publications.
7. SQL, PL/SQL - Ivan Bayross.
8. David McClanahan.