MODULE I
Data Isolation
Data are scattered across different files, and those files may be in various formats, so it is difficult to extract the appropriate data.
Integrity problems
Data constraints are enforced through appropriate code in the application programs. So if we need to add or change a constraint, we have to change the code, which makes constraints difficult to maintain. The problem is compounded when a constraint involves data from several different files.
Atomicity problems
Suppose a failure occurs during the execution of a program. The execution then stops in the middle of the program, leaving the data in an inconsistent state, whereas execution should always end in a consistent state. In a traditional file system, a failure usually results in an inconsistent state.
And the department keeps another record for students, to track their marks and progress. Even though both the office and the department are interested in data about students, each maintains separate files, because each requires some data that is not available from the other.
Now what are the features of database approach?
Database system is
1. Self describing:
i.e., the database system contains not only the database itself but also a complete definition or description of the database structure. This definition is stored in a catalog, with type, storage format and constraints as mentioned earlier. The information stored in the catalog is called meta-data.
2. Data security
The DBMS can prevent unauthorized users from viewing or updating the
database. Using passwords, users are allowed access to the entire database
or a subset of it known as a "subschema." For example, in a student database, some users may be able to view payment details while others may view only the mark lists of students.
3. Data Integrity
The DBMS can ensure that no more than one user can update the same
record at the same time. It can keep duplicate records out of the database;
for example, no two customers with the same customer number can be
entered.
4. Interactive Query
Most DBMSs provide query languages and report writers that let users
interactively interrogate the database and analyze its data. This important
feature gives users access to all management information as needed; i.e., we can easily get all the details of each student at any time.
5. Interactive Data Entry and Updating
Many DBMSs provide a way to interactively enter and edit data, allowing
you to manage your own files and databases. However, interactive
operation does not leave an audit trail and does not provide the controls
necessary in a large organization. These controls must be programmed into
the data entry and update programs of the application.
6. Data Independence
With DBMSs, the details of the data structure are not stated in each
application program. The program asks the DBMS for data by field name;
for example, a coded equivalent of "give me customer name and balance
due" would be sent to the DBMS. Without a DBMS, the programmer must
reserve space for the full structure of the record in the program. Any
change in data structure requires changing all application programs.
Data:
Data stored in a database include numerical data, character data, and other typed values.
Standard operations:
Every DBMS provides commands for input, edit, analysis, output, reformatting, etc. Some degree of standardization has been achieved with SQL (Structured Query Language). A data definition language (DDL) is used to describe the contents of the database: for example, attribute names (field names), data types, and location in the database.
Programming tools:
Besides interactive commands and queries, the database should be accessible from application programs through a programming interface.
File structures:
Every DBMS has its own internal structures used to organize the
data although some common data models are used by most DBMS.
Abstraction
Each application program works with some data relevant to a particular task, and often needs a portion of the data that is also used by other programs. In the early days of computerization, each application programmer designed the file structure, the metadata of the file, and the access method for each record. That is, each application program carried its own details about the structure of the data, how to access it, and how to interpret it. Because the application programs were implemented independently, any change in storage media required changes to these structures and access methods. And because the files were structured for one application, it was difficult to reuse the data in these files for new applications requiring data from several files belonging to different existing applications.
Eg: Consider two application programs that require the data on an entity
set EMPLOYEE. The first application program involves the public relation
department sending each employee a news letter and related material. This
application program is interested in the record type EMPLOYEE, containing the values for the attributes EMPL_Name and EMPL_Address.
Fig 1.1: The three levels of abstraction. Several external views (View 1, View 2, View 3, View 4) are mapped onto a logical level defined by the DBA, which in turn is mapped, with optimization, onto the physical level.
adequate controls are needed over users updating data and over data quality. With an increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.
3. Data Integrity
Since a large number of users could be using a database
concurrently, we should have to ensure that data remain correct during
operation. The main threat to data integrity comes from several different
users attempting to update the same data at the same time. The database
therefore needs to be protected against accidental changes by the users.
4. Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that the database becomes a critical resource. The survival of the enterprise
may depend on reliable information being available from its database. The
enterprise therefore becomes vulnerable to the destruction of the database or
to unauthorized modification of the database.
5. The Cost of using a DBMS
Conventional data processing systems are typically designed to
run a number of well-defined, preplanned processes. Such systems are often
"tuned" to run efficiently for the processes that they were designed for.
Although the conventional systems are usually fairly inflexible in that new
applications may be difficult to implement and/or expensive to run, they are
usually very efficient for the applications they are designed for.
The database approach on the other hand provides a
flexible alternative where new applications can be developed relatively
inexpensively. The flexible approach is not without its costs and one of these
costs is the additional cost of running applications that
the conventional system was designed for. Using standardized software is
almost always less machine efficient than specialized software.
In some cases attribute values are related so that one can be derived from the other. Consider a person as an entity. The attributes age and DateOfBirth of a person are related: the age of a person can be derived from the current date and his DateOfBirth. The age attribute is therefore called a derived attribute, and DateOfBirth is called a stored attribute, from which the age of the person is calculated.
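As a small sketch of this idea (the function name and use of Python's standard datetime types are illustrative, not from any particular schema), a derived attribute like age is computed on demand from the stored DateOfBirth rather than stored itself:

```python
from datetime import date

def age_from_dob(date_of_birth: date, today: date) -> int:
    """Derive the 'age' attribute from the stored 'DateOfBirth' attribute."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not occurred yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age_from_dob(date(1990, 6, 15), date(2024, 6, 14)))  # 33: birthday not reached
print(age_from_dob(date(1990, 6, 15), date(2024, 6, 15)))  # 34: birthday reached
```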
Entity set
An entity set is a set of entities of the same type that share the
same properties, or attributes. It is represented by a set of attributes. An attribute, as used in the E-R model, can be characterized by the following attribute types:
Simple and composite attributes
Single-valued and multivalued attributes
Null attributes
Derived attributes
primary key is used to denote the candidate key that is chosen by the
database designer to identify an entity from an entity set. A key (super,
candidate and primary) is a property of the entity set rather than the
individual entities.
Entity- Relationship (E-R) Diagram
The overall logical structure of a database can be expressed graphically by
an E-R diagram. The diagram consists of the following major components.
Rectangles: represent entity sets.
Ellipses: represent attributes.
Diamonds: represent relationship sets.
Lines: link attributes to entity sets and entity sets to relationship sets.
For eg: Consider an E-R diagram, which consists of two entity sets
customer and loan.
Fig 1.2: An E-R diagram in which the entity set Employee (attributes Emp_id, Designation, Salary, Addr) is linked through the relationship set WorksFor to the entity set Company (attributes Product, Location).
A data model is a plan for building a database. The model represents data conceptually, the way the user sees it, rather than how computers store it. Data models focus on required data elements and associations; most often they are expressed graphically using diagrams.
Fig 1.3: A hierarchical model relating Customer and Order records.
Advantages:
Hierarchical Model is simple to construct and operate on
Corresponds to a number of natural hierarchically organized domains e.g., assemblies in manufacturing, personnel organization in companies
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT,
GET NEXT WITHIN PARENT etc.
Disadvantages:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
Fig 1.4: A network model relating Customer, Product, and Order records.
Advantages:
Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
Can handle most situations for modeling using record types and
relationship types.
Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc. Programmers can do optimal
navigation through the database.
Disadvantages:
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread through a set of
records.
Little scope for automated "query optimization"
1.11 Object-Oriented Model
Object DBMSs add database functionality to object programming
languages. They bring much more than persistent storage of
programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming
languages to provide full-featured database programming capability,
while retaining native language compatibility. A major benefit of this
approach is the unification of the application and database development
into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code
bases are easier to maintain. Object developers can write complete
database applications with a modest amount of additional effort.

Fig 1.5: Objects, such as Order, in an object-oriented database.
Fig 1.6: The Customer, Order, and Product record types.
MODULE 2
2.1 Basic Structure of relational model - The relational model for database
management is a data model based on predicate logic and set theory. It was invented
by Edgar Codd. The fundamental assumption of the relational model is that all data
are represented as mathematical n-ary relations, an n-ary relation being a subset of
the Cartesian product of n sets.
1) Relation The fundamental organizational structure for data in the relational model
is the relation. A relation is a two-dimensional table made up of rows and columns.
Each relation, also called a table, stores data about entities.
2) Tuples - The rows in a relation are called tuples. They represent specific
occurrences (or records) of an entity. Each row consists of a sequence of values, one
for each column in the table. In addition, each row (or record) in a table must be
unique. A tuple variable is a variable that stands for a tuple.
3) Attributes - The columns in a relation are called attributes. The attributes represent
characteristics of an entity.
4) Domain For each attribute there is a set of permitted values called domain of that
attribute. For all relations r, the domain of all attributes of r should be atomic. A
domain is said to be atomic if elements of the domain are considered to be indivisible
units.
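These four terms map directly onto SQL objects. A minimal sketch using Python's built-in sqlite3 module (the student table and its columns are purely illustrative): the table is the relation, its typed columns are the attributes with their domains, and each inserted row is a tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation (table) whose attributes (columns) each have a domain (type).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")
# Each inserted row is a tuple: one atomic value per attribute.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Jones", 78), (2, "Smith", 85)])
for row in conn.execute("SELECT roll_no, name, marks FROM student ORDER BY roll_no"):
    print(row)
```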
2.3 Keys A key is the relational means of specifying uniqueness. The keys
applicable in relational model are primary key, candidate key and super key.
1.) Primary key - A primary key is an attribute (or set of attributes) whose value uniquely identifies a row in a table.
2.) Candidate key - A candidate key of a relation variable is a set of attributes of that
relation variable such that (1) at all times it holds in the relation assigned to that
variable that there are no two distinct tuples with the same values for these attributes
and (2) there is not a proper subset for which (1) holds.
3.) Super key - A superkey is defined in the relational model as a set of attributes of a
relation variable for which it holds that in all relations assigned to that variable there
are no two distinct tuples that have the same values for the attributes in this set.
4.) Foreign key - A foreign key is a field or group of fields in a database record that points to a key field or group of fields forming a key of another database record in some (usually different) table. A relation schema, r1, derived from an E-R schema
may include among its attributes the primary key of another relation schema, r2. This
attribute is the foreign key from r1, referencing r2. The relation r1 is called the referencing relation of the foreign key dependency, and r2 is called the referenced relation.
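A hedged sqlite3 sketch of a foreign key dependency (the branch/loan schema and column names here are invented for illustration, loosely following the banking examples later in this module): loan is the referencing relation r1 and branch is the referenced relation r2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, assets INTEGER)")
# loan (r1) references the primary key of branch (r2).
conn.execute("""CREATE TABLE loan (
    loan_no INTEGER PRIMARY KEY,
    branch_name TEXT REFERENCES branch(branch_name),
    amount INTEGER)""")
conn.execute("INSERT INTO branch VALUES ('Redwood', 21000000)")
conn.execute("INSERT INTO loan VALUES (13, 'Redwood', 1300)")  # OK: 'Redwood' exists
try:
    conn.execute("INSERT INTO loan VALUES (99, 'NoSuchBranch', 500)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the referenced branch does not exist
```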
2.4 Schema diagram A database schema, along with primary key and foreign
key dependencies, can be depicted pictorially by schema diagrams. Each relation in
the database schema is represented as a box, with the attributes listed inside it and the
relation name above it. If there are primary key attributes, a horizontal line crosses the
box, with the primary key attributes listed above the line. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing relation to the primary key attributes of the referenced relation.
Borrow relation (Table 2.1)

Branch name    Loan#    Customer name    Amount
Downtown       17       Jones            1000
Round Hill     23       Smith            2000
Redwood        13       Hayes            1300
Branch relation (Table 2.2)

Branch name    Branch city    Assets
Downtown       Brooklyn       9000000
Round Hill     Horseneck      21000000
Redwood        Palo Alto      17000000
To select tuples (rows) of the borrow relation where the branch is Redwood, we would write σ branch-name = "Redwood" (borrow). To project only the branch name and customer name of each tuple, we would write Π branch-name, customer-name (borrow).
3) Union operation - The union operation is a binary operation since it involves 2 relations. It is used to retrieve tuples appearing in either or both of the relations participating in the union. It is denoted ∪. For a union operation R ∪ S to be legal, we require that
o R and S must have the same number of attributes.
o The domains of the corresponding attributes must be the same.
4) Set difference - The set difference operation is a binary operation. Set difference is denoted by the minus sign (−). It finds tuples that are in one relation but not in another. Thus R − S results in a relation containing tuples that are in R but not in S.
5) Cartesian product - This is a binary operation involving 2 relations. It is used to obtain all possible combinations of tuples from two relations. The cartesian product of two relations is denoted by a cross (×), written R1 × R2 for relations R1 and R2. The result of R1 × R2 is a new relation with a tuple for each possible pairing of tuples from R1 and R2. To avoid ambiguity, the attribute names have attached to them the name of the relation from which they came. If no ambiguity will result, we drop the relation name. If R1 has n tuples and R2 has m tuples, then R = R1 × R2 will have m × n tuples.
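Because relations are sets of tuples, these three operations can be sketched directly with Python's set type (the one-attribute relations R and S below are toy data, purely for illustration):

```python
# Toy relations: sets of tuples over a single attribute.
R = {("a",), ("b",), ("c",)}
S = {("b",), ("c",), ("d",)}

union = R | S                               # R U S: tuples in either relation
difference = R - S                          # R - S: tuples in R but not in S
cartesian = {r + s for r in R for s in S}   # R x S: every pairing of tuples

print(len(union), len(difference), len(cartesian))  # 4 1 9
```

Note that |R × S| = 3 × 3 = 9, matching the m × n rule above.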
6) Rename The rename operation solves the problems that occur with naming
when performing the cartesian product of a relation with itself.
Suppose we want to find the names of all the customers who live on the same street
and in the same city as Smith.
Customer name    Customer street    Customer city
Jones            Main               Harrison
Smith            North              Rye
Hayes            Main               Harrison
To find other customers with the same information, we need to reference the customer relation again. The rename operation, denoted ρ x (E), returns the result of expression E under the name x. If we use it to rename one of the two customer relations we are using, the ambiguities will disappear.
Additional operations
1. Set Intersection - Set intersection is denoted by ∩. R ∩ S contains tuples that are in both of its argument relations. It does not add any power, since R ∩ S = R − (R − S).
Eg: Consider the depositor and borrower relations. If we want to find all customers who have both a loan and an account, we take the intersection of the two relations. It can be written as Π customer-name (borrower) ∩ Π customer-name (depositor).
2. Natural join - The natural join is written as R ⋈ S where R and S are relations. The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. Consider R and S to be sets of attributes. We denote attributes appearing in both relations by R ∩ S, and attributes in either or both relations by R ∪ S. For two relations r(R) and s(S), the natural join of r and s, denoted by r ⋈ s, is a relation on schema R ∪ S in which tuples must agree on the common attributes R ∩ S = {A1, A2, …, An}.
For an example consider the tables Employee and Dept and their natural join:
Employee (Table 2.4)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales

Dept (Table 2.5)
DeptName      Manager
Finance       George
Sales         Harriet
Production    Charles

Employee ⋈ Dept (Table 2.6)
Name       EmpId    DeptName    Manager
Harry      3415     Finance     George
Sally      2241     Sales       Harriet
George     3401     Finance     George
Harriet    2202     Sales       Harriet
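The natural join above can be sketched as a small Python function over lists of dicts. This mirrors the Employee/Dept example and reproduces Table 2.6; it is a teaching sketch, not how a DBMS implements joins internally:

```python
def natural_join(r, s):
    """Join rows of r and s that agree on all shared attribute names."""
    out = []
    for x in r:
        for y in s:
            common = set(x) & set(y)  # shared attribute names
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})
    return out

employee = [{"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
            {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
            {"Name": "George", "EmpId": 3401, "DeptName": "Finance"},
            {"Name": "Harriet", "EmpId": 2202, "DeptName": "Sales"}]
dept = [{"DeptName": "Finance", "Manager": "George"},
        {"DeptName": "Sales", "Manager": "Harriet"}]

for row in natural_join(employee, dept):
    print(row)
```

Production has no employees, so (Production, Charles) contributes nothing to the result, exactly as in Table 2.6.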
3. Equi-join - If we want to combine tuples from two relations where the combination condition is not simply the equality of shared attributes, then it is convenient to have a more general form of join operator, the θ-join (or theta-join), in which the combining condition may be an arbitrary comparison θ between attributes. When θ is equality, the θ-join is called an equi-join.
4. Outer joins - Whereas the natural join discards tuples of one relation that have no match in the other, the outer joins retain such tuples, padding them with "fill" (null) values for each of the attributes of the other operand. Three outer join operators are defined: left outer join, right outer join, and full outer join.
Left Outer join - The left outer join is written as R =X S where R and S are relations.
The result of the left outer join is the set of all combinations of tuples in R and S that
are equal on their common attribute names, in addition to tuples in R that have no
matching tuples in S. For an example consider the tables Employee and Dept and their
left outer join:
Employee
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.9)
DeptName      Manager
Sales         Harriet
Production    Charles

The left outer join can be simulated using the natural join and set union: take the natural join R ⋈ S, then add, by union, the tuples of R that did not participate in it, padded with null values for the attributes of S.

Employee =X Dept (Table 2.10)
Name       EmpId    DeptName     Manager
Harry      3415     Finance      null
Sally      2241     Sales        Harriet
George     3401     Finance      null
Harriet    2202     Sales        Harriet
Tim        1123     Executive    null
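The "pad with nulls" behaviour can be sketched in Python, with None standing in for the null fill value (the data is a trimmed version of the Employee/Dept example; attribute sets are kept small for readability):

```python
def left_outer_join(r, s):
    """Rows of r joined with matching rows of s; unmatched r-rows padded with None."""
    s_attrs = set().union(*(set(y) for y in s))  # all attribute names of s
    out = []
    for x in r:
        matches = [y for y in s
                   if all(x[a] == y[a] for a in set(x) & set(y))]
        if matches:
            out.extend({**x, **y} for y in matches)
        else:
            # No match in s: keep x, fill the s-only attributes with None (null).
            out.append({**x, **{a: None for a in s_attrs - set(x)}})
    return out

employee = [{"Name": "Harry", "DeptName": "Finance"},
            {"Name": "Sally", "DeptName": "Sales"},
            {"Name": "Tim", "DeptName": "Executive"}]
dept = [{"DeptName": "Sales", "Manager": "Harriet"}]
print(left_outer_join(employee, dept))
```

Every Employee row survives; only Sally picks up a Manager, matching the pattern of Table 2.10.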
Right outer join - The right outer join is written as R X= S where R and S are relations. The result of the right outer join is the set of all combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have no matching tuples in R. For the same Employee (Table 2.11) and Dept (Table 2.12) relations:

Employee (Table 2.11)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.12)
DeptName      Manager
Sales         Harriet
Production    Charles

Employee X= Dept (Table 2.13)
Name       EmpId    DeptName      Manager
Sally      2241     Sales         Harriet
Harriet    2202     Sales         Harriet
null       null     Production    Charles
Full outer join - The outer join or full outer join in effect combines the results of the
left and right outer joins. The full outer join is written as R =X= S where R and S are
relations. The result of the full outer join is the set of all combinations of tuples in R
and S that are equal on their common attribute names, in addition to tuples in S that
have no matching tuples in R and tuples in R that have no matching tuples in S in their
common attribute names.
For an example consider the tables Employee and Dept and their full outer join:
In the resulting relation, tuples in R which have no common values in common attribute names with tuples in S take a null value. Tuples in S which have no common values in common attribute names with tuples in R also take a null value.
Employee (Table 2.14)
Name       EmpId    DeptName
Harry      3415     Finance
Sally      2241     Sales
George     3401     Finance
Harriet    2202     Sales
Tim        1123     Executive

Dept (Table 2.15)
DeptName      Manager
Sales         Harriet
Production    Charles

Employee =X= Dept (Table 2.16)
Name       EmpId    DeptName      Manager
Harry      3415     Finance       null
Sally      2241     Sales         Harriet
George     3401     Finance       null
Harriet    2202     Sales         Harriet
Tim        1123     Executive     null
null       null     Production    Charles
5. Division - The division is a binary operation written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R (i.e., in the header of R but not in the header of S) for which it holds that all their combinations with tuples in S are present in R. For an example consider the tables Completed (Table 2.17) and DBProject (Table 2.18) and their division:

Completed (Table 2.17)
Student    Task
Fred       Database1
Fred       Database2
Fred       Compiler1
Eugene     Database1
Eugene     Compiler1
Sara       Database1
Sara       Database2

DBProject (Table 2.18)
Task
Database1
Database2

Completed ÷ DBProject
Student
Fred
Sara

A Student value appears in the result if, for every Task in DBProject, the (Student, Task) portion of a tuple is in Completed.
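The division above can be sketched in a few lines of Python over the same Completed and DBProject data:

```python
completed = {("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
             ("Eugene", "Database1"), ("Eugene", "Compiler1"),
             ("Sara", "Database1"), ("Sara", "Database2")}
db_project = {"Database1", "Database2"}

# Completed / DBProject: students for whom (student, task) is in Completed
# for EVERY task in DBProject.
students = {s for (s, _) in completed}
result = {s for s in students
          if all((s, t) in completed for t in db_project)}
print(sorted(result))  # ['Fred', 'Sara']
```

Eugene is excluded because (Eugene, Database2) is not in Completed.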
6. Assignment operation - Sometimes it is useful to be able to write a relational algebra expression in parts, using a temporary relation variable. The assignment operation, denoted ←, works like assignment in a programming language; for example, temp ← Π customer-name (borrower). No extra relation is added to the database, but the relation variable created can be used in subsequent expressions. Assignment to a permanent relation would constitute a modification to the database.
2.6 Tuple Relational Calculus - The tuple calculus is a calculus that was
introduced by Edgar F. Codd as part of the relational model in order to give a
declarative database query language for this data model. The tuple relational calculus
is a nonprocedural language. (The relational algebra was procedural.) We must
provide a formal description of the information desired.
A query in the tuple relational calculus is expressed as { t | P(t) }, i.e., the set of tuples t for which predicate P is true. We also use the notation
o t[A] to indicate the value of tuple t on attribute A.
o t ∈ r to show that tuple t is in relation r.
Example Queries
For example, to find the branch-name, loan number, customer name and amount for loans over $1200:

{ t | t ∈ borrow ∧ t[amount] > 1200 }

This gives us all attributes, but suppose we only want the customer names. (We would use project in the algebra.) We need to write an expression for a relation on scheme (cname):

{ t | ∃ s ∈ borrow (t[cname] = s[cname] ∧ s[amount] > 1200) }

In English, we may read this as the set of all tuples t such that there exists a tuple s in the relation borrow for which the values of t and s for the cname attribute are equal, and the value of s for the amount attribute is greater than 1200.
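The declarative flavour of the calculus can be mimicked with a Python set comprehension: like { t | P(t) }, the comprehension states which tuples qualify, not how to find them. The borrow data below is the toy relation of Table 2.1, with shortened attribute names (bname, cname, etc.) invented for the sketch:

```python
# borrow relation as a list of dicts (branch name, loan#, amount, customer name).
borrow = [{"bname": "Downtown", "loan": 17, "amount": 1000, "cname": "Jones"},
          {"bname": "Round Hill", "loan": 23, "amount": 2000, "cname": "Smith"},
          {"bname": "Redwood", "loan": 13, "amount": 1300, "cname": "Hayes"}]

# { t | exists s in borrow (t[cname] = s[cname] and s[amount] > 1200) }
result = {s["cname"] for s in borrow if s["amount"] > 1200}
print(sorted(result))  # ['Hayes', 'Smith']
```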
The notation ∃ t ∈ r (Q(t)) means "there exists a tuple t in relation r such that predicate Q(t) is true". Consider another example: Find all customers having a loan from the SFU branch, and the cities in which they live:

{ t | ∃ s ∈ borrow (s[bname] = "SFU" ∧ s[cname] = t[cname] ∧ ∃ u ∈ customer (u[cname] = s[cname] ∧ t[ccity] = u[ccity])) }

In English, we might read this as the set of all (cname, ccity) tuples for which cname is a borrower at the SFU branch, and ccity is the city of cname. Tuple variable s ensures that the customer is a borrower at the SFU branch. Tuple variable u is restricted to pertain to the same customer as s, and also ensures that ccity is the city of the customer.
The logical connectives ∧ (AND) and ∨ (OR) are allowed, as well as ¬ (negation). We also use the existential quantifier ∃ and the universal quantifier ∀.
Formal Definition
A tuple relational calculus expression is of the form { t | P(t) } where P is a formula. Several tuple variables may appear in a formula.
Tuple variable: a tuple variable is said to be a free variable unless it is quantified by a ∃ or a ∀. If it is quantified by a ∃ or a ∀, it is said to be a bound variable.
Formula: a formula is built of atoms. An atom is one of the following forms:
o s ∈ r, where s is a tuple variable and r is a relation.
o s[x] Θ u[y], where s and u are tuple variables, x and y are attributes, and Θ is a comparison operator (<, ≤, =, ≠, >, ≥).
o s[x] Θ c, where c is a constant.
An atom is a formula.
If P is a formula, then so are ¬P and (P).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
If P(s) is a formula containing a free tuple variable s, then ∃ s ∈ r (P(s)) and ∀ s ∈ r (P(s)) are also formulae.
Safety of Expressions
A tuple relational calculus expression may generate an infinite relation, e.g. { t | ¬(t ∈ borrow) }. There are an infinite number of tuples that are not in borrow. Most of these tuples contain values that do not even appear in the database. So we have to restrict the relational calculus.
Safe Tuple Expressions
The domain of a formula P, denoted dom(P), is the set of all values referenced in P. We may say an expression { t | P(t) } is safe if all values that appear in the result are values from dom(P). A safe expression yields a finite number of tuples as its result. Otherwise, it is called unsafe. The tuple relational calculus restricted to safe expressions is equivalent in expressive power to the relational algebra.
2.7 Domain Relational Calculus - A domain relational calculus expression is of the form { <x1, x2, …, xn> | P(x1, x2, …, xn) } where each xi is a domain variable and P is a formula.
A formula is built of atoms. An atom is one of the following forms:
o <x1, x2, …, xn> ∈ r, where r is a relation on n attributes and each xi is a domain variable or a constant.
o x Θ y, where x and y are domain variables and Θ is a comparison operator.
o x Θ c, where c is a constant.
An atom is a formula.
If P is a formula, then so are ¬P and (P).
If P1 and P2 are formulae, then so are P1 ∨ P2, P1 ∧ P2 and P1 ⇒ P2.
If P(x) is a formula containing a free domain variable x, then ∃ x (P(x)) and ∀ x (P(x)) are also formulae.
Example queries:
Find all customers who have a loan for an amount > than $1200.
Find all customers having a loan from the SFU branch, and the city in which they live.
Find all customers having a loan, an account or both at the SFU branch.
Find all customers who have an account at all branches located in Brooklyn.
Safety of Expressions
We say that an expression { <x1, x2, …, xn> | P(x1, x2, …, xn) } is safe if all of the following hold:
1. All values that appear in tuples of the expression are values from dom(P).
2. For every "there exists" subformula of the form ∃ x (P1(x)), the subformula is true if and only if there is a value x in dom(P1) such that P1(x) is true.
3. For every "for all" subformula of the form ∀ x (P1(x)), the subformula is true if and only if P1(x) is true for all values x from dom(P1).
An expression such as { <b, l, a> | ¬(<b, l, a> ∈ loan) } is unsafe because it allows values in the result that are not in the domain of the expression.
All three of the following are equivalent in expressive power:
o the relational algebra
o the tuple relational calculus restricted to safe expressions
o the domain relational calculus restricted to safe expressions
2.8 SQL - SQL has become the standard relational database language. It has several
parts:
o Data definition language (DDL) - provides commands to
Define relation schemes.
Delete relations.
Create indices.
Modify schemes.
o Interactive data manipulation language (DML) - a query language based on
both relational algebra and tuple relational calculus, plus commands to insert,
delete and modify tuples.
o Embedded data manipulation language - for use within programming
languages like C, PL/1, Cobol, Pascal, etc.
o View Definition - commands for defining views
o Authorization - specifying access rights to relations and views.
o Integrity - a limited form of integrity checking.
o Transaction control - specifying beginning and end of transactions.
Basic Structure
Basic structure of an SQL expression consists of select, from and where clauses.
A typical SQL query has the form:
select A1, A2, …, An
from r1, r2, …, rm
where P
Each Ai represents an attribute, and each ri a relation. P is a predicate. This query is equivalent to the algebra expression
Π A1, A2, …, An (σ P (r1 × r2 × … × rm))
If the where clause is omitted, the predicate P is true. The list of attributes can be
replaced with a * to select all. The result of an SQL query is a relation.
The select clause - corresponds to the projection operation of the relational algebra. It
is used to list the attributes desired in the result of a query. If we want to remove
duplicates in a selection procedure, we use the keyword distinct after select. The
keyword all is used to specify explicitly that duplicates are not removed. select *
means select all the attributes. Select clause can also contain arithmetic expressions
involving operators (+, -, *, / ) and operating on constants or attributes of tuples.
Eg: 1. select branch-name
from loan
1. select branch-name, loan-number, amount*100
from loan
The where clause - corresponds to the selection predicate in relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause. SQL uses the logical connectives and, or and not, rather than the mathematical symbols ∧, ∨ and ¬, in the where clause. The operands of the logical connectives can be expressions involving the comparison operators <, >, <=, >=, = and <>. SQL includes a between comparison operator to simplify where clauses that specify that a value be greater than or equal to some value and less than or equal to some other value.
Eg: 1. select loan-number
from loan
where amount between 90000 and 100000
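This query runs essentially unchanged in sqlite3 (identifiers use underscores instead of hyphens; the loan data is a toy set for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number INTEGER, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(17, "Downtown", 95000), (23, "Round Hill", 2000),
                  (13, "Redwood", 100000)])
# between is shorthand for: amount >= 90000 and amount <= 100000 (both ends inclusive).
rows = conn.execute("""SELECT loan_number FROM loan
                       WHERE amount BETWEEN 90000 AND 100000
                       ORDER BY loan_number""").fetchall()
print(rows)  # [(13,), (17,)]
```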
The from clause - corresponds to Cartesian product of the relational algebra. It lists
the relations to be scanned in the evaluation of the expression.
The rename operation - SQL provides a mechanism for renaming both relations and attributes. It uses the as clause, taking the form: old-name as new-name.
String operations - The most commonly used operation on strings is pattern
matching using the operator like. We describe patterns using two special characters:
Percent (%) The % character matches any substring.
Underscore ( _ ) The _ character matches any character.
Patterns are case-sensitive. The keyword escape is used to define the escape character.
We can use not like for string mismatching.
Ordering the display of tuples - SQL allows the user to control the order in which
tuples are displayed.
o The order by clause lists the attributes by which the result is sorted.
o desc specifies descending order and asc ascending order; ascending is the default.
o Ordering may be performed on multiple attributes.
Set operations - SQL has the set operations union, intersect and except. union
eliminates duplicates, being a set operation. If we want to retain duplicates, we may
use union all, similarly for intersect and except.
Not all implementations of SQL have these set operations. except in SQL-92 is called
minus in SQL-86.
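SQLite is one implementation that does support all three set operations; a minimal sketch of the duplicate-handling difference between union and union all (the depositor/borrower data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (cname TEXT);
CREATE TABLE borrower (cname TEXT);
INSERT INTO depositor VALUES ('Jones'), ('Smith');
INSERT INTO borrower VALUES ('Smith'), ('Hayes');
""")
# union eliminates the duplicate 'Smith'; union all keeps it.
u = conn.execute("SELECT cname FROM depositor UNION SELECT cname FROM borrower").fetchall()
ua = conn.execute("SELECT cname FROM depositor UNION ALL SELECT cname FROM borrower").fetchall()
i = conn.execute("SELECT cname FROM depositor INTERSECT SELECT cname FROM borrower").fetchall()
e = conn.execute("SELECT cname FROM depositor EXCEPT SELECT cname FROM borrower").fetchall()
print(len(u), len(ua), i, e)
```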
Aggregate functions - In SQL we can compute functions on groups of tuples using
the group by clause. Attributes given are used to form groups with the same values.
SQL can then compute
o avg (average value)
o min (minimum value)
o max (maximum value)
o sum (total of values)
o count (number of values)
These are called aggregate functions. They return a single value. The having clause is used to state conditions that apply to groups rather than to tuples. Predicates in the
having clause are applied after the formation of groups. If a where clause and a
having clause appear in the same query, the where clause predicate is applied first.
Tuples satisfying where clause are placed into groups by the group by clause. The
having clause is applied to each group. Groups satisfying the having clause are used
by the select clause to generate the result tuples. If no having clause is present, the
tuples satisfying the where clause are treated as a single group.
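The group by / having pipeline can be seen end to end in a short sqlite3 sketch (the account table and balances are toy data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("Downtown", 500), ("Downtown", 700),
                  ("Redwood", 350), ("Redwood", 900), ("Round Hill", 200)])
# group by forms one group per branch; having then filters GROUPS, not rows.
rows = conn.execute("""SELECT branch_name, AVG(balance)
                       FROM account
                       GROUP BY branch_name
                       HAVING AVG(balance) > 500
                       ORDER BY branch_name""").fetchall()
print(rows)
```

Round Hill's group (average 200) is dropped by the having clause; Downtown (600) and Redwood (625) survive.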
Null values - The keyword null is used to test for a null value (absence of information about the value of an attribute).
2.9 Views in SQL - A view in SQL is defined using the create view command:
create view v as <query expression>
where <query expression> is any legal query expression. The view created is given
the name v. To create a view all-customer of all branches and their customers:
create view all-customer as
(select bname, cname
from depositor, account
where depositor.account# = account.account#)
union
(select bname, cname
from borrower, loan
where borrower.loan# = loan.loan#)
Having defined a view, we can now use it to refer to the virtual relation it creates.
View names can appear anywhere a relation name can.
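The all-customer view runs almost verbatim in sqlite3; column names are simplified here (account_no, loan_no) to avoid the '#' character, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE depositor (cname TEXT, account_no INTEGER);
CREATE TABLE account (bname TEXT, account_no INTEGER);
CREATE TABLE borrower (cname TEXT, loan_no INTEGER);
CREATE TABLE loan (bname TEXT, loan_no INTEGER);
INSERT INTO depositor VALUES ('Jones', 101);
INSERT INTO account VALUES ('Downtown', 101);
INSERT INTO borrower VALUES ('Hayes', 13);
INSERT INTO loan VALUES ('Redwood', 13);
CREATE VIEW all_customer AS
    SELECT bname, cname FROM depositor JOIN account USING (account_no)
    UNION
    SELECT bname, cname FROM borrower JOIN loan USING (loan_no);
""")
# The view name can now be used anywhere a relation name can.
print(conn.execute("SELECT * FROM all_customer ORDER BY bname").fetchall())
```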
It is important that we evaluate the select statement fully before carrying out any
insertion. If some insertions were carried out even as the select statement were being
evaluated, the insertion might insert an infinite number of tuples. Evaluating the select
statement completely before performing insertions avoids such problems. It is
possible for inserted tuples to be given values on only some attributes of the schema.
The remaining attributes are assigned a null value denoted by null. We can prohibit
the insertion of null values using the SQL DDL.
Delete The delete command removes tuples from a relation. Deletion is expressed
in much the same way as a query. Instead of displaying, the selected tuples are
removed from the database. We can only delete whole tuples. A deletion in SQL is of
the form delete from r where P. Tuples in r for which P is true are deleted. If
the where clause is omitted, all tuples are deleted. We may only delete tuples from one relation at a time, but we may reference any number of relations in a select-from-where clause embedded in the where clause of a delete. However, if the delete request contains an embedded select that references the relation from which tuples are to be deleted, ambiguities may result.
Update - Updating allows us to change some values in a tuple without necessarily
changing all. where clause of update statement may contain any construct legal in a
where clause of a select statement (including nesting). A nested select within an
update may reference the relation that is being updated. As before, all tuples in the
relation are first tested to see whether they should be updated, and the updates are
carried out afterwards.
Update of a view - The view update exists also in SQL. An example will illustrate:
Consider a clerk who needs to see all information in the loan relation except amount.
Let the view branch-loan be given to the clerk:
create view branch-loan as select bname, loan# from loan
Since SQL allows a view name to appear anywhere a relation name may appear, the clerk can write: insert into branch-loan values ('SFU', 'L-307'). This insertion is represented by an insertion into the actual relation loan, from which the view is constructed. However, we have no value for amount. This insertion results in ('SFU', 'L-307', null) being inserted into the loan relation.
MODULE 3
We have
transaction systems, which are systems that operate on very large databases, on which
several (sometimes hundreds of) users operate concurrently, i.e. they
manipulate the database through transactions. Several such systems are presently in
operation in our country: consider the railway reservation system,
wherein thousands of stations, each with multiple computers, operate on a
huge database containing the reservation details of all trains for the next
several days. There are many other such systems, like airline
reservation systems, distance banking systems, stock market systems etc. In all these
cases, apart from the accuracy and integrity of the data provided by the database (note
that money is involved in almost all the cases, either directly or indirectly), the
systems should provide instant availability and fast response to these hundreds of
concurrent users. In this block, we discuss the concept of a transaction, the problems
involved in controlling concurrently operated systems and several other related
concepts. We repeat: a transaction is a logical unit of operation on a database, and the
users intend to operate with these logical units, trying either to get information from
the database or, in some cases, to modify it.
Before discussing the problems of concurrency, we view the concept of multiuser
systems from another point of view: the view of the database designer.
which is used by one person at a time. (Note, however, that the same system can be
used by different persons at different periods of time.) Extending this
concept to a database, a multiuser database is one which can be accessed and
modified by a number of users simultaneously, whereas a single-user database is
one which can be used by only one person at a time. Note that a multiuser
database essentially implies multiprogramming, but the converse is not true:
several users may be operating on the system simultaneously without all of
them operating on the database simultaneously.
Now, before we see what problems can arise because of concurrency, we see
what operations can be done on the database. Such operations can be single-line
commands or can be a set of commands meant to be executed sequentially. These
operations are invariably delimited by the begin transaction and end transaction
statements, and the implication is that all operations between them belong to
a single transaction.
Another concept is the granularity of the transaction. Assume each field in a
database is named. The smallest such named item of the database can be called a field
of a record. The unit on which we operate can be one such grain or a number of
such grains collectively defining some data unit. However, in this course, unless
specified otherwise, we use single-grain operations, without loss of
generality. To facilitate discussions, we presume a database package in which the
following operations are available.
i) Read_tr(X): This operation reads the item X and stores it into an assigned
variable. The name of the variable into which it is read can be anything,
but we give it the same name X, so that confusion is avoided; i.e.
whenever this command is executed, the system reads the required element
from the database and stores it into a program variable called X.
ii) Write_tr(X): This writes the value of the program variable currently
stored in X into a database item called X.
Once the Read_tr(X) is encountered, the system will have to perform the
following operations:
1. Find the address of the block on the disk where X is stored.
Suppose two persons A and B are simultaneously trying to book seats, and each
performs a Read_tr(X), so that the value of X is copied onto the variable X of person
A (let us call it XA) and of person B (XB). So each of them knows that there are 10
seats available.
Suppose A wants to book 8 seats. Since the number of seats he wants (say
Y) is less than the available seats, the program can allot him the seats, change the
number of available seats (X) to X - Y, and even give him the seat numbers that
have been booked for him.
The problem is that a similar operation can be performed by B also. Suppose
he needs 7 seats. So he gets his seven seats, replaces the value of X with 3 (10 - 7) and
gets his reservation.
The problem is noticed only when these blocks are returned to main database
(the disk in the above case).
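The whole scenario can be re-enacted in a few lines (a single-threaded sketch; XA and XB play the private copies held by the two programs):

```python
# The booking race, replayed sequentially: both A and B copy the seat count
# X into private variables before either writes its result back.
X = 10        # seats available in the database

XA = X        # A reads: 10 seats free
XB = X        # B reads: 10 seats free

XA = XA - 8   # A books 8 seats ...
X = XA        # ... and writes back: database says 2

XB = XB - 7   # B, still believing 10 are free, books 7 ...
X = XB        # ... and writes back: database says 3, A's update is lost

print(X)      # 3, although 15 seats were "booked" out of 10
```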
Before we can analyse these problems, we look at the problem from a more
technical view.
1 The lost update problem: This problem occurs when two transactions that access
the same database items have their operations interleaved in such a way as to make
the value of some database item incorrect.

    TA                  TB
    Read_tr(X)
                        Read_tr(X)
    X = X - NA
                        X = X - NB
    Write_tr(X)
                        Write_tr(X)

Fig 1 (time runs downward)
Note that the problem occurred because the transaction TB failed to take into account
the update of TA, i.e. TB read X before TA had recorded its change. Since TB did its
writing later, it overwrote the value written by TA, and the update of TA was lost.
2 The temporary update (dirty read) problem: This occurs when a transaction TA
updates an item but later fails for some operational reason, or the system notices
that the operation should not have been done and cancels it; in either case, the
original value is restored.
But in the meanwhile, another transaction TB has accessed the data, and since it
has no indication of what happened later, it makes use of this data and goes
ahead. Once the original value is restored by TA, the values generated by TB are
obviously invalid.
    TA                  TB
    Read_tr(X)
    X = X - N
    Write_tr(X)
                        Read_tr(X)
                        X = X - N
                        Write_tr(X)
    Failure
    X = X + N
    Write_tr(X)

Fig 3 (time runs downward)
The value generated by TA in a transaction that is not sustained is dirty
data; TB, by reading it, produces an illegal value. Hence the problem is called the
dirty read problem.
3 The incorrect summary problem:

    TA                  TB
                        Sum = 0
                        Read_tr(A)
                        Sum = Sum + A
    Read_tr(X)
    X = X - N
    Write_tr(X)
                        Read_tr(X)
                        Sum = Sum + X
                        Read_tr(Y)
                        Sum = Sum + Y
    Read_tr(Y)
    Y = Y + N
    Write_tr(Y)

Fig 4
In the above example, TA updates both X and Y. But since it first
updates X and then Y, and the operations are so interleaved that the transaction TB
uses both of them in between the operations, TB ends up using the old value of Y
with the new value of X. In the process, the sum obtained refers neither to
the old set of values nor to the new set of values.
4 The unrepeatable read problem: This can happen when an item is read by a transaction twice
(in quick succession) but the item has been changed in the meanwhile, though the
transaction has no reason to expect such a change. Consider the case of a reservation
system, where a passenger gets a reservation detail, and before he decides on the
reservation the value is updated at the request of some other passenger at
another place.
ii)
iii)
iv)
v) The other reasons can be physical problems like theft, fire, etc., or system
problems like disk failure, viruses, etc. In all such cases of failure, a recovery
mechanism has to be in place.
While the transactions operating on databases comprise only the read and write
operations, the system needs several additional operations for its purposes,
such as the recovery discussed in the previous section. If the system were to recover
from a crash or any other catastrophe, it should first be able to keep track of
transactions: when they start, when they terminate and when they abort. Hence the
following operations come into the picture.
i) begin_transaction: marks the beginning of the execution of the transaction.
ii) end_transaction: indicates that the read and write operations of the transaction
have ended.
iii) commit_transaction: signals that the transaction has completed successfully and
its effects can be safely recorded (committed) in the database.
iv) abort (rollback): signals that the transaction has ended unsuccessfully and any
changes it has applied to the database must be undone.
Most systems also keep track of the present status of all the transactions at the present
instant of time (note that in a real multiprogramming environment, more than one
transaction may be in various stages of execution). The system should not only be
able to keep a tag on the present status of the transactions, but should also know what
the next possibilities for the transaction to proceed are and, in case of a failure, how to
roll it back. The whole concept is captured by a state transition diagram. A simple state
transition diagram, in view of what we have seen so far, can appear as follows:
Fig 5: State transition diagram of a transaction. Begin transaction takes it to the
Active state, where read/write operations are performed. End transaction moves it to
the Partially committed state; a Commit then moves it to the Committed state, and a
Terminate to the Terminated state. A Failure in the Active or Partially committed
state leads to Abort, and from there to the Terminated state.
The arrow marks indicate how a state of a transaction can change to the next
state. A transaction is in the active state immediately after the beginning of execution;
there it performs the read and write operations. When these end, it enters the
partially committed state, where system protocols ensure that a system failure at this
juncture does not make erroneous recordings on the database. Once this is done, the
system commits itself to the results and the transaction enters the committed state.
Once in the committed state, a transaction automatically proceeds to the terminated
state.
The transaction may also fail due to a variety of reasons discussed in a
previous section. Once it fails, the system may have to take up error-control exercises
like rolling back the effects of the previous write operations of the transaction. Once
this is completed, the transaction enters the terminated state and passes out of the
system. A failed transaction may be restarted later, either by the intervention of the
user or automatically.
The concept of the system log:
To be able to recover from failures of transaction operations, the
system needs to maintain a track record of all transaction operations that
are taking place and that are likely to affect the status of the database. This
information is called a system log (similar to the concept of log books) and
becomes useful when the system is trying to recover from failures. The log
information is kept on the disk, so that it is not likely to be affected by normal
system crashes, power failures etc. (otherwise, when the system crashes, if the log
also crashes, the entire concept fails). The log is also periodically backed up onto
removable devices (like tape) and kept in archives.
The question is, what type of data or information needs to be logged into the
system log?
Let T refer to a unique transaction-id, generated automatically whenever a
new transaction is encountered; it is used to uniquely identify the
transaction. Then the following entries are made with respect to the transaction T.
i) [start_tr, T]: denotes that the transaction T has started execution.
ii) [write_tr, T, X, old, new]: denotes that the transaction T has changed the
old value of the data item X to a new value.
iii) [read_tr, T, X]: denotes that the transaction T has read the value of X
from the database.
iv) [commit, T]: denotes that the transaction T has completed successfully and
its effects can be permanently recorded in the database.
v) [abort, T]: denotes that the transaction T has been aborted.
These entries are not complete. In some cases certain modification to their purpose
and format are made to suit special needs.
(Note that though we have been talking that the logs are primarily useful for recovery
from errors, they are almost universally used for other purposes like reporting,
auditing etc).
The two commonly used operations are undo and redo. In undo, if
the transaction fails before the permanent data can be written back into the database, the
log details can be used to sequentially trace back the updates and return the items to
their old values. Similarly, if the transaction fails just before the commit operation is
complete, one need not report a transaction failure: one can use the old and new values
of all write operations in the log and ensure that the same are entered into the database
(redo).
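As a sketch of how these two operations consume the log, the snippet below replays write entries shaped like the records above; the record layout and the recover name are illustrative choices, not from the text.

```python
# Sketch: undo/redo recovery driven by log records of the form
# ("write_tr", T, X, old, new) and ("commit", T).
def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo, in forward order, every write of a committed transaction.
    for rec in log:
        if rec[0] == "write_tr" and rec[1] in committed:
            _, t, x, old, new = rec
            db[x] = new
    # Undo, in reverse order, every write of an uncommitted transaction.
    for rec in reversed(log):
        if rec[0] == "write_tr" and rec[1] not in committed:
            _, t, x, old, new = rec
            db[x] = old
    return db

log = [("start_tr", "T1"), ("write_tr", "T1", "X", 10, 5), ("commit", "T1"),
       ("start_tr", "T2"), ("write_tr", "T2", "Y", 20, 8)]  # T2 never commits
state = recover(log, {"X": 10, "Y": 8})
print(state)  # {'X': 5, 'Y': 20}: T1's write is redone, T2's is undone
```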
A transaction reaches its commit point when all its operations that access the
database have been successfully executed and the effects of
all such operations have been included in the log. Once a transaction T reaches a
commit point, the transaction is said to be committed, i.e. the changes that the
transaction sought to make in the database are assumed to have been recorded
into the database. The transaction indicates this state by writing a [commit, T] record
into its log. At this point, the log contains a complete sequence of changes brought
about by the transaction and has the capacity both to undo them (in case
of a crash) and to redo them (if a doubt arises as to whether the modifications have
actually been recorded in the database).
Before we close this discussion on logs, one small clarification. The records
of the log are on the disk (secondary memory). When a log record is to be written, a
secondary-device access is to be made, which slows down the system operations. So
normally a copy of the most recent log records is kept in main memory and the
updates are made there. At regular intervals, these are copied back to the disk. In
case of a system crash, only those records that have been written onto the disk will
survive. Hence, before a transaction commits, all its log records up to the commit
entry are forcefully written back to the disk, and only then is the commit executed.
This concept is called forceful writing of the log file.
Every transaction is expected to satisfy four properties: atomicity, consistency
preservation, isolation and durability. Often, by combining their first letters, they are
called the ACID properties.
i)
Consistency preservation:
The atomicity concept is taken care of while designing and implementing the transaction.
If, however, a transaction fails even before it can complete its assigned task, the
recovery software should be able to undo the partial effects inflicted by the
transaction onto the database.
The preservation of consistency is normally considered the duty of the
database programmer. A consistent state of a database is a state which satisfies
the constraints specified by the schema, together with any further constraints
included to make the rules more effective. The database programmer writes his
programs in such a way that a transaction enters the database only when it is in a
consistent state and leaves it in the same or another consistent state.
This, of course, implies that no other transaction interferes with the action of the
transaction in question.
This leads us to the next concept, isolation: every transaction goes about
doing its job without being bogged down by any other transaction which may also
be working on the same database. One simple mechanism to ensure this is to make
sure that no transaction makes its partial updates available to other transactions
until the commit state is reached. This also eliminates the temporary update problem.
However, this has been found to be inadequate to take care of several other problems.
Most database transactions today come with several levels of isolation. A transaction
is said to have level 0 isolation if it does not overwrite the dirty reads of
higher-level transactions (level 0 is the lowest level of isolation). A transaction is
said to have level 1 isolation if it does not lose any updates. At level 2, the
transaction neither loses updates nor has any dirty reads. At level 3, the highest level
of isolation, a transaction has no lost updates, has no dirty
reads, and in addition has repeatable reads.
For the recovery and concurrency control operations, we concentrate mainly on the
read_tr and write_tr operations, because these operations actually effect changes to the
database. The other two (equally) important operations are commit and abort, since
they decide when the changes effected actually become permanent on the database.
Since listing each of these operations in full becomes a lengthy process, we adopt a
notation for describing a schedule: the operations read_tr, write_tr, commit and abort
are indicated by r, w, c and a, and each of them comes with a subscript indicating the
transaction number.
For example, SA : r1(x); r2(y); w2(y); r1(y); w1(x); a1
indicates the following operations in that order:

    Read_tr(x)   - transaction 1
    Read_tr(y)   - transaction 2
    Write_tr(y)  - transaction 2
    Read_tr(y)   - transaction 1
    Write_tr(x)  - transaction 1
    Abort        - transaction 1
ii)
iii)
But r1(x); w2(y) and r1(x); r2(x) do not conflict: in the first case the read
and write are on different data items, and in the second case both are trying to read the
same data item, which they can do without any conflict.
A Complete Schedule: A schedule S of n transactions T1, T2, ... Tn is said to be a
complete schedule if the following conditions are satisfied:
i) The operations in S are exactly those of T1, T2, ... Tn, including a commit or
abort operation as the last operation of each transaction.
ii) For any pair of operations belonging to the same transaction Ti, their order of
appearance in S is the same as their order in Ti.
iii) Whenever there are conflicting operations, one of the two will occur before
the other in the schedule.
A partial order of the schedule is said to occur if the first two conditions of the
complete schedule are satisfied, but whenever there are non-conflicting operations in
the schedule, they can occur without it being indicated which should appear first.
This is possible because non-conflicting operations can be executed in any
order without affecting the actual outcome.
However, in a practical situation, it is very difficult to come across complete
schedules, because new transactions keep getting included into the schedule.
Hence, one often works with the committed projection C(S) of a schedule S. This set
includes only those operations in S that belong to committed transactions, i.e.
transactions Ti whose commit operation ci is in S.
Put in simpler terms, since uncommitted operations do not get reflected in the actual
outcome of the schedule, only those transactions that have completed their commit
operations contribute to the set, and this projection is good enough in most cases.
Schedules that allow recovery are called recoverable schedules, and those that do not
are called non-recoverable schedules. As a rule, non-recoverable schedules should not
be permitted.
Formally, a schedule S is recoverable if no transaction T which appears in S
commits until all transactions T' that have written an item which is read by T have
committed.
The concept is a simple one. Suppose the transaction T reads an item X from
the database, completes its operations (based on this and other values) and commits,
i.e. the output values of T become permanent values in the database.
But suppose this value of X was written by another transaction T' (before it was read
by T), and T' aborts after T has committed. What happens? The values committed by T
are no longer valid, because their basis (namely X) itself has been
changed. Obviously T also needs to be rolled back (if possible), leading to other
rollbacks and so on.
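A small sketch of this test, using an invented encoding of a schedule as (op, transaction, item) triples with None as the item for commit operations:

```python
# Sketch: a schedule is recoverable if no transaction commits before every
# transaction it has read a value from has committed.
def is_recoverable(schedule):
    last_writer = {}   # item -> transaction that last wrote it
    reads_from = {}    # transaction -> set of transactions it read from
    committed = set()
    for op, txn, item in schedule:
        if op == "w":
            last_writer[item] = txn
        elif op == "r" and item in last_writer and last_writer[item] != txn:
            reads_from.setdefault(txn, set()).add(last_writer[item])
        elif op == "c":
            if not reads_from.get(txn, set()) <= committed:
                return False   # txn commits before a transaction it read from
            committed.add(txn)
    return True

# T2 reads X written by T1 and commits before T1: not recoverable.
bad  = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T2", None), ("c", "T1", None)]
good = [("w", "T1", "X"), ("r", "T2", "X"), ("c", "T1", None), ("c", "T2", None)]
print(is_recoverable(bad), is_recoverable(good))  # False True
```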
The other aspect to note is that in a recoverable schedule, no committed
transaction needs to be rolled back. But it is possible that a cascading rollback
may have to be effected, in which an uncommitted transaction has to be rolled back.
3.6 Serializability
Given two transactions T1 and T2 to be scheduled, they can be scheduled in
a number of ways. The simplest way is to schedule them without bothering to
interleave them, i.e. schedule all operations of the transaction T1 followed by
all operations of T2, or alternatively schedule all operations of T2 followed by all
operations of T1.
    T1                  T2
    read_tr(X)
    X = X + N
    write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)
                        read_tr(X)
                        X = X + P
                        write_tr(X)

Fig 6: Non-interleaved (serial) schedule A (time runs downward)
    T1                  T2
                        read_tr(X)
                        X = X + P
                        write_tr(X)
    read_tr(X)
    X = X + N
    write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)

Fig 7: Non-interleaved (serial) schedule B
These can now be termed serial schedules, since the entire sequence of operations in
one transaction is completed before the next transaction is started.
In the interleaved mode, the operations of T1 are mixed with the operations of T2.
This can be done in a number of ways; two such sequences are given below:
    T1                  T2
    read_tr(X)
    X = X + N
                        read_tr(X)
                        X = X + P
    write_tr(X)
    read_tr(Y)
                        write_tr(X)
    Y = Y + N
    write_tr(Y)

Fig 8: Interleaved (non-serial) schedule C
    T1                  T2
    read_tr(X)
    X = X + N
    write_tr(X)
                        read_tr(X)
                        X = X + P
                        write_tr(X)
    read_tr(Y)
    Y = Y + N
    write_tr(Y)

Fig 9: Interleaved (non-serial) schedule D
Formally, a schedule S is serial if, for every transaction T in the schedule, all
operations of T are executed consecutively; otherwise it is called non-serial. In such a
non-interleaved schedule, if the transactions are independent, one can also presume
that the schedule will be correct, since each transaction commits or aborts before the
next transaction begins. As long as the transactions individually are error-free, such a
sequence of events is guaranteed to give correct results.
The problem with such a situation is the wastage of resources. If, in a serial
schedule, one of the transactions is waiting for an I/O, the other transactions
cannot use the system resources either, and hence the entire arrangement is wasteful of
resources. If some transaction T is very long, the other transactions have to keep
waiting till it is completed. Moreover, in systems wherein hundreds of users operate
concurrently, serial scheduling becomes unthinkable. Hence, in general, the serial
scheduling concept is unacceptable in practice.
However, once the operations are interleaved so that the above-cited problems
are overcome, then unless the interleaving sequence is well thought out, all the
problems that we encountered at the beginning of this block can reappear. Hence, a
methodology is to be adopted to find out which of the interleaved schedules give
correct results and which do not.
A schedule S of n transactions is serializable if it is equivalent to some
serial schedule of the same n transactions. Note that there are n! different serial
schedules possible out of n transactions, and if one goes about interleaving
them, the number of possible combinations becomes unmanageably high. To ease our
operations, we form two disjoint groups of non-serial schedules: those non-serial
schedules that are equivalent to one or more serial schedules, which we call
serializable schedules, and those that are not equivalent to any serial schedule and
hence are not serializable. Once a non-serial schedule is serializable, it becomes
equivalent to a serial schedule and, by our previous definition of a serial schedule,
becomes a correct schedule. But how can one prove the equivalence of a non-serial
schedule to a serial schedule?
The simplest and most obvious method to conclude that two such
schedules are equivalent is to compare their results. If they produce the same results,
they can be considered equivalent; i.e. if two schedules are result-equivalent,
they can be considered equivalent. But such an oversimplification is full of
problems. Two sequences may produce the same results for one or even a large
number of initial values, and still not be equivalent. Consider the following two
sequences:
    S1                  S2
    read_tr(X)          read_tr(X)
    X = X + X           X = X * X
    write_tr(X)         write_tr(X)

Fig 10
For a value X = 2, both produce the same result. Can we conclude that they are
equivalent? Though this may look like a simplistic example, with some imagination
one can always come up with more sophisticated examples wherein the pitfalls of
treating the schedules as equivalent are less obvious. But the point still holds: result
equivalence cannot mean schedule equivalence. A more refined method of establishing
equivalence is available, called conflict equivalence. Two schedules are
said to be conflict-equivalent if the order of any two conflicting operations is the same
in both schedules. (Note that conflicting operations essentially belong to two
different transactions, access the same data item, and at least one of them is
a write_tr(X) operation.) If two such conflicting operations appear in different orders
in the two schedules, it is obvious that they may produce two different databases in
the end, and hence the schedules are not equivalent.
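The pitfall of result equivalence is easy to check numerically for the two sequences of Fig 10:

```python
# The two schedules of Fig 10, each viewed as a function of the initial X.
def s1(x):          # read_tr(X); X = X + X; write_tr(X)
    return x + x

def s2(x):          # read_tr(X); X = X * X; write_tr(X)
    return x * x

print(s1(2), s2(2))   # 4 4: result-equivalent for X = 2 ...
print(s1(3), s2(3))   # 6 9: ... but clearly not equivalent schedules
```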
1 Testing for conflict serializability of a schedule:
We suggest an algorithm that tests a schedule for conflict serializability:
1. For each transaction Ti participating in the schedule S, create a node
labeled Ti in the precedence graph.
2. For each case where Tj executes a read_tr(X) after Ti executes a write_tr(X),
create an edge from Ti to Tj in the precedence graph.
3. For each case where Tj executes a write_tr(X) after Ti executes a read_tr(X),
create an edge from Ti to Tj in the graph.
4. For each case where Tj executes a write_tr(X) after Ti executes a
write_tr(X), create an edge from Ti to Tj in the graph.
5. The schedule S is conflict-serializable if and only if the precedence graph
has no cycles.
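The steps above can be sketched in Python; the encoding of a schedule as (op, transaction, item) triples is an illustrative convention, not from the text.

```python
# Sketch: build the precedence graph of a schedule and test it for cycles.
# A schedule is a list of (op, txn, item) triples with op "r" or "w".
def is_conflict_serializable(schedule):
    edges = set()
    for i, (op_i, t_i, x_i) in enumerate(schedule):
        for op_j, t_j, x_j in schedule[i + 1:]:
            # Conflicting pair: different transactions, same item, at least
            # one write -> an edge saying Ti must precede Tj.
            if t_i != t_j and x_i == x_j and "w" in (op_i, op_j):
                edges.add((t_i, t_j))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)

    def cyclic(node, stack, done):          # depth-first cycle search
        if node in stack:
            return True
        if node in done:
            return False
        stack.add(node)
        found = any(cyclic(m, stack, done) for m in adj.get(node, []))
        stack.discard(node)
        done.add(node)
        return found

    return not any(cyclic(v, set(), set()) for v in adj)

# Schedule D of Fig 9 (serializable) and schedule C of Fig 8 (not):
d = [("r", "T1", "X"), ("w", "T1", "X"), ("r", "T2", "X"), ("w", "T2", "X"),
     ("r", "T1", "Y"), ("w", "T1", "Y")]
c = [("r", "T1", "X"), ("r", "T2", "X"), ("w", "T1", "X"), ("r", "T1", "Y"),
     ("w", "T2", "X"), ("w", "T1", "Y")]
print(is_conflict_serializable(d), is_conflict_serializable(c))  # True False
```

Schedule D yields only edges T1 → T2 (no cycle, equivalent to the serial order T1, T2), while schedule C yields edges in both directions, i.e. a cycle.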
Fig 11: Precedence graphs for schedules A, B, C and D (nodes T1 and T2, with edges
labeled by the data item X that causes the conflict).
ii) For each read operation ri(X) of Ti in S, if the value of X read was written by a
write operation wj(X) of Tj, the same must hold for the corresponding read
operation of Ti in S'.
iii) The final write operation on each data item must be by the same transaction in
both schedules.
The idea behind view equivalence is that, as long as each read operation of a
transaction reads the result of the same write operation in both schedules, the write
operations of each transaction produce the same results; the read operations are then
said to see the same view in both schedules. It can easily be verified that when S and
S' operate independently on a database with the same initial state, they produce the
same end states. A schedule S is said to be view-serializable if it is view-equivalent to
a serial schedule.
It can also be verified that the definitions of conflict serializability and view
serializability are similar if a condition of constrained write holds on
all transactions of the schedule. This condition states that any write operation wi(X)
in Ti is preceded by an ri(X) in Ti, and that the value written by wi(X) depends
only on the value of X read by ri(X). This assumes that the computation of the new
value of X is a function f(X) of the old value of X read from the database. However,
the definition of view serializability is less restrictive than that of conflict
serializability under the unconstrained write assumption, where the value written by
the operation wi(X) in Ti can be independent of the old value of X in the database;
this is called a blind write.
But the main problem with view serializability is that it is computationally
extremely complex, and there is no efficient algorithm for testing it.
3. Uses of serializability:
If one proves the serializability of a schedule S, it is equivalent to saying that S
is correct; hence it guarantees that the schedule provides correct results. But being
serializable is not the same as being serial. A serial schedule is inefficient for
the reasons explained earlier, which leads to underutilization of the CPU and I/O
devices, and in some cases, like mass reservation systems, becomes untenable. On the
other hand, a serializable schedule combines the benefits of concurrent execution
(efficient system utilization, ability to cater to a larger number of concurrent users)
with the guarantee of correctness.
But all is not well yet. The scheduling process is done by operating system
routines after taking into account various factors like system load, time of transaction
submission, priority of the process with reference to other processes, and a large
number of other factors. Also, since a very large number of interleaving combinations
are possible, it is extremely difficult to determine beforehand the manner in which
the transactions will be interleaved. In other words, getting the various schedules itself
is difficult, let alone testing them for serializability.
Hence, instead of generating the schedules, checking them for serializability and then
using them, most DBMS protocols use a more practical method: they impose
restrictions on the transactions themselves.
concurrently. Such controls, when implemented properly, can overcome many of the
problems of concurrent operations listed earlier. However, the locks themselves may
create a few problems, which we shall see in some detail in subsequent sections.
The only restriction on the use of binary locks is that they should be
implemented as indivisible units (also called critical sections in operating systems
terminology). That means no interleaving operations should be allowed once a lock
or unlock operation is started, until that operation is completed. Otherwise, if a
transaction locks a unit and gets interleaved with many other transactions, the locked
unit may remain unavailable for a long time, with catastrophic results.
To make use of the binary lock scheme, every transaction should follow certain
protocols:
1. A transaction T must issue the operation lock_item(X) before issuing a
read_tr(X) or write_tr(X).
2. A transaction T must issue the operation unlock_item(X) after all read_tr(X)
and write_tr(X) operations on X are complete.
Shared/Exclusive locks
While the operation of the binary lock scheme appears satisfactory, it suffers
from a serious drawback: once a transaction holds a lock (even for a read
operation), no other transaction can access the data item. In large concurrent
systems, this can become a disadvantage. While it is obvious that more than one
transaction should not go on writing into X, and that while one transaction is writing
into it no other transaction should be reading it, no harm is done if several transactions
are allowed to read the item simultaneously. This would save the time of all these
transactions, without in any way affecting correctness.
This concept gave rise to the idea of shared/exclusive locks. When only read
operations are being performed, the data item can be shared by several transactions;
it is only when a transaction wants to write into the item that the lock must be
exclusive. Hence the shared/exclusive lock is also sometimes called a multiple-mode
lock. A read lock is a shared lock (which can be held by several transactions), whereas
a write lock is an exclusive lock. So we need three operations: readlock, writelock and
unlock. The algorithms can be as follows:
Read lock operation:

Readlock(X):
Start: if LOCK(X) = "unlocked"
       then { LOCK(X) = "read-locked";
              no_of_reads(X) = 1 }
       else if LOCK(X) = "read-locked"
       then no_of_reads(X) = no_of_reads(X) + 1
       else { wait (until LOCK(X) = "unlocked" and the lock manager
              wakes up the transaction);
              go to Start }
end.
The writelock operation:

Writelock(X):
Start: if LOCK(X) = "unlocked"
       then LOCK(X) = "write-locked"
       else { wait (until LOCK(X) = "unlocked" and the lock manager
              wakes up the transaction);
              go to Start }
end.

The unlock operation:

Unlock(X):
if LOCK(X) = "write-locked"
then { LOCK(X) = "unlocked";
       wakeup one of the waiting transactions, if any }
else if LOCK(X) = "read-locked"
then { no_of_reads(X) = no_of_reads(X) - 1;
       if no_of_reads(X) = 0
       then { LOCK(X) = "unlocked";
              wakeup one of the waiting transactions, if any } }
end.
The algorithms are fairly straightforward, except that during the unlocking
operation, if a number of read locks are held, then all of them are to be released
before the unit itself becomes unlocked.
To ensure smooth operation of the shared/exclusive locking system, the
system must enforce the following rules:
1. A transaction T must issue the operation readlock(X) or writelock(X)
before any read operation is performed on X.
2. A transaction T must issue the operation writelock(X) before any
write_tr(X) operation is performed on X.
3. A transaction T must issue the operation unlock(X) after all read_tr(X) and
write_tr(X) operations in T are completed.
4. A transaction T will not issue a readlock(X) operation if it already holds a
readlock or writelock on X.
5. A transaction T will not issue a writelock(X) operation if it already holds a
readlock or writelock on X.
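Assuming a threaded setting, the readlock/writelock/unlock trio above can be sketched with a condition variable standing in for the lock manager's wakeup step; the class and attribute names below are illustrative choices, not from the text.

```python
import threading

# A minimal sketch of a shared/exclusive lock for one data item.
class SharedExclusiveLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0        # plays the role of no_of_reads(X)
        self._writer = False     # True when the item is write-locked

    def readlock(self):
        with self._cond:
            while self._writer:              # wait while write-locked
                self._cond.wait()
            self._readers += 1               # share with other readers

    def writelock(self):
        with self._cond:
            while self._writer or self._readers:   # wait until unlocked
                self._cond.wait()
            self._writer = True              # exclusive hold

    def unlock(self):
        with self._cond:
            if self._writer:
                self._writer = False
            else:
                self._readers -= 1
            if not self._writer and self._readers == 0:
                self._cond.notify_all()      # wake up waiting transactions

lock = SharedExclusiveLock()
lock.readlock(); lock.readlock()   # two readers share the lock
lock.unlock(); lock.unlock()
lock.writelock()                   # now exclusively held
lock.unlock()
```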
Conversion Locks
In some cases, it is desirable to allow lock conversion by relaxing
conditions (4) and (5) of the shared/exclusive lock mechanism, i.e. if a transaction T
already holds one type of lock on an item X, it may be allowed to convert it to the
other type. For example, if it is holding a readlock on X, it may be allowed to upgrade
it to a writelock; all the transaction does is issue a writelock(X) operation. If T is
the only transaction holding the readlock, it may be immediately allowed to upgrade
itself to a writelock; otherwise it has to wait till the other readlocks (of other
transactions) are released. Similarly, if it is holding a writelock, T may be allowed to
downgrade it to a readlock(X). The algorithms of the previous sections can be amended
to accommodate conversion locks, and this is left as an exercise to the students.
Before we close the section, it should be noted that the use of binary locks does
not by itself guarantee serializability. This is because, in certain
combinations of situations, a lock-holding transaction may end up unlocking the unit
too early. This can happen for a variety of reasons, including a situation
wherein a transaction feels it no longer needs a particular data unit and hence
unlocks it, but later indirectly writes into it (through some other
unit). This results in ineffective locking, and serializability is
lost. To guarantee serializability, the protocol of two-phase locking is to be
implemented, which we will see in the next section.
Phase I
writelock(X)
----------------------------------
Phase II
unlock(Y)
readtr(X)
X = X + Y
writetr(X)
unlock(X)
fig 12: A transaction obeying the two-phase locking protocol: all lock operations precede the first unlock (Phase I, the growing phase); all unlocks occur in Phase II, the shrinking phase.
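The two-phase property illustrated in fig 12 can be checked mechanically. A small sketch (the function name and operation encoding are ours, not from the text):

```python
# Sketch: check that a transaction's sequence of lock operations is two-phase,
# i.e. no lock is acquired after the first unlock (Phase I grows, Phase II
# shrinks). readtr/writetr operations are ignored by the phase test.
def is_two_phase(ops):
    """ops: sequence of (action, item) pairs, where action is one of
    'readlock', 'writelock', 'unlock', 'readtr', 'writetr'."""
    unlocked = False
    for action, _item in ops:
        if action == 'unlock':
            unlocked = True
        elif action in ('readlock', 'writelock') and unlocked:
            return False          # a lock acquired in the shrinking phase
    return True
```

The transaction of fig 12 passes this test; a transaction that unlocks X and only then locks Y fails it.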
5. Commutativity of ⋈ (and ×): The join and cross-product operations are commutative:
R ⋈c S = S ⋈c R
R × S = S × R
6. Commuting σ with ⋈ (and ×): If all the attributes in the selection condition c
involve only the attributes of one of the relations being joined, say R, the two
operations can be commuted as follows:
σc(R ⋈ S) = (σc(R)) ⋈ S
Alternatively, if the selection condition c can be written as c1 AND c2, where condition
c1 involves only the attributes of R and condition c2 involves only the attributes of S,
the operations commute as follows:
σc(R ⋈ S) = (σc1(R)) ⋈ (σc2(S))
The same rules apply if the ⋈ is replaced by a × operation.
7. Commuting π with ⋈ (and ×): Suppose the projection list is L = {A1, A2,
..., An, B1, B2, ..., Bm}, where A1, ..., An are attributes of R and B1, ...,
Bm are attributes of S. If the join condition c involves only attributes in L, the two
operations can be commuted as follows:
πL(R ⋈c S) = (πA1, ..., An(R)) ⋈c (πB1, ..., Bm(S))
If the join condition c contains additional attributes not in L, these must be added to
the projection lists, and a final π operation is needed. I.e. if attributes An+1, ..., An+k
of R and Bm+1, ..., Bm+p of S are involved in the join condition c but are not in the
projection list L, the operations commute as follows:
πL(R ⋈c S) = πL((πA1, ..., An, An+1, ..., An+k(R)) ⋈c (πB1, ..., Bm, Bm+1, ..., Bm+p(S)))
The same rules apply when the ⋈ is replaced by a × operation.
definition of most restrictive SELECT can mean either the ones that produce a
relation with the fewest tuples or with the smallest absolute size. Another
possibility is to define the most restrictive SELECT as the one with the
smallest selectivity. Second, make sure that the ordering of leaf nodes does not
cause CARTESIAN PRODUCT operations. For example, if the two relations with
the most restrictive SELECT do not have a direct join condition between
them, it may be desirable to change the order of leaf nodes to avoid Cartesian
products.
4. Using rule 12, combine a CARTESIAN PRODUCT operation with a
subsequent SELECT operation in the tree into a JOIN operation, if the
condition represents a join condition.
5. Using rules 3, 4, 7 and 11 concerning the cascading of PROJECT and the
commuting of PROJECT with other operations, break down and move lists of
projection attributes down the tree as far as possible by creating new
PROJECT operations as needed. Only those attributes needed in the query
result and in subsequent operations in the query tree should be kept after each
PROJECT operation.
6. Identify subtrees that represent groups of operations that can be executed by a
single algorithm.
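As a sketch of why these heuristic steps help, the following Python fragment (toy relations with invented attribute names, not real algebra-tree machinery) shows that pushing a SELECT below a JOIN preserves the result while shrinking the join input:

```python
# Sketch: applying a SELECT before the join yields the same answer as
# applying it after the join, but joins a smaller input relation.
def join(r, s, cond):
    return [{**a, **b} for a in r for b in s if cond(a, b)]

emp  = [{'eno': 1, 'dno': 10}, {'eno': 2, 'dno': 20}]
dept = [{'dno': 10, 'dname': 'CS'}, {'dno': 20, 'dname': 'EE'}]
on_dno = lambda a, b: a['dno'] == b['dno']

# sigma_{dname='CS'}(emp JOIN dept): selection applied after the join
late = [t for t in join(emp, dept, on_dno) if t['dname'] == 'CS']
# emp JOIN sigma_{dname='CS'}(dept): selection pushed below the join
early = join(emp, [d for d in dept if d['dname'] == 'CS'], on_dno)
assert late == early     # same answer, smaller intermediate result
```

Here the pushed-down plan joins only one dept tuple instead of two; on real tables the saving in intermediate-result size is what the heuristic exploits.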
2. Cost-Based Optimization
A query optimizer should not depend solely on heuristic rules; it should also
estimate and compare the costs of executing a query using different execution
strategies, and should choose the strategy with the lowest
cost estimate. This approach is more suitable for compiled queries, where the
optimization is done at compile time and the resulting execution strategy code is
stored and executed directly at run-time.
Cost Components for Query Execution
The cost of executing a query includes the following components:
1. Access cost to secondary storage: This is the cost of searching for, reading and
writing data blocks that reside on secondary storage, mainly on disk. The cost of
searching for records in a file depends on the type of access structures on that file,
such as ordering, hashing and primary or secondary indices. In addition, factors
such as whether the file blocks are allocated contiguously on the same disk
cylinder or scattered on the disk affect the access cost.
2. Storage cost: This is the cost of storing any intermediate files that are generated by
an execution strategy for the query.
3. Computation cost: This is the cost of performing in memory operations on the
data buffers during query execution. Such operations include searching for and
sorting records, merging records for a join and performing computations on field
values.
4. Memory usage cost: This is the cost pertaining to the number of memory buffers
needed during query execution.
5. Communication cost: This is the cost of shipping the query and its result from the
database site to the site or terminal where the query originated.
These components are combined in a cost function that is used to estimate query execution
cost. To estimate the costs of various execution strategies, we must keep track of
information that is needed for the cost functions. This information may be stored in
the DBMS catalog, where it is accessed by the query optimizer. First, we must know
the size of each file. For a file whose records are all of the same type, the number of
records (tuples), the (average) record size and the number of blocks are needed. The
blocking factor of the file may also be needed.
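These catalog statistics feed directly into simple cost formulas. The sketch below uses the standard textbook formulas for a full file scan and a primary-index lookup; the function names and the sample numbers are invented for illustration:

```python
# Sketch: estimating block-access costs from catalog statistics --
# number of records r, record size R, and disk block size B.
import math

def file_stats(r, record_size, block_size):
    bfr = block_size // record_size          # blocking factor
    b = math.ceil(r / bfr)                   # number of file blocks
    return bfr, b

def cost_full_scan(b):
    return b                                 # read every block once

def cost_primary_index(index_levels):
    return index_levels + 1                  # one block per level + data block

bfr, b = file_stats(r=10000, record_size=100, block_size=4096)
# bfr = 40 records/block, b = 250 blocks: a two-level primary index costs
# about 3 block accesses versus 250 for the full scan.
```

This is exactly the comparison a cost-based optimizer makes, only over many more access structures and operations.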
3.10 Assertions
An assertion is a predicate expressing a condition that we wish the database always to
satisfy. Domain constraints and referential-integrity constraints are special forms of
assertions. There are many constraints that we cannot express using only these special
forms. Examples of such constraints include
1. The sum of all loan amounts for each branch must be less than the sum of all
account balances at the branch.
2. Every loan has at least one customer who maintains an account with a minimum
balance of $1000.00
An assertion in SQL-92 takes the form
Create assertion <assertion-name> check <predicate>
The two constraints mentioned can be written as shown next. Since SQL does not
provide a "for all X, P(X)" construct (where P is a predicate), we are forced to
implement the construct using the equivalent "not exists X such that not P(X)"
construct, which can be written in SQL.
The first constraint can be written as:
create assertion sum-constraint check
(not exists (select * from branch
    where (select sum(amount) from loan
           where loan.branch-name = branch.branch-name)
        >= (select sum(balance) from account
            where account.branch-name = branch.branch-name)))
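The same "for all" to "not exists" rewriting can be mimicked outside SQL. A Python sketch of constraint 1 over toy in-memory tables (the table contents are invented):

```python
# Sketch: checking assertion 1 ("for every branch, the sum of loan amounts is
# less than the sum of account balances") phrased the way SQL forces us to --
# as NOT EXISTS a branch violating the condition.
loan    = [{'branch': 'Downtown', 'amount': 900},
           {'branch': 'Downtown', 'amount': 300}]
account = [{'branch': 'Downtown', 'balance': 2000}]

def assertion_holds(branches):
    def violates(b):
        loans    = sum(l['amount']  for l in loan    if l['branch'] == b)
        balances = sum(a['balance'] for a in account if a['branch'] == b)
        return not (loans < balances)
    # "for all b: P(b)" expressed as "not exists b: not P(b)"
    return not any(violates(b) for b in branches)
```

With the toy data the assertion holds (1200 < 2000); adding a further 1000 loan would violate it, which is precisely when a DBMS would reject the update.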
3.11 Triggers
A trigger is a statement that is executed automatically by the system as a side
effect of a modification to the database. To design a trigger mechanism, we must meet
two requirements:
1. Specify the conditions under which the trigger is to be executed.
2. Specify the actions to be taken when the trigger executes
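These two requirements can be sketched as a (condition, action) pair consulted on every modification. The following toy Python model illustrates the idea (it is not Oracle's trigger mechanism; all names are invented):

```python
# Sketch: a trigger is a (condition, action) pair attached to a table and
# consulted on every insert.
class Table:
    def __init__(self):
        self.rows = []
        self.triggers = []          # list of (condition, action) pairs

    def add_trigger(self, condition, action):
        self.triggers.append((condition, action))

    def insert(self, row):
        self.rows.append(row)
        for condition, action in self.triggers:
            if condition(row):      # requirement 1: when to execute
                action(row)         # requirement 2: what to do

log = []
accounts = Table()
accounts.add_trigger(lambda r: r['balance'] < 0,
                     lambda r: log.append(f"overdraft on {r['id']}"))
accounts.insert({'id': 'A-102', 'balance': -50})   # fires the trigger
```

Inserting a row with a negative balance fires the trigger as a side effect of the modification; ordinary inserts do not.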
2. Two or more log files called redo log files; these record all changes made to
data and are used in the process of recovering, if certain changes do not get
written to permanent storage.
3. One or more control files; these contain control information such as database
name, file names and locations and a database creation timestamp.
4. Trace files and an alert log; each background process has a trace file associated
with it, and the alert log records major database events.
The structure of an Oracle database consists of the definition of database in terms of
schema objects and one or more tablespaces. The schema objects contain definitions
of tables, views, sequences, stored procedures, indexes, clusters and database links.
Oracle instance: The set of processes that constitute an instance of the servers
operation is called an Oracle instance, which consists of a System Global Area and a
set of background processes.
System Global Area (SGA) : This area of memory is used for database
information shared by users. Oracle assigns an SGA area when an instance starts.
The SGA in turn is divided into several types of memory structures:
1. Database buffer cache: This keeps the most recently accessed data blocks from
the database. This helps in reducing the disk I/O activity.
2. Redo log buffer, which is the buffer for the redo log file and is used for
recovery purposes.
3. Shared pool, which contains shared memory constructs.
Program Global Area (PGA): This is a memory buffer that contains data and
control information for a server process.
Oracle Processes: Oracle creates server processes to handle requests from connected
user processes. The background processes are created for each instance of Oracle;
they perform I/O asynchronously and provide parallelism for better performance.
Oracle Startup and Shutdown: An Oracle database is not available to users until the
Oracle server has been started up and the database has been opened. Starting a
database and making it available system wide requires the following steps:
1. Starting an instance of the database: The SGA is allocated and background
processes are created in this step.
2. Mounting a database: This associates a previously started Oracle instance with a
database. Until then it is available only to administrators. The database
administrator chooses whether to run the database in exclusive or parallel mode.
When an Oracle instance mounts a database in exclusive mode, only that
instance can access the database. On the other hand, if the instance is started in
parallel (shared) mode, other instances that are started in parallel mode can also
mount the database.
3. Opening a database: Opening a database makes it available for normal database
operations by having Oracle open the on-line data files and log files.
The reverse of the above operations will shut down an Oracle instance as follows:
1. Close the database.
2. Dismount the database.
3. Shut down the Oracle instance.
The data dictionary stores information about the database, such as:
Names of users
Security information
Integrity constraints
Data Blocks: Data Block represents the smallest unit of I/O. A data block has the
following components:
Header: Contains general block information such as block address and type of
segment.
Table directory: Contains information about tables that have data in the data
block.
Free space: Space allocated for row updates and new rows.
Rollback segments: Each database must contain one or more rollback segments,
which are used for undoing transactions.
3.14 Programming in PL/SQL:
BLOCK PL/SQL STRUCTURE:
PL/SQL is a block-structured language. A PL/SQL block defines a unit of
processing, which can include its own local variables, SQL statements, cursors, and
exception handlers. The blocks can be nested. The simplest block structure is given
below.
DECLARE
    Variable declarations
BEGIN
    Program statements
EXCEPTION
    WHEN exception THEN
        Program statements
END;
In the above PL/SQL block, the block parts are logical. A block starts with the
DECLARE section, in which memory variables and other Oracle objects can
be declared. The next section contains SQL executable statements for
manipulating table data by using the variables and constants declared in the
DECLARE section. EXCEPTION is the last section of the PL/SQL block; it
contains SQL and/or PL/SQL code to handle errors that may crop up during the
execution of the above code block. The EXCEPTION section is optional.
Each block can contain other blocks, i.e. blocks can be nested. Blocks of
code cannot, however, be nested in the DECLARE section.
PL/SQL CHARACTER SET
PL/SQL uses the standard ASCII set. The basic character set includes the
following.
Uppercase alphabets
A to Z.
Lowercase alphabets
a to z.
Numbers
0 to 9
Symbols
( ) + - * / < > = ! ; : , . @ # $ ^ & _ \ { } ? [ ] %
Words used in PL/SQL blocks are called lexical units. We can freely insert
blank spaces between lexical units in a PL/SQL block. The spaces have no effect
on the PL/SQL block.
The ordinary symbols used in PL/SQL blocks are
( ) / < > = ;
and the compound symbols are
** || << >> != ~= ^= <= >= :=
VARIABLES
Variables may be used to store the result of a query or calculations. Variables
must be declared before being used. Variables in a PL/SQL block are named variables.
A variable name must begin with a letter and can be followed by a maximum of
29 other characters (so a variable name can be at most 30 characters long).
Reserved words cannot be used as variable names unless enclosed within the
double quotes. Variables must be separated from each other by at least one space or by
a punctuation mark.
The case (upper/lower) is insignificant when declaring variable names. Space
cannot be used in a variable name.
LITERALS
A literal is a numeric value or a character string used to represent itself. So, literals
can be classified into two types.
Numeric literals
String literals
Numeric literals:
Integers
25
43
437
-57
etc
Floats
6.34
25E-03
0.1
+17.1
etc
String literals:
'Hello world'
'EMPLOYEE NAME'
'*******'
'A'
'*'
We can represent single quote character itself in a non-numeric literal by writing it
twice.
Ex: 'Today''s date'
PL/SQL will also have literals, which are called as logical ( boolean) literals.
These are predetermined constants. The value it can take are TRUE, FALSE, and
NULL.
COMMENTS
A comment line begins with a double hyphen (--). In this case the entire
line will be treated as a comment.
Ex: -- this is a single-line comment
The comment line begins with a slash followed by an asterisk (/*) till the
occurrence of an asterisk followed by a slash (*/). In this case comment
lines can be extended to more than one lines.
Ex-1: /* this is a comment */
Ex-2: /* this comment extends
over more than one line */
CHAR: This data type stores fixed-length character data.
Syntax: variable_name CHAR (size)
Ex: CHAR (10) stores MASTERFILE
VARCHAR2: This data type stores variable-length character data.
Syntax: variable_name VARCHAR2 (size)
Ex: VARCHAR2 (20) stores TRANSACTIONFILE
NUMBER: This data type stores fixed-point or floating-point numbers.
Syntax: variable_name NUMBER (precision, scale)
Ex: NUMBER (10) stores 3289473348
NUMBER (6,2) stores 4234.60
DATE: This data type stores date values.
Syntax: variable_name DATE
BOOLEAN: This data type stores only TRUE, FALSE or NULL values.
Syntax: variable_name BOOLEAN
Ex: flag BOOLEAN;
%TYPE declares a variable or constant to have the same data type as that of a
previously defined variable or of a column in a table or in a view.
NOT NULL causes creation of a variable or a constant that cannot have a NULL
value. Attempting to assign the value NULL to a variable or a constant that has
been declared NOT NULL causes an error.
NOTE: As soon as a variable or constant has been declared as NOT NULL, it must be
assigned a value. Hence every NOT NULL declaration of a variable or constant needs
to be followed by PL/SQL expression that loads a value into the variable or constant
declared.
DECLARING VARIABLES
We can declare a variable of any data type either native to the ORACLE or native to
PL/SQL. Variables are declared in the DECLARE section of the PL/SQL block.
Declaration involves the name of the variable followed by its data type. All statements
must end with a semicolon (;), which is the delimiter in PL/SQL. To assign a value to
a variable, the assignment operator (:=) is used.
Ex: pay NUMBER (6,2);
in_stack BOOLEAN;
name VARCHAR2 (30);
room CHAR (2);
date_of_purchase DATE;
emp_name := 'SMITH';
DECLARING A CONSTANT:
Declaring a constant is similar to declaring a variable except that you have
to add
the key word CONSTANT and immediately assign a value to it. Thereafter, no further
assignment to the constants is possible.
Ex: pf_percent CONSTANT NUMBER (3,2) := 0.12;  -- illustrative value
Ex: current_sal employee.sal %TYPE;
In the above example, current_sal is a variable of the PL/SQL block. It gets the data
type and constraints of the column (field) sal belonging to the table Employee.
Declaring a variable with the %TYPE attribute has two advantages:
You do not need to know the data type of the table column
If you change the parameters of the table column, the variables parameters will
change as well.
PL/SQL allows you to use the %TYPE attribute in a nesting variable declaration.
The following example illustrates several variables defined on earlier %TYPE
declarations in a nesting fashion.
Ex:
Dept_sales
INTEGER;
Area_sales
dept_sales %TYPE;
Group_sales
area_sales %TYPE;
Regional_sales
area_sales %TYPE;
Corporate_sales
regional_sales %TYPE;
In case variables for the entire row of a table need to be declared, then instead
of declaring them individually, %ROWTYPE is used.
Ex: emp_row_var employee %ROWTYPE;
In such a declaration, the two identifiers are unique and any change in one does
not affect the other.
PL/SQL OPERATORS
Operators are the glue that holds expressions together. PL/SQL operators can be
divided into
the following categories.
Arithmetic operators
Comparison operators
Logical operators
String operators
PL/SQL operators are either unary (i.e. they act on one value/variable) or binary
(they act on two values/variables)
1) ARITHMETIC OPERATORS:
Arithmetic operators are used for mathematical computations. They are
+ Addition
- Subtraction
* Multiplication
/ Division
** Exponentiation
2) COMPARISON OPERATORS:
Comparison operators return a BOOLEAN result, either TRUE or FALSE.
They are
= Equality operator, e.g. 5 = 3
!= Inequality operator, e.g. a != b
<> Inequality operator, e.g. 5 <> 3
~= Inequality operator, e.g. john ~= johny
< Less than, e.g. a < b
> Greater than, e.g. a > b
<= Less than or equal to, e.g. a <= b
>= Greater than or equal to, e.g. a >= b
In addition to this PL/SQL also provides some other comparison operators like LIKE,
IN,
BETWEEN, IS NULL etc.
LIKE:
Pattern-matching operator.
Ex: name LIKE 'S%' evaluates to TRUE for names beginning with S.
IN:
Checks to see if a value lies within a specified list of values. The syntax is
Syntax: the_value [NOT] IN (value_1, value_2, ...)
Ex: 3 IN (1, 2, 3) returns TRUE, whereas 4 IN (1, 2, 3) returns FALSE.
3) LOGICAL OPERATORS.
PL/SQL implements 3 logical operations AND, OR and NOT. The NOT
operator is unary operator and is typically used to negate the result of a comparison
expression, where as the AND and OR operators are typically used to link together
multiple comparisons.
A AND B is true only if A returns TRUE and B returns TRUE else it is
FALSE.
A OR B is TRUE if either A or B is TRUE, and it is FALSE if both A and B
are FALSE.
NOT A returns TRUE if A is FALSE, and returns FALSE if A is TRUE.
4) STRING OPERATOR:
The concatenation operator (||) joins two strings:
String_1 || string_2
String_1 and string_2 both are strings and can be a string constants, string variables or
string expressions. The concatenation operator returns a resultant string consisting of
all the characters in string_1 followed by all the characters in string_2.
Ex: 'Chandra' || 'shekhar'
Returns 'Chandrashekhar'
A := 'Engineering';
B := 'College';
C VARCHAR2 (50);
C := A || ' ' || B;
Now C holds 'Engineering College'.
NOTE-1: PL/SQL string comparisons are always case sensitive, i.e. aaa not
equal to
AAA.
NOTE-2: ORACLE has some built in functions that are designed to convert
from one
data type to another data type.
To_date: converts a character string into a date.
Ex: To_date ('1/1/02', 'mm/dd/rr');
Returns the date 01-jan-2002.
To_number: converts a character string containing digits into a number.
Ex: To_number ('123.99');
Returns 123.99.
LOOP STATEMENTS:
PL/SQL provides the following iterative constructs: the WHILE loop, the FOR
loop and the EXIT statement for terminating a loop.
WHILE LOOP :
The WHILE loop enables you to evaluate a condition before a
sequence of statements are executed. This is different from the FOR loop
where you must execute the loop at least once. The syntax for the WHILE loop
is as follows:
Syntax:
WHILE condition LOOP
    Statements to execute
END LOOP;
DECLARE
    count NUMBER(2) := 0;
BEGIN
    WHILE count <= 10
    LOOP
        count := count + 1;
        Message('while loop executes');
    END LOOP;
END;
EXIT and EXIT WHEN:
The EXIT WHEN statement enables you to specify the condition required to exit the
execution of the loop. In this case no IF statement is required.
Ex-1: IF count >= 10 THEN
    EXIT;
END IF;
This is equivalent to: EXIT WHEN count >= 10;
GOTO:
The GOTO statement lets you branch unconditionally to a label elsewhere in the
PL/SQL block. The syntax is as follows:
Syntax:
GOTO <label name>;
The label is surrounded by double angle brackets (<< >>) and the label must not have a
semicolon after the label name. The label name does not contain a semicolon
because it is not a PL/SQL statement but rather an identifier of a block of PL/SQL
code. You must have at least one statement after the label, otherwise an error will
result. The GOTO destination must be in the same block, at the same level as or
higher than the GOTO statement itself.
Ex:
GOTO end_of_block;
...
<<end_of_block>>
NULL;
The entry point of the destination block is defined within << >> as
shown above, i.e. labels are written within the << >> symbols.
FOR LOOP:
The FOR loop allows you to execute a block of code repeatedly for a fixed range
of values. The syntax is as follows:
Syntax: FOR loop_index IN [REVERSE] low_value .. high_value LOOP
    Statements to execute
END LOOP;
The loop_index is defined by Oracle as a local variable of type integer.
REVERSE allows you to execute the loop in reverse order. low_value ..
high_value is the range over which the loop executes; these can be constants or
variables. The LOOP line must not be terminated with a semicolon. The
statements listed are executed repeatedly until the loop range is exhausted.
Ex: FOR v_count IN 1 .. 5 LOOP
Message ('for loop executes');
END LOOP;
In the above example the message 'for loop executes' is displayed five
times.
We can terminate the FOR loop permanently using EXIT statement
based on some BOOLEAN condition. Nesting of FOR loops is also
allowed in PL/SQL. The outer loop is executed once, then the inner loop is
executed as many times as its range indicates, and then control is returned
to the outer loop until its range expires.
Ex: FOR out_count IN 1..2 LOOP
FOR in_count IN 1..2 LOOP
Message ('nested for loop');
END LOOP;
END LOOP;
In the above example the message 'nested for loop' is displayed four
times.
Let us discuss some examples to understand how to write a
PL/SQL block. Here we assume that a table called "EMP" is created
and the data is already inserted into it.
Table name : EMP
Create table EMP
( emp_no NUMBER (3),
  name VARCHAR2 (15),
  salary NUMBER (6,2),
  dept VARCHAR2 (15),
  div VARCHAR2 (2) );
EXAMPLE-1:
DECLARE
    num NUMBER (3);
    sal emp.salary %TYPE;
    emp_name emp.name %TYPE;
    count NUMBER (2) := 1;
    starting_emp CONSTANT NUMBER (3) := 134;
BEGIN
    SELECT name, salary INTO emp_name, sal
    FROM EMP WHERE emp_no = starting_emp;
    WHILE sal < 4000
    LOOP
        count := count + 1;
        SELECT emp_no, name, salary INTO
        num, emp_name, sal FROM EMP
        WHERE emp_no > 2150;
    END LOOP;
    Commit;
END;
In the above example there are five statements in the declaration part.
num is of integer type; sal and emp_name take the data types of
the salary and name columns of the EMP table respectively. count is a variable
of type integer and takes the initial value 1. starting_emp is a constant of
integer type with the immediately assigned value 134.
Between BEGIN and END key words, there are some SQL executable
statements used for manipulating the table data. The SELECT statement
extracts the data stored in the name and salary columns of the EMP table corresponding
to the employee having employee number 134, and stores those values in the
variables emp_name and sal respectively.
If sal is less than 4000, the statements within the loop will be
executed. Within the loop there are two statements: the first one
increments the count value by 1 and the second statement is a SELECT
statement. The commit statement commits the changes made to that table. The
END statement terminates the PL/SQL block.
EXAMPLE-2:
This example assumes the existence of table accounts created by using
the following SQL statements.
Create table Accounts
( accnt_id NUMBER (3),
  name VARCHAR2 (25),
  bal NUMBER (6,2) );
PL/SQL block:
DECLARE
    acct_balance NUMBER (6,2);
    acct CONSTANT NUMBER (3) := 312;
    debit_amt CONSTANT NUMBER (6,2) := 100.00;  -- illustrative amount
BEGIN
    SELECT bal INTO acct_balance FROM Accounts
    WHERE accnt_id = acct;
    IF acct_balance >= debit_amt THEN
        UPDATE Accounts
        SET bal = bal - debit_amt WHERE accnt_id = acct;
    ELSE
        Message ('insufficient amount in account');
    END IF;
END;
The above example illustrates the use of IF .. THEN .. ELSE.. END IF
condition control statements.
Declaration part declares one variable and two constants. The
SELECT statement extracts the amount in the bal column of Accounts table
corresponding to account number 312, and stores that in a variable
acct_balance.
If statement checks acct_balance for sufficient amount before
debiting. It updates the table Accounts if it has sufficient amount in the
balance, else it displays a message intimating insufficient fund in the account
of specified accnt_id.
EXAMPLE-3:
This example assumes the existence of the tables Inventory and Purchase_record,
created by using the following SQL statements.
Create table Inventory
( prod_id NUMBER (6),
  product VARCHAR2 (15),
  quantity NUMBER (5) );
Create table Purchase_record
( mesg VARCHAR2 (50),
  d_ate DATE );
PL/SQL block :
DECLARE
num_in_stack
NUMBER(5);
BEGIN
SELECT quantity INTO num_in_stack
FROM Inventory WHERE product = 'gasket';
IF num_in_stack > 0 THEN
    UPDATE Inventory SET quantity = quantity - 1
    WHERE product = 'gasket';
    INSERT INTO Purchase_record
    VALUES ('One gasket purchased', sysdate);
ELSE
    INSERT INTO Purchase_record
    VALUES ('no gasket available', sysdate);
    Message ('there are no more gaskets in stock');
END IF;
Commit;
END;
The above block of PL/SQL code does the following;
It determines how many gaskets are left in stock.
If the number left in stock is greater than zero, it updates the inventory
to reflect the sale of a gasket.
The RAISE statement acts like the CALL statement of high level languages. It has the
general format
RAISE <name of exception>;
When a RAISE statement is executed, it stops the normal processing of the PL/SQL
block of code and control passes to an error handler block of code at the end
of the PL/SQL program block (the EXCEPTION section).
An exception declaration declares a name for a user defined error condition that
the PL/SQL code block recognizes. It can only appear in the DECLARE section of the
PL/SQL code, which precedes the key word BEGIN.
EXAMPLE:
DECLARE
    zero_commission EXCEPTION;
BEGIN
    IF commission = 0 THEN
        RAISE zero_commission;
    END IF;
EXCEPTION
    WHEN zero_commission THEN
        Process the error
END;
Exception handler (error handler block ) is written between the key words
EXCEPTION and END. The exception handling part of a PL/SQL code is
optional. This block of code specifies what action has to be taken when the named
exception condition occurs.
The naming convention for exception name are exactly the same as those for
variables or constants. All the rules for accessing an exception from PL/SQL
blocks are same as those for variables and constants. However, it should be noted
that exceptions cannot be passed as arguments to functions or procedures like
variables or constants.
NO_DATA_FOUND: a SELECT INTO statement returned no rows.
TOO_MANY_ROWS: a SELECT INTO statement returned more than one row.
VALUE_ERROR: an arithmetic, conversion, truncation or size-constraint error occurred.
INVALID_NUMBER: the conversion of a character string to a number failed.
ZERO_DIVIDE: an attempt was made to divide a number by zero.
PROGRAM_ERROR: PL/SQL encountered an internal problem.
STORAGE_ERROR: PL/SQL ran out of memory, or memory is corrupted.
DUP_VAL_ON_INDEX: an attempt was made to store duplicate values in a column
constrained by a unique index.
INVALID_CURSOR: an illegal cursor operation was attempted.
CURSOR_ALREADY_OPEN: an attempt was made to open a cursor that was
previously opened.
NOT_LOGGED_ON: a database call was issued without being connected to ORACLE.
LOGIN_DENIED: login to ORACLE failed because of an invalid username/password.
OTHERS: handles any exception not caught by a specific handler.
EXAMPLE-1:
This example writes PL/SQL code for validating accnt_id of the Accounts table
so that it must not be left blank; if it is blank, the cursor should not be allowed to
move to the next field.
DECLARE
    no_value EXCEPTION;
BEGIN
    IF :Accounts.accnt_id IS NULL THEN
        RAISE no_value;
    ELSE
        next_field;
    END IF;
EXCEPTION
    WHEN no_value THEN
        Message ('account id cannot be blank');
END;
EXAMPLE-2:
DECLARE
    balance Accounts.bal %TYPE;
    account_num Accounts.accnt_id %TYPE;
BEGIN
    SELECT accnt_id, bal INTO account_num, balance
    FROM Accounts WHERE accnt_id > 0000;
EXCEPTION
    WHEN no_data_found THEN
        Message ('empty table');
END;
In the above example a predefined internal PL/SQL exception
(NO_DATA_FOUND) is used in the PL/SQL block.
The syntax for declaring a function is:
FUNCTION name [(argument-list)] RETURN data-type {IS, AS}
    Variable-declarations
BEGIN
    Program-code
[EXCEPTION
    Error-handling-code]
END;
Here argument-list is the list of parameters passed to the function and data-type is
the data type of the value the function returns. Variable-declarations is where you
declare any variables that are local to the function. Program-code is where the work
of the function is done, and error-handling-code specifies what happens when errors
occur during execution of the function.
Notice that the function block is similar to the PL/SQL block that we discussed
earlier.
The keyword DECLARE has been replaced by the FUNCTION header, which names
the function, describes the parameters and indicates the return type.
Functions can be called by using name( argument list )
Example:
FUNCTION check (b_exp IN BOOLEAN,
                true_number IN NUMBER,
                false_number IN NUMBER)
RETURN NUMBER IS
BEGIN
    IF b_exp THEN
        RETURN true_number;
    ELSE
        RETURN false_number;
    END IF;
END;
The above function can be called as follows.
Check ( 2 > 1, 1 , 0)
Check (5 = 0, 1, 0)
PROCEDURES:
The declaration of procedures is almost identical to that of function
and the syntax
is given below.
PROCEDURE name [(argument list)] {IS,AS}
Variable declaration
BEGIN
Program code
[EXCEPTION
Error handling code ]
END;
Here name is the name that you want to give the procedure; everything else is
similar to a function declaration. A procedure declaration resembles a function
declaration except that there is no return data type and the key word PROCEDURE
is used instead of FUNCTION.
Ex: PROCEDURE swapn (A IN OUT NUMBER, B IN OUT NUMBER) IS
    temp_num NUMBER;
BEGIN
    temp_num := A;
    A := B;
    B := temp_num;
END;
The above procedure can be called as follows.
Swapn (3,4);
Swapn (-6,7);
DATABASE TRIGGERS :
PL/SQL can be used to write database triggers. Triggers are used to define code
that is executed/fired when certain actions or events occur. At the database level,
triggers can be defined for events such as inserting a record into a table, deleting a
record, and updating a record.
101
NUMBER;
maxsal
NUMBER;
102
BEGIN
SELECT min_sal, max_sal INTO minsal, maxsal
FROM salary-mast WHERE JOB = :new.job;
IF ( :new-sal < minsal or :new.sal > maxsal ) THEN
Message ( 'salary out of range' );
END IF;
END;
3.15 CURSOR IN PL/SQL:
PL/SQL cursors provide a way for your program to select multiple rows of data
from the database and then to process each row individually. Cursors are PL/SQL
constructs that enable you to process, one row at a time, the results of a multi row
query.
ORACLE uses work areas to execute SQL statements. PL/SQL allows the user to
name private work areas and access the stored information. The PL/SQL
construct that identifies each and every work area used by SQL is called a cursor.
There are 2 types of cursors.
Implicit cursors
Explicit cursors
Implicit cursors are declared by ORACLE for each UPDATE, DELETE and
INSERT SQL command. Explicit cursors are declared and used by the user to
process multiple rows returned by a SELECT statement.
The set of rows returned by a query is called the Active Set. Its size depends on
the number of rows that meet the search criteria of the SQL query. The data that is
stored in the cursor is called the Active Data Set.
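Conceptually, then, a cursor is an iterator over the active set. The following Python sketch (names and data invented; not Oracle's implementation) mimics OPEN building the active set and FETCH returning one row at a time, with a flag similar to the %FOUND/%NOTFOUND attributes discussed below:

```python
# Sketch: a cursor as an iterator over the active set of a query.
class Cursor:
    def __init__(self, rows, predicate):
        # OPEN: evaluate the query and materialize the active set
        self.active_set = [r for r in rows if predicate(r)]
        self.pos = 0
        self.found = None        # mimics %FOUND / %NOTFOUND
        self.rowcount = 0        # mimics %ROWCOUNT

    def fetch(self):
        if self.pos < len(self.active_set):
            row = self.active_set[self.pos]
            self.pos += 1
            self.rowcount += 1
            self.found = True
            return row
        self.found = False       # no more rows in the active set
        return None

emp = [{'name': 'Sharanu', 'dept': 'physics'},
       {'name': 'Bharath', 'dept': 'maths'}]
c = Cursor(emp, lambda r: r['dept'] == 'physics')
```

Each fetch advances through the active set; once it is exhausted, fetch signals failure, which is exactly the condition an EXIT WHEN %NOTFOUND loop tests.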
emp_code  emp_name        designation       salary
          A. N. Sharanu   Asst. Professor   22,000.00
1345      N. Bharath      Senior Lecturer   17,000.00
1400      M. Mala         Lab Incharge       9,000.00
Table 3.1
1) EXPLICIT CURSOR MANAGEMENT:
The following are the steps to using explicitly defined cursors within PL/SQL
Declare the cursor
Open the cursor
Fetch data from the cursor
Close the cursor
Declaring the cursor :
Declaring a cursor enables you to define the cursor and assign a name to it. It has
following syntax.
CURSOR cursor-name
IS SELECT statement
Ex: CURSOR c_name IS
    SELECT emp_name FROM Emp WHERE dept = 'physics';
Opening a cursor:
Opening a cursor executes the query and identifies the active set that contains
all the rows, which meet the query search criteria.
Syntax :
OPEN cursor_name
Ex:
OPEN c_name
The Open statement retrieves the records from the database and places them in the
cursor (private SQL area).
Fetching data from the cursor:
The FETCH statement retrieves the rows of the active set into the specified
variables, one row at a time.
Syntax: FETCH cursor_name INTO record-list;
Record-list is the list of variables that will receive the columns (fields) from the active set.
Ex: LOOP
    FETCH c_name INTO name;
END LOOP;
Closing a cursor:
The CLOSE statement deactivates the previously opened cursor and
makes the active set undefined. Once a cursor is closed, you cannot perform any
operations on it; however, the user can reopen the cursor by using the
OPEN statement.
Syntax: CLOSE cursor_name;
Ex: CLOSE c_name;
EXAMPLE-1 :
The HRD manager has decided to raise the salary of all the employees in
the physics department by 5% (0.05). Whenever any such raise is given to the employees, a
record of it is maintained in the emp_raise table (the table definitions are
given below). Write a PL/SQL block to update the salary of each employee and insert
a record in the emp_raise table.
Table: employee
emp_code varchar (10)
emp_name varchar (10)
dept varchar (15)
job varchar (15)
salary number (6,2)
Table: emp_raise
emp_code varchar (10)
raise_date date
raise_amt number (6,2)
Solution:
DECLARE
CURSOR c_emp IS
SELECT emp_code, salary FROM employee
WHERE dept = 'physics';
str_emp_code
employee.emp_code %TYPE;
num_salary
employee.salary %TYPE;
BEGIN
    OPEN c_emp;
    LOOP
        FETCH c_emp INTO str_emp_code, num_salary;
        EXIT WHEN c_emp %NOTFOUND;
        UPDATE employee SET salary = num_salary + (num_salary * 0.05)
        WHERE emp_code = str_emp_code;
        INSERT INTO emp_raise VALUES (str_emp_code, sysdate, num_salary * 0.05);
    END LOOP;
    COMMIT;
    CLOSE c_emp;
END;
%NOTFOUND:
Evaluates to TRUE if the last fetch failed, i.e. no more rows are left.
Syntax: cursor_name %NOTFOUND
%FOUND:
Evaluates to TRUE if the last fetch returned a row, otherwise FALSE.
Syntax: cursor_name %FOUND
%ISOPEN:
Evaluates to TRUE if the cursor is open, otherwise it evaluates to FALSE.
Syntax: cursor_name %ISOPEN
%ROWCOUNT:
Returns the number of rows fetched by the cursor so far.
Syntax: cursor_name %ROWCOUNT
EXAMPLE :
DECLARE
    v_emp_name varchar2 (32);
    v_salary_rate number (6,2);
    v_payroll_total number (9,2);
    v_pay_type char;
    not_opened EXCEPTION;
    CURSOR c_emp IS
        SELECT emp_name, pay_rate, pay_type FROM employee
        WHERE emp_dept = 'physics';
BEGIN
    IF c_emp %ISOPEN THEN
        RAISE not_opened;
    ELSE
        OPEN c_emp;
        LOOP
            FETCH c_emp INTO v_emp_name, v_salary_rate, v_pay_type;
            EXIT WHEN c_emp %NOTFOUND;
            IF v_pay_type = 'S' THEN
                v_payroll_total := (v_salary_rate * 1.25);
            ELSE
                v_payroll_total := (v_salary_rate * 40);
            END IF;
            INSERT INTO weekly_salary VALUES (v_payroll_total);
        END LOOP;
        CLOSE c_emp;
    END IF;
EXCEPTION
    WHEN not_opened THEN
        Message ('cursor is not opened');
END;
REFERENCES:
1. Teach Yourself PL/SQL in 21 Days - SAMS Publications.
2. ORACLE-7 - Ivan Bayross.
3. - Ivan Bayross.
4. - David McClanahan.
MODULE 4
4.1
Introduction
Measure Of Quality
We can discuss the goodness of relation schemas at two levels.
1. Logical Level
As you know, the logical level is the middle level in the three-level architecture
of a DBMS. The logical level describes how users interpret the relation schemas and
the meaning of their attributes. Having good relation schemas at this level enables
users to understand clearly the meaning of the data in the relations and hence to
formulate their queries correctly.
2. Implementation Level
This is the lowermost level in the DBMS architecture, which describes how the
tuples in the base relations are stored and updated. This level applies only to the
storage level of the database, whereas the former logical level applies to both the view
level and the logical level. The database is only as effective as its storage
scheme.
4.2
4.3 Constraints
constraint states that a tuple in one relation that refers to another relation must
refer to an existing tuple in that relation. To define the referential integrity
constraint, we first have to define the concept of a foreign key (FK).
A set of attributes FK in relation schema R1 is a foreign key of R1 that references
the relation R2 if it satisfies the following two rules:
1. The attributes in FK have the same domain as the primary key (PK) attributes of
R2; the attributes of FK are said to reference or refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value
of PK for some tuple t2 in the current state r2(R2) or is null.
If t1[FK] = t2[PK], then we say that the tuple t1 refers to the tuple t2.
R1 is called the referencing relation and R2 the referenced relation.
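In SQL, these two rules are exactly what a FOREIGN KEY clause enforces. A minimal sketch using Python's sqlite3 (the department and employee tables and their columns are illustrative, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# R2: the referenced relation, with primary key dept_no
conn.execute("CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dname TEXT)")
# R1: the referencing relation; dept_no is the foreign key FK
conn.execute("""CREATE TABLE employee (
    emp_code INTEGER PRIMARY KEY,
    dept_no  INTEGER REFERENCES department(dept_no))""")

conn.execute("INSERT INTO department VALUES (10, 'physics')")
conn.execute("INSERT INTO employee VALUES (1, 10)")    # rule 2: FK value exists in R2
conn.execute("INSERT INTO employee VALUES (2, NULL)")  # rule 2: null is also allowed

try:
    conn.execute("INSERT INTO employee VALUES (3, 99)")  # 99 is no department.dept_no
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # the constraint blocks the dangling reference
```

The rejected insert is precisely a violation of rule 2: the FK value neither matches a PK value in the referenced relation nor is null.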
Key constraint
A relation is defined as a set of tuples, and by the definition of a set all the
tuples in a relation are distinct; i.e., no two tuples can have the same values for
all their attributes. There are some subsets of a relation schema R with the
property that no two tuples in any relation state r of R should have the same
values for these attributes; such a subset is called a super key.
A key K of a relation schema R is a super key of R with the additional property
that removing any attribute A from K leaves a set of attributes K - {A} that is no
longer a super key of R. Hence a key satisfies the following two constraints:
1. Two distinct tuples in any state of the relation cannot have identical values
for all the attributes in the key.
2. It is a minimal super key, i.e., we cannot remove any attribute from it and
still have the uniqueness constraint of the first condition hold.
Null Constraint
It specifies whether null values are permitted for an attribute in a database.
relation state r of R. The constraint is that for any two tuples t1 and t2 in r
that have t1[X] = t2[X], they must also have t1[Y] = t2[Y]; i.e., the value of the
Y component of a tuple in r depends on the value of the X component, or the X
component determines the value of the Y component.
Note that:
1. If a constraint on R states that there cannot be more than one tuple with a
given X value in any relation state r(R) (that is, X is a candidate key of R),
then X -> Y for any subset of attributes Y of R.
2. X -> Y in R does not say whether or not Y -> X holds in R.
A functional dependency X -> Y is called trivial if Y is a subset of X.
Definition: A functional dependency, denoted by X -> Y, between two sets of
attributes X and Y that are subsets of the attributes of relation R, specifies that
the values in a tuple corresponding to the attributes in Y are uniquely determined
by the values corresponding to the attributes in X.
For example, the social security number uniquely determines a name:
SSN -> Name
Functional dependencies are determined by the semantics of the relation; in
general, they cannot be determined by inspection of an instance of the relation.
That is, a functional dependency is a constraint, not a property derived from a
relation.
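Although an FD cannot be inferred from data, a given relation instance can be checked for violations of a proposed FD. A small Python sketch (the relation instance and attribute names are made up for illustration):

```python
def holds(rows, X, Y):
    """Return True if X -> Y is not violated in this relation instance.

    rows: list of dicts (the tuples of the relation); X, Y: lists of attribute names.
    """
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        if x_val in seen and seen[x_val] != y_val:
            return False  # two tuples agree on X but differ on Y
        seen[x_val] = y_val
    return True

# Illustrative instance: SSN -> Name is not violated, Name -> SSN is.
r = [
    {"SSN": 1, "Name": "Ann"},
    {"SSN": 2, "Name": "Ann"},  # same name, different SSN
    {"SSN": 1, "Name": "Ann"},
]
print(holds(r, ["SSN"], ["Name"]))  # True
print(holds(r, ["Name"], ["SSN"]))  # False
```

Note the asymmetry the text emphasizes: a passing check only shows this instance does not violate the FD; it does not prove the dependency holds in the schema.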
Inference rules
Armstrong's axioms are sound and complete, i.e., they enable the computation of
every functional dependency implied by a given set of dependencies. The axioms
are:
1. Reflexivity - if the B's are a subset of the A's, then A -> B.
2. Augmentation - if A -> B, then A, C -> B, C.
3. Transitivity - if A -> B and B -> C, then A -> C.
Additional inference rules:
4. Decomposition - if A -> B, C then A -> B.
5. Union - if A -> B and A -> C, then A -> B, C.
6. Pseudotransitivity - if A -> B and C, B -> D, then C, A -> D.
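In practice these inference rules are applied through the attribute-closure algorithm: X+ is the set of all attributes functionally determined by X, and X -> Y is implied by a set of FDs F iff Y is a subset of X+. A Python sketch (the FDs used here are illustrative):

```python
def closure(X, fds):
    """Compute X+ under the FDs in fds (each FD is a (lhs, rhs) pair of sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply the FD if its left side is already determined
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [({"sno"}, {"sname"}), ({"cno"}, {"cname"})]
print(closure({"sno", "cno"}, F))  # sname and cname are determined as well
# X -> Y is implied by F iff Y <= closure(X, F):
print({"sname"} <= closure({"sno"}, F))  # True
```

The same routine also decides equivalence of two FD sets S and T: each FD of T must follow from S and vice versa.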
Equivalence of sets of functional dependencies
Two sets of functional dependencies S and T are equivalent iff S implies every
dependency in T and T implies every dependency in S (i.e., S+ = T+).
The dependency {A_1, ..., A_n} -> {B_1, ..., B_m}
4.8 Normalization
In relational database theory, normalization is the process of restructuring the
logical data model of a database to eliminate redundancy, organize data
efficiently, and reduce the potential for anomalies
during data operations. Data normalization also may improve data consistency
and simplify future extension of the logical data model. The formal
classifications used for describing a relational database's level of normalization
are called normal forms (NF).
A non-normalized database can suffer from data anomalies:
A non-normalized database may store data representing a particular referent in
multiple locations. An update to such data in some but not all of those locations
results in an update anomaly, yielding inconsistent data. A normalized database
prevents such an anomaly by storing such data (i.e. data other than primary
keys) in only one location.
A non-normalized database may have inappropriate dependencies, i.e.
relationships between data with no functional dependencies. Adding data to such
a database may require first adding the unrelated dependency. A normalized
database prevents such insertion anomalies by ensuring that database relations
mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion.
That is, deleting data from such databases may require deleting data from the
inappropriate dependency. A normalized database prevents such deletion
anomalies by ensuring that all records are uniquely identifiable and contain no
extraneous information.
4.9 Normal forms
Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF).
The first normal form requires that tables be made up of a primary key and a
number of atomic fields, and the second and third deal with the relationship of
non-key fields to the primary key. These have been summarized as requiring that
all non-key fields be dependent on "the key, the whole key and nothing but the
key". In practice, most applications in 3NF are fully normalized. However,
research has identified potential update anomalies in 3NF databases. BCNF is a
further refinement of 3NF that attempts to eliminate such anomalies.
The fourth and fifth normal forms (4NF and 5NF) deal specifically with the
representation of many-many and one-many relationships. Sixth normal form
(6NF) only applies to temporal databases.
Table 4.1: a relation with attributes SSN, Name, Address, Age, and College_Degree.
Now we can analyze this relation by checking the possible values of each attribute.
Here SSN and Age have only one value for a person, but College_Degree can have
more than one value, and the Address and Name of a person can each be divided into
more than one attribute. Hence this relation is not in 1NF. Let us change this
relation schema into 1NF by dividing it into two relations.
Name is split into FName, MInit, and LName; Address into ApartmentNo and City.
Table 4.2: Person_Residence(SSN, FName, ...)
Table 4.3: College_Degree(SSN, UG, PG)
4.11 Second normal form (2NF)
First, the table must be in 1NF; in addition, every non-primary-key attribute
(field) must be fully functionally dependent upon the ENTIRE primary key for its
existence. This rule only applies when you have a multi-part (concatenated)
primary key (PK).
2NF requires that data stored in a table with a composite primary key must not
depend on only part of the table's primary key, and that the table meets all the
requirements of the first normal form.
Take each non-key field and ask this question: if I knew part of the PK, could I
tell what the non-key field would be?
Table 4.4: Inventory(Description, Supplier, Cost, Supplier_Address)
In this Inventory table, Description combined with Supplier is our PK, because we
can have the same product coming from different suppliers. There are two non-key
fields, so we can ask the questions:
If we know just Description, can we find out Cost? No, because we have more than
one supplier for the same product.
If we know just Supplier, can we find out Cost? No, because we need to know what
the item is as well.
Therefore, Cost is fully functionally dependent upon the ENTIRE PK
(Description, Supplier) for its existence.
If we know just Description, can we find out Supplier Address? No, because we
have more than one supplier for the same product.
If we know just Supplier, can we find out Supplier Address? Yes. The address does
not depend upon the Description of the item.
Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK
(Description, Supplier) for its existence.
We must remove Supplier Address from this table.
Table 4.5: Inventory(Description, Supplier, Cost)
Table 4.6: Supplier(Name, Supplier_Address)
At this point, since it is the "Supplier" table, we can rename the "Supplier"
field to "Name." Name is the PK for this new table.
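The effect of this decomposition can be sketched with Python's sqlite3 (the data values are invented): after the split, each supplier's address is stored exactly once, so the redundancy that causes update anomalies disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Before 2NF: Supplier_Address repeats for every product from the same supplier
conn.execute("""CREATE TABLE inventory_unnormalized (
    description TEXT, supplier TEXT, cost REAL, supplier_address TEXT,
    PRIMARY KEY (description, supplier))""")
rows = [("widget", "Acme", 1.50, "12 Main St"),
        ("gadget", "Acme", 2.75, "12 Main St"),   # address duplicated
        ("widget", "Bolt Co", 1.40, "9 Side Rd")]
conn.executemany("INSERT INTO inventory_unnormalized VALUES (?,?,?,?)", rows)

# After 2NF: Inventory(Description, Supplier, Cost) and Supplier(Name, Supplier_Address)
conn.execute("""CREATE TABLE inventory (
    description TEXT, supplier TEXT, cost REAL,
    PRIMARY KEY (description, supplier))""")
conn.execute("CREATE TABLE supplier (name TEXT PRIMARY KEY, supplier_address TEXT)")
conn.execute("""INSERT INTO inventory
                SELECT description, supplier, cost FROM inventory_unnormalized""")
conn.execute("""INSERT INTO supplier
                SELECT DISTINCT supplier, supplier_address FROM inventory_unnormalized""")

# Each supplier's address is now stored exactly once
print(conn.execute("SELECT COUNT(*) FROM supplier").fetchall())  # [(2,)]
```

Joining the two new tables on the supplier name reconstructs the original data, so the decomposition loses no information.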
General Definition:
A relation schema R is in second normal form (2NF) if every nonprime
attribute A in R is not partially dependent on any key of R.
Table 4.7: (Auth_Name, #Pages, Auth_Affil_No)
Again, just ask the questions:
If I know # of Pages, can I find out Author's Name? No. Can I find out Author's
affiliation No? No.
If I know Author's Name, can I find out # of Pages? No. Can I find out Author's
affiliation No? YES.
Table 4.8: (Auth_Name, #Pages)
Table 4.9: Author(Name, Auth_Affil_No)
General Definition:
A relation schema R is in 3NF if, whenever a nontrivial functional dependency
X -> A holds in R,
either (a) X is a superkey of R,
or (b) A is a prime attribute of R.
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from 3NF only when there is more than one candidate key and the
keys are composite and overlapping. Consider, for example, the relation
enrol (sno, sname, cno, cname, date-enrolled)
Let us assume that the relation has the following candidate keys:
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in
3NF but not in BCNF because there are dependencies
sno -> sname
cno -> cname
where attributes that are part of a candidate key are dependent on part of
another candidate key. Such dependencies indicate that although the relation is
about some entity or association that is identified by the candidate keys
e.g. (sno, cno), there are attributes that are not about the whole thing that the
keys identify. For example, the above relation is about an association
(enrolment) between students and subjects and therefore the relation needs to
include only one identifier to identify students and one identifier to identify
subjects. Providing two identifiers about students (sno, sname) and two keys
about subjects (cno, cname) means that some information about students and
subjects that is not needed is being provided. This provision of information
will result in repetition of information and the anomalies. If we wish to include
further information about students and courses in the database, it should not be
done by putting the information in the present relation but by creating new
relations that represent information about entities student and subject.
These difficulties may be overcome by decomposing the above relation in the
following three relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only
about subjects and the third only about enrolments. All the anomalies and
repetition of information have been removed.
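The BCNF test used above (every nontrivial FD must have a superkey on its left side) can be mechanized with the attribute-closure algorithm. A Python sketch applied to the enrol relation above:

```python
def closure(X, fds):
    """Attribute closure of X under fds (each FD is a (lhs, rhs) pair of sets)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Return the nontrivial FDs whose left side is not a superkey of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                       # nontrivial
            and closure(lhs, fds) != set(attrs)]    # lhs is not a superkey

attrs = {"sno", "sname", "cno", "cname", "date_enrolled"}
fds = [({"sno"}, {"sname"}), ({"cno"}, {"cname"}),
       ({"sno", "cno"}, {"date_enrolled"})]
for lhs, rhs in bcnf_violations(attrs, fds):
    print(sorted(lhs), "->", sorted(rhs))  # sno -> sname and cno -> cname violate BCNF
```

Only (sno, cno) -> date_enrolled passes, since (sno, cno) is a superkey; the two offending FDs are exactly the ones removed into the (sno, sname) and (cno, cname) relations.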
the other between employees and programming languages). Both the above
relationships are many-to-many i.e. one programmer could have several
qualifications and may know several programming languages. Also one
qualification may be obtained by several programmers and one programming
language may be known to many programmers.
Functional dependency A -> B relates one value of A to one value of B while
multivalued dependency A ->> B defines a relationship in which a set of
values of attribute B are determined by a single value of A.
Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if, whenever t1 and t2
are two tuples in R that have the same values for the attributes X (i.e.,
t1[X] = t2[X]), then R also contains tuples t3 and t4 (not necessarily distinct)
such that
t1[X] = t2[X] = t3[X] = t4[X]
t3[Y] = t1[Y] and t3[Z] = t2[Z]
t4[Y] = t2[Y] and t4[Z] = t1[Z]
In other words if t1 and t2 are given by
t1 = [X, Y1, Z1], and
t2 = [X, Y2, Z2]
then there must be tuples t3 and t4 such that
t3 = [X, Y1, Z2], and
t4 = [X, Y2, Z1]
We are therefore insisting that every value of Y appears with every value of Z
to keep the relation instances consistent. In other words, the above conditions
insist that X alone determines Y and Z and there is no relationship between Y
and Z since Y and Z appear in every possible pair and hence these pairings
present no information and are of no significance.
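The formal condition can be checked directly against a relation instance. A Python sketch (attribute positions rather than names are used, and the programmer data is invented):

```python
def mvd_holds(rows, X, Y, Z):
    """Check whether X ->> Y holds in this relation instance.

    rows: a set of tuples; X, Y, Z: lists of attribute positions partitioning the tuple.
    """
    def combine(tx, ty, tz):
        # build the tuple whose X part comes from tx, Y part from ty, Z part from tz
        out = [None] * (len(X) + len(Y) + len(Z))
        for i in X: out[i] = tx[i]
        for i in Y: out[i] = ty[i]
        for i in Z: out[i] = tz[i]
        return tuple(out)

    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in X):
                # the definition requires t3 = [X, Y1, Z2] to be present;
                # t4 = [X, Y2, Z1] is covered by the pair taken in the other order
                if combine(t1, t1, t2) not in rows:
                    return False
    return True

# programmer(name, qualification, language): positions 0, 1, 2
r = {("ann", "bsc", "c"), ("ann", "bsc", "java"),
     ("ann", "msc", "c"), ("ann", "msc", "java")}
print(mvd_holds(r, [0], [1], [2]))                             # True
print(mvd_holds(r - {("ann", "msc", "java")}, [0], [1], [2]))  # False
```

The True case shows exactly the "every value of Y with every value of Z" pairing the text describes; removing one pairing breaks the MVD.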
to be in the 5NF if and only if it is in 4NF and the candidate keys imply every
join dependency in it.
Repetition of information.
Loss of information.
Module 5
Database management systems developed using the above types of architectures are
termed parallel database management systems; rather than DDBMS they utilize
parallel processor technology. In another type of architecture called shared nothing
architecture, every processor has its own primary and secondary (disk) memory, no
common memory exists and the processors communicate over a high-speed
interconnection network. Although the shared nothing architecture resembles a
distributed database computing environment, major differences exist in the mode of
operation. In shared nothing architecture, there is symmetry and homogeneity of
nodes; this is not true of the distributed database environment where heterogeneity of
nodes is very common.

Advantages of Distributed Databases

1. Management of distributed data with different levels of transparency: Ideally,
a DBMS should be distribution transparent in the sense of hiding the details of where
each file is physically stored within the system. The following types of transparencies
are possible:
Distribution or network transparency: This refers to the freedom for the user
from the operational details of the network. It may be divided into location
transparency and naming transparency. Location transparency refers to the
fact that the command used to perform a task is independent of the location of
data and the location of the system where the command was issued. Naming
transparency implies that once a name is specified, the named objects can be
accessed unambiguously without additional specification.
software are distributed over several sites, one site may fail while other sites
continue to operate. Only the data and software that exist at the failed site
cannot be accessed. This improves both reliability and availability.
3. Improved performance: A distributed DBMS fragments the database by keeping
the data closer to where it is needed most. Data localization reduces the contention for
CPU and I/O services and simultaneously reduces access delays involved in wide area
networks. When a large database is distributed over multiple sites, smaller databases
exist at each site. As a result, local queries and transactions accessing data at a single
site have better performance because of the small local databases.
Moreover,
This is the process of breaking up the database into logical units called fragments,
which may be assigned for storage at the various sites. There are mainly two types of
fragmentation:
Horizontal fragmentation
Vertical fragmentation
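The two kinds of fragmentation correspond to relational selection and projection. A sketch with Python's sqlite3 (the table, column, and site names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    emp_code INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)""")
conn.executemany("INSERT INTO employee VALUES (?,?,?,?)",
                 [(1, "ann", "physics", 900.0), (2, "ben", "maths", 850.0)])

# Horizontal fragment: a selection -- the subset of tuples stored at the physics site
conn.execute("""CREATE TABLE employee_physics AS
                SELECT * FROM employee WHERE dept = 'physics'""")

# Vertical fragment: a projection that keeps the key, so fragments can be rejoined
conn.execute("""CREATE TABLE employee_payroll AS
                SELECT emp_code, salary FROM employee""")

print(conn.execute("SELECT COUNT(*) FROM employee_physics").fetchone())  # (1,)
```

Keeping the primary key in every vertical fragment is what allows the original relation to be reconstructed by joining the fragments.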
Differences in query languages: Even with the same data model, the
languages and their versions vary. For example, SQL has multiple versions
like SQL-89, SQL-92 (SQL2), and SQL3, and each system has its own set of
data types, comparison operators, string manipulation features, and so on.
Semantic Heterogeneity
Semantic heterogeneity occurs when there are differences in the meaning,
interpretation, and intended use of the same or related data. Semantic heterogeneity
among component database systems (DBSs) creates the biggest hurdle in designing
global schemas of heterogeneous databases. The design autonomy of component
DBSs refers to their freedom of choosing the following design parameters, which in
turn affect the eventual complexity of the FDBS:
The universe of discourse from which the data is drawn: For example, two
customer accounts databases in the federation may be from the United States and
Japan, with entirely different sets of attributes about customer accounts
required by their accounting practices. Currency rate fluctuations would also
present a problem. Hence, relations in these two databases with identical names,
CUSTOMER or ACCOUNT, may have some common and some entirely distinct information.
Derivation of summaries: Aggregation, summarization, and other data-processing features and operations supported by the system.
strategy.
On the other hand, a DDBMS that supports full distribution, fragmentation, and
replication transparency allows the user to specify a query or update request on
the schema just as though the DBMS were centralized. For updates, the DDBMS
is responsible for maintaining consistency among replicated items by using one of
the distributed concurrency control algorithms. For queries, a query decomposition
module must break up or decompose a query into subqueries that can be
executed at the individual sites. In addition, a strategy for combining the results of
the subqueries to form the query result must be generated. Whenever the DDBMS
determines that an item referenced in the query is replicated, it must choose or
materialize a particular replica during query execution.
To determine which replicas include the data items referenced in a query, the
DDBMS refers to the fragmentation, replication, and distribution information
stored in the DDBMS catalog. For vertical fragmentation, the attribute list for
each fragment is kept in the catalog. For horizontal fragmentation, a condition,
sometimes called a guard, is kept for each fragment. This is basically a selection
condition that specifies which tuples exist in the fragment; it is called a guard
because only tuples that satisfy this condition are permitted to be stored in the
fragment. For mixed fragments, both the attribute list and the guard condition are
kept in the catalog.
Dealing with multiple copies of the data items: The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other
copies if the site on which the copy is stored fails and recovers later.
Failure of individual sites: The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up to date with the rest of the
sites before it rejoins the system.
Failure of communication links: The system must be able to deal with failure
of one or more of the communication links that connect the sites. An extreme
case of this problem is that network partitioning may occur. This breaks up the
sites into two or more partitions, where the sites within each partition can
communicate only with one another and not with sites in other partitions.
References
1. Fundamentals of Database Systems - Elmasri and Navathe (3rd Edition), Pearson Education Asia.
2. Database System Concepts - Silberschatz, Korth, and Sudarshan.
3. An Introduction to Database Systems - C.J. Date (7th Edition), Pearson Education Asia.
4. Database Principles, Programming and Performance - Patrick O'Neil, Elizabeth O'Neil.
5. An Introduction to Database Systems - Bipin C. Desai.
6. Teach Yourself PL/SQL in 21 Days - SAMS Publications.
7. SQL, PL/SQL - Ivan Bayross.
8. David McClanahan.