CS 2255 DATABASE MANAGEMENT SYSTEMS

UNIT1. INTRODUCTION
A database is a structured collection of data. The data are typically organized to model relevant
aspects of reality (for example, the availability of rooms in hotels), in a way that supports
processes requiring this information (for example, finding a hotel with vacancies).
The term database is correctly applied to the data and their supporting data structures, and not to
the database management system (DBMS). The combination of the database data collection and
the DBMS is called a database system.
The term database system implies that the data are managed to some level of quality (measured
in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use
of a general-purpose database management system (DBMS).
A general-purpose DBMS is
typically a complex software system that meets many usage requirements to properly maintain
its databases which are often large and complex.
This is especially the case with client-server, near-real-time transactional systems, in which
multiple users have access to data, and data is concurrently entered and queried in ways that
preclude single-thread batch processing. Most of the complexity of those requirements is still
present with personal, desktop-based database systems.
Well known DBMSs include Oracle, FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft
SQL Server, MySQL, PostgreSQL and SQLite. A database is not generally portable across
different DBMS, but different DBMSs can inter-operate to some degree by using standards like
SQL and ODBC together to support a single application built over more than one database. A
DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of
performance, availability, and security) as many database end-users as needed.
A way to classify databases involves the type of their contents, for example: bibliographic,
document-text, statistical, or multimedia objects. Another way is by their application area, for
example: accounting, music compositions, movies, banking, manufacturing, or insurance.
The term database may be narrowed to specify particular aspects of organized collection of data
and may refer to the logical database, to the physical database as data content in computer data
storage or to many other database sub-definitions.
Purpose of Database System
A DBMS has evolved into a complex software system and its development typically requires
thousands of person-years of development effort. Some general-purpose DBMSs, like Oracle,
Microsoft SQL Server, FoxPro, and IBM DB2, have been undergoing upgrades for thirty years
or more. General-purpose DBMSs aim to satisfy as many applications as possible, which
typically makes them even more complex than special-purpose databases. However, the fact that
they can be used "off the shelf", as well as their amortized cost over many applications and
instances, makes them an attractive alternative (vs. one-time development) whenever they meet
an application's requirements.
Though attractive in many cases, a general-purpose DBMS is not always the optimal solution:
When certain applications are pervasive with many operating instances, each with many users, a
general-purpose DBMS may introduce unnecessary overhead and too large "footprint" (too large
amount of unnecessary, unutilized software code). Such applications usually justify dedicated
development. Typical examples are email systems, though they need to possess certain DBMS
properties: email systems are built in a way that optimizes email messages handling and
managing, and do not need significant portions of a general-purpose DBMS functionality.
Views of data
In database theory, a view consists of a stored query accessible as a virtual table in a
relational database or a set of documents in a document-oriented database composed of the result
set of a query or map-and-reduce functions. Unlike ordinary tables (base tables) in a relational
database, a view does not form part of the physical schema: it is a dynamic, virtual table
computed or collated from data in the database. Changing the data in a table alters the data
shown in subsequent invocations of the view. In some NoSQL databases views are the only way
to query data.
Views can provide advantages over tables:
Views can represent a subset of the data contained in a table
Views can join and simplify multiple tables into a single virtual table
Views can act as aggregated tables, where the database engine aggregates data (sum,
average etc.) and presents the calculated results as part of the data
Views can hide the complexity of data; for example a view could appear as Sales2000 or
Sales2001, transparently partitioning the actual underlying table
Views take very little space to store; the database contains only the definition of a view,
not a copy of all the data it presents
Depending on the SQL engine used, views can provide extra security
Views can limit the degree of exposure of a table or tables to the outer world
Just as functions (in programming) can provide abstraction, so database users can create
abstraction by using views. In another parallel with functions, database users can manipulate
nested views, thus one view can aggregate data from other views. Without the use of views the
normalization of databases above second normal form would become much more difficult.
Views can make it easier to create lossless join decomposition.
Just as rows in a base table lack any defined ordering, rows available through a view do not
appear with any default sorting. A view is a relational table, and the relational model defines a
table as a set of rows. Since sets are not ordered - by definition - the rows in a view are not
ordered, either. Therefore, an ORDER BY clause in the view definition is meaningless. The SQL
standard (SQL:2003) does not allow an ORDER BY clause in a subselect in a CREATE
VIEW statement, just as it is not allowed in a CREATE TABLE statement. However, sorted data
can be obtained from a view, in the same way as any other table - as part of a query statement.
Nevertheless, some DBMSs (such as Oracle Database and SQL Server) allow a view to
be created with an ORDER BY clause in a subquery, affecting how data is displayed.
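As a concrete sketch (the table and column names here are hypothetical, not from these notes), a view can be defined once and then queried like a table, with any ordering applied in the query itself:

CREATE VIEW Sales2000 AS
    SELECT order_id, customer_id, amount
    FROM Sales
    WHERE order_year = 2000;   -- the view stores only this definition, not the rows

SELECT customer_id, SUM(amount) AS total_amount
FROM Sales2000                 -- queried exactly like a base table
GROUP BY customer_id
ORDER BY total_amount DESC;    -- sorting belongs to the query, not the view definition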
Data models
A data model is an abstract structure that provides the means to effectively
describe specific data structures needed to model an application. As such a data model needs
sufficient expressive power to capture the needed aspects of applications. These applications are
often typical of commercial companies and other organizations (like manufacturing, human
resources, stock, banking, etc.). For effective utilization and handling it is desired that a data
model be relatively simple and intuitive. This may conflict with the high expressive power
needed to deal with certain complex applications. Thus any popular general-purpose data model
usually strikes a balance between being intuitive and relatively simple, and having the high
expressive power needed for complex applications. The application's semantics is usually not
explicitly expressed in the model, but rather implicit (and detailed by documentation external to
the model) and hinted at by data item types' names (e.g., "part-number") and their connections
(as expressed by generic data structure types provided by each specific model).
Hierarchical model
In the Hierarchical model different record types (representing real-world entities) are embedded
in a predefined hierarchical (tree-like) structure. This hierarchy is used as the physical order of
records in storage. Record access is done by navigating through the data structure using pointers
combined with sequential accessing.
This model has been supported primarily by the IBM IMS DBMS, one of the earliest DBMSs.
Various limitations of the model have been compensated for in later IMS versions by additional
logical hierarchies imposed on the base physical hierarchy.
Network model
In this model a hierarchical relationship between two record types (representing real-world
entities) is established by the set construct. A set consists of circular linked lists where one
record type, the set owner or parent, appears once in each circle, and a second record type, the
subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be
established between any two record types, e.g., type A is the owner of B. At the same time
another set may be defined where B is the owner of A. Thus all the sets comprise a general
directed graph (ownership defines a direction), or network construct. Access to records is either
sequential (usually in each record type) or by navigation in the circular linked lists.
This model is more general and powerful than the hierarchical one, and was the most popular
before being replaced by the Relational model. It has been standardized by CODASYL. Popular
DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained
a considerable customer base and remains in use and supported today. In the 1980s it adopted
the Relational model and SQL in addition to its original tools and languages.
Inverted file model
An inverted file or inverted index of a first file, by a field in this file (the inversion field), is a
second file in which this field is the key. A record in the second file includes a key and pointers
to records in the first file where the inversion field has the value of the key. This is also the
logical structure of contemporary database indexes. The related Inverted file data model utilizes
inverted files of primary database files to efficiently directly access needed records in these files.
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970.
ADABAS gained a considerable customer base and remains in use and supported today. In the
1980s it adopted the Relational model and SQL in addition to its original tools and languages.
Relational model
The relational model is a simple model that provides flexibility. It organizes data based on two-
dimensional arrays known as relations, or tables as related to databases. These relations consist
of a heading and a set of zero or more tuples in arbitrary order. The heading is an unordered set
of zero or more attributes, or columns of the table. The tuples are a set of unique attributes
mapped to values, or the rows of data in the table. Data can be associated across multiple tables
with a key. A key is a single, or set of multiple, attribute(s) that is common to both tables. The
most common language associated with the relational model is the Structured Query Language
(SQL), though SQL differs from the relational model in some places.
Object model
The object-oriented paradigm has been applied in areas such as engineering and spatial databases, telecommunications and in
various scientific domains. The conglomeration of object oriented programming and database
technology led to this new kind of database. These databases attempt to bring the database world
and the application-programming world closer together, in particular by ensuring that the
database uses the same type system as the application program. This aims to avoid the overhead
(sometimes referred to as the impedance mismatch) of converting information between its
representation in the database (for example as rows in tables) and its representation in the
application program (typically as objects). At the same time, object databases attempt to
introduce key ideas of object programming, such as encapsulation and polymorphism, into the
world of databases.
A variety of ways have been tried for storing objects in a database. Some products
have approached the problem from the application-programming side, by making the objects
manipulated by the program persistent. This also typically requires the addition of some kind of
query language, since conventional programming languages do not provide language-level
functionality for finding objects based on their information content. Others have attacked
the problem from the database end, by defining an object-oriented data model for the database,
and defining a database programming language that allows full programming capabilities as well
as traditional query facilities.
Object relational model
Products offering a more general data model than the relational model are sometimes classified
as post-relational. Alternate terms include
"hybrid database", "Object-enhanced RDBMS" and others. The data model in such products
incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires
that all information in the database must be cast explicitly in terms of values in relations and in
no other way.
Some of these extensions to the relational model integrate concepts from technologies that pre-
date the relational model. For example, they allow representation of a directed graph with trees
on the nodes. The German company sones implements this concept in its GraphDB.
Some post-relational products extend relational systems with non-relational features. Others
arrived in much the same place by adding relational features to pre-relational systems.
Paradoxically, this allows products that are historically pre-relational, such as PICK and
MUMPS, to make a plausible claim to be post-relational.
The resource space model (RSM) is a non-relational data model based on multi-dimensional
classification.
Database languages
Database languages are dedicated programming languages, tailored and utilized to
define a database (i.e., its specific data types and the relationships among them),
manipulate its content (e.g., insert new data occurrences, and update or delete existing
ones), and
query it (i.e., request information: compute and retrieve any information based on its
data).
Database languages are data-model-specific, i.e., each language assumes and is based on a
certain structure of the data (which typically differs among different data models). They typically
have commands to instruct execution of the desired operations in the database. Each such
command is equivalent to a complex expression (program) in a regular programming language,
and thus programming in dedicated (database) languages simplifies the task of handling
databases considerably. An expression in a database language is automatically transformed (by
a compiler or interpreter, as regular programming languages) to a proper computer program that
runs while accessing the database and providing the needed results. The following are notable
examples:
SQL for the Relational model
A major Relational model language supported by all the relational DBMSs and a standard.
SQL was one of the first commercial languages for the relational model. Despite not adhering to
the relational model as described by Codd, it has become the most widely used database
language.
Though often described as, and to a great extent being, a declarative language, SQL
also includes procedural elements. SQL became a standard of the American National Standards
Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in 1987.
Since then the standard has been enhanced several times with added features. However, issues of
SQL code portability between major RDBMS products still exist due to lack of full compliance
with, or different interpretations of, the standard. Among the reasons mentioned are the large size,
and incomplete specification of the standard, as well as vendor lock-in.
OQL for the Object model
An object model language standard (by the Object Data Management Group) that has influenced
the design of some of the newer query languages like JDOQL and EJB QL, though they cannot
be considered as different flavors of OQL.
XQuery for the XML model
XQuery is an XML based database language (also named XQL). SQL/XML combines XQuery
and XML with SQL.
Database System Architecture
Database architecture (to be distinguished from DBMS architecture; see below) may be viewed,
to some extent, as an extension of data modeling. It is used to conveniently answer requirements
of different end-users from a same database, as well as for other benefits. For example, a
financial department of a company needs the payment details of all employees as part of the
company's expenses, but not many other details about employees that are of interest to the
human resources department. Thus different departments need different views of the company's
database; both views include the employees' payments, possibly at a different level of detail (and
presented in different visual forms). To meet such requirements effectively, database architecture
consists of three levels: external, conceptual and internal. Clearly separating the three levels was
a major feature of the relational database model implementations that dominate 21st century
databases.

The external level defines how each end-user type understands the organization of its
respective relevant data in the database, i.e., the different needed end-user views. A single
database can have any number of views at the external level.
The conceptual level unifies the various external views into a coherent whole, global
view.
It provides the common denominator of all the external views. It comprises all
the generic data needed by end-users, i.e., all the data from which any view may be
derived/computed. It is organized in the simplest possible way for such generic data, and
comprises the backbone of the database. It is outside the scope of the various database
end-users; it serves database application developers and is defined by the database
administrators that build the database.
The Internal level (or Physical level) is in fact part of the database
implementation inside a DBMS (see Implementation section below). It is concerned with
cost, performance, scalability and other operational matters. It deals with storage layout
of the conceptual level, provides supporting storage-structures like indexes, to enhance
performance, and occasionally stores data of individual views (materialized views),
computed from generic data, if performance justification exists for such redundancy. It
balances all the external views' performance requirements, possibly conflicting, in an
attempt to optimize the overall database usage by all its end-users according to the
database goals and priorities.
All the three levels are maintained and updated according to changing needs by database
administrators who often also participate in the database design.
The above three-level database architecture also relates to, and is motivated by, the concept of
data independence, which has long been described as a desired database property and was one
of the major initial driving forces of the Relational model. In the context of this architecture it
means that changes made at a certain level do not affect definitions and software developed with
higher-level interfaces, and are incorporated at the higher level automatically. For example,
changes in the internal level do not affect application programs
written using conceptual level interfaces, which saves substantial change work that would be
needed otherwise.
In summary, the conceptual level is a level of indirection between internal and external. On one hand
it provides a common view of the database, independent of different external view structures,
and on the other hand it is uncomplicated by details of how the data are stored or managed
(internal level). In principle every level, and even every external view, can be presented by a
different data model. In practice usually a given DBMS uses the same data model for both the
external and the conceptual levels (e.g., relational model). The internal level, which is hidden
inside the DBMS and depends on its implementation (see Implementation section below),
requires a different level of detail and uses its own data structure types, typically different in
nature from the structures of the external and conceptual levels which are exposed to DBMS
users (e.g., the data models above): While the external and conceptual levels are focused on and
serve DBMS users, the concern of the internal level is effective implementation details.
Database Users and Administrators
A database administrator (DBA) is a person responsible for the installation, configuration,
upgrade, administration, monitoring and maintenance of databases in an organization.

The role includes the development and design of database strategies, system monitoring and
improving database performance and capacity, and planning for future expansion requirements.
They may also plan, co-ordinate and implement security measures to safeguard the database.
Entity-Relationship model (E-R model)
An Entity-Relationship model (ER model for short) is an abstract way to describe a database.
It usually starts with a relational database, which stores data in tables. Some of the data in these
tables point to data in other tables - for instance, your entry in the database could point to several
entries for each of the phone numbers that are yours. The ER model would say that you are an
entity, and each phone number is an entity, and the relationship between you and the phone
numbers is 'has a phone number'. Diagrams created to design these entities and relationships are
called entity-relationship diagrams or ER diagrams.
This section refers to the techniques proposed in Peter Chen's 1976 paper. However, variants of
the idea existed previously, and others have been devised subsequently, such as supertype and subtype
data entities and commonality relationships (an example with additional concepts is the
enhanced entity-relationship model).
Using the three schema approach to software engineering, there are three levels of ER models
that may be developed. The conceptual data model is the highest level ER model in that it
contains the least granular detail but establishes the overall scope of what is to be included within
the model set. The conceptual ER model normally defines master reference data entities that are
commonly used by the organization. Developing an enterprise-wide conceptual ER model is
useful to support documenting the data architecture for an organization.
A conceptual ER model may be used as the foundation for one or more logical data models. The
purpose of the conceptual ER model is then to establish structural metadata commonality for the
master data entities between the set of logical ER models. The conceptual data model may be
used to form commonality relationships between ER models as a basis for data model
integration.
A logical ER model does not require a conceptual ER model especially if the scope of the logical
ER model is to develop a single disparate information system. The logical ER model contains
more detail than the conceptual ER model. In addition to master data entities, operational and
transactional data entities are now defined. The details of each data entity are developed and the
entity relationships between these data entities are established. The logical ER model is however
developed independent of technology into which it will be implemented.
One or more physical ER models may be developed from each logical ER model. The physical
ER model is normally developed to be instantiated as a database. Therefore, each physical ER
model must contain enough detail to produce a database and each physical ER model is
technology dependent since each database management system is somewhat different.
The physical model is normally forward engineered to instantiate the structural metadata into a
database management system as relational database objects such as database tables, database
indexes such as unique key indexes, and database constraints such as a foreign key constraint or
a commonality constraint. The ER model is also normally used to design modifications to the
relational database objects and to maintain the structural metadata of the database.
The first stage of information system design uses these models during the requirements analysis
to describe information needs or the type of information that is to be stored in a database. The
data modeling technique can be used to describe any ontology (i.e. an overview and
classifications of used terms and their relationships) for a certain area of interest. In the case of
the design of an information system that is based on a database, the conceptual data model is, at a
later stage (usually called logical design), mapped to a logical data model, such as the relational
model; this in turn is mapped to a physical model during physical design. Note that sometimes,
both of these phases are referred to as "physical design".
E-R Diagrams
Figures: two related entities; an entity with an attribute; a relationship with an attribute; a primary key.
An entity may be defined as a thing which is recognized as being capable of an independent
existence and which can be uniquely identified. An entity is an abstraction from the complexities
of a domain. When we speak of an entity, we normally speak of some aspect of the real world
which can be distinguished from other aspects of the real world.

An entity may be a physical object such as a house or a car, an event such as a house sale or a car
service, or a concept such as a customer transaction or order. Although the term entity is the one
most commonly used, following Chen we should really distinguish between an entity and an
entity-type. An entity-type is a category. An entity, strictly speaking, is an instance of a given
entity-type. There are usually many instances of an entity-type. Because the term entity-type is
somewhat cumbersome, most people tend to use the term entity as a synonym for this term.
Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical
theorem.
Introduction to relational databases
A relational database is a collection of data items organized as a set of formally described
tables from which data can be accessed easily. A relational database is created using the
relational model. The software used in a relational database is called a relational database
management system (RDBMS). A relational database is the predominant choice in storing data,
over other models like the hierarchical database model or the network model.
The relational database was first defined in June 1970 by Edgar Codd, of IBM's San Jose
Research Laboratory.
Terminology


Relational database theory uses a set of mathematical terms, which are roughly equivalent to
SQL database terminology. The table below summarizes some of the most important relational
database terms and their SQL database equivalents.

Relational term                       SQL equivalent
relation, base relvar                 table
derived relvar                        view, query result, result set
tuple                                 row
attribute (name)                      column name
relation scheme, set of attributes    set of column names
attribute value                       column (data)

UNIT II RELATIONAL MODEL

The relational model
The relational model for database management is a database model based on first-order
predicate logic, first formulated and proposed in 1969 by Edgar F. Codd.
In the relational
model of a database, all data is represented in terms of tuples, grouped into relations. A database
organized in terms of the relational model is a relational database.


Figure: diagram of an example database according to the Relational model. In the relational
model, related records are linked together with a "key".
The purpose of the relational model is to provide a declarative method for specifying data and
queries: users directly state what information the database contains and what information they
www.5starnotes.com
www.5starnotes.com
w
w
w
.
5
s
t
a
r
n
o
t
e
s
.
c
o
m
want from it, and let the database management system software take care of describing data
structures for storing the data and retrieval procedures for answering queries.
Most implementations of the relational model use the SQL data definition and query language. A
table in an SQL database schema corresponds to a predicate variable; the contents of a table to a
relation; key constraints, other constraints, and SQL queries correspond to predicates. However,
SQL databases, including DB2, deviate from the relational model in many details; Codd fiercely
argued against deviations that compromise the original principles.
Types of keys
* Alternate key - An alternate key is any candidate key which is not selected to be the primary
key
* Candidate key - A candidate key is a field or combination of fields that can act as a primary
key field for that table to uniquely identify each record in that table.
For example, consider the table:
Employee(Name, Address, Ssn, Employee_Id [primary key], Phone_ext)
In the above example, Ssn and Employee_Id are candidate keys.

* Compound key - a compound key (also called a composite key or concatenated key) is a key
that consists of two or more attributes.
* Primary key - a primary key is a value that can be used to identify a unique row in a table.
Attributes are associated with it. Examples of primary keys are Social Security numbers
(associated to a specific person) or ISBNs (associated to a specific book).
In the relational model of data, a primary key is a candidate key chosen as the main method of
uniquely identifying a tuple in a relation.
For example:
Employee(Name, Address, Ssn, Employee_Id [primary key], Phone_ext)

* Superkey - A superkey is defined in the relational model as a set of attributes of a relation
variable (relvar) for which it holds that in all relations assigned to that variable there are no two
distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a
superkey can also be defined as a set of attributes of a relvar upon which all attributes of the
relvar are functionally dependent.
For example, given:
Employee(Name, Address, Ssn, Employee_Id [primary key], Phone_ext)
<Ssn, Name, Address>
<Ssn, Name>
<Ssn>
All of the above are superkeys.

* Foreign key - a foreign key (FK) is a field or group of fields in a database record that points to
a key field or group of fields forming a key of another database record in some (usually
different) table. Usually a foreign key in one table refers to the primary key (PK) of another
table. This way references can be made to link records in different tables.
For example, a Student record can include School_Reg_no as a foreign key referencing:
School(Name, Address, Phone, School_Reg_no [primary key])
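A minimal SQL sketch of these key types, using the hypothetical Employee example above (the Department table and its columns are added here purely for illustration):

CREATE TABLE Department (
    Dept_Id   INT PRIMARY KEY,
    Dept_Name VARCHAR(50)
);

CREATE TABLE Employee (
    Employee_Id INT PRIMARY KEY,          -- candidate key chosen as primary key
    Ssn         CHAR(9) UNIQUE NOT NULL,  -- alternate key: a candidate key not chosen
    Name        VARCHAR(50),
    Address     VARCHAR(100),
    Phone_ext   CHAR(4),
    Dept_Id     INT REFERENCES Department(Dept_Id)  -- foreign key to another table's primary key
);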

Relational algebra
In computer science, relational algebra is an offshoot of first-order logic and of algebra of sets
concerned with operations over finitary relations, usually made more convenient to work with by
identifying the components of a tuple by a name (called attribute) rather than by a numeric
column index, which is what is called a relation in database terminology.
The main application of relational algebra is providing a theoretical foundation for relational
databases, particularly query languages for such databases, chief among which is SQL.
Introduction
Relational algebra received little attention outside of pure mathematics until the publication of
E.F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for
database query languages. (See section Implementations.)
Both a named and an unnamed perspective are possible for relational algebra, depending on
whether the tuples are endowed with component names or not. In the unnamed perspective, a
tuple is simply a member of a Cartesian product. In the named perspective, tuples are functions
from a finite set U of attributes (of the relation) to a domain of values (assumed distinct from
U). The relational algebras obtained from the two perspectives are equivalent. The typical
undergraduate textbooks present only the named perspective, and these notes follow suit.
Relational algebra is essentially equivalent in expressive power to relational calculus (and thus
first-order logic); this result is known as Codd's theorem. One must be careful to avoid a
mismatch that may arise between the two languages because negation, applied to a formula of
the calculus, constructs a formula that may be true on an infinite set of possible tuples, while the
difference operator of relational algebra always returns a finite result. To overcome these
difficulties, Codd restricted the operands of relational algebra to finite relations only and also
proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are
found in many other logic-based computer languages. Codd defined the term relational
completeness to refer to a language that is complete with respect to first-order predicate calculus
apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the
applicability of his relational algebra for database purposes.
Primitive operations
As in any algebra, some operators are primitive and the others are derived in terms of the
primitive ones. It is useful if the choice of primitive operators parallels the usual choice of
primitive logical operators.
Five primitive operators of Codd's algebra are the selection, the projection, the Cartesian
product (also called the cross product or cross join), the set union, and the set difference.
Another operator, rename, was not noted by Codd, but the need for it is shown by the inventors of
ISBL. These six operators are fundamental in the sense that omitting any one of them causes a
loss of expressive power. Many other operators have been defined in terms of these six. Among
the most important are set intersection, division, and the natural join. In fact ISBL made a
compelling case for replacing the Cartesian product with the natural join, of which the Cartesian
product is a degenerate case.
Altogether, the operators of relational algebra have identical expressive power to that of domain
relational calculus or tuple relational calculus. However, for the reasons given in section
Introduction, relational algebra is less expressive than first-order predicate calculus without
function symbols. Relational algebra corresponds to a subset of first-order logic, namely Horn
clauses without recursion and negation.
Set operators
Although three of the six basic operators are taken from set theory, there are additional
constraints that are present in their relational algebra counterparts: For set union and set
difference, the two relations involved must be union-compatible; that is, the two relations must
have the same set of attributes. Because set intersection can be defined in terms of set difference,
the two relations involved in set intersection must also be union-compatible.
The Cartesian product is defined differently from the one in set theory in the sense that tuples are
considered to be 'shallow' for the purposes of the operation. That is, the Cartesian product of an
n-tuple by an m-tuple has the 2-tuple "flattened" into an (n + m)-tuple. In set theory, the
Cartesian product is a set of 2-tuples. More formally, R × S is defined as follows:
R × S = {(r1, r2, ..., rn, s1, s2, ..., sm) | (r1, r2, ..., rn) ∈ R, (s1, s2, ..., sm) ∈ S}
Like the Cartesian product in set theory, the cardinality of the result is the product of the
cardinalities of its factors, i.e., |R × S| = |R| · |S|. In addition, for the Cartesian product to be defined, the two
relations involved must have disjoint headers; that is, they must not have a common attribute name.
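In SQL the Cartesian product appears as CROSS JOIN. A sketch over two hypothetical tables R and S:

SELECT *
FROM R CROSS JOIN S;
-- if R has n rows and S has m rows, the result has n * m rows;
-- SQL tolerates shared column names (they can be qualified as R.x and S.x),
-- whereas the algebraic product requires the headers to be disjoint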
Projection (π)
A projection is a unary operation written as π_{a1, ..., an}(R), where {a1, ..., an} is a set of
attribute names. The result of such a projection is defined as the set that is obtained when all
tuples in R are restricted to the set {a1, ..., an}.
This specifies the subset of columns (attributes of each tuple) to be retrieved. To obtain
the names and phone numbers from an address book, the projection might be written
π_{contactName, contactPhoneNumber}(addressBook). The result of that projection would be
a relation which contains only the contactName and contactPhoneNumber attributes for each
unique entry in addressBook.
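The SQL counterpart of this projection, over the same hypothetical addressBook relation, would be:

SELECT DISTINCT contactName, contactPhoneNumber
FROM addressBook;
-- DISTINCT mirrors the set semantics of the algebra; plain SELECT would keep duplicate rows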
Selection (σ)
A generalized selection is a unary operation written as σ_φ(R), where φ is a propositional
formula that consists of atoms as allowed in the normal selection and the logical operators
∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be
written as σ_{isFriend = true ∨ isBusinessContact = true}(addressBook). The result would be a
relation containing every attribute of every unique record where isFriend is true or where
isBusinessContact is true.
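In SQL the same selection is a WHERE predicate (again over the hypothetical addressBook; how boolean attributes are stored varies by DBMS):

SELECT *
FROM addressBook
WHERE isFriend = TRUE OR isBusinessContact = TRUE;
-- some DBMSs store booleans as 0/1 and would use = 1 here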
Rename (ρ)
A rename is a unary operation written as ρ_{a/b}(R), where the result is identical to R except that
the b attribute in all tuples is renamed to an a attribute. This is simply used to rename the
attribute of a relation or the relation itself.
www.5starnotes.com
www.5starnotes.com
w
w
w
.
5
s
t
a
r
n
o
t
e
s
.
c
o
m
To rename the 'isFriend' attribute to 'isBusinessContact' in a relation,
ρ_{isBusinessContact/isFriend}(addressBook) might be used.
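SQL approximates rename with column aliases, which rename an attribute only in the query result, not in the stored relation:

SELECT isFriend AS isBusinessContact,   -- the result's header shows isBusinessContact
       contactName,
       contactPhoneNumber
FROM addressBook;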
Domain relational calculus
In computer science, domain relational calculus (DRC) is a calculus that was introduced by
Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data
model.

In DRC, queries have the form:
{⟨X1, X2, ..., Xn⟩ | p(⟨X1, X2, ..., Xn⟩)}
where each Xi is either a domain variable or constant, and p(⟨X1, X2, ..., Xn⟩) denotes a
DRC formula. The result of the query is the set of tuples ⟨X1, ..., Xn⟩ that make the DRC formula
true.
This language uses the same operators as tuple calculus: the logical connectives ∧ (and), ∨ (or)
and ¬ (not). The existential quantifier (∃) and the universal quantifier (∀) can be used to bind the
variables.
Its computational expressiveness is equivalent to that of Relational algebra.

Examples
Let (A, B, C) mean (Rank, Name, ID) in the relation Enterprise, and let (D, E, F) mean
(Name, DeptName, ID) in the relation Department.
Find all captains of the starship USS Enterprise:
{⟨A, B, C⟩ | ⟨A, B, C⟩ ∈ Enterprise ∧ A = "Captain"}
In this example, A, B, C denotes both the result set and a set in the table Enterprise.
Find names of Enterprise crewmembers who are in Stellar Cartography:
{⟨B⟩ | ∃A, C (⟨A, B, C⟩ ∈ Enterprise ∧ ∃D, E, F (⟨D, E, F⟩ ∈ Department ∧ F = C ∧ E = "Stellar Cartography"))}
In this example, we are only looking for the name, and that is B. F = C is a requirement, because
we need to find Enterprise crew members AND they are in the Stellar Cartography Department.
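The same query rendered in SQL, assuming hypothetical tables Enterprise(Rank, Name, ID) and Department(Name, DeptName, ID):

SELECT e.Name
FROM Enterprise e
JOIN Department d ON d.ID = e.ID          -- the F = C requirement
WHERE d.DeptName = 'Stellar Cartography'; -- the E = "Stellar Cartography" requirement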
Tuple relational calculus

Tuple calculus is a calculus that was introduced by Edgar F. Codd as part of the relational
model, in order to provide a declarative database-query language for this data model. It formed
the inspiration for the database-query languages QUEL and SQL, of which the latter, although
far less faithful to the original relational model and calculus, is now the de-facto-standard
database-query language; viz., a dialect of SQL is used by nearly every relational-database-
management system. Lacroix and Pirotte proposed domain calculus, which is closer to first-order
logic; both of these calculi (as well as relational algebra) have been shown to be equivalent in
expressive power. Subsequently, query languages for the relational model were called
relationally complete if they could express at least all of these queries.

Definition of the calculus
Relational database
Since the calculus is a query language for relational databases we first have to define a relational
database. The basic relational building block is the domain, or data type. A tuple is an ordered
multiset of attributes, which are ordered pairs of domain and value; or just a row. A relvar
(relation variable) is a set of ordered pairs of domain and name, which serves as the header for a
relation. A relation is a set of tuples. Although these relational concepts are mathematically
defined, those definitions map loosely to traditional database concepts. A table is an accepted
visual representation of a relation; a tuple is similar to the concept of row.
We first assume the existence of a set C of column names, examples of which are "name",
"author", "address" et cetera. We define headers as finite subsets of C. A relational database
schema is defined as a tuple S = (D, R, h) where D is the domain of atomic values (see relational
model for more on the notions of domain and atomic value), R is a finite set of relation names,
and
h : R → 2^C
a function that associates a header with each relation name in R. (Note that this is a simplification
from the full relational model where there is more than one domain and a header is not just a set
of column names but also maps these column names to a domain.) Given a domain D we define
a tuple over D as a partial function
t : C → D
that maps some column names to an atomic value in D. An example would be (name : "Harry",
age : 25).
The set of all tuples over D is denoted as T_D. The subset of C for which a tuple t is defined is
called the domain of t (not to be confused with the domain in the schema) and denoted as dom(t).
Finally we define a relational database given a schema S = (D, R, h) as a function
db : R → 2^(T_D)
that maps the relation names in R to finite subsets of T_D, such that for every relation name r in R
and tuple t in db(r) it holds that
dom(t) = h(r).
The latter requirement simply says that all the tuples in a relation should contain the same
column names, namely those defined for it in the schema.
Fundamental and additional operations
SQL or Structured Query Language is a special-purpose programming language designed for
managing data in relational database management systems (RDBMS).
Originally based upon relational algebra and tuple relational calculus, its scope includes data
insert, query, update and delete, schema creation and modification, and data access control.
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as
described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data
Banks".
Despite not adhering to the relational model as described by Codd, it became the most
widely used database language.
Although SQL is often described as, and to a great extent is,
a declarative language, it also includes procedural elements. SQL became a standard of the
American National Standards Institute (ANSI) in 1986, and of the International Organization for
Standardization (ISO) in 1987. Since then, the standard has been enhanced several times with added
features. However, issues of SQL code portability between major RDBMS products still exist
due to lack of full compliance with, or different interpretations of, the standard. Among the
reasons mentioned are the large size and incomplete specification of the standard, as well as
vendor lock-in.
SQL fundamentals
Language elements
The SQL language is subdivided into several language elements, including:
Clauses, which are constituent components of statements and queries. (In some cases,
these are optional.)

Expressions, which can produce either scalar values or tables consisting of columns and
rows of data.
Predicates, which specify conditions that can be evaluated to SQL three-valued logic
(3VL) or Boolean (true/false/unknown) truth values and which are used to limit the
effects of statements and queries, or to change program flow.
Queries, which retrieve the data based on specific criteria. This is the most important
element of SQL.
Statements, which may have a persistent effect on schemata and data, or which may
control transactions, program flow, connections, sessions, or diagnostics.
o SQL statements also include the semicolon (";") statement terminator. Though not
required on every platform, it is defined as a standard part of the SQL grammar.
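A single statement showing several of these elements together (table and column names are hypothetical; date-literal syntax varies slightly by DBMS):

SELECT dept, AVG(salary) AS avg_salary     -- expressions producing scalar values
FROM employees                             -- FROM clause naming the source table
WHERE hire_date >= DATE '2010-01-01'       -- predicate limiting the rows
GROUP BY dept                              -- clause forming groups
HAVING AVG(salary) > 50000;                -- predicate over groups; ';' terminates the statement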
Integrity
In computing, data integrity refers to maintaining and assuring the accuracy and consistency of
data over its entire life-cycle,
and is an especially important feature of a database or RDBMS
system. Data warehousing and business intelligence in general demand the accuracy, validity and
correctness of data despite hardware failures, software bugs or human error. Data that has
integrity is identically maintained during any operation, such as transfer, storage or retrieval.
All characteristics of data, including business rules, rules for how pieces of data relate, dates,
definitions and lineage must be correct for its data integrity to be complete. When functions
operate on the data, the functions must ensure integrity. Examples include transforming the data,
storing history and storing metadata.
Types of integrity constraints
Data integrity is normally enforced in a database system by a series of integrity constraints or
rules. Three types of integrity constraints are an inherent part of the relational data model: entity
integrity, referential integrity and domain integrity:
Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which
states that every table must have a primary key and that the column or columns chosen to be
the primary key should be unique and not null.
Referential integrity concerns the concept of a foreign key. The referential integrity rule states
that any foreign-key value can only be in one of two states. The usual state of affairs is that the
foreign key value refers to a primary key value of some table in the database. Occasionally, and
this will depend on the rules of the data owner, a foreign-key value can be null. In this case we
are explicitly saying that either there is no relationship between the objects represented in the
database or that this relationship is unknown.
Domain integrity specifies that all columns in a relational database must be declared upon a
defined domain. The primary unit of data in the relational data model is the data item. Such
data items are said to be non-decomposable or atomic. A domain is a set of values of the same
type. Domains are therefore pools of values from which actual values appearing in the columns
of a table are drawn.
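These three constraint types map directly onto SQL DDL; a sketch with hypothetical tables:

CREATE TABLE Department (
    Dept_No INT PRIMARY KEY                     -- entity integrity: unique, non-null key
);

CREATE TABLE Employee (
    Emp_No  INT PRIMARY KEY,                    -- entity integrity
    Salary  DECIMAL(9,2) CHECK (Salary >= 0),   -- domain integrity: values drawn from a restricted domain
    Dept_No INT REFERENCES Department(Dept_No)  -- referential integrity: must match a Department or be NULL
);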
If a database supports these features, it is the responsibility of the database to ensure data integrity
as well as the consistency model for the data storage and retrieval. If a database does not support
these features, it is the responsibility of the applications to ensure data integrity while the database
supports the consistency model for the data storage and retrieval.
Having a single, well-controlled, and well-defined data-integrity system increases
stability (one centralized system performs all data integrity operations)
performance (all data integrity operations are performed in the same tier as the consistency
model)
re-usability (all applications benefit from a single centralized data integrity system)
maintainability (one centralized system for all data integrity administration).
As of 2012, since all modern databases support these features (see Comparison of relational
database management systems), it has become the de-facto responsibility of the database to
ensure data integrity. Out-dated and legacy systems that use file systems (text, spreadsheets,
ISAM, flat files, etc.) for their consistency model lack any kind of data-integrity
model. This requires organizations to invest a large amount of time, money, and personnel in
building data-integrity systems on a per-application basis that effectively just duplicate the
existing data integrity systems found in modern databases. Many companies, and indeed many
database systems themselves, offer products and services to migrate out-dated and legacy
systems to modern databases to provide these data-integrity features. This offers organizations
substantial savings in time, money, and resources because they do not have to develop per-
application data-integrity systems that must be re-factored each time business requirements
change.
Trigger
A database trigger is procedural code that is automatically executed in response to certain
events on a particular table or view in a database. The trigger is mostly used for maintaining the
integrity of the information on the database. For example, when a new record (representing a
new worker) is added to the employees table, new records should also be created in the taxes,
vacations and salaries tables.
Triggers in Microsoft SQL Server
Microsoft SQL Server supports triggers either after or instead of an insert, update or delete
operation. They can be set on tables and views with the constraint that a view can be referenced
only by an INSTEAD OF trigger.
Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers,
which can fire in reaction to a very wide range of events, including:
Drop table
Create table
Alter table
Login events
A full list is available on MSDN.
Performing conditional actions in triggers (or testing data following modification) is done
through accessing the temporary Inserted and Deleted tables.
Security
SQL Server 2012
By default, both DML and DDL triggers execute under the context of the user that calls the
trigger. The caller of a trigger is the user that executes the statement that causes the trigger to
run. For example, if user Mary executes a DELETE statement that causes DML trigger
DML_trigMary to run, the code inside DML_trigMary executes in the context of the user
privileges for Mary. This default behavior can be exploited by users who want to introduce
malicious code in the database or server instance. For example, the following DDL trigger is
created by user JohnDoe:
CREATE TRIGGER DDL_trigJohnDoe
ON DATABASE
FOR ALTER_TABLE
AS
GRANT CONTROL SERVER TO JohnDoe ;
GO
What this trigger means is that as soon as a user that has permission to execute a GRANT
CONTROL SERVER statement, such as a member of the sysadmin fixed server role, executes
an ALTER TABLE statement, JohnDoe is granted CONTROL SERVER permission. In other
words, although JohnDoe cannot grant CONTROL SERVER permission to himself, he enabled
the trigger code that grants him this permission to execute under escalated privileges. Both DML
and DDL triggers are open to this kind of security threat.
Advanced SQL features
Simple Features (officially Simple Feature Access) is both an OpenGIS and ISO standard (ISO
19125) that specifies a common storage model of mostly two-dimensional geographical data
(point, line, polygon, multi-point, multi-line, etc.)
The ISO 19125 standard comes in two parts. Part one, ISO 19125-1 (SFA-CA for "common
architecture"), defines a model for two-dimensional simple features, with linear interpolation
between vertices. The data model defined in SFA-CA is a hierarchy of classes. This part also
defines representation using Well-Known Text (and Binary). Part 2 of the standard, ISO 19125-2
(SFA-SQL), defines an implementation using SQL.
The OpenGIS standard(s) cover
implementations in CORBA and OLE/COM as well, although these have lagged behind the SQL
one and are not standardized by ISO.
The ISO/IEC 13249-3 SQL/MM Spatial extends the Simple Features data model mainly with
circular interpolations (e.g. circular arcs) and adds other features like coordinate transformations
and methods for validating geometries as well as Geography Markup Language support.
Embedded SQL
Embedded SQL is a method of combining the computing power of a programming language
and the database manipulation capabilities of SQL. Embedded SQL statements are SQL
statements written inline with the program source code of the host language. The embedded SQL
statements are parsed by an embedded SQL preprocessor and replaced by host-language calls to
a code library. The output from the preprocessor is then compiled by the host compiler. This
allows programmers to embed SQL statements in programs written in any number of languages
such as: C/C++, COBOL and Fortran.
The ANSI SQL standards committee defined the embedded SQL standard in two steps: a
formalism called Module Language was defined, then the embedded SQL standard was derived
from Module Language.
The SQL standard defines embedding of SQL as embedded SQL and
the language in which SQL queries are embedded is referred to as the host language. A popular
host language is C. The mixed C and embedded SQL is called Pro*C in Oracle and Sybase
database management systems. In the PostgreSQL database management system this
precompiler is called ECPG. Other embedded SQL precompilers are Pro*Ada, Pro*COBOL,
Pro*FORTRAN, Pro*Pascal, and Pro*PL/I.
PL/SQL
PL/SQL supports variables, conditions, loops and exceptions. Arrays are also supported, though
in a somewhat unusual way, involving the use of PL/SQL collections. PL/SQL collections is a
slightly advanced topic.
Implementations from version 8 of Oracle Database onwards have included features associated
with object-orientation.
Dynamic SQL
Once the program units have been stored into the database, they become available for execution
at a later time.
While programmers can readily embed Data Manipulation Language (DML) statements directly
into their PL/SQL code using straightforward SQL statements, Data Definition Language (DDL)
requires more complex "Dynamic SQL" statements to be written in the PL/SQL code. However,
DML statements underpin the majority of PL/SQL code in typical software applications.
In the case of PL/SQL dynamic SQL, early versions of the Oracle Database required the use of a
complicated Oracle DBMS_SQL package library. More recent versions have however
introduced a simpler "Native Dynamic SQL", along with an associated EXECUTE
IMMEDIATE syntax.
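A small PL/SQL sketch of Native Dynamic SQL (the table and column names are hypothetical):

BEGIN
    -- DDL cannot appear inline in PL/SQL, so it is executed dynamically
    EXECUTE IMMEDIATE 'CREATE TABLE audit_log (id NUMBER, note VARCHAR2(100))';
    -- dynamic DML with bind variables supplied through USING
    EXECUTE IMMEDIATE 'INSERT INTO audit_log (id, note) VALUES (:1, :2)'
        USING 1, 'table created';
END;
/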
Oracle Corporation customarily extends package functionality with each successive release of
the Oracle Database.
Introduction to distributed databases and client/server databases.
A distributed database is a database in which storage devices are not all attached to a common
processing unit such as the CPU. It may be stored in multiple computers located in the same
physical location, or may be dispersed over a network of interconnected computers. Unlike
parallel systems, in which the processors are tightly coupled and constitute a single database
system, a distributed database system consists of loosely coupled sites that share no physical
components.
Collections of data (e.g. in a database) can be distributed across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or
extranets, or on other company networks. The replication and distribution of databases improves
database performance at end-user worksites.

To ensure that distributed databases are up to date and current, there are two processes:
replication and duplication. Replication involves using specialized software that looks for
changes in the distributed database. Once the changes have been identified, the replication
process makes all the databases look the same. The replication process can be very complex and
time consuming depending on the size and number of the distributed databases. This process can
also require a lot of time and computer resources. Duplication, on the other hand, is not as
complicated. It basically identifies one database as a master and then duplicates that database.
The duplication process is normally done at a set time after hours. This is to ensure that each
distributed location has the same data. In the duplication process, changes to the master database
only are allowed. This is to ensure that local data will not be overwritten. Both of these processes
can keep the data current in all distributed locations.

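
One hedged, Oracle-flavored sketch of the duplication idea uses a materialized view refreshed nightly from the master site; the database link master_site and the customer table are assumptions for illustration.

CREATE MATERIALIZED VIEW customer_copy
   REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1  -- full nightly copy
   AS SELECT * FROM customer@master_site;

Changes are made only at the master; each site's copy is overwritten wholesale at refresh time, matching the description above.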
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. Their implementation depends on the needs of the business and the sensitivity/confidentiality of the data to be stored, and hence on the price the business is willing to pay to ensure data security, consistency and integrity.

UNIT III DATABASE DESIGN

Functional dependencies
In relational database theory, a functional dependency is a constraint between two sets of
attributes in a relation from a database.
Given a relation R, a set of attributes X in R is said to functionally determine another set of attributes Y, also in R, (written X → Y) if, and only if, each X value is associated with precisely one Y value; R is then said to satisfy the functional dependency X → Y. Equivalently, the projection is a function, i.e. Y is a function of X.[1][2] In simple words, if the values for the X attributes are known (say they are x), then the values for the Y attributes corresponding to x
can be determined by looking them up in any tuple of R containing x. Customarily X is called the
determinant set and Y the dependent set. A functional dependency FD: X → Y is called trivial if Y is a subset of X.
The determination of functional dependencies is an important part of designing databases in the relational model, and in database normalization and denormalization. A simple application of functional dependencies is Heath's theorem; it says that a relation R over an attribute set U and satisfying a functional dependency X → Y can be safely split in two relations having the lossless-join decomposition property, namely into its projections on XY and on XZ, where Z = U - XY is the rest of the attributes. (Unions of attribute sets are customarily denoted by mere juxtapositions in database theory.) An important notion in this context is a candidate key, defined as a minimal set of attributes that functionally determine all of the attributes in a relation. The functional dependencies, along with the attribute domains, are selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the system as possible.
A notion of logical implication is defined for functional dependencies in the following way: a set Σ of functional dependencies logically implies another set Γ of dependencies if any relation R satisfying all dependencies from Σ also satisfies all dependencies from Γ; this is usually written Σ ⊨ Γ. The notion of logical implication for functional dependencies admits a sound and complete finite axiomatization, known as Armstrong's axioms.
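
For reference, these axioms are the following three inference rules, where X, Y, Z stand for attribute sets and XZ abbreviates X ∪ Z:

\begin{align*}
&\text{Reflexivity:}  && Y \subseteq X \implies X \to Y\\
&\text{Augmentation:} && X \to Y \implies XZ \to YZ\\
&\text{Transitivity:} && X \to Y,\ Y \to Z \implies X \to Z
\end{align*}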
Non-loss decomposition
A decomposition of a relation R into projections R1, ..., Rn is called non-loss (or lossless-join) if the natural join of R1, ..., Rn always reconstructs R exactly, producing no spurious tuples and losing none. Heath's theorem above gives the standard sufficient condition: if R satisfies X → Y, then splitting R into its projections on XY and on XZ (where Z is the rest of the attributes) is a non-loss decomposition.
To store data efficiently and eliminate redundancy, tables in a database are designed and created to be in one of five possible normal forms. Each normal form contains and enforces the rules of the previous form and, in turn, applies some stricter rules to the design of tables.

A set of tables in a database is initially said to be in zero normal form (0NF).


First Normal Form:
Tables are said to be in first normal form when:
- The table has a primary key.
- No single attribute (column) has multiple values.
- The non-key attributes (columns) depend on the primary key.

An example of placing a table in first normal form:

author_id:  stories:
000024      novelist, playwright           // multiple values
000034      magazine columnist
002345      novella, newspaper columnist   // multiple values

In first normal form the table would look like:

author_id:  stories:
000024      novelist
000024      playwright
000034      magazine columnist
002345      novella
002345      newspaper columnist
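
One way to realize the first-normal-form version in SQL; the types and names are illustrative assumptions:

CREATE TABLE author_story (
    author_id CHAR(6)     NOT NULL,
    story     VARCHAR(40) NOT NULL,
    PRIMARY KEY (author_id, story)  -- one row per atomic value
);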


Second Normal Form:
=================
Tables are said to be in second normal form when:
- The tables meet the criteria for first normal form.
- If the primary key is a composite of attributes (contains multiple
columns), the non-key attributes (columns) must depend on the whole
key.

Note: Any table with a primary key that is composed of a single
attribute (column) is automatically in second normal form. An illustrative decomposition for a composite-key case follows.
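
A hedged sketch of a second-normal-form decomposition (all names are illustrative): in the first table, product_name depends on product_id alone, which is only part of the composite key, violating 2NF; the decomposition moves it out.

-- Not in 2NF: product_name depends on product_id alone
CREATE TABLE order_item (
    order_id     INT,
    product_id   INT,
    product_name VARCHAR(40),   -- partial dependency on the key
    quantity     INT,
    PRIMARY KEY (order_id, product_id)
);

-- 2NF decomposition
CREATE TABLE product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(40)
);
CREATE TABLE order_item_2nf (
    order_id   INT,
    product_id INT REFERENCES product (product_id),
    quantity   INT,
    PRIMARY KEY (order_id, product_id)
);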

Third Normal Form:
================
Tables are said to be in third normal form when:
- The tables meet the criteria for second normal form.
- No non-key attribute (column) depends on another non-key attribute;
that is, there are no transitive dependencies on the primary key
(see the sketch below).
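
An illustrative sketch (names assumed): dept_name depends on dept_id, which in turn depends on emp_id, a transitive dependency that 3NF removes.

-- Not in 3NF: dept_name depends on the non-key column dept_id
CREATE TABLE employee (
    emp_id    INT PRIMARY KEY,
    dept_id   INT,
    dept_name VARCHAR(40)
);

-- 3NF decomposition
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,
    dept_name VARCHAR(40)
);
CREATE TABLE employee_3nf (
    emp_id  INT PRIMARY KEY,
    dept_id INT REFERENCES department (dept_id)
);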
Dependency preservation
A decomposition of a relation R is dependency-preserving if every functional dependency of R can be enforced by checking the decomposed tables individually, without having to join them back together. Together with the non-loss property, dependency preservation is a standard goal of normalization.
Databases can be described as all of the following:
Information: a sequence of symbols that can be interpreted as a message. Information can be recorded as signs, or transmitted as signals.
Data: values of qualitative or quantitative variables, belonging to a set of items. Data in computing (or data processing) are often represented by a combination of items organized in rows and multiple variables organized in columns. Data are typically the results of measurements and can be visualised using graphs or images.
Computer data: information in a form suitable for use with a computer. Data is often distinguished from programs. A program is a sequence of instructions that detail a task for the computer to perform. In this sense, data is everything in software that is not program code.

Boyce/Codd normal form
Boyce-Codd normal form (BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]
Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in 1971.[2] Date writes:
"Since that definition predated Boyce and Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal form. But it isn't."[3]

If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although other types of redundancy may still exist. A relational schema R is in Boyce-Codd normal form if and only if for every one of its dependencies X → Y, at least one of the following conditions holds:[4]
- X → Y is a trivial functional dependency (Y ⊆ X)
- X is a superkey for schema R
A classic illustration follows.
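
A classic sketch (names assumed): suppose each tutor teaches exactly one course (tutor → course), while the table's candidate keys are (student, course) and (student, tutor). The dependency tutor → course then has a determinant that is not a superkey, so the table is in 3NF but not BCNF.

-- Violates BCNF: tutor -> course, but tutor is not a superkey
CREATE TABLE tuition (
    student VARCHAR(20),
    course  VARCHAR(20),
    tutor   VARCHAR(20),
    PRIMARY KEY (student, course)
);

-- BCNF decomposition on tutor -> course
CREATE TABLE tutor_course (
    tutor  VARCHAR(20) PRIMARY KEY,
    course VARCHAR(20)
);
CREATE TABLE student_tutor (
    student VARCHAR(20),
    tutor   VARCHAR(20) REFERENCES tutor_course (tutor),
    PRIMARY KEY (student, tutor)
);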
Multivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in
a relation.
In contrast to the functional dependency, the multivalued dependency requires that certain
tuples be present in a relation. Therefore, a multivalued dependency is a special case of tuple-
generating dependency. The multivalued dependency plays a role in the 4NF database
normalization.
A multivalued dependency is a special case of a join dependency, with only two sets of values
involved, i.e. it is a 2-ary join dependency.
Formal definition
The formal definition is given as follows.[1]

Let R be a relation schema and let α ⊆ R and β ⊆ R (subsets of the attributes). The multivalued dependency

α ↠ β

(which can be read as "α multidetermines β") holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R - β] = t2[R - β]
t4[β] = t2[β]
t4[R - β] = t1[R - β]

In simpler words, the above condition can be expressed as follows: if we denote by (x, y, z) the tuple having values for α, β, R - α - β collectively equal to x, y, z correspondingly, then whenever the tuples (a, b, c) and (a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.
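
As an illustrative sketch (table and values assumed): in a table cta(course, teacher, book), if the teachers of a course and its recommended books vary independently, then course ↠ teacher and course ↠ book hold.

CREATE TABLE cta (
    course  VARCHAR(20),
    teacher VARCHAR(20),
    book    VARCHAR(20),
    PRIMARY KEY (course, teacher, book)
);
-- If ('DB', 'Smith', 'Ullman') and ('DB', 'Jones', 'Date') are present,
-- course ->> teacher forces ('DB', 'Smith', 'Date') and
-- ('DB', 'Jones', 'Ullman') to be present as well.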
Fourth normal form
Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce-Codd normal form (BCNF). Whereas the second, third, and Boyce-Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a superset thereof.[1]
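
Continuing the cta sketch above, its non-trivial MVDs have the non-superkey determinant course, so the table is not in 4NF; the standard fix splits the independent facts apart (names remain illustrative):

-- 4NF decomposition: each table now records one independent fact
CREATE TABLE course_teacher (
    course  VARCHAR(20),
    teacher VARCHAR(20),
    PRIMARY KEY (course, teacher)
);
CREATE TABLE course_book (
    course VARCHAR(20),
    book   VARCHAR(20),
    PRIMARY KEY (course, book)
);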
Join dependencies and fifth normal form.
A join dependency is a constraint on the set of legal relations over a database schema. A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T. If one of the tables in the join has all the attributes of the table T, the join dependency is called trivial.
The join dependency plays an important role in the fifth normal form, also known as project-join normal form, because it can be proven that if you decompose a schema R into tables R1 to Rn, the decomposition will be a lossless-join decomposition if you restrict the legal relations on R to a join dependency on R called *(R1, R2, ..., Rn).
Another way to describe a join dependency is to say that the set of relationships in the join dependency is independent of each other.
Unlike in the case of functional dependencies, there is no sound and complete axiomatization for join dependencies.
Fifth normal form
Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued
facts by isolating semantically related multiple relationships. A table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.
A join dependency *{A, B, ..., Z} on R is implied by the candidate key(s) of R if and only if each of A, B, ..., Z is a superkey for R.[1]

Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a
complex real-world constraint governing the valid combinations of attribute values in the 4NF
table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the
burden of maintaining the logical consistency of the data within the table must be carried partly
by the application responsible for insertions, deletions, and updates to it; and there is a
heightened risk that the data within the table will become inconsistent. In contrast, the 5NF
design excludes the possibility of such inconsistencies.
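
A hedged sketch of the classic agent/company/product case (all names assumed): if the business rule says an agent sells a product for a company exactly when the agent represents the company, the company makes the product, and the agent sells the product, then the ternary table satisfies the join dependency *{(agent, company), (company, product), (agent, product)}, and 5NF stores the three binary relationships separately.

CREATE TABLE agent_company (
    agent   VARCHAR(20),
    company VARCHAR(20),
    PRIMARY KEY (agent, company)
);
CREATE TABLE company_product (
    company VARCHAR(20),
    product VARCHAR(20),
    PRIMARY KEY (company, product)
);
CREATE TABLE agent_product (
    agent   VARCHAR(20),
    product VARCHAR(20),
    PRIMARY KEY (agent, product)
);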
UNIT IV TRANSACTIONS
Transaction concepts
A transaction comprises a unit of work performed within a database management system (or
similar system) against a database, and treated in a coherent and reliable way independent of
other transactions. Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a
database consistent even in cases of system failure, when execution stops (completely or
partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this
isolation is not provided, the programs' outcomes may be erroneous.
A database transaction, by definition, must be atomic, consistent, isolated and durable.[1]
Database practitioners often refer to these properties of database transactions using the acronym
ACID.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a
database must either complete in its entirety or have no effect whatsoever. Further, the system
must isolate each transaction from other transactions, results must conform to existing
constraints in the database, and transactions that complete successfully must get written to
durable storage.
Transaction recovery
A transactional database is a DBMS where write transactions on the database are able to be
rolled back if they are not completed properly (e.g. due to power or connectivity loss).
Most modern relational database management systems fall into the category of databases that
support transactions.
In a database system a transaction might consist of one or more data-manipulation statements
and queries, each reading and/or writing information in the database. Users of database systems
consider consistency and integrity of data as highly important. A simple transaction is usually
issued to the database system in a language like SQL wrapped in a transaction, using a pattern
similar to the following:
1. Begin the transaction
2. Execute a set of data manipulations and/or queries
3. If no errors occur then commit the transaction and end it
4. If errors occur then rollback the transaction and end it
If no errors occurred during the execution of the transaction, the system commits the transaction. A transaction commit operation applies all data manipulations within the scope of the transaction and persists the results to the database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database, since that would leave the database in an inconsistent state. A minimal sketch of this pattern follows.
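
A minimal SQL sketch of the pattern (the account table and amounts are assumptions; exact transaction syntax varies by DBMS):

BEGIN TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE acct_no = 1;
UPDATE account SET balance = balance + 100 WHERE acct_no = 2;
COMMIT;     -- on any error, issue ROLLBACK instead

Either both updates persist or neither does; a partial transfer can never be committed.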
Internally, multi-user databases store and process transactions, often by using a transaction ID or
XID.
There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are similar but have a few extra properties. Another type of transaction is the compensating transaction.
ACID properties
In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.
Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to achieve them automatically.[1] In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.
Atomicity
Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the
entire transaction fails, and the database state is left unchanged. An atomic system must
guarantee atomicity in each and every situation, including power failures, errors, and crashes.
Consistency
The consistency property ensures that any transaction will bring the database from one valid state
to another. Any data written to the database must be valid according to all defined rules,
including but not limited to constraints, cascades, triggers, and any combination thereof.
Isolation

The isolation property ensures that the concurrent execution of transactions results in a system state that could have been obtained if the transactions had been executed serially, i.e. one after the other. Each transaction has to execute in total isolation: if T1 and T2 are being executed concurrently, then each of them should remain unaware of the other's presence.
Durability
Durability means that once a transaction has been committed, it will remain so, even in the event
of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL
statements execute, the results need to be stored permanently (even if the database crashes
immediately thereafter).
System recovery
After a system failure, the recovery component restores the database to a consistent state: using the log, the effects of committed transactions are redone where needed and the effects of uncommitted transactions are undone, preserving atomicity and durability.
UNIT V IMPLEMENTATION TECHNIQUES
Overview of Physical Storage Media
Several types of data storage exist in most computer systems. They vary in speed of access, cost
per unit of data, and reliability.
Cache: the most costly and fastest form of storage. Usually very small and managed by the computer hardware.
Main Memory (MM): the storage area for data available to be operated on.
General-purpose machine instructions operate on main memory.
Contents of main memory are usually lost in a power failure or ``crash''.
Usually too small (even with megabytes) and too expensive to store the entire database.
Flash memory: EEPROM (electrically erasable programmable read-only memory).
Data in flash memory survives power failures.
Reading data from flash memory takes about 10 nanoseconds (roughly as fast as from main memory), while writing data into flash memory is more complicated: a single write takes about 4-10 microseconds.
To overwrite what has been written, one has to first erase the entire bank of the memory; the memory may support only a limited number of erase cycles.
Flash memory has found popularity as a replacement for disks for storing small volumes of data (5-10 megabytes).
Magnetic-disk storage: primary medium for long-term storage.
Typically the entire database is stored on disk.
Data must be moved from disk to main memory in order for the data to be operated on.
After operations are performed, data must be copied back to disk if any changes were made.
Disk storage is called direct access storage as it is possible to read data on the disk in any order
(unlike sequential access).
Disk storage usually survives power failures and system crashes.
Optical storage: CD-ROM (compact-disk read-only memory), WORM (write-once read-many)
disk (for archival storage of data), and Juke box (containing a few drives and numerous disks
loaded on demand).
Tape Storage: used primarily for backup and archival data.
Cheaper, but much slower access, since tape must be read sequentially from the beginning.
Used as protection from disk failures!
The storage-device hierarchy is presented in Figure 10.1; the higher levels are more expensive per bit and faster to access, but smaller in capacity.
[Figure: Storage-device hierarchy]
Another classification: Primary, secondary, and tertiary storage.
Primary storage: the fastest storage media, such as cache and main memory.
Secondary (or on-line) storage: the next level of the hierarchy, e.g., magnetic disks.
Tertiary (or off-line) storage: magnetic tapes and optical disk juke boxes.
Volatility of storage. Volatile storage loses its contents when the power is removed. Without
power backup, data in the volatile storage (the part of the hierarchy from main memory up) must
be written to nonvolatile storage for safekeeping.
Magnetic Disks

Magnetic storage and magnetic recording are terms from engineering referring to the storage
of data on a magnetized medium. Magnetic storage uses different patterns of magnetization in a
magnetizable material to store data and is a form of non-volatile memory. The information is
accessed using one or more read/write heads. As of 2011, magnetic storage media, primarily
hard disks, are widely used to store computer data as well as audio and video signals. In the field
of computing, the term magnetic storage is preferred and in the field of audio and video
production, the term magnetic recording is more commonly used. The distinction is less
technical and more a matter of preference. Other examples of magnetic storage media include
floppy disks, magnetic recording tape, and magnetic stripes on credit cards.
Information is written to and read from the storage medium as it moves past devices called read-and-write heads that operate very close (often tens of nanometers) above the magnetic surface. The read-and-write head is used to detect and modify the magnetization of the material immediately under it.
The magnetic surface is conceptually divided into many small sub-micrometer-sized magnetic regions, referred to as magnetic domains (although these are not magnetic domains in a rigorous physical sense), each of which has a mostly uniform magnetization. Due to the polycrystalline nature of the magnetic material, each of these magnetic regions is composed of a few hundred magnetic grains. Magnetic grains are typically 10 nm in size and each forms a single true magnetic domain. Each magnetic region in total forms a magnetic dipole which generates a magnetic field. In older hard disk drive (HDD) designs the regions were oriented horizontally and parallel to the disk surface, but beginning about 2005 the orientation was changed to perpendicular to allow for closer magnetic domain spacing.
For reliable storage of data, the recording material needs to resist self-demagnetization, which occurs when the magnetic domains repel each other. Magnetic domains written too densely together to a weakly magnetizable material will degrade over time due to rotation of the magnetic moment of one or more domains to cancel out these forces. The domains rotate sideways to a halfway position that weakens the readability of the domain and relieves the magnetic stresses. Older hard disk drives used iron(III) oxide as the magnetic material, but current disks use a cobalt-based alloy.[1]

A write head magnetizes a region by generating a strong local magnetic field, and a read head
detects the magnetization of the regions. Early HDDs used an electromagnet both to magnetize
the region and to then read its magnetic field by using electromagnetic induction. Later versions
of inductive heads included metal in Gap (MIG) heads and thin film heads. As data density
increased, read heads using magnetoresistance (MR) came into use; the electrical resistance of
the head changed according to the strength of the magnetism from the platter. Later development
made use of spintronics; in read heads, the magnetoresistive effect was much greater than in
earlier types, and was dubbed "giant" magnetoresistance (GMR). In today's heads, the read and
write elements are separate, but in close proximity, on the head portion of an actuator arm. The
read element is typically magneto-resistive while the write element is typically thin-film
inductive.[2]
The heads are kept from contacting the platter surface by the air that is extremely close to the
platter; that air moves at or near the platter speed. The record and playback head are mounted on
a block called a slider, and the surface next to the platter is shaped to keep it just barely out of
contact. This forms a type of air bearing.
RAID
RAID (redundant array of independent disks, originally redundant array of inexpensive disks[1][2]) is a storage technology that combines multiple disk drive components into a logical unit. Data is distributed across the drives in one of several ways called "RAID levels", depending
on what level of redundancy and performance (via parallel communication) is required. In October 1986, the IBM S/38 announced "checksum", an implementation of RAID realized entirely in software within the operating system, with a minimum of 10% overhead. The S/38 "scatter loaded" all data for performance; the downside was that the loss of any single disk required a total system restore for all disks. Under checksum, when a disk failed, the system halted and was then shut down; under maintenance, the bad disk was replaced and a parity-bit disk recovery was run. The system was restarted using a recovery procedure similar to the one run after a power failure. While difficult, recovery from a drive failure was much shorter and easier than without checksum.
RAID is an example of storage virtualization and was first defined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.[3] Marketers representing industry RAID manufacturers later attempted to reinvent the term to describe a redundant array of independent disks as a means of disassociating a low-cost expectation from RAID technology.[4]
RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be "in a RAID", although the more common, incorrect parlance is to say that they are "in a RAID array".[5] The array can then be accessed by the operating system as one single drive. The different schemes or architectures are named by the word RAID followed by a number (e.g., RAID 0, RAID 1). Each scheme provides a different balance between three key goals: resiliency, performance, and capacity.
Tertiary Storage
Tertiary storage or tertiary memory[4] provides a third level of storage. Typically it involves a robotic mechanism which will mount (insert) and dismount removable mass storage media into a storage device according to the system's demands; these data are often copied to secondary storage before use. It is primarily used for archiving rarely accessed information, since it is much slower than secondary storage (e.g. 5-60 seconds vs. 1-10 milliseconds). This is primarily useful for extraordinarily large data stores, accessed without human operators. Typical examples include tape libraries and optical jukeboxes.
When a computer needs to read information from the tertiary storage, it will first consult a
catalog database to determine which tape or disc contains the information. Next, the computer
will instruct a robotic arm to fetch the medium and place it in a drive. When the computer has
finished reading the information, the robotic arm will return the medium to its place in the
library.
File Organization
The arrangement of records in a file is known as file organization. It deals with the arrangement of data items in a file on a secondary storage device. The main objectives of file organization are as follows:
1. To provide an efficient method to locate records needed for processing.
2. To facilitate file creation and its subsequent updating.
For organizing records efficiently in the form of a computer file, the following things are important:
1. A logical method should be observed to organize records in a file.
2. File structure should be so designed that it would allow quick access to needed data
items.
3. Means of adding or deleting data items or records from files must be present.
Organization of Records in Files
A file may be organized as:
1. Serial file
2. Sequential file
3. Direct file
4. Indexed-sequential file

Serial file
In a serial file the records are placed one after another; however, there is no specific order in their arrangement. On a magnetic tape storage device, records are stored in a serial manner.

Sequential file
A sequential file is a file in which the records are stored in some order, i.e. arranged in either ascending or descending order of a key field.

Direct access
A sequential file is not suitable for on-line enquiry, because in sequential access we have to traverse the records from the beginning to the end. Random access (direct) file organization is best suited for on-line processing systems, where up-to-date information is always required.

Index-sequential file
Index-sequential files are also known as indexed sequential access method (ISAM) files. When files must support both batch and on-line processing, ISAM files are used. These files are basically sequential files organized serially on key fields. In addition, an index is maintained which speeds up the access of isolated records. The file is divided into a number of blocks and the highest key in each block is indexed. Within each block the records are searched sequentially. This method is much faster than searching the entire file sequentially.
