
DATA STORAGE, RETRIEVAL AND DATA BASE MANAGEMENT SYSTEMS

Data

Data are raw facts, observations, assumptions or occurrences about physical
phenomena or business transactions.
They are objective measurements of the attributes of entities such as people, places, things and
events.
Data is a collection of facts which is unorganized but can be organized into useful
information.
Data should be accurate, but need not be relevant, timely or concise.
It can exist in different forms, e.g. picture, text, sound, or all of these together.

CONCEPTS RELATED TO DATA

Double Precision: Real data values are commonly called single precision data because each
real constant is stored in a single memory location. This usually gives seven significant digits
for each real value. In many calculations, particularly those involving iteration or long
sequences of calculations, single precision is not adequate to express the precision required.
To overcome this limitation, many programming languages provide the double precision data
type. Each double precision value is stored in two memory locations, thus providing twice as many
significant digits.
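The difference is easy to demonstrate. The sketch below is a minimal illustration in Python, using the standard struct module to force a 4-byte encoding (Python's own floats are already double precision):

```python
import struct

value = 1.0 / 3.0  # a repeating fraction that no binary float stores exactly

# Round-trip the value through a 4-byte (single precision) and an
# 8-byte (double precision) IEEE-754 encoding.
single = struct.unpack('f', struct.pack('f', value))[0]
double = struct.unpack('d', struct.pack('d', value))[0]

print(f"single: {single:.17f}")  # accurate to about 7 significant digits
print(f"double: {double:.17f}")  # accurate to about 15-16 significant digits
```

The single precision round-trip loses information after about seven digits, exactly as described above.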

Logical Data Type: Use the Logical data type when you want an efficient way to store data that
has only two values. Logical data is stored as true (.T.) or false (.F.)

Characters: Choose the Character data type when you want to include letters, numbers,
spaces, symbols, and punctuation. Character fields or variables store text information such as
names, addresses, and numbers that are not used in mathematical calculations. For example,
phone numbers or zip codes, though they include mostly numbers, are actually best used as
Character values.
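The point about phone numbers and ZIP codes can be seen in a short sketch (Python, with a made-up ZIP code): storing such a value as a number silently destroys information.

```python
# A ZIP code stored as a number loses its leading zero;
# stored as character data it is preserved exactly.
zip_as_number = int("02139")
zip_as_text = "02139"

print(zip_as_number)  # 2139 -- the leading zero is gone
print(zip_as_text)    # 02139
```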

Strings: A data type consisting of a sequence of contiguous characters that represent the
characters themselves rather than their numeric values. A String can include letters, numbers,
spaces, and punctuation. The String data type can store fixed-length strings ranging in length
from 0 to approximately 63K characters and dynamic strings ranging in length from 0 to
approximately 2 billion characters. The dollar sign ($) type-declaration character represents a
String.

A variable is something that may change in value, e.g. the number of words on different pages of a
book.


SREERAM ACADEMY (FORMERLY SREERAM COACHING POINT)

KEY: In a relational database, a key is the means of specifying uniqueness. A database key is an attribute used to sort
and/or identify data in some manner. Each table has a primary key which uniquely identifies its
records. Foreign keys are used to cross-reference data between relational tables.

The primary key of a relational table uniquely identifies each record in the table. It can either
be a normal attribute that is guaranteed to be unique (such as Social Security Number in a
table with no more than one record per person) or it can be generated by the DBMS (such as a
globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys may consist of a
single attribute or multiple attributes in combination.

Examples:
Imagine we have a STUDENTS table that contains a record for each student at a university. The
student's unique student ID number would be a good choice for a primary key in the STUDENTS
table. The student's first and last name would not be a good choice, as there is always the chance
that more than one student might have the same name.
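A minimal sketch of the idea (Python, with invented student data): records are keyed by the unique student ID, so two students with the same name cause no ambiguity.

```python
# STUDENTS "table" keyed by the primary key (the student ID).
students = {
    1001: {"first": "Asha", "last": "Rao", "major": "Physics"},
    1002: {"first": "Asha", "last": "Rao", "major": "History"},  # same name, different student
}

# The primary key gives direct, unambiguous access to one record.
print(students[1002]["major"])
```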

A candidate key is a combination of attributes that can be used to uniquely identify a
database record without any extraneous data. Each table may have one or more candidate
keys. One of these candidate keys is selected as the table's primary key.

Referential integrity: A feature provided by relational database management systems
(RDBMSs) that prevents users or applications from entering inconsistent data. Most RDBMSs
have various referential integrity rules that you can apply when you create a relationship
between two tables.

For example, suppose Table B has a foreign key that points to a field in Table A. Referential
integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In
addition, the referential integrity rules might also specify that whenever you delete a record from
Table A, any records in Table B that are linked to the deleted record will also be deleted. This is
called cascading delete. Finally, the referential integrity rules could specify that whenever you
modify the value of a linked field in Table A, all records in Table B that are linked to it will also be
modified accordingly. This is called cascading update.
Consider the situation where we have two tables: Employees and Managers. The Employees table
has a foreign key attribute entitled Managed By, which points to the record for that employee's
manager in the Managers table. Referential integrity enforces the following three rules:
1. We may not add a record to the Employees table unless the Managed By attribute points
to a valid record in the Managers table.
2. If the primary key for a record in the Managers table changes, all corresponding records in
the Employees table must be modified using a cascading update.
3. If a record in the Managers table is deleted, all corresponding records in the Employees
table must be deleted using a cascading delete.
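The three rules can be sketched in code. The following is a simplified illustration in Python, with hypothetical table contents and function names; it shows the enforcement logic, not how a real RDBMS is implemented.

```python
managers = {1: "Priya", 2: "Ravi"}   # Managers table: primary key -> name
employees = {}                        # Employees table: emp id -> Managed By (foreign key)

def add_employee(emp_id, managed_by):
    # Rule 1: the foreign key must point to a valid Managers record.
    if managed_by not in managers:
        raise ValueError("Managed By must reference an existing manager")
    employees[emp_id] = managed_by

def change_manager_key(old_id, new_id):
    # Rule 2: cascading update of every matching foreign key.
    managers[new_id] = managers.pop(old_id)
    for emp_id, mgr in employees.items():
        if mgr == old_id:
            employees[emp_id] = new_id

def delete_manager(mgr_id):
    # Rule 3: cascading delete of every dependent Employees record.
    for emp_id in [e for e, m in employees.items() if m == mgr_id]:
        del employees[emp_id]
    del managers[mgr_id]
```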

Alternate Key: The alternate keys of any table are simply those candidate keys which are not
currently selected as the primary key. An alternate key is a function of all candidate keys
minus the primary key.


Secondary Key: Secondary keys can be defined for each table to optimize the data access.
They can refer to any column combination and they help to prevent sequential scans over the
table. Like the primary key, the secondary key can consist of multiple columns. A candidate
key which is not selected as a primary key is known as Secondary Key.

Index Fields: Used to store relevant information along with a document.

Currency Fields: The currency field accepts data in dollar form by default.

Date Fields: The date field accepts data entered in date format.

Integer Fields: The integer field accepts data as a whole number.

Text Fields: The text field accepts data as an alpha-numeric text string.

Information
It is the data that has been converted into a meaningful and useful context for specific end
users.
To obtain information, data is aggregated, manipulated and organized, its content
analysed and evaluated, and placed in proper context for human use.
Information exists as reports, in a systematic textual format, or as graphics in an organized
manner.
Information must be relevant, timely, accurate, concise and complete, and should apply to
the current situation.
It should be condensed into useable length.

Data storage hierarchy


Character: It is the basic building block of data which consists of letters,
numeric digits or special characters. These are put together in a FIELD.
Field: It is a meaningful collection of related characters. It is the smallest logical
data entity that is treated as a single unit in data processing. For example, if we
are processing the employee data of a company, we may have:
1. Employee code field
2. Employee name field
3. Hours worked field
4. Hourly pay rate field
5. Tax rate deduction field


Record: Fields are grouped together to form a record. An employee
record would be a collection of the fields of one employee.
Records can be divided into physical and logical records:

Basis        | Physical Record                            | Logical Record
Meaning      | A physical record refers to the actual     | A logical record refers to the way a
             | portion of a medium on which data is       | user views a record. It contains all
             | stored.                                    | the data related to a single item.
Independence | Portions of the same logical record may be | A logical record is independent of
             | located in different physical records, or  | its physical environment.
             | parts of logical records may be located in |
             | one physical record.                       |
Example      | A group of pulses recorded on a magnetic   | It can be a payroll record for an
             | tape or disk, or a series of holes punched | employee, or a record of all the
             | into paper tape.                           | changes made by a customer in a
             |                                            | departmental store.

File: A file is a number of related records that are treated as a unit. For
example, a collection of employee records for one company would be
an employee file.
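The character → field → record → file hierarchy can be sketched as follows (Python, with field names invented from the payroll example above):

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:        # a record: related fields treated as one unit
    code: str                # each field is a group of characters
    name: str
    hours_worked: float
    hourly_rate: float
    tax_rate: float

# a file: a number of related records treated as a unit
employee_file = [
    EmployeeRecord("E001", "Meena", 160, 25.0, 0.10),
    EmployeeRecord("E002", "Arun", 152, 22.5, 0.10),
]
print(len(employee_file), "records in the employee file")
```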

[Diagram: a FILE contains employee records (Employee 1, Employee 2, ...); each
record contains fields (Employee No, Salary, ...); and each field is made up of
characters.]


Transaction File and Master File

Basis         | Master File                                | Transaction File
Data Life     | Contains relatively permanent records for  | Contains temporary data which is to be
              | identification and summarizing statistical | processed in combination with the
              | information.                               | master file.
Content       | Contains current or nearly current data,   | Generally contains information used
              | which is updated regularly.                | for updating master files.
Data Size     | Rarely contains detailed transaction       | Contains detailed data.
              | details.                                   |
Examples      | Product files, customer files, employee    | Purchase orders, job cards, invoices,
              | files, etc.                                | etc.
Access method | Usually maintained on direct access        | Usually maintained on sequential as
              | storage devices.                           | well as direct access storage devices.
Redundancy    | Can never be redundant, as it has to be    | Once the transaction files are used to
              | updated regularly.                         | update the master file, they are no
              |                                            | longer required and are considered
              |                                            | redundant.

File Organization
I. Serial File Organization
Records are arranged one after the other in no particular order, other than the
chronological order in which records are added to the file. This type of organization is
commonly found with transaction data, where records are created in a file in the order
in which transactions take place.
II. Sequential File Organization
1. In sequential file, records are stored one after another in an ascending or
descending order determined by the key field of the records.
2. In Payroll example, the records of the employee file may be organized
sequentially by employee code sequence.


3. Sequentially organized files that are processed by computer systems are
normally stored on storage media such as magnetic tape, punched paper tape,
punched cards or magnetic disks.
4. To access these records, the computer must read the file in sequence from the
beginning. The first record is read and processed first, then the second record
in the file sequence, and so on. To locate a particular record, the computer
program must read in each record in sequence and compare its key field to the
one that is needed. The retrieval search ends only when desired key matches
with the key field of the currently read record.
Merits:
Simple to understand.
Only the record key is required to locate a record.
Efficient and economical if the activity rate is high, i.e. a large proportion of
file records is processed in each run.
Inexpensive I/O devices may be used.
Reconstruction of files is relatively easy, since a built-in backup is usually
available.
Demerits:
Even at a low activity rate, the entire file is processed.
Transactions must be sorted and placed in sequence prior to processing.
While transactions are accumulated between processing runs, the timeliness
of the data deteriorates.
High data redundancy, since the same data may be stored in several files
sequenced on different keys.
Applications:
Payroll systems.
Electricity billing, or any other billing where each record needs to be
accessed.
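The sequential retrieval procedure described above can be sketched directly (Python, with hypothetical employee records):

```python
def sequential_find(records, wanted_key):
    """Read each record in file order and compare its key field to the
    one that is needed; stop as soon as the keys match."""
    for record in records:
        if record["code"] == wanted_key:
            return record
    return None  # reached the end of the file without a match

employee_file = [{"code": "E001"}, {"code": "E002"}, {"code": "E003"}]
print(sequential_find(employee_file, "E003"))
```

Note that finding "E003" requires reading and comparing every earlier record first, which is why the activity rate matters.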


III. Direct File Access Organization

[Diagram: DIRECT ACCESS splits into DIRECT SEQUENTIAL ACCESS, via (A) the
self-addressing method or (B) the index sequential addressing method, and
RANDOM ACCESS, via the address generation method or indexed random
organization.]
A- Self-Addressing Method: A record key is used as its relative address. Therefore, we can
compute the record's address directly from the record key and the physical address of the
first record in the file.
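A sketch of the address computation (Python; the base address and record length are invented for illustration):

```python
BASE_ADDRESS = 4096    # physical address of the first record (assumed)
RECORD_LENGTH = 128    # fixed record size in bytes (assumed)

def record_address(record_key):
    # The key doubles as the record's relative address: key 1 is the
    # first record, key 2 the second, and so on.
    return BASE_ADDRESS + (record_key - 1) * RECORD_LENGTH

print(record_address(1))   # address of the first record
print(record_address(10))  # nine record lengths past the base
```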
B- Indexed Sequential File Organization :
1. A computer provides a better way to store information like the card catalogue; indeed,
most public libraries today keep their card catalogues on a computer. For each book in the
library, a data record is created that contains information gathered from the various card
catalogues. For example, the title of the book, the author's name, the physical location of
the book, and any other relevant information. A record is generally composed of several
fields, with each field used to store a particular piece of information. For example, we
might store the author's last name in one field and the first name in a separate field. All
the records (one for each book) are collected and stored in a file. The file containing the
records is typically called the data file.
2. Indexes are created so that a particular record in the data file can be located quickly. For
example, we could create an author index, a title index, and a subject index. The indexes
are typically stored in a separate file called the index file.
3. An index is a collection of "keys", one key for each record in the data file. A key is a subset
of the information stored in a record. When an index is created, the key values are
extracted from one or more fields of each record. The value of each key determines its
order in the index (i.e., the keys are sorted alphabetically or numerically). Each key has an
associated pointer that indicates the location in the data file of the corresponding
complete record. To find a particular record, a matching key is quickly located in the index,
and then the associated pointer is used to locate the complete record.
4. Consider the problem of locating a particular book in a library containing thousands of
books. Public libraries long ago developed the card catalogue as a means to efficiently
locate a particular book. Usually there were at least three card catalogues, one with cards
arranged in order by the name of the author, another arranged by the title of the book,
and a third arranged by subject heading. Each card contained information about the book,
most importantly its location in the library. Therefore, by knowing the name of the author,
the title of the book, or the appropriate subject heading, you could use the card catalogues


to quickly determine the location of a particular book. The card catalogues can be thought
of as indexes.
5. Consider the author index. There is a filing cabinet containing a card for each book in the
library, filed in alphabetical order by the author's name. Each drawer in the cabinet is
labelled, perhaps "A-E", "F-J", and so on. There are two broad kinds of searches that you
might want to perform on the author index.
6. First, you might want to make a list containing the name of every book in the library. To do
this you would start in the first drawer with the first card, and look at each card in order
until you reached the last card in the last drawer. This is called a "sequential" search
because you look at each card in the catalogue in sequential order.
7. Second, you might want to know the names of the books in the library that were written
by Thomas Jefferson. Instead of examining every card in the catalogue, you are first guided
by the labels on the drawers to the second drawer, the "F-J" drawer. You are then guided
by the tabs inside the drawer to the names that start with the letter "J". This is called a
"random" search. For any particular card, you can use the labels (or indexes) to go almost
directly to the desired card.
8. Actually locating the Thomas Jefferson card(s) involves both a random and a sequential
search. We use random access to go directly to the correct drawer and the correct tab inside
the drawer. The labels (or indexes) allow us to very quickly get close to the card of interest.
After locating the "J" tab inside the "F-J" drawer, we then use sequential access to locate
the particular Thomas Jefferson card(s) of interest.
Merits:
Allows efficient and economical use of sequential processing techniques
when the activity rate is high.
Permits quick access to individual records in a relatively efficient way, when
this activity is a small fraction of the total workload.
Demerits:
Less efficient in the use of storage space than some other organizations.
Access to records may be slow because of the use of indexes. Relatively
expensive hardware and software resources are required.
Applications:
Inventory control, where both sequential access and inquiry are required.
Student registration systems.
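The key-plus-pointer mechanism described above can be sketched as follows (Python, with a made-up library data file): the index holds sorted keys, each with a pointer into the data file.

```python
# data file: one record per book
data_file = [
    {"author": "Tolstoy", "title": "War and Peace"},
    {"author": "Austen", "title": "Emma"},
    {"author": "Jefferson", "title": "Notes on the State of Virginia"},
]

# index file: (key, pointer) pairs kept in sorted key order
author_index = sorted((rec["author"], pos) for pos, rec in enumerate(data_file))

def find_by_author(name):
    for key, pointer in author_index:     # locate the matching key...
        if key == name:
            return data_file[pointer]     # ...then follow its pointer
    return None

print(find_by_author("Jefferson")["title"])
```

Because the index is sorted, the data file itself can stay in arrival order while still supporting both sequential listing (walk the index) and direct lookup (follow one pointer).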

C- Random File Organization
The randomizing procedure is characterised by the fact that records are stored in such a
way that there is no relationship between the keys of adjacent records. The
technique provides for converting the record key number to a physical location,
represented by a disk address, through a computational procedure.


Transactions can be processed in any order and written at any location through the
stored file. The desired records can be directly accessed using randomizing
procedure without accessing all other records in the file.
Merits:
Access to records for inquiry and updating is possible immediately.
Immediate updating of several files as a result of a single transaction is possible.
No need for sorting.
Demerits:
Risks to records in the online file: loss of accuracy, breach of security.
Special backup and reconstruction procedures must be established.
Less efficient in the use of storage space than a sequentially organized file.
Relatively expensive software and hardware resources are required.
Applications:
Any type of inquiry system, such as a
railway reservation or air reservation system.
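One common computational procedure is division-remainder hashing. A minimal sketch (Python; the bucket count is invented for illustration):

```python
BUCKETS = 7  # number of available storage locations (assumed)

def disk_address(record_key):
    # Convert the key to a relative disk address by computation;
    # records with adjacent keys can land in unrelated locations.
    return record_key % BUCKETS

for key in (987, 1234, 4521):
    print("key", key, "-> bucket", disk_address(key))
```

A record is retrieved by recomputing its address from the key, without reading any other record in the file.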
The Best File Organization
File management involves the logical organization of data supplied to a computer in a predetermined
way. A collection of data stored in a particular place is called a FILE. The file is created using a set of instructions
called a PROGRAM. The data created in the file depends on the following factors:
1. Data Dependence
2. Data Redundancy
3. Data Integrity
File Management Software

It is a software package that helps the users to organize data into files, process them and
retrieve the information.

The users can create report formats, enter data into records, search records, sort them
and prepare reports.

They are designed for microcomputers and are menu-driven, allowing end users to create files
by giving easy-to-use instructions.
Following are the criteria in choosing file organisation method:
1. File Volatility
(i) File volatility is the number of additions and deletions to the file in a given period
of time. E.g. the payroll file of a company where the employee register is constantly
changing is a highly volatile file, and therefore the direct access method is better.
2. File Activity
(i) File activity refers to the proportion of records accessed in a run to the number of
records in a file.
(ii) In the case of real-time files, where each transaction is processed immediately and
only one master record is accessed at a time, the direct access method is appropriate.
(iii) In cases where almost every record is accessed for processing, a sequentially ordered
file is appropriate.
3. File Interrogation
(i) File interrogation refers to the retrieval of information from a file.
(ii) If the retrieval of individual records must be fast to support a real time
operation such as Airline reservation then some kind of direct
organization is required.
(iii) If, on the other hand, requirements for data can be delayed, then all the
individual requests for information can be batched and run in a single
processing run with a sequential file organization.
4. File Size
(i) Large files which require many individual references to records with immediate
response must be organized under direct access method.
(ii) In case of small files, it is better to search the entire file sequentially or with a more
efficient binary search, to find an individual record than to maintain complex
indexes or complex direct addressing schemes.
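For a small file kept in key order, the binary search mentioned in (ii) can be sketched with Python's standard bisect module (keys invented for illustration):

```python
import bisect

# a small file kept in ascending key order
keys = ["E001", "E002", "E005", "E009"]

def binary_find(sorted_keys, key):
    """Repeatedly halve the search range instead of scanning sequentially."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i       # position of the record in the file
    return -1          # not in the file

print(binary_find(keys, "E005"))
```

For a handful of records this simple search beats the overhead of maintaining an index or a direct addressing scheme, which is exactly the point made in (ii).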
Problems of the File Processing Systems:
i. Data Redundancy: The same data is stored in different files since the data files are
independent. This results in a lot of duplicated data, and a separate file maintenance
program is needed to update each file.
ii. Data Dependence: The components of a file processing system depend on one another;
therefore, when changes are made in the format and structure of data in a file, changes
have to be made in all programs that use this file.
iii. Data Integrity: The same data is found in different forms in different files. Checking the
validity of data could not be uniformly implemented, with the result that data in one file
may be correct and in another file wrong. Special computer programs have to be written to
retrieve data from such independent files, which is time consuming and expensive.
iv. Data Availability: Since data is scattered in many files, it would be necessary to look into
many files before relying on a particular item of data. Due to non-uniformity in the file
design, the data may have different identification numbers in different files, and obtaining
the necessary data will be difficult.
v. Management Control: Uniform policies and standards cannot be set since the data is
scattered in different files. It is difficult to relate such files and difficult to implement a
decision due to non-uniform coding of the data files.

DATA BASE MANAGEMENT SYSTEMS


Database
A database is a collection of related and ordered information organised in such a way that
information can be accessed quickly and easily. Hence, an organised logical group of related files
would constitute a database.
According to G.M. Scott, "A database is a computer file system that uses a particular file
organisation to facilitate rapid updating of individual records, simultaneous updating of related
records, easy access to all records by all application programs and rapid access to all stored data
which must be brought together for a particular routine report or inquiry or a special purpose
report or inquiry."
Types of Databases:
1. Operational Databases: These databases keep the information needed to support the
operations of an organization. These are mainly day-to-day working databases, e.g.
customer, employee and inventory databases, etc.
2. Management Databases: These databases keep selected information and data
extracted mainly from operational and external databases.
3. Information Warehouse Databases: A data warehouse stores the data of current and
previous years. It is a central source of data that has been standardized and integrated
so that it can be used by managers and other end-user professionals throughout an
organization.
4. Distributed Databases: These are the databases of local work groups and departments
at branch offices, manufacturing plants, regional offices and other work sites. The main
aim of these databases is to ensure that the organization's database is distributed but
updated concurrently.


Advantages:
Local computers on the network offer immediate response to local needs.
Systems can be expanded in modular fashion as needed.
Since many small computers are used, the system is not dependent on one
large computer whose failure could shut down the network.
Equipment operating and management costs are often lower.
Microcomputers tend to be less complex than large systems, therefore the
system is more useful to local users.
5. End User Databases: These databases consist of the various Word, Excel and
database files which the end user has generated.
6. External Databases: These are also known as online databases, provided by various data
banks or organizations at a nominal fee.
7. Text Databases: These are informative databases, available normally on CD-ROM disks
for a certain price.
8. Image Databases: These databases contain images along with alpha-numeric
information. They are available either on the Internet or on CD at a certain price.
9. Object Oriented Databases: This is a type of database structure developed to be
suitable to changing application needs. When integrated database structures were
developed, the need for OODBs was felt. Databases with relational qualities that are
capable of manipulating text, data, objects, images and audio/video clips are used by
organisations. With OODBs, OOP has been developed. In OOP (object oriented
programming), every object is described as a set of attributes describing what the
object is. The behaviour of the object is also included in the program. Objects with
similar qualities and behaviour can be grouped together. OOP is more useful in
decision making.
10. Partitioned Database (Partial Distribution): Some databases are centrally managed and
some are managed in a decentralised manner. This approach is called a partitioned
database. For example, financial, marketing and administrative data can be maintained
at headquarters, whereas production data may be maintained at decentralised locations.
Factors to be addressed in maintaining a database:
1. Installation of the Database:
Correct installation of the DBMS product.
Ensuring that adequate file space is available.
Allocating the disc space for the database properly.
Allocation of data files in standard sizes for input/output balancing.
2. Memory Usage:
How are buffers being used?
How does the DBMS use main memory?
Which programs are held in main memory?
3. Input/Output (I/O) Contention:
Achieving maximum I/O performance is one of the most important
aspects of tuning. Understanding how the data are accessed by end
users is critical to managing I/O contention.
As CPU clock speeds increase, more careful management of I/O is required.
Simultaneous or separate use of I/O devices.
Spooling, buffering, etc. can be used.

4. CPU Usage:
Multiprogramming and multiprocessing improve performance in
query processing.
Monitoring CPU load.
The mixture of online/background processing needs to be adjusted.
Mark jobs that can be processed in off-peak periods to unload the machine
during peak working hours.

Components of a Database Environment


1. Database files: These files have data elements stored in database file organization formats.
The database is created in such a way so as to balance the data management objective to
speed, multiple access paths, minimum storage, program data independence and
preservation of data integrity.
2. A Database Management System (DBMS): A DBMS is a set of system software programs that
manage the database files. Requests for access to files, updating of records and retrieval of
data are handled by the DBMS. The DBMS has the responsibility for data security, which is vital in a
database environment since the database is accessed by many users.
3. The users: Users consist of both traditional users and application programmers, who are
not traditionally considered as users. Users interact with the DBMS indirectly via
application programs or directly via a simple query language.
Classification of DBMS Users:
Naïve users, who are not aware of the presence of the database system supporting
their usage.
Online users, who may communicate with the database either directly through an
online terminal or indirectly through a user interface or application programs. Usually
they acquire some skill and experience in communicating with the database.
Application programmers, who are responsible for developing the application
programs and user interfaces.
The DBA, who exercises centralized control and is responsible for maintaining the
database.
The user interaction with the DBMS includes the definition of the logical relationships in
the database; the input and maintenance of data; and the changing, deletion and manipulation of
data.
4. A host interface system: This is that part of DBMS which communicates with the
application programs. The host language interface interprets instructions in high level
language application programs, such as COBOL and BASIC programs that request data
from files, so that the data needed can be retrieved. During this period the OS interacts
with the DBMS. Application programs do not contain information about the file; thus the
program is independent of the database system.
5. The application programs: These programs perform the same functions as they do in
conventional system, but they are independent of the data files and use standard data
definitions. This independence and standardisation make rapid special purpose program
development easier and faster.


6. A Natural Language Interface System: The query language permits online update and inquiry by
users who are relatively unsophisticated about computer systems. This language is often
termed English-like because its instructions usually take the form of simple commands
in English, which are used to accomplish an enquiry task. Query languages also
permit online programming of simple routines by managers who wish to interact with the
data. The natural language may also help managers to generate special reports.
7. The Data Dictionary: A data dictionary is a centralized depository of information, in
computerized form, about the data in the database. The data dictionary contains the
schema of the database, i.e. the name of each item in the database and a description and
definition of its attributes, along with the names of the programs that use them, who is
responsible for the data, and authorization tables that specify users and the data and
programs authorized for their use. These descriptions and definitions are referred to as the
data standards. Maintenance of the data dictionary is the responsibility of the DBA.
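A data dictionary entry can be pictured as metadata about a data item rather than the data itself. A toy sketch (Python; every name and value below is invented):

```python
# One data-dictionary entry: a description and definition of the item's
# attributes, the programs that use it, responsibility, and authorization.
data_dictionary = {
    "EMP_CODE": {
        "description": "Unique employee identifier",
        "type": "Character(5)",
        "used_by_programs": ["PAYROLL", "HR_REPORT"],
        "responsible": "HR department",
        "authorized_users": ["payroll_clerk", "hr_manager"],
    },
}
print(data_dictionary["EMP_CODE"]["type"])
```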
8. Online access and update terminals: These may be adjacent to computer or even
thousands of miles away. They may be dumb terminals, smart terminals or
microcomputers.
9. The output system or report generators: This provides routine job reports, documents and
special reports. It allows programmers, managers and other users to design output reports
without writing an application program in a programming language.
10. File Pointer: A file pointer is placed in the last field of a record and contains the
address of another related record, thus establishing a link between records. It directs the
computer system to move to that related record.
11. Linked List: A linked list is a group of data records arranged in an order which is based on
embedded pointers. An embedded pointer is a special data field that links one record to
another by referring to the other record. The field is embedded in the first record, i.e. it is
a data element within the record.
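A sketch of a linked list built from embedded pointers (Python; the record contents are invented). The last field of each record holds the position of the next related record:

```python
# Records stored in arbitrary physical order; the "next" field is the
# embedded pointer (None marks the end of the chain).
records = [
    {"item": "order-1", "next": 2},
    {"item": "unrelated", "next": None},
    {"item": "order-2", "next": 3},
    {"item": "order-3", "next": None},
]

def follow_chain(records, start):
    chain, pos = [], start
    while pos is not None:
        chain.append(records[pos]["item"])
        pos = records[pos]["next"]   # move to the related record
    return chain

print(follow_chain(records, 0))
```

The logical order of the chain (order-1, order-2, order-3) is independent of the physical order in which the records happen to be stored.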
Factors contributing to the Architecture of a Database:
1. External View
It is also known as user view.
As the name suggests, it includes only those application programs which are user
concerned.
It is described by users/ programmers by means of external schema.
2. Conceptual View
It is also known as the global view.
It represents the entire database and includes all database entities.
It is defined by the conceptual schema and describes all records, relationships,
constraints and boundaries.
3. Internal View
It is also known as the physical view.
It describes the data structure and the access methods.
It is defined by the internal schema and indicates how data will be stored.
Of the above three, the external view is USER DEPENDENT and the other two are
USER INDEPENDENT.
[Diagram: Ext. Schema 1, Ext. Schema 2 and Ext. Schema 3 each map onto the
Conceptual Schema, which in turn maps onto the Physical Schema.]

Data Independence
1. In a database, data independence is the ability to modify a schema definition at one
level without affecting the schema at the next higher level.
2. It facilitates logical data independence.
3. It assures physical data independence.
Structure of Database
The logical organizational approach of the database is called the database structure. There are
three basic structures available, viz. the hierarchical, relational and network database
structures.


Hierarchical Database Structure

In this type of architecture records are logically arranged into a hierarchy of relationships.
Records are logically arranged in a tree pattern. The hierarchical structure implements
one-to-one and one-to-many relationships. All records in the hierarchy are called nodes.
Each node is related to others in a parent-child relationship: each parent record may have
one or more child records, but no child record may have more than one parent record.
The top parent record in the hierarchy is called the root.
Features of Hierarchical Database:
i. Hierarchically structured databases are less flexible than other database structures
because the hierarchy of records must be determined and implemented before a search
can be conducted; in other words, the relationships between records are relatively fixed
by the structure.
ii. Managerial use of a query language to solve a problem may require multiple searches,
which is very time consuming. Thus analysis and planning activities, which frequently
involve ad-hoc management queries of the database, may not be supported as effectively
by a hierarchical DBMS as they are by other database structures.
iii. Ad-hoc queries made by managers that require relationships other than those already
implemented in the database may be difficult or time consuming to accomplish.
iv. Records are logically structured in an inverted tree pattern.
v. It implements one-to-one and one-to-many relationships.
vi. Each record or node in the hierarchy is related to other records in a parent-child
relationship.
vii. Child-to-many-parents logical structures are difficult to process.
viii. Processing groups of records with natural relationships can be done faster.
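The tree pattern and the one-parent rule can be sketched in Python. The node names below are invented for illustration:

```python
# Sketch of a hierarchical (tree) structure: every child node has exactly
# one parent, and the top node is the root. Node names are illustrative.
tree = {
    "COMPANY": ["SALES", "FINANCE"],   # root and its children
    "SALES": ["ORDER-1", "ORDER-2"],
    "FINANCE": ["LEDGER-1"],
}

def find_parent(tree, child):
    """Every node except the root has exactly one parent."""
    for parent, children in tree.items():
        if child in children:
            return parent
    return None  # the root has no parent

print(find_parent(tree, "ORDER-1"))  # SALES
print(find_parent(tree, "COMPANY"))  # None -> COMPANY is the root
```

Because each child points to exactly one parent, a query that needs a different relationship (say, linking ORDER-1 to FINANCE) would require restructuring the tree, which is exactly the inflexibility described in point i above.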


Relational Database Structure

An example of such a situation is the representation of Actors, Movies and Theatres.
In order to know who plays what and where, we need the combination of these three
attributes. However, they each relate to each other cyclically. To resolve this, we would
establish parent tables for Actor-Movie, Movie-Theatre and Theatre-Actor, each
containing a portion of the primary key of the Actor, Movie and Theatre tables.
Parent tables: ACTOR-MOVIE, MOVIE-THEATRE, THEATRE-ACTOR

ACTOR          MOVIE             THEATRE
Kamalhaasan    Manmadhan Ambu    Satyam
Dhanush        Aadukalam         PVR
Karthi         Siruthai          INOX
Trisha         Manmadhan Ambu    Satyam
Tammanna       Siruthai          PVR

i. This is a model where more than one data file is compared.
ii. More than one file is compared at a time with the help of a common key field.
iii. Each file is converted into a table and the analysis is done on the tables with the
help of the common key field.
iv. The rows of a table represent the records and the columns represent the data fields.
v. It is not necessary to maintain the entire file in a single physical location; it can be
maintained at any geographical place.
vi. This is more suitable for wider analysis of data from different locations.
vii. Queries are easily possible because the software interacts with different records at
the same time.
Network Database Structure

This structure is more useful when data must be related not only in one-to-one
mode but also in many-to-many mode. This type of structure is found in
organizations where online data processing is carried out.
DBMS (Language)
I. Data Definition Language:
DDL defines the conceptual schema, providing a link between the logical and physical
structures of the database. The logical structure of a database is the schema. A subschema
is the way a specific application views the data from the database.
Following are the functions of DDL:
i. They define the physical characteristics of each record: the fields in the record, each
field's type and length, and each field's logical name; they also specify relationships
among the records.
ii. They describe the schema and subschema.
iii. They indicate the keys of the record.
iv. They provide means for associating related records or fields.
v. They provide for data security measures.
vi. They provide for logical and physical data independence.
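These DDL functions can be illustrated with SQL's data definition statements, run here through Python's sqlite3 module. The table names, fields and relationship are invented for the example:

```python
import sqlite3

# Sketch of DDL at work: defining record formats, field types, keys, and a
# relationship between two record types. Names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,   -- key of the record
        dept_name TEXT NOT NULL          -- field type and constraint
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)  -- relationship
    );
""")

# The schema itself becomes queryable metadata in the system catalog.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['department', 'employee']
```

Note how the definitions say nothing about physical storage locations: that separation is what provides the data independence listed in point vi.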

II. Data manipulation Language


DML is a Database Language used by database users to retrieve, insert, delete and
update data in a database.


Following are the functions of DML:
They provide data manipulation techniques such as deletion, modification, insertion,
replacement, retrieval, sorting and display of data or records.
They facilitate use of the relationships between records.
They enable the user and application program to be independent of the physical data
structure and of database structure maintenance, by allowing data to be processed on a
logical and symbolic basis rather than on a physical location basis.
They provide independence from programming languages by supporting several
high-level procedural languages like COBOL, C++, etc.
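The DML operations listed above can be sketched with SQL statements run through Python's sqlite3 module. The stock table and its rows are invented for illustration:

```python
import sqlite3

# Sketch of the DML operations described in the notes: insertion,
# modification, deletion, then retrieval with sorting -- all expressed
# logically rather than by physical storage address.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('pen', 10), ('ink', 5)")  # insertion
conn.execute("UPDATE stock SET qty = 8 WHERE item = 'ink'")       # modification
conn.execute("DELETE FROM stock WHERE item = 'pen'")              # deletion

rows = conn.execute("SELECT item, qty FROM stock ORDER BY item").fetchall()
print(rows)  # [('ink', 8)]
```

Each statement names records by their logical content ("the row where item is 'ink'"), never by a storage location, which is the independence point iii above describes.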

STRUCTURE OF DBMS
I. DDL Compiler
It converts data definition statements into a set of tables.
These tables contain metadata (data about the data) concerning the database.
It produces a format that can be used by the other components of the database.
II. Data Manager
It is the central software component.
It is referred to as the database control system.
It converts the operations in users' queries into operations on the physical file system.
III. File Manager
It is responsible for the file structure.
It is responsible for managing storage space.
It is responsible for locating the block containing the required record.
It is responsible for requesting that block from the disk manager.
It is responsible for transmitting the required record to the data manager.
IV. Disk Manager
It is a part of the operating system.
It carries out all physical input/output operations.
It transfers the block/page requested by the file manager.


V. Query Manager
It interprets users' online queries and converts them into an efficient series of
operations in a form capable of being sent to the data manager.
It uses the data dictionary to find the structure of the relevant portion of the database
and uses that information to modify the query.
It prepares an optimal plan to access the database for efficient data retrieval.
VI. Data Dictionary
It maintains information pertaining to the structure and usage of data and metadata.
It is consulted by database users to learn what each piece of data and the various
synonyms of a data field mean.
DATA BASE ADMINISTRATOR
A DBA is a person who actually creates and maintains the database and also carries out the
policies developed by the DA. The job of the DBA is a technical one. He is responsible for
defining the internal layout of the database and also for ensuring that the internal layout
optimizes system performance, especially in the main business processing areas.
Main functions of a DBA are:
1. Determining the physical design of the database and specifying the hardware resource
requirements for the purpose. This can be done by determining the data requirements,
schedule and accuracy requirements, the manner and frequency of data access, search
strategies, physical storage requirements of data, the level of security needed and the
response time requirements.
2. Define the contents of the database.
3. Use of data definition language (DDL) to describe formats relationships among various
data elements and their usage.
4. Maintain standards and controls for the database.
5. Specify various rules, which must be adhered to while describing data for a database.
6. Allow only specified users to access the database by using access controls thus prevent
unauthorised access.
7. DBA also prepares documentation which includes recording the procedures, standard
guidelines and data descriptions necessary for the efficient and continuous use of
database environment.


8. DBA ensures that the operating staff perform its database processing related
responsibilities which include loading the database, following maintenance and security
procedures, taking backups, scheduling the database for use and following, restart and
recovery procedures after some hardware or software failure in a proper way.
9. DBA monitors the database environments.
10. DBA incorporates any enhancements into the database environment, which may include
new utility program or new system releases.
Structured Query Language
SQL is a query language that enables users to create relational databases, which are sets of
related information stored in tables.
It is a set of commands for creating, updating and accessing data from a database.
It allows programmers, managers and other users to ask ad-hoc queries of the database
interactively without the aid of programmers. It is a set of about 30 English-like commands
such as SELECT...FROM...WHERE.
SQL has the following features:
a. Simple English-like commands.
b. Command syntax is easy.
c. Can be used by non-programmers.
d. Can be used with different types of DBMS.
e. Allows users to create and update a database.
f. Allows retrieving data from a database without detailed information about the
structure of the records and without being concerned about the processes the DBMS uses
to retrieve the data.
g. Has become standard practice for DBMS.
Since SQL is used in many DBMS, managers who understand SQL are able to use the same set of
commands regardless of the DBMS software that they may use.
PROGRAM LIBRARY MANAGEMENT SYSTEM
Program library management system provides several functional capabilities to facilitate effective
and efficient management of the data centre software inventory. The inventory may include
application and system software program code, job control statements that identify resources
used and processes to be performed and processing parameters which direct processing.
Some of the capabilities are as follows:


a. Integrity- Each source program is assigned a modification number and version number, and
each source statement is associated with a creation date. Security for program libraries, job
control language sets and parameter files is provided through the use of passwords,
encryption, data compression facilities and automatic backup creation.
b. Update- Library management systems facilitate the addition, deletion, re-sequencing, and
editing of library members.
c. Reporting- With use of its facilities a list of additions, deletions and modifications along
with library catalogue and library member attributes can be prepared for management
and auditor review.
d. Interface- Library software packages may interface with the operating system, job
scheduling, access control system and online program management.
Need for Documentation:
It provides a method to understand the various issues related to software development.
It provides a means to access details relating to system study, system development,
system testing and system operation.
It provides details associated with further modification of the software.
Four types of documentation are required prior to delivery of customized software to
a customer:
Strategic and application plans
Application systems and program documentation
Systems software and utility program documentation
Database documentation, operation manual, user manual, standards manual, backup
manual and others

DATA WAREHOUSE
A Data warehouse is a computer database that collects, integrates and stores an organisation's
data with the aim of producing accurate and timely management information and supporting data
analysis. It provides tools to satisfy the information needs of employees at all organizational
levels, not just for complex data queries. It makes it possible to extract archived operational data
and to overcome inconsistencies between different legacy data formats.
A Data Mart is a subset of a Data Warehouse. Most organizations start by designing a data mart to
attend to immediate needs. To keep it simple, consider a Data Mart as a data reserve that satisfies
a certain aspect of the business or just one application (or process). A Data Warehouse is a
superset that engulfs all such mini data marts to form one big reservoir of information.

Characteristics of Data warehouse


1. It is subject oriented, means data are organized according to subject instead of
application. The organized data (according to subject) contains only the information
necessary for decision support processing.
2. Encoding of data is often inconsistent when the data resides in many separate
applications in the operational environment but when data are moved from the
operational environment into the data warehouse they assume a consistent coding
convention.
3. Data warehouse contains a place for storing historical data to be used for comparison,
trends and forecasting.
4. Data are not uploaded or changed in anyway once they enter the data warehouse but
are only loaded and accessed.

COMPONENTS OF A DATA WAREHOUSE (W.R.T figure)


Data Sources
Data sources refer to any electronic repository of information that contains data of interest for
management use or analytics. This definition covers mainframe databases (e.g. IBM DB2, ISAM,
Adabas, Teradata, etc.), client-server databases (e.g. IBM DB2, Oracle database, Informix,
Microsoft SQL Server, etc.), PC databases (e.g. Microsoft Access), spreadsheets (e.g. Microsoft
Excel) and any other electronic store of data. Data needs to be passed from these systems to the
data warehouse either on a transaction-by-transaction basis for real-time data warehouses or on
a regular cycle (e.g. daily or weekly) for offline data warehouses.
Data Transformation
The Data Transformation layer receives data from the data sources, cleans and standardises it,
and loads it into the data repository. This is often called "staging" data as data often passes
through a temporary database whilst it is being transformed. This activity of transforming data
can be performed either by manually created code or a specific type of software could be used
called an ETL tool. Regardless of the nature of the software used, the following types of activities
occur during data transformation:

Comparing data from different systems to improve data quality (e.g. the date of birth for a
customer may be blank in one system but contain valid data in a second system; in this
instance, the data warehouse would retain the date of birth field from the second system).
Standardising data and codes (e.g. if one system refers to "Male" and "Female" but a
second refers only to "M" and "F", these code sets would need to be standardised).
Integrating data from different systems (e.g. if one system keeps orders and another stores
customers, these data elements need to be linked).
Performing other system housekeeping functions, such as determining change (or "delta")
files to reduce data load times, and generating or finding surrogate keys for data.
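The standardising and merging activities above can be sketched in a few lines of Python. The field names, code map and sample records are illustrative assumptions, not from any real system:

```python
# Sketch of two transformation activities described in the notes:
# standardising code sets and filling a blank field from a second system.
# Field names and values are invented for the example.

system_a = {"cust_id": 7, "gender": "Male", "dob": None}
system_b = {"cust_id": 7, "gender": "M", "dob": "1980-04-12"}

CODE_MAP = {"Male": "M", "Female": "F"}  # standardise to one code set

def transform(a, b):
    merged = {"cust_id": a["cust_id"]}
    # Standardise the gender code to the M/F convention.
    merged["gender"] = CODE_MAP.get(a["gender"], a["gender"])
    # Improve data quality: retain the valid date of birth where one is blank.
    merged["dob"] = a["dob"] if a["dob"] is not None else b["dob"]
    return merged

print(transform(system_a, system_b))
# {'cust_id': 7, 'gender': 'M', 'dob': '1980-04-12'}
```

A real ETL tool applies rules like these across millions of rows, but the logic of each rule is no more than what is shown here.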

Data Warehouse
The data warehouse is a relational database organised to hold information in a structure that best
supports reporting and analysis. Most data warehouses hold information for at least one year,
and sometimes for as much as half a century, depending on the business/operations data
retention requirement. As a result these databases can become very large.
Reporting
The data in the data warehouse must be available to the organisation's staff if the data warehouse
is to be useful. There are a very large number of software applications that perform this function,
or reporting can be custom-developed. Examples of types of reporting tools include:


Business intelligence tools: These are software applications that simplify the
process of development and production of business reports based on data
warehouse data.
Executive information systems: These are software applications that are used to
display complex business metrics and information in a graphical way to allow rapid
understanding.
OLAP Tools: OLAP tools form data into logical multi-dimensional structures and
allow users to select which dimensions to view data by.
Data Mining: Data mining tools are software that allows users to perform detailed
mathematical and statistical calculations on detailed data warehouse data to
detect trends, identify patterns and analyse data.


Metadata
Metadata, or "data about data", is used to inform operators and users of the data warehouse
about its status and the information held within the data warehouse. Examples of data warehouse
metadata include the most recent data load date, the business meaning of a data item and the
number of users that are logged in currently.
Operations
Data warehouse operations comprise the processes of loading, manipulating and extracting
data from the data warehouse. Operations also cover user management, security, capacity
management and related functions.
Optional Components
In addition, the following components also exist in some data warehouses:
1. Dependent Data Marts: A dependent data mart is a physical database (either on the same
hardware as the data warehouse or on a separate hardware platform) that receives all its
information from the data warehouse. The purpose of a Data Mart is to provide a sub-set
of the data warehouse's data for a specific purpose or to a specific sub-group of the
organisation.
2. Logical Data Marts: A logical data mart is a filtered view of the main data warehouse that
does not physically exist as a separate data copy. This approach delivers the same benefits
but has the additional advantages of not requiring additional (costly) disk space and of
always being as current as the main data warehouse.
3. Operational Data Store: An ODS is an integrated database of operational data. Its sources
include legacy systems and it contains current or near term data. An ODS may contain 30
to 60 days of information, while a data warehouse typically contains years of data. ODS's
are used in some data warehouse architectures to provide near real time reporting
capability in the event that the Data Warehouse's loading time or architecture prevents it
being able to provide near real time reporting capability.
Different methods of storing data in a data warehouse
All data warehouses store their data grouped together by subject areas that reflect the general
usage of the data (Customer, Product, Finance etc.). The general principle used in the majority of
data warehouses is that data is stored at its most elemental level for use in reporting and
information analysis.
Within this generic intent, there are two primary approaches to organising the data in a data
warehouse.
The first is using a "dimensional" approach. In this style, information is stored as "facts" which are
numeric or text data that capture specific data about a single transaction or event, and
"dimensions" which contain reference information that allows each transaction or event to be
classified in various ways. As an example, a sales transaction would be broken up into facts such
as the number of products ordered, and the price paid, and dimensions such as date, customer,
product, geographical location and sales person. The main advantages of a dimensional approach
are that the Data Warehouse is easy for business staff with limited information technology
experience to understand and use. Also, because the data is pre-processed into the dimensional
form, the Data Warehouse tends to operate very quickly. The main disadvantage of the
dimensional approach is that it is quite difficult to add or change later if the company changes the
way in which it does business.
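The fact/dimension split can be sketched as a tiny star schema using Python's sqlite3 module. The product dimension, fact rows and values are invented for the example:

```python
import sqlite3

# Sketch of the dimensional approach: a numeric fact table joined to a
# reference dimension so each transaction can be classified. All names
# and figures here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, qty INTEGER, price REAL);
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO fact_sales  VALUES (1, 3, 10.0), (1, 2, 10.0), (2, 1, 50.0);
""")

# Classify the facts by the product dimension and aggregate revenue.
rows = conn.execute("""
    SELECT p.name, SUM(f.qty * f.price)
    FROM fact_sales f JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(rows)  # [('Ink', 50.0), ('Pen', 50.0)]
```

The fact table holds only numbers and keys; all descriptive classification lives in the dimension, which is what makes such a schema fast to query but rigid to restructure.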
The second approach uses database normalisation. In this style, the data in the data warehouse is
stored in third normal form. The main advantage of this approach is that it is quite straightforward
to add new information into the database, whilst the primary disadvantage of this approach is
that it can be quite slow to produce information and reports.
The advantages of using a Data Warehouse are:
1. Enhanced end-user access to a wide variety of data.
2. Increased Data consistency
3. Increased productivity and decreased computational cost.
4. It is able to combine data from different sources, in one place.
5. It provides an infrastructure that could support change to data and replication of the
changed data back into the operational systems.
Concerns in using a data warehouse
Extracting, cleaning and loading data could be time consuming.
Data warehousing project scope might increase.
Problems with compatibility with systems already in place, e.g. the transaction processing
system.
Providing training to end-users, who may end up not using the data warehouse.
Security could develop into a serious issue, especially if the data warehouse is web
accessible.

Types of Data Warehouses


With improvements in technology, as well as innovations in using data warehousing techniques,
data warehouses have changed from Offline Operational Databases to include an Online
Integrated data warehouse.
Offline Operational Data Warehouses are data warehouses where data is usually copied
from real-time data networks into an offline system where it can be used. This is usually the
simplest and least technical type of data warehouse.
Offline Data Warehouses are data warehouses that are updated frequently, daily, weekly or
monthly and that data is then stored in an integrated structure, where others can access it and
perform reporting.
Real Time Data Warehouses are data warehouses that are updated moment by moment with the
influx of new data. For instance, a Real Time Data Warehouse might incorporate data from a Point
of Sales system and be updated with each sale that is made.


Integrated Data Warehouses are data warehouses that other systems can access for their
operational needs. Some Integrated Data Warehouses are used by other data warehouses,
allowing them to be accessed to process reports as well as to look up current data.
BACKUP AND RECOVERY

Recovery is a sequence of tasks performed to restore a database to some point-in-time.

'Disaster recovery' differs from a database recovery scenario because the operating system
and all related software must be recovered before any database recovery can begin.

Database files that make up a database: Databases consist of disk files that store data.
When you create a database using a database software command-line utility, a
main database file or root file is created. This main database file contains database tables,
system tables, and indexes. Additional database files expand the size of the database and
are called dbspaces.

A dbspace contains tables and indexes, but not system tables.

A transaction log is a file that records database modifications. Database modifications
consist of inserts, updates, deletes, commits, rollbacks, and database schema changes. A
transaction log is not required but is recommended. The database engine uses a
transaction log to apply any changes made between the most recent checkpoint and the
system failure. The checkpoint ensures that all committed transactions are written to disk.
During recovery the database engine must find the log file at its specified location. When
the transaction log file is not specifically identified, the database engine presumes that
the log file is in the same directory as the database file.

A mirror log is an optional file and has a file extension of .mlg. It is a copy of a transaction
log and provides additional protection against the loss of data in the event the transaction
log becomes unusable.

Online backup, offline backup, and live backup: Database backups can be performed while
the database is being actively accessed (online) or when the database is shut down (offline).
When a database goes through a normal shutdown process (the process is not being
cancelled), the database engine commits the data to the database files. An online database
backup is performed by executing the command-line utility or from the 'Backup Database'
utility. When an online backup process begins, the database engine externalizes all cached
data pages kept in memory to the database file(s) on disk. This process is called a checkpoint.
The database engine continues recording activity in the transaction log file while the
database is being backed up. The log file is backed up after the backup utility finishes
backing up the database. The log file contains all of the transactions recorded since the last
database backup. For this reason the log file from an online full backup must be 'applied'
to the database during recovery. The log file from an offline backup does not have to
participate in recovery but it may be used in recovery if a prior database backup is used.


A Live backup is carried out by using the backup utility with the command-line option. A
live backup provides a redundant copy of the transaction log for restart of your system on
a secondary machine in the event the primary database server machine becomes
unusable.

Full and Incremental database backup: Full backup is the starting point for all other types
of backup and contains all the data in the folders and files that are selected to be backed
up. Because full backup stores all files and folders, frequent full backups result in faster
and simpler restore operations.
Incremental backup stores all files that have changed since the last FULL, DIFFERENTIAL
OR INCREMENTAL backup. The advantage of an incremental backup is that it takes the
least time to complete.

For example, suppose you run a backup on Friday: this first backup would always be a full
backup by default. Then, after you work with these files on Monday, Leo Backup performs an
incremental backup: this backup transfers only those files that changed since Friday. A Tuesday
backup will carry only those files that changed since Monday, and so on for the following days.
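The selection rule behind an incremental backup can be sketched in Python. The file names and timestamps below are invented for the example:

```python
# Sketch of the incremental selection rule: copy only files modified since
# the previous backup. File names and timestamps are illustrative.

def incremental_set(files, last_backup_time):
    """files: {path: mtime}. Return paths changed since the last backup."""
    return sorted(p for p, mtime in files.items() if mtime > last_backup_time)

files = {"a.doc": 100, "b.doc": 250, "c.doc": 300}
friday_full = 200  # time of the Friday full backup

# Monday's incremental backup carries only files changed after Friday.
print(incremental_set(files, friday_full))  # ['b.doc', 'c.doc']
```

A real tool would compare archive bits or checksums rather than raw timestamps, but the principle, backing up only the delta since the last backup, is the same.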

Core phases in developing a backup and recovery strategy


1. Create backup and recovery commands: The commands should be verified with the actual
results produced to ensure that desired results are produced.
2. Time estimates from executing backup and recovery commands help to get a feel for how
long these tasks will take. This information helps in identifying which command will be
executed and when.


3. Document the backup commands and create procedures outlining the backups, which are
kept in a file. Also identify the naming convention used as well as the kind of backups
performed.
4. Incorporate health checks into the backup procedures to ensure that the database is not
corrupt. Database health check can be performed prior to backing up a database or on a
copy of the database from the back up.
5. Deployment of backup and recovery consists of setting up the backup procedures on the
production server. Verify that the necessary hardware is in place, along with any other
supporting software required to perform these tasks. Modify the procedures to reflect any
changes in the environment.
6. Monitor backup procedures to avoid unexpected errors. Make sure that any changes in
the process are reflected in the documentation.
Data Centre and the challenges faced by the management of a data centre:
i. A Data centre is a centralized repository for the storage, management and dissemination
of data and information.
ii. A Data centre is a facility used for housing a large amount of electronic equipment,
typically computers and communication equipment.
iii. The purpose of a data centre is to provide space and bandwidth connectivity for servers
in a reliable, secure and scalable environment.
iv. It also provides facilities like housing websites, data serving and other services for
companies. Such a data centre may contain a network operations centre (NOC), a
restricted-access area containing automated systems that constantly monitor server
activity, web traffic and network performance, and report even slight irregularities to
engineers so that they can stop potential problems before they occur.

Challenges:
Maintaining infrastructure - A data centre needs to set up an infrastructure comprising a
number of pieces of electronic equipment, typically computers, and bandwidth connectivity
for servers in a reliable, secure and scalable environment.
Skilled human resources - A data centre needs skilled staff, expert at network management
and possessing software and hardware operating skills.
Selection of technology - A data centre also faces the challenge of proper selection of the
technology crucial to its operation.
Maintaining system performance - A data centre has to maintain maximum uptime and
system performance, while establishing sufficient redundancy and maintaining security.


DATA MINING
Data mining is the extraction of implicit, previously unknown and potentially useful information
from data. It searches for relationship and global patterns that exist in large databases but are
hidden among the vast amount of data. These relationships represent valuable knowledge about
database and objects in the database that can be put to use in the areas such as decision support,
prediction, forecasting and estimation.
In other words, data mining is concerned with the analysis of data and the use of software
techniques for finding patterns and regularities in sets of data. It is the computer that is
responsible for finding the patterns, by identifying the underlying rules and features in the data.
Stages in data mining
1. Selection: Selecting or segmenting the data according to some criteria so that subsets of
the data can be determined.
2. Pre- processing: This is the data cleansing stage where certain information is removed
which is deemed unnecessary and may slow down queries. Also the data is re-configured
to ensure a consistent format as there is a possibility of inconsistent formats because the
data is drawn from several sources.
3. Transformation: The data is not merely transferred across but transformed in that overlays
may be added. For example, Demographic overlays are commonly used in market
research. The data is made usable and navigable.
4. Data mining: This stage is concerned with the extraction of patterns from the data. A
pattern can be defined as a given set of facts. One popular example of data mining is using
past behaviour to rank customers. Such tactics have been employed by financial
companies for years as a means of deciding whether or not to approve loans and credit
cards.
5. Integration and Evaluation: The patterns identified by the system are interpreted into
knowledge which can then be used to support human decision making, for example in
prediction and classification tasks, summarising the contents of a database or explaining
observed phenomena.
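The notes' example in stage 4, ranking customers by past behaviour, can be sketched in Python. The scoring rule, field names and customer records are illustrative assumptions, not an actual credit-scoring method:

```python
# Sketch of ranking customers by past behaviour, as mentioned in stage 4.
# The scoring rule and fields are invented for illustration.

customers = [
    {"name": "A", "on_time_payments": 24, "defaults": 0},
    {"name": "B", "on_time_payments": 10, "defaults": 2},
    {"name": "C", "on_time_payments": 18, "defaults": 1},
]

def score(c):
    # A simple pattern extracted from past behaviour: reward a punctual
    # payment history and penalise defaults heavily.
    return c["on_time_payments"] - 10 * c["defaults"]

ranked = sorted(customers, key=score, reverse=True)
print([c["name"] for c in ranked])  # ['A', 'C', 'B']
```

In real data mining the scoring rule itself is what gets learned from the data; here it is fixed by hand only to keep the sketch short.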
