
WEEK 12

Dr. A. Brennan

NOTE:

Efficient physical DB design produces technical specifications to be used during the DB implementation phase.

For efficient physical DB design, certain info. needs to be gathered:

Normalised relations with estimates of table volume (number of rows in each table)
Attribute (field) definitions and possible max. length
Descriptions of data usage (when and where data are entered, deleted, retrieved, updated etc.)
Response time expectations
Data security needs
Backup/recovery needs
Integrity expectations
What DBMS technology will be used to implement the database
What DB architecture to use

Once this info. is gathered, the designer has to decide on a range of issues:

Suitable storage format (i.e. data types) for each attribute (in order to minimise storage space and maximise data integrity)
Grouping attributes from the logical model into physical records (denormalisation)
File organisation (arranging similarly structured records in secondary memory for the purpose of storage, fast and efficient retrieval and update, protection of data and its recovery after errors are found)
Query optimisation

Physical Design

What is it?

Translate the logical description of data into technical specifications for storing and retrieving data.

Why?

Good performance, database integrity, security and recoverability.

Input and Output for Physical Design

Inputs:

Normalised relations with estimates of table volume (number of rows in each table)
Attribute definitions (and possible maximum length)
Descriptions of data usage (when and where data are entered, retrieved, deleted, updated)
Response time expectations
Data security needs
Backup/recovery needs
Integrity expectations
Description of DBMS technology

Outputs (design decisions):

Suitable storage format (data type) for each attribute in the logical data model, in order to minimise storage space and maximise data integrity
Grouping attributes from the logical model into physical records
File organisation
Selection of indexes and database architectures for storing and connecting files to efficiently retrieve related data
Query optimisation

Data Types (Oracle examples)
CHAR      fixed-length character
VARCHAR2  variable-length character (memo)
LONG      large variable-length character data (up to 2 GB)
NUMBER    positive/negative number
DATE      actual date and time
BLOB      binary large object (good for graphics, sound clips, etc.)
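As a minimal sketch (the table and column names are illustrative, not from the slides), these Oracle data types might be used in a table definition as follows:

CREATE TABLE staff_member (
  staff_id   NUMBER(6)     PRIMARY KEY,   -- positive/negative number
  grade      CHAR(2),                     -- fixed-length character
  full_name  VARCHAR2(50),                -- variable-length character
  hired_on   DATE,                        -- date and time
  cv_text    LONG,                        -- large variable-length character data
  photo      BLOB                         -- binary large object
);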

Goals

Data type

Goals = minimise storage space, represent all possible values, improve data integrity, support all data manipulations

Data integrity controls

Default value, range control (constraints/validation rules), null value control (e.g. a PK cannot be null)
Referential integrity (FK values must match an existing PK value)

Integrity Controls

Default value - an assumed value if no explicit value is entered for an instance of the field (this reduces data entry time and helps prevent entry errors for the most common value)
Range control - imposes allowable value limitations (constraints or validation rules). This may be a numeric lower-to-upper bound, or a set of specific values. This approach should be used with caution, since the limits may change over time
Null value control - allows or prohibits empty fields (e.g. each primary key must have an integrity control that prohibits a null value)
Referential integrity - a form of range control (and null value allowance) for foreign-key to primary-key match-ups. It guarantees that only an existing cross-referencing value is used
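A minimal sketch of how these controls can be declared in SQL (the table and column names, and the referenced CUSTOMER table, are assumptions for illustration):

CREATE TABLE customer_order (
  order_id    NUMBER(8)  PRIMARY KEY,                        -- null value control: PK cannot be null
  order_date  DATE       DEFAULT SYSDATE,                    -- default value
  quantity    NUMBER(4)  CHECK (quantity BETWEEN 1 AND 999), -- range control
  cust_id     NUMBER(6)  NOT NULL
              REFERENCES customer(cust_id)                   -- referential integrity
);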

Physical Records
A physical record is a group of fields that are stored in adjacent secondary memory locations and are retrieved and written together as a unit by a particular DBMS.

Scope:
Efficient use of secondary storage (influenced by both the size of
the record and the structure of the secondary storage)
Data processing speed.
Computer operating systems read data from secondary memory in units called pages.
A page is the amount of data read or written by an operating system in one operation.

Blocking Factor is the number of physical records per page.
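For example (an illustrative calculation, not from the slides): with a 4,096-byte page and 400-byte physical records, the blocking factor is 4096 / 400, i.e. about 10 physical records per page.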

Normalization

Normalization produces a logical database design that is structurally consistent and has minimal redundancy.
Normalization forces us to understand completely each attribute that has to be represented in the database. This may be the most important factor that contributes to the overall success of the system.

What is Denormalization?
Denormalization is a process of transforming normalised relations into unnormalised physical record specifications.

Denormalization can also refer to a process in which we combine two relations into one new relation, and the new relation is still normalized but contains more nulls than the original relations.

Denormalization
In addition, the following factors have to be
considered:
Application specific;
Denormalization may speed up retrievals but it
slows down updates
Size of tables
Coding


Efficient data processing (the second goal of physical record design, after efficient use of storage space) in most cases dominates the design process.
The speed of data processing depends on how close together the related data are.

Benefits and Possible Problems



Benefits:
Can improve performance (speed)
Due to data duplication

Problems:
Wasted storage space
Data integrity/consistency threats

Denormalisation How?
Option one: Combine attributes from several logical relations together into one physical record in order to avoid doing joins (one-to-one, many-to-many, one-to-many)
Option two: Partition a logical relation into several physical records (multiple tables)
Option three: Data replication; or a combination of the two options above

Denormalisation Option 1
1. Two entities with a one-to-one relationship

Mapping

Logical Model: Normalised Relations

Select * from EMPLOYEE, PARKING
WHERE EMPLOYEE.Employee-ID = PARKING.Employee-ID
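A minimal sketch of the denormalised physical record that avoids this join (the PARKING attributes shown are assumptions; only Employee-ID appears in the slides):

CREATE TABLE employee_parking (
  employee_id    NUMBER(6) PRIMARY KEY,
  name           VARCHAR2(50),
  parking_space  VARCHAR2(10),    -- PARKING attributes folded into the EMPLOYEE record
  parking_fee    NUMBER(6,2)
);

SELECT * FROM employee_parking;   -- no join with PARKING is needed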

Denormalisation Option 1
1. Two entities with a one-to-one relationship

Try this!

[ER diagram: EMPLOYEE (EmployeePPS, Name, Address) - Manages - MANAGER (ManagerID, Expertise)]

EMPLOYEE(EmployeePPS, Name, Address, ManagerID)
MANAGER(ManagerID, Expertise)

Select * from EMPLOYEE, MANAGER
WHERE EMPLOYEE.ManagerID = MANAGER.ManagerID

Denormalised record: (EmployeePPS, Name, Address, ManagerID, Expertise)
Denormalisation Option 1
2. Many-to-many relationship (associative entity) with non-key attributes

Denormalisation Option 1
Physical Model: Denormalised Relation

Denormalisation Option 1
3. One-to-many relationship

Logical Model: Normalised Relations Resulting from One-to-Many (1:M) Relationship

Physical Model: Denormalised Relation

Denormalisation Option 2
Option 2: Partitioning of a logical relation into multiple tables
Horizontal partitioning - places different rows of a table into several physical files, based on common column values.
Vertical partitioning - distributes the columns of a table into several separate files, repeating the primary key in each one of them.

Vertical partitioning example:

CUSTOMER (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, CreditLimit, SalesTaxRate, Fax, Email)

is split by column into:

CUSTOMERA (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, Fax, Email)
CUSTOMERB (CustID, CreditLimit, SalesTaxRate)

Horizontal partitioning example:

CUSTOMER (CustID, FirstName, MiddleName, LastName, Address1, Address2, City, County, Country, Phone, CreditLimit, SalesTaxRate, Fax, Email)

is split by row into:

CUSTOMERA-M (same columns) - rows in the A to M range (e.g. by customer name)
CUSTOMERN-Z (same columns) - rows in the N to Z range
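A minimal sketch of how the horizontal partitions above could be populated (assuming the split is on LastName; the slides do not state the partitioning column):

CREATE TABLE customer_a_m AS
  SELECT * FROM customer
  WHERE UPPER(SUBSTR(LastName, 1, 1)) BETWEEN 'A' AND 'M';

CREATE TABLE customer_n_z AS
  SELECT * FROM customer
  WHERE UPPER(SUBSTR(LastName, 1, 1)) BETWEEN 'N' AND 'Z';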

Advantages and disadvantages of partitioning

Advantages:
Efficiency
Local optimisation
Recovery

Disadvantages:
Slow retrieval
Complexity
Extra space and time for updates

Denormalisation Option 3
Option 3: Data replication; or a combination of the other two options

Data replication - the same data is purposely stored in multiple locations of the database.

Data replication improves performance by allowing multiple users to access the same data at the same time with minimum contention.

Denormalisation Disadvantages

The potential for loss of integrity is considerable
Additional time is required to maintain consistency automatically every time a record is inserted, updated, or deleted
Increase in storage space resulting from the duplication

Whose responsibility?

DBMS
Database Designer

File Organisation

1. Sequential File Organisation

2. Indexed File Organisation


3. Hashed File Organisation


Sequential File Organisation

The records are stored in sequence according to a primary key value. To locate a particular record, a program must scan the file from its beginning until the desired record is located.

https://www.youtube.com/watch?v=zDzu6vka0rQ

Indexed File Organisation

The records are stored either sequentially or non-sequentially, and an INDEX is created allowing the application software to locate the individual records.

Indexed files have the capability of creating multiple indexes, e.g. a library where there are indexes on author, title, subject etc.

Therefore indexes are most useful for:

Larger tables
Attributes which are referenced in ORDER BY or GROUP BY clauses
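A minimal sketch (the index, table and column names are illustrative):

CREATE INDEX idx_customer_lastname ON customer (LastName);

SELECT LastName, COUNT(*)
FROM customer
GROUP BY LastName;   -- the optimiser can use the index instead of scanning the whole table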

https://www.youtube.com/watch?v=h2d9b_nEzoA

Hashed File Organisation

The address of each record is determined using a hashing algorithm
A hashing algorithm is a routine that converts a PK value into a record address
A hash index table uses hashing to map a key into a location in an index, where there is a pointer to the data record matching the hash key
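For example (an illustrative hash function, not from the slides): with a simple modulo hash over 100 buckets, a PK value of 10467 maps to bucket 10467 mod 100 = 67, the address at which the record is stored.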

DB Architecture

Note
De-normalisation should only take place after a satisfactory level of normalisation has taken place.

Goal of Physical DB Design

The goal of physical DB design is to create technical specifications from the logical descriptions of data that will provide adequate data storage and performance and will ensure database integrity, security and recoverability.

DATA AND DATABASE ADMINISTRATION

Data within the organisation

Data are a resource to be translated into information
Data is constantly being produced and analysed to create even more data

Database use in the organisation

Top management - strategic decision making, planning and policy
Middle management - tactical decisions and planning
Operational management - supports company operations

[Diagram: the database supports the TPS, MIS and DSS used at these levels]

Management of data - two recognised roles

Data/database administration

Data administration is responsible for:
a planning and analysis function responsible for setting data policy and standards
promoting the company's data as a competitive resource
providing liaison support to systems analysts during application development

Database administration is:
operationally oriented
responsible for day-to-day monitoring and management of the active database
liaison and support during application development

Data administrator

Data coordination - keep track of updates, responsibilities and interchange
Data standards - e.g. naming standards
Liaison with systems analysts and programmers, including design
Training managers, users, developers
Arbitration of disputes and usage authorization
Documentation and internal publicity
Promotion of data's competitive advantage

Database administrator

Responsible for the day-to-day administration of the database
Monitors performance to maximize efficiency
Provides central point for troubleshooting
Monitors security and usage (audit log)
Responsible for operational aspects of data
dictionary
Carries out data and software maintenance
Involved in database design

Database Administrator in DB Design

Define conceptual schema - what data to be held; what entities; what attributes
Define internal schema - decide physical database design
Liaise with users - ensure the data they need is available
Define security needs
Define backup and recovery
Monitor performance - respond to changing requirements

A Summary of DBA Activities

DB activities - planning, organising, testing, monitoring and delivering - applied to these DB services:

end-user support
policies, procedures and standards
data security, privacy and integrity
data backup and recovery
data distribution and use

Tools for Database Administration

Information is kept about all corporate resources, including data
This data about data is termed metadata
The database which holds this metadata is the data dictionary
Two types of data dictionary:
stand-alone or passive
integrated or active

Metadata in Access

Data Dictionary

Passive data dictionary
a self-contained database
all data about entities are entered into the dictionary
requests for metadata information are run as reports and queries as necessary

Active data dictionary

Data dictionary: relationships

Table construction - which attributes appear in which tables
Security - which people have access to which databases or tables
Program data requirements - which programs use which tables or files
Physical residence - which tables or files are on which disks
Impact of change - which programs might be affected by changes to which tables
Responsibility - who is responsible for updating which databases or tables

Introducing a Database: Considerations

Three important aspects:
technological: DBMS software and hardware
managerial: administrative functions
cultural: corporate resistance to change

Social impact of databases

Data collection is extensive - both voluntary and involuntary
Data is a commodity

DATABASE SECURITY

Security - types of threat (external and internal):

Loss or corruption of data due to sabotage
Loss or corruption of data due to error
Disclosure of sensitive data
Fraudulent manipulation of data

Threats to data security

Controlling unauthorised access

Physical access to building


Access to hardware

Monitor any unusual activity

Controlling unauthorised access

Developing user profiles
care over decisions on what data and resources can be accessed (and the type of access) for each end user
user training and education

Firewalls
Encryption
Plugging known security holes - using patches available for known problems

Developing user profiles

Every user is given an identifier for authentication
Users are given privileges to access data dependent on what is essential for their work:
insert
update
delete

Most DBMSs provide an approach called Discretionary Access Control (DAC)
The SQL standard supports DAC through the GRANT and REVOKE commands
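A minimal sketch of DAC with GRANT and REVOKE (the EMPLOYEE table and clerk_role are assumptions for illustration):

GRANT SELECT, INSERT, UPDATE ON employee TO clerk_role;   -- grant only what the job requires
REVOKE UPDATE ON employee FROM clerk_role;                -- withdraw a privilege that is no longer needed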

DAC and MAC

DAC has certain weaknesses in that an unauthorized user can trick an authorized user into disclosing sensitive data
An additional approach is required, called Mandatory Access Control (MAC)
MAC is based on system-wide policies that cannot be changed by individual users:
each database object is assigned a security class
each user is assigned a clearance for a security class
rules are imposed on the reading and writing of database objects by users

The SQL standard does not include support for MAC

Firewalls

A firewall controls network traffic

Encryption
Encryption: encoding or scrambling data to make it unintelligible to those without the key

Controlling loss of DP facilities

Redundancy
Virus protection
Disaster protection
Minimise error
Alert network managers to problems
Minor disruptions require on-going monitoring

Protect against error

Educate all employees
Reminders to save
Should you overwrite existing files?
Incorporate safety nets on deletion
Include integrity checks on data:
range checks
validation
check digits
hash totals
cross checking
batch totals

Software Invasion
Cruise virus
attacks for profit
waits to reach its target
reports successful penetration
delivers payload

Worm
makes copies of itself
exploits the network's weakest link - you
transmits copies to other machines
difficult to access to disable
attacks through the public domain

Trapdoor
simulates regular entry or bypasses normal security procedures

Trojan horse
looks like something else
difficult to detect that it has been run
once launched, too late!

Stealth viruses
encrypt and hide their tracks

Logic bomb
event driven

Protecting against virus attacks

Prepare a company policy on viruses
Educate on the destructive power of viruses
Control the source of software purchasing
Ensure new or upgraded software is installed by the system administrator on a quarantined machine
Control use of bulletin boards
Install anti-virus software where necessary
Make regular back-ups:
back up data and programs separately
store back-up copies off-site once the software is opened
Be aware of software holes in systems software

How security can be compromised

Poor security management


Poor connections to the outside world
Shoddy system control
Human folly
Lack of security ethic

And the answer is:

Education!!

DISTRIBUTED DATABASE MANAGEMENT SYSTEMS

Distributed databases

Distributed database
a logically interrelated collection of shared data (and a description of this data) physically distributed over a computer network

Distributed DBMSs (DDBMS)
the software system that permits the management of the distributed database and makes the distribution transparent to users
must perform all the functions of a centralized DBMS
must handle all necessary functions imposed by the distribution of data and processing

Distributed processing/database

Distributed Processing
Shares data processing chores over
sites using communications network
Database resides at one site only

Distributed Database
Each site has a data fragment
which might be replicated at
other sites
Requires distributed processing

DDBMS

Advantages
Reflects organisational
structure
Faster data access and
processing
Improved communications in
org.
Reduced operating costs
Improved share-ability and
local autonomy
Less danger of single-point
failure
Modular growth easier

Disadvantages
Complexity of management
and control
Security
Integrity control more
difficult
Lack of standard comms.
protocols for dbs
Increased training costs
Database design more
complex

Characteristics of a DDBMS

A collection of logically related shared data
The data is split into a number of fragments
Fragments may be replicated
Fragments/replicas are allocated to sites
Sites linked by a communications network
Data at each site is under control of a DBMS
DBMS at each site can handle local applications
autonomously
Each DBMS participates in at least one global
application

DDBMS features

Application interface to interact with end users or application programs and with other DBMSs
Validation to analyse data requests
Transformation to determine which data requests are distributed and which are local
Query optimization to find the best access strategy
Mapping to determine the location of fragments
I/O interface
Formatting to prepare data for presentation

Distributed database design

Data fragmentation (divide)
need to decide how to split the data into fragments

OR

Data replication (copy)
a copy of a fragment (or all fragments) may be held at several sites

THEN

Data allocation:
need to decide where to locate those fragments and replicas: each fragment is stored at the site with optimal distribution

Data fragmentation

Users work with views, so it is appropriate to work with subsets of data
Cheaper to store data closest to where it is used
May give reduced performance for global applications
Integrity control may be difficult if data and functional dependencies are at different sites
Data fragmentation must be done carefully

Data fragmentation

Breaks a single object into two or more segments or fragments
Each fragment can be stored at any site over a computer network
Information about data fragmentation is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor

Strategies for fragmentation

For successful fragmentation, must ensure:
completeness: each data item must appear in at least one fragment
reconstruction: should be able to define a relational operation that will reconstruct the relation from the fragments
disjointness: a data item appearing in one fragment should not appear in another

Strategies for data fragmentation

Horizontal fragmentation
division of a relation into subsets (fragments) based on tuples (rows)

Vertical fragmentation
division of a relation into attribute (column) subsets

Mixed fragmentation
a combination of horizontal and vertical fragmentation
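A minimal sketch of the two basic strategies over an assumed CUSTOMER relation (the column names and the region value are illustrative):

-- Horizontal fragment: a subset of the rows, e.g. one region's customers
CREATE TABLE customer_dublin AS
  SELECT * FROM customer WHERE region = 'Dublin';

-- Vertical fragments: subsets of the columns, each repeating the primary key
CREATE TABLE customer_contact AS
  SELECT CustID, Phone, Email FROM customer;
CREATE TABLE customer_credit AS
  SELECT CustID, CreditLimit FROM customer;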

Data replication

Storage of data copies at multiple sites served by a computer network
Fragment copies can be stored at several sites to serve specific information requirements
can enhance data availability and response time
can help to reduce communication and total query costs

Data replication

Fully replicated database:
stores multiple copies of each database fragment at multiple sites
can be impractical due to the amount of overhead

Partially replicated database:
stores multiple copies of some database fragments at multiple sites
most DDBMSs are able to handle the partially replicated database well

Unreplicated database:
stores each database fragment at a single site
no duplicate database fragments
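One way to hold a read-only copy of a fragment at a local site is an Oracle materialized view over a database link; this is a hedged sketch (the hq link name and refresh policy are assumptions), not the only replication mechanism:

CREATE MATERIALIZED VIEW customer_credit_replica
  REFRESH COMPLETE                   -- periodically recopy the fragment from the remote site
  AS SELECT CustID, CreditLimit
     FROM customer@hq;               -- @hq is a database link to the site holding the master copy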

Data allocation

Data allocation is closely related to the way the database is fragmented: it leads to decisions on which data is stored where
Centralized - the entire database is stored at one site
Partitioned/fragmented - the database is divided into several fragments and stored at several sites
Replicated - copies of one or more database fragments (selective replication) are stored at several sites

Strategies for data allocation

BIG DATA, SMALL DATA

Big Data

Big data is the term for data sets so large and complex that it
becomes difficult to process them using on-hand database
management tools or traditional data processing applications
We are collecting more data than ever

We have streamlined our processes through normal channels

electronics enables us to do so (RFID)


storage is cheap
computing has enabled us to improve what we do and
businesses are looking for new ways to have a competitive edge

By looking at patterns in this data we can find out useful things

From a McKinsey report

$600 will buy a disk which can store all of the world's music

Internet of Things
Ubiquitous broadband
Reduction in connectivity costs
RFID enables unique addressability

Increasingly, we are including sensors in everyday objects
These often have communicative capacity and link back to their source through the internet

Use of Big Data

We can gain additional information derivable from analysis of a single large set of related data (rather than a large number of small sets)
Correlations can be found which "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions"

The business case (McKinsey)

1. Big data can unlock significant value by making information transparent and usable at much higher frequency.
2. Organizations can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments to make better management decisions; others are using data for basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time.

The business case (McKinsey)

3. Big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
4. Sophisticated analytics can substantially improve decision-making.
5. Big data can be used to improve the development of the next generation of products and services. For instance, manufacturers are using data obtained from sensors embedded in products to create innovative after-sales service offerings such as proactive maintenance (preventive measures that take place before a failure occurs or is even noticed).

http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

Big Data in Health

Big data is enabling a new understanding of the molecular biology of cancer. The focus has changed over the last 20 years from the location of the tumor in the body (e.g., breast, colon or blood) to the effect of the individual's genetics, especially the genetics of that individual's cancer cells, on her response to treatment and sensitivity to side effects. For example, researchers have to date identified four distinct cell genotypes of breast cancer; identifying the cancer genotype allows the oncologist to prescribe the most effective available drug first.

http://strata.oreilly.com/2013/08/cancer-and-clinical-trials-the-roleof-big-data-in-personalizing-the-health-experience.html

Big Data in banking

IBM's Watson can do analysis with unstructured data such as that found in e-mails, news reports, books and websites. Citigroup has hired Watson to help it decide what new products and services to offer its customers, and to try to cut down on fraud and look for signs of customers becoming less creditworthy. In most financial institutions the immediate use of big data is in containing fraud and complying with rules on money-laundering and sanctions.
Big credit card companies are getting better at recognising patterns
Solutions are getting cheaper, even for smaller banks
Banks also use the data to sell products (e.g. insurance) by looking at the type of transactions customers make
http://www.economist.com/node/21554743

Some geospatial uses

The Climate Corporation, an insurance company, combines modern Big Data techniques, climatology and agronomics to analyse the weather's complex and multi-layered behaviour to help the world's farmers adapt to climate change.
McLaren's Formula One racing team uses Big Data to identify issues with its racing cars using predictive analytics and takes corrective actions pro-actively. They spend 5% of their budget on telemetry. An F1 car is fitted with about 130 sensors. In addition to the engine sensors, video and GPS are used to work out the best line to take through each bend. The sensor data helps with traffic smoothing, energy-optimising analysis and driver direction determination. E.g. new Pirelli tyres this year meant teams had to watch for tyre wear, grip and temperature under different weather conditions and tracks, relating all that to driver acceleration, braking and steering.

Some geospatial uses

Vestas Wind Systems is implementing a big data solution that is significantly reducing data processing time and helping to predict weather patterns at potential sites faster and more accurately, to increase turbine energy production. They currently store 2.8 petabytes in a wind library covering over 178 parameters, such as temperature, barometric pressure, humidity, precipitation, wind direction and wind velocity from ground level up to 300 feet.
Nokia needed a technology solution to support the collection, storage and analysis of virtually unlimited data types and volumes. They leverage data processing and complex analyses in order to build maps with predictive traffic and layered elevation models, to source information about points of interest around the world, to understand the quality of phones and more. www.geospatialworld.net

More geospatial uses

US Xpress, a transportation solutions provider, collects about a thousand data elements ranging from fuel usage to tyre condition to truck engine operations to GPS information, and uses this for optimal fleet management and to drive productivity, saving millions of dollars in operating costs. When an order is dispatched, it is tracked using an in-cab system installed on a DriverTech tablet with speech recognition capability. US Xpress constantly connects to the devices to monitor the progress of the lorry. The video camera on the device can be used to check if the driver is nodding off. All the data collected is analysed in real time using geospatial data, integrated with driver data and truck telematics. They can minimise delays and ensure trucks are not left waiting when they arrive at a depot for maintenance. www.geospatialworld.net

Big data in the university

Huddersfield University

Purdue University, Indiana
when a student logs into a course website, they see a traffic-light signal (and advice on how to move to green)

University of Derby
linked library data to identify learning styles; now including lecture attendance records, VLE use, sports, car parking

Loughborough University
analyses staff-student interaction

www.theguardian.com/education/2013/aug/05

Role of Cloud Computing

Enables easier gathering, storage and processing of Big Data
Cloud computing provides accessibility any time, any place
Large-scale data gathering is possible from multiple locations
Sharing of data is easier
Large-scale storage
Processing power is also available, with virtual machine provision to analyse data
Can be utilised on an ad-hoc basis

Analysing Big Data

Data mining - a blend of applied statistics and artificial intelligence
Analytics
Machine learning - neural networks, cluster analysis, genetic algorithms, decision trees, support vector machines
Visualisation - interactive rather than static graphs help to understand patterns

Shift of skills to digital analysis and visualisation techniques

Data mining

Who interprets?

A new set of tools makes it easier to do a variety of data analysis tasks. Some require no programming, while other tools make it easier to combine code, visuals, and text in the same workflow. They enable users who aren't statisticians or data geeks to do data analysis. While most of the focus is on enabling the application of analytics to data sets, some tools also help users with the often tricky task of interpreting results. In the process users are able to discern patterns and evaluate the value of data sources by themselves, and only call upon expert data analysts when faced with non-routine problems.
http://strata.oreilly.com/2013/08/data-analysis-tools-target-nonexperts.html

Issues

Problems with algorithms - can magnify misbehaviour (e.g. selection bias)
Privacy and security - anonymity; profiling of individuals
Over-reliance on technology
Need for skilled workers with deep analytics skills

www.internetofthings.eu

House Keeping

Groups and group names
Project distribution
Weighting (65% exam : 35% CA)
35% CA = 28% project, 7% SQL CAs (approx.)

FINI
