
Subject oriented

Integrated

Time Variant

Architecture of a Data
Warehouse

Architecture of a Data
Warehouse with a Staging Area

Architecture of a Data
Warehouse with a Staging Area
and Data Marts

Distributed Data Warehouse

Incorrect Data in the Data Warehouse
The architect needs to know what to do
about incorrect data in the data warehouse.
The first assumption is that incorrect data
arrives in the data warehouse only on an
exception basis.
If data is being entered incorrectly into
the data warehouse on a wholesale basis,
then it is the duty of the architect to find the
source of the errors and make adjustments.

How to correct
To correct the offending data, an architect can do
one of three things.
Example: suppose that on July 1 an entry of Rs 500 is made in
the operational system, on July 2 a snapshot is
taken into the data warehouse, and on July 15 it is
discovered that the July 1 entry should have been 250 rather
than 500.
Choice 1. Go back to the July 2 snapshot and change the entry to 250
instead of 500. This can create a problem if any
report was produced between July 2 and July 15.

Choice 2. Enter offsetting entries, i.e. make two
entries: first debit 500 to cancel the wrong entry, then credit the correct 250.
This can also create problems in some cases.
Choice 3. Reset the account to the proper value.
This does not correct the original error.
Which choice to make depends on the
situation.
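The three choices can be compared with a small sketch. This is only an illustration under stated assumptions: the names `Entry`, `balance`, `reset_to` and `YEAR` are hypothetical, the year is arbitrary because the example gives none, and modelling Choice 3 as a single adjusting entry is one possible reading of "reset the account".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    day: date
    amount: int  # rupees; a negative amount reverses an earlier one
    note: str = ""


def balance(entries):
    """Total of all entries currently held for the account."""
    return sum(e.amount for e in entries)


YEAR = 2024  # assumption: the slide example does not state a year

# July 2 snapshot: the warehouse holds the erroneous entry of 500.
snapshot = [Entry(date(YEAR, 7, 1), 500, "original entry (wrong)")]

# Choice 1: go back and overwrite the record in the snapshot.
# Reports produced between July 2 and July 15 no longer reconcile.
choice1 = [Entry(date(YEAR, 7, 1), 250, "corrected in place")]

# Choice 2: keep the history and add two offsetting entries on July 15.
choice2 = snapshot + [
    Entry(date(YEAR, 7, 15), -500, "cancel the wrong entry"),
    Entry(date(YEAR, 7, 15), 250, "enter the correct amount"),
]


# Choice 3 (assumed modelling): one entry that forces the balance to the
# proper value without recording what the original mistake was.
def reset_to(entries, proper_value, day):
    adjustment = proper_value - balance(entries)
    return entries + [Entry(day, adjustment, "reset to proper value")]


choice3 = reset_to(snapshot, 250, date(YEAR, 7, 15))

for name, entries in [("choice 1", choice1),
                      ("choice 2", choice2),
                      ("choice 3", choice3)]:
    print(name, "balance:", balance(entries), "entries:", len(entries))
```

All three end with a balance of 250; they differ in how much history is rewritten or preserved, which is why the right choice depends on the situation.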

Structuring Data in the Data Warehouse
The simplest and most common data structure
found in the data warehouse is the simple
cumulative structure, i.e. daily transactions
reported from the operational environment.
Example: Jan 1 data, Jan 2 data, Jan 3 data.
Rolling summary data
After that, the daily transactions are summarized into
data warehouse records.
Example: week 1 data, week 2 data, month 1 data, month 2
data.
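The step from daily cumulative data to weekly and monthly summaries might look like the sketch below. The function names, the use of ISO week numbers, and the sample amounts are all assumptions made for illustration; the slides do not say how weeks are defined.

```python
from collections import defaultdict
from datetime import date


def summarize_by_week(daily_totals):
    """Roll daily totals up into (ISO year, ISO week) totals."""
    weekly = defaultdict(int)
    for day, amount in daily_totals.items():
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        weekly[(iso[0], iso[1])] += amount
    return dict(weekly)


def summarize_by_month(daily_totals):
    """Roll daily totals up into (year, month) totals."""
    monthly = defaultdict(int)
    for day, amount in daily_totals.items():
        monthly[(day.year, day.month)] += amount
    return dict(monthly)


# Hypothetical daily totals as they arrive from the operational environment.
daily = {
    date(2024, 1, 1): 100,
    date(2024, 1, 2): 150,
    date(2024, 1, 3): 50,
    date(2024, 1, 8): 200,
    date(2024, 2, 1): 75,
}

print(summarize_by_week(daily))
print(summarize_by_month(daily))
```

The daily dictionary plays the role of the cumulative structure; the two summaries are the rolled-up records.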

Reporting and the Architected Environment
Once the data warehouse has been constructed, all
reporting and informational processing is done
from there.
1. Operational reporting for the clerical level.
It focuses on line items (detailed information).
Example: a cashier has to check the whole day's
transactions in the evening for a balance check.
2. Data warehouse reporting for the management level.
It focuses on summary information.
Example: a bank vice president has to decide
how many ATMs to place in a
particular city. He does not need one day's
transactions; he needs a one-month or one-year
summary of the data to make the decision.

Purging Warehouse Data


Data purging means deleting
data from the data warehouse.
Data does not just pour into a data
warehouse. It has its own life cycle
within the data warehouse.
Purging does not mean the data is removed entirely;
it means the data is rolled up to a higher level of
summary, where the detail is lost.
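Purging as a roll-up can be sketched as follows. The function `purge_older_than`, the cutoff date, and the choice of monthly totals as the summary level are assumptions for illustration only.

```python
from datetime import date


def purge_older_than(detail, cutoff):
    """Split detail rows into rows to keep and a monthly roll-up.

    detail: list of (day, amount) rows.
    Rows dated before `cutoff` are dropped from the detail and survive
    only as (year, month) totals, so their individual values are lost.
    """
    kept = []
    rolled_up = {}
    for day, amount in detail:
        if day < cutoff:
            key = (day.year, day.month)
            rolled_up[key] = rolled_up.get(key, 0) + amount
        else:
            kept.append((day, amount))
    return kept, rolled_up


# Hypothetical detail rows held in the warehouse.
detail_rows = [
    (date(2023, 11, 3), 40),
    (date(2023, 11, 20), 60),
    (date(2023, 12, 5), 25),
    (date(2024, 1, 9), 10),
]

kept, summary = purge_older_than(detail_rows, cutoff=date(2024, 1, 1))
print("detail kept:", kept)
print("rolled up:", summary)
```

After the purge, the November total is still available, but it is no longer possible to tell that it came from two separate rows.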

Granularity
Granularity refers to the level of detail of the data.
Dual levels of granularity:
1. Low level of granularity (more detail)
2. High level of granularity (less detail, i.e.
summary)
Most data in a data warehouse is held at a high level of granularity,
but low-level detail is also kept for atomic
queries.

Data Granularity
A significant difference between an operational system
and a data warehouse is the granularity of the data
stored.
An operational system typically stores data at the
lowest level of granularity: the maximum level of detail.
However, because the data warehouse contains data
representing a long period of time, simply storing all
detail data from an operational system can result in an
overworked system that takes too long to query.
A data warehouse typically stores data in different
levels of granularity or summarization, depending on
the data requirements of the business. If an enterprise
needs data to assist strategic planning, then only highly
summarized data is required.

Granularity
The lower the level of granularity of data required by
the enterprise, the higher the number of resources
(specifically data storage) required to build the data
warehouse. The different levels of summarization in
order of increasing granularity are:
Current operational data
Historical operational data
Aggregated data
Metadata
Current and historical operational data are taken,
unmodified, directly from operational systems.
Historical data is operational level data no longer
queried on a regular basis, and is often archived
onto secondary storage.
