
Introduction To Data Warehouse

Using Cognos 8 BI

Created By : Gourav Atalkar

Reviewed By: Amit Sharma

Contact Point : bisp.consulting@gmail.com


Course Roadmap
• Data Warehousing - An Overview
• Data Warehouse Architecture
• Data Modeling for Data Warehousing
• Overview (OLAP)
• Multidimensional Analysis
Ø Multidimensional Analysis Introduction
Ø Operations In multidimensional Analysis
Ø Multidimensional Data Model
Ø Multi-Dimensional vs. Relational
Objectives

• At the end of this lesson, you will know :


– What is the Need of Data Warehousing (Scenarios)
– What is Data Warehousing
– The evolution of Data Warehousing
– Need for Data Warehousing
– OLTP Vs Warehouse Applications
– Data marts Vs Data Warehouses
– Data Warehouse Schemas
– Reporting fundamentals
Business Scenario –I

You are a database administrator for a company that is called TBC: The
FMCG Company. The company manufactures daily needs products for
sale to other businesses. The financial department wants to track,
analyze, and forecast the sales revenue across geographic regions on a
periodic basis for all products sold.

• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Who are my customers and what products are they buying?
• Which are our lowest/highest margin customers?
• What impact will new products/services have on revenue and margins?
• Which customers are most likely to go to the competition?
[Diagram: data input from the Delhi, Mumbai, Kolkata and Bhopal branches goes to an OLAP server; the Sales Manager needs sales per product type per branch for the first quarter.]
Solution: I
Extract sales information from each database.
Store the information in a common repository at a single site.
[Diagram: data input from Delhi, Mumbai, Kolkata and Bhopal is stored in the data warehouse; query and analysis tools produce reports for the Sales Manager, with data output via a business intelligence tool (e.g. Cognos, MSBI, Hyperion).]
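
The consolidation described in Solution I can be sketched roughly as follows. This is only an illustration: the branch rows, the column names and the `consolidate` helper are assumptions, and an in-memory SQLite database stands in for the common repository.

```python
import sqlite3

# Hypothetical per-branch sales rows: (product_type, quarter, amount).
branch_sales = {
    "Delhi":   [("Soap", "Q1", 1200.0), ("Tea", "Q1", 800.0)],
    "Mumbai":  [("Soap", "Q1", 1500.0)],
    "Kolkata": [("Tea", "Q1", 950.0)],
    "Bhopal":  [("Soap", "Q1", 400.0), ("Tea", "Q1", 300.0)],
}

def consolidate(sources):
    """Copy every branch's rows into one common repository (in-memory SQLite)."""
    repo = sqlite3.connect(":memory:")
    repo.execute(
        "CREATE TABLE sales (branch TEXT, product_type TEXT, quarter TEXT, amount REAL)"
    )
    for branch, rows in sources.items():
        repo.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    repo.commit()
    return repo

repo = consolidate(branch_sales)

# Sales per product type per branch for the first quarter.
report = repo.execute(
    "SELECT branch, product_type, SUM(amount) FROM sales "
    "WHERE quarter = 'Q1' GROUP BY branch, product_type "
    "ORDER BY branch, product_type"
).fetchall()
```

Once all branches sit in one table, the Sales Manager's question becomes a single grouped query instead of four separate extracts.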
Business Scenario –II

One Stop Shopping Super Market has a huge operational database.

Whenever executives want a report, the OLTP system becomes slow
and data entry operators have to wait for some time.
[Diagram: data entry operators and management share one operational database; while management runs a report, the data entry operators wait.]
Solution: II

Extract the data needed for analysis from the operational database.

Store it in a warehouse. Refresh the warehouse at regular intervals
so that it contains up-to-date information for analysis.
The warehouse will contain data with a historical perspective.
[Diagram: data entry operators post transactions to the operational database; data is extracted into the data warehouse, and management runs reports against the warehouse.]
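
A minimal sketch of the periodic refresh in Solution II, assuming a dated snapshot is appended on each run. The table names, the dates and the `refresh_warehouse` function are hypothetical, and both tables live in one in-memory SQLite database only to keep the example self-contained.

```python
import sqlite3

# Hypothetical operational (OLTP) table and warehouse snapshot table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE oltp_stock (item TEXT PRIMARY KEY, quantity INTEGER)")
db.execute(
    "CREATE TABLE wh_stock_snapshot (snapshot_date TEXT, item TEXT, quantity INTEGER, "
    "PRIMARY KEY (snapshot_date, item))"
)

def refresh_warehouse(conn, snapshot_date):
    """Copy the current operational values into the warehouse as a dated snapshot."""
    conn.execute(
        "INSERT INTO wh_stock_snapshot (snapshot_date, item, quantity) "
        "SELECT ?, item, quantity FROM oltp_stock",
        (snapshot_date,),
    )
    conn.commit()

# Day 1: operators enter data, then the scheduled refresh runs.
db.execute("INSERT INTO oltp_stock VALUES ('Rice', 100)")
refresh_warehouse(db, "2024-01-01")

# Day 2: the operational row is overwritten; the warehouse keeps the old value.
db.execute("UPDATE oltp_stock SET quantity = 60 WHERE item = 'Rice'")
refresh_warehouse(db, "2024-01-02")

history = db.execute(
    "SELECT snapshot_date, quantity FROM wh_stock_snapshot "
    "WHERE item = 'Rice' ORDER BY snapshot_date"
).fetchall()
```

Reports read only the snapshot table, so they do not compete with data entry on the operational table, and each refresh adds to the historical record rather than overwriting it.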
Business Scenario –III

Cakes & Cookies is a small, new company. The President of
the company wants the company to grow. He needs
information so that he can make correct decisions.
Solution: III
Improve the quality of data before loading it into the warehouse.
Perform data cleaning and transformation before loading the data.
Use query analysis tools to support ad-hoc queries.
[Diagram: improvement. The President uses query and analysis tools on the data warehouse, with data output via a business intelligence tool (e.g. Cognos, MSBI, Hyperion).]
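
The cleaning and transformation step in Solution III might look roughly like the sketch below. The cleaning rules, the city alias table and the `clean_rows` helper are assumptions made for illustration; real rules depend on the source data.

```python
def clean_rows(raw_rows):
    """Return cleaned (city, product, amount) tuples, skipping unusable rows."""
    # Hypothetical mapping of spelling variants to one canonical city name.
    city_aliases = {"bombay": "Mumbai", "mumbai": "Mumbai", "delhi": "Delhi"}
    cleaned = []
    for city, product, amount in raw_rows:
        if amount is None or str(amount).strip() == "":
            continue  # cleaning: drop rows with a missing amount
        key = city.strip().lower()
        # Transformation: one spelling per city, one numeric type for amounts.
        city = city_aliases.get(key, city.strip().title())
        cleaned.append((city, product.strip().title(), float(amount)))
    return cleaned

raw = [
    (" bombay ", "cookies", "120.50"),
    ("Mumbai", "Cookies ", 80),
    ("delhi", "cake", None),       # missing amount, will be dropped
    ("DELHI", " cake", "200"),
]
warehouse_rows = clean_rows(raw)
```

Only `warehouse_rows` would be loaded, so ad-hoc queries see one consistent spelling per city and one numeric type for amounts.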
What is a Data Warehouse?

A single, complete and consistent store of data obtained
from a variety of different sources, made available to end
users in a way they can understand and use in a business
context.

A process of transforming data into information and making
it available to users in a timely enough manner to make a
difference.
Characteristics of Data Warehouse

• A data warehouse is a
Subject oriented
Integrated
Time varying
Non-volatile
collection of data that is used primarily in
organizational decision making.
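Subject-oriented Characteristics of a Data Warehouse

[Diagram: operational data is organized around applications (Leads, Quotes, Orders, Inventory), while the data warehouse is organized around subjects (Customers, Products, Regions, Time).]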
Integrated Characteristics of a Data Warehouse
• Data Warehouse is constructed by integrating multiple
heterogeneous sources.
• Data preprocessing is applied to ensure consistency.

[Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]
Time Variant Characteristics of a Data Warehouse

• Operational Data
– Current value data
– Time horizon: 60-90 days
– Key may not have an element of time
• Data Warehouse
– Snapshot data
– Time horizon: 5-10 years
– Key has an element of time
– Stores historical data
Non Volatile Characteristics of a Data Warehouse

• Operational Data: insert, change, delete, replace
• Data Warehouse: load, then select only
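Data Warehouse Architecture

[Diagram: source data (relational databases, ERP systems, purchased data, legacy data) passes through extraction and cleansing and an optimized loader into the data warehouse engine, which serves analysis and query tools; a metadata repository accompanies the warehouse.]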
OLTP vs Data Warehouse
• OLTP
– Application oriented
– Used to run the business
– Detailed data
– Current, up to date
– Isolated data
– Repetitive access
– Clerical user
• Warehouse (DSS)
– Subject oriented
– Used to analyze the business
– Summarized and refined
– Snapshot data
– Integrated data
– Ad-hoc access
– Knowledge user (manager)
Online Analytical Processing (OLAP)

OLAP is a category of software tools that provides analysis of data stored
in a database. With OLAP, analysts, managers, and executives can gain
insight into data through fast, consistent, interactive access to a wide
variety of possible views.


• What is an OLAP cube? The key requirement of OLAP is
multidimensional analysis. OLAP achieves this
multidimensional functionality by using a structure called a
cube. The OLAP cube provides a multidimensional way to
look at the data. The cube is comparable to a table in a
relational database.
Features of Cube Representation
Slicing: A slice is a subset of a multidimensional array
corresponding to a single value for one or more members of
the dimensions not in the subset.
Dicing: A related operation to slicing is dicing. In dicing,
you define a sub-cube of the original space by restricting the
members of two or more dimensions. In the extreme case the
sub-cube is a single cell of the cube, which is the smallest
available slice.
Rotating : Rotating changes the dimensional orientation of
the report from the cube data. For example, rotating may
consist of swapping the rows and columns, or moving one of
the row dimensions into the column dimension.
Dimension :A dimension represents descriptive categories of
data such as time or location. In other words, dimensions are
broad groupings of descriptive data about a major aspect of
a business, such as dates, markets, or products.
Measure : The measures are the actual data values that occupy the
cells as defined by the dimensions selected. Measures include facts or
variables typically stored as numerical fields, which provide the focal
point of investigation using OLAP. For instance, you are a
manufacturer of cellular phones. The question you want answered is
how many xyz model cell phones (product dimension) a particular
plant (location dimension) produced during the month of January
2003 (time dimension).
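
The cube terms above can be illustrated with a small sketch built on the cellular phone example. The cube is modelled here as a plain dictionary, and the production figures and the `slice_cube`, `dice_cube` and `rotate` helpers are assumptions for illustration rather than any OLAP product's API.

```python
# Cube cells keyed by (product, plant, month) -> units produced (the measure).
# All values are made up for illustration.
cube = {
    ("xyz", "Plant A", "2003-01"): 500,
    ("xyz", "Plant B", "2003-01"): 300,
    ("abc", "Plant A", "2003-01"): 200,
    ("xyz", "Plant A", "2003-02"): 450,
    ("abc", "Plant B", "2003-02"): 150,
}

def slice_cube(cells, month):
    """Slice: fix one dimension (time) to a single value, keep the others."""
    return {(prod, loc): v for (prod, loc, m), v in cells.items() if m == month}

def dice_cube(cells, products, plants, months):
    """Dice: keep the sub-cube whose members fall in the given sets."""
    return {
        key: v for key, v in cells.items()
        if key[0] in products and key[1] in plants and key[2] in months
    }

def rotate(view):
    """Rotate: swap the row and column dimensions of a two-dimensional view."""
    return {(col, row): v for (row, col), v in view.items()}

january = slice_cube(cube, "2003-01")
one_cell = dice_cube(cube, {"xyz"}, {"Plant A"}, {"2003-01"})
january_by_plant = rotate(january)
```

Here `one_cell` answers the question in the text: how many xyz phones one plant produced in January 2003.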
Data Warehouse Schema

ØStar Schema
ØFact Constellation Schema
ØSnowflake Schema
Fact:

Definition: Facts are numeric measurements (values) that
represent a specific business activity.
Facts are stored in a fact table, i.e. the center of the star
schema. Facts used in business data analysis include units,
costs, prices and revenues.

Example: sales figures are numeric measurements that
represent product and/or service sales.
The fact table holds the measures, or facts. The measures are
numeric and additive across some or all of the dimensions.
For example, sales are numeric, and users can look at total
sales for a product, category or subcategory, and for any
time period. The sales figures are valid no matter how the
data is sliced.
The central table in a star schema is called the fact table;
it contains the facts and is connected to the dimension tables.
A fact table typically has two types of columns:
Ø Columns that contain facts
Ø Foreign keys to dimension tables

The primary key of a fact table is usually a composite key that is
made up of all of its foreign keys.
A fact table might contain either detail-level facts or facts that
have been aggregated (fact tables that contain aggregated facts
are often called summary tables instead). A fact table usually
contains facts at the same level of aggregation.
Dimension

Definition: Qualifying characteristics that provide additional
perspective to a given fact.

Example: sales might be compared by product, from region to
region and from one time period to the next.
Here sales have product, location and time dimensions.
Such dimensions are stored in dimension tables.
Dimension Table

Definition: The dimensions of the fact table are further
described with dimension tables.

Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Dimension tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)
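
The fact and dimension tables listed above could be declared roughly as follows. This is a sketch using Python's built-in `sqlite3` module; the column types and the sample rows are assumptions, since the slides give only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market  (Market_Id  INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE Time    (Time_Id    INTEGER PRIMARY KEY, Week INTEGER, Month TEXT, Quarter TEXT);

-- Fact table: foreign keys to every dimension plus the numeric fact.
-- The primary key is the composite of the foreign keys.
CREATE TABLE Sales (
    Market_Id  INTEGER REFERENCES Market(Market_Id),
    Product_Id INTEGER REFERENCES Product(Product_Id),
    Time_Id    INTEGER REFERENCES Time(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);

-- Illustrative sample rows only.
INSERT INTO Market  VALUES (1, 'Delhi', 'Delhi', 'North'), (2, 'Mumbai', 'Maharashtra', 'West');
INSERT INTO Product VALUES (1, 'Soap', 'Personal Care', 20.0), (2, 'Tea', 'Beverages', 50.0);
INSERT INTO Time    VALUES (1, 1, 'January', 'Q1'), (2, 5, 'February', 'Q1');
INSERT INTO Sales   VALUES (1, 1, 1, 1000.0), (1, 2, 1, 500.0), (2, 1, 2, 700.0);
""")

# The fact is aggregated along one dimension with a single join.
sales_by_region = conn.execute("""
    SELECT m.Region, SUM(s.Sales_Amt)
    FROM Sales s JOIN Market m ON m.Market_Id = s.Market_Id
    GROUP BY m.Region ORDER BY m.Region
""").fetchall()
```

Each dimension is reached from the fact table with exactly one join, which is the property the star layout is designed to give.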
What is Star Schema?
• Definition: A star schema is a relational database schema for
representing multidimensional data. It is the simplest form of
data warehouse schema and contains one or more fact tables
and dimension tables.
• It is called a star schema because the entity-relationship
diagram between dimensions and fact tables resembles a star,
where one fact table is connected to multiple dimensions.
• The center of the star schema consists of a large fact table,
which points towards the dimension tables.
• The advantages of a star schema are easy slicing of data,
better query performance and easy understanding of data.
Steps in designing Star Schema
Ø Identify a business process for analysis (like sales).
Ø Identify measures or facts (sales dollar).
Ø Identify dimensions for facts (product dimension, location
dimension, time dimension, organization dimension).
Ø List the columns that describe each dimension (region name,
branch name).
Ø Determine the lowest level of summary in a fact table (sales
dollar).
Ø In a star schema every dimension will have a primary key.
Ø In a star schema, a dimension table will not have any parent
table, whereas in a snowflake schema a dimension table will have
one or more parent tables.
Ø Hierarchies for the dimensions are stored in the dimension
table itself in a star schema, whereas they are broken into
separate tables in a snowflake schema.
Ø These hierarchies help to drill down the data from the
topmost level to the lowermost level.
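
Drilling down a hierarchy that is stored in a single star-schema dimension table can be sketched as below. The Region > State > City hierarchy, the sample figures and the `total_by` helper are assumptions chosen for illustration.

```python
from collections import defaultdict

# Star-schema style dimension: the whole Region > State > City hierarchy
# sits in one table (here, one dict per row). Values are illustrative.
market_dim = {
    1: {"City": "Mumbai", "State": "Maharashtra", "Region": "West"},
    2: {"City": "Pune",   "State": "Maharashtra", "Region": "West"},
    3: {"City": "Baroda", "State": "Gujarat",     "Region": "West"},
}
sales_fact = [(1, 400.0), (2, 250.0), (3, 100.0), (1, 50.0)]  # (Market_Id, Sales_Amt)

def total_by(level):
    """Aggregate the fact at one level of the hierarchy."""
    totals = defaultdict(float)
    for market_id, amount in sales_fact:
        totals[market_dim[market_id][level]] += amount
    return dict(totals)

by_region = total_by("Region")   # topmost level
by_state = total_by("State")     # drill down one level
by_city = total_by("City")       # lowest level
```

Because every level is a column of the same dimension row, moving from region to state to city only changes the grouping column.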
Star Schema Examples

Ø The fact table provides sales statistics broken down by the
product, period and store dimensions.
Ø Dimension tables contain descriptions about subjects of the
business.
Ø There is a 1:N relationship between each dimension table and
the fact table.
Benefits: easy to understand, easy to define hierarchies, reduces the number of
physical joins.
Snowflake Schema
Ø Represents the dimensional hierarchy directly by normalizing
the dimension tables.
Ø Easy to maintain.
Ø Saves storage, but is said to reduce the effectiveness of
browsing.
Ø A single, large, central fact table and one or more
tables for each dimension.
Ø Dimension tables are normalized, i.e. dimension table
data is split into additional tables.
Snowflake Schema Example

Fact table:
Sales Fact (Store_id, Product_id, Time_id, measure)

Dimension tables:
Store Dim. (Store_id, Store Name, Store Add., Region_id)
Region Dim. (Region_id, City, State, Country)
Product Dim. (Product_id, Product Desc, Product Name, Product Line, Product Type)
Time Dim. (Time_id, Year, Quarter, Month)

Drawbacks: time-consuming joins, slow report generation
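
The extra join that a snowflake schema introduces can be seen in a small sketch. Only the store and region tables from the example are modelled; the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region_Dim (Region_id INTEGER PRIMARY KEY, City TEXT, State TEXT, Country TEXT);
-- The store dimension points to its parent table instead of repeating
-- city, state and country on every store row.
CREATE TABLE Store_Dim  (Store_id INTEGER PRIMARY KEY, Store_Name TEXT,
                         Region_id INTEGER REFERENCES Region_Dim(Region_id));
CREATE TABLE Sales_Fact (Store_id INTEGER REFERENCES Store_Dim(Store_id), measure REAL);

-- Illustrative sample rows only.
INSERT INTO Region_Dim VALUES (1, 'Pune', 'Maharashtra', 'India'), (2, 'Delhi', 'Delhi', 'India');
INSERT INTO Store_Dim  VALUES (10, 'Store A', 1), (11, 'Store B', 1), (12, 'Store C', 2);
INSERT INTO Sales_Fact VALUES (10, 100.0), (11, 50.0), (12, 75.0);
""")

# Reaching the State attribute takes two joins: fact -> store -> region.
sales_by_state = conn.execute("""
    SELECT r.State, SUM(f.measure)
    FROM Sales_Fact f
    JOIN Store_Dim  s ON s.Store_id  = f.Store_id
    JOIN Region_Dim r ON r.Region_id = s.Region_id
    GROUP BY r.State ORDER BY r.State
""").fetchall()
```

In a star schema the State column would sit on the store dimension itself, so the same report would need one join instead of two.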


Fact Constellation

Multiple fact tables share many dimension tables.

For example, Booking and Checkout may share many dimension tables in the
hotel industry.

This schema is viewed as a collection of stars, hence it is called a galaxy
schema or fact constellation.

Sophisticated applications require such a schema.


Fact Constellation Example

Fact tables:
Sales Fact (Store Key, Product Key, Period Key, measure)
Shipping Fact (Shipper Key, Store Key, Product Key, Period Key, Price)

Dimension tables:
Product Dim. (Product Key, Product Desc, Product Name, Product Line, Product Type)
Store Dim. (Store Key, Store Name, Store Add., City)
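
A fact constellation with two fact tables sharing the same dimensions can be sketched as follows. The tables are trimmed to a few columns, and the column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Dim (Product_Key INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE Store_Dim   (Store_Key   INTEGER PRIMARY KEY, Store_Name TEXT, City TEXT);

-- Two fact tables reference the same dimension tables.
CREATE TABLE Sales_Fact (
    Store_Key   INTEGER REFERENCES Store_Dim(Store_Key),
    Product_Key INTEGER REFERENCES Product_Dim(Product_Key),
    Period_Key  INTEGER, measure REAL);
CREATE TABLE Shipping_Fact (
    Shipper_Key INTEGER,
    Store_Key   INTEGER REFERENCES Store_Dim(Store_Key),
    Product_Key INTEGER REFERENCES Product_Dim(Product_Key),
    Period_Key  INTEGER, Price REAL);

-- Illustrative sample rows only.
INSERT INTO Product_Dim VALUES (1, 'Soap'), (2, 'Tea');
INSERT INTO Store_Dim   VALUES (1, 'Store A', 'Pune');
INSERT INTO Sales_Fact    VALUES (1, 1, 1, 300.0), (1, 2, 1, 120.0);
INSERT INTO Shipping_Fact VALUES (7, 1, 1, 1, 25.0), (7, 1, 2, 1, 10.0);
""")

# Both facts are analysed along the shared product dimension.
per_product = conn.execute("""
    SELECT p.Product_Name,
           (SELECT SUM(measure) FROM Sales_Fact    s WHERE s.Product_Key = p.Product_Key),
           (SELECT SUM(Price)   FROM Shipping_Fact h WHERE h.Product_Key = p.Product_Key)
    FROM Product_Dim p ORDER BY p.Product_Name
""").fetchall()
```

Each fact is summed separately before being placed next to the other, which avoids double counting when the two fact tables have different numbers of rows per product.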
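From the Data Warehouse to Data Marts

[Diagram: moving from data towards information, the organizationally structured data warehouse (more history, normalized, detailed) feeds departmentally structured data, which feeds individually structured data (less history, normalized, detailed).]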
Reporting Fundamental Case Study

• DSS Books & Music is a new company which sells books, music and
video items.
• Their products are sold in different regions of the world.
• They have sales units at Mumbai, Pune, Ahmedabad, Delhi and
Baroda.
• The President of the company wants sales information.

Sales Measures & Dimensions

• Measures – Units sold, Amount.
• Dimensions – Product, Time, Region.
Sales Data Warehouse Tables

• Store Dimension Table
• Region Dimension Table
• Product Dimension Table
• Time Dimension Table
• Sales Fact Table

Sales Data Warehouse Model
Sales Information
• Details of products whose sales amount is less than 50,000 rupees.
• Details of the top N stores with the highest sales amount.
• Sales by store type, to determine which stores generate the most
revenue and the highest sales volume.
• The contribution that each country makes to revenue.
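
The four reports listed above could be expressed as queries along the following lines. The case-study table layouts are not reproduced in the text, so every table name, column name and sample row here is a simplified assumption (for example, country is placed directly on the store dimension).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified columns; the real case-study tables may differ.
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, store_name TEXT,
                          store_type TEXT, country TEXT);
CREATE TABLE sales_fact  (product_id INTEGER, store_id INTEGER,
                          units_sold INTEGER, amount REAL);

INSERT INTO product_dim VALUES (1, 'Book'), (2, 'Music CD'), (3, 'Video');
INSERT INTO store_dim VALUES (1, 'Mumbai Central', 'Mall', 'India'),
                             (2, 'Pune Camp', 'Street', 'India'),
                             (3, 'Online Export', 'Online', 'UK');
INSERT INTO sales_fact VALUES (1, 1, 100, 60000), (2, 1, 40, 20000),
                              (3, 2, 10, 15000), (1, 3, 50, 30000),
                              (2, 3, 20, 25000);
""")

# Products whose total sales amount is below 50,000.
low_products = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM sales_fact f JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY p.product_name HAVING SUM(f.amount) < 50000
    ORDER BY p.product_name
""").fetchall()

# Top N stores by sales amount (N is passed as a parameter).
top_stores = conn.execute("""
    SELECT s.store_name, SUM(f.amount) AS total
    FROM sales_fact f JOIN store_dim s ON s.store_id = f.store_id
    GROUP BY s.store_name ORDER BY total DESC LIMIT ?
""", (2,)).fetchall()

# Revenue and sales volume by store type.
by_store_type = conn.execute("""
    SELECT s.store_type, SUM(f.amount), SUM(f.units_sold)
    FROM sales_fact f JOIN store_dim s ON s.store_id = f.store_id
    GROUP BY s.store_type ORDER BY SUM(f.amount) DESC
""").fetchall()

# Each country's share of total revenue, in percent.
country_share = conn.execute("""
    SELECT s.country, 100.0 * SUM(f.amount) / (SELECT SUM(amount) FROM sales_fact)
    FROM sales_fact f JOIN store_dim s ON s.store_id = f.store_id
    GROUP BY s.country ORDER BY s.country
""").fetchall()
```

In a reporting tool these would normally be built through the tool's own query interface; the SQL only shows which measures and dimensions each report combines.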
Questions
Thank You

Contact Us: bisp.consulting@gmail.com


bispsolutions.wordpress.com
learnhyperion.wordpress.com
