Dimensional Modeling

Dimensional Design
Presented by Dr. Debashis Parida

Course Agenda
• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts

Rationale for Dimensional Modeling

OLTP Design Characteristics
• Focus of OLTP design
  • Individual data elements
  • Data relationships
• Design goals
  • Accurately model the business
  • Remove redundancy

OLTP Design Shortcomings
• Complex
• Unfamiliar to business people
• Incomplete history
• Slow query performance

Emergence of the Dimensional Model
• Logical modeling technique
  • For designing relational database structures
  • Addresses OLTP design shortcomings
  • For use in analytic systems
• First developed in the early 1980s
  • Packaged goods industry
• Popularized by Ralph Kimball, PhD
  • 1996 book: "The Data Warehouse Toolkit"

Dimensional Modeling Basics

Process Measurement
• Measures
  • Metrics or indicators by which people evaluate a business process
  • Referred to as "facts"
• Examples
  • Margin
  • Inventory amount
  • Sales dollars
  • Receivable dollars
  • Return rate

Coffee Maker Fulfillment Report
Brand            Product                 Units Sold   Units Shipped   % Shipped
Captain Coffee   Standard Coffee Maker        5,000           3,800         76%
                 Thermal Coffee Maker         2,400           1,632         68%
                 Deluxe Coffee Maker          2,073           1,658         80%
                 All Products                 9,473           7,090         75%
(Facts: Units Sold, Units Shipped, % Shipped)

Perspective Focus
[Figure: Process-oriented business perspectives. Product Development, Sales and Operations, Marketing, and Customer Services, each with its own focus, such as product category, G/L account, product, supplier, or warehouse.]

Process Perspectives
• Dimensions
  • The parameters by which measures are viewed
  • Used to break out, filter, or roll up measures
  • Often found after the word "by" in a business question
  • Descriptive business terms
• Examples
  • Product
  • Warehouse
  • Customer
  • Supplier
[Figure: the Coffee Maker Fulfillment Report, with the Brand and Product columns highlighted as dimensions]

Dimensional Model
• Definition
  • A logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
• Dimensional model = star schema
• Serves as the basis for the design of a relational database schema
• Can easily be translated into a multidimensional database design if required
• Overcomes OLTP design shortcomings

Dimensional Model Advantages
• Understandable
• Systematically represents history
• Reliable join paths
• High-performance queries
• Enterprise scalability

Schema Simplicity
• Fewer tables
  • Denormalized
  • Consolidated
• Dimensional
  • Familiar to users
  • Facts go in fact tables
  • Dimensions go in dimension tables
• Increases understandability
[Figure: Star schema, with a central Facts table joined to the Store, Time, and Product dimensions]

Data Familiarity
• Adding business context
  • A single source field (e.g., ord_date)
    • Expanded into parts
    • Decoded into business terms
  • Special indicators and flags added
  • e.g., the time dimension
• Increases understandability
[Figure: Time Dimension with the attributes year, quarter, month, date, day of the week, and holiday flag]

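A minimal Python sketch of the expansion above, assuming the source `ord_date` arrives as a `datetime.date`: one source field becomes several decoded attributes plus a flag. The `HOLIDAYS` set is a hypothetical stand-in for a real business calendar, and the month and weekday names assume the default (English) C locale.

```python
from datetime import date

# Hypothetical holiday calendar; a real load would read this from a business calendar.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}


def time_dimension_row(ord_date: date) -> dict:
    """Expand one source date into decoded time dimension attributes."""
    return {
        "year": ord_date.year,
        "quarter": (ord_date.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "month": ord_date.strftime("%B"),          # decoded name, e.g. "December"
        "date": ord_date.isoformat(),
        "day_of_week": ord_date.strftime("%A"),    # decoded name, e.g. "Wednesday"
        "holiday_flag": ord_date in HOLIDAYS,      # special indicator
    }


print(time_dimension_row(date(2024, 12, 25)))
```

In practice the time dimension is usually generated once for a whole date range rather than per fact row; the function only shows what one row contains.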
Representing History
• Time dimension
  • Part of every star schema
  • Marks the date when the facts (process measurements) occurred
  • Allows the schema to easily add and query data over time
  • Especially useful for performing comparison queries
[Figure: Star schema with Facts joined to Store, Product, and a Time Dimension (year, quarter, month, date, day of the week, holiday flag)]

Fewer Join Paths
• Star schema joins
  • Defined during schema design, not at runtime
  • Business people can easily understand these relationships
  • One-to-many relationships between dimensions and facts
  • Referential integrity always enforced

High-Performance Design
• Fewer joins mean less expensive queries
• Deterministic query patterns
• Star schema query optimization supported by all major RDBMS vendors

Subject Area Models
[Figure: subject area E/R models (Manufacturing and Process Control; Shipping and Inventory Management; Sales Order Entry and Campaign Management; Customer Support and Relationship Management) shown alongside subject area dimensional models (Product Development, Sales and Operations, Marketing, Customer Services)]

Enterprise Models
[Figure: an enterprise-scope E/R model compared with an enterprise-scope dimensional model]

Dimensional Design Details

Star Schema Dimension Tables
• Dimension tables
  • Store dimension values
  • Textual content
  • Usually referred to simply as "dimensions"
• Spend extra effort to add dimensional attributes
[Figure: star schema with four dimension tables surrounding the fact table]

Dimension Keys
• Synthetic keys
  • Each table is assigned a unique primary key, generated specifically for the data warehouse
  • Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema
[Figure: each dimension table with its own dimension key]

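To illustrate synthetic keys, here is a small Python sketch of a key map that hands out warehouse-generated integers while the source ID stays an ordinary attribute. The class name `SurrogateKeyMap` and the choice to qualify source IDs by source system are assumptions of this sketch, not part of the course material.

```python
from itertools import count


class SurrogateKeyMap:
    """Hands out warehouse-generated (synthetic) keys for source-system keys."""

    def __init__(self) -> None:
        self._next_key = count(start=1)
        self._by_source = {}  # (source_system, source_key) -> synthetic key

    def key_for(self, source_system: str, source_key: str) -> int:
        # Qualify by source system so equal IDs from two systems don't collide.
        natural = (source_system, source_key)
        if natural not in self._by_source:
            self._by_source[natural] = next(self._next_key)
        return self._by_source[natural]


keys = SurrogateKeyMap()
customer_dim = [
    # The source ID is kept as an ordinary attribute, not as the primary key.
    {"customer_key": keys.key_for("crm", sid), "source_customer_id": sid, "name": name}
    for sid, name in [("C-1001", "Ann Lee"), ("C-2002", "Raj Rao")]
]
print(customer_dim)
```

This simple map gives one synthetic key per source key. With Type 2 slowly changing dimensions, one source key can map to several synthetic keys, one per profile.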
Dimension Columns
• Dimension attributes
  • Specify the way in which measures are viewed: rolled up, broken out, or summarized
  • Often follow the word "by", as in "Show me Sales by Region and Quarter"
  • Frequently referred to as "dimensions"
[Figure: dimension tables, each with a key followed by several attributes]

Star Schema Fact Table
• Process measures
  • Start by assigning one fact table per business subject area
  • Fact tables store the process measures (aka facts)
  • Compared to dimension tables, fact tables usually have a very large number of rows
[Figure: Fact Table with the columns fact1, fact2, fact3]

Fact Table Primary Key
• Every fact table has a multi-part primary key
  • Made up of foreign keys referencing the dimensions
[Figure: Fact Table with three key columns followed by fact1, fact2, fact3]

Fact Table Sparsity
• Sparsity
  • Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
  • Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse"

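Sparsity can be quantified as the ratio of actual fact rows to all possible combinations of dimension rows. The Python sketch below computes that ratio; the store, product, and fact row counts are invented for illustration.

```python
from math import prod


def sparsity(fact_rows: int, dimension_row_counts: dict) -> float:
    """Share of all possible dimension-row combinations that have a fact row."""
    possible = prod(dimension_row_counts.values())
    return fact_rows / possible


# Hypothetical counts for one day: 300 stores x 1 day x 20,000 products.
dims = {"store": 300, "time": 1, "product": 20_000}
ratio = sparsity(fact_rows=450_000, dimension_row_counts=dims)
print(f"{ratio:.1%} of possible combinations are populated")
```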
Fact Table Grain
• Grain
  • The level of detail represented by a row in the fact table
  • Must be identified early
  • The cause of the greatest confusion during the design process
• Example
  • Each row in the fact table represents the daily item sales total

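Declaring the grain tells the load process how far to summarize source records. As a sketch, assuming hypothetical point-of-sale transactions as the source, the Python below rolls them up to the example grain of one row per day per item.

```python
from collections import defaultdict
from datetime import date

# Hypothetical point-of-sale transactions (finer than the fact table's grain).
transactions = [
    {"day": date(2024, 3, 1), "item": "Thermal Coffee Maker", "units": 2, "dollars": 98.0},
    {"day": date(2024, 3, 1), "item": "Thermal Coffee Maker", "units": 1, "dollars": 49.0},
    {"day": date(2024, 3, 1), "item": "Deluxe Coffee Maker", "units": 1, "dollars": 79.0},
    {"day": date(2024, 3, 2), "item": "Thermal Coffee Maker", "units": 3, "dollars": 147.0},
]


def daily_item_sales(rows):
    """Roll transactions up to the declared grain: one row per day per item."""
    totals = defaultdict(lambda: {"units": 0, "dollars": 0.0})
    for r in rows:
        grain_key = (r["day"], r["item"])
        totals[grain_key]["units"] += r["units"]
        totals[grain_key]["dollars"] += r["dollars"]
    return dict(totals)


facts = daily_item_sales(transactions)
print(facts)
```

A finer source can always be rolled up to the declared grain, but a coarser one cannot be broken back down, which is one reason the grain has to be settled early.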
Designing a Star Schema
• Five initial design steps
  • Based on Kimball's six steps
  • Start designing in order
  • Revisit and adjust over the life of the project

Step One
1. Identify the fact table
Start by naming the fact table after the business subject area.

Step Two
2. Identify the fact table grain
Describe, in business terms, what a row in the fact table represents.

Step Three
3. Identify dimensions

Step Four
4. Select facts

Step Five
5. Identify dimensional attributes

Fact Table Details

Example Fact Table
Sales Facts
  model_key
  dealer_key
  time_key
  revenue
  quantity

Facts
• Fully additive
  • Can be summed across any and all dimensions
  • Stored in the fact table
  • Examples: revenue, quantity

Facts
• Semi-additive
  • Can be summed across most dimensions, but not all
  • Anything that measures a "level" (e.g., an inventory level or an account balance)
  • Requires care with ad hoc reporting
  • Often aggregated across the "forbidden dimension" (usually time) by averaging

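As a hedged illustration of a semi-additive fact, the Python below uses invented month-end inventory levels. Summing across warehouses is meaningful, but summing across months counts the same stock again, so time is treated as the "forbidden dimension" and averaged instead.

```python
# Invented month-end inventory levels (units on hand) for one product.
inventory = [
    {"warehouse": "North", "month": "Jan", "on_hand": 100},
    {"warehouse": "South", "month": "Jan", "on_hand": 40},
    {"warehouse": "North", "month": "Feb", "on_hand": 80},
    {"warehouse": "South", "month": "Feb", "on_hand": 60},
]

# Summing across warehouses (not time) is valid: total units on hand per month.
on_hand_by_month = {}
for row in inventory:
    on_hand_by_month[row["month"]] = on_hand_by_month.get(row["month"], 0) + row["on_hand"]

# Summing across months would count the same stock twice; average instead.
naive_sum_over_time = sum(on_hand_by_month.values())
average_on_hand = naive_sum_over_time / len(on_hand_by_month)

print(on_hand_by_month, "average:", average_on_hand, "misleading sum:", naive_sum_over_time)
```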
Facts
• Non-additive
  • Cannot be summed across any dimension
  • All ratios are non-additive
  • Break them down into fully additive components and store those in the fact table

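The coffee maker report's % Shipped column is a non-additive ratio. This Python sketch keeps the additive components (units sold and units shipped, taken from the report) and derives the All Products percentage from their totals, contrasting it with an average of the row percentages.

```python
# Units sold and shipped per product, taken from the Coffee Maker Fulfillment Report.
report = {
    "Standard Coffee Maker": (5_000, 3_800),
    "Thermal Coffee Maker": (2_400, 1_632),
    "Deluxe Coffee Maker": (2_073, 1_658),
}

# Sum the additive components first, then derive the ratio (a ratio of sums).
total_sold = sum(sold for sold, _ in report.values())
total_shipped = sum(shipped for _, shipped in report.values())
pct_shipped_all = total_shipped / total_sold

# Averaging the per-row ratios is the mistake this rule guards against.
avg_of_row_ratios = sum(shipped / sold for sold, shipped in report.values()) / len(report)

print(f"All Products: {pct_shipped_all:.1%} (average of row ratios: {avg_of_row_ratios:.1%})")
```

It prints 74.8% for the ratio of sums and 74.7% for the average of ratios. Both round to the report's 75% here, but they differ in general because a plain average ignores how many units each row represents.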
Factless Fact Table
• A fact table with no measures in it
  • Nothing to measure...
  • ...except the convergence of dimensional attributes
  • Sometimes stores a "1" for convenience
  • Examples: attendance, customer assignments, coverage

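A factless fact table is queried by counting rows. The Python sketch below uses a hypothetical attendance table whose rows hold only dimension keys (`student_key`, `class_key`, and `date_key` are illustrative names).

```python
from collections import Counter

# Hypothetical factless fact table: each row only records that a student attended a class on a day.
attendance = [
    {"student_key": 1, "class_key": 501, "date_key": 20240902},
    {"student_key": 2, "class_key": 501, "date_key": 20240902},
    {"student_key": 1, "class_key": 502, "date_key": 20240902},
    {"student_key": 1, "class_key": 501, "date_key": 20240903},
]

# Business questions are answered by counting rows (or summing a stored "1").
attendance_by_class = Counter(row["class_key"] for row in attendance)
days_attended_by_student_1 = len({row["date_key"] for row in attendance
                                  if row["student_key"] == 1})

print(attendance_by_class, days_attended_by_student_1)
```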
Dimension Table Details

Example Dimension Tables
Time
  time_key
  year
  quarter
  month
  date
Model
  model_key
  brand
  category
  line
  model
Dealer
  dealer_key
  region
  state
  city
  dealer

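A question such as "Show me Sales by Region and Quarter" becomes a join from the fact table to each dimension followed by a group-by. In a relational database this would be a SQL join with `GROUP BY`; the Python sketch below mimics it in memory, using the column names of the example Sales Facts, Time, and Dealer tables. All row values are made up, and the Model dimension is left out because this question does not use it.

```python
from collections import defaultdict

# Made-up rows using the column names from the example tables.
time_dim = {
    1: {"year": 2024, "quarter": "Q1", "month": "January", "date": "2024-01-15"},
    2: {"year": 2024, "quarter": "Q2", "month": "April", "date": "2024-04-10"},
}
dealer_dim = {
    10: {"region": "East", "state": "NY", "city": "Albany", "dealer": "Dealer A"},
    20: {"region": "West", "state": "CA", "city": "Fresno", "dealer": "Dealer B"},
}
sales_facts = [
    {"model_key": 100, "dealer_key": 10, "time_key": 1, "revenue": 500.0, "quantity": 5},
    {"model_key": 101, "dealer_key": 10, "time_key": 1, "revenue": 300.0, "quantity": 2},
    {"model_key": 100, "dealer_key": 20, "time_key": 2, "revenue": 250.0, "quantity": 1},
]


def revenue_by(facts, lookups):
    """Join each fact row to its dimensions and sum revenue per attribute combination.

    lookups: list of (foreign_key_column, dimension_table, attribute_name).
    """
    totals = defaultdict(float)
    for row in facts:
        group = tuple(dim[row[fk]][attr] for fk, dim, attr in lookups)
        totals[group] += row["revenue"]
    return dict(totals)


# "Show me Sales by Region and Quarter"
report = revenue_by(sales_facts, [("dealer_key", dealer_dim, "region"),
                                  ("time_key", time_dim, "quarter")])
print(report)
```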
Dimension Tables
• Characteristics
  • Hold the dimensional attributes
  • Usually have a large number of attributes ("wide")
  • Include flags and indicators that make specific types of reports easy to produce
  • Have a small number of rows compared with fact tables (most of the time)

Don't Normalize Dimensions
• Saves very little space
• Hurts query performance (more joins)
• Can confuse matters when multiple hierarchies exist
• A star schema with normalized dimensions is called a "snowflake schema"
  • Usually advocated by software vendors whose products require a snowflake for performance

Slowly Changing Dimensions
• Dimension source data may change over time
  • Relative to fact tables, dimension records change slowly
• Allows dimensions to have multiple "profiles" over time to maintain history
  • Each profile is a separate record in the dimension table

Slowly Changing Dimension Example
• Example: a woman gets married
  • Possible changes to the customer dimension
    • Last name
    • Marital status
    • Address
    • Household income
  • Existing facts need to remain associated with her single profile
  • New facts need to be associated with her married profile

Slowly Changing Dimension Types
• Three types of slowly changing dimensions
• Type 1
  • Updates the existing record with the modifications
  • Does not maintain history
• Type 2
  • Adds a new record
  • Maintains history
  • Keeps the old record
• Type 3
  • Keeps the old and new values in the existing row
  • Requires a design change

Designing Loads to Handle SCD
• Design and implementation guidelines
  • Gather SCD requirements when designing data mapping and loading
  • SCD handling must be defined and implemented at the dimensional attribute level
  • Each column in a dimension table must be identified as a Type 1 or a Type 2 SCD
  • If one Type 1 column changes, all Type 1 columns are updated
  • If one Type 2 column changes, a new record is inserted into the dimension table

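The two load rules above can be sketched in Python as follows. The per-column classification in `SCD_TYPE`, the `current`/`start_date`/`end_date` housekeeping columns, and the function name `apply_change` are assumptions for illustration; the customer columns follow the marriage example.

```python
from datetime import date

# Hypothetical per-column SCD classification for a customer dimension.
SCD_TYPE = {"last_name": 2, "marital_status": 2, "address": 2,
            "household_income": 1, "email": 1}


def apply_change(dim_rows, source, next_key, load_date):
    """Apply one source record to the dimension; return the next unused key.

    dim_rows holds dicts; the customer's current profile has "current" == True.
    """
    current = next(r for r in dim_rows
                   if r["customer_id"] == source["customer_id"] and r["current"])
    changed = [col for col in SCD_TYPE if current[col] != source[col]]

    if any(SCD_TYPE[col] == 2 for col in changed):
        # Type 2: expire the current profile and insert a new one with a new key.
        current["current"] = False
        current["end_date"] = load_date
        dim_rows.append({**source, "customer_key": next_key, "current": True,
                         "start_date": load_date, "end_date": None})
        return next_key + 1
    if changed:
        # Only Type 1 columns changed: overwrite all of them in place, no history.
        for col, scd_type in SCD_TYPE.items():
            if scd_type == 1:
                current[col] = source[col]
    return next_key


dim = [{"customer_key": 1, "customer_id": "C7", "last_name": "Smith",
        "marital_status": "Single", "address": "1 Elm St", "household_income": 50_000,
        "email": "ann@example.com", "current": True,
        "start_date": date(2020, 1, 1), "end_date": None}]
profile = {"customer_id": "C7", "last_name": "Smith", "marital_status": "Single",
           "address": "1 Elm St", "household_income": 50_000, "email": "ann.s@example.com"}
next_key = apply_change(dim, profile, next_key=2, load_date=date(2024, 5, 1))  # Type 1 only
married = {**profile, "last_name": "Jones", "marital_status": "Married",
           "address": "9 Oak Ave", "household_income": 90_000}
next_key = apply_change(dim, married, next_key=next_key, load_date=date(2024, 6, 1))  # Type 2
```

Facts loaded before the change keep pointing at key 1 (the single profile), while new facts use key 2 (the married profile), as the marriage example requires. This sketch overwrites Type 1 columns only on the current profile; whether older profiles are overwritten too is a design decision to settle with the SCD requirements.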
Designing Loads to Handle SCD
• Design and implementation guidelines
  • For large dimension tables, change data capture techniques may be used to minimize the data volume
  • For smaller dimension tables, compare all OLTP records with the dimension table records
  • Balance data volume against change data capture logic complexity

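For the small-table case, a full compare can be sketched in Python as below. Hashing the tracked columns is one common way to compare records and is an assumption here; comparing the columns directly works equally well. The product rows are invented.

```python
import hashlib


def row_hash(row, columns):
    """Fingerprint of the tracked attribute values of one record."""
    joined = "\x1f".join(str(row[c]) for c in columns)  # unit separator between values
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def full_compare(oltp_rows, current_dim_rows, key, columns):
    """Compare every source record with the current dimension profiles.

    Returns the natural keys that are new and those whose tracked values changed.
    """
    existing = {r[key]: row_hash(r, columns) for r in current_dim_rows}
    new, changed = [], []
    for r in oltp_rows:
        if r[key] not in existing:
            new.append(r[key])
        elif row_hash(r, columns) != existing[r[key]]:
            changed.append(r[key])
    return new, changed


dim = [{"product_id": "P1", "name": "Thermal", "category": "Coffee Makers"},
       {"product_id": "P2", "name": "Deluxe", "category": "Coffee Makers"}]
source = [{"product_id": "P1", "name": "Thermal", "category": "Coffee Makers"},
          {"product_id": "P2", "name": "Deluxe", "category": "Small Appliances"},
          {"product_id": "P3", "name": "Standard", "category": "Coffee Makers"}]
new, changed = full_compare(source, dim, key="product_id", columns=["name", "category"])
print("new:", new, "changed:", changed)
```

New keys would become new dimension rows; changed keys would then go through the Type 1/Type 2 rules.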
Degenerate Dimensions
• Dimensions with no other place to go
  • Stored in the fact table
  • Are not facts
  • Common examples include invoice numbers or order numbers

Dimensional Design Process
Project Context

Data Mart Development
• Dimensional modeling is a critical part of the data mart development effort
[Figure: Design Phase → Development Phase → Deployment Phase]

Data Mart Development
• Design phase
  • Determine requirements and design the schema
• Development phase
  • Iterative build and feedback
• Deployment phase
  • Automate loads, document, and train users

Project Deliverables
• Design
  • Project definition document
  • Project plan
  • Schema design
  • Mapping document
  • Report design
• Development
  • Populated data mart
  • Load routines (Sagent "Plans")
  • Query and reporting environment
• Deployment
  • Automation
  • Documentation
  • Training materials

Project Approach
• The dimensional model is developed during the design stage
  • The scope of the project has already been determined
[Figure: Design Phase → Development Phase → Deployment Phase]

Design Stage Activities
• Gather requirements through requirements workshops
• Develop the star schema
• Conduct a design review
[Figure: Design Phase → Development Phase → Deployment Phase]

Gather Requirements
• Requirements definition
  • User workshops
  • Spreadsheets
  • Sample reports
• Source systems analysis
  • DBA interviews
  • Copybooks
  • E/R diagrams

Design Deliverables
• Deliverables
  • The star schema itself
  • Load mapping document
• How these primary components are delivered depends on the needs and the format chosen
  • Modeling tools
  • Spreadsheets
  • Text documents

Notation
• No recognized standard
• ER semantics unnecessary
• Clarity is the only characteristic that really matters

Design Naming Standards
• Responsibility of data administration
  • Extended to the data warehouse
• Important to start early in the project
• Suggested conventions
  • Fact tables
  • Dimension tables
  • Aggregate tables
  • Keys

Data Element Definitions
• Clear descriptions
  • Facts
    • Calculated formulae
  • Dimensional attributes
  • Multiple meanings/synonymous terms
    • Aliases

Data Element Instances
• Example of data
  • As it will exist in the warehouse
  • After decoding
• Adds to model understanding
• Removes ambiguity/uncertainty

Data Element Mapping
• Where is the data coming from?
  • Source system
  • Table
  • Column
  • Record
  • Field

Data Transformation
• Changing the data
  • Serves as the spec for the ETL process
  • Decodes
  • Type conversion
  • Conditional logic
  • Handling of NULLs

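The transformation rules in a mapping document can be read almost directly as code. As a sketch, the Python below applies one rule of each kind listed above to a single source record. The source column names (`STAT_CD`, `ORD_DT`, `ORD_AMT`), the decode table, the 1000 threshold, and the rule that a missing amount loads as 0 are all hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical decode table: source status codes -> business terms.
STATUS_DECODE = {"A": "Active", "I": "Inactive", "P": "Pending"}


def transform(src: dict) -> dict:
    """Apply mapping-document style rules to one source record (hypothetical columns)."""
    status = src.get("STAT_CD")
    raw_amount = src.get("ORD_AMT")
    # NULL handling (assumed rule for this sketch): a missing amount loads as 0.
    amount = Decimal(raw_amount) if raw_amount not in (None, "") else Decimal("0")
    return {
        # Decode: unknown codes are flagged rather than silently dropped;
        # a missing code becomes an explicit "Not Provided" value.
        "status": STATUS_DECODE.get(status, "Unknown") if status else "Not Provided",
        # Type conversion: "YYYYMMDD" text -> date.
        "order_date": datetime.strptime(src["ORD_DT"], "%Y%m%d").date(),
        "order_amount": amount,
        # Conditional logic: derive an indicator from a business rule.
        "large_order_flag": "Y" if amount >= 1000 else "N",
    }


print(transform({"STAT_CD": "A", "ORD_DT": "20240315", "ORD_AMT": "1250.50"}))
```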
Aggregate Schemas

Aggregate Designs
• Aggregates
  • Pre-stored fact summaries
  • Along one or more dimensions
  • The most effective tool for improving performance
• Examples
  • Summary of sales by region, by product, by category
  • Monthly sales

Aggregate Background
• Aggregate rationale
  • Improve end-user query performance
  • Reduce required CPU cycles
  • Powerful cost-saving tool
• Restrictions
  • Additive facts only
  • Must use dimensional design

Aggregate Guidelines
• Don't start with aggregates
• Design and build them based on usage
• Sooner or later you'll need to build aggregates

Aggregate Types
• Level field
• Separate fact tables

Aggregate Types
• Level field
  • Old technique
  • Requires a "level" attribute in the appropriate dimensions
  • Aggregates and base-level facts stored in the same table
  • Same total number of fact records as the separate table approach
• Drawbacks
  • Every query must constrain on the level field
  • Possibility of double counting

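The double-counting risk of the level field approach is easy to reproduce. In this Python sketch (invented rows), the time dimension carries a `level` attribute, and a monthly aggregate row sits in the same fact table as the daily rows it summarizes.

```python
# Time dimension with a "level" attribute: two day rows and one month row.
time_dim = {
    1: {"level": "day", "label": "2024-01-03"},
    2: {"level": "day", "label": "2024-01-17"},
    3: {"level": "month", "label": "2024-01"},
}
# Base-level facts and the pre-stored monthly aggregate share one fact table.
sales_facts = [
    {"time_key": 1, "dollars": 120.0},
    {"time_key": 2, "dollars": 80.0},
    {"time_key": 3, "dollars": 200.0},  # aggregate row: the sum of the two day rows
]


def total_sales(level=None):
    """Sum dollars, constraining on the time dimension's level field when given."""
    return sum(f["dollars"] for f in sales_facts
               if level is None or time_dim[f["time_key"]]["level"] == level)


print(total_sales())         # no level constraint: January is counted twice
print(total_sales("day"))    # base level only
print(total_sales("month"))  # aggregate level only
```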
Aggregate Types
• Separate tables
  • Separate fact table for every aggregate
  • Separate dimension table for every aggregate dimension
  • Same number of fact records as level field tables
• Advantages
  • Removes the possibility of double counting
  • Schema clarity
• Caveat
  • Requires software with aggregate navigation capability

Aggregate Pitfalls
• Sparsity failure
  • Term used to describe the result of building too many aggregate fact tables that do not summarize enough rows
  • When sparsity failure occurs, a relatively small star schema can grow thousands of times larger (in terms of disk size)
  • Sparsity failure = aggregate explosion

Aggregate Design Guidelines
• Rule of twenty
  • To avoid aggregate explosion
  • Make sure each aggregate record summarizes 20 or more lower-level records
• Remember
  • Total number of possible fact tables in any given dimensional model = Cartesian product of all levels in all the dimensions

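To make the Cartesian-product rule concrete, the Python sketch below enumerates level combinations for the example Time, Model, and Dealer dimensions, taking the hierarchy levels from the example dimension table columns. It does not count an extra "all" level that would summarize a dimension away entirely.

```python
from itertools import product
from math import prod

# Hierarchy levels of the example Time, Model, and Dealer dimensions (lowest first).
levels = {
    "time": ["date", "month", "quarter", "year"],
    "model": ["model", "line", "category", "brand"],
    "dealer": ["dealer", "city", "state", "region"],
}

# Every combination of one level per dimension is a possible fact table.
candidates = list(product(*levels.values()))
base = tuple(dim_levels[0] for dim_levels in levels.values())
aggregates = [c for c in candidates if c != base]

assert len(candidates) == prod(len(v) for v in levels.values())
print(f"{len(candidates)} possible fact tables: 1 base table and {len(aggregates)} aggregates")
```

It prints 64 possible fact tables (the base table plus 63 candidate aggregates), which is why aggregates are chosen selectively rather than built exhaustively.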
Hierarchies & Aggregate Design
• Hierarchy diagram
  • Helps visualize the options for building aggregates
  • Adding cardinalities ensures following the rule of twenty
  • Not required to build the initial star schema
Time hierarchy (members per year, and over five years of history):
  Year (1): 5 years
  Quarter (4): 20 quarters
  Month (12): 60 months
  Date (365): 1825 days

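The cardinalities in the diagram let you check the rule of twenty before building anything. This Python sketch divides member counts between two levels to estimate how many lower-level rows each aggregate row summarizes. It uses dimension cardinalities only, so it is an estimate rather than a measurement of actual fact rows.

```python
# Member counts over five years of history, from the time hierarchy diagram.
time_levels = {"Year": 5, "Quarter": 20, "Month": 60, "Date": 1825}
RULE_OF_TWENTY = 20


def reduction_factor(levels, aggregate, source):
    """Estimated number of `source`-level rows summarized by one `aggregate`-level row.

    Uses dimension cardinalities only; a sparse fact table will often
    compress less than this estimate suggests.
    """
    return levels[source] / levels[aggregate]


for aggregate, source in [("Month", "Date"), ("Quarter", "Date"), ("Quarter", "Month")]:
    factor = reduction_factor(time_levels, aggregate, source)
    verdict = "meets" if factor >= RULE_OF_TWENTY else "fails"
    print(f"{aggregate} built from {source}: {factor:.1f} rows each, {verdict} the rule of twenty")
```

A monthly aggregate built from daily data (about 30 rows per aggregate row) or a quarterly aggregate built from daily data (about 91) passes, but a quarterly aggregate built from the monthly aggregate (3 rows per aggregate row) fails.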
Aggregate Navigation
• Description
  • Function provided by a software layer: the aggregate navigator
  • Directs user queries to the most favorable available aggregate
  • Transparent to the end user

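An aggregate navigator can be sketched as one rule: among the fact tables whose levels are at or below the levels a query asks for, pick the one with the fewest rows. The Python below illustrates only that rule; the table names, row counts, and level lists are hypothetical, and a real navigator would also rewrite the query itself.

```python
from dataclasses import dataclass

# Levels per dimension, lowest (most detailed) first.
LEVELS = {
    "time": ["date", "month", "quarter", "year"],
    "dealer": ["dealer", "city", "state", "region"],
}


@dataclass
class FactTable:
    name: str
    levels: dict      # dimension -> level stored in this table
    row_count: int


def can_answer(table, query_levels):
    """True if the table is at or below the requested level in every queried dimension."""
    return all(
        LEVELS[dim].index(table.levels[dim]) <= LEVELS[dim].index(level)
        for dim, level in query_levels.items()
    )


def navigate(tables, query_levels):
    """Route the query to the smallest table that can answer it."""
    candidates = [t for t in tables if can_answer(t, query_levels)]
    return min(candidates, key=lambda t: t.row_count)


tables = [
    FactTable("sales_base", {"time": "date", "dealer": "dealer"}, 5_000_000),
    FactTable("sales_month_dealer", {"time": "month", "dealer": "dealer"}, 170_000),
    FactTable("sales_month_region", {"time": "month", "dealer": "region"}, 600),
]
print(navigate(tables, {"time": "quarter", "dealer": "state"}).name)
```

A query by quarter and state is routed to `sales_month_dealer`: `sales_month_region` is summarized too far (region is above state), and the base table is larger.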
Aggregate Framework
[Figure: aggregate framework, contrasting the Business View with the Designer View]

Aggregate Deployment
• Incremental
• Based on usage
• Transparent to users
• Typically the warehouse DBA's responsibility

Aggregate Deployment
[Figure: incremental deployment. Subject Areas 1 through 4 are each built first with no aggregates; aggregates for Subject Areas 1, 2, and 3 are built afterwards, in later iterations. Some rework required.]

Multiple Fact Tables

Multiple Fact Tables
• Different business processes usually require different fact tables
• There are also several cases where a single business process will require multiple fact tables
  • Core and custom
  • Snapshot and transaction
  • Coverage
  • Aggregates

Different Business Processes
• Different business processes usually require different fact tables
• In practice, it may be hard to identify what a "process" is
• Sometimes you can spot different processes because measures are recorded
  • With different dimensions
  • At differing grains

Different Dimensions or Grain
• Don't take shortcuts with grain
• The "not applicable" dimension value
  • Using a "not applicable" row in a dimension confuses the grain and can introduce reporting difficulty

Different Points in Time
• Sometimes, it is not easy to identify the discrete business processes
  • All measures may have the same dimensionality or grain
  • Different measures are recorded at different times
  • Quantity sold is not recorded at the same time as quantity shipped

Different Timing
• Building a single fact table would require recording zero or null for measures that are not applicable at a point in time
• Reports would contain a confusing combination of zeros, nulls, and absence of data

Identifying Different Processes
• Look at the measures in question
• Sort them into fact tables based on
  • Dimensions
  • Grain
  • Differing timings of the events measured

Design Tools for Multiple Tables
• Create a set of matrices
  • Facts vs. dimensions
  • Facts vs. dimensional attributes
• Mark where facts apply to dimensions
• Mark where facts apply to dimensional attributes
• When facts don't apply, assume a separate fact table

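The facts-versus-dimensions matrix can be kept in code as well as on paper. This Python sketch prints such a matrix for hypothetical facts and then groups facts that apply to exactly the same dimensions; each group is a candidate fact table. It covers dimensionality only, so grain and timing still need to be checked separately.

```python
from collections import defaultdict

# Hypothetical facts-vs-dimensions matrix: the dimensions each fact applies to.
DIMENSIONS = ["day", "month", "store", "warehouse", "product"]
applies_to = {
    "quantity_sold": {"day", "store", "product"},
    "sales_dollars": {"day", "store", "product"},
    "quantity_shipped": {"day", "warehouse", "product"},
    "on_hand_quantity": {"month", "warehouse", "product"},
}

# Print the matrix, marking an X where a fact applies to a dimension.
print(f"{'fact':<18}" + "".join(f"{d:>10}" for d in DIMENSIONS))
for fact, dims in applies_to.items():
    print(f"{fact:<18}" + "".join(f"{'X' if d in dims else '':>10}" for d in DIMENSIONS))


def propose_fact_tables(fact_dims):
    """Group facts that apply to exactly the same set of dimensions."""
    groups = defaultdict(list)
    for fact, dims in fact_dims.items():
        groups[frozenset(dims)].append(fact)
    return list(groups.values())


print(propose_fact_tables(applies_to))
```

Here `quantity_sold` and `sales_dollars` share a table, while `quantity_shipped` and `on_hand_quantity` each get their own, consistent with the separate sales, shipment, and inventory fact tables used elsewhere in this course.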
Multiple Fact Table Summary
• Different processes need different tables
  • Identified by
    • Grain
    • Dimensionality
    • Timing
• The same process may need multiple fact tables
  • Heterogeneous attributes
  • Coverage
  • Snapshot and transaction
  • Aggregates

Architected Data Marts

Data Mart
• The meaning of the term "data mart" has shifted over the last several years...

Data Mart Architecture 1993
[Figure: Operational Systems → E.T.L. Software → Data Warehouse → E.T.L. Software → Data Marts → Query & Reporting Software (Analysis) → Users]

Data Mart Architecture 1997
[Figure: Operational Systems → E.T.L. Software → Data Marts → Query & Reporting Software (Analysis) → Users]

Architected Data Marts
[Figure: Operational Systems → E.T.L. Software → Data Warehouse / Data Mart → Query & Reporting Software (Analysis) → Users]

Data Mart
• Warehouse subject area
  • Incremental warehouse development
  • Centralized architecture
  • Not new
  • Well-suited to star schemas

"Stovepipe" Data Marts
• "Stovepipe" data marts
  • Inconsistent and overlapping data
  • Difficult and costly to maintain
  • Can't drill across
  • Integration requires starting over
  • Dimensions not conformed
[Figure: three separate star schemas, each with its own redundant data load: Sales Facts (Time (Day), Store, Product); Shipments Facts (Time (Day), Warehouse, Product); Inventory Facts (Month, Warehouse, Product)]

Conformed Dimensions
• Definition
  • Dimensions are conformed when they are the same
  • -or- when one dimension is a strict rollup of another
• Same dimensions must:
  1. ... have exactly the same set of primary keys, and
  2. ... have the same number of records
• Rolled-up dimension
  • When one dimension is a strict rollup of another
  • Which means the two conformed dimensions can be combined into a single logical dimension by creating a union of their attributes
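Both conformance tests can be sketched in Python. `same_dimension` checks the two conditions listed for identical dimensions; `is_strict_rollup` checks that a Month dimension holds exactly the distinct month and year combinations found in a Day dimension. The function names and sample rows are illustrative assumptions.

```python
def same_dimension(rows_a, rows_b, key):
    """Same set of primary keys and the same number of records."""
    keys_a = {r[key] for r in rows_a}
    keys_b = {r[key] for r in rows_b}
    return keys_a == keys_b and len(rows_a) == len(rows_b)


def is_strict_rollup(detail_rows, rollup_rows, shared_attrs):
    """True if rollup_rows hold exactly the distinct shared_attrs combinations of detail_rows."""
    from_detail = {tuple(r[a] for a in shared_attrs) for r in detail_rows}
    from_rollup = [tuple(r[a] for a in shared_attrs) for r in rollup_rows]
    no_duplicates = len(from_rollup) == len(set(from_rollup))
    return no_duplicates and set(from_rollup) == from_detail


day_dim = [
    {"day_key": 1, "date": "2024-01-30", "month": "January", "year": 2024},
    {"day_key": 2, "date": "2024-01-31", "month": "January", "year": 2024},
    {"day_key": 3, "date": "2024-02-01", "month": "February", "year": 2024},
]
month_dim = [
    {"month_key": 1, "month": "January", "year": 2024},
    {"month_key": 2, "month": "February", "year": 2024},
]
month_dim_bad = [
    {"month_key": 1, "month": "Jan", "year": 2024},  # label differs from the Day dimension
    {"month_key": 2, "month": "February", "year": 2024},
]
print(is_strict_rollup(day_dim, month_dim, ["month", "year"]),
      is_strict_rollup(day_dim, month_dim_bad, ["month", "year"]))
```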
• Description
  • Shared common dimensions
  • Integrates logical design
  • Ensures consistency between data marts
  • Allows incremental development
  • Independent of physical location
  • Some rework may be required
• Advantages
  • Enables an incremental development approach
  • Easier and cheaper to maintain
  • Drastically reduces extraction and loading complexity
  • Answers business questions that cross data marts
  • Supports both centralized and distributed architectures

Interlocking Star Schemas
[Figure: Sales, Shipment, and Inventory Facts interlocked through shared Store, Time, Product, Warehouse, and Month dimensions]

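Conformed dimensions are what make drilling across possible: each fact table is queried separately, grouped by the same conformed attribute, and the answers are merged on that attribute. The Python sketch below shows this with hypothetical sales and shipment rows. Joining the two fact tables directly instead would repeat each shipment row once per matching sales row and inflate the totals.

```python
from collections import defaultdict

# Two fact tables from different data marts that share a conformed Product dimension.
product_dim = {1: {"product": "Thermal Coffee Maker"}, 2: {"product": "Deluxe Coffee Maker"}}
sales_facts = [{"product_key": 1, "units_sold": 30}, {"product_key": 2, "units_sold": 12},
               {"product_key": 1, "units_sold": 10}]
shipment_facts = [{"product_key": 1, "units_shipped": 35}, {"product_key": 2, "units_shipped": 12}]


def summarize(facts, measure):
    """Query one fact table, grouping by a conformed dimension attribute."""
    out = defaultdict(int)
    for f in facts:
        out[product_dim[f["product_key"]]["product"]] += f[measure]
    return out


def drill_across():
    """Query each fact table separately, then merge the answers on the shared attribute."""
    sold = summarize(sales_facts, "units_sold")
    shipped = summarize(shipment_facts, "units_shipped")
    return {p: (sold.get(p, 0), shipped.get(p, 0)) for p in sorted(set(sold) | set(shipped))}


for product, (sold, shipped) in drill_across().items():
    print(f"{product}: sold {sold}, shipped {shipped}")
```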
Kimball's Data Warehouse Bus
[Figure: Sales Facts, Shipment Facts, and Inventory Facts connected to a common bus of dimensions: Store, Product, Day, Warehouse, Month]

Course Review
• Rationale for dimensional modeling
• Dimensional modeling basics
• Dimensional modeling details
• Fact table details
• Dimension table details
• Design process
• Aggregate schemas
• Multiple fact tables
• Architected data marts