
Dimensional Design

Presented by

Dr. Debashis Parida

1
Course Agenda
 Rationale for dimensional modeling
 Dimensional modeling basics
 Dimensional modeling details
 Fact table details
 Dimension table details
 Design process
 Aggregate schemas
 Multiple fact tables
 Architected data marts
2
Rationale for Dimensional
Modeling

3
OLTP Design Characteristics
 Focus of OLTP Design
 Individual data elements
 Data relationships

 Design goals
 Accurately model business
 Remove redundancy

4
OLTP Design Shortcomings

 Complex
 Unfamiliar to business people
 Incomplete history
 Slow query performance

5
Emergence of Dimensional
Model
 Logical modeling technique
 For designing relational database structures
 Addresses OLTP design shortcomings
 For use in analytic systems
 First developed in the early 1980s
 Packaged goods industry
 Popularized by Ralph Kimball, PhD.
 1996 book: 'The Data Warehouse Toolkit'

6
Dimensional Modeling
Basics

7
Process Measurement
 Measures
 Metrics or indicators by which people evaluate a business process
 Referred to as “Facts”
 Examples
 Margin
 Inventory Amount
 Sales Dollars
 Receivable Dollars
 Return Rate

Coffee Maker Fulfillment Report (the Units Sold, Units Shipped and % Shipped columns are labeled "Facts")

Brand           Product                 Units Sold   Units Shipped   % Shipped
Captain Coffee  Standard Coffee Maker   5,000        3,800           76%
Captain Coffee  Thermal Coffee Maker    2,400        1,632           68%
Captain Coffee  Deluxe Coffee Maker     2,073        1,658           80%
All Products                            9,473        7,090           75%
8
Perspective Focus

[Figure: process-oriented business perspectives. Business areas shown: Product Development, Operations, Sales and Marketing, Customer Services. Perspective labels shown: G/L account, category, product, supplier, warehouse.]
9


Process Perspectives
 Dimensions
 The parameters by which measures are viewed
 Used to break out, filter or roll up measures
 Often found after the word “by” in a business question
 Descriptive business terms
 Examples
 Product
 Warehouse
 Customer
 Supplier

Coffee Maker Fulfillment Report (the Brand and Product columns are labeled "Dimensions")

Brand           Product                 Units Sold   Units Shipped   % Shipped
Captain Coffee  Standard Coffee Maker   5,000        3,800           76%
Captain Coffee  Thermal Coffee Maker    2,400        1,632           68%
Captain Coffee  Deluxe Coffee Maker     2,073        1,658           80%
All Products                            9,473        7,090           75%
10
Dimensional Model
 Definition
 Logical data model used to represent the measures and dimensions that pertain to one or more business subject areas
 Dimensional Model = Star Schema
 Serves as basis for the design of a relational database schema
 Can easily translate into multi-dimensional database design if required
 Overcomes OLTP design shortcomings
11
Dimensional Model Advantages

 Understandable
 Systematically represents history
 Reliable join paths
 High-performance queries
 Enterprise scalability

12
Schema Simplicity
 Fewer tables
 Denormalized
 Consolidated
 Dimensional
 Familiar to users
 Facts go in the fact tables
 Dimensions in dimension tables
 Increases understandability

[Figure: Star Schema. A central Facts table joined to the Store, Time and Product dimensions.]
13
Data Familiarity
 Adding business context
 Single source field
 Expanded into parts
 Decoded into business terms
 Add special indicators and flags
 e.g. time dimension
 Increases understandability

[Figure: the source field ord_date expanded into a Time Dimension with the attributes year, quarter, month, date, day of the week, holiday flag.]
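As a rough illustration of expanding a single source field into a time dimension row, the sketch below derives the attributes named on the slide from one `ord_date` value. The function name, the `HOLIDAYS` set and the output formats are assumptions made for the example, not part of the course material.

```python
from datetime import date

# Hypothetical holiday calendar; a real warehouse would load this from a reference source.
HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def time_dimension_row(ord_date):
    """Expand one source date into decoded, business-friendly attributes."""
    return {
        "date": ord_date.isoformat(),
        "year": ord_date.year,
        "quarter": f"Q{(ord_date.month - 1) // 3 + 1}",
        "month": MONTH_NAMES[ord_date.month - 1],
        "day_of_week": DAY_NAMES[ord_date.weekday()],
        "holiday_flag": "Y" if ord_date in HOLIDAYS else "N",
    }


row = time_dimension_row(date(2024, 12, 25))
print(row)
```

Storing these values as columns means a report can group or filter by quarter or holiday flag without any date arithmetic at query time.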

14
Representing History
 Time dimension
 Part of every star schema
 Marks the date when the facts (process measurements) occurred
 Allows the schema to easily add and query data over time
 Especially useful for performing comparison queries

[Figure: star schema with Facts joined to Store, Product and the Time Dimension (year, quarter, month, date, day of the week, holiday flag).]
15
Fewer Join Paths
 Star schema joins
 Defined during schema design, not at runtime
 Business people can easily understand these relationships
 One-to-many relations between dimensions and facts
 Referential integrity always enforced
16
High Performance Design
 Fewer joins mean less 'expensive' queries
 Deterministic query patterns
 Star schema query optimization supported by all major RDBMS vendors
17
Subject Area Models
[Figure: subject area E/R models (Manufacturing and Process Control; Shipping and Inventory Management; Sales Order Entry and Campaign Management; Customer Support and Relationship Management) shown with subject area dimensional models (Product Development; Operations; Sales and Marketing; Customer Services).]
18
Enterprise Models

[Figure: enterprise-scope E/R model shown with the enterprise-scope dimensional model.]

19
Dimensional Design
Details

20
Star Schema Dimension Tables
 Dimension tables
 Store dimension values
 Textual content
 Dimension tables usually referred to simply as 'dimensions'
 Spend extra effort to add dimensional attributes

[Figure: star schema with the three Dimension tables highlighted.]
21
Dimension Keys
 Synthetic keys
 Each table assigned a unique primary key, specifically generated for the data warehouse
 Primary keys from source systems may be present in the dimension, but are not used as primary keys in the star schema

[Figure: each Dimension table shown with its key column.]

22
Dimension Columns
 Dimension attributes
 Specify the way in which measures are viewed: rolled up, broken out or summarized
 Often follow the word “by” as in “Show me Sales by Region and Quarter”
 Frequently referred to as 'Dimensions'

[Figure: each Dimension table shown with a Key column followed by its attribute columns.]

23
Star Schema Fact Table
 Process measures
 Start by assigning one fact table per business subject area
 Fact tables store the process measures (aka Facts)
 Compared to dimension tables, fact tables usually have a very large number of rows

[Figure: Fact Table with the columns fact1, fact2, fact3.]

24
Fact Table Primary Key
 Every fact table
 Multi-part primary key added
 Made up of foreign keys referencing dimensions

[Figure: Fact Table with the columns key, key, key, fact1, fact2, fact3.]

25
Fact Table Sparsity
 Sparsity
 Term used to describe the very common situation where a fact table does not contain a row for every combination of every dimension table row for a given time period
 Because fact tables contain a very small percentage of all possible combinations, they are said to be "sparsely populated" or "sparse"
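A small arithmetic sketch of sparsity follows. All cardinalities and the actual row count are made-up illustrative numbers, chosen only to show how small the populated fraction can be.

```python
# Illustrative dimension cardinalities (assumed, not from the slides).
products, stores, days = 10_000, 500, 365

# Every product could in principle sell in every store on every day.
possible_rows = products * stores * days

# Assumed number of product/store/day combinations that actually had sales.
actual_rows = 36_500_000

density = actual_rows / possible_rows
print(f"possible={possible_rows:,} actual={actual_rows:,} density={density:.1%}")
```

Only the combinations that actually occurred are stored, so the fact table here would hold about 2% of the theoretical maximum.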

26
Fact Table Grain
 Grain
 The level of detail represented by a row in the fact table
 Must be identified early
 Cause of greatest confusion during design process
 Example
 Each row in the fact table represents the daily item sales total

27
Designing a Star Schema
 Five initial design steps
 Based on Kimball's six steps
 Start designing in order
 Re-visit and adjust over project life

28
Step One

1. Identify fact table

Start by naming the fact table with the name of the business subject area

29
Step Two

2. Identify fact table grain

Describe what a row in the fact table represents, in business terms

30
Step Three

3. Identify dimensions

31
Step Four

4. Select facts

32
Step Five

5. Identify dimensional attributes

33
Fact Table Details

34
Example Fact Table

Sales Facts
model_key
dealer_key
time_key

revenue
quantity

35
Facts
 Fully additive
 Can be summed across any and all dimensions
 Stored in fact table
 Examples: revenue, quantity

36
Facts
 Semi-additive
 Can be summed across most dimensions but not all
 Anything that measures a “level”
 Must be careful with ad-hoc reporting
 Often aggregated across the “forbidden dimension” by averaging
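The sketch below illustrates a semi-additive fact using a hypothetical inventory snapshot, where time is the "forbidden dimension". The data and variable names are invented for the example.

```python
from collections import defaultdict

# Hypothetical daily inventory snapshots: (day, product, units_on_hand).
snapshots = [
    ("2024-01-01", "Standard", 100), ("2024-01-01", "Deluxe", 40),
    ("2024-01-02", "Standard", 90), ("2024-01-02", "Deluxe", 50),
]

# Summing across the product dimension is valid: total stock on each day.
units_per_day = defaultdict(int)
for day, _product, units in snapshots:
    units_per_day[day] += units

# Across time, average the daily totals instead of summing them.
average_on_hand = sum(units_per_day.values()) / len(units_per_day)

# Summing across time as well gives a number that was never in the warehouse.
naive_sum = sum(units for _day, _product, units in snapshots)

print(dict(units_per_day), average_on_hand, naive_sum)
```

The naive sum counts the same stock once per snapshot, which is why ad-hoc reports over level measures need care.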

37
Facts
 Non-Additive
 Cannot be summed across any dimension
 All ratios are non-additive
 Break down to fully additive components and store them in the fact table
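To see why a ratio is stored as its additive components, the sketch below reuses the Units Sold and Units Shipped figures from the Coffee Maker Fulfillment Report. The variable names are illustrative only.

```python
# (units_sold, units_shipped) per product, from the Coffee Maker Fulfillment Report.
report = {
    "Standard Coffee Maker": (5000, 3800),
    "Thermal Coffee Maker": (2400, 1632),
    "Deluxe Coffee Maker": (2073, 1658),
}

# The additive components can be summed across products.
total_sold = sum(sold for sold, _ in report.values())
total_shipped = sum(shipped for _, shipped in report.values())

# Correct: compute the ratio after summing the components.
pct_shipped = total_shipped / total_sold

# Incorrect: averaging (or summing) the per-product ratios.
mean_of_ratios = sum(shipped / sold for sold, shipped in report.values()) / len(report)

print(f"{pct_shipped:.2%} vs {mean_of_ratios:.2%}")
```

The ratio of the sums reproduces the 75% shown on the "All Products" line; the mean of the three per-product percentages gives a slightly different figure because the products have different volumes.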

38
Factless Fact Table
 A fact table with no measures in it
 Nothing to measure...
 …Except the convergence of dimensional attributes
 Sometimes store a “1” for convenience
 Examples: Attendance, Customer Assignments, Coverage
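A minimal sketch of a factless fact table for the attendance example, using an in-memory SQLite database. The table and column names are hypothetical; the `attendance_count` column is the optional stored "1".

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE attendance_facts (
    student_key      INTEGER NOT NULL,
    class_key        INTEGER NOT NULL,
    date_key         INTEGER NOT NULL,
    attendance_count INTEGER NOT NULL DEFAULT 1,  -- the convenience "1"
    PRIMARY KEY (student_key, class_key, date_key)
);
INSERT INTO attendance_facts (student_key, class_key, date_key) VALUES
    (1, 10, 20240101), (2, 10, 20240101), (1, 10, 20240102);
""")

# Without a measure, attendance is answered by counting rows...
by_count = con.execute("""
    SELECT class_key, date_key, COUNT(*)
    FROM attendance_facts GROUP BY class_key, date_key
    ORDER BY class_key, date_key
""").fetchall()

# ...and with the stored 1, by summing like any other additive fact.
by_sum = con.execute("""
    SELECT class_key, date_key, SUM(attendance_count)
    FROM attendance_facts GROUP BY class_key, date_key
    ORDER BY class_key, date_key
""").fetchall()

print(by_count, by_sum)
```

Both queries return the same result; the stored value only lets reporting tools treat attendance like any other summed measure.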

39
Dimension Table
Details

40
Example Dimension Tables
Model
model_key

brand
category
line
model

Time
time_key

year
quarter
month
date

Dealer
dealer_key

region
state
city
dealer
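The example tables can be sketched as a runnable star schema. The version below uses an in-memory SQLite database; the `_dim` table-name suffix, the column types and all sample rows are assumptions added for illustration, while the column names follow the slides.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE model_dim  (model_key  INTEGER PRIMARY KEY, brand TEXT, category TEXT, line TEXT, model TEXT);
CREATE TABLE dealer_dim (dealer_key INTEGER PRIMARY KEY, region TEXT, state TEXT, city TEXT, dealer TEXT);
CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT, date TEXT);
CREATE TABLE sales_facts (
    model_key  INTEGER NOT NULL REFERENCES model_dim (model_key),
    dealer_key INTEGER NOT NULL REFERENCES dealer_dim (dealer_key),
    time_key   INTEGER NOT NULL REFERENCES time_dim (time_key),
    revenue    REAL,
    quantity   INTEGER,
    PRIMARY KEY (model_key, dealer_key, time_key)  -- multi-part key of foreign keys
);
""")

# Made-up sample rows.
con.executemany("INSERT INTO model_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Acme", "Sedan", "Family", "A100"),
    (2, "Acme", "Truck", "Utility", "T200"),
])
con.executemany("INSERT INTO dealer_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "East", "NY", "Albany", "Dealer A"),
    (2, "West", "CA", "Fresno", "Dealer B"),
])
con.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
    (1, 2024, "Q1", "January", "2024-01-15"),
    (2, 2024, "Q2", "April", "2024-04-15"),
])
con.executemany("INSERT INTO sales_facts VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 20000.0, 1), (2, 1, 1, 30000.0, 1),
    (1, 1, 2, 40000.0, 2), (1, 2, 1, 60000.0, 3),
])

# "Show me revenue and quantity by region and quarter."
sales_by_region_quarter = con.execute("""
    SELECT d.region, t.quarter, SUM(f.revenue), SUM(f.quantity)
    FROM sales_facts AS f
    JOIN dealer_dim AS d ON d.dealer_key = f.dealer_key
    JOIN time_dim   AS t ON t.time_key = f.time_key
    GROUP BY d.region, t.quarter
    ORDER BY d.region, t.quarter
""").fetchall()
print(sales_by_region_quarter)
```

Each dimension joins to the fact table through a single key, so the words after "by" in the question map directly to the `GROUP BY` columns.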
41
Dimension Tables
 Characteristics
 Hold the dimensional attributes
 Usually have a large number of attributes (“wide”)
 Add flags and indicators that make it easy to produce specific types of reports
 Have a small number of rows in comparison to fact tables (most of the time)

42
Don’t Normalize Dimensions
 Saves very little space
 Impacts performance
 Can confuse matters when multiple hierarchies exist
 A star schema with normalized dimensions is called a "snowflake schema"
 Usually advocated by software vendors whose products require snowflake for performance

43
Slowly Changing Dimensions
 Dimension source data may change over time
 Relative to fact tables, dimension records change slowly
 Allows dimensions to have multiple 'profiles' over time to maintain history
 Each profile is a separate record in a dimension table

44
Slowly Changing Dimension
Example
 Example: A woman gets married
 Possible changes to customer dimension
• Last Name
• Marital Status
• Address
• Household Income
 Existing facts need to remain associated with her single profile
 New facts need to be associated with her married profile

45
Slowly Changing Dimension
Types
 Three types of slowly changing dimensions
 Type 1
• Updates existing record with modifications
• Does not maintain history
 Type 2
• Adds new record
• Does maintain history
• Maintains old record
 Type 3
• Keep old and new values in the existing row
• Requires a design change
46
Designing Loads to Handle SCD
 Design and implementation guidelines
 Gather SCD requirements when designing data mapping and loading
 SCD needs to be defined and implemented at the dimensional attribute level
 Each column in a dimension table needs to be identified as a Type 1 or a Type 2 SCD
 If one Type 1 column changes, then all Type 1 columns will be updated
 If one Type 2 column changes, then a new record will be inserted into the dimension table
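The column-level rules above can be sketched as a small load routine. Everything here is an assumption made for illustration: the customer dimension is a plain list of dictionaries, `last_name` is classified as Type 1 and `marital_status` as Type 2, an `is_current` flag marks the active profile, and Type 1 changes are applied to every profile of the customer (one possible design choice).

```python
from itertools import count

TYPE1_COLUMNS = ("last_name",)        # assumed Type 1: overwrite, no history
TYPE2_COLUMNS = ("marital_status",)   # assumed Type 2: new profile row, history kept

_next_key = count(1)  # synthetic key generator for the warehouse


def new_profile(source):
    """Build a new dimension row with a freshly generated synthetic key."""
    return {"customer_key": next(_next_key), "is_current": True, **source}


def apply_change(dimension, source):
    """Apply one source-system record to the customer dimension."""
    profiles = [r for r in dimension if r["customer_id"] == source["customer_id"]]
    current = next((r for r in profiles if r["is_current"]), None)
    if current is None:
        dimension.append(new_profile(source))
        return
    type1_changed = any(current[c] != source[c] for c in TYPE1_COLUMNS)
    type2_changed = any(current[c] != source[c] for c in TYPE2_COLUMNS)
    if type1_changed:
        # Update all Type 1 columns on the existing profiles.
        for row in profiles:
            for column in TYPE1_COLUMNS:
                row[column] = source[column]
    if type2_changed:
        # Keep the old profile for existing facts; add a new one for new facts.
        current["is_current"] = False
        dimension.append(new_profile(source))


customer_dim = []
apply_change(customer_dim, {"customer_id": "C1", "last_name": "Smith", "marital_status": "Single"})
apply_change(customer_dim, {"customer_id": "C1", "last_name": "Jones", "marital_status": "Married"})
for row in customer_dim:
    print(row)
```

After the second call the dimension holds two profiles for the same source customer: the old "Single" row, still referenced by existing facts, and a new "Married" row with its own synthetic key.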
47
Designing Loads to Handle SCD
 Design and implementation guidelines
 For large dimension tables, change data capture techniques may be used to minimize the data volume
 For smaller dimension tables, compare all OLTP records with dimension table records
 Balance data volume with change data capture logic complexities

48
Degenerate Dimensions
 Dimensions with no other place to go
 Stored in the fact table
 Are not facts
 Common examples include invoice numbers or order numbers

49
Dimensional Design
Process
Project Context

50
Data Mart Development
 Dimensional modeling is a critical part of the data mart development effort

[Figure: Design Phase, then Development Phase, then Deployment Phase]

51
Data Mart Development
 Design phase
 Determine requirements and design schema
 Development phase
 Iterative build and feedback
 Deployment phase
 Automate load, document, train users

52
Project Deliverables
 Design
 Project definition document
 Project plan
 Schema design
 Mapping document
 Report design
 Development
 Populated data mart
 Load routines (Sagent “Plans”)
 Query and reporting environment
 Deployment
 Automation
 Documentation
 Training materials
53
Project Approach
 The dimensional model is developed during the design stage
 Scope of the project has already been determined

[Figure: Design Phase, then Development Phase, then Deployment Phase]

54
Design Stage Activities
 Gather requirements through requirements workshops
 Develop star schema
 Conduct design review

[Figure: Design Phase, then Development Phase, then Deployment Phase]

55
Gather Requirements
 Requirements definition
 User workshops
 Spreadsheets
 Sample reports

 Source systems analysis


 DBA interviews
 Copybooks
 E/R diagrams

56
Design Deliverables
 Deliverables
 The star schema itself
 Load mapping document

 How these primary components are delivered will depend on needs and format chosen
 Modeling tools
 Spreadsheets
 Text documents

57
Notation
 No recognized standard
 ER semantics unnecessary
 Clarity is the only characteristic that really matters

58
Design Naming Standards
 Responsibility of data administration
 Extended to the data warehouse
 Important to start early in the project

 Suggested conventions
 Fact tables
 Dimension tables
 Aggregate tables
 Keys

59
Data Element Definitions
 Clear descriptions
 Facts
 Calculated formulae
 Dimensional attributes
 Multiple meanings/synonymous terms
 Aliases

60
Data Element Instances

 Example of Data
 As it will exist in the warehouse
 After decoding
 Adds to model understanding
 Removes ambiguity/uncertainty

61
Data Element Mapping

 Where is the data coming from


 Source system
 Table
 Column
 Record
 Field

62
Data Transformation

 Changing the data


 Serves as spec for ETL process
 Decodes
 Type conversion
 Conditional logic
 Handling of NULLs

63
Aggregate Schemas

64
Aggregate Designs
 Aggregates
 Pre-stored fact summaries
 Along one or more dimensions
 The most effective tool for improving performance

 Examples
 Summary of sales by region, by product, by category
 Monthly sales

65
Aggregate Background
 Aggregate rationale
 Improve end user query performance
 Reduce required CPU cycles
 Powerful cost saving tool

 Restrictions
 Additive facts only
 Must use dimensional design

66
Aggregate Guidelines

 Don’t start with aggregates


 Design and build based on usage
 Sooner or later you'll need to build aggregates

67
Aggregate Types

 Level field
 Separate fact tables

68
Aggregate Types
 Level field
 Old technique
 Requires “level” attribute in appropriate dimensions
 Aggregates and base-level facts stored in same table
 Same number of total fact records as separate table approach
 Drawbacks
 Every query must constrain on the level field
 Possibility of double counting
69
Aggregate Types
 Separate Tables
 Separate fact table for every aggregate
 Separate dimension table for every aggregate dimension
 Same number of fact records as level field tables
 Advantage
 Removes possibility of double counting
 Schema clarity
 Caveat
 Requires software with aggregate navigation capability
70
Aggregate Pitfalls
 Sparsity failure
 Term used to describe the result of building too many aggregate fact tables that do not summarize enough rows.
 When sparsity failure occurs, a relatively small star schema can grow (in terms of disk size) thousands of times.
 Sparsity failure = aggregate explosion

71
Aggregate Design Guidelines
 Rule of twenty
 To avoid aggregate explosion
 Make sure each aggregate record summarizes 20 or more lower-level records
 Remember
 Total number of possible fact tables in any given dimensional model = Cartesian product of all levels in all the dimensions
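As a rough sketch of the counting rule, the snippet below multiplies the number of levels in each dimension. The hierarchies listed are hypothetical, and the count ignores the option of dropping a dimension altogether, which would add one more "level" per dimension.

```python
from math import prod

# Hypothetical hierarchy levels per dimension, lowest (base) level first.
levels = {
    "time": ["date", "month", "quarter", "year"],
    "product": ["model", "line", "category", "brand"],
    "dealer": ["dealer", "city", "state", "region"],
}

# One possible fact table for every combination of one level per dimension.
possible_fact_tables = prod(len(hierarchy) for hierarchy in levels.values())

# All of them except the base-level fact table are aggregates.
possible_aggregates = possible_fact_tables - 1

print(possible_fact_tables, possible_aggregates)
```

Even three small hierarchies allow dozens of aggregates, which is why they are built selectively rather than exhaustively.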

72
Hierarchies & Aggregate Design
 Hierarchy diagram
 Helps visualize options for building aggregates
 Adding cardinalities ensures following the rule of 20
 Not required to build initial star schema

Time hierarchy (5 years of data)

Level     Members per year   Total members
Year      1                  5 years
Quarter   4                  20 quarters
Month     12                 60 months
Date      365                1825 days
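The cardinalities in the Time hierarchy can be used to check candidate aggregates against the rule of twenty. The sketch below looks at the time dimension in isolation, which is a simplifying assumption: in a real fact table the ratio also depends on the other dimensions and on sparsity.

```python
# Total members per level of the Time hierarchy, as shown on the slide.
time_levels = [("date", 1825), ("month", 60), ("quarter", 20), ("year", 5)]
base_rows = time_levels[0][1]

# How many base-level (date) members each higher level summarizes.
vs_base = {name: base_rows / rows for name, rows in time_levels[1:]}

# How many members of the next-lower level each level summarizes.
vs_previous = {
    name: prev_rows / rows
    for (_prev, prev_rows), (name, rows) in zip(time_levels, time_levels[1:])
}

meets_rule_of_20 = {name: ratio >= 20 for name, ratio in vs_base.items()}

print(vs_base)
print(vs_previous)
print(meets_rule_of_20)
```

A month-level aggregate summarizes about 30 dates per row, but a quarter-level aggregate built on top of an existing month-level one would only summarize 3 rows each, so it adds little once the month aggregate exists.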

73
Aggregate Navigation
 Description
 Function provided by software layer: Aggregate Navigator
 Directs user queries to the most favorable available aggregate
 Transparent to the end user
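One way to picture aggregate navigation is as a lookup that picks the smallest table able to answer a query. The sketch below is a toy model, not how any particular product works: the catalog, table names, attribute sets and row counts are all hypothetical.

```python
# Hypothetical catalog: table name -> (attributes it can answer, row count).
CATALOG = {
    "sales_facts": (
        {"date", "month", "quarter", "year", "model", "brand", "dealer", "region"},
        10_000_000,
    ),
    "sales_agg_month_brand": ({"month", "quarter", "year", "brand"}, 12_000),
    "sales_agg_year_region": ({"year", "region"}, 40),
}


def choose_table(requested):
    """Return the smallest table whose attributes cover the requested ones."""
    candidates = [
        (rows, name)
        for name, (attributes, rows) in CATALOG.items()
        if requested <= attributes
    ]
    if not candidates:
        raise LookupError(f"no table can answer {sorted(requested)}")
    return min(candidates)[1]


print(choose_table({"year", "brand"}))
print(choose_table({"date", "dealer"}))
```

The user always asks the question in terms of the base schema; the redirection to an aggregate happens in this layer, which is what keeps aggregates transparent.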

74
Aggregate Framework

Business View

Designer View

75
Aggregate Deployment

 Incremental
 Based on usage
 Transparent to users
 Typically warehouse DBA responsibility

76
Aggregate Deployment

[Figure: incremental deployment timeline. Subject areas 1, 2, 3 and 4 are each built first with no aggregates; aggregates for subject areas 1, 2 and 3 are built afterwards. Some re-work required.]


77
Multiple Fact Tables

78
Multiple Fact Tables
 Different business processes usually require different fact tables
 There are also several cases where a single business process will require multiple fact tables
 Core and custom
 Snapshot and transaction
 Coverage
 Aggregates

79
Different Business Processes
 Different business processes usually require different fact tables
 In practice, it may be hard to identify what a “process” is
 Sometimes you can spot different processes because measures are recorded
 With different dimensions
 At differing grains

80
Different Dimensions or Grain

 Don’t take shortcuts with grain


 The 'not applicable' dimension value
 Using a 'not applicable' row in a dimension confuses the grain and can introduce reporting difficulty

81
Different Points in Time
 Sometimes, it is not easy to identify the discrete business processes
 All measures may have the same dimensionality or grain
 Different measures are recorded at different times
 Quantity sold is not recorded at the same time as quantity shipped

82
Different Timing
 Building a single fact table would require recording zero or null for measures that are not applicable at a point in time
 Reports would contain a confusing combination of zeros, nulls, and absence of data

83
Identifying Different Processes

 Look at the measures in question


 Sort them into fact tables based on
 Dimensions
 Grain
 Differing timings of events measured

84
Design Tools for Multiple Tables
 Create a set of matrices
 Facts vs dimension
 Facts vs dimensional attributes
 Mark where facts apply to dimensions
 Mark where facts apply to dimensional attributes
 When facts don't apply, assume separate fact table
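The facts-versus-dimensions matrix can be sketched as a mapping from each fact to the dimensions it applies to; facts that share the same set of dimensions are candidates for the same fact table. The facts and dimensions listed are hypothetical examples, and grain and timing would still need to be checked separately.

```python
from collections import defaultdict

# Hypothetical matrix: each fact lists the dimensions it applies to.
matrix = {
    "quantity_sold": {"time", "product", "store"},
    "sales_dollars": {"time", "product", "store"},
    "quantity_shipped": {"time", "product", "warehouse"},
    "inventory_amount": {"month", "product", "warehouse"},
}

# Group facts by identical dimensionality; each group suggests one fact table.
fact_tables = defaultdict(list)
for fact, dimensions in matrix.items():
    fact_tables[frozenset(dimensions)].append(fact)

for dimensions, facts in fact_tables.items():
    print(sorted(dimensions), "->", facts)
```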

85
Multiple Fact Table Summary
 Different processes need different tables
 Identified with
 Grain
 Dimensionality
 Timing
 Same process may need multiple fact tables
 Heterogeneous attributes
 Coverage
 Snapshot and transaction
 Aggregates
86
Architected Data Marts

87
Data Mart
 Meaning of the term 'data mart' has shifted over the last several years...

88
Data Mart Architecture 1993

[Figure: Operational Systems, E.T.L. Software, Data Warehouse, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]

89
Data Mart Architecture 1997

[Figure: Operational Systems, E.T.L. Software, Data Marts, Query & Reporting Software, Analysis Users]
90
Architected Data Marts

[Figure: Operational Systems, E.T.L. Software, Data Warehouse (containing the Data Mart), Query & Reporting Software, Analysis Users]
91
Data Mart

 Warehouse Subject Area


 Incremental warehouse development
 Centralized architecture
 Not new
 Well-suited to star schemas

92
“Stovepipe” Data Marts
 “Stovepipe” data marts
 Inconsistent and overlapping data
 Difficult and costly to maintain
 Redundant data load
 Can’t drill across
 Integration requires starting over
 Dimensions not conformed

[Figure: three unconnected star schemas. Sales Facts with Time (Day), Store and Product; Shipments Facts with Time (Day), Warehouse and Product; Inventory Facts with Month, Warehouse and Product.]
93
Conformed Dimensions

 Definition
 Dimensions are conformed when they are the
same
-or-
 When one dimension is a strict rollup of another

94
Conformed Dimensions
 Same dimensions must:
1. ... have exactly the same set of primary keys, and
2. ... have the same number of records

95
Conformed Dimensions
 Rolled up dimension
 When one dimension is a strict rollup of another

 Which means
 Two conformed dimensions can be combined into a single logical dimension by creating a union of the attributes

96
Conformed Dimensions
 Description
 Shared common dimensions
 Integrates logical design
 Ensures consistency between data marts
 Allows incremental development
 Independent of physical location
 Some re-work may be required

97
Conformed Dimensions
 Advantages
 Enables an incremental development approach
 Easier and cheaper to maintain
 Drastically reduces extraction and loading complexity
 Answers business questions that cross data marts
 Supports both centralized and distributed architectures
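A question that crosses data marts can be sketched as a "drill across": summarize each fact table separately by a conformed dimension, then merge the results on the shared attribute. In the sketch below the product dimension, keys and individual fact rows are hypothetical; the totals were chosen to match two lines of the Coffee Maker Fulfillment Report.

```python
from collections import Counter

# Conformed product dimension shared by both data marts (hypothetical keys).
product_dim = {1: "Standard Coffee Maker", 2: "Thermal Coffee Maker"}

# Two separate fact tables: (product_key, units).
sales_facts = [(1, 3000), (1, 2000), (2, 2400)]
shipment_facts = [(1, 3800), (2, 1000), (2, 632)]


def summarize(facts):
    """Roll a fact table up to the product name of the conformed dimension."""
    totals = Counter()
    for product_key, units in facts:
        totals[product_dim[product_key]] += units
    return totals


sold = summarize(sales_facts)
shipped = summarize(shipment_facts)

# Merge the two result sets on the conformed attribute.
report = {}
for product in sorted(sold.keys() | shipped.keys()):
    pct = round(100 * shipped[product] / sold[product]) if sold[product] else None
    report[product] = (sold[product], shipped[product], pct)

print(report)
```

The merge is only meaningful because both fact tables use the same product keys and names; with stovepipe marts the product rows would not line up.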

98
Interlocking Star Schemas
[Figure: interlocking star schemas. Sales Facts, Shipment Facts and Inventory Facts are joined through shared dimensions: Store, Time, Product, Warehouse and Month.]

99
Conformed Dimensions
Kimball’s Data Warehouse Bus
[Figure: Sales Facts, Shipment Facts and Inventory Facts connected to a common bus of conformed dimensions: Store, Product, Day, Warehouse, Month.]
100


Course Review
 Rationale for dimensional modeling
 Dimensional modeling basics
 Dimensional modeling details
 Fact table details
 Dimension table details
 Design process
 Aggregate schemas
 Multiple fact tables
 Architected data marts
101
