
IBM Software Group

Seminar: The Data Warehouse Ecosystem. Technology Trends, Best Practices, Project Experiences


Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 09-17 April 2010

2009 IBM Corporation

Agenda
Introduction
  Which ecosystem does the Data Warehouse sit in?
  Current industry trends
The life cycle of a Data Warehouse project: best practices and main mistakes to avoid (a case study)
Data Warehouse modeling: current issues
  Consolidating environments: how to use industry logical models to simplify the process
  Decentralized implementation of a consolidated DWH: a case study
Technology trends: the era of DWAs (Data Warehouse Appliances)
Data integration (17/04/2010)
  Methodologies and best practices for developing ETL flows
  The importance of metadata management


What Do Companies Need for Business Intelligence & Analytics?


Strategy and Implementation Services
BI & Performance Management Tools
Metadata
Data Governance
Industry Models
Master Data Management
Data Warehouse
Information Integration
Servers and Storage

Challenges
(Overlaid on the previous diagram, "What Do Companies Need for Business Intelligence & Analytics?")
Integration costs & skills
Metadata synchronization
Performance optimization
Administration costs & skills
Maintenance costs & skills
Upgrade synchronization
Ongoing integration certification

What's Happening Out There? (Trends)


1. Many mature warehouses are being re-architected.

According to Gartner Group, almost 1/3 of data warehouse projects will be do-overs. What's behind this trend?
Lack of ROI
A Gartner Group study shows that only 40 percent of enterprises measure ROI for their data warehousing initiatives. How do you know whether you succeeded if you do not measure it?

The big push to consolidation of data


Currently, cross-LOB analysis is one of the hottest subjects in BI

Focus is shifting from performance to changing business needs


A warehouse that is architected only for performance may not react well to change. Focus on agility and reuse, not just pure performance.


More on Trends
2. Cost and delivery pressure (anyone not have that?).

The need for data to answer a specific business need in a compressed time period causes (more and more) data proliferation.
Costs!!! DW operational costs appear to outweigh benefits, and the pressure to reduce costs is severe for most DW organizations (remember the ROI problem?)

3. Warehouses have become more active and critical at the same time!

Warehouses are not only becoming more active, they are also becoming more critical (did you plan for that?)
This drives the need for a completely different architecture and for things like HA and DR.
Batch windows are shrinking, queries are becoming more complex, and more sophisticated analytics are needed (all at once!)

and more
4. In comes the Appliance.

Isn't "appliance" just a cool word for having a prescribed solution that works and lessens the time to market?

Doing it yourself is so out

"...you could build your own appliance. It would probably take three years, you would need some highly skilled engineers who you have to pay at a commensurate rate but, yes, you could do that. You could also build your own ERP system that had all the features of SAP in it, but just because you could doesn't mean that it would make sense."
Phillip Howard, Bloor Research

Appliance = reduced time to market + built for data warehousing + hard to ignore!


Data Design Trends


1. Back to the single source of truth, aka Enterprise BI, Enterprise Intelligence.
  Data that is used is data that is exposed
  Compliance laws
  Need for more detailed data
2. Right-time replaces real-time
  Match need to application
3. Don't just load your data: MASTER your data!
  Ye shall master thy data
  Reuse is key

Seminar: The Data Warehouse Ecosystem

The life cycle of a Data Warehouse project: best practices and main mistakes to avoid
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 09 April 2010

The Top 10 Best Practices for a successful Data Warehouse


(No, that's not a typo: they are all number 1)
1. Have a business-based strategy and get sponsorship
1. Market the warehouse internally (early and often)
1. Have the right organization to help you manage the warehouse
1. Data governance and stewardship
1. Build towards consolidation
1. Balance increasing costs with increasing value
1. Have a solid data architecture
1. Architect for change, not only performance
1. Have a disaster recovery plan
1. Never neglect information quality

Gathered from customers and analyst interviews

Data Warehouse Project: Most Common Mistakes


The Anti-Architect - Kimball
Mistake 1: Rely on past consultants or other IT staff to tell you the data warehouse requirements
Mistake 2: Live with the assumption that the administrators of the major OLTP source systems of the enterprise are too busy
Mistake 3: After the data warehouse has been rolled out, set up a planning meeting to discuss ongoing communications with the end users, if the budget allows
Mistake 4: Make sure all the data warehouse support personnel have nice offices in the IT building
Mistake 5: Declare end-user success at the end of the first training class
Mistake 6: Assume that sales, operations, and finance end users will naturally gravitate to the good data and will develop their own killer apps
Mistake 7: Make sure that before the data warehouse is implemented you write a comprehensive plan that describes all possible data assets of your enterprise and all the intended uses of information

Data Warehouse Project: Most Common Mistakes (continued)


The Anti-Architect - Kimball
Mistake 8: Don't bother the senior executives of your organization with the data warehouse until you have it up and running and can point to a significant success
Mistake 9: Encourage the end users to give you continuous feedback throughout the development cycle
Mistake 10: Agree to deliver a high-profile customer-centric data mart as your first deliverable
Mistake 11: Define your professional role as the authority on appropriate use of the data warehouse
Mistake 12: Collect all the data in a physically centralized data warehouse before interviewing any end users or releasing any data marts

Business Sponsorship Can Save Your Warehouse

"One of the most common, yet potentially fatal, disorders involves the sponsorship of the DW/BI environment. A business sponsor disorder is often the contributing factor to data warehouse stagnation."
Margy Ross, Ralph Kimball

Business Sponsor


Data Warehouse Project: A Telco Case Study


The Project Scope
2006: Product-oriented
  CRM data analysis on prototypes, with no customer view
  Product lines: Internet, Mobile, Fixed, TV
  Existing sources
2007: Input for customer interaction
  CRM data analysis on Internet, Mobile, Fixed, TV
  Customer-centric (C.C.) DWH prototyping
  Existing sources plus new sources (VaMo, "Quick VaMo")
2010: Input for customer interaction
  CRM data analysis on the Customer Centric DWH
  CRM Foundation / One Billing

End of 2007 (instead of 2009): first formal customer view as input for customer interaction

Data Warehouse Project: A Telco Case Study


The Issue
BI/DWH project sponsored by the CRM director (IT)
  Seen as a technical enabler, not business driven
IT organization changes heavily impact the project
Many IT DWH projects in different departments
  Not all IT managers sponsor or support the new DWH project

Lack of overview of the status, deliverables and interdependencies of all CRM-data related projects, and of insight into how project objectives support the objectives of CLM and ZM Klantbeeld (customer view)
Limited insight into whether, how and when the information requirements outlined by the business are covered by running and future CRM-data related projects
No matching CRM data model (compliant with SID/Siebel) for ZM Klantbeeld, and therefore insufficient guidance from the desired Klantbeeld towards feasible and coherent IT projects
Limited business involvement in the running BI program and CRM-data related projects
Limited alignment of data-related efforts between demand (business) and supply (IT NL)
Fragmented processes and unclear ownership, roles and responsibilities related to CRM-data projects and maintenance
Limited steering on CRM-data related projects possible

Background
Within xx, several projects have recently been started by business and IT that should improve the quality and availability of CRM data for analytical and operational CRM activities and contribute to the 360° view of the customer. With regard to these projects, the following issues are perceived by KPN:
  Lack of overview of the status, deliverables and interdependencies of all CRM-data related projects, and of insight into how project objectives support the objectives of CLM and ZM Klantbeeld (customer view).
  Limited insight into whether, how and when the information requirements outlined in ZM Klantbeeld are covered by running and future CRM-data related projects.
  No matching CRM data model (compliant with SID/Siebel) for ZM Klantbeeld, and therefore insufficient guidance from the desired Klantbeeld towards feasible and coherent IT projects.
  Limited business involvement in the running BI program and CRM-data related projects.
  Limited alignment of data-related efforts between demand (business) and supply (IT NL).
  Fragmented processes and unclear ownership, roles and responsibilities related to CRM-data projects and maintenance.
  Limited steering on CRM-data related projects possible.
In order to start solving these issues, KPN wants to improve data governance for KPN ZM CRM-data related projects. As a first step, KPN ZM wants to start a project to agree on a roadmap for the delivery of the ZM Klantbeeld information requirements, to define a data architecture, and to define, implement and pilot a pragmatic governance framework around the running and future CRM-data related projects.

Datawarehouse Project: A case Study


The Parallel Project
Improve management of a trusted and controlled quantity and quality of information within xxx through:
  definition of a high-level CRM klantgegevensmodel (customer data model) based on the ZM Klantbeeld information requirements;
  definition of a roadmap, agreed by ZM (Klantbeeld) and IT, for the delivery of the existing business information requirements as defined in the ZM Klantbeeld;
  definition of a basic governance framework that improves the efficiency, effectiveness and control of projects that have an impact on analytical and operational CRM-data aspects;
  enabling active involvement and ownership from business and IT (e.g. data stewards, an owner per subject area).

Datawarehouse Project: A case Study


Lessons Learned: what you should do

Business

IT

Datawarehouse Project: A case Study


Lessons Learned: how you could do it
Align with the business strategy
Communicate to the right level
Includes the setup of a Business Glossary
Data Governance
BI Governance
Use a DWH-tailored project lifecycle methodology

It's All About the Value, NOT the Technology

"In the end, data warehouse implementation shouldn't be the focus; it's a means. The goal is to deliver a solution to support an immediate business need."

Baseline Consulting

Hitting the target means expressing business value

How do I best align to the business strategy?


First, keep asking yourself the question: why does it matter to the business?
The business strategy for the warehouse can be found everywhere
  What is the company mission, and how can the warehouse play a role in supporting it? (It's on your wall, on your website, in your annual report)
Create a business advisory committee for the warehouse
  Who on the committee is the most vocal and passionate?
  Look for more than one sponsor for true success in the enterprise (yes, have a sponsor redundancy program!)

Technology

Business Need


Sample DW Program Structure


Executive Sponsorship
  Executive Sponsorship
  Data Stewardship Steering Committee

Data Warehouse Program Management and Oversight
  Program Management Office (PMO)
    DW Program Manager
    DW Technical Architect, Data Quality Coordinator, Metadata Coordinator
    Change Control, Requirements Coordinator, Resource Coordinator

DW Development & Maintenance
  Project Teams
    Project Manager, Business Analysts, Source Analysts, Data Modeler, ETL Developer, DBA, BI Tool Developer, Testing Coordinator, Implementation Coordinator
  DW Maintenance
    Metadata Management, Source Extract Support, ETL Support, Reporting/Analytic Support, DBAs, Data Modelers
  Tools Support
    ETL Specialist, Query & Reporting Specialist, OLAP Specialist, Data Quality Specialist, Data Mining Specialist

Focus Communication to the Business Users


Have a mission statement for the warehouse
Communicate milestones that map to that mission
Make the warehouse a raving success in the business
Do not get caught up in communicating the wrong milestones
  DO communicate what business questions can be answered, problems resolved and opportunities identified
  DON'T over-communicate hardware upgrades, OS changes, or new investments that do not bring new value

[Cartoon: a presenter showing "DW Stats"]
"I think I speak for everyone when I say: what in God's name are you talking about????"

Communicate again
Communicate early, communicate often!!!

How often do you talk about what the warehouse is doing today with the executives?
Push out a scorecard monthly
How many business questions did the warehouse answer last month? A query is a BUSINESS question!!!!!!
Use the warehouse to establish leadership externally
Know your warehouse stats like your children's birthdays!

Example (JPMC)
  775 end users, 276 source systems, 8,729 attributes
  15 TB database growing to 20 TB over the next 18 months
  28,000 batch ETL jobs/month
  2,000-5,000 queries/day
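
The monthly scorecard idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the `query_log` records, the business-area labels and the `monthly_scorecard` helper are hypothetical, and a real warehouse would read the counts from its own query log or monitoring tables.

```python
from collections import Counter
from datetime import date

# Hypothetical query-log records: (execution date, business area).
# In practice these would come from the warehouse's own query log.
query_log = [
    (date(2010, 3, 2), "Finance"),
    (date(2010, 3, 15), "Marketing"),
    (date(2010, 3, 15), "Finance"),
    (date(2010, 4, 1), "Marketing"),
]


def monthly_scorecard(log):
    """Count queries (treated as business questions) per (year, month)."""
    counts = Counter((day.year, day.month) for day, _area in log)
    return dict(sorted(counts.items()))


scorecard = monthly_scorecard(query_log)
for (year, month), answered in scorecard.items():
    print(f"{year}-{month:02d}: {answered} business questions answered")
```

Reporting the count as "business questions answered" rather than "queries executed" keeps the message in the terms the slide recommends.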


A Model for BI Governance


Align and Manage:
Processes and people that manage the alignment of BI resources to BI strategies. Management of interdependent efforts and initiatives.

Data Governance:
Management of enterprise data assets to increase the use and trust of the data.

Process Governance:
Business oversight of the decisions to align planning, measurement, and analysis efforts across the organization.

Organizational Governance:
Processes, people and structure that enable the ongoing management and control of BI initiatives.

Technology Governance:
Ensuring that the right portfolio of tools and technologies is in place to deliver the right BI capabilities to the business.

[Diagram: Data, Process, Organization and Technology governance arranged around Align and Manage]

Components of Integrated BI Governance


Data
  Enterprise Data Management
  Data Stewardship
  Data Quality Management
  Data Integration Management (Defining a Single Version of the Truth)
  Meta Data Management
Process
  Business Performance Management
  Integrated Planning
  Forecasting & Budgeting
  KPI Rationalization
  Decision Making Processes
Alignment and Management
  BI Steering Committee, BI Guiding Principles, Strategy & Roadmap Governance, BI Program Management (PMO)
Organization
  Organization Structure Constructs (CoC, PMO, Work Group Design)
  Skills & Behavior Development
  Training
  Job Design
  Roles and Responsibilities
  Accountability & Decision Making
Technology
  Tool and Technology Standards
  Common reference and solution architecture

Data Warehouse Project Life Cycle


(The Kimball Lifecycle diagram)

The Business Intelligence Method


Our BI method embeds key themes throughout the lifecycle and is tightly linked with our BI Reference Architecture

[Diagram: method phases wrapped around the BI Reference Architecture]
Method phases (iterative, incremental): BI Strategy and Planning, Solution Outline, Macro, Micro, Build, Deploy
BI Reference Architecture: Access Layer, Analytics Layer, Data Repository Layer, Data Integration Layer
Key Themes: Data Quality, Metadata, Technical Infrastructure, Program Management & Organizational Change, Quality Assurance, Security & Privacy

The Business Intelligence Method


Based on an industry leading set of phases, activities, and tasks
Define Infrastructure Requirements
Define Organization
Create Logical Analytics Design
Create Physical Analytics Design
Build/Extend Analytics Components
Perform Acceptance Testing

Review Client Business & IT Environment

Review Client Environment

Create Logical Access Design

Create Physical Access Design

Build/Test Access Components

BI Strategy and Planning

Identify Solution Areas Outline Solution Requirements

Create Logical Data Repositories Design

Create Physical Data Repositories Design

Perform Data Repositories Build

Setup Production Environment

Macro Design

Micro Design

Define Technical Solution Strategy

Outline Solution Strategy Determine Analytics Requirements Determine Data Repository Requirements Determine Data Integration Requirements Assess Business Impact

Create Logical Data Integration Design

Create Physical Data Integration Design

Build Cycle

Build Data Integration Code

Deployment

Define Business Solution Strategy

Solution Outline

Deploy Client Support

Outline Architecture Model Assess Infrastructure Impact Confirm BI Strategy and Planning

Design Architecture Model

Refine Architecture Model

Prepare for Testing

Design Solution Plans

Perform Static Testing

Perform Development Testing Perform System Testing

Cutover to Production

Design Test Specifications

Define Training and User Support

Note: For clarity, not all activities are shown

Confirm Solution Outline

Build Development Environment

Plan Development

Plan Deployment

Implementation Checkpoint


Seminar: The Data Warehouse Ecosystem

Data Warehouse Modeling: Current Issues
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 09 April 2010

Achieving the Goal: One Source of Truth for All


"Despite their best intentions, CIOs are struggling to deliver consistent data that provides a single view across the enterprise. CIOs who seek this so-called 'single version of the truth' must feel like they are playing an endless game of Whack-a-Mole: every time they stamp out a renegade analytic silo, another pops up elsewhere."
TDWI Research Report

[Cartoon: Whack-O-MARTS]

Current Issues for a Data Warehouse Architect


Data Warehouse Consolidation
  Mergers and acquisitions
Data Mart Consolidation
  One version of the truth
  Reduce complexity from data mart explosion
Data Warehouse Standardization
  Multiple lines of business
  Global corporation

IBM Industry Models Introduction

The True Cost of Inflexible Data Models


Most data warehouse logical data models tend to be optimized (i.e., biased) towards:
1. Source systems
  Such models are difficult to use for integrating data from any other application
2. Current application query patterns (business requirements)
  These evolve and become more sophisticated over time
  They exceed initial design assumptions

Consequences:
  Failure of the solution to keep pace with the business
  Diminishing business value
Much of the effort involved in modifying a traditionally designed data warehouse is associated with rewriting the DDL, ETL processes and SQL for creating, loading and querying the data warehouse, respectively.

Case Study: North Europe Telco with many companies around the world
1 DW implementation
  Develop the DWH once, using a reference model, in a first-country pilot
  Reuse it many times to deploy in the other countries

Rollout: TZ implementation and tests, DRC implementation and tests, xx implementation and tests

Realignment on the Millicom unified model
Limited BI solution experience

According to best practices, the TDWM should be fed from, and kept separate from, the operational systems via a staging area

[Diagram: data flows from the Telco operational systems through the STAGING AREA into the Telco Data Warehouse Model (TDWM)]
Telco operational systems:
  Switches & Gateways (all inbound & outbound: On-net DLD, On-net IDD, Off-net DLD, Off-net IDD), producing unrated CDRs
  Rating & Mediation, producing rated CDRs
  Interconnect Settlement
  Billing
  Wireless
  Other
Telco Data Warehouse Model entities:
  Involved Party: Originating Service Providers, Terminating Service Providers, Subscribers and Inbound Roamers
  Arrangement: Postpaid Subscriptions, Prepay Cards, Interconnect Agreements, Service Level Agreements, Pricing Agreements
  Service Usage and Usage Component: Call Detail Records (1 for each call), rateable and chargeable. Each Service Usage has multiple components for each rate basis.
  Charging Rate: applicable internal and external rates (i.e., billing rates, interconnect rates, network costs, VAT, etc.)
  Network Component: Circuits, Switches, Gateways
  Invoice Header, Invoice Detail, Billing Rate: the invoice table will store billing history for each call; billable (above entitled amount)
  Interconnecting Network, Service Provider (IP), Interconnecting Service Provider

...while the Paktel implementation already deviated from these best practices in favour of a country-specific implementation

[Diagram: same operational systems and Telco Data Warehouse Model as on the previous slide, without the staging area; "??" marks the points where the flow deviates, and the source feed is labelled MSC CDR]
Paktel did not implement a staging area
Data is modified in the Data Warehouse
No interconnect system in Pakistan
Incoming calls are stored in a new local table not based on TDWM
Table layout is based on the source MSC_CDR layout, not TDWM
Rating logic is replicated in the Data Warehouse
Analysis area and reports are changed accordingly

...with further deviations for TZ rather than realigning on TDWM

[Diagram: same operational systems and Telco Data Warehouse Model as on the previous slides, without the staging area; "??" marks the points where the flow deviates, and the source feed is labelled NUM_CALL]
Tanzania did not implement a staging area
Data is modified in the Data Warehouse
Interconnect system in Tanzania
Incoming calls are stored in a new table based on the Paktel approach, not TDWM
Table layout is based on the source system layout, not TDWM
Analysis area and reports are changed accordingly

...assuming that all reports and models are identical for all countries, with only the source data and ETL1/2 processing being potentially different

[Diagram: one pipeline per country (Country A, Country B, ... Country Z), each a replica of the MIC Corporate DW Solution]
Pipeline stages, linked by ETL steps: Sources, Source Independent Staging Area, System Of Record, Summary Area, DataMarts, BO Universe, then Local Business Reports and Corporate Business Reports
Different per country (local): sources and ETL1/2
Identical for all countries (corporate): everything downstream of the staging area

Sources can be different by country
Country-specific development is limited to ETL1/2
All reports and models are identical for all countries
All other components, including ETL3/4, are exactly identical to the xxx Corporate DW Solution
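
The split between country-specific ETL1/2 and shared downstream processing can be illustrated with a small sketch. Everything here is an assumption made for illustration: the source field names (`msisdn`, `dur_sec`, `caller`, `minutes`), the staging row layout and the function names do not come from the project described in the slides.

```python
from typing import Callable, Dict, List

StagingRow = Dict[str, object]


# Country-specific ETL1/2: map each local source layout (hypothetical field
# names) onto one source-independent staging layout.
def etl12_country_a(rows: List[dict]) -> List[StagingRow]:
    return [{"subscriber": r["msisdn"], "duration_s": int(r["dur_sec"])} for r in rows]


def etl12_country_b(rows: List[dict]) -> List[StagingRow]:
    # Country B is assumed to report call length in minutes.
    return [
        {"subscriber": r["caller"], "duration_s": int(float(r["minutes"]) * 60)}
        for r in rows
    ]


# Shared downstream step (standing in for ETL3/4): identical for every
# country because it only ever sees the staging layout.
def etl34_shared(staged: List[StagingRow]) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for row in staged:
        key = str(row["subscriber"])
        totals[key] = totals.get(key, 0) + int(row["duration_s"])
    return totals


COUNTRY_ADAPTERS: Dict[str, Callable[[List[dict]], List[StagingRow]]] = {
    "A": etl12_country_a,
    "B": etl12_country_b,
}


def run_pipeline(country: str, source_rows: List[dict]) -> Dict[str, int]:
    """Run the local adapter, then the shared corporate step."""
    return etl34_shared(COUNTRY_ADAPTERS[country](source_rows))


summary_a = run_pipeline("A", [{"msisdn": "100", "dur_sec": "30"},
                               {"msisdn": "100", "dur_sec": "45"}])
summary_b = run_pipeline("B", [{"caller": "200", "minutes": "1.5"}])
print(summary_a, summary_b)
```

Adding a new country in this sketch means writing one more adapter and registering it; the shared step and anything built on its output stay untouched, which is the property the slide argues for.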


The Importance of Flexible, Generic Models

Trade-off:
  Information model optimization for normalization and generics

Improve:
  Longer-term model manageability
  Extensibility
  Synchronization between source and target applications and business processes

[Diagram: TDW layers]
Layer 2: TDW Star Schemas and BSTs (denormalized) and the TDW Summary Area (optimized summaries). Query performance increases with the aggregation level.
Layer 1: TDW System of Record (3NF detail data). Load performance increases with the normalization level.
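
The two layers can be illustrated with a toy example: detail rows are loaded once into a normalized table, and a summary table is derived from them so that reporting queries read a handful of pre-aggregated rows. The table and column names below are hypothetical and use SQLite only for convenience; they are not taken from the TDW model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Layer 1 stand-in: detail data, one row per call (hypothetical layout)
CREATE TABLE usage_detail (
    subscriber_id INTEGER,
    call_date     TEXT,
    duration_s    INTEGER
);
-- Layer 2 stand-in: summary optimized for reporting queries
CREATE TABLE usage_summary (
    call_date        TEXT PRIMARY KEY,
    calls            INTEGER,
    total_duration_s INTEGER
);
""")

conn.executemany(
    "INSERT INTO usage_detail VALUES (?, ?, ?)",
    [(1, "2010-04-09", 60), (2, "2010-04-09", 120), (1, "2010-04-10", 30)],
)

# Aggregate once at load time; reports then read the small summary table
# instead of scanning every detail row.
conn.execute("""
INSERT INTO usage_summary
SELECT call_date, COUNT(*), SUM(duration_s)
FROM usage_detail
GROUP BY call_date
""")

summary = conn.execute(
    "SELECT call_date, calls, total_duration_s FROM usage_summary ORDER BY call_date"
).fetchall()
print(summary)
```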


A Simplistic Example of Generic Modeling Models for Complex Data


To extend traditional data models for new concepts, new tables must be created, with all the associated DDL, ETL and SQL code to create, load and access them. Generic models are much more flexible.

[Diagram: simple model entities mapped onto generic model entities]
Simple model entities: Company, Division, Department; Customer, Invoice, Billing Account, Payment, Service Instance; Product Group, Product, Service; Rate Group, Rate; User, Usage
Generic model entities: Party, Arrangement, Event, Condition, Product
Generic model features: inter-subject area associative tables; time-variant, perspective-based hierarchies

New requirements are added as DATA, not as structural changes
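
A minimal sketch of what "added as DATA" can mean in practice, assuming a generic `party` table plus an associative `party_relationship` table. The table names, columns and sample rows are illustrative assumptions, not the actual IBM industry model. A new hierarchy level ("Team") is introduced with two INSERT statements and no DDL change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic entity: every organizational unit is a row, not a table.
CREATE TABLE party (
    party_id   INTEGER PRIMARY KEY,
    party_type TEXT NOT NULL,
    name       TEXT NOT NULL
);
-- Associative table carrying a time-variant hierarchy.
CREATE TABLE party_relationship (
    parent_id  INTEGER NOT NULL REFERENCES party(party_id),
    child_id   INTEGER NOT NULL REFERENCES party(party_id),
    rel_type   TEXT NOT NULL,
    valid_from TEXT NOT NULL
);
""")

conn.executemany("INSERT INTO party VALUES (?, ?, ?)", [
    (1, "Company", "Example Telco"),
    (2, "Division", "Consumer"),
    (3, "Department", "Billing"),
])
conn.executemany("INSERT INTO party_relationship VALUES (?, ?, ?, ?)", [
    (1, 2, "owns", "2010-01-01"),
    (2, 3, "owns", "2010-01-01"),
])

# New requirement: a "Team" level below Department. Only rows are added.
conn.execute("INSERT INTO party VALUES (4, 'Team', 'Invoicing')")
conn.execute("INSERT INTO party_relationship VALUES (3, 4, 'owns', '2010-04-01')")


def descendants(connection, root_id):
    """Return (party_type, name) for every unit below root_id, top-down."""
    return connection.execute("""
        WITH RECURSIVE tree(party_id, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT r.child_id, t.depth + 1
            FROM party_relationship r
            JOIN tree t ON r.parent_id = t.party_id
        )
        SELECT p.party_type, p.name
        FROM tree t
        JOIN party p ON p.party_id = t.party_id
        WHERE t.depth > 0
        ORDER BY t.depth
    """, (root_id,)).fetchall()


hierarchy = descendants(conn, 1)
print(hierarchy)
```

In a traditional model the same requirement would need a new `team` table plus matching ETL and query changes; here the existing query picks up the new level unchanged.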


xDW Data Models: three interlinked models

xSDM
  Classification model for defining business meaning across all models, applications and mapping databases
xBST (Business Solution Templates)
  Logical Measure/Dimension Model for defining user information requirements
xDWM (Data Warehouse Model)
  Logical E-R Model for designing the central data warehouse

[Diagram: the xSDM is linked to the xBST and to the xDWM by mappings]

xDW Architecture
Logical Design
  Overall corporate data classification model with common language and terms
  Data Mart templates enable fast, accurate requirements gathering
  Data Mart DB design can be generated from the templates
  Mapping between BSTs and the DW Model enables rapid scoping
  The Data Warehouse model for a specific industry provides a full enterprise data warehouse blueprint
  Enterprise DW design can be generated over a series of manageable phases

Physical Design
[Diagram: physical architecture, left to right]
  Sources: Front Office & Apps, Accounting Systems, CIF, Market Data, Other Sources
  ETL/Messaging
  Enterprise Data Warehouse: Staging Area, System Of Record (classified sources), Summary, Analysis, Feedback
  Data Mart DB Structures: Essbase, ROLAP, OLAP Server *, Relational, Other
  Business Applications: Billings, Rel. Mgt, Usage, Profitability, Ops & Fin, Mgt Reporting
  Data Analysis & Reporting: Data Mining, Predictive Modeling
  Supporting services: Warehouse Mgmt & Admin; Metadata Mgmt & Metadata Repository

Using TDW for Data Warehouse Model Engineering


[Diagram: model engineering driven by the TDW models; the entity boxes show only placeholder names (Classification, Entity Name, Field-1 (PK), Field-2 (FK), ...)]
Business Definition Synchronization: Scoped TSDM, mapped to Scoped TBSTs and Scoped TDWM
Scoped TBSTs: Star Schema Engineering, producing MOLAP aggregations and ROLAP
Scoped TDWM: EDW ERD Engineering, producing the Data Warehouse
Legacy OSS/BSS sources (OE/OM, Billing, Campaign Mgmt, CRMS, Retail POS, Provisioning, Network Ops, A/P, A/R, G/L, Collections) feed the Data Warehouse through ETL
Data Mining: extract vectors from the Data Warehouse; scores and propensities are written back
Metadata Synchronization (e.g., Informatica, DataStage)

2006 IBM Corporation

Best Practices: Using Information Templates for Data Warehouse Consolidation


[Diagram: consolidating two data warehouses into one EDW; the entity boxes show only placeholder names (Classification, Entity Name, Field-1 (PK), Field-2 (FK), ...)]
Before:
  DW #1 (Marketing), fed by bespoke ETL from OSS/BSS sources: OE/OM, Billing, Campaign Mgmt., CRMS, Retail POS
  DW #2 (Finance), fed by bespoke ETL from OSS/BSS sources: General Ledger, Billing, A/P, A/R, Collections, Retail POS
Consolidation steps shown:
  Profiling of the existing warehouses
  Scoped TSDM and Scoped TDWM
  XML ER Model Engineering (Allfusion ERWin)
  Platform-Specific DDL
  Physical Database Engineering
  Change Data Capture and Incremental ETL
After:
  Consolidated EDW


44 2006 IBM Corporation

ETL Extract, Transfer and Load


2009 IBM Corporation

Best Practices: Using Information Template for Data Mart Consolidation


[Figure: consolidation of data marts using information templates. Recoverable labels:
Sources of DW #1 (Marketing), loaded by bespoke ETL: OSS/BSS, OE/OM, Billing, Campaign Mgmt., CRMS, Retail POS
Sources of DW #2 (Finance), loaded by bespoke ETL: OSS/BSS, General Ledger, Billing, A/P, A/R, Collections, Retail POS
Existing marts: Aggregations, MOLAP, ROLAP
Steps and artifacts: Profiling; Scoped TBSTs; TDW Standard Measures and Dimensions
Tools: DB2 OLAP Server, Business Objects, Cognos Impromptu, Microstrategy
Targets: DW #1 (Marketing) and DW #2 (Finance) merged into a Consolidated EDW; Aggregations, MOLAP and ROLAP merged into a Consolidated Data Mart]

Best Practices: Using Information Frameworks for Standardization across Operating Subsidiaries
Legend: Legacy; TDW Standardized; Shared; TDW Standardized & Shared

[Figure: the same six subject areas, each with its own "Concepts & KPIs" list, are shown for the GROUP, for Subsidiary A and for Subsidiary B. The lists are identical in the three panels and are given once below.]

Sales and Marketing
Concepts: Customer, Prospect, Product (Offered), Campaign, Business Partner, Channel, Time Period
KPIs: # Hits, # Unique visitors, # Prospects, Response %, Campaign Perf., Achievement %, Acquisition Cost, Gross Sales, Net Sales, Commission Amt.

OE/OM
Concepts: Customer/Prospect, Customer Hierarchy, Billing Account, Product (Ordered), Product Component, Rateplan, Location, Payment Method, Campaign, Time Period
KPIs: # Customers, # Prospects, # Cross-sells, # Up-sells, Response %, Campaign Perf., Achievement %, Acquisition Cost, Gross Sales, Net Sales, Commission Amt.

NOPs and Provisioning
Concepts: Customer, Billing Account, Product (Technical), Service Location, Service Instance, VAS, CPE Make/Model, Network Element, QoS, Time Period
KPIs: # Customers, # Opening Orders, # Orders Opened, # Orders Closed, # Closing Orders, Installation Expense, Network Expense, CPE Subsidies, CPE Revenue, Capacity Used, Capacity Available

Usage and Retention
Concepts: Customer, Billing Account, Product (Service), Service Provider, Origination, Termination, Direction, Completion Type, Time Period
KPIs: # Customers, # of Connections, # of End Users, Failure %, Actual Duration, Rated Duration, Rated Volume, Rated Amount, Charged Duration, Charged Volume, Charged Amount, Churn Rate, Rotational Churn, IDD defection %, Retention Expense, Win-back Expense

Billing and Finance
Concepts: Customer, Billing Account, G/L Account, Cost Center, Product (Billed), Billing Channel, Invoice Hdr./Det., Payment Method, Time Period
KPIs: # of Invoices, Billed Amount, Entitled Amount, Opening Balance, Adjusted Amount, Waived Amount, Closing Balance, Total Revenue, ARPU, Fixed Cost, Variable Cost, Gross Profit

Customer Care
Concepts: Customer, Billing Account, Product (Technical), Service Location, Service Instance, VAS, CPE Make/Model, Network Element, Warranty, Time Period
KPIs: Opening Tickets, Tickets Opened, Tickets Closed, Closing Tickets, # of Threads, # of Network Faults, Availability %, Avg. time to resolve, # of QoS violations, Total Expense

GROUP EDW (Enterprise KPIs)
No ODS; TDWM-based EDW; TBST-based ROLAP/MOLAP; some shared EDW views; some shared ROLAP/MOLAP

Subsidiary A
No ODS; legacy DW; legacy ROLAP/MOLAP; TBST ROLAP (some shared)

Subsidiary B
TDW-based EAI ODS; DW partly reengineered with TDWM; TBST-based ROLAP/MOLAP; TBST-based ROLAP reports (some shared)

Source systems feeding each subsidiary warehouse (DW-A, DW-B): OE/OM, Billing, Campaign Mgmt, CRMS, Retail POS, Provisioning, Network Ops, A/P, A/R, G/L, Collections

IBM Software Group

Seminar: The Data Warehouse Ecosystem

Technology Trends: The Era of DWAs (Data Warehouse Appliances)
Fabrizio Napolitano, IBM Data Warehouse Architect
Rome, 9 April 2010

What is a DWA?
One Purpose: the sole purpose is supporting data warehouse processing
One Package: tested, ordered, and delivered as a single system
One Install: installed and maintained as a single system
One Support: single point of service provided by a single vendor

Native Data Warehouse Appliance
The hardware and software are tightly integrated into a single data warehouse solution. The software and hardware are not individually licensed and cannot be separated. Examples of vendors here include DATAllegro, Netezza, and Teradata.

Software Data Warehouse Appliance
Commercial or open source relational DBMS software is designed and/or optimized for data warehouse processing. The software supports hardware solutions purchased from one or more third-party vendors. Examples of vendors here include Greenplum and Sybase (Sybase IQ).

Packaged Data Warehouse Appliance
Commercial software and hardware are tuned for data warehousing, packaged and supplied by a single vendor, and installed and maintained as a single system. Examples of vendors here include HP (NeoView), IBM (Smart Analytics System), and Sun/Greenplum (Data Warehouse Appliance).

Data Management Appliance
Offloads data-intensive operations from a host computer. The offloaded workload may involve operational, specialized analytics, or archival processing. Examples of vendors here include ParAccel and Dataupia.

Which workload types can each DWA type handle?

What are the main infrastructure architectures?

What are the technological trends?

Source: Next generation Data Warehouse Platforms, Philip Russom (TDWI Best Practice Report)

New and Growing Demands on the Data Warehouse

Scalability
Data explosion
Extreme performance
Mixed workloads
  Traditional complex query
  Short OLTP queries
  Real time load and updates
Advanced workload management
Integrated analytics

DWA- An Example:
IBM Smart Analytics Systems
The IBM Smart Analytics System is the complete analytics solution comprised of pre-tested, scalable and fully-integrated system components of Software, Server and Storage
Deeply Optimized by IBM Experts
Flexible Growth to Meet Changing Business Needs

Analytics Software Options
  Business Intelligence Capabilities (Cognos)
  Cubing Services (InfoSphere Warehouse - ISW)
  Text Analytics & Data Mining (ISW)
Powerful Data Warehouse
  Warehousing Platform (ISW)
  Advanced Workload Management (ISW)
  System Automation (Tivoli System Automation)
Hardware & Services
  Server Platform (IBM p6 or xSeries)
  Storage Capacity (IBM DS storage systems)
  Build, Deploy, Health Check & Premium Support Services

IBM Smart Analytics System


Out-of-the-box Solution

One Package: all in one (software, hardware and services)
One Install: pre-configured package installed on the data center floor
One Support: one phone number to fix your problem

[Figure: comparison of "Build from Scratch" with the "Pre-built Solution" (IBM Smart Analytics). Recoverable labels for the build-from-scratch steps: Pre-implementation, System sizing, Acquire Components, Installation and Configuration, Testing and Validation]

MPP systems: Predictable Scaling


Double the data, double the system resources
Each partition processes the same amount of data as before
Response times and throughput will remain constant

Double the system resources, same data
Each partition processes half the amount of data as before
Response times will be 2x faster, and throughput will double

Keep system resources constant, double the data
Each partition processes double the amount of data as before
Response times should double, and throughput will be cut in half
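
The three scaling cases above follow from one idealized relation: the work done by each partition is proportional to the data volume divided by the system resources. The sketch below is only an illustration of that arithmetic, assuming perfectly linear scaling and evenly distributed data; the function names and parameters are hypothetical and do not come from any IBM tool.

```python
def scaled_response_time(base_time, data_factor, resource_factor):
    """Idealized response time after scaling data and resources.

    Assumes perfectly linear MPP scaling: each partition's work is
    proportional to data volume divided by system resources.
    """
    return base_time * data_factor / resource_factor


def scaled_throughput(base_throughput, data_factor, resource_factor):
    """Idealized throughput under the same linear-scaling assumption."""
    return base_throughput * resource_factor / data_factor


if __name__ == "__main__":
    base_time = 10.0  # hypothetical baseline response time, in seconds
    for label, data, resources in [
        ("double data, double resources", 2, 2),
        ("same data, double resources", 1, 2),
        ("double data, same resources", 2, 1),
    ]:
        print(label, scaled_response_time(base_time, data, resources))
```

Real systems deviate from this model because of data skew, coordinator overhead and inter-partition traffic, so it is best read as the planning target rather than a guarantee.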


Parallel Query Processing Automatic Data Distribution


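select sum(x) from table_a, table_b where a = b

[Figure: parallel execution of the query above. Recoverable labels:
Coordinator (Coord): connect, get statistics from the Catalog, optimize (plan: Read A, Read B, Join, Sum), final sum()
Data distribution: table_a and table_b, each DISTRIBUTE BY HASH (trans_id), across Part1, Part2, Part3 ... PartN
Agents: one per partition, each running Join and Sum on its local slices of A and B (partial results sum=10, sum=11, sum=12, sum=13)
Result returned through the coordinator: 46]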
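
The flow in this slide (hash-distribute both tables, let each agent join and sum its local slice, then combine the partial sums at the coordinator) can be sketched in a few lines. This is a toy, single-process illustration and not DB2 code: it assumes both tables are distributed on the join column so that matching rows land in the same partition, and it assumes column x belongs to table_a. All function names are hypothetical.

```python
from collections import defaultdict


def hash_partition(rows, key, n_parts):
    """Distribute rows across n_parts partitions by hashing one column."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def local_join_sum(part_a, part_b):
    """Agent work: join the local slices on a = b, then sum x locally."""
    matches = defaultdict(int)
    for row_b in part_b:
        matches[row_b["b"]] += 1  # number of table_b rows per join key
    return sum(row_a["x"] * matches.get(row_a["a"], 0) for row_a in part_a)


def parallel_sum(table_a, table_b, n_parts=4):
    """Coordinator work: fan out to the agents, then add the partial sums."""
    parts_a = hash_partition(table_a, "a", n_parts)
    parts_b = hash_partition(table_b, "b", n_parts)
    partials = [local_join_sum(pa, pb) for pa, pb in zip(parts_a, parts_b)]
    return sum(partials), partials


if __name__ == "__main__":
    table_a = [{"a": k, "x": k} for k in range(1, 11)]
    table_b = [{"b": k} for k in range(1, 11)]
    total, partials = parallel_sum(table_a, table_b)
    print(partials, total)
```

Because the join key is also the distribution key, no rows have to move between partitions during the join; only one number per partition travels back to the coordinator.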

Predictable Scaling

[Figure: IBM Smart Analytics System architecture. Recoverable labels: users network; User Modules; SMP servers, each with DB2 partitions; private GigE network; I/O channels; storage server]

Traditional Large Scans Result in I/O Wait


DB2 Database Partitioning Feature = Divide I/O


[Figure labels: Database Partition 1, Database Partition 2, Database Partition 3]

Add Range Partitioning to Further Reduce I/O


[Figure labels: Database Partition 1, Database Partition 2, Database Partition 3; range partitions January, February, March]

Add MDC to Further Reduce I/O


[Figure labels: Database Partition 1, Database Partition 2, Database Partition 3; range partitions January, February, March]

Compression Further Reduces I/O by a Factor of 4


[Figure labels: Database Partition 1, Database Partition 2, Database Partition 3; range partitions January, February, March]
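
The last four slides stack techniques that each cut the I/O a scan has to perform: database partitioning divides the table across partitions, range partitioning lets a query skip the ranges it does not need, MDC narrows the scan further, and compression shrinks the remaining pages. A back-of-the-envelope sketch of how these factors multiply is given below. It assumes evenly distributed data; the function, its parameters and the sample numbers are hypothetical, and `mdc_fraction` is a placeholder because the slides give no figure for MDC.

```python
def pages_read_per_partition(total_pages, n_db_partitions,
                             ranges_total, ranges_queried,
                             compression_ratio=1.0, mdc_fraction=1.0):
    """Estimate pages each database partition must read for one scan.

    Assumes data is spread evenly over database partitions and ranges.
    mdc_fraction is the share of the remaining pages that survives MDC
    block elimination (1.0 means MDC is not used).
    """
    pages = total_pages / n_db_partitions          # divide I/O across partitions
    pages = pages * ranges_queried / ranges_total  # skip unneeded ranges
    pages = pages * mdc_fraction                   # skip unneeded MDC blocks
    return pages / compression_ratio               # fewer pages once compressed


if __name__ == "__main__":
    # Hypothetical table: 1,440,000 pages, 3 partitions, 12 monthly ranges,
    # a query on one month, and the 4x compression factor from the slide.
    print(pages_read_per_partition(1_440_000, 3, 12, 1, compression_ratio=4.0))
```

With these sample numbers, each partition reads 10,000 pages instead of the 1,440,000 pages a single unpartitioned, uncompressed scan would touch.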

InfoSphere Warehouse Data Compression


Compression looks for repeating patterns across the entire table
When a pattern is found, the string is replaced by a 12-bit symbol
Symbols are stored in a dictionary for fast lookup
Unique to InfoSphere

Uncompressed rows:

Name         Dept  Salary  City    Province  Postal_Code
Zikopoulos   510   56105   Whitby  ONT       L4N5R4
Katsopoulos  500   82475   Whitby  ONT       L4N5R4

Dictionary:

Symbol  Pattern
01      opoulos
02      WhitbyONTL4N5R4

Compressed rows:

Zik (01)   510  56105  (02)
Kats (01)  500  82475  (02)
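
The dictionary substitution shown in the example can be mimicked with a small toy program. This is a minimal sketch of the idea only, not the algorithm DB2 or InfoSphere Warehouse actually uses: the dictionary is written by hand instead of being built from the table, rows are serialized as comma-separated strings (an assumption made for readability), and symbols are plain integers rather than 12-bit codes. A 12-bit symbol can address at most 2^12 = 4096 dictionary entries.

```python
# Hand-written dictionary mirroring the slide: symbol -> repeated pattern.
DICTIONARY = {1: "opoulos", 2: "Whitby,ONT,L4N5R4"}


def compress(row, dictionary):
    """Replace dictionary patterns in row with their integer symbols.

    Returns a list of tokens: str for literal text, int for a symbol.
    """
    # Try longer patterns first so the longest match wins.
    patterns = sorted(dictionary.items(), key=lambda kv: -len(kv[1]))
    tokens, literal, i = [], [], 0
    while i < len(row):
        for symbol, pattern in patterns:
            if row.startswith(pattern, i):
                if literal:
                    tokens.append("".join(literal))
                    literal = []
                tokens.append(symbol)
                i += len(pattern)
                break
        else:
            literal.append(row[i])
            i += 1
    if literal:
        tokens.append("".join(literal))
    return tokens


def decompress(tokens, dictionary):
    """Expand symbols back into their patterns."""
    return "".join(dictionary[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    for row in ("Zikopoulos,510,56105,Whitby,ONT,L4N5R4",
                "Katsopoulos,500,82475,Whitby,ONT,L4N5R4"):
        tokens = compress(row, DICTIONARY)
        print(tokens, decompress(tokens, DICTIONARY) == row)
```

Note that pattern 02 spans three adjacent columns, which is why a dictionary built over whole rows can compress better than one built per column value.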

Improving the Best Compression in the Industry


Multiple algorithms for automatic index compression
  Unique in the industry

Automatic compression for temporary tables
  Unique in the industry

Intelligent compression of large objects and XML

[Figure labels: Table, Order By, Temp]

Storage Savings from Compression


"With DB2 9, we're seeing compression rates up to 83% on the Data Warehouse. The projected cost savings are more than $2 million initially with ongoing savings of $500,000 a year." (Michael Henson)

[Figure: table sizes before and after compression. Recoverable labels: SALES Table, PRODUCT Table; 81% smaller, 79% smaller, 81% smaller, 78% smaller]

Performance Speedup from Compression

40% Faster


Workload Manager

Identification and control of applications
  Enabling Enterprise Data Warehouse
Direct control of the execution environment
  Tight integration with OS WLM
Detection and control of rogue queries
  Prevent bad queries from executing
Query concurrency
  Optimize query throughput
Advanced monitoring
  Real time monitoring of query execution

Workload Manager Example


[Figure: InfoSphere Warehouse workload classification. Recoverable labels:
User Requests
  Marketing workload -> Marketingapps
  Managers workload -> Marketingmgrs
  Default Workload -> Default User Class
System Requests -> Default System Class]
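
The routing in this example (identify the incoming request, map it to a workload, run it in the matching class) can be illustrated with a small lookup. This sketch is not the DB2 workload manager interface: the request fields `origin` and `group` are hypothetical attributes chosen for the illustration, and only the class names come from the slide.

```python
# Workload -> service class mapping taken from the slide's labels.
SERVICE_CLASS_BY_GROUP = {
    "Marketing": "Marketingapps",
    "Managers": "Marketingmgrs",
}


def assign_service_class(request):
    """Pick the class a request runs in.

    request is a dict with hypothetical keys:
      origin: "user" or "system"
      group:  the submitting user's group (user requests only)
    """
    if request.get("origin") == "system":
        return "Default System Class"
    # Unrecognized user requests fall into the default workload.
    return SERVICE_CLASS_BY_GROUP.get(request.get("group"), "Default User Class")


if __name__ == "__main__":
    for req in ({"origin": "user", "group": "Marketing"},
                {"origin": "user", "group": "Managers"},
                {"origin": "user", "group": "Finance"},
                {"origin": "system"}):
        print(req, "->", assign_service_class(req))
```

Once requests are separated this way, limits such as maximum concurrency or priority can be attached per class rather than per query.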

Tiered Approach to WLM


Case Study for DILLARD'S INC


Dillard's, Inc. (Dillard's) is a major department store chain in the United States operating about 330 stores in 30 states, covering the Sunbelt and the central US.

The Challenge
Focus on four areas of its business:
Revenue growth
Cost saving
Customer relationship
Operational efficiency

The Benefits
Provide business insights to the right people at the right time
Customer segmentation
Market basket analysis
Improve customer loyalty
Improve profitability

Client quotes
"Now I can take markdowns by market: it's a 1-hour process instead of two days."
"I see winners and losers more quickly: in 20 minutes I have the facts!"
"Saves me at least 8 hours a week!"
"It's a competitive imperative; without it, we'd be behind the eight ball!"

The Solution
Dillard's extensively uses components of IBM's Smart Analytics System (embedded Mining products). Using mining analytics, Dillard's is able to obtain valuable insights into inventory management, vendor relationship management and customer spending patterns, which has resulted in increased efficiencies for the company.

Examples of DILLARD'S Business Requirements


How to improve promotion effectiveness based on women's shoes

How to characterize distinct shopping behavioral segments for customers who have previously purchased women's shoes
  What do my women's shoes customers look like?
  Which of these customers should I target in a promotion?
  How can I improve customer loyalty and customer advocacy?

For each of these customer segments, how to discover affinities among women's shoes and other items in other departments
  Which products should I use for a promotion?
  Which products should I replenish in anticipation of a promotion?

How to identify the items that a women's shoes customer is most likely to purchase next

Data Mining
Intelligent Miner for Data
Intelligent Miner Modeling
Intelligent Miner Scoring
Intelligent Miner Visualization

Data Mining Solution Process


[Figure: data mining solution process. Recoverable labels:
Flow: Business Requirements -> Source Data -> Selected Data -> Transformed Data -> Discovered Information -> Assimilated Knowledge -> Applied Knowledge
Phases: Data Preparation Process, Data Mining Process, Deployment
Steps: Select, Explore, Transform, Aggregate, Calculate, Mine, Model, Validate, Analyze, Understand, Score, Deploy, Measure
Feedback loops: Data Enhancement, Model Refinement]

Data Mining Approach


Starting point: Customer Purchase History Table

1. Select all customers who purchased women's shoes in the past 12 months (Shoe customers table)
2. Create shoe customers attributes (Shoe Customer Purchasing Behavior Table), used for Segmentation
3. Select shoe customers transactions (Shoe Customer Transactions Table), used for MBA

Volumes shown on the slide: 3 million customers; 80+ million transactions; on average ~3 transactions per customer per month
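
The three preparation steps above can be sketched on a small in-memory purchase history. This is an illustrative toy, not the Intelligent Miner workflow: the record layout (`customer_id`, `department`, `date`, `amount`), the department code and the two derived attributes are all assumptions made for the example.

```python
from datetime import date, timedelta


def select_shoe_customers(purchases, today, department="WOMENS_SHOES",
                          window_days=365):
    """Step 1: customers with a women's shoes purchase in the past 12 months."""
    cutoff = today - timedelta(days=window_days)
    return {p["customer_id"] for p in purchases
            if p["department"] == department and p["date"] >= cutoff}


def customer_attributes(purchases, customers):
    """Step 2: per-customer behavior attributes (input to segmentation)."""
    attrs = {}
    for p in purchases:
        if p["customer_id"] in customers:
            a = attrs.setdefault(p["customer_id"],
                                 {"n_transactions": 0, "total_spend": 0.0})
            a["n_transactions"] += 1
            a["total_spend"] += p["amount"]
    return attrs


def customer_transactions(purchases, customers):
    """Step 3: all transactions of the selected customers (input to MBA)."""
    return [p for p in purchases if p["customer_id"] in customers]


if __name__ == "__main__":
    history = [
        {"customer_id": 1, "department": "WOMENS_SHOES",
         "date": date(2010, 3, 1), "amount": 80.0},
        {"customer_id": 1, "department": "HANDBAGS",
         "date": date(2010, 3, 1), "amount": 120.0},
        {"customer_id": 2, "department": "MENS",
         "date": date(2010, 2, 10), "amount": 60.0},
    ]
    shoe_customers = select_shoe_customers(history, today=date(2010, 4, 9))
    print(customer_attributes(history, shoe_customers))
    print(len(customer_transactions(history, shoe_customers)))
```

Step 3 deliberately keeps purchases from every department, since the affinities of interest are between women's shoes and items sold elsewhere in the store.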

Customer Segmentation and MBA


Customer segmentation
Who are our customers?
Which of these customers should I target in a promotion?
Who will respond to discounting?
Who was not classified as VIP but shopped as if they were?

Market basket analysis
Which products should I use for a promotion?
What is a customer most likely to purchase next?
How should related items be placed in close proximity?
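
The market basket questions above are usually answered with association measures computed over baskets: support (how often two items occur together), confidence (how often the second item appears given the first) and lift (confidence relative to the second item's overall frequency). The sketch below computes these for item pairs in plain Python. It is a generic illustration with made-up baskets, not the Intelligent Miner algorithm, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Compute support, confidence and lift for every ordered item pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append({"if": lhs, "then": rhs, "support": support,
                          "confidence": confidence, "lift": lift})
    # Highest lift first: the strongest affinities beyond chance.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


if __name__ == "__main__":
    baskets = [["shoes", "handbag"], ["shoes", "handbag"],
               ["shoes"], ["socks"]]
    for rule in pair_rules(baskets):
        print(rule)
```

A lift above 1 means the two items are bought together more often than their individual popularity would predict, which is the signal used to pick promotion bundles and shelf placement.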

Business Insights of Data Mining


Customer segmentation
Dillard's discovered a segment of shoppers who were not classified as VIP but shopped as if they were. This newly discovered segment made large purchases and responded to discounts more than other VIP segments; it became a targeted segment that increased sales and profit for the company.

MBA (market basket analysis)

Traditional perception
Women's shoes draw a large percentage of our customers
These customers come to Dillard's only for women's shoes
These are our most profitable customers

Mining result
Certain segments of customers buy shoes as a secondary purchase
These customers cross-shop the store and are our most profitable customers
Those who purchase shoes as a primary or only purchase are not our most profitable customers

Bibliography

DataWarehouse Life Cycle, Ralph Kimball et al., John Wiley & Sons, 2008 (636 pages), ISBN 9780470149775
DataWarehouse Toolkit, Ralph Kimball and Margy Ross, John Wiley & Sons, 2002 (436 pages), ISBN 9780471200246
The Anti-Architect, Ralph Kimball, article in Intelligent Enterprise, January 14, 2002, http://intelligententerprise.informationweek.com/020114/502warehouse1_2.jhtml
Top Ten Data Warehouse Best Practices, Nancy Kopp, IBM, Session 2162, IBM IOD 2006 Conference
10 Mistakes to Avoid in a Business Intelligence Delivery, Lalitha Chikkatur, Information Management Special Reports, September 16, 2008, http://www.information-management.com/specialreports/2008_97/100019351.html?pg=1
What Not to Do, Ralph Kimball, article in Intelligent Enterprise, http://intelligententerprise.informationweek.com/011024/416warehouse1_1.jhtml
Brave New Requirements for Data Warehousing, Ralph Kimball, article in Intelligent Enterprise, http://intelligententerprise.informationweek.com/db_area/archives/1998/9810/warehouse.jhtml
Next generation Data Warehouse Platforms, Philip Russom (TDWI Best Practice Report)
Data Warehouse Appliances: Evolution or Revolution?, Richard Hackathorn and Colin White (BeyeResearch), http://www.beyeresearch.com/study/4639
Are Data Warehouse Appliances in Your Future? Plan On It! (G00174689), Gartner Group
Appliance Power: Crunching Data Warehousing Workloads Faster And Cheaper Than Ever, James Kobielus, Forrester
Data Warehouse Architecture Best Practice and Guiding Principles (G00171980), Gartner Group
Fundamentals of Data Warehousing for the CIO (G00167390), Gartner Group
Changing the Dynamics of the Business with Analytics, Lou Agosta, PhD, Independent IT Industry Analyst
Operational BI: Expanding BI Through New, Innovative Analytics Going Beyond the Traditional Data Warehouse, Claudia Imhoff, Ph.D
Powering Next Generation BI Systems, Madan Sheina, OVUM
Mixed Articles from Kimball Group Archive, http://www.ralphkimball.com/html/articles.html

Additional Bibliography

Building and Maintaining a Data Warehouse, Fon Silvers, Auerbach Publications, 2008 (330 pages), ISBN 9781420064629
Mastering Data Warehouse Design: Relational and Dimensional Techniques, Claudia Imhoff, Nicholas Galemmo and Jonathan G. Geiger, John Wiley & Sons, 2003 (438 pages), ISBN 9780471324218
A Manager's Guide to Data Warehousing, Laura L. Reeves, John Wiley & Sons, 2009 (480 pages), ISBN 9780470176382
Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals, Paulraj Ponniah, John Wiley & Sons, 2001 (544 pages), ISBN 9780471412540
Data Warehouse Performance, W.H. Inmon, Ken Rudin, Christopher K. Buss and Ryan Sousa, John Wiley & Sons, 1999 (444 pages), ISBN 9780471298083
Building the Data Warehouse, Fourth Edition, W. H. Inmon, John Wiley & Sons, 2005 (574 pages), ISBN 9780764599446
