You are on page 1of 7

Allan Peda

Enterprise Application Integration


Y52.3250
Prof. Aldrich Wright

Allan Peda
Week 3, Kimball vs Inmon Data Warehouse Designs

Due Date: February 17, 2007, 10PM

Page 1 of 7
Allan Peda

Assignment
Describe a Business Problem or set of business problems that the your company
is trying to solve through a data warehouse or data mart. Compare and contrast
the Kimball and Inmon approaches to data warehouse. Pick the one that best
solves your business problem and why.

Problem Statement
The chosen business problem for this data warehouse is the determination of
enrollment trends and subject interest within a university. This information derived
from these data is of tremendous value as it will be used to develop long term
growth plans for the university including campus expansion, departmental
funding levels and potential areas for enhanced research and development
funding. The initial focal point of this from a business unit perspective will be
student registration data and library collection access records. This will define
initial data feeds for ETL processing.

Problem Resolution

The chosen design is the relational (Inmon) database design as it is considered


to be quicker to develop, supports gradual expansion over time, is robust, flexible
and scalable. These attributes are considered more important than speed of
implementation or raw performance. Kimballs multidimensional system (also
known as BUS for reasons discussed below) is considered more rigid because
well defined conforming dimension and fact tables must be selected at the outset
of system development. The fact that the BUS architecture is modeled on the
analytical requirements of the end users creates difficulties when these
requirements change.

Page 2 of 7
Allan Peda

Relational Multidimensional
High Performance  
Considered Quicker To Implement  
Gradual Development Supported  
Robust  
Flexible  
Scalable Performance  

A more extensive discussion of each system is developed below.

Common Attributes among all data warehouse designs

No matter which design paradigm is used to approach the development of a data


warehouse design, the basic definition of a data warehouse remains the same. It
is a subject oriented, integrated, nonvolatile and time variant collection of data in
support of management's decisions. As such it can be viewed as one type of
read intensive online analytical processing (OLAP) system which must integrate
a large amount of data derived from multiple business units. These data must be
consistent and summarized at the appropriate level of granularity, consequently
an Extract, Transform and Load (ETL) step is necessary in drawing data from the
operational (OLTP) systems.1 Both designs emphasize the importance of the ETL
step in developing cleansed data which is loaded from a common Integrated
Data Store. There must be no direct dependence on legacy or operational data.

The Multidimensional (Kimball) Approach

As noted earlier, the Kimball design is developed along the requirements of the
end user. As such the data must be denormalized and refactored using a well
defined set of end user requirements. The underlying logical architecture is not

1 This step is sometimes altered slightly with data transformation done primarily within the
database. This is referred to as Extract, Load and Transform (ELT).
http://www.b-eye-network.com/blogs/linstedt/archives/2005/05/elt_and_etl_can.php

Page 3 of 7
Allan Peda

relational, instead it is multidimensional (MOLAP, often referred to as a


multidimensional hypercube). It should be noted that star schemas can be
developed using tables within an RDBMS. In such cases it is possible to reduce
data redundancy and lower storage requirements by reintroducing some level of
normalization to the data. Such controlled reintroduction of normalization is
known as snowflaking of a star schema. Physical MOLAP database systems (as
opposed to the logical schema) are typically proprietary commercial off the shelf
(COTS) products, and as such there is little publicly information regarding the
actual implementation of such systems.

Kimball data warehouses are comprised of multiple subject oriented data marts
which each center on a few core fact tables. These tables contain data which are
easily aggregated. Surrounding these fact tables are multiple dimension tables,
which contain the associated characteristics of these facts (time being an almost
universal attribute for example). These dimension tables are considered to be the
entry points used to access and aggregate the associated facts. They are used
for filtering, grouping and labeling.

In order for queries to determine what correlations between facts and attributes, it
is desirable for dimension and fact tables to be shared among several subject
areas. The careful design and selection of this set of conforming fact and
dimension tables is referred to by Kimball as the BUS architecture.

Page 4 of 7
Allan Peda

Illustration 1: Data flow in the Kimball Data Warehouse Architecture (from The
Data Warehouse Toolkit, Kimball 2nd edition,2003)

The Relational (Inmon) Approach


In contrast with the architecture developed by Ralph Kimball, Bill Inmon's design
is based on a more flexible relational design, which is shaped by the enterprise
data model. The relational design is the more abstract and more flexible one,
however this flexibility is achieved at a cost, and it is generally accepted that
relational model is not optimal in terms of reporting performance requirements.
Data which are spread among several parent-child relationships will require
multiple lookups in order to fetch each record. In order to obtain acceptable
performance levels, the relational data warehouse is often systematically
denormalized in places. The following techniques are recommended by Inmon:

1. Tables may be physically merged (possibly by creating materialized views


during load step).
2. Data may be refactored so they are no longer atomic; this may be
accomplished via embedded arrays.

Page 5 of 7
Allan Peda

3. Data which has a low probability of access may be placed in a separate


tables.
4. Redundant data may be deliberately introduced to reduce the need to
lookup other tables.
5. Derived data may be placed within the database. This is a special
instance of data redundancy. This may be accomplished via regularly
scheduled batch processes or by use of a creative index during the data
loading step.
6. Most noteworthy perhaps is the fact that Inmon states that it is “patently
incorrect” to enforce referential integrity within a data warehouse. As such
any parent-child integrity within a data warehouse is referred to by Inmon
as a relationship “artifact”.
Inmon's design also relies on star (and snowflake) joins, however these design
techniques are restricted to the summarized data presented at the data
warehouse level. In Ralph Inmon's data warehouse designs, data marts are not
linked together via a data bus of conforming fact and dimension tables, instead
the RDBMS system upstream of the data marts is considered the warehouse
proper, and it is there where relations are derived allowing queries across subject
areas (contrast his to the use of conforming dimensions and facts within Kimball's
architectures). As noted earlier Kimball's logical architectures naturally map
directly to a physical multidimensional online analytical processing (OLAP)
database system known as MOLAP, while Bill Inmon's designs are inherently
relational OLAP (ROLAP). Combinations, or hybrids of both systems are known
as HOLAP.

Page 6 of 7
Allan Peda

Illustration 2: Data flow for the Inmon Relational Data Warehouse

References

1. Inmon, W.H., Building The Data Warehouse, 4th edition. Indianapolis,


Indiana: Wiley Publications, 2005.
2. Kimball, R and Ross, M. The Data Warehouse Toolkit, 2nd edition.
Indianapolis, Indiana: Wiley Publications, 2003.
3. Website for The Kimball Group
http://www.kimballgroup.com/
4. Website for Bill Inmons Corporate Information Factory (CIF)
http://www.inmoncif.com/home/

Page 7 of 7

You might also like