Professional Documents
Culture Documents
Dimensional modeling makes raw data easier for BI end users to understand
Dimensional modeling is made up of both logical and physical modeling. Measures are the core
of the dimensional model and are data elements that can be summed, averaged, or
mathematically manipulated. Fact tables are at the center of the star schema (which is a
physically implemented dimensional model), and data marts are made up of multiple fact tables.
You can use dimensional modeling to create an enterprise data warehouse that makes it easier for
business intelligence (BI) end users to access and understand data.
to enforce referential integrity; after all, the data has already been through the extraction,
transformation, and loading (ETL) process, having been scrubbed and validated before being
loaded into the data warehouse. Instead, the relationship's function is to associate keys in the fact
table with the expanded definitions, which are found in the dimension tables.
In a data warehouse environment, you rarely retrieve just one record. Typically, hundreds,
thousands, or even millions of records are retrieved in a single query, and the most logical thing
to do with record sets that large is to perform a mathematical process on their data. That's why
the section of a fact table that contains the numeric and additive columns is arguably the most
important section.
You can start your dimensional design by reviewing the ER model of the transactional data
source (the operational database). You can usually identify potential fact tables by locating the
associative tables that represent the many-to-many (M:N) relationship in an ER model. An ER
model will break down into multiple dimensional diagrams: The number of diagrams is
determined by the number of functions in the organization and the organization's BI reporting
needs.
To create a dimensional model from an ER model, first separate the ER model into its discrete
business processes. Each business process can be expressed as a data marta modular, highly
focused, richly detailed, incrementally designed component of the enterprise data warehouse.
Although there are many ways to approach and implement an enterprise data warehouse, the data
mart approach lets you tackle the data warehouse project business-process-by-business-process
and produce deliverables for your user community.
If this is your first excursion into dimensional modeling, start with a single-source data mart;
don't try to tackle multiple-source data marts until you've acquired the skill set necessary to
design a complex data warehouse. Examples of single-source data marts include retail sales,
purchase orders, shipments, and payments. An example of a multiple-source data mart is
customer profitability, which combines revenue and costs that often come from separate
transactional sources (e.g., the sales database and the inventory database, respectively).
For each preliminary data mart, identify the M:N relationships from the transactional model that
are comprised of numeric and additive non-key data. These relationships will most likely be the
associative tables in the ER model. Designate these as fact tables. It's not unusual for a large ER
model to produce a dimensional model that has from 10 to 25or even morefact tables.
Note that as of right now, I haven't been able to confirm that the SQL Server 2005 Analysis
Services (SSAS) optimizer uses heuristics rather than a cost-based analysis. However, I do know
that SSAS uses thin-client architecture to minimize the load on the client computer, ensuring that
SSAS will have scalable performance. The SSAS calculation engine is entirely server-based, so
all queries are resolved on the server, optimizing the use of the corporate network. Each query,
regardless of its complexity, requires just one round trip between the client and the server. For
more information about dimensional modeling and SSAS, see "Dimensional Modeling Basics,"
April 2006.
The dimensional model is extensible and expandable; it can accommodate changes in user
behavior, new data elements, and new ways of analyzing the data. You can easily change a fact
table in place by executing the ALTER TABLE commandthe table doesn't have to be dropped
and recreated and the data doesn't have to be reloaded. You can add new (and unanticipated)
columns to the fact table as long as they're consistent with the fact table's existing grain (as I
explain later).
A single fact table does not a data mart makeat least, not usually. Most data marts consist of
multiple coordinated fact tables that have similar structures because they're all derived from one
business supply chain or value chain. An example of a manufacturing supply chain would be
ingredient purchasing, ingredient inventory, bill of materials, manufacturing process control and
costs, packaging, and finished goods inventory. Many companies have software systems that
manage the flow of control through the supply chain. Usually, there's a data source for each step
of the supply chain. Each data source can translate into a set of data marts and has at least one
fact table. After the manufacturing step, the flow of control is called a value chain. An example
of a retail value chain is ship from manufacturing, reseller inventory, reseller shipments, retail
inventory, and retail sales.
The purpose of the data warehouse is to provide BI end users with a single source of information
regarding supply and value chains so they can follow the data as it flows through the business
cycle from beginning to end.
When doing dimensional designs, it helps to understand what kinds of information you can
request from the data. For example, from fine-grained fact tables such as Reseller_Sales, you can
analyze behavior and frequencies and perform behavior counts, such as the number of times a
customer places an order on the date of a storewide inventory sale. You can do time-of-day
analysis (e.g., determine whether online sales increase during the lunch hour) and queue analysis
(e.g., determine highway capacity and traffic flow, packet traffic on IP networks, and customer
movement within an online store). You might be able to do some sequential behavior analysis
(e.g., does action A always result in action B?), which could in turn warn you of possible fraud or
customer cancellation intent (e.g., this reseller is planning to drop out of your company's partner
program). Finally, every retail data mart should lend itself to basket analysis (e.g., do people
really buy more beer if it's placed near the baby diapers?) and missing-basket analysis (e.g., what
didn't work the way we thought it would and why?). These are all valuable measures that can be
derived only with the assistance of a fine-grained fact table containing records at the individual
transaction level.