
Corporate Technology Solutions

Dimensional Modeling
An Overview of Dimensional Modeling by Haitham Salawdeh

Dimensional Modeling Table of Contents:


INTRODUCTION
BASIC CONCEPTS AND TERMINOLOGY
TYPES OF FACT TABLES
DIMENSION TABLES
SLOWLY CHANGING DIMENSIONS
DO NOTHING
ROW CENTRIC VERSIONING
COLUMN CENTRIC VERSIONING
HYBRID APPROACH
OTHER CONSIDERATIONS
STEPS TO A DM
SOME DOS AND DON'TS
REFERENCES

Haitham Salawdeh

Page 2

June, 2009


Introduction:
Multi-dimensional modeling (DM) stands in contrast to the normalized model (NM) in many ways. First and foremost, a structure that is modeled dimensionally is easier to navigate and understand. This is a plus, especially when the user of the model is not familiar with database technologies and tools. Next, the audience of a normalized model is generally a developer or someone capable of navigating normalized structures effectively. Lastly, the emphasis of a normalized model is on reducing data redundancy and making sure a datum is updated once and in one place. When ease of user navigation and performance are the primary concerns, data normalization becomes an obstacle.

DM is generally used in the context of data warehousing. In this context the data warehouse is loaded once and accessed many times. NM, on the other hand, is well suited for systems where updates, inserts and deletes happen frequently.

This overview is meant to introduce the concepts and language of multi-dimensional modeling. I will not spend a lot of time developing the motivation for DM or warehousing.

Basic Concepts and Terminology:


A dimensional model is also called a star schema. A star is a model that has a center table, called the fact table, surrounded by dimensions. Figure 1 shows an example of a star where the fact is the Transaction table and the dimensions are Account and Transaction Date.

The non-key values of a fact table are called measures. Measures are numerical values and they can be additive or semi-additive. An example of an additive measure is the transaction amount in the Transaction fact table depicted in Figure 1. The amount can be added over a period to generate a meaningful total transaction amount. A semi-additive measure, on the other hand, would be the market value amount in a holding fact table of a fund company. Adding the market value across a month is not meaningful. However, it is possible to compute an average market value instead. In Figure 2 the BalanceAmount field is semi-additive.
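The difference between additive and semi-additive measures can be sketched with Python's built-in sqlite3 module. The table names, column names and sample figures below are assumptions made for illustration; they are not taken from Figures 1 and 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_dim (account_key INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE transaction_fact (account_key INTEGER, date_key TEXT, amount REAL);
    CREATE TABLE balance_fact (account_key INTEGER, date_key TEXT, balance_amount REAL);

    INSERT INTO account_dim VALUES (1, 'Checking');
    INSERT INTO transaction_fact VALUES
        (1, '2009-06-01', 100.0), (1, '2009-06-02', -40.0), (1, '2009-06-03', 25.0);
    INSERT INTO balance_fact VALUES
        (1, '2009-06-01', 100.0), (1, '2009-06-02', 60.0), (1, '2009-06-03', 85.0);
""")

# Additive measure: summing transaction amounts over the period is meaningful.
total_amount = conn.execute(
    "SELECT SUM(amount) FROM transaction_fact WHERE account_key = 1"
).fetchone()[0]

# Semi-additive measure: summing daily balances over time is not meaningful,
# so the balances are averaged instead.
average_balance = conn.execute(
    "SELECT AVG(balance_amount) FROM balance_fact WHERE account_key = 1"
).fetchone()[0]

print(total_amount, average_balance)
```

Summing the three daily balances would report 245.0, a number that corresponds to nothing the bank ever held, which is why the average is used for the time dimension.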


Figure 1: A transaction fact table for a bank.

There are times when a star has some of its portions normalized. The design is then said to be snowflaked. An example of a snowflake design is a security dimension table that has a relationship to an industry classification table. To undo the snowflake, the relationships generating it have to be flattened. In this example, the industry assignment would be made part of the security table. Later we will learn about a snowflake that cannot be avoided. Figure 2 illustrates such a situation, where the relationship between accounts and customers cannot be flattened and needs to be normalized.


Figure 2: A simple balance fact of a bank and its dimensions.

It is important to note that when modeling a business process one should reuse what has been modeled previously. Other modeling sessions might have defined dimensions and facts that can be reused. This notion is known as conforming dimensions. When developing a warehouse iteratively this idea becomes paramount. The use of a dimension is conforming if we use an already defined dimension or a subset of it. Conforming dimensions coupled with conforming facts are the foundation of the warehouse bus architecture advocated by Ralph Kimball.

Types of Fact Tables:


In addition to carrying additive or semi-additive measures, fact tables come in four flavors: transactional, periodic, accumulative or factless. First, a fact table can be a transactional snapshot. A transactional snapshot fact table represents a point in time in the life of a business event. An example of a transactional snapshot would be a table capturing all order details of a retail website. Figure 1 shows another example of a transactional fact table.


While a transactional snapshot represents a point in time, a periodic snapshot represents a pre-defined interval or period. A daily balance fact table of a bank, as depicted in Figure 2, is an example of a periodic snapshot.

Furthermore, an accumulative snapshot represents business activities over a time period. An accumulative fact could represent the fulfillment process of a mutual fund company. A row in that table will capture a customer's first contact date, then the literature send date and finally the account open date. As a consequence, this is one of the few times when a fact table is updated in a data warehouse. Otherwise, a fact table row is not updated after it is loaded.

Finally, a fact table does not necessarily have to have measures. It could simply be a bridge between dimensions. In that case the fact table is said to be factless.
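The update pattern of the accumulative snapshot described above can be sketched as follows. The table name `fulfillment_fact`, its columns and the dates are hypothetical and chosen only to mirror the mutual fund example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fulfillment_fact (
        fulfillment_key INTEGER PRIMARY KEY,
        first_contact_date TEXT,
        literature_sent_date TEXT,
        account_open_date TEXT
    )
""")

# The row is inserted when the process starts; later milestones are still unknown.
conn.execute("INSERT INTO fulfillment_fact VALUES (1, '2009-06-01', NULL, NULL)")

# Each later milestone updates the same row instead of appending a new one.
conn.execute(
    "UPDATE fulfillment_fact SET literature_sent_date = '2009-06-05' "
    "WHERE fulfillment_key = 1"
)
conn.execute(
    "UPDATE fulfillment_fact SET account_open_date = '2009-06-20' "
    "WHERE fulfillment_key = 1"
)

row = conn.execute("SELECT * FROM fulfillment_fact").fetchone()
print(row)
```

One row per fulfillment process keeps all milestones side by side, so the elapsed time between any two of them can be computed without joining the table to itself.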

Dimension Tables:
The dimensions define a structure around the facts. It is imperative that all dimensions be denormalized to ease navigation of the model.

In some cases dimensions have many-to-many relationships that cannot be flattened. It is also possible that the many-to-many relationship is between a fact and a dimension. It is then appropriate to implement the relationship with a multi-valued dimension. A multi-valued dimension is simply a bridge table between the entities involved in the many-to-many relationship. Figure 2 shows a multi-valued dimension table depicting account ownership. An account can have multiple relationships to customers, including primary, secondary and custodian. In addition, the number of secondary customers for an account might not be bounded. In such situations it is best to normalize the dimensional model, as we have done here.

Furthermore, when attempting to conform dimensions we are sometimes faced with using the same dimension under different names. An example is a transaction fact table that refers to both a settlement date and a transaction date. The two dates should refer to a single conforming date dimension. One way to do that is to introduce views on top of the date dimension, one for the settlement date and one for the transaction date. In this case the date dimension is said to be role-playing. For example, in Figure 1 the TransactionDate dimension can be a view over the Date dimension depicted in Figure 2.

In addition, it is possible to be left with some attributes that do not fit in any of the extracted dimensions and don't group well together. It is not recommended to keep these attributes in the fact table. Instead, they can be pulled out into a junk dimension. Figure 3 shows how two indicators were grouped into a junk dimension. The TransactionIndicators dimension will have four rows, one for each combination of the two indicators' possible values. As we add more indicators and multi-valued columns this dimension will grow. At some point we might need to split a junk dimension if the columns it collects are highly unrelated.
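A junk dimension of this kind can be pre-populated with every combination of its indicator values. The sketch below assumes two yes/no indicators with the hypothetical names `is_online` and `is_reversal`; the actual indicators in Figure 3 may differ.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE transaction_indicators_dim (
        indicator_key INTEGER PRIMARY KEY,  -- surrogate key referenced by the fact
        is_online TEXT,
        is_reversal TEXT
    )
""")

# One row for every combination of the two indicators' possible values.
combinations = list(itertools.product(["Y", "N"], repeat=2))
conn.executemany(
    "INSERT INTO transaction_indicators_dim (is_online, is_reversal) VALUES (?, ?)",
    combinations,
)

row_count = conn.execute(
    "SELECT COUNT(*) FROM transaction_indicators_dim"
).fetchone()[0]
print(row_count)
```

The row count is the product of the number of values each column can take, which is why the dimension grows quickly as indicators are added.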

Figure 3: Indicators grouped as part of a junk dimension.
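Returning to the role-playing date dimension discussed in this section, the views can be sketched as follows. All table, view and column names here are assumptions for illustration and do not come from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
    INSERT INTO date_dim VALUES (1, '2009-06-01'), (2, '2009-06-03');

    -- Each view exposes the same conformed date dimension under a role-specific name.
    CREATE VIEW transaction_date_dim AS
        SELECT date_key AS transaction_date_key, calendar_date AS transaction_date
        FROM date_dim;
    CREATE VIEW settlement_date_dim AS
        SELECT date_key AS settlement_date_key, calendar_date AS settlement_date
        FROM date_dim;

    CREATE TABLE transaction_fact (
        transaction_date_key INTEGER, settlement_date_key INTEGER, amount REAL);
    INSERT INTO transaction_fact VALUES (1, 2, 250.0);
""")

# The fact joins to the date dimension twice, once per role.
row = conn.execute("""
    SELECT t.transaction_date, s.settlement_date, f.amount
    FROM transaction_fact f
    JOIN transaction_date_dim t ON t.transaction_date_key = f.transaction_date_key
    JOIN settlement_date_dim s ON s.settlement_date_key = f.settlement_date_key
""").fetchone()
print(row)
```

Because both views read from one physical table, the two roles stay conformed without any extra load logic.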

Slowly Changing Dimensions:


The fact tables in a data warehouse rarely see an update. They are generally appended to, or truncated and reloaded. The dimension tables, on the other hand, do need to represent change, and in some cases the change happens rapidly. This issue is known as Slowly Changing Dimensions, or SCD. There are a few common ways to deal with SCD.

Do Nothing:
In the data modeling literature, this is known as type 1 SCD. In this scenario you simply overwrite the old values with the new ones. History is then lost forever. If there is no requirement to keep history this might be the approach to take. However, it is important to understand that this approach comes at a cost. All cubes will have to be rebuilt. Otherwise, reports and cubes will have dead paths.
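A minimal sketch of a type 1 change, assuming a hypothetical `security_dim` table. The CUSIP values are made-up placeholders, not real identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_dim (security_key INTEGER PRIMARY KEY, cusip TEXT, name TEXT);
    INSERT INTO security_dim VALUES (1, 'OLD000001', 'Example Fund');
""")

# Type 1: overwrite in place. The old CUSIP cannot be recovered afterwards.
conn.execute("UPDATE security_dim SET cusip = 'NEW000001' WHERE security_key = 1")

rows = conn.execute("SELECT security_key, cusip FROM security_dim").fetchall()
print(rows)
```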

Row Centric Versioning:


This is known as type 2 SCD. In type 2, we generate a new row for each change in a dimension. In the security dimension mentioned above, we would add a new dimension row for a security if its CUSIP were to change. This allows us to represent history accurately. New holdings will point to the new dimension row and old holdings will continue to point to the old one. The drawback of this approach is that it does not allow us to associate the old facts with the new values. Depending on the requirements this could be unacceptable.
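A type 2 change can be sketched as follows. The `is_current` flag is an assumption added for illustration (the text does not prescribe how to mark the active row), and the table names and CUSIP values are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_dim (
        security_key INTEGER PRIMARY KEY,  -- surrogate key
        cusip TEXT,
        name TEXT,
        is_current INTEGER                 -- assumed flag marking the active row
    );
    CREATE TABLE holding_fact (security_key INTEGER, holding_date TEXT, market_value REAL);

    INSERT INTO security_dim VALUES (1, 'OLD000001', 'Example Fund', 1);
    INSERT INTO holding_fact VALUES (1, '2009-05-29', 1000.0);
""")

# Type 2: retire the current row and add a new row with a new surrogate key.
conn.execute("UPDATE security_dim SET is_current = 0 WHERE security_key = 1")
conn.execute("INSERT INTO security_dim VALUES (2, 'NEW000001', 'Example Fund', 1)")

# New facts point at the new row; old facts keep pointing at the old one.
conn.execute("INSERT INTO holding_fact VALUES (2, '2009-06-01', 1050.0)")

history = conn.execute("""
    SELECT f.holding_date, d.cusip
    FROM holding_fact f
    JOIN security_dim d ON d.security_key = f.security_key
    ORDER BY f.holding_date
""").fetchall()
print(history)
```

The query result also shows the drawback: nothing in the two dimension rows says they describe the same security, so old holdings cannot be reported under the new CUSIP.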

Column Centric Versioning:


This is known as type 3 SCD. When using type 3, a designer identifies the columns that need to be tracked and over how many changes. Each tracked column is then repeated using some sequencing scheme. In our security dimension example, if the CUSIP is the column to be tracked then we need to decide how many changes to track. If we only want to keep track of a single change event we would create a new column and call it previous CUSIP. The issue with this approach is apparent: the change is built into the structure of the table and cannot be revised easily. If it is important to track many change events then this approach is difficult to manage.
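A sketch of a type 3 change that tracks a single change event, assuming a hypothetical `security_dim` table with a `previous_cusip` column and placeholder CUSIP values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_dim (
        security_key INTEGER PRIMARY KEY,
        cusip TEXT,
        previous_cusip TEXT   -- tracks exactly one change event
    );
    INSERT INTO security_dim VALUES (1, 'OLD000001', NULL);
""")

# Type 3: move the current value into the "previous" column, then overwrite it.
# Both right-hand sides are evaluated against the row as it was before the update.
conn.execute("""
    UPDATE security_dim
    SET previous_cusip = cusip, cusip = 'NEW000001'
    WHERE security_key = 1
""")

row = conn.execute("SELECT cusip, previous_cusip FROM security_dim").fetchone()
print(row)
```

A second change would overwrite `previous_cusip`, so tracking more history means adding more columns to the table.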

Hybrid Approach:
I personally like to use type 2 SCD to capture the change and type 3 to chain-link the dimension rows. In the CUSIP change example earlier, I would create a new security dimension row with the new values and add a new column to the security dimension called previous security. This column points to the security row that just changed. While it is hard for a user to navigate the chain, and while I otherwise refrain from using recursion, I feel this approach meets many requirements. It might require IT or some technical users to assist in predefining navigation paths for less experienced users.
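One way such a navigation path could be predefined is a recursive query that walks the chain of previous rows. This is a sketch under assumptions: the column name `previous_security_key` and the placeholder CUSIP values are hypothetical, and the recursive query is only one possible way to follow the links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE security_dim (
        security_key INTEGER PRIMARY KEY,
        cusip TEXT,
        previous_security_key INTEGER REFERENCES security_dim (security_key)
    );
    -- Each type 2 row links back to the row it replaced.
    INSERT INTO security_dim VALUES (1, 'AAA000001', NULL);
    INSERT INTO security_dim VALUES (2, 'BBB000001', 1);
    INSERT INTO security_dim VALUES (3, 'CCC000001', 2);
""")

# Walk the chain from the newest row back to the original one.
chain = conn.execute("""
    WITH RECURSIVE versions (security_key, cusip, previous_security_key, depth) AS (
        SELECT security_key, cusip, previous_security_key, 0
        FROM security_dim
        WHERE security_key = 3
        UNION ALL
        SELECT d.security_key, d.cusip, d.previous_security_key, v.depth + 1
        FROM security_dim d
        JOIN versions v ON d.security_key = v.previous_security_key
    )
    SELECT cusip FROM versions ORDER BY depth
""").fetchall()
print(chain)
```

Wrapping a query like this in a view is one way technical staff could hide the recursion from less experienced users.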

Other Considerations:
In some dimensions some fields change much more frequently than others. It is acceptable to break out the fields that change more rapidly from the others. However, the split has to be done in a way that makes sense to the business user of the model, who is not aware of the technical reasons motivating it.

Steps to a DM:
Now that we have covered the terminology of DM, how do we create a dimensional model? The initial step is to decide what business process or processes to model. As part of that, the business requirements need to be harvested and the available data needs to be understood.


The next step is to decide on the grain of the model. The best and most flexible approach is to go for the lowest available grain. By doing this you give the users of your model a way to summarize data in an ad hoc manner. It also allows them to drill from a summary view to the supporting details. In addition, keep all the measures of a fact table at the same grain. An example of a grain declaration when modeling holdings for a fund company would state, "We will model account holdings per day."

The third step is to define the dimensions. In the above example the dimensions for a holding will include portfolio, holding date and security.

Finally, we derive the facts described by the dimensions. The facts in the above example will include holding market value and holding cost basis.
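The benefit of choosing the lowest grain can be sketched with the holdings example. The table layout, keys and amounts below are assumptions for illustration; only the dimensions (portfolio, holding date, security) and facts (market value, cost basis) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Grain: one row per portfolio, security and holding date.
    CREATE TABLE holding_fact (
        portfolio_key INTEGER, security_key INTEGER, holding_date TEXT,
        market_value REAL, cost_basis REAL
    );
    INSERT INTO holding_fact VALUES
        (1, 10, '2009-06-01', 500.0, 450.0),
        (1, 11, '2009-06-01', 300.0, 320.0),
        (1, 10, '2009-06-02', 510.0, 450.0);
""")

# Because the fact is stored at the lowest grain, a summary is an ad hoc GROUP BY.
summary = conn.execute("""
    SELECT holding_date, SUM(market_value)
    FROM holding_fact
    WHERE portfolio_key = 1
    GROUP BY holding_date
    ORDER BY holding_date
""").fetchall()

# The user can then drill from a summary row to its supporting detail.
detail = conn.execute("""
    SELECT security_key, market_value
    FROM holding_fact
    WHERE portfolio_key = 1 AND holding_date = '2009-06-01'
    ORDER BY security_key
""").fetchall()

print(summary)
print(detail)
```

Market value is summed across securities within a single day only; as a semi-additive measure it should not be summed across days.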

Some Dos and Don'ts:


The use of surrogate (non-natural) keys is highly encouraged when designing a warehouse. This isolates the warehouse from changes that happen to the natural key. It can also improve performance, depending on the type of the natural key.

In addition, avoid the temptation of normalization. Many application developers and modelers cannot resist the temptation of snowflaking. If you are one of those people, stop it! Help your users and don't give in to your vices. Having said that, snowflaked models are sometimes necessary and unavoidable. Multi-valued dimensions are an example of when snowflaking is acceptable and indeed necessary.

Finally, dimension and fact conformance is a must in a successful warehouse implementation. If you are to guarantee that your warehouse is expandable, you have to enforce fact and dimension conformance.

References:
Ralph Kimball and Margy Ross (2002). The Data Warehouse Toolkit. Second Edition. New York: Wiley Computer Publishing.
