Data Warehouse

Unlocking the mystery

Data warehousing is the design and implementation of processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable information for decision making. It includes all the activities that make it possible for an organization to create, manage, and maintain a data warehouse or data mart.

A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis. A data warehouse is a single source for the key corporate information needed to enable business decisions.

The data warehouse is the portion of an overall Architected Data Environment that serves as the single integrated source of data for processing information.

The concept of data warehousing dates back to the late 1980s when IBM researchers Barry Devlin and Paul Murphy developed the business data warehouse. In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. The origin of the concept of data warehousing can be traced back to the early 1980s, when relational database management systems emerged as commercial products. One of the prime concerns underlying the creation of these systems was the performance impact of end-user computing on the operational data processing systems. This concern prompted the requirement to separate end-user computing systems from transactional processing systems. The foundation of the relational model with its simplicity, together with the query capabilities provided by the SQL language, supported the growing interest in what then was called end-user computing or decision support.

Brief History

The role and purpose of data warehouses in the data processing industry have evolved considerably since those early days and are still evolving rapidly.
Comparing today's data warehouses with the decision support databases of those early days should be done with great care.

Data warehouses should no longer be identified with database systems that support end-user queries and reporting functions. They should no longer be conceived as snapshots of operational data. Data warehouse databases should be considered as new sources of information, conceived for use by the whole organization or by smaller communities of users and data analysts within the organization.

Subject-Oriented: Information is presented according to specific subjects or areas of interest, not simply as computer files. Data is manipulated to provide information about a particular subject.
Integrated: A single source of information for and about understanding multiple areas of interest.
Non-Volatile: Stable information that doesn't change each time an operational process is executed. Information is consistent regardless of when the warehouse is accessed.
Time-Variant: Containing a history of the subject, as well as current information. Historical information is an important component of a data warehouse.
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users.
Process-Oriented: It is important to view data warehousing as a process for delivery of information. The maintenance of a data warehouse is ongoing and iterative in nature.
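The non-volatile and time-variant characteristics can be illustrated with a small sketch. It assumes a hypothetical `customer_history` table in an in-memory SQLite database: changes are appended as new dated rows and existing rows are never updated, so a question asked "as of" a given date always returns the same answer. The table, column, and function names are illustrative assumptions, not part of the original material.

```python
import sqlite3


def create_history_table(conn):
    # Hypothetical history table: one row per version of a customer record.
    conn.execute(
        "CREATE TABLE customer_history ("
        " customer_id INTEGER, city TEXT, valid_from TEXT)"
    )


def record_change(conn, customer_id, city, valid_from):
    # Non-volatile: append a new version; existing rows are never
    # updated or deleted.
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, city, valid_from),
    )


def city_as_of(conn, customer_id, as_of_date):
    # Time-variant: pick the latest version that was valid on the given date.
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    row = conn.execute(
        "SELECT city FROM customer_history"
        " WHERE customer_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_history_table(conn)
record_change(conn, 1, "Boston", "2008-01-01")
record_change(conn, 1, "Chicago", "2009-06-15")

print(city_as_of(conn, 1, "2008-12-31"))  # Boston
print(city_as_of(conn, 1, "2009-12-31"))  # Chicago
```

Because the earlier row is kept, the 2008 answer stays available after the customer moves, which is the behaviour an operational system that overwrites the current address would not provide.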

Characteristics of a Data Warehouse

Based on analogies with real-life warehouses, data warehouses were intended as large-scale collection, storage, and staging areas for corporate data. Data could be retrieved from one central point, or it could be distributed to "retail stores", that is, data marts tailored for ready access by users.

Data Mart: A data structure that is optimized for access. It is designed to facilitate end-user analysis of data. It typically supports a single analytic application used by a distinct set of workers.
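As a rough illustration of the warehouse-to-data-mart relationship, the sketch below derives a small, pre-aggregated mart table from a central table. The `warehouse_sales` and `mart_product_sales` tables, their columns, and the sample rows are all hypothetical, chosen only to show the idea of a structure tailored to one analytic use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical central warehouse table holding detailed sales for all regions.
conn.execute(
    "CREATE TABLE warehouse_sales ("
    " region TEXT, product TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("East", "widget", "2009-01-10", 100.0),
        ("East", "widget", "2009-02-11", 50.0),
        ("East", "gadget", "2009-02-12", 75.0),
        ("West", "widget", "2009-01-15", 200.0),
    ],
)


def build_product_mart(conn, region):
    # The mart holds only what one group of users needs: per-product totals
    # for a single region, already aggregated for fast access.
    conn.execute("DROP TABLE IF EXISTS mart_product_sales")
    conn.execute(
        "CREATE TABLE mart_product_sales ("
        " product TEXT PRIMARY KEY, total_amount REAL)"
    )
    conn.execute(
        "INSERT INTO mart_product_sales"
        " SELECT product, SUM(amount) FROM warehouse_sales"
        " WHERE region = ? GROUP BY product",
        (region,),
    )


build_product_mart(conn, "East")
mart_rows = conn.execute(
    "SELECT product, total_amount FROM mart_product_sales ORDER BY product"
).fetchall()
print(mart_rows)  # [('gadget', 75.0), ('widget', 150.0)]
```

The mart is rebuilt from the warehouse rather than from the operational systems, so the warehouse remains the single integrated source.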

Data Warehouse Architecture, in the context of an organization's data warehousing efforts, is a conceptualization of how the data warehouse is built. There is no right or wrong architecture; rather, multiple architectures exist to support various environments and situations. The worthiness of an architecture can be judged by how well the conceptualization aids in the building, maintenance, and usage of the data warehouse.

Data Architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system. The Data Architecture breaks a subject down to the atomic level and then builds it back up to the desired form. The Data Architect breaks the subject down by going through three traditional architectural processes:
Conceptual: represents all business entities.
Logical: represents the logic of how entities are related.
Physical: the realization of the data mechanisms for a specific type of functionality.

Data Warehouse Architecture

Operational database layer: The source data for the data warehouse. An organization's Enterprise Resource Planning systems fall into this layer.

Data access layer: The interface between the operational and informational access layers. Tools to extract, transform, and load data into the warehouse fall into this layer.

Metadata layer: The data directory. This is usually more detailed than an operational system data directory. There are dictionaries for the entire warehouse and sometimes dictionaries for the data that can be accessed by a particular reporting and analysis tool.

Conceptualization of Data warehouse with the interconnected layers

Informational access layer: The data accessed for reporting and analyzing, and the tools for reporting and analyzing data. Business intelligence tools fall into this layer. The Inmon-Kimball differences about design methodology, discussed later in this article, have to do with this layer.
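The data access layer described above (tools that extract, transform, and load data) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular tool: the two source extracts (`ERP_ORDERS`, `CRM_ORDERS`), their field names, and the `fact_orders` target table are all hypothetical. The point is that each source is converted to one common model, with format inconsistencies resolved before loading.

```python
import sqlite3
from datetime import datetime

# Hypothetical extracts from two operational systems with inconsistent
# field names, date formats, and number types.
ERP_ORDERS = [{"order_id": "A-1", "order_date": "2009-03-01", "amount": "120.50"}]
CRM_ORDERS = [{"id": 7, "date": "03/02/2009", "total": 80}]


def transform_erp(row):
    # ERP dates are already ISO formatted; amounts arrive as strings.
    return (row["order_id"], row["order_date"], float(row["amount"]), "erp")


def transform_crm(row):
    # CRM dates use MM/DD/YYYY; convert to ISO so all rows are comparable.
    iso_date = datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat()
    return (f"C-{row['id']}", iso_date, float(row["total"]), "crm")


def load(conn, rows):
    # Load the cleaned rows into a single target table with a common model.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        " order_key TEXT PRIMARY KEY, order_date TEXT,"
        " amount REAL, source TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)


def run_etl(conn):
    rows = [transform_erp(r) for r in ERP_ORDERS]
    rows += [transform_crm(r) for r in CRM_ORDERS]
    load(conn, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = run_etl(conn)
orders = conn.execute(
    "SELECT order_key, order_date, amount, source"
    " FROM fact_orders ORDER BY order_date"
).fetchall()
print(loaded, orders)
```

Keeping a `source` column is one simple way to retain lineage; a real metadata layer would record far more detail.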

A data warehouse provides a common data model for all data of interest regardless of the data's source.

Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.
Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.
Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.
Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.
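The trend report mentioned above (the items with the most sales in a particular area within the last two years) might look like the following query. The `sales` table, its columns, the sample rows, and the `top_items` helper are assumptions made for illustration only.

```python
import sqlite3


def top_items(conn, area, start_date, end_date, limit=3):
    # Rank items by units sold in one area within a date window.
    # Ties are broken alphabetically so the result order is deterministic.
    return conn.execute(
        "SELECT item, SUM(quantity) AS units FROM sales"
        " WHERE area = ? AND sale_date BETWEEN ? AND ?"
        " GROUP BY item ORDER BY units DESC, item LIMIT ?",
        (area, start_date, end_date, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (area TEXT, item TEXT, sale_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "tea", "2008-05-01", 10),
        ("North", "coffee", "2009-01-20", 25),
        ("North", "tea", "2009-07-04", 30),
        ("North", "cocoa", "2006-03-03", 99),   # outside the two-year window
        ("South", "coffee", "2009-02-02", 500),  # different area
    ],
)

report = top_items(conn, "North", "2008-01-01", "2009-12-31")
print(report)  # [('tea', 40), ('coffee', 25)]
```

Running such a query against the warehouse rather than the operational database is what keeps reporting load away from transaction processing.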

Benefits of Data Warehousing

There are also disadvantages to using a data warehouse. Some of them are:

Data warehouses are not the optimal environment for unstructured data.
Because data must be extracted, transformed and loaded into the warehouse, there is an element of latency in data warehouse data.

Over their life, data warehouses can have high costs. The data warehouse is usually not static. Maintenance costs are high.
Data warehouses can get outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.

Disadvantages of Data Warehousing

There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed. Or, functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems and vice versa.

Data warehousing, like any technology niche, has a history of innovations that did not receive market acceptance. A 2009 Gartner Group paper predicted these developments in the business intelligence/data warehousing market.

The Future of Data Warehousing

Because of a lack of information, processes, and tools, through 2012, more than 35 per cent of the top 5,000 global companies will regularly fail to make insightful decisions about significant changes in their business and markets.
By 2012, business units will control at least 40 per cent of the total budget for business intelligence.
By 2012, one-third of analytic applications applied to business processes will be delivered through coarse-grained application mashups.

Businesses of all sizes and in different industries, as well as government agencies, are finding that they can realize significant benefits by implementing a data warehouse. It is generally accepted that data warehousing provides an excellent approach for transforming the vast amounts of data that exist in these organizations into useful and reliable information for answering their questions and supporting the decision-making process. A data warehouse provides the base for the powerful data analysis techniques that are available today, such as data mining and multidimensional analysis, as well as the more traditional query and reporting. Making use of these techniques along with data warehousing can result in easier access to the information you need for more informed decision making.

Often we think that a data warehouse is a product, or group of products, that we can buy to help get answers to our questions and improve our decision-making capability. But it is not so simple. A data warehouse can help us get answers for better decision making, but it is only one part of a more global set of processes. For example, where did the data in the data warehouse come from? How did it get into the data warehouse? How is it maintained? How is the data structured in the data warehouse? What is actually in the data warehouse? These are all questions that must be answered before a data warehouse can be built. We prefer to discuss the more global environment, and we refer to it as data warehousing.

A Solution, Not a Product

The concept of data warehousing has evolved out of the need for easy access to a structured store of quality data that can be used for decision making. It is globally accepted that information is a very powerful asset that can provide significant benefits to any organization and a competitive advantage in the business world. Organizations have vast amounts of data but have found it increasingly difficult to access and use. This is because the data is in many different formats, exists on many different platforms, and resides in many different file and database structures developed by different vendors. Thus organizations have had to write and maintain perhaps hundreds of programs that extract, prepare, and consolidate data for use by many different applications for analysis and reporting. Also, decision makers often want to dig deeper into the data once initial findings are made. This would typically require modification of the extract programs or development of new ones. This process is costly, inefficient, and very time-consuming. Data warehousing offers a better approach.

Why Data Warehousing?

Data warehousing implements the process to access heterogeneous data sources; clean, filter, and transform the data; and store the data in a structure that is easy to access, understand, and use. The data is then used for query, reporting, and data analysis. As such, the access, use, technology, and performance requirements are completely different from those in a transaction-oriented operational environment. The volume of data in data warehousing can be very high, particularly when considering the requirements
