Roles and Responsibilities of a Data Modeler
Requirement gathering:
Conduct interviews and brainstorming discussions with the project team to gather additional requirements.
Gather accurate data through data analysis and functional analysis.
Development of the data model:
Create a standard abbreviation document for logical, physical, and dimensional data models.
Create logical, physical, and dimensional data models (data warehouse data modeling).
Document the logical, physical, and dimensional data models.
Reports:
Generate reports from the data model.
Review:
Review the data model with the functional and technical teams.
Creation of the database:
Generate SQL code from the data model and coordinate with DBAs to create the database.
Verify that the data models and databases stay in sync.
Support & Maintenance:
Assist developers, the ETL and BI teams, and end users in understanding the data model.
Maintain a change log for each data model.
Steps to create a Data Model
These are general guidelines for creating a standard data model; in practice, a data model may not be created in exactly the sequence shown below. Based on the enterprise's requirements, some of these steps may be excluded, or additional steps may be included. Sometimes a data modeler may be asked to develop a data model based on an existing database; in that situation, the data modeler has to reverse-engineer the database and create the data model.
1. Get business requirements.
2. Create a high-level conceptual data model.
3. Create the logical data model.
4. Select the target DBMS in which the data modeling tool will create the physical schema.
5. Create a standard abbreviation document according to business standards.
6. Create domains.
7. Create entities and add definitions.
8. Create attributes and add definitions.
9. Based on the analysis, identify surrogate keys, supertypes, and subtypes.
10. Assign a datatype to each attribute. If a domain is already present, attach the attribute to the domain.
11. Assign primary or unique keys to attributes.
12. Add check constraints or defaults to attributes.
13. Create unique or bitmap indexes on attributes (see the DDL sketch after this list).
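To make steps 9 through 13 concrete, here is a minimal DDL sketch of what the generated physical schema might look like for a single entity. The table, column, and constraint names (customer, customer_id, and so on) are invented for illustration, and the datatypes and bitmap-index syntax are Oracle-style; real names would come from the standard abbreviation document.

```sql
-- Hypothetical physical model for a CUSTOMER entity (all names are illustrative).
CREATE TABLE customer (
    customer_id     NUMBER(10)    NOT NULL,              -- surrogate key (step 9)
    customer_name   VARCHAR2(100) NOT NULL,              -- datatype assigned to the attribute (step 10)
    customer_type   CHAR(1)       DEFAULT 'R' NOT NULL,  -- default value (step 12)
    email_address   VARCHAR2(254),
    CONSTRAINT pk_customer       PRIMARY KEY (customer_id),          -- primary key (step 11)
    CONSTRAINT uq_customer_email UNIQUE (email_address),             -- alternate/unique key (step 11)
    CONSTRAINT ck_customer_type  CHECK (customer_type IN ('R','C'))  -- check constraint (step 12)
);

-- Bitmap index on a low-cardinality attribute (step 13); Oracle-style syntax.
CREATE BITMAP INDEX bix_customer_type ON customer (customer_type);
```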
Logical Data Model
A logical data model is the actual implementation and extension of a conceptual data model. It is the version of a data model that represents the business requirements (entire or in part) of an organization and is developed before the physical data model.
As soon as the conceptual data model is accepted by the functional team, development of the logical data model begins. Once the logical data model is completed, it is forwarded to the functional teams for review. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by thinking clearly about current and future business requirements. A logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
In the example, we have identified the entity names, attribute names, and relationships. For a detailed explanation, refer to relational data modeling.
A physical data modeler should have the technical know-how to create data models from existing databases, to tune the data models with referential integrity, alternate keys, and indexes, and to match indexes to SQL code. It is also good if the physical data modeler knows about replication, clustering, and so on.
The differences between a logical data model and a physical data model are shown below.
Logical vs Physical Data Modeling
Logical Data Model | Physical Data Model
Represents business information and defines business rules | Represents the physical implementation of the model in a database
Entity | Table
Attribute | Column
Primary Key | Primary Key Constraint
Alternate Key | Unique Constraint or Unique Index
Rule | Check Constraint, Default Value
Relationship | Foreign Key
Definition | Comment
Extract, transform, and load (ETL) in database usage, and especially in data warehousing, involves:
Extracting data from outside sources
Transforming it to fit operational needs
Loading it into the end target (database or data warehouse)
The advantages of efficient and consistent databases make ETL very important, as it is the way data actually gets loaded.
This article discusses ETL in the context of a data warehouse, whereas the term ETL can in fact
refer to a process that loads any database.
The typical real-life ETL cycle consists of the following execution steps (a brief SQL sketch of several of these steps follows the list):
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules; in case of failure, these reports help to diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up
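As a rough illustration, a few of these steps (extract, transform, audit, publish, clean up) might look like the following in plain SQL. All table and column names (stg_sales, fact_sales, audit_log) are hypothetical, and SYSDATE is Oracle-style; a real cycle would normally be orchestrated by an ETL tool or scheduled scripts.

```sql
-- 3. Extract (from sources) into a staging table; source_sales is a hypothetical source.
INSERT INTO stg_sales (sale_id, sale_date, store_code, amount)
SELECT sale_id, sale_date, store_code, amount
FROM   source_sales;

-- 5. Transform: clean and standardize a code column.
UPDATE stg_sales
SET    store_code = UPPER(TRIM(store_code))
WHERE  store_code IS NOT NULL;

-- 7. Audit report: record rows that violate a simple business rule (negative amounts).
INSERT INTO audit_log (run_date, rule_name, violation_count)
SELECT SYSDATE, 'NEGATIVE_AMOUNT', COUNT(*)
FROM   stg_sales
WHERE  amount < 0;

-- 8. Publish to the target fact table, excluding rejected rows.
INSERT INTO fact_sales (sale_id, sale_date, store_code, amount)
SELECT sale_id, sale_date, store_code, amount
FROM   stg_sales
WHERE  amount >= 0;

-- 10. Clean up the staging table for the next cycle.
TRUNCATE TABLE stg_sales;
```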
---
Data Warehouse
The staging area is the place where all transformation, cleansing, and enrichment is done before data can flow further.
Data is extracted from the source system by various methods (typically called extraction) and is placed in normalized form in the staging area. Once in the staging area, data is cleansed, standardized, and re-formatted to make it ready for loading into the data warehouse loaded area. We are going to cover the broad details here; the details of staging can be referred to in Data Extraction and Transformation Design in Data Warehouse.
The staging area is important not only for data warehousing, but for a host of other applications as well. Therefore, it has to be seen from a wider perspective. Staging is an area where sanitized, integrated, and detailed data exists in normalized form.
With the advent of the data warehouse, the concept of transformation has gained ground, as it provides a high degree of quality and uniformity to the data. The conventional (pre-data warehouse) staging areas used to be plain dumps of the production data. Therefore, a staging area with extraction and transformation offers the best of both worlds for generating quality transaction-level information.
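A small sketch of the kind of standardization done in the staging area: data arriving from two hypothetical source extracts with different code conventions is conformed into one sanitized staging table. All object names and code values are assumptions made for illustration.

```sql
-- Conform gender codes and trim identifiers from two source extracts
-- into a single, standardized staging table (all names are hypothetical).
INSERT INTO stg_customer (source_system, customer_no, gender_code)
SELECT 'CRM',
       TRIM(cust_no),
       CASE UPPER(gender) WHEN 'MALE' THEN 'M' WHEN 'FEMALE' THEN 'F' ELSE 'U' END
FROM   crm_customer_extract
UNION ALL
SELECT 'BILLING',
       TRIM(customer_ref),
       CASE sex_flag WHEN '1' THEN 'M' WHEN '2' THEN 'F' ELSE 'U' END
FROM   billing_customer_extract;
```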
DW vs DataMart
Data Warehouse
A Data Warehouse is the area where the information is loaded in denormalized dimensional modeling form. This subject is dealt with in a fair degree of detail in the Data Warehousing/Marting section. A Data Warehouse is a repository of data which contains data in denormalized dimensional form ACROSS the enterprise. Following are the features of a Data Warehouse:
A Data Warehouse is the source for most of the end-user tools for data analysis, data mining, and strategic planning.
It is supposed to be an enterprise-wide repository, open to all possible applications of information delivery.
It contains uniform and standard dimensions and measures. For details, refer to Dimensional Modeling Concepts.
Time-based analysis is one of the most important applications of a data warehouse. The methods for doing this are detailed in Special Situations in Dimensional Modeling.
It contains only actuals data. This is linked to being 'read-only': as a best practice, all non-actual data (such as standards, future projections, and what-if scenarios) should be managed and maintained in OLAP and end-user tools.
Data Marts
Data Marts are smaller, specific-purpose-oriented data warehouses. A Data Warehouse is a big, strategic platform which needs considerable planning. The difference between a Data Warehouse and a Data Mart is like that between planning a city and planning a township. A Data Warehouse is a medium- to long-term effort to integrate and create a single-point system of record for virtually all applications and needs for data. A Data Mart is a short- to medium-term effort to build a repository for a specific analysis. The differences between a Data Warehouse and a Data Mart are as follows:
Scope & Application: A Data Warehouse is application independent and domain independent, whereas a Data Mart is built for a specific domain and a specific application.
Sources: A Data Warehouse draws on many internal & external sources.
View (OLTP vs. OLAP): An OLTP system focuses on the current data within an enterprise or department, whereas an OLAP system spans multiple versions of a database schema due to the evolutionary process of an organization and integrates information from many organizational locations and data stores.
ETL tools are used to extract, transform, and load the data into the data warehouse / data mart.
OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.
An ETL tool is used to extract the data and to perform operations as per our needs (for example, Informatica loading a data mart), whereas OLAP is completely different from the ETL process: it is used for generating reports and is also known as a reporting tool (for example, BO and Cognos).
What is BI?
Business Intelligence is a term introduced by Howard Dresner of Gartner Group in
1989. He described Business Intelligence as a set of concepts and methodologies to
improve decision making in business through the use of facts and fact-based systems.
What is aggregation?
In a data warehouse paradigm, "aggregation" is one way of improving query performance. An aggregate fact table is a new table created off an existing fact table by summing up facts for a set of associated dimensions. The grain of an aggregate fact table is higher than that of the base fact table. Aggregate tables contain fewer rows, thus making queries run faster.
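A hedged sketch of how such an aggregate fact table might be built: a daily fact table is rolled up to a monthly grain, so the aggregate has fewer rows than the base table. The table and column names are invented for illustration.

```sql
-- Build a monthly aggregate from a hypothetical daily sales fact table.
-- The aggregate drops the day-level grain, so it has fewer rows than fact_sales_daily.
CREATE TABLE fact_sales_monthly AS
SELECT d.year_month,                  -- higher grain: month instead of day
       f.store_key,
       f.product_key,
       SUM(f.sales_amount) AS sales_amount,
       SUM(f.units_sold)   AS units_sold
FROM   fact_sales_daily f
JOIN   dim_date d ON d.date_key = f.date_key
GROUP  BY d.year_month, f.store_key, f.product_key;
```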
What are the different approaches for making a Data warehouse?
This is a generic question: From a business perspective, it is very important to first
get clarity on the end user requirements and a system study before commencing
any Data warehousing project. From a technical perspective, it is important to first
understand the dimensions and measures, determine quality and structure of
source data from the OLTP systems and then decide which dimensional model to
apply, i.e. whether we do a star or snowflake or a combination of both. From a
conceptual perspective, we can either follow the Ralph Kimball method (build data marts and then consolidate at the end to form an enterprise Data warehouse) or the Bill Inmon method (build a large Data warehouse and derive data marts from the same). In order to decide on the method, a strong understanding of the business requirements and data structures is needed, as well as consensus with the customer.
What is staging area?
The staging area is also called the Operational Data Store (ODS). It is a data holding place where the data extracted from all the data sources is stored. From the staging area, data is loaded into the data warehouse. Data cleansing takes place in this stage.
What is the difference between star and snowflake schema?
The main difference between the star schema and the snowflake schema is that the star schema is highly denormalized and the snowflake schema is normalized, so data access latency is lower in a star schema than in a snowflake schema. Because the star schema is denormalized, the size of the data warehouse will be larger than with a snowflake schema. The schema is selected as per the client requirements. Performance-wise, the star schema is good; but if memory utilization is a major concern, then the snowflake schema is better than the star schema.
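To make the difference concrete, here is an illustrative sketch of the same Product dimension modeled both ways; all table and column names are invented.

```sql
-- Star schema: one denormalized dimension table.
CREATE TABLE dim_product_star (
    product_key    INTEGER PRIMARY KEY,
    product_name   VARCHAR(100),
    category_name  VARCHAR(50),     -- category attributes held directly on the dimension
    category_mgr   VARCHAR(100)
);

-- Snowflake schema: the category level is normalized into its own table.
CREATE TABLE dim_category (
    category_key   INTEGER PRIMARY KEY,
    category_name  VARCHAR(50),
    category_mgr   VARCHAR(100)
);

CREATE TABLE dim_product_snow (
    product_key    INTEGER PRIMARY KEY,
    product_name   VARCHAR(100),
    category_key   INTEGER REFERENCES dim_category (category_key)
);
```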
What is Data mart?
A data mart is a subset of an organizational data store, usually oriented to a
specific purpose or major data subject, that may be distributed to support business
needs. Data marts are analytical data stores designed to focus on specific
business functions for a specific community within an organization. Data marts are
often derived from subsets of data in a data warehouse, though in the bottom-up
data warehouse design methodology the data warehouse is created from the union
of organizational data marts.
In the dimensional approach, transaction data is partitioned into "facts", which are generally numeric transaction data, and "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the
number of products ordered and the price paid for the products, and into dimensions such as
order date, customer name, product number, order ship-to and bill-to locations, and salesperson
responsible for receiving the order. A key advantage of a dimensional approach is that the data
warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data
warehouse tends to operate very quickly. The main disadvantages of the dimensional approach
are: 1) In order to maintain the integrity of facts and dimensions, loading the data warehouse
with data from different operational systems is complicated, and 2) It is difficult to modify the
data warehouse structure if the organization adopting the dimensional approach changes the way
in which it does business.
In the normalized approach, the data in the data warehouse are stored following, to a degree,
database normalization rules. Tables are grouped together by subject areas that reflect general
data categories (e.g., data on customers, products, finance, etc.) The main advantage of this
approach is that it is straightforward to add information into the database. A disadvantage of this
approach is that, because of the number of tables involved, it can be difficult for users both to 1)
join data from different sources into meaningful information and then 2) access the information
without a precise understanding of the sources of data and of the data structure of the data
warehouse.
These approaches are not mutually exclusive. Dimensional approaches can involve normalizing
data to a degree.
A data warehouse provides a common data model for all data of interest regardless of the
data's source. This makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales invoices, order
receipts, general ledger charges, etc.
Prior to loading data into the data warehouse, inconsistencies are identified and resolved.
This greatly simplifies reporting and analysis.
Information in the data warehouse is under the control of data warehouse users so that,
even if the source system data is purged over time, the information in the warehouse can
be stored safely for extended periods of time.
Because they are separate from operational systems, data warehouses provide retrieval of
data without slowing down operational systems.
Data warehouses can work in conjunction with and, hence, enhance the value of
operational business applications, notably customer relationship management (CRM)
systems.
Data warehouses facilitate decision support system applications such as trend reports
(e.g., the items with the most sales in a particular area within the last two years),
exception reports, and reports that show actual performance versus goals.
Cube can (and arguably should) mean something quite specific - OLAP artifacts presented
through an OLAP server such as MS Analysis Services or Oracle (nee Hyperion) Essbase.
However, it also gets used much more loosely. OLAP cubes of this sort use cube-aware query
tools which use a different API to a standard relational database. Typically OLAP servers
maintain their own optimized data structures (known as MOLAP), although they can be
implemented as a front-end to a relational data source (known as ROLAP) or in various hybrid
modes (known as HOLAP).
I try to be specific and use 'cube' specifically to refer to cubes on OLAP servers such as SSAS.
Business Objects works by querying data through one or more sources (which could be relational
databases, OLAP cubes, or flat files) and creating an in-memory data structure called a
MicroCube which it uses to support interactive slice-and-dice activities. Analysis Services and
MSQuery can make a cube (.cub) file which can be opened by the AS client software or Excel
and sliced-and-diced in a similar manner. IIRC, recent versions of Business Objects can also
open .cub files.
To be pedantic I think Business Objects sits in a 'semi-structured reporting' space somewhere
between a true OLAP system such as ProClarity and ad-hoc reporting tool such as Report
Builder, Oracle Discoverer or Brio. Round trips to the Query Panel make it somewhat clunky as
a pure stream-of-thought OLAP tool but it does offer a level of interactivity that traditional
reports don't. I see the sweet spot of Business Objects as sitting in two places: ad-hoc reporting by staff not necessarily familiar with SQL, and providing a scheduled report delivered in an interactive format that allows some drill-down into the data.
'Data Mart' is also a fairly loosely used term and can mean any user-facing data access medium
for a data warehouse system. The definition may or may not include the reporting tools and
metadata layers, reporting layer tables or other items such as Cubes or other analytic systems.
I tend to think of a data mart as the database from which the reporting is done, particularly if it is
a readily definable subsystem of the overall data warehouse architecture. However it is quite
reasonable to think of it as the user facing reporting layer, particularly if there are ad-hoc
reporting tools such as Business Objects or OLAP systems that allow end-users to get at the data
directly. The term "data mart" has become somewhat ambiguous, but it is traditionally associated
with a subject-oriented subset of an organization's information systems. Data mart does not
explicitly imply the presence of a multi-dimensional technology such as OLAP and data mart
does not explicitly imply the presence of summarized numerical data.
A cube, on the other hand, tends to imply that data is presented using a multi-dimensional
nomenclature (typically an OLAP technology) and that the data is generally summarized as
intersections of multiple hierarchies. (i.e. the net worth of your family vs. your personal net
worth and everything in between) Generally, cube implies something very specific whereas
data mart tends to be a little more general.
I suppose in OOP speak you could accurately say that a data mart has-a cube, has-a
relational database, has-a nifty reporting interface, etc but it would be less correct to say that
any one of those individually is-a data mart. The term data mart is more inclusive.
Figure 1-1 Contrasting OLTP and Data Warehousing Environments
Online transaction processing. OLTP systems are optimized for fast and reliable transaction
handling. Compared to data warehouse systems, most OLTP interactions will involve a relatively
small number of rows, but a larger group of tables.
In PL/SQL you can embed SQL statements to read and write data, and then use looping and conditional processing as you require. The options are almost limitless.
So I think that PL/SQL could be your ETL solution if you have no other tools available. Another
approach might be writing procedures, packages and functions that may be used by an ETL tool.
This is usually done when complicated transformations cannot be efficiently implemented in the
ETL.
As you can see PL/SQL fits well into any ETL process.
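For example, a transformation that is awkward to express in an ETL tool could be packaged as a stored procedure and called from the tool. The following PL/SQL sketch is purely illustrative; the table, column, and procedure names are assumptions.

```sql
-- Illustrative PL/SQL procedure: move cleansed rows from a staging table
-- into a target table, with simple conditional processing per row.
CREATE OR REPLACE PROCEDURE load_clean_sales IS
BEGIN
  FOR r IN (SELECT sale_id, sale_date, store_code, amount FROM stg_sales) LOOP
    IF r.amount >= 0 AND r.store_code IS NOT NULL THEN
      INSERT INTO fact_sales (sale_id, sale_date, store_code, amount)
      VALUES (r.sale_id, r.sale_date, UPPER(TRIM(r.store_code)), r.amount);
    ELSE
      INSERT INTO etl_rejects (sale_id, reject_reason)
      VALUES (r.sale_id, 'Failed basic validation');
    END IF;
  END LOOP;
  COMMIT;
END load_clean_sales;
/
```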
Buy vs. Build
When it comes to ETL tool selection, it is not always necessary to purchase a third-party tool. This
determination largely depends on three things:
Complexity of the data transformation: The more complex the data transformation is, the more
suitable it is to purchase an ETL tool.
Data cleansing needs: Does the data need to go through a thorough cleansing exercise before it
is suitable to be stored in the data warehouse? If so, it is best to purchase a tool with strong data
cleansing functionalities. Otherwise, it may be sufficient to simply build the ETL routine from
scratch.
Data volume. Available commercial tools typically have features that can speed up data
movement. Therefore, buying a commercial product is a better approach if the volume of data
transferred is large.
While the selection of a database and a hardware platform is a must, the selection of an ETL tool
is highly recommended, but it's not a must. When you evaluate ETL tools, it pays to look for the
following characteristics:
Functional capability: This includes both the 'transformation' piece and the 'cleansing' piece.
In general, the typical ETL tools are either geared towards having strong transformation
capabilities or having strong cleansing capabilities, but they are seldom very strong in both. As a
result, if you know your data is going to be dirty coming in, make sure your ETL tool has strong
cleansing capabilities. If you know there are going to be a lot of different data transformations, it
then makes sense to pick a tool that is strong in transformation.
Ability to read directly from your data source: For each organization, there is a different set
of data sources. Make sure the ETL tool you select can connect directly to your source data.
Metadata support: The ETL tool plays a key role in your metadata because it maps the
source data to the destination, which is an important piece of the metadata. In fact, some
organizations have come to rely on the documentation of their ETL tool as their metadata source.
As a result, it is very important to select an ETL tool that works with your overall metadata
strategy.
Popular OLAP / business intelligence tools include:
Cognos
Hyperion
MicroStrategy
Conceptual Data Model
At this level, the data modeler attempts to identify the highest-level relationships among the different entities.
Logical Data Model
Features of logical data model include:
Foreign keys (keys identifying the relationship between different entities) are specified.
At this level, the data modeler attempts to describe the data in as much detail as possible, without regard
to how they will be physically implemented in the database.
In data warehousing, it is common for the conceptual data model and the logical data model to be
combined into a single step (deliverable).
The steps for designing the logical data model are as follows:
1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
Physical Data Model
At this level, the data modeler specifies how the logical data model will be realized in the database schema. Physical considerations may cause the physical data model to be quite different from the logical data model.
The steps for physical data model design are as follows (see the sketch after this list):
1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
4. Modify the physical data model based on physical constraints / requirements.
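A minimal sketch of steps 1 through 3, using invented Customer and Order entities: each entity becomes a table, each attribute becomes a column, and the relationship becomes a foreign key.

```sql
-- Steps 1 and 3: entities become tables, attributes become columns.
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name VARCHAR(100)
);

CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    order_date  DATE,
    customer_id INTEGER
);

-- Step 2: the Customer-to-Order relationship becomes a foreign key.
ALTER TABLE sales_order
  ADD CONSTRAINT fk_order_customer
  FOREIGN KEY (customer_id) REFERENCES customer (customer_id);
```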
Hierarchy: The specification of levels that represents the relationship between different attributes within a dimension. For example, one possible hierarchy in the Time dimension is Year → Quarter → Month → Day.
Fact Table: A fact table is a table that contains the measures of interest. For example, sales amount
would be such a measure. This measure is stored in the fact table with the appropriate granularity. For
example, it can be sales amount by store by day. In this case, the fact table would contain three columns:
A date column, a store column, and a sales amount column.
Lookup Table: The lookup table provides the detailed information about the attributes. For example, the
lookup table for the Quarter attribute would include a list of all of the quarters available in the data
warehouse. Each row (each quarter) may have several fields, one for the unique ID that identifies the
quarter, and one or more additional fields that specify how that particular quarter is represented on a
report (for example, first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1").
A dimensional model includes fact tables and lookup tables. Fact tables connect to one or more lookup
tables, but fact tables do not have direct relationships to one another. Dimensions and hierarchies are
represented by lookup tables. Attributes are the non-key columns in the lookup tables.
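Tying these terms together, here is an illustrative sketch of a fact table joined to two lookup tables at the sales-amount-by-store-by-day grain described above; all names are invented, and the date lookup table carries the Time hierarchy attributes.

```sql
-- Lookup table carrying the Time hierarchy attributes (Year > Quarter > Month > Day).
CREATE TABLE lkp_date (
    date_id      INTEGER PRIMARY KEY,
    full_date    DATE,
    month_name   VARCHAR(20),
    quarter_name VARCHAR(10),    -- e.g. 'Q1 2001'
    year_number  INTEGER
);

CREATE TABLE lkp_store (
    store_id   INTEGER PRIMARY KEY,
    store_name VARCHAR(100)
);

-- Fact table at the sales-amount-by-store-by-day grain described above.
CREATE TABLE fact_sales (
    date_id      INTEGER REFERENCES lkp_date (date_id),
    store_id     INTEGER REFERENCES lkp_store (store_id),
    sales_amount DECIMAL(12,2)
);
```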
In designing data models for data warehouses / data marts, the most commonly used schema types are
Star Schema and Snowflake Schema.
Star Schema: In the star schema design, a single object (the fact table) sits in the middle and is radially
connected to other surrounding objects (dimension lookup tables) like a star. A star schema can be simple
or complex. A simple star consists of one fact table; a complex star can have more than one fact table.
Snowflake Schema: The snowflake schema is an extension of the star schema, where each point of the
star explodes into more points. The main advantage of the snowflake schema is the improvement in query
performance due to minimized disk storage requirements and joining smaller lookup tables. The main
disadvantage of the snowflake schema is the additional maintenance effort needed due to the increased number of lookup tables.
Whether one uses a star or a snowflake largely depends on personal preference and business needs.
Personally, I am partial to snowflakes, when there is a business case to analyze the information at that
particular level.
Aggregation: One way of speeding up query performance. Facts are summed up for selected
dimensions from the original fact table. The resulting aggregate table will have fewer rows, thus making
queries that can use them go faster.
Attribute: Attributes represent a single type of information in a dimension. For example, year is an
attribute in the Time dimension.
Conformed Dimension: A dimension that has exactly the same meaning and content when being
referred from different fact tables.
Data Mart: Data marts have the same definition as the data warehouse (see below), but data marts have
a more limited audience and/or data content.
Data Warehouse: A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection
of data in support of management's decision making process (as defined by Bill Inmon).
Data Warehousing: The process of designing, building, and maintaining a data warehouse system.
Dimension: The same category of information. For example, year, month, day, and week are all part of
the Time Dimension.
Dimensional Model: A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables: dimensional tables and fact tables. Dimensional tables record information on each dimension, and fact tables record all the "facts", or measures.
Dimensional Table: Dimension tables store records related to this particular dimension. No facts are
stored in a dimensional table.
ETL: Stands for Extraction, Transformation, and Loading. The movement of data from one area to
another.
Fact Table: A type of table in the dimensional model. A fact table typically includes two types of columns:
fact columns and foreign keys to the dimensions.
Hierarchy: A hierarchy defines the navigating path for drilling up and drilling down. All attributes in a
hierarchy belong to the same dimension.
Metadata: Data about data. For example, the number of tables in the database is a type of metadata.
Metric: A measured value. For example, total sales is a metric.
MOLAP: Multidimensional OLAP. MOLAP systems store data in the multidimensional cubes.
OLAP: On-Line Analytical Processing. OLAP should be designed to provide end users a quick way of
slicing and dicing the data.
ROLAP: Relational OLAP. ROLAP systems store data in the relational database.
Snowflake Schema: A common form of dimensional model. In a snowflake schema, different hierarchies
in a dimension can be extended into their own dimensional tables. Therefore, a dimension can have more
than a single dimension table.
Star Schema: A common form of dimensional model. In a star schema, each dimension is represented by
a single dimension table.
---
The following are the typical processes involved in the data warehousing project cycle.
Requirement Gathering
Physical Environment Setup
Data Modeling
ETL
Performance Tuning
Quality Assurance
Production Maintenance
Incremental Enhancements
Data Modeling
This is a very important step in the data warehousing project. Indeed, it is fair to say that the foundation of
the data warehousing system is the data model. A good data model will allow the data warehousing
system to grow easily, as well as allowing for good performance.
In a data warehousing project, the logical data model is built based on user requirements, and then it is
translated into the physical data model. The detailed steps can be found in the Conceptual, Logical, and
Physical Data Modeling section.
Part of the data modeling exercise is often the identification of data sources. Sometimes this step is
deferred until the ETL step. However, my feeling is that it is better to find out where the data exists, or, better yet, whether it even exists anywhere in the enterprise at all. Should the data not be available, this is a good time to raise the alarm. If this is delayed until the ETL phase, rectifying it will become a much tougher and more complex process.
ETL
The ETL (Extraction, Transformation, Loading) process typically takes the longest to develop, and this can
easily take up to 50% of the data warehouse implementation cycle or longer. The reason for this is that it
takes time to get the source data, understand the necessary columns, understand the business rules, and
understand the logical and physical data models.
Possible Pitfalls
There is a tendency to give this particular phase too little development time. This can prove suicidal to the
project because end users will usually tolerate less formatting, longer time to run reports, less functionality
(slicing and dicing), or fewer delivered reports; one thing that they will not tolerate is wrong information.
A second common problem is that some people make the ETL process more complicated than necessary.
In ETL design, the primary goal should be to optimize load speed without sacrificing quality. This is,
however, sometimes not followed. There are cases where the design goal is to cover all possible future
uses, whether they are practical or just a figment of someone's imagination. When this happens, ETL
performance suffers, and often so does the performance of the entire data warehousing system.
Performance Tuning
There are three major areas where a data warehousing system can use a little performance tuning:
ETL - Given that the data load is usually a very time-consuming process (and hence typically relegated to a nightly load job) and that data warehousing-related batch jobs are typically of lower priority, the window for data loading is not very long. A data warehousing system whose ETL process finishes right on time is going to have a lot of problems, simply because the jobs often do not get started on time due to factors that are beyond the control of the data warehousing team. As a result, it is always an excellent idea for the data warehousing group to tune the ETL process as much as possible.
Query Processing - Sometimes, especially in a ROLAP environment or in a system where the reports are run directly against the relational database, query performance can be an issue. A study has shown that users typically lose interest after 30 seconds of waiting for a report to return. My experience has been that ROLAP reports or reports that run directly against the RDBMS often exceed this time limit, and it is hence ideal for the data warehousing team to invest some time to tune the queries, especially the most popular ones.
Report Delivery - It is also possible that end users are experiencing significant delays in receiving
their reports due to factors other than the query performance. For example, network traffic, server
setup, and even the way that the front-end was built sometimes play significant roles. It is
important for the data warehouse team to look into these areas for performance tuning.
QA
Once the development team declares that everything is ready for further testing, the QA team takes over.
The QA team is always from the client. Usually the QA team members will know little about data
warehousing, and some of them may even resent the need to have to learn another tool or tools. This
makes the QA process a tricky one.
Sometimes the QA process is overlooked. On my very first data warehousing project, the project team worked very hard to get everything ready for Phase 1, and everyone thought that we had met the deadline. There was one mistake, though: the project managers failed to recognize that it is necessary to go through the client QA process before the project can go into production. As a result, it took five extra months to bring the project to production (the original development time had been only 2 1/2 months).
In the OLAP world, there are mainly two different types: Multidimensional OLAP (MOLAP) and Relational
OLAP (ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and ROLAP.
MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a multidimensional cube.
The storage is not in the relational database, but in proprietary formats.
Advantages:
Excellent performance: MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-generated when the cube is
created. Hence, complex calculations are not only doable, but they return quickly.
Disadvantages:
Limited in the amount of data it can handle: Because all calculations are performed when the
cube is built, it is not possible to include a large amount of data in the cube itself. This is not to
say that the data in the cube cannot be derived from a large amount of data. Indeed, this is
possible. But in this case, only summary-level information will be included in the cube itself.
Requires additional investment: Cube technology is often proprietary and may not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.
ROLAP
This methodology relies on manipulating the data stored in the relational database to give the appearance
of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is
equivalent to adding a "WHERE" clause in the SQL statement.
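For example, a ROLAP engine might generate SQL along the following lines: slicing to one year adds a WHERE predicate, and dicing by quarter and region adds GROUP BY columns. The table and column names are invented for illustration.

```sql
-- "Slice" on year 2001 and "dice" by quarter and region:
-- each slice/dice action corresponds to a WHERE predicate or GROUP BY column.
SELECT d.quarter_name,
       s.region_name,
       SUM(f.sales_amount) AS sales_amount
FROM   fact_sales f
JOIN   dim_date  d ON d.date_id  = f.date_id
JOIN   dim_store s ON s.store_id = f.store_id
WHERE  d.year_number = 2001          -- the slice
GROUP  BY d.quarter_name, s.region_name;
```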
Advantages:
Can handle large amounts of data: The data size limitation of ROLAP technology is the limitation
on data size of the underlying relational database. In other words, ROLAP itself places no
limitation on data amount.
Can leverage functionalities inherent in the relational database: Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Disadvantages:
Performance can be slow: Because each ROLAP report is essentially a SQL query (or multiple
SQL queries) in the relational database, the query time can be long if the underlying data size is
large.
Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL
statements to query the relational database, and SQL statements do not fit all needs (for
example, it is difficult to perform complex calculations using SQL), ROLAP technologies are
therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by
building into the tool out-of-the-box complex functions as well as the ability to allow users to
define their own functions.
HOLAP
HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type
information, HOLAP leverages cube technology for faster performance. When detail information is
needed, HOLAP can "drill through" from the cube into the underlying relational data.
OLAP (online analytical processing) is a function of business intelligence software that enables a
user to easily and selectively extract and view data from different points of view. Designed for
managers looking to make sense of their information, OLAP tools structure data hierarchically the way managers think of their enterprises, but also allow business analysts to rotate that data, changing the relationships to get more detailed insight into corporate information.
WebFOCUS OLAP combines all the functionality of query tools, reporting tools, and OLAP into
a single powerful solution with one common interface so business analysts can slice and dice the
data and see business processes in a new way. WebFOCUS makes data part of an organization's natural culture by giving developers premier design environments for automated ad hoc and parameter-driven reporting, and by giving everyone else the ability to receive and retrieve data in any format and to perform analysis using whatever device or application is part of their daily working lives.
WebFOCUS ad hoc reporting and OLAP features allow users to slice and dice data in an almost
unlimited number of ways. Satisfying the broadest range of analytical needs, business
intelligence application developers can easily enhance reports with extensive data-analysis
functionality so that end users can dynamically interact with the information. WebFOCUS also
supports the real-time creation of Excel spreadsheets and Excel PivotTables with full styling,
drill-downs, and formula capabilities so that Excel power users can analyze their corporate data
in a tool with which they are already familiar.