Business Intelligence and Tools Unit 9 Sikkim Manipal University

Unit 9 Online Analytical Processing (OLAP)

Structure
9.1 Introduction
    Objectives
9.2 Overview of OLAP
    9.2.1 Definition of OLAP
    9.2.2 Origin of OLAP
    Self Assessment Question(s) (SAQs)
9.3 Significance of OLAP
    9.3.1 Benefits of OLAP
    9.3.2 Characteristics of OLAP
    Self Assessment Question(s) (SAQs)
9.4 Features and Functions of OLAP
    Self Assessment Question(s) (SAQs)
9.5 OLAP Models
    9.5.1 MOLAP Model
    9.5.2 ROLAP Model
    Self Assessment Question(s) (SAQs)
9.6 Summary
9.7 Terminal Questions (TQs)
9.8 Multiple Choice Questions (MCQs)
9.9 Answers to SAQs, TQs, and MCQs
    9.9.1 Answers to Self Assessment Questions (SAQs)
    9.9.2 Answers to Terminal Questions (TQs)
    9.9.3 Answers to Multiple Choice Questions (MCQs)

9.1 Introduction

As the information needs of the user community grew, so did the complexity of users' requests. Over time, the questions that users asked could no longer be answered with the standard relational tool set. The need to answer more complex questions led to the development of online analytical processing (OLAP) tools. Using OLAP, business organizations can analyze the data in a data warehouse in many ways, including budgeting, planning, simulation, data warehouse reporting, and trend analysis.

Objectives:
The objectives of the Unit are to make you understand:
- The significance of OLAP
- Characteristics of OLAP systems
- Functions of OLAP systems
- OLAP models: MOLAP and ROLAP
- OLAP implementation considerations

9.2 Overview of OLAP

OLAP stands for online analytical processing. As the name suggests, OLAP concerns the processing of data as it is manipulated for analysis. The data warehouse provides the best opportunity for analysis, and OLAP acts as the vehicle for carrying out that analysis. It allows you to look at the data along many dimensions.
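To make the idea of looking at the same data along many dimensions concrete, here is a minimal Python sketch. The sales records, the dimension names, and the function `total_by` are all hypothetical and are not part of any OLAP product; the sketch only shows that one set of facts can be summarized by whichever dimension the analyst chooses.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, store, month, units_sold).
SALES = [
    ("Product A", "Store X", "2001-01", 500),
    ("Product A", "Store Y", "2001-01", 300),
    ("Product B", "Store X", "2001-01", 200),
    ("Product A", "Store X", "2001-02", 450),
]

# Position of each dimension inside a fact tuple (an assumption of this sketch).
DIMENSIONS = {"product": 0, "store": 1, "month": 2}


def total_by(dimension, facts=SALES):
    """Sum the units sold, grouped by one named dimension."""
    index = DIMENSIONS[dimension]
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[index]] += fact[3]
    return dict(totals)


if __name__ == "__main__":
    # The same facts, viewed along each dimension in turn.
    for name in DIMENSIONS:
        print(name, total_by(name))
```

Calling `total_by("product")`, `total_by("store")`, and `total_by("month")` gives three different summaries of the same four records, which is the essence of a multidimensional view.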
9.2.1 Definition of OLAP

OLAP is a category of software technology that enables managers to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. The key elements of an OLAP system are therefore speed, consistency, interactive access, and multidimensional views. In simple terms, OLAP is a technical term for multidimensional analysis.

9.2.2 Origin of OLAP

The term OLAP was introduced by Dr. E.F. Codd in the paper "Providing On-line Analytical Processing to User Analysts", published in 1993. In the paper, Codd discussed twelve rules, or guidelines, for an OLAP system. These guidelines form a yardstick for measuring any set of OLAP tools and products. The twelve guidelines are as follows:

1. Multidimensional Conceptual View: The OLAP system has to provide a multidimensional data model that is analytical and easy to use. It has to support slice-and-dice operations and is usually required in financial modeling.
2. Transparency: The system should be part of an open system that supports heterogeneous data sources. The end user should not have to be concerned with the details of data access or conversion.
3. Accessibility: The OLAP system should present the user with a single logical schema of the data. It has to map its own logical schema to the heterogeneous physical data stores and perform any necessary transformations.
4. Consistent Reporting Performance: Users should not experience any significant degradation in reporting performance as the number of dimensions or the size of the database increases. Users should perceive consistent run time, response time, and machine utilization every time a given query is run.
5. Client/Server Architecture: The system has to conform to the principles of client/server architecture for optimum performance, flexibility, adaptability, and interoperability. The server component needs to be sufficiently intelligent to enable various clients to be attached with minimum effort.
6. Generic Dimensionality: Every data dimension has to be equivalent in both structure and operational capabilities. A function applied to one dimension should be applicable to any other.
7. Dynamic Sparse Matrix Handling: A sparse matrix is one in which not every cell contains data. This guideline is related to the idea of nulls in relational databases and to the notion of compressing large files. OLAP systems should therefore accommodate varying storage and data-handling options.
8. Multi-user Support: Like executive information systems (EIS), OLAP systems need to support multiple concurrent users, including their individual views or slices of a common database.
9. Unrestricted Cross-dimensional Operations: The OLAP system should be able to recognize dimensional hierarchies and automatically perform roll-up and drill-down operations within a dimension or across dimensions.
10. Intuitive Data Manipulation: The system should enable consolidation path reorientation (pivoting), drill-down and roll-up, and other manipulations to be accomplished intuitively and directly through point-and-click and drag-and-drop actions.
11. Flexible Reporting: The system should enable its users to arrange columns, rows, and cells in a manner that facilitates easy manipulation, analysis, and synthesis of information.
12. Unlimited Dimensions and Aggregation Levels: The system is expected to accommodate at least fifteen (preferably twenty) data dimensions within a common analytical model.
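Two of the guidelines above, Dynamic Sparse Matrix Handling and the automatic roll-up along a dimensional hierarchy, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the cube contents, the dimension members, and the function names (`density`, `month_to_quarter`, `roll_up_to_quarter`) are hypothetical, and a real OLAP server would use far more elaborate storage. The sketch stores only the non-empty cells, keyed by their coordinates, and then aggregates months into quarters.

```python
from collections import defaultdict

# Hypothetical cube with dimensions (product, month, store).
# Only non-empty cells are stored, keyed by their coordinates.
cells = {
    ("Product A", "2001-01", "Store S1"): 500,
    ("Product A", "2001-02", "Store S1"): 450,
    ("Product B", "2001-04", "Store S2"): 120,
}

# Members of each dimension (assumed for this sketch).
PRODUCTS = ["Product A", "Product B"]
MONTHS = ["2001-01", "2001-02", "2001-03", "2001-04"]
STORES = ["Store S1", "Store S2"]


def density(stored_cells):
    """Fraction of all possible cells that actually hold a value."""
    possible = len(PRODUCTS) * len(MONTHS) * len(STORES)
    return len(stored_cells) / possible


def month_to_quarter(month):
    """Map 'YYYY-MM' to 'YYYY-Qn', one level up the time hierarchy."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up_to_quarter(stored_cells):
    """Aggregate monthly cells into quarterly cells (a roll-up)."""
    rolled = defaultdict(int)
    for (product, month, store), units in stored_cells.items():
        rolled[(product, month_to_quarter(month), store)] += units
    return dict(rolled)


if __name__ == "__main__":
    print("density:", density(cells))
    print("quarterly:", roll_up_to_quarter(cells))
```

In this toy cube only 3 of the 16 possible cells hold data, so storing the cells in a dictionary avoids allocating space for the 13 empty ones. Drill-down is simply the reverse navigation, from the quarterly totals back to the monthly cells.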
Later, in 1995, Codd added the following six requirements to the twelve basic guidelines:

1. Drill-through to Detail Level: The system has to allow a smooth transition from the multidimensional, pre-aggregated database to the detail record level of the source data warehouse repository.
2. Treatment of Non-normalized Data: The system should prevent calculations made within it from being affected by the external data serving as the source.
3. Storing OLAP Results: Write-capable OLAP tools should not be deployed on top of transactional systems.
4. Missing Values: The system should be able to ignore missing values, irrespective of their source.
5. Incremental Database Refresh: The system has to provide for incremental refreshes of the extracted and aggregated OLAP data.
6. SQL Interface: The OLAP system should be able to integrate into the existing enterprise environment.

The first product to perform OLAP queries was IRI's Express, released in 1970 (and taken over by Oracle in 1995). The term itself, however, did not appear until 1993, when it was coined by Codd, who has been described as "the father of the relational database". His paper resulted from a short consulting assignment that he undertook for the former Arbor Software (now Hyperion Solutions). The OLAP market experienced strong growth in the late 1990s, with dozens of commercial products coming to market. In 1998, Microsoft released its first OLAP server, Microsoft Analysis Services, which drove wide adoption of OLAP technology and moved it into the mainstream. In the mid-2000s, the open-source OLAP market began to establish itself, with several companies springing up with offerings.

Self Assessment Question(s) (SAQs)
For Section 9.2
1. What are the guidelines provided by Dr. E.F. Codd for an OLAP system?

9.3 Significance of OLAP

The top-down and bottom-up approaches to building a data warehouse each have their own advantages and disadvantages. A more practical approach is to build a conglomeration of supermarts with conformed and standardized data content. Even so, the traditional methods of analysis provided in a data warehouse are not sufficient. OLAP systems came into the picture to fulfill the following needs and requirements of users:

- Decision makers are no longer satisfied with one-dimensional queries such as "How many units of Product A were sold in Store X?" They require answers to complex questions such as "What is the impact of the promotion on sales of Product A for a specific time period, by individual store?" and "Is there any shift in brand loyalty for a specific product among a set of customers?"
- For effective analysis, users need an environment that presents a multidimensional view of data, providing the foundation for analytical processing through easy and flexible access to information.
- Without a solid system for true multidimensional analysis, the data warehouse is incomplete.
- The system should be able to recognize metrics along several dimensions and allow data to be viewed from different perspectives. It should also be able to drill down or roll up along each dimension.
- Irrespective of the complexity of the query, the query and analysis system must have consistent response times.
- The system should be capable of applying mathematical formulas and calculations to measures.

9.3.1 Benefits of OLAP

The important benefits of OLAP systems to business users are as follows:
- Enhanced productivity of business managers, executives, and analysts
- Self-sufficiency: OLAP users can run their own analyses without IT assistance, which reduces backlogs
- Faster delivery of IT applications
- Reduced query execution time and network traffic
- Ability to model real-world challenges with business metrics and dimensions

9.3.2 Characteristics of OLAP

The fundamental characteristics of an OLAP system are that it can:
- Let business users have a multidimensional and logical view of the data in the data warehouse
- Perform intricate calculations, comparisons, and aggregations of metrics
- Facilitate interactive query and complex analysis
- Present the results in meaningful ways, such as charts and graphs

Self Assessment Question(s) (SAQs)
For Section 9.3
1. Define an OLAP system and discuss its fundamental characteristics.
2. Discuss the benefits of OLAP systems to business users.

9.4 Features and Functions of OLAP

Fundamentally, OLAP is an information delivery system for the data warehouse. It complements the data warehouse by lifting its information delivery capabilities to new heights. The important features and functions of OLAP systems are listed below.

Basic features:
- Multidimensional analysis
- Drill-down and roll-up
- Multiple view modes
- Consistent performance
- Navigation in and out of details
- Easy scalability
- Fast response time for interactive queries
- Slice-and-dice or rotation
- Time intelligence (year-to-date, fiscal period)

Advanced features:
- Powerful calculations
- Drill-through across dimensions or details
- Derived data values through formulas
- Cross-dimensional calculations
- Sophisticated presentation and displays
- Application of alert technology
- Pre-calculation or pre-consolidation
- Collaborative decision making
- Report generation with agent technology

Dimensional analysis is an important feature of an OLAP tool.
OLAP tools use hypercubes to carry out multidimensional analysis when there are more than three dimensions. Hypercubes provide a method for representing views with more dimensions. The significant aspects of multidimensional analysis include drill-down and roll-up exercises and slice-and-dice operations. These analyses enable the user to examine summary numbers and to explore the components that make up each summary.

Self Assessment Question(s) (SAQs)
For Section 9.4
1. What are the important functions and features of online analytical processing (OLAP) systems?

9.5 OLAP Models

OLAP models can be broadly divided into two categories: ROLAP and MOLAP. ROLAP stands for relational online analytical processing and MOLAP stands for multidimensional online analytical processing. In either case, the information interface is still OLAP. In the MOLAP model, online analytical processing is implemented by storing the data multidimensionally, so that it is easily viewed in a multidimensional way. The data structure is fixed, so the logic for multidimensional analysis can be based on well-defined methods of establishing data storage coordinates.

There is another variation, DOLAP, which stands for desktop online analytical processing. DOLAP is a variation of ROLAP, meant to provide portability to users of online analytical processing. The processing is still online analytical processing; only the storage methodology is different. In the DOLAP methodology, multidimensional datasets are created and transferred to the desktop machine, which requires only the DOLAP software to be present on that machine.

9.5.1 MOLAP Model

This is the more traditional way of performing OLAP analysis. In MOLAP, data is stored in a multidimensional cube. The storage is not in a relational database but in proprietary formats.
For example, the array coordinates (ProductA, 2001/01, StoreS1, Channel05) identify the cell that stores a sales figure of 500 units for Product A, in month 2001/01, at store S1, through distribution channel 05. The array values indicate the location of the cells, and these cells are the intersections of the values of the dimension attributes. If you note how the cells are formed, you will realize that not all cells hold metric values. If a store is closed on Sundays, then the cells representing Sundays will all be zero.

Advantages:
- MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.
- All calculations are pre-generated when the cube is created. Hence, complex calculations are not only possible, but they also return quickly.

Disadvantages:
- Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself. This is not to say that the data in the cube cannot be derived from a large amount of data; it can. But in that case, only summary-level information will be included in the cube itself.
- Cube technology is often proprietary and does not already exist in the organization. Therefore, adopting MOLAP technology may require additional investment in human and capital resources.

9.5.2 ROLAP Model

This methodology relies on manipulating the data stored in a relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In this model, data is stored as rows and columns in relational form and is presented to users in the form of business dimensions. To hide the storage structure from the user and present the data multidimensionally, a semantic layer of metadata is created. The metadata layer supports the mapping of dimensions to the relational tables. Additional metadata supports summarizations and aggregations.
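The ROLAP idea described above, relational storage plus a metadata layer that maps business dimensions to tables, can be sketched with Python's built-in `sqlite3` module. This is only an illustrative sketch under stated assumptions: the table layout, the sample rows, the `DIMENSION_COLUMNS` mapping, and the function `units_by` are hypothetical and do not represent any particular ROLAP product.

```python
import sqlite3

# In-memory relational store with a small, hypothetical star-like schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales   (product_id INTEGER, store_id INTEGER,
                          month TEXT, units INTEGER);
    INSERT INTO product VALUES (1, 'Product A'), (2, 'Product B');
    INSERT INTO store   VALUES (1, 'Store S1'), (2, 'Store S2');
    INSERT INTO sales   VALUES
        (1, 1, '2001-01', 500),
        (1, 2, '2001-01', 300),
        (2, 1, '2001-01', 200),
        (1, 1, '2001-02', 450);
""")

# Stand-in for the metadata layer: business dimension -> relational column.
DIMENSION_COLUMNS = {
    "product": "product.name",
    "store": "store.name",
    "month": "sales.month",
}


def units_by(dimension, where_dimension=None, where_value=None):
    """Total units grouped by one dimension, optionally sliced on another."""
    group_col = DIMENSION_COLUMNS[dimension]
    sql = (
        f"SELECT {group_col}, SUM(sales.units) FROM sales "
        "JOIN product ON product.product_id = sales.product_id "
        "JOIN store ON store.store_id = sales.store_id "
    )
    params = ()
    if where_dimension is not None:
        # Column names come from the fixed mapping; the value is bound safely.
        sql += f"WHERE {DIMENSION_COLUMNS[where_dimension]} = ? "
        params = (where_value,)
    sql += f"GROUP BY {group_col} ORDER BY {group_col}"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    print(units_by("product"))                   # view by product
    print(units_by("store", "month", "2001-01"))  # slice on one month
```

The user asks in terms of dimensions ("store", "month"), and the sketch translates the request into a SQL `GROUP BY` query. Every report is generated as SQL against the relational tables, which is why query time grows with the size of the underlying data.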
A true ROLAP system has three distinct characteristics: it supports all the basic OLAP features and functions, it stores the data in relational form, and it supports some form of aggregation.

Advantages:
- ROLAP itself places no limitation on the amount of data.
- Relational databases often already come with a host of functions. Because ROLAP technologies sit on top of the relational database, they can leverage these functions.

Disadvantages:
- Because each ROLAP report is essentially a SQL query (or multiple SQL queries) against the relational database, query time can be long if the underlying data size is large.
- Because ROLAP technology relies mainly on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are traditionally limited by what SQL can do.

The choice between ROLAP and MOLAP depends on the complexity of users' queries. MOLAP is the choice for faster response and more intensive queries. However, MOLAP databases have a limit on the physical size of the data set they can handle, and on the number of dimensions they can handle while still providing reasonable performance. ROLAP has the advantage of running against large data sets.

Self Assessment Question(s) (SAQs)
For Section 9.5
1. Distinguish between the MOLAP and ROLAP models of online analytical processing systems.

9.6 Summary

OLAP is a category of software technology that enables managers to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Dr.
E.F. Codd, who used the term OLAP for the first time, set out twelve rules or guidelines for an OLAP system: multidimensional conceptual view, transparency, accessibility, consistent reporting performance, client/server architecture, generic dimensionality, dynamic sparse matrix handling, multi-user support, unrestricted cross-dimensional operations, intuitive data manipulation, flexible reporting, and unlimited dimensions and aggregation levels. Later, in 1995, Codd added six more requirements: drill-through to detail level, treatment of non-normalized data, storing OLAP results, missing values, incremental database refresh, and SQL interface.

OLAP systems are significant in making business decisions because of the need for multidimensional analysis and their ability to provide fast access and powerful calculations. Drill-down and roll-up, and slice and dice (or rotation), are the most important aspects of multidimensional analysis.

OLAP models can be broadly divided into two categories: ROLAP and MOLAP. ROLAP stands for relational online analytical processing and MOLAP stands for multidimensional online analytical processing. In MOLAP, data is stored in a multidimensional cube. In the ROLAP model, data is stored as rows and columns in relational form and is presented to users in the form of business dimensions.

9.7 Terminal Questions (TQs)
1. Compare the characteristics of an OLAP system with those of an OLTP system and a data warehouse system.
2. The contribution of Dr. E.F. Codd towards developing online analytical processing systems is commendable. Comment.
3. Discuss the salient features of MOLAP and ROLAP systems and explain the significance of each of these models.

9.8 Multiple Choice Questions (MCQs)
1. Which of the following categories of software technology enables managers to access information in a wide variety of views?
   a. OLTP
   b. OLAP
   c. Data mining
   d. None of the above
2. Who among the following introduced the concept of OLAP?
   a. E.F. Codd
   b. Inmon
   c. J. W. Winksey
   d. Robinson
3. Which of the following benefits does an organization derive from the use of an OLAP system?
   a. Ability to perform multidimensional analysis
   b. Fast access and powerful calculations
   c. Both (a) and (b)
   d. None of the above
4. DOLAP stands for
   a. Derived Online Analytical Processing
   b. Disk Oriented Application Programming
   c. Design Oriented Online Analytical Programming
   d. Desktop Online Analytical Processing
5. According to the way the basic data is stored, OLAP models are categorized into
   a. ROLAP and DOLAP
   b. ROLAP and MOLAP
   c. MOLAP and DOLAP
   d. None of the above
6. Which of the following provide a method for representing views with more dimensions?
   a. Hypercubes
   b. Multi-dimensional databases (MDBSs)
   c. OLAP engines
   d. DOLAP systems
7. Which of the following statements is incorrect?
   a. OLAP is defined on the basis of Codd's twelve guidelines
   b. OLAP characteristics include a multi-dimensional view of the data
   c. Hypercubes represent views with more dimensions
   d. None of the above
8. ROLAP stands for
   a. Rational Online Analytical Processing
   b. Rational Online Analytical Programming
   c. Relational Online Analytical Programming
   d. Relational Online Analytical Processing
9. Which of the following statements is true?
   a. Dimensional analysis is confined to three dimensions that can be represented by a physical cube
   b. DOLAP is a variant of ROLAP
   c. ROLAP, MOLAP, and DOLAP are the important categories of OLAP
   d. All the statements
10. MOLAP stands for
   a. Multi-level online analytical programming
   b. Multi-dimensional online analytical processing
   c. Multi-dimensional online analytical programming
   d. Multi-level online analytical processing
11. Which of the following is not a guideline for an OLAP system as proposed by Dr. E.F. Codd in his 12 rules?
   a. Generic dimensionality
   b. Flexible reporting
   c. SQL interface
   d. Intuitive data manipulation
12. Which of the following is a requirement added by Dr. E.F. Codd in addition to his twelve basic guidelines?
   a. Treatment of non-normalized data
   b. Intuitive data manipulation
   c. Multi-user support
   d. Client/server architecture
13. Which of the following are used by OLAP systems in multi-dimensional analysis?
   a. Hypercubes
   b. Hetro cubes
   c. Metro views
   d. Hyper views
14. Which of the following is (are) the significant aspect(s) of multi-dimensional analysis?
   a. Drill-down and roll-up
   b. Slice and dice
   c. Both (a) and (b)
   d. None of the above

9.9 Answers to SAQs, TQs, and MCQs

9.9.1 Answers to Self Assessment Questions (SAQs)

For Section 9.2
1. Dr. E.F. Codd, who used the term OLAP for the first time, set out twelve rules or guidelines for an OLAP system: multidimensional conceptual view, transparency, accessibility, consistent reporting performance, client/server architecture, generic dimensionality, dynamic sparse matrix handling, multi-user support, unrestricted cross-dimensional operations, intuitive data manipulation, flexible reporting, and unlimited dimensions and aggregation levels. Later, in 1995, Codd added six more requirements: drill-through to detail level, treatment of non-normalized data, storing OLAP results, missing values, incremental database refresh, and SQL interface.

For Section 9.3
1.
OLAP is a category of software technology that enables managers to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. The fundamental characteristics of an OLAP system include the ability to let business users have a multidimensional and logical view of the data in the data warehouse; to perform intricate calculations, comparisons, and aggregations of metrics; to facilitate interactive query and complex analysis; and to present the results in meaningful ways, such as charts and graphs.
2. The basic benefits of OLAP systems to business users are enhanced productivity, self-sufficiency in running their own analyses without IT assistance, faster delivery of IT applications, reduced query execution time, and the ability to model real-world challenges with business metrics and dimensions.

For Section 9.4
1. You may discuss the basic and advanced features of OLAP systems, along with their ability to perform drill-down and roll-up and slice-and-dice operations.

For Section 9.5
1. OLAP models can be broadly divided into two categories: ROLAP and MOLAP. ROLAP stands for relational online analytical processing and MOLAP stands for multidimensional online analytical processing. In MOLAP, data is stored in a multidimensional cube. In the ROLAP model, data is stored as rows and columns in relational form and is presented to users in the form of business dimensions. You can discuss the differences between these models as provided in Section 9.5.

9.9.2 Answers to Terminal Questions (TQs)
1. The characteristics of OLAP tools in comparison with those of OLTP systems and a data warehouse are provided below.

   S#  Characteristics                   OLTP             Data Warehouse          OLAP
   1.  Basic Operation                   Update           Report                  Analyze
   2.  Level of Analytical Requirements  Low              Medium                  High
   3.  Data per Transaction              Very small       Small to large          Large
   4.  Type of Data                      Detailed         Detailed and summary    Summary
   5.  Timeliness of Data                Must be current  Current and historical  Current and historical

2. You can discuss the initial 12 rules prescribed by Codd and then the additional 6 requirements. Also discuss how each of these rules plays a significant role in making OLAP tools beneficial for their business users.
3. You may discuss the advantages and limitations of both models of OLAP systems, ROLAP and MOLAP, as discussed in Section 9.5.

9.9.3 Answers to Multiple Choice Questions (MCQs)
1. Ans: b
2. Ans: a
3. Ans: c
4. Ans: d
5. Ans: b
6. Ans: a
7. Ans: d
8. Ans: d
9. Ans: b
10. Ans: b
11. Ans: c
12. Ans: a
13. Ans: a
14. Ans: c