
BT 21 – 01

DATA WAREHOUSING AND MINING

1) Write short notes on:


(1) Operational versus Information data
(2) Multidimensional analysis and OLAP
(3) Operational Team

Ans: 1) Operational versus Information data: Operational data is the data you
use to run your business. This data is what is typically stored, retrieved, and
updated by your online transaction processing (OLTP) system. An OLTP system
may be a reservation system, an accounting application, or an order entry
application.
Operational data is typically stored in a relational database, but
may be stored in legacy hierarchical or flat file formats as well. Some of the
characteristics of operational data include:
 Updated often and through online transactions (a sketch of such a transaction follows this list)
 Non-historical data (not more than three to six months old)
 Optimized for transactional processing
 Highly normalized in the relational database for ease of update and maintenance
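
A minimal sketch of such an online transaction, in the spirit of the reservation-system
example (the table and column names here are assumptions for illustration; the original
names none):

-- decrement seat inventory for one booking, then commit
Update seat_inventory
   set seats_available = seats_available - 1
 where flight_no = 'BA123'
   and flight_date = to_date('01-JAN-1998', 'DD-MON-YYYY');
Commit;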

2) Multidimensional analysis and OLAP: Relational databases store data in a
two-dimensional format: tables of data represented by rows and columns.
Multi-dimensional analysis solutions, commonly referred to as On-Line Analytical
Processing (OLAP) solutions, offer an extension to the relational model to provide a
multi-dimensional view of the data. Multi-dimensional data structures provide
both a mechanism to store the data and a way for a business analyst to view actual
sales versus forecast numbers across the different dimensions in a very timely,
very powerful fashion. Multi-dimensional solutions provide the ability to:
 Analyze potentially large amounts of data with very fast response times.
 "Slice and dice" through the data, and drill down or roll up through various
dimensions as defined by the data structure (a sample query follows this list).
 Quickly identify trends or problem areas that would otherwise be missed.
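
As a sketch (the table and column names are assumptions for illustration), comparing
actual sales against forecast across two dimensions amounts to grouping by those
dimensions; drilling down adds a column to the GROUP BY clause, and rolling up
removes one:

-- actual versus forecast sales, viewed by region and product
Select region, product,
       sum(actual_sales)   actual,
       sum(forecast_sales) forecast
  from sales_summary
 group by region, product;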

3) Operational Team: The overall structure of the business support and management
team should mimic the inherent structure of the business. For most organizations, two
groups are key to the success of the project – senior management and "working"
management. These two groups represent the strategic and tactical perspectives,
respectively.
2) Explain the four key components to develop a business-driven data
warehouse.

Ans: Developing a business-driven data warehouse revolves around four keys:
the organization structure, change facilitation, expectation management and a
proven methodology. Each of these components has a distinct purpose but must
work with the others to support the guiding principles and philosophy previously
mentioned.
A. Organization Structure: The overall structure of the business support and
management team should mimic the inherent structure of the business. For most
organizations, two groups are key to the success of the project – senior management
and "working" management. These two groups represent the strategic and
tactical perspectives, respectively.
 A senior-level business sponsor/owner must be identified. This individual
must have a respected voice within the organization and the ability and
authority to lead the project and make decisions.
 A steering committee consisting of a manageable number of senior business
executives. This cross-functional group sets the longer-term direction of the
data warehouse, decides the priority of applications and has project approval,
funding and acceptance responsibility.
 A user group that contains a select group of "working management" that will
be involved in the details of the project.
B. Managing the Expectations of Users: Data warehouses are good at some things
and not good at others. The key is to scope for success, education and
communication. If the business receives what is expected, the probability that
your data warehouse will be viewed as a success will be greatly enhanced.
C. Methodology and Tools: The final key component of our approach is the
methodology and business tools utilized to execute the project. This area of the
methodology must be clearly understood by the business people involved and tie
directly back to the goals of the data warehouse. The following simple tools and
methods have been very successful in ensuring the project stays on track with the
business objectives:
1. Requirements gathering:
Performance Measure-Based Requirements. Business performance measures
derived from the strategies, goals, objectives and direction of the business are the
absolute key to the business success of any data warehouse.
Enterprise Requirements Gathering. Even if the initial scope of the data
warehouse is limited, an understanding of the enterprise-performance measures
can provide many short and long-term benefits including positioning the data
warehouse for growth and flexibility as well as helping plan and prioritize future
business direction.
2. Scope for success: Identify a business priority with a manageable scope to get
it done quickly and supply value to the organization. Remember, data warehousing
is an iterative process of supplying business value.
3. Business terms for business people: There will be several layers of metadata
within the applications. Providing the business users with a simple interface to the
metadata will be essential for understanding the data.

3) What is a scope statement? Discuss the major elements in the breakdown of a
scope statement.

Ans: One of the important techniques a project can use is called a scope statement. It
is a written document by which the DSS team defines the job at hand and
differentiates the deliverables from the rest of the work. These are the major
elements in the breakdown of a scope statement:

 Project Title: Every project should have a clear name and description of
what you are trying to accomplish.
 Project Justification: A clear description of why this project is being done.
What is the goal of the project?
 Project Key Deliverables: A list of key items that must be accomplished so
this project can be completed. What must be done for us to consider the
project done?
 Project Objectives: An additional list of success criteria. These items must be
measurable; a good place to put any time, money, or resource constraints.

Think of a scope statement as your first stake in the ground. What's important is
that a scope statement provides a documented basis for building a common
understanding among all the stakeholders of the project at hand.

4) Discuss the probability and risk factors in data warehouse project
implementation with an example.

Ans: Probability and Risk: In life, most things result in a bell curve. Consider a
sample bell curve that measures the likelihood of on-time project delivery: the
graph measures the number of projects delivered late, on time, and early.
The lesson is that trying to predict a single point will never work; the law of
averages works against us.

Percentage of Projects Delivered    Days from Expected Delivery
25                                  33 days early or 35 days late
50                                  25 days early or 25 days late
75                                  8 days early or 13 days late

Bell Curve Data Summary

Task                        Subtask               Best Case   Most Likely   Worst Case
Choosing technical editor   Determine skill set   1.0         3.0           5.0
                            Screen candidates     1.0         2.0           3.0
                            Choose candidate      0.5         1.0           2.0

Three-Point Time Estimate Worksheet

We should predict project time estimates like we predict rolling dice.

Experience has taught us that when a pair of dice is rolled, the most likely number
to come up is seven. When you look at the alternatives, the odds of a number other
than seven coming up are lower.
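
The worksheet gives three estimates per subtask but does not say how to combine
them into a single figure. One common convention (an assumption here; the original
does not state it) is the PERT weighted average, which counts the most likely case
four times as heavily as the extremes:

Estimate = (best case + 4 x most likely + worst case) / 6

For the "determine skill set" subtask this gives
(1.0 + 4 x 3.0 + 5.0) / 6 = 18.0 / 6 = 3.0 days.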

Risk Management: When a project gets into trouble, it is typically due to one of
three situations:

 Assumptions – you get caught by unvoiced assumptions, which were never
spelled out.
 Constraints – you get caught by restricting factors, which were not fully
understood.
 Unknowns – items you could never predict, be they acts of God or of humans.

5) What is a Bitmapped Index? When should bitmapped indexes be used?

Ans: This alternative indexing mechanism has become part of the industry standard for
query-intensive applications such as DSS, EIS and OLAP. Bitmapped indexing involves
building a stream of bits: each bit relates to a column value in a single row of a
table. Suppose you build a bitmapped index on the PERSON table shown below.
Using bitmapped indexes, the SQL statement to create the index would be:

Create bitmap index person_region on person (region);

Suppose the region could have the values north, east, west, or south. Oracle
goes through a two-step process in building the index:

1. Scans the table, sorts the column values for region, and decides how many
bitmaps will be required for the index, based on the number of distinct values
found in the column. In this case, four unique values exist.
2. Builds the number of bit streams to populate the index decided in step 1. The
REGION column bitmapped index is shown in the table below. A similar
bitmapped index on the PENSIONED column is created with the following statement:

Create bitmap index person_pensioned on person (pensioned);

Storage parameters and a tablespace name can be explicitly defined when you
create a bitmap index. The statement that created the REGION index would
become:

Create bitmap index person_region on person (region)
    tablespace indexes_prd
    pctfree 20 initrans 4 maxtrans 16
    storage (initial 1m next 1m pctincrease 0);
Column       Data Contained
Pin          number
Region       varchar2(1)
Hire_year    varchar2(4)
Pensioned    varchar2(2)

PERSON table

Row   REGION   North Bitmap   East Bitmap   West Bitmap   South Bitmap
1     North    1              0             0             0
2     East     0              1             0             0
3     West     0              0             1             0
4     West     0              0             1             0
5     East     0              1             0             0
6     West     0              0             1             0
7     South    0              0             0             1
8     North    1              0             0             0
9     East     0              1             0             0
10    South    0              0             0             1

REGION Bitmapped Index Entries

When bitmapped indexes should be used:

Transaction tables in the operational system usually undergo constant record
creation and update activity. The overhead involved in updating a bitmapped
index is higher than for Oracle's traditional indexing mechanisms, so bitmapped
indexes are best reserved for the static, query-intensive tables of the warehouse
rather than for busy operational tables.

TIP:
Columns in this type of table that have low cardinality are good candidates
for having bitmapped indexes.
Cardinality is a measurement of the number of unique values in a column
of a table, compared to the total number of rows in that table.
In some circumstances, low cardinality means the number of unique
column values is less than 5 percent of the rows in the table.
If the degree of cardinality of a column is <= .1 percent (yes, 1/10 of 1
percent), it is an ideal candidate for a bitmapped index.
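
As a sketch of the payoff (assuming the two indexes created above, and assuming
PENSIONED holds values such as 'Y' and 'N', which the original does not state), a
query that filters on both low-cardinality columns lets Oracle merge the REGION
and PENSIONED bitmaps with a fast bitwise AND before ever touching the table rows:

-- count pensioned persons in the East using only the two bitmaps
Select count(*)
  from person
 where region = 'East'
   and pensioned = 'Y';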
BT 21 – 02

DATA WAREHOUSING AND MINING

1) Explain Critical Path analysis.

Ans: Critical Path Analysis: Critical paths are yet another form of risk within a
project. For example, there are inherent dependencies among many project
activities. A chapter must be written before it can be reviewed. A technical editor
must be selected before the editing cycle of a chapter can be completed. These are
examples of dependency analysis.
Let's say that we are working on a project with three unique paths of
activities associated with it. Each unique path (A, B, C) of activities is represented
based on its dependencies. Each cell represents the number of days that activity
would take. By adding the cells in each row together, we can tell the duration of
each path. This is shown in the table below.

Critical Path Analysis

Unique Task   Part 1   Part 2   Part 3   Total
Start A       1        3        4        8 days
Start B       1        2        3        6 days
Start C       15       5        -        20 days

Start A represents a path of the project with three steps, which will take a total of
eight days. Start B represents a path of the project with three steps, which will
take a total of six days. Start C represents a path with two steps, which will take a
total of twenty days.
The critical path is Start C. We must begin
this path as soon as possible. In fact, it tells us the soonest this project can be done
is 20 days. If we do not start the activity that takes 15 days first, the end of the
entire project will slip by one day for each day we wait.

2) Write short notes on multidimensional queries.

Ans: Multidimensional Queries: A multidimensional query accesses data by more
than one dimension, i.e., by more than one column or criterion. In a data warehouse
environment, users rarely want to access data by only one column or dimension,
such as finding the number of customers in the state of CA. They more commonly
want to ask complex questions such as how many customers in the state of CA have
purchased products B and C in the last year, and how does that compare to the year
before. Over 90 percent of data warehousing queries are multidimensional in nature,
using multiple criteria against multiple columns. OMNIDEX multidimensional
indexes are specially designed for optimal multidimensional query performance.
They provide unlimited multidimensional access, allowing complex queries using
any number of criteria or dimensions for the most flexible data exploration.
OMNIDEX uses index-only processing whenever possible. When a
multidimensional query is requested based on criteria from multiple columns or
dimensions, the required values are instantly accessed within the indexes, and the
results joined at the index level.
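
As a sketch, the complex CA question above might be phrased in SQL as follows
(the table and column names are assumptions for illustration; the original names none):

-- customers in CA who purchased both product B and product C in the last year
Select c.customer_id
  from customers c, orders o
 where o.customer_id = c.customer_id
   and c.state = 'CA'
   and o.product_code in ('B', 'C')
   and o.order_date >= add_months(sysdate, -12)
 group by c.customer_id
having count(distinct o.product_code) = 2;

Every criterion here – state, product and date – is a separate dimension, which is
exactly the kind of query an index-level join can resolve before any table rows are read.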

3) Discuss the responsibilities of a Data Warehouse Project Manager.

Ans: The most important responsibility of a great project manager is the ability to
keep an eye on milestones constantly and never lose sight of the project commitments.
Responsibilities:
 This person is not a corporate politician but, instead, a person with strong
team-building skills. The project manager understands how to make a group of
people work together as one toward a common goal.
 Must have strong organization and planning skills.
 Must be able to plan and allocate resources. People will quit a project, become
burned out over time, or just want a vacation. A good project manager deals
with these common issues so they do not reach crisis proportions.
 Must have strong management skills and an understanding of technology. On the
job, the project manager will constantly be dealing with team members who
have varied skill sets. He or she must understand technical and non-technical
jargon with equal ease.
 Must know how to deliver bad news and how to circulate good news.
 Must be able to set up and control a meeting.
 Must control the project scope. The project manager must be able to
control scope creep. Scope creep happens when requests come in that will
affect the delivery date of a project. The project manager must know how to
take this problem to the appropriate ears, and then he or she must obtain the
necessary documentation to support the approval of the scope creep.
 Must be able to do appropriate risk management. For example, when the
project manager is working with new technology, he or she must understand
that this new technology may affect the project plans.

4) What is the relevance of a Work Breakdown Structure in a Data Warehouse
Project? Explain it with an example and a neat diagram.
Ans: Once you have completed your project scope statement, another technique we find
useful is called a work breakdown structure. We use this technique to help fill in
any gaps of missing items.
A work breakdown structure is exactly what it sounds like: a breakdown of all the
work that must be done. This includes all deliverables.

For example: if you are expected to provide the customer with weekly status
reports, this should be in the structure. If you expect to hold a kick-off meeting,
that should also be in the structure. Both these items would fall under the category
of project management.
When we create a work breakdown structure, we show it to our customer and
say: "If you don't see it in here, don't assume it's being done." A sample structure
is shown below to illustrate this point.

[Figure: Sample Work Breakdown Structure – a tree with the project at the top,
branching into project management and several deliverables, each of which is
broken down into activities.]

5) Briefly explain Star Queries. How can Star Queries be optimized?

Ans: The star schema is a common method used to store data in the warehouse. We
saw the difference in the way data is stored in the data warehouse compared to
relational operational systems. The DSS query processes large amounts of data
and can be formulated and passed to Oracle in ways never dreamed of by the
personnel who designed and built the warehouse. In operational systems, queries
end up navigating through a series of tables joined together by their common
columns.
In the operational system a query may weave its way through the network
of tables. The Oracle 7 optimizer processes queries based on a star schema
differently than a classic relational schema. Prior to release 7.3, Oracle, like most
other relational database vendors, processed joins two tables at a time. Using the
join columns, Oracle would create a subset of the data of the tables being joined,
and move on to processing that subset against another table in the join. It would
move through the tables involved in the join until the table list was exhausted.
Then, armed with the result set from the query, the qualifying data would be
presented. The result set is the collection of data that has qualified for display
based on the criteria specified in the join condition; it comes from the data in the
tables involved in the join. One of the optimization tricks that assists performance
in star queries entails deferring the join against the fact table until the end of the
processing. The fact table usually contains more data than the dimension tables,
which adds impetus to this join approach.
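
A sketch of a typical star query (the fact and dimension table names are assumptions
for illustration; the original gives none). The small dimension tables carry the filter
criteria, and the join against the large fact table can be deferred to the end of
processing:

-- sales by store and product for one fiscal year in one region
Select s.store_name, p.product_name, sum(f.sales_amount)
  from sales_fact f, store_dim s, product_dim p, time_dim t
 where f.store_id   = s.store_id
   and f.product_id = p.product_id
   and f.time_id    = t.time_id
   and t.fiscal_year = 1997
   and s.region = 'West'
 group by s.store_name, p.product_name;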

6) What are Read-Only tablespaces? What are the requirements for Read-Only
tablespaces?

Ans: Read-Only tablespaces are useful for improving the response time for the most
static data in the data warehouse. A tablespace is a collection of one or more data
files that serve as a repository for data in the Oracle 7 database. A tablespace can
contain one or many data files; each data file can belong to only one tablespace.
When a tablespace is in read-only mode, the data residing in that tablespace
cannot be changed: new data cannot be added and existing data cannot be deleted.

The requirements for Read-Only tablespaces are:

 The tablespace must be online and have no active transactions.
 To take advantage of read-only tablespaces, the initialization parameter file entry
for COMPATIBLE must be set to:
Compatible = 7.n.0.0, where n has the value 1, 2 or 3.
 The tablespace cannot be the SYSTEM tablespace or any other tablespace
containing an active rollback segment.
 The tablespace cannot currently be in backup mode if you are running the
database with media recovery enabled.
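
Once these requirements are met, a tablespace is switched into and out of read-only
mode with ALTER TABLESPACE (the tablespace name here is an assumption for
illustration):

-- freeze the static warehouse data
Alter tablespace history_data read only;

-- switch back when the data must be refreshed
Alter tablespace history_data read write;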

7) What is Data Mining? How does it work? What are the elements of Data
Mining?

Ans: Data mining can generally be called data or knowledge discovery, or we can
say that it is the process of analyzing data. It allows users to analyze data from
many different perspectives and summarize it into useful information that can
be used to increase revenue, cut costs, or both. Data mining software is one of a
number of analytical tools for analyzing data. It allows users to analyze data from
many different dimensions or angles, categorize it, and summarize the
relationships identified. Technically, data mining is the process of finding
correlations or patterns among dozens of fields in large relational databases.
Data mining software analyzes relationships and patterns in stored
transaction data based on open-ended user queries. Several types of analytical
software are available: statistical, machine learning and neural networks.
Generally, any of four types of relationships are sought:

 Classes: Stored data is used to locate data in predetermined groups.
 Clusters: Data items are grouped according to logical relationships or consumer
preferences.
 Associations: Data can be mined to identify associations.
 Sequential patterns: Data is mined to anticipate behavior patterns and trends.

Elements of Data Mining:

 Extract, transform and load transaction data onto the data warehouse system.
 Store and manage the data in a multidimensional database system.
 Provide data access to business analysts and information technology
professionals.
 Analyze the data with application software.
 Present the data in a useful format, such as a graph or a table.