
Q1) OLAP is an approach to swiftly answer multi-dimensional analytical queries.[1] OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining.[2] Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM),[3] budgeting and forecasting, financial reporting and similar areas, with new applications coming up, such as agriculture.[4] The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).
The core of any OLAP system is an OLAP cube (also called a 'multidimensional cube' or a hypercube). It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table and dimensions are derived from the dimension tables. Each measure can be thought of as having a set of labels, or metadata, associated with it. A dimension is what describes these labels; it provides information about the measure.
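To make the cube idea concrete, here is a minimal sketch of how measures from a fact table are aggregated along dimensions. It uses pandas and made-up sales data purely for illustration; a real OLAP engine pre-computes such aggregates across many dimensions.

    import pandas as pd

    # Toy fact table: each row is one sale. "amount" is the measure;
    # "product", "region" and "quarter" are the dimensions (hypothetical data).
    facts = pd.DataFrame({
        "product": ["A", "A", "B", "B"],
        "region":  ["East", "West", "East", "West"],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "amount":  [100, 150, 200, 120],
    })

    # A two-dimensional slice of the cube: total amount by product x region.
    cube_slice = facts.pivot_table(values="amount", index="product",
                                   columns="region", aggfunc="sum")
    print(cube_slice)

Each cell of the resulting grid is an aggregated measure at the intersection of two dimension values.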

Q2)

OLTP System (Online Transaction Processing; the operational system) vs. OLAP System (Online Analytical Processing; the data warehouse):

Source of data:
OLTP: Operational data; OLTPs are the original source of the data.
OLAP: Consolidated data; OLAP data comes from the various OLTP databases.

Purpose of data:
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving, and decision support.

What the data reveals:
OLTP: A snapshot of ongoing business processes.
OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates:
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries:
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing speed:
OLTP: Typically very fast.
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements:
OLTP: Can be relatively small if historical data is archived.
OLAP: Larger, due to the existence of aggregation structures and history data; requires more indexes than OLTP.

Database design:
OLTP: Highly normalized, with many tables.
OLAP: Typically de-normalized, with fewer tables; uses star and/or snowflake schemas.

Backup and recovery:
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.

Q3) Orange Tool: It is a component-based data mining and machine learning software suite, featuring a friendly yet powerful and flexible visual programming front end for exploratory data analysis and visualization, along with Python bindings and libraries for scripting (a minimal scripting example follows the list below).
It includes a comprehensive set of components for data preprocessing (features like scaling and filtering), modeling, evaluation and reporting.
1. It is implemented to combine C++'s speed with Python's flexibility.
2. Its GUI is built upon a cross-platform framework.
3. Orange is distributed free under the GPL.
4. It is maintained and developed at the Faculty of Computer and Information Science.
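As a minimal illustration of the scripting interface (a sketch assuming Orange 3 is installed; the iris dataset ships with Orange), a dataset can be loaded and inspected in a few lines:

    import Orange

    # Table loads bundled datasets by name, or .tab/.csv files by path.
    data = Orange.data.Table("iris")
    print(len(data), "instances")
    print("attributes:", [a.name for a in data.domain.attributes])
    print("class variable:", data.domain.class_var.name)
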
Q4) WAP to select a file and some attributes.
Ans) FILE:
Input: None.
Output: Examples (example table); an attribute-valued dataset read from the input file.
Description: It reads the input data and sends the dataset to the output channel. It maintains a history of the most recently accessed data files.
SELECT ATTRIBUTES:
Input: Examples (example table); an attribute-valued dataset.
Output: Examples (example table); an attribute-valued dataset with a domain composed using the widget.
Description: The Select Attributes widget is normally used to compose the data domain that is shown.
Data Table:
Input: Examples (example table); an attribute-valued dataset.
Output: None.
Description: The Data Table widget takes one or more datasets on its input and presents them in a spreadsheet.
Steps (a scripting equivalent is sketched after the list):
1. Select File from the Data tab.
2. Double-click the File icon and select the adult.tab dataset from the file options.
3. Click on the Reload button and close the window.
4. Click on Select Attributes in the Data tab.
5. Connect the File icon's output to the input of the Select Attributes icon.
6. Double-click the Select Attributes icon.
7. A "Select Attributes" window will open. Select the attributes that you want to filter out and click the Apply button.
8. Select Data Table from the Data tab to create one.
9. Connect the output of the Select Attributes icon to the input of the Data Table.
10. Double-click the Data Table icon and view the selected data.
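A rough scripting equivalent of the File -> Select Attributes -> Data Table flow, as a sketch assuming Orange 3 and that an adult.tab file is reachable (older Orange releases bundled it; otherwise pass a path to any tab-separated file). The attribute names "age" and "education" are just illustrative picks:

    import Orange

    data = Orange.data.Table("adult")  # reads adult.tab if available

    # Keep only the chosen attributes, preserving the class variable.
    keep = [a for a in data.domain.attributes
            if a.name in ("age", "education")]
    new_domain = Orange.data.Domain(keep, data.domain.class_var)
    subset = data.transform(new_domain)

    print(subset.domain)  # the composed domain
    print(subset[:5])     # first rows, as the Data Table widget would show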

Q5) Decide whether an attribute will be used and how.

Ans) Data Table:
Input: Examples (example table); an attribute-valued dataset.
Output: None.
Description: The Data Table widget takes one or more datasets on its input and presents them in a spreadsheet. The pipeline is the same as in Q4; the Select Attributes widget is where you decide whether and how each attribute is used.
Steps:
1. Select File from the Data tab.
2. Double-click the File icon and select the adult.tab dataset from the file options.
3. Click on the Reload button and close the window.
4. Click on Select Attributes in the Data tab.
5. Connect the File icon's output to the input of the Select Attributes icon.
6. Double-click the Select Attributes icon.
7. A "Select Attributes" window will open. Select the attributes that you want to filter out and click the Apply button.
8. Select Data Table from the Data tab to create one.
9. Connect the output of the Select Attributes icon to the input of the Data Table.
10. Double-click the Data Table icon and view the selected data.
Q6) Select data from a particular table that fulfills some condition.
Ans) Input: Examples (example table); an attribute-valued dataset.
Output: An attribute-valued dataset composed of the instances from the input dataset that match the user-defined condition.
Description: This widget allows the user to select a subset of the data from the input dataset. Instances that match the condition defined over the set of data attributes and the selection rule are placed on the output channel.
Steps (a scripting sketch follows the list):
1. Select File from the Data tab and select the core tab via the browse button.
2. Click on Select Data in the Data tab.
3. Double-click the Select Data icon and enter the condition to be checked to select data.
4. Create a Data Table to view the data.
5. Connect the output of File to the input of Select Data, and the output of Select Data to the input of the Data Table.
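A scripting counterpart of the Select Data widget, sketched under the assumption of Orange 3 and the bundled iris dataset (the condition "sepal length > 6.0" is an arbitrary illustration):

    from Orange.data import Table
    from Orange.data.filter import FilterContinuous, Values

    data = Table("iris")

    # Condition over one attribute: keep rows where sepal length > 6.0.
    cond = FilterContinuous("sepal length", FilterContinuous.Greater, ref=6.0)

    # Values combines one or more conditions and acts as a callable filter.
    subset = Values([cond])(data)
    print(len(data), "->", len(subset), "instances match the condition")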

Q7) Data mining: Data mining, the extraction of hidden predictive information from large
databases, is a powerful new technology with great potential to help companies focus on
the most important information in their data warehouses. Data mining tools predict future
trends and behaviors, allowing businesses to make proactive, knowledge-driven
decisions. The automated, prospective analyses offered by data mining move beyond the
analyses of past events provided by retrospective tools typical of decision support
systems. Data mining tools can answer business questions that traditionally were too time
consuming to resolve. They scour databases for hidden patterns, finding predictive
information that experts may miss because it lies outside their expectations.
Most companies already collect and refine massive quantities of data. Data mining
techniques can be implemented rapidly on existing software and hardware platforms to
enhance the value of existing information resources, and can be integrated with new
products and systems as they are brought on-line. When implemented on high
performance client/server or parallel processing computers, data mining tools can analyze
massive databases to deliver answers to questions such as, "Which clients are most likely
to respond to my next promotional mailing, and why?"
This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today's business environment, along with a basic description of how data warehouse architectures can evolve to deliver the value of data mining to end users.

Q8) Data warehouse:
Q9) Database and Data Warehouse
Application databases are OLTP (On-Line Transaction Processing) systems where every transaction has to be recorded, and super-fast at that. Consider the scenario where a bank ATM has disbursed cash to a customer but was unable to record this event in the bank records. If this started happening frequently, the bank wouldn't stay in business for too long. So the banking system is designed to make sure that every transaction gets recorded within the time you stand before the ATM. This system is write-optimized, and you shouldn't crib if your analysis query (read operation) takes a lot of time on such a system.
A Data Warehouse (DW), on the other hand, is a database (yes, you are right, it's a database) that is designed for facilitating querying and analysis. Often designed as OLAP (On-Line Analytical Processing) systems, these databases contain read-only data that can be queried and analysed far more efficiently as compared to your regular OLTP application databases. In this sense an OLAP system is designed to be read-optimized.
Separation from your application database also ensures that your business intelligence solution is scalable (your bank and ATMs don't go down just because the CFO asked for a report), better documented and managed (god help the novice who is given the application database diagrams and asked to locate the needle of data in the proverbial haystack of table proliferation), and can answer questions far more efficiently and frequently.
Creation of a DW leads to a direct increase in quality of analyses as the
table structures are simpler (you keep only the needed information in
simpler tables), standardized (well-documented table structures), and often
denormalized (to reduce the linkages between tables and the corresponding
complexity of queries). A DW drastically reduces the 'cost-per-analysis' and
thus permits more analysis per FTE. Having a well-designed DW is the
foundation successful BI/Analytics initiatives are built upon.
If you are still running your reports off the main application
database, answer this simple question: Would the solution still work next year
with 20% more customers, 50% more business, 70% more users, and 300%
more reports? What about the year after next? If you are sure that your
solution will run without any changes, great!! However, if you have already
budgeted to buy new state-of-the-art hardware and 25 new Oracle licenses
with those partition-options, and the 33 other cool-sounding features, good
luck to you. (You can probably send me a ticket to Hawaii, since it's gonna cost you just a minute fraction of your budget.)
It's probably simpler and more sensible to create a new DW
exclusively for your BI needs. And if you are cash strapped, you could easily
do that at extremely low costs by using excellent open source databases like
MySQL.
"A computer database is a structured collection of records or data that is
stored in a computer system so that a computer program or person using a
query language can consult it to answer queries[1]. The records retrieved in
answer to queries are information that can be used to make decisions."
A data warehouse is a specially setup database designed to hold
large amounts of data for reporting purposes. While a normal database is
optomized for transactional activity (while keeping a small amount of history)
a data warehouse will be optomized for large scale reporting.
Within a data warehouse data from several systems will typically
be merged together to present a global enterprise view. Data warehouses will
also typically keep a very long history from several years to the entire life of
the company so that very long term trends can be viewed.
A data warehouse has a number of characteristics that differentiate warehouses and marts from conventional operational databases. Virtually all of these have some impact on data modeling.
A database is an integrated collection of logically related records or files consolidated into a common pool that provides data for one or more uses.

Q10) Logistic regression:

In statistics, logistic regression (sometimes called the logistic model or logit model) is used for prediction of the probability of occurrence of an event by fitting data to a logistic curve. It is a generalized linear
model used for binomial regression. Like many forms of regression analysis, it
makes use of several predictor variables that may be either numerical or
categorical. For example, the probability that a person has a heart attack
within a specified time period might be predicted from knowledge of the
person's age, sex and body mass index. Logistic regression is used
extensively in the medical and social sciences fields, as well as marketing
applications such as prediction of a customer's propensity to purchase a
product or cease a subscription.
An explanation of logistic regression begins with an explanation of the logistic function:
ƒ(z) = 1 / (1 + e^(−z))
A graph of the function is shown in figure 1. The input is z and the output is ƒ(z). The
logistic function is useful because it can take as an input any value from negative infinity
to positive infinity, whereas the output is confined to values between 0 and 1. The
variable z represents the exposure to some set of independent variables, while ƒ(z)
represents the probability of a particular outcome, given that set of explanatory variables.
The variable z is a measure of the total contribution of all the independent variables used
in the model and is known as the logit.
The variable z is usually defined as
z = β0 + β1·x1 + β2·x2 + β3·x3 + …
where β0 is called the "intercept" and β1, β2, β3, and so on, are called the "regression
coefficients" of x1, x2, x3 respectively. The intercept is the value of z when the value of all
independent variables is zero (e.g. the value of z in someone with no risk factors). Each
of the regression coefficients describes the size of the contribution of that risk factor. A
positive regression coefficient means that the explanatory variable increases the
probability of the outcome, while a negative regression coefficient means that the
variable decreases the probability of that outcome; a large regression coefficient means
that the risk factor strongly influences the probability of that outcome, while a near-zero
regression coefficient means that that risk factor has little influence on the probability of
that outcome.
Logistic regression is a useful way of describing the relationship between one or more
independent variables (e.g., age, sex, etc.) and a binary response variable, expressed as a
probability, that has only two values, such as having cancer ("has cancer" or "doesn't
have cancer").
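To make the formulas concrete, here is a small worked sketch. The coefficients are made up for illustration (not fitted to any real data): an intercept b0 and coefficients for age, sex (1 = male) and body mass index.

    import math

    def logistic(z):
        """Logistic function: maps any real z to a probability in (0, 1)."""
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical regression coefficients, purely illustrative.
    b0, b_age, b_sex, b_bmi = -7.0, 0.07, 0.5, 0.1

    age, sex, bmi = 55, 1, 28
    z = b0 + b_age * age + b_sex * sex + b_bmi * bmi    # the logit
    print(f"z = {z:.2f}, P(event) = {logistic(z):.3f}")  # z = 0.15, P ~ 0.537

Each coefficient contributes to z exactly as described above: a positive coefficient (e.g., b_age) pushes the probability of the outcome up, while the negative intercept pulls it down.
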
Q11) Program to show outliers:

Input: select any file (e.g., iris.tab, which matches the sepal length label used below).

Steps (a scripting sketch follows the list):
1. Select File from the Data menu and load the file.
2. Click on the Unsupervised tab and select Example Distance from its menu.
3. Connect File to Example Distance.
4. Click and open Example Distance and select:
   Distance metric: Euclidean
   Example label: sepal length
5. Again from the Data tab, select Outliers and connect it to Example Distance.
6. Click on Outliers and set, for example:
   Distance metric: Euclidean
   Nearest neighbours
   Outlier Z: 4.0
7. Select a Data Table and connect it to Outliers; a table will now appear. There we can connect the outliers, the inliers, or either one of them with the examples, as shown in the figure.
8. Now open the Data Table. It will give the result for the outliers, as in the figure.
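The Example Distance -> Outliers pipeline can be approximated in a script. This is a sketch, assuming Orange 3 and the iris dataset, that flags instances whose mean distance to their k nearest neighbours has a high z-score:

    import numpy as np
    from Orange.data import Table

    data = Table("iris")
    X = data.X

    # Pairwise Euclidean distances between all instances.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # Mean distance to the k nearest neighbours (column 0 is the self-distance).
    k = 5
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)

    # Flag instances whose mean kNN distance is unusually large (z-score > 2).
    z = (knn_mean - knn_mean.mean()) / knn_mean.std()
    print("outlier row indices:", np.flatnonzero(z > 2.0))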

Q12) To show the association rules.

Input: select any file, e.g., adult.tab.

Steps (a scripting sketch follows the list):
1. Select File from the Data tab and load a file in it.
2. Select the Associate tab and click on Association Rules in it. Connect File and Association Rules.
3. Double-click on Association Rules and set the values for:
   Minimal support
   Minimal confidence
   Maximal number of rules
4. Select the Association Rules Filter icon from the Associate tab and connect it to Association Rules.
5. Select the Association Rules Explorer and connect it to the Association Rules Filter.
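As a scripting sketch of the same idea, assuming the separate Orange3-Associate add-on is installed (the tiny boolean basket matrix below is a made-up illustration, not the adult data):

    import numpy as np
    from orangecontrib.associate.fpgrowth import (
        frequent_itemsets, association_rules)

    # Rows = transactions, columns = items (0: bread, 1: milk, 2: butter).
    X = np.array([[1, 1, 0],
                  [1, 1, 1],
                  [0, 1, 1],
                  [1, 1, 1]], dtype=bool)

    # Frequent itemsets with minimal support 0.5, then rules with minimal
    # confidence 0.8 (the two thresholds the widget exposes).
    itemsets = dict(frequent_itemsets(X, 0.5))
    for antecedent, consequent, support, confidence in \
            association_rules(itemsets, 0.8):
        print(set(antecedent), "->", set(consequent),
              f"support={support}, confidence={confidence:.2f}")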

Q13) Aim: to study Scatter Plot, Distributions and Attribute Statistics.

Steps (a scripting sketch follows the list):
1. Select a file, e.g., adult.tab.
2. Now, from the Visualize tab, select:
   Distributions
   Attribute Statistics
   Scatter Plot
3. Click on Distributions.
4. Click on Scatter Plot.
5. Click on Attribute Statistics.
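The same three views can be approximated in a script; a sketch assuming Orange 3 plus matplotlib, using the numeric iris dataset for convenience rather than adult.tab:

    import matplotlib.pyplot as plt
    from Orange.data import Table

    data = Table("iris")
    x, y = data.X[:, 0], data.X[:, 1]   # first two attributes

    fig, axes = plt.subplots(1, 2, figsize=(9, 4))
    axes[0].scatter(x, y, c=data.Y)     # scatter plot, coloured by class
    axes[0].set_xlabel(data.domain[0].name)
    axes[0].set_ylabel(data.domain[1].name)
    axes[1].hist(x, bins=20)            # distribution of the first attribute
    axes[1].set_xlabel(data.domain[0].name)

    # Attribute statistics, as the widget would summarize them:
    print(f"{data.domain[0].name}: mean={x.mean():.2f}, sd={x.std():.2f}, "
          f"min={x.min():.2f}, max={x.max():.2f}")
    plt.show()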

Q14) Write a case study on data warehousing in the World Bank.
The World Bank collects and maintains huge data on economic and development parameters for all third-world countries across the globe.
For the purpose of monitoring the effectiveness of the various World Bank assisted projects in the third-world countries, the bank started collecting and analyzing macroeconomic and financial statistics, and also information on parameters such as poverty, health, education, environment and the public sector.

LIVE DATABASE (LDB) DATA WAREHOUSE:

In 1995, the LDB of the World Bank was developed by the World Bank's East Asia team. It was built with SQL Server 2000 as the platform for the database, and the OLAP cube was defined for this database using the OLAP server modules of SQL Server 2000.
Universal access was provided for this data warehouse, which was called the Live Database.
CONCLUSION:
This resulted in significant cost savings, by reducing the time and effort required to prepare a large variety of reports to suit the varying needs of the government decision makers, and in more effective and better economic planning.
