You are on page 1of 21

Appendix A: An Introduction to

Microsoft’s OLE OLDB for Data


Mining
 Introduction
 Overview and design philosophy
 Basic components
 Data set components
 Data mining models
 Operations on data model
 Concluding remarks
October 15, 2008 Data Mining: Concepts and Techniques 1
Why OLE DB for Data
Mining?
 Industry standard is critical for data mining
development, usage, interoperability, and
exchange
 OLEDB for DM is a natural evolution from OLEDB
and OLDB for OLAP
 Building mining applications over relational
databases is nontrivial
 Need different customized data mining
algorithms and methods
 Significant work on the part of application
builders
 Goal: ease the burden
October 15, 2008 of developing
Data Mining: Concepts and Techniques mining 2
Motivation of OLE DB for DM

 Facilitate deployment of data mining models


 Generating data mining models
 Store, maintain and refresh models as data is
updated
 Programmatically use the model on other data
set
 Browse models
 Enable enterprise application developers to
participate in building data mining solutions

October 15, 2008 Data Mining: Concepts and Techniques 3


Features of OLE DB for DM

 Independent of provider or software


 Not specialized to any specific mining
model
 Structured to cater to all well-known
mining models
 Part of upcoming release of Microsoft SQL
Server 2000

October 15, 2008 Data Mining: Concepts and Techniques 4


Overview

Data mining
 Core relational engine
applications
exposes OLE DB in a
language-based API OLE DB OLAP/DM
 Analysis server exposes
OLE DB OLAP and OLE DB Analysis Server

DM OLE DB
 Maintain SQL metaphor RDB engine
 Reuse existing
October 15, 2008
notions
Data Mining: Concepts and Techniques 5
Key Operations to Support Data
Mining Models

 Define a mining model


 Attributes to be predicted

 Attributes to be used for prediction

 Algorithm used to build the model

 Populate a mining model from training


data
 Predict attributes for new data
 Browse a mining model fro reporting and
visualization
October 15, 2008 Data Mining: Concepts and Techniques 6
DMM As Analogous to A Table in
SQL
 Create a data mining module object
 CREATE MINING MODEL [model_name]

 Insert training data into the model and train it


 INSERT INTO [model_name]

 Use the data mining model


 SELECT relation_name.[id],

[model_name].[predict_attr]
 consult DMM content in order to make predictions

and browse statistics obtained by the model


 Using DELETE to empty/reset
 Predictions on datasets: prediction join between a
model and a data set (tables)
 Deploy DMM by just writing SQL queries!
October 15, 2008 Data Mining: Concepts and Techniques 7
Two Basic Components

 Cases/caseset: input data


 A table or nested tables (for hierarchical data)
 Data mining model (DMM): a special type of table
 A caseset is associated with a DMM and meta-info
while creating a DMM
 Save mining algorithm and resulting abstraction
instead of data itself
 Fundamental operations: CREATE, INSERT INTO,
PREDICTION JOIN, SELECT, DELETE FROM, and
DROP
October 15, 2008 Data Mining: Concepts and Techniques 8
Flatterned Representation of
Caseset
Customer
Product
s
Customer Purchases
ID
Gender Customer ID
Product Problem: Lots of replication!
Hair Color Name
Age Quantity

Age Prob Product Type


Car
CID Gend Hair Age Age prob Prod Quan Type Car
Car prob
Owernershi 1 Male Black 35 100% TV 1 Elec Car 100%
p 1 Male Black 35 100% VCR 1 Elec Car 100%
Customer ID
1 Male Black 35 100% Ham 6 Food Car 100%
Car 1 Male Black 35 100% TV 1 Elec Van 50%
Car Prob 1 Male Black 35 100% VCR 1 Elec Van 50%
1 Male Black 35 100% Ham 6 Food Van 50%

October 15, 2008 Data Mining: Concepts and Techniques 9


Logical Nested Table
Representation of Caseset

 Use Data Shaping Service to generate a


hierarchical rowset
 Part of Microsoft Data Access
Components (MDAC) products
Product Car
CID Gend Hair Age Age prob Purchases Ownership
Car
Prod Quan Type Car
prob
TV 1 Elec Car 100%
1 Male Black 35 100% VCR 1 Elec
Van 50%
Ham 6 Food

October 15, 2008 Data Mining: Concepts and Techniques 10


More About Nested Table

 Not necessary for the storage subsystem


to support nested records
 Cases are only instantiated as nested
rowsets prior to training/predicting data
mining models
 Same physical data may be used to
generate different casesets

October 15, 2008 Data Mining: Concepts and Techniques 11


Defining A Data Mining Model

 The name of the model


 The algorithm and parameters
 The columns of caseset and the
relationships among columns
 “Source columns” and “prediction
columns”

October 15, 2008 Data Mining: Concepts and Techniques 12


Example

EATE MINING MODEL [Age Prediction] %Name of Model

stomer ID] LONG KEY, %source column


nder] TEXT DISCRETE, %source column
e] Double DISCRETIZED() PREDICT, %prediction colu
oduct Purchases] TABLE %source column

oduct Name] TEXT KEY, %source column


uantity] DOUBLE NORMAL CONTINUOUS, %source column
oduct Type] TEXT DISCRETE RELATED TO [Product Name]
%source column

NG [Decision_Trees_101] %Mining algorithm use

October 15, 2008 Data Mining: Concepts and Techniques 13


Column Specifiers

 KEY
 ATTRIBUTE
 RELATION (RELATED TO clause)
 QUALIFIER (OF clause)
 PROBABILITY: [0, 1]

 VARIANCE

 SUPPORT

 PROBABILITY-VARIANCE

 ORDER

 TABLE

October 15, 2008 Data Mining: Concepts and Techniques 14


Attribute Types

 DISCRETE
 ORDERED
 CYCLICAL
 CONTINOUS
 DISCRETIZED
 SEQUENCE_TIME

October 15, 2008 Data Mining: Concepts and Techniques 15


Populating A DMM

 Use INSERT INTO statement

 Consuming a case using the data mining

model

 Use SHAPE statement to create the

nested table from the input data

October 15, 2008 Data Mining: Concepts and Techniques 16


Example: Populating a DMM

TO [Age Prediction]

r ID], [Gender], [Age],


Purchases](SKIP, [Product Name], [Quantity], [Product Type])

[Customer ID], [Gender], [Age] FROM Customers ORDER BY [Cust

[CustID], {product Name], [Quantity], [Product Type] FROM Sale


Y [CustID]}
Customer ID] TO [CustID]

ct Purchases]

October 15, 2008 Data Mining: Concepts and Techniques 17


Using Data Model to Predict

 Prediction join
 Prediction on dataset D using DMM M

 Different to equi-join

 DMM: a “truth table”


 SELECT statement associated with
PREDICTION JOIN specifies values
extracted from DMM

October 15, 2008 Data Mining: Concepts and Techniques 18


Example: Using a DMM in
Prediction

SELECT t.[Customer ID], [Age Prediction].[Age]


FROM [Age Prediction]
PRECTION JOIN
(SHAPE
{SELECT [Customer ID], [Gender] FROM Customers ORDER BY
[Customer ID]}
APPEND
(
{SELECT [CustID], [Product Name], [Quantity] FROM Sales ORDER
BY [CustID]}
RELATE [Customer ID] TO [CustID]
)
AS [Product Purchases]
)
AS t
ON [Age Prediction].[Gender]=t.[Gender] AND
[Age Prediction].[Product Purchases].[Product Name]=t.[Product
Purchases].[Product Name]
October 15, 2008
AND
Data Mining: Concepts and Techniques 19
Browsing DMM

 What is in a DMM?

 Rules, formulas, trees, …, etc

 Browsing DMM

 Visualization

October 15, 2008 Data Mining: Concepts and Techniques 20


Concluding Remarks
 OLE DB for DM integrates data mining and
database systems
 A good standard for mining application

builders
 How can we be involved?

 Provide association/sequential pattern

mining modules for OLE DB for DM?


 Design more concrete language

primitives?
 References

October 15, http://www.microsoft.com/data.oled


 2008 Data Mining: Concepts and Techniques 21

You might also like