DMQB

Data Mining
2013-14

Unit 1
1. What is data mining? What is a data warehouse?
2. Compare database processing vs. data mining processing.
3. Explain the applications of data mining in detail.
4. Explain all data mining models and tasks.
5. What is KDD? Explain with a diagram.
6. Write a short note on visualization.
7. Discuss the issues in data mining.
8. What is fuzzy logic? Explain in brief with an example.
9. Define the following terms:
   1. Information retrieval
   2. Precision
   3. Recall
   4. Similarity
   5. Granularities
   6. Facts
   7. Roll up
   8. Drill down
10. Define a cube and explain with an example.
11. Write a short note on the star schema.
12. Explain the characteristics of a data warehouse.
13. Discuss the ways to improve the performance of data warehouse applications.
14. Write a short note on OLAP operations.
15. Write a short note on point estimation.
16. Compute the mean, variance and standard deviation for (1, 3, 4, 6, 5).
17. Define:
   1. Mean
   2. Median
   3. Mode
   4. Variance
   5. Standard deviation (sample/population)
   6. Bias
   7. MSE
   8. RMS
18. Compute the mean, median and mode for (15, 10, 18, 20, 28, 32).
19. What is the jackknife estimate technique?
20. Find the jackknife estimate of the variance for X = {1, 5, 6} and of the mean for X = {5, 6, 6}.
21. Estimate the p that maximizes the likelihood that the given sequence of heads and tails would occur: {H, H, H, T, T}. (Note: assume a coin with H and T equally likely.)
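A minimal Python sketch for questions 16 and 18, using the standard-library `statistics` module. Question 17 distinguishes sample and population standard deviation, so the sketch reports both. The helper names `describe` and `modes` are illustrative and do not come from the question bank.

```python
import statistics
from collections import Counter

def describe(values):
    """Mean plus population (divide by n) and sample (divide by n - 1) spread."""
    return {
        "mean": statistics.mean(values),
        "pop_variance": statistics.pvariance(values),
        "sample_variance": statistics.variance(values),
        "pop_std": statistics.pstdev(values),
        "sample_std": statistics.stdev(values),
    }

def modes(values):
    """Every value with the highest count; empty list when all values are unique."""
    counts = Counter(values)
    top = max(counts.values())
    return [] if top == 1 else sorted(v for v, c in counts.items() if c == top)

# Question 16
q16 = describe([1, 3, 4, 6, 5])
print(q16)  # mean 3.8, variance 2.96 (population) or 3.7 (sample)

# Question 18
data = [15, 10, 18, 20, 28, 32]
print(statistics.mean(data), statistics.median(data), modes(data))  # 20.5 19.0 []
```

Every value in (15, 10, 18, 20, 28, 32) occurs once, so there is no mode. In that case `modes` returns an empty list instead of picking an arbitrary value.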
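For question 20, here is a sketch of the jackknife idea. The estimator is recomputed with each observation left out in turn, and the leave-one-out estimates are then combined. Two conventions are assumptions rather than something the question fixes. First, the jackknife estimate is taken as the plain average of the leave-one-out estimates; some texts instead report a bias-corrected form. Second, the variance uses divisor n (`statistics.pvariance`).

```python
from statistics import mean, pvariance

def jackknife(values, estimator):
    """Return the leave-one-out estimates and their average."""
    n = len(values)
    partials = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    return partials, sum(partials) / n

# Variance of X = {1, 5, 6}, divisor n (assumed convention)
var_parts, var_jack = jackknife([1, 5, 6], pvariance)
# Mean of X = {5, 6, 6}
mean_parts, mean_jack = jackknife([5, 6, 6], mean)
print(var_parts, var_jack)
print(mean_parts, mean_jack)
```

Under these conventions, the leave-one-out variances of {1, 5, 6} are 0.25, 6.25 and 4, which average to 3.5. For {5, 6, 6}, the leave-one-out means 6, 5.5 and 5.5 average to 17/3 ≈ 5.67, which is the ordinary sample mean.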
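For question 21, the likelihood of the observed sequence when P(H) = p is p³(1 − p)². This is maximised at p = (number of heads) / (number of tosses). The sketch checks that closed form against a coarse grid search. Because the note in the question refers to a fair coin, it also prints the likelihood at p = 0.5.

```python
def likelihood(p, sequence):
    """Probability of this exact H/T sequence for independent tosses with P(H) = p."""
    heads = sequence.count("H")
    tails = len(sequence) - heads
    return p ** heads * (1 - p) ** tails

seq = ["H", "H", "H", "T", "T"]
p_hat = seq.count("H") / len(seq)          # closed-form Bernoulli MLE
grid_best = max((i / 100 for i in range(101)), key=lambda p: likelihood(p, seq))
print(p_hat, grid_best, likelihood(0.5, seq), likelihood(p_hat, seq))
```

The maximum-likelihood estimate is 3/5 = 0.6. Its likelihood is 0.6³ × 0.4² ≈ 0.0346, compared with 0.5⁵ = 0.03125 for a fair coin.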
22. Estimate the missing data using Expectation Maximization, continuing until convergence, for {1, 5, 10, 4, *, *} (initial guess = 3).
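Question 22 can be worked with a common EM scheme for estimating a mean when some values are missing. In the E-step, each missing value is filled with the current mean estimate. In the M-step, the mean is recomputed over all n values: μ(t+1) = (sum of known values + k · μ(t)) / n, where k is the number of missing values. The stopping tolerance of 0.05 is an assumption, because the question does not state one.

```python
def em_mean(known, n_missing, guess, tol=0.05, max_iter=100):
    """EM estimate of the mean when n_missing values are unobserved."""
    n = len(known) + n_missing
    mu = guess
    history = [guess]
    for _ in range(max_iter):
        # E-step: every missing value is replaced by the current estimate mu.
        # M-step: the new mean is taken over all n values.
        new_mu = (sum(known) + n_missing * mu) / n
        history.append(new_mu)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return new_mu, history

mu, steps = em_mean([1, 5, 10, 4], n_missing=2, guess=3)
print([round(m, 3) for m in steps])  # [3, 4.333, 4.778, 4.926, 4.975]
```

Because two of the six values are imputed, each step shrinks the gap to the fixed point 20/4 = 5 (the mean of the observed values) by a factor of 3.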
23. Prove that x11 belongs to class h2 using Bayes' theorem.

ID  Income  Credit     Class  xi
1   4       Excellent  h1     x4
2   3       Good       h1     x7
3   2       Excellent  h1     x2
4   3       Good       h1     x7
5   4       Good       h1     x8
6   2       Excellent  h1     x2
7   3       Bad        h2     x11
8   2       Bad        h2     x10
9   3       Bad        h3     x11
10  1       Bad        h4     x9

24. Write a short note on hypothesis testing.
25. Find the chi-square statistic for:
   Observed values = {51, 95, 67, 78, 88}
   Expected value = 76
26. Write a short note on linear regression.
27. Write a short note on non-linear regression.
28. Explain correlation in detail.
29. Find the correlation between ice cream sales and temperature.

Temperature (°C)  Ice Cream Sales (in rupees)
14.2              215
16.4              325
11.9              185
15.2              332
18.5              406
22.1              522

30. Write a short note on similarity measures.
Unit 2
31. Explain the need for data pre-processing.
32. List and explain the major tasks in data pre-processing.
33. Explain the terms quartile and inter-quartile range.
34. What are a box plot and a quantile plot?
35. What are a histogram and a scatter plot?
36. Write a short note on data cleaning tasks.
37. Explain binning with an example.
38. Explain data aggregation, generalization and smoothing.
39. Write a short note on data transformation.
40. Write a short note on data normalization.
41. Define:
   1. Association rule
   2. Support
   3. Confidence
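For question 25, the Pearson chi-square statistic is the sum of (O − E)² / E over all cells. The question gives a single expected value, so the sketch assumes it applies to every one of the five observed counts.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [51, 95, 67, 78, 88]
expected = [76] * len(observed)   # same expected value for every cell
stat = chi_square(observed, expected)
print(round(stat, 2))  # 15.99
```

The squared deviations are 625, 361, 81, 4 and 144, which sum to 1215, so χ² = 1215 / 76 ≈ 15.99. With five categories there are 4 degrees of freedom, and the 5% critical value for 4 degrees of freedom is about 9.49.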
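For question 29, here is a sketch of the Pearson correlation coefficient, r = Sxy / √(Sxx · Syy), computed from deviations about the means for the six temperature and sales pairs in the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1]   # °C
sales = [215, 325, 185, 332, 406, 522]               # rupees
r = pearson_r(temperature, sales)
print(round(r, 3))  # 0.977
```

A value this close to 1 indicates a strong positive linear relationship between temperature and sales in these six observations.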
42. Explain the Apriori algorithm with an example.
42a. Write a short note on association rule mining.
Unit 3
43. What is classification? Discuss the issues.
44. What is prediction? Discuss the issues.
45. Write a short note on decision trees.
46. Write a short note on the Bayesian classifier.
47. Write a short note on the rule-based classifier.
48. Write a short note on the neural network classifier.
49. Write a short note on the Support Vector Machine.
50. Define coverage and accuracy in a rule-based classifier.
51. Explain triggering and firing of rules.
52. Explain rule-based and class-based ordering.
53. Discuss: accuracy on its own is not a reliable estimate of rule quality.
54. Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules:
   R1: A → + (covers 4 positive and 1 negative examples)
   R2: B → + (covers 30 positive and 10 negative examples)
   R3: C → + (covers 100 positive and 90 negative examples)
   determine which is the best and worst candidate rule according to rule accuracy.
55. For the training set and candidate rules of question 54, determine which is the best and worst candidate rule according to FOIL's information gain.
56. For the training set and candidate rules of question 54, determine which is the best and worst candidate rule according to the likelihood ratio statistic.
57. Write the difference between classification and clustering.
58. Explain supervised and unsupervised learning.
59. Explain the terms pruning and overfitting.
60. Find the information gain for income in the following data.
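Question 42 asks for the Apriori algorithm with an example. Below is a compact, unoptimised Python sketch of it: level-wise candidate generation (join), pruning of any candidate that has an infrequent subset, and support counting. The five baskets and the minimum support count of 3 are hypothetical example data and are not taken from this question bank. The join step unions pairs of frequent (k − 1)-itemsets instead of using the ordered-prefix join. After the prune step, both approaches yield the same candidates.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # L1: frequent single items
    singles = {frozenset([i]) for t in transactions for i in t}
    level = {c: n for c, n in count(singles).items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical market baskets, for illustration only
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = apriori(baskets, min_support=3)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

In this example, {bread, diaper, beer} is pruned without being counted because {bread, beer} is infrequent. {bread, milk, diaper} survives pruning but occurs in only 2 baskets, so no 3-itemset is frequent.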
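For questions 54 to 56, the sketch below computes the three rule-quality measures for R1, R2 and R3:

- Accuracy is p / (p + n).
- FOIL's information gain compares the rule with the empty rule, which covers all 100 positives and 400 negatives.
- The likelihood ratio statistic compares the covered class counts with the counts expected from the class prior.

The base-2 logarithm in the likelihood ratio is an assumption; the best and worst rankings do not depend on the base.

```python
import math

P0, N0 = 100, 400   # positive and negative examples in the training set
# (positives covered, negatives covered) for each candidate rule
rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

def accuracy(p, n):
    return p / (p + n)

def foil_gain(p1, n1, p0=P0, n0=N0):
    # Gain over the empty rule, which covers all p0 positives and n0 negatives
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def likelihood_ratio(p, n, p0=P0, n0=N0):
    # R = 2 * sum(f_i * log2(f_i / e_i)), where e_i follows the class prior.
    # Assumes p > 0 and n > 0 (true for R1 to R3), since log2(0) is undefined.
    covered = p + n
    e_pos = covered * p0 / (p0 + n0)
    e_neg = covered * n0 / (p0 + n0)
    return 2 * (p * math.log2(p / e_pos) + n * math.log2(n / e_neg))

for name, (p, n) in rules.items():
    print(name, round(accuracy(p, n), 3), round(foil_gain(p, n), 2),
          round(likelihood_ratio(p, n), 2))
```

For R1, R2 and R3 this gives:

- Accuracy: 0.8, 0.75 and about 0.526.
- FOIL's gain: 8.0, about 57.21 and about 139.59.
- Likelihood ratio: 12.0, about 80.85 and about 143.09.

Accuracy ranks R1 best and R3 worst. FOIL's gain and the likelihood ratio both rank R3 best and R1 worst, because they also reward coverage, which is the point raised in question 53.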
61. Write a short note on the Gini index.
62. Classify the following tuple using the Naïve Bayesian classifier: X = (age = youth, income = low, student = yes, credit_rating = fair), using the following training data.
63. Find the population for the year 2013 using linear regression.

Year  Population
2005  12
2006  19
2007  28
2008  35
2009  45
2013  ?

64. Write a short note on the confusion matrix.
65. Classify X1 = 4, X2 = 7 using K-nearest neighbour (assume k = 3).

X  Y  Class
7  7  B
7  4  B
3  4  G
1  4  G

Unit 4
66. List all the requirements of clustering in data mining.
67. Write a short note on the types of data in cluster analysis.
68. Compute the Euclidean and Manhattan distances for X1 (1, 2) and X2 (3, 6).
68a. Compute the Euclidean and Manhattan distances for X1 (1, 2) and X2 (4, 6).
69. Compute:
   1. Similarity between A and B
   2. Similarity between C and B
   3. Similarity between A and C
   and comment on the most similar tuples.

Name  Gender  F  C  T1  T2  T3
A     F       Y  N  P   P   N
B     M       Y  N  Y   N   P
C     F       Y  P  P   N   N

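For question 61, the Gini index of a node with class proportions p_i is 1 − Σ p_i². A candidate split is scored by the size-weighted Gini index of its partitions. The class counts below are hypothetical and serve only to show the arithmetic.

```python
def gini(counts):
    """Gini index of a node: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    """Size-weighted Gini index of a split; each partition is a list of class counts."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini(p) for p in partitions)

# Hypothetical node: 9 tuples of one class, 5 of the other
print(round(gini([9, 5]), 3))                  # 0.459
# Hypothetical binary split of that node into (6, 1) and (3, 4)
print(round(gini_split([[6, 1], [3, 4]]), 3))  # 0.367
```

In this made-up example, the split lowers impurity from about 0.459 to about 0.367. A decision-tree learner using the Gini index prefers the split with the largest such reduction.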
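For question 63, here is a sketch of ordinary least squares, y = a + b·x, fitted to the five known years and then used to extrapolate to 2013.

```python
def least_squares(xs, ys):
    """Intercept a and slope b of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

years = [2005, 2006, 2007, 2008, 2009]
population = [12, 19, 28, 35, 45]
a, b = least_squares(years, population)
print(round(b, 2), round(a + b * 2013, 2))  # 8.2 77.0
```

Centring on the mean year keeps the hand arithmetic small. With x̄ = 2007 and ȳ = 27.8, Σ(x − x̄)(y − ȳ) = 82 and Σ(x − x̄)² = 10, so b = 8.2 and the 2013 estimate is 27.8 + 8.2 × 6 = 77.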
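For question 65, the sketch ranks the training tuples by Euclidean distance to the query (X = 4, Y = 7). With the values as printed, the nearest tuple is (7, 7) [B] at distance 3, and the second is (3, 4) [G] at √10 ≈ 3.16. Third place is a tie: (7, 4) [B] and (1, 4) [G] are both √18 ≈ 4.24 away. The sketch therefore keeps every tuple at or inside the k-th distance and counts their votes, so the tie is not hidden by sort order. A tie-breaking rule, such as distance-weighted voting, still has to be chosen.

```python
import math
from collections import Counter

training = [((7, 7), "B"), ((7, 4), "B"), ((3, 4), "G"), ((1, 4), "G")]   # (X, Y), class
query = (4, 7)
k = 3

# Sort training tuples by Euclidean distance to the query
ranked = sorted((math.dist(query, point), label, point) for point, label in training)
for d, label, point in ranked:
    print(point, label, round(d, 3))

# Keep every tuple no farther than the k-th smallest distance, so boundary ties stay visible
kth = ranked[k - 1][0]
votes = Counter(label for d, label, _ in ranked if d <= kth)
print(votes)
```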
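For questions 68 and 68a, the Euclidean distance is √Σ(aᵢ − bᵢ)², and the Manhattan distance is Σ|aᵢ − bᵢ|.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

x1 = (1, 2)
for x2 in [(3, 6), (4, 6)]:                  # questions 68 and 68a
    print(x1, x2, round(euclidean(x1, x2), 3), manhattan(x1, x2))
# (1, 2) (3, 6) 4.472 6
# (1, 2) (4, 6) 5.0 7
```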
70. Write a short note on K-means clustering.
71. Write a short note on K-medoids clustering.
72. Write a short note on the partitioning approach.
73. Write a short note on the hierarchical approach.
74. Write a short note on DBSCAN.
75. List and discuss the major clustering approaches.
76. Write a short note on ROCK.
77. Explain the agglomerative and divisive approaches.
78. Apply hierarchical clustering using single linkage to the following data: A (1, 1), B (1.5, 1.5), C (3, 4), D (4, 4), E (3, 3.5).
79. What are outliers? How can they be found? Write their applications.
Unit 5
80. What are graph mining and social networks?
81. What are multimedia and spatial databases?
82. Explain set-valued and list-valued attributes with an example.
83. Explain set-valued and complex structure-valued attributes.
84. What are spatial aggregation and approximation? Explain with an example.
85. Define plan, plan database and plan mining.
86. Explain the types of dimensions in a spatial data cube.
87. Explain measures in a spatial data cube.
88. Discuss approaches for similarity-based retrieval in image databases.
89. Write a short note on mining associations in multimedia data.
90. Write a short note on text mining.
91. Define:
   1. Term frequency
   2. Term frequency matrix
   3. Relative term frequency
   4. Inverse document frequency
92. Compute TF, IDF and TF-IDF for t2 in d2 for the following data.
92a. Compute TF, IDF and TF-IDF for t3 in d4 for the following data.
93. Discuss: the Web poses great challenges for effective resource and knowledge discovery.
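Question 70 asks for a note on K-means. As an illustration only, the sketch below runs plain K-means (Lloyd's algorithm) on the five points from question 78. No question in this bank specifies the parameters, so k = 2 and the choice of A and C as starting centroids are assumptions.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain K-means starting from the given centroids; returns (clusters, centroids)."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(name)
        # Update step: move each centroid to the mean of its members
        new = []
        for members, old in zip(clusters, centroids):
            if not members:                 # an empty cluster keeps its old centroid
                new.append(old)
                continue
            xs = [points[m][0] for m in members]
            ys = [points[m][1] for m in members]
            new.append((sum(xs) / len(xs), sum(ys) / len(ys)))
        if new == centroids:                # converged: no centroid moved
            break
        centroids = new
    return clusters, centroids

points = {"A": (1, 1), "B": (1.5, 1.5), "C": (3, 4), "D": (4, 4), "E": (3, 3.5)}
clusters, centres = kmeans(points, [points["A"], points["C"]])
print(clusters, centres)
```

From these starting centroids, the assignments stop changing on the second pass. The result is {A, B} with centroid (1.25, 1.25) and {C, D, E} with centroid (3.33, 3.83).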
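For question 78, here is a naive sketch of agglomerative clustering with single linkage. The distance between two clusters is the smallest Euclidean distance between any pair of their members. At each step, the two closest clusters are merged, and the merge and its height are recorded.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Agglomerative clustering with single linkage; returns the merge history."""
    clusters = [frozenset([name]) for name in points]
    merges = []

    def link(c1, c2):
        # Single linkage: distance between the closest pair of members
        return min(math.dist(points[a], points[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: link(*pair))
        height = link(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1 | c2), round(height, 3)))
    return merges

points = {"A": (1, 1), "B": (1.5, 1.5), "C": (3, 4), "D": (4, 4), "E": (3, 3.5)}
history = single_linkage(points)
for members, height in history:
    print(members, height)
```

The merges occur in this order, and their heights are the levels at which the dendrogram joins:

1. C and E at 0.5.
2. A and B at 0.707.
3. {C, E} and D at 1.0.
4. {A, B} and {C, D, E} at 2.5, which is the B to E distance.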
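For questions 91 and 92, here is a sketch of the terms being defined. It uses one common set of conventions, chosen here as an assumption; textbooks differ, and some damp TF with logarithms:

- Term frequency (TF) is the raw count of a term in a document.
- The term frequency matrix holds those counts with one row per document and one column per term.
- Relative term frequency is the count divided by the document length.
- IDF is log10(N / n_t), where N is the number of documents and n_t is the number of documents containing the term.

The documents below are hypothetical toy data.

```python
import math

def term_frequency_matrix(docs):
    """Rows are documents, columns are terms; each cell is a raw count."""
    terms = sorted({t for doc in docs.values() for t in doc})
    return terms, {d: [doc.count(t) for t in terms] for d, doc in docs.items()}

def relative_tf(doc, term):
    return doc.count(term) / len(doc)

def idf(docs, term):
    containing = sum(1 for doc in docs.values() if term in doc)
    return math.log10(len(docs) / containing)

# Hypothetical tokenised documents, for illustration only
docs = {
    "d1": ["t1", "t2", "t2", "t3"],
    "d2": ["t2", "t3", "t3"],
    "d3": ["t1", "t1", "t4"],
}
terms, matrix = term_frequency_matrix(docs)
tf = docs["d2"].count("t3")
print(terms, matrix["d2"])                                    # ['t1', 't2', 't3', 't4'] [0, 1, 2, 0]
print(tf, round(relative_tf(docs["d2"], "t3"), 3),
      round(idf(docs, "t3"), 3), round(tf * idf(docs, "t3"), 3))  # 2 0.667 0.176 0.352
```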
