Unit 1
What is data mining and data warehouse?
Compare database processing vs. data mining processing.
Explain applications of data mining in detail.
Explain all data mining models and tasks.
What is KDD? Explain with diagram.
Write a short note on visualization.
Discuss the issues in Data mining.
What is fuzzy logic? Explain in brief with example.
Define following terms:
1. Information retrieval
2. Precision
3. Recall
4. Similarity
5. Granularities
6. Facts
7. Roll ups
8. Drill down
Define cube and explain with example.
Write a short note on star schema.
Explain characteristics of data warehouse.
Discuss the ways to improve the performance of data warehouse applications.
Write a short note on OLAP operations.
Write a short note on point estimation.
Compute mean, variance and standard deviation for (1, 3, 4, 6, 5).
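A hand-worked answer to the question above can be checked with a short Python sketch using the standard `statistics` module. The question does not say whether the population or the sample formula is intended, so this sketch computes both; the variable names are illustrative only.

```python
import statistics

data = [1, 3, 4, 6, 5]

mean = statistics.mean(data)

# Population formulas divide by n; sample formulas divide by n - 1.
pop_variance = statistics.pvariance(data)
sample_variance = statistics.variance(data)
pop_std = statistics.pstdev(data)
sample_std = statistics.stdev(data)

print("mean:", mean)
print("variance (population, sample):", pop_variance, sample_variance)
print("std dev (population, sample):", round(pop_std, 4), round(sample_std, 4))
```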
Define
1. Mean
2. Median
3. Mode
4. Variance
5. Standard deviation (Sample/Population)
6. Bias
7. MSE
8. RMS
Compute mean, median and mode for (15, 10, 18, 20, 28, 32).
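For the data set (15, 10, 18, 20, 28, 32), the three measures can be checked with the sketch below. It assumes the usual convention that a data set in which no value repeats has no mode; the helper name `modes` is hypothetical.

```python
import statistics
from collections import Counter

data = [15, 10, 18, 20, 28, 32]

mean = statistics.mean(data)
median = statistics.median(data)  # even count: average of the two middle values


def modes(values):
    """Return the most frequent values, or an empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so no mode under this convention
    return [v for v, c in counts.items() if c == highest]


mode_values = modes(data)
print(mean, median, mode_values)
```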
What is Jackknife estimate technique?
Find out Jackknife estimate for variance X={1, 5, 6}
Mean X= {5, 6, 6}.
Estimate P that maximizes the likelihood that the given sequence of
heads and tails would occur for {H, H, H, T, T}. Note: Assume coin
with H and T equally likely.
2013-14
Data Mining
ID    Income    Credit       Class    xi
1     4         Excellent    h1       x4
2     3         Good         h1       x7
3     2         Excellent    h1       x2
4     3         Good         h1       x7
5     4         Good         h1       x8
6     2         Excellent    h1       x2
7     3         Bad          h2       x11
8     2         Bad          h2       x10
9     3         Bad          h3       x11
10    1         Bad          h4       x9
Write a short note on Hypothesis testing.
Find the Chi-square statistic for
Observed values = {51, 95, 67, 78, 88}
Expected value = 76
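As a check on the hand calculation, the chi-square statistic is the sum of (O - E)^2 / E over all observed values. The sketch below assumes the single expected value 76 applies to every observation, as the question states.

```python
observed = [51, 95, 67, 78, 88]
expected = 76  # assumed to be the same for every observation

# Chi-square statistic: sum of (O - E)^2 / E
chi_square = sum((o - expected) ** 2 / expected for o in observed)

print(round(chi_square, 2))  # 1215 / 76, about 15.99
```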
Write a short note on linear regression.
Write a short note on non-linear regression.
Explain correlation in detail.
Find correlation between Ice cream sales vs. temperature.
Temperature (°C)    Ice Cream Sales (in rupees)
14.2                215
16.4                325
11.9                185
15.2                332
18.5                406
22.1                522
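A sketch of the Pearson correlation coefficient for the six temperature and sales pairs is given below. It assumes Pearson's r is the intended measure (the question only says "correlation"); the function name `pearson_r` is hypothetical.

```python
import math

temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1]  # degrees Celsius
sales = [215, 325, 185, 332, 406, 522]              # rupees


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temperature, sales)
print(round(r, 4))
```

A value close to +1 indicates a strong positive linear relationship between temperature and sales.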
Write a short note on similarity measures.
Unit 2
Explain the need of data pre-processing.
List and explain major tasks in data processing.
Explain terms Quartile and Inter-Quartile range.
What are Box plot and Quantile plot?
What is histogram and scatter plot?
Write a short note on data cleaning tasks.
Explain Binning with example.
Explain Data aggregation, generalization and smoothing.
Write a short note on data transformation.
Write a short note on data normalization.
Define
1. Association rule
2. Support
3. Confidence
Find out the population for the year 2013 using linear regression.
Year    Population
2005    12
2006    19
2007    28
2008    35
2009    45
2013    ?
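The population question can be checked with a least-squares line fitted to the five known years. The sketch below assumes simple linear regression of population on year, with the data read as the pairs (2005, 12), (2006, 19), (2007, 28), (2008, 35), (2009, 45); the variable names are illustrative.

```python
years = [2005, 2006, 2007, 2008, 2009]
population = [12, 19, 28, 35, 45]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(population) / n

# Least-squares slope: sum of (x - mean_x)(y - mean_y) over sum of (x - mean_x)^2
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, population)) / sum(
    (x - mean_x) ** 2 for x in years
)

# Predict from the centred form y = mean_y + slope * (x - mean_x)
predicted_2013 = mean_y + slope * (2013 - mean_x)

print(round(slope, 2), round(predicted_2013, 2))
```

With these figures the slope is 8.2 per year, so the fitted line gives about 77 for 2013, in whatever unit the original table uses.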
Y    Class
7    B
4    B
4    G
4    G
Unit 4
List all the requirements of clustering in data mining.
Write a short note on type of data in clustering analysis.
Compute Euclidean and Manhattan distance for X1 (1, 2) and X2 (3, 6).
Compute Euclidean and Manhattan distance for X1 (1, 2) and X2 (4, 6).
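Both distance questions above can be checked with the sketch below, which uses the standard definitions: Euclidean distance is the square root of the sum of squared coordinate differences, and Manhattan distance is the sum of absolute coordinate differences. The function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


x1 = (1, 2)
for x2 in [(3, 6), (4, 6)]:
    print(x2, round(euclidean(x1, x2), 3), manhattan(x1, x2))
```

For (1, 2) and (3, 6) this gives sqrt(20), about 4.472, and 6; for (1, 2) and (4, 6) it gives 5 and 7.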
Compute
1. Similarity between A and B
2. Similarity between C and B
3. Similarity between A and C and comment on the most similar
tuples.
Name    Gender    F    C    T1    T2    T3
A       F         Y    N    P     P     N
B       M         Y    N    Y     N     P
C       F         Y    P    P     N     N
Compute TF, IDF and TF-IDF for t3 in d4 for following data.