Data Mining

Unit 1
What is data mining and data warehouse?
Compare database processing vs. data mining processing.
Explain applications of data mining in detail.
Explain all data mining models and tasks.
What is KDD? Explain with diagram.
Write a short note on visualization.
Discuss the issues in Data mining.
What is fuzzy logic? Explain in brief with example.
Define following terms:
1. Information retrieval
2. Precision
3. Recall
4. Similarity
5. Granularities
6. Facts
7. Roll ups
8. Drill down
Define cube and explain with example.
Write a short note on star schema.
Explain characteristics of data warehouse.
Discuss the ways to improve the performance of data warehouse applications.
Write a short note on OLAP operations.
Write a short note on point estimation.
Compute mean, variance and standard deviation for (1, 3, 4, 6, 5).
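As a cross-check for the computation asked above, a minimal sketch using Python's standard `statistics` module is shown here. The question does not say whether the population or the sample form is wanted, so both are computed; the variable names are illustrative only.

```python
import statistics

data = [1, 3, 4, 6, 5]

mean = statistics.mean(data)            # arithmetic mean
pop_var = statistics.pvariance(data)    # population variance (divides by n)
sample_var = statistics.variance(data)  # sample variance (divides by n - 1)
pop_std = statistics.pstdev(data)       # square root of the population variance
sample_std = statistics.stdev(data)     # square root of the sample variance

print(mean, pop_var, sample_var, pop_std, sample_std)
```

The two variances differ only in the divisor, which is why the sample figures come out larger on such a small data set.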
Define
1. Mean
2. Median
3. Mode
4. Variance
5. Standard deviation (Sample/Population)
6. Bias
7. MSE
8. RMS
Compute mean, median and mode for (15, 10, 18, 20, 28, 32).
What is Jackknife estimate technique?
Find out Jackknife estimate for variance X={1, 5, 6}
Mean X= {5, 6, 6}.
Estimate P that maximizes the likelihood that the given sequence of
heads and tails would occur for {H, H, H, T, T}. Note: Assume a coin
with H and T equally likely.

2013-14

Estimate the missing data, continuing until convergence, using
Expectation Maximization for {1, 5, 10, 4, *, *}. (Initial guess = 3)
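One common reading of this exercise is that the quantity being estimated is the mean, with the two `*` entries missing and 3 as the starting guess. Under that assumption (which the question does not state explicitly), each EM round fills the missing entries with the current mean and then recomputes the mean. The function name `em_mean` and its parameters are hypothetical, chosen for this sketch.

```python
def em_mean(observed, n_missing, guess, tol=1e-6, max_iter=1000):
    """Estimate a mean when n_missing values are unknown (assumed setup)."""
    n = len(observed) + n_missing
    total = sum(observed)
    mu = guess
    for _ in range(max_iter):
        # E-step: each missing value is replaced by the current estimate mu.
        # M-step: the mean is recomputed over observed plus filled-in values.
        new_mu = (total + n_missing * mu) / n
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


mu_hat = em_mean([1, 5, 10, 4], n_missing=2, guess=3.0)
print(round(mu_hat, 4))
```

Under this reading the estimate settles at the mean of the observed values, since that is the only value that stays unchanged by the update.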

Prove that x11 belongs to class h2 using Bayes theorem.

ID   Income   Credit      Class   xi
1    4        Excellent   h1      x4
2    3        Good        h1      x7
3    2        Excellent   h1      x2
4    3        Good        h1      x7
5    4        Good        h1      x8
6    2        Excellent   h1      x2
7    3        Bad         h2      x11
8    2        Bad         h2      x10
9    3        Bad         h3      x11
10   1        Bad         h4      x9
Write a short note on Hypothesis testing.
Find the chi-square statistic for
Observed values = {51, 95, 67, 78, 88}
Expected value = 76
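The statistic asked for here is the sum of (observed - expected)^2 / expected over all observations. A minimal sketch, assuming the single expected value of 76 applies to every observation:

```python
observed = [51, 95, 67, 78, 88]
expected = 76  # same expected value assumed for every observation

# Chi-square statistic: sum of squared deviations scaled by the expected value.
chi_square = sum((o - expected) ** 2 / expected for o in observed)

print(round(chi_square, 4))
```

Comparing the result against a critical value would additionally need a significance level and the degrees of freedom, neither of which is given in the question.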
Write a short note on linear regression.
Write a short note on non-linear regression.
Explain correlation in detail.
Find the correlation between ice cream sales and temperature.
Temperature (°C)   Ice Cream Sales (in rupees)
14.2               215
16.4               325
11.9               185
15.2               332
18.5               406
22.1               522
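For the temperature and sales data above, the Pearson correlation coefficient can be sketched as follows. This assumes Pearson's r is the intended measure; the helper name `pearson` is illustrative.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1]
sales = [215, 325, 185, 332, 406, 522]

r = pearson(temperature, sales)
print(round(r, 4))
```

A value close to +1 indicates a strong positive linear relationship between temperature and sales.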
Write a short note on similarity measures.
Unit 2
Explain the need of data pre-processing.
List and explain major tasks in data processing.
Explain terms Quartile and Inter-Quartile range.
What are Box plot and Quantile plot?
What is histogram and scatter plot?
Write a short note on data cleaning tasks.
Explain Binning with example.
Explain Data aggregation, generalization and smoothing.
Write a short note on data transformation.
Write a short note on data normalization.
Define
1. Association rule
2. Support
3. Confidence

Explain the Apriori algorithm with example.
Write a short note on Association rule mining.
Unit 3
What is classification? Discuss the issues.
What is prediction? Discuss the issues.
Write a short note on decision tree.
Write a short note on Bayesian classifier.
Write a short note on Rule based classifier.
Write a short note on Neural network classifier.
Write a short note on Support Vector Machine.
Define coverage and accuracy in rule based classifier.
Explain triggering and firing of rules.
Explain rule based and class based ordering.
Discuss: "Accuracy on its own is not a reliable estimate of rule quality."
Consider a training set that contains 100 positive examples and 400
negative examples. For each of the following candidate rules,
R1: A → + (covers 4 positive and 1 negative examples)
R2: B → + (covers 30 positive and 10 negative examples)
R3: C → + (covers 100 positive and 90 negative examples)
determine which is the best and worst candidate rule according to
rule accuracy.
Consider a training set that contains 100 positive examples and 400
negative examples. For each of the following candidate rules,
R1: A → + (covers 4 positive and 1 negative examples)
R2: B → + (covers 30 positive and 10 negative examples)
R3: C → + (covers 100 positive and 90 negative examples)
determine which is the best and worst candidate rule according to
FOIL's information gain.
Consider a training set that contains 100 positive examples and 400
negative examples. For each of the following candidate rules,
R1: A → + (covers 4 positive and 1 negative examples)
R2: B → + (covers 30 positive and 10 negative examples)
R3: C → + (covers 100 positive and 90 negative examples)
determine which is the best and worst candidate rule according to
the likelihood ratio statistic.
Write the difference between classification and clustering.
Explain supervised and unsupervised learning.
Explain the terms pruning and overfitting.
Find information gain for income in the following data.

Write a short note on Gini index.


Classify the following tuple using the Naive Bayesian classifier:
X = (age=youth, income=low, student=yes, credit_rating=fair), using the
following training data.

Find out the population for the year 2013 using linear regression.
Year         2005   2006   2007   2008   2009   2013
Population   12     19     28     35     45     ?

Write a short note on confusion matrix.
Classify X1 = 4, X2 = 7 using K-nearest neighbour (assume k = 3).
X   Y   Class
7   7   B
7   4   B
3   4   G
1   4   G
Unit 4
List all the requirements of clustering in data mining.
Write a short note on types of data in clustering analysis.
Compute Euclidean and Manhattan distance for X1 (1, 2) and X2 (3, 6).
Compute Euclidean and Manhattan distance for X1 (1, 2) and X2 (4, 6).
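The two distance questions above can be checked with a short sketch. The function names are illustrative; the formulas are the usual square root of summed squared differences (Euclidean) and sum of absolute differences (Manhattan).

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


for x1, x2 in [((1, 2), (3, 6)), ((1, 2), (4, 6))]:
    print(x1, x2, round(euclidean(x1, x2), 4), manhattan(x1, x2))
```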
Compute
1. Similarity between A and B
2. Similarity between C and B
3. Similarity between A and C, and comment on the most similar tuples.

Name   Gender   F   C   T1   T2   T3
A      F        Y   N   P    P    N
B      M        Y   N   Y    N    P
C      F        Y   P   P    N    N

Write a short note on K-means clustering.


Write a short note on K-medoids clustering.
Write a short note on partitioning approach.
Write a short note on Hierarchical approach.
Write a short note on DBSCAN.
List and discuss major clustering approaches.
Write a short note on ROCK.
Explain agglomeration and divisive approach.
Apply hierarchical clustering using single linkage to the following data.
A(1, 1), B(1.5, 1.5), C(3, 4), D(4, 4), E(3, 3.5)
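The merge sequence for the five points above can be traced with a small sketch of agglomerative clustering under single linkage. It assumes Euclidean distance (the question does not name a metric) and uses a naive search over all cluster pairs, which is fine for five points; the function name `single_linkage` is illustrative.

```python
import math


def single_linkage(points):
    """Agglomerative clustering with single linkage; returns the merge history."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = {"A": (1, 1), "B": (1.5, 1.5), "C": (3, 4), "D": (4, 4), "E": (3, 3.5)}
merges = single_linkage(points)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

Each printed row is one level of the dendrogram: the two clusters joined and the distance at which they join.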
What are outliers? How can they be detected? Write the applications of outlier analysis.
Unit 5
What is graph mining and social network?
What are multimedia and spatial databases?
Explain set-valued and list-valued attributes with example.
Explain set-valued and complex structure-valued attributes.
What is spatial aggregation and approximation? Explain with example.
Define plan, plan database and plan mining.
Explain the types of dimensions in spatial data cube.
Explain measures in spatial data cube.
Discuss approaches for similarity based retrieval in image database.
Write a short note on mining association in multimedia data.
Write a short note on text mining.
Define
1. Term frequency
2. Term frequency matrix
3. Relative term frequency
4. Inverse document frequency
Compute TF, IDF and TF-IDF for t2 in d2 for the following data.

Compute TF, IDF and TF-IDF for t3 in d4 for the following data.

Discuss: "The Web poses great challenges for effective resource and
knowledge discovery."
