Q1. Discuss in brief the limitations of the Boolean retrieval model. [2 Marks]
Ans. (2 marks for any 4 out of 5)
Not tolerant of spelling mistakes.
Phrase search ("Stanford University") and proximity search (Gates /s Microsoft) require the index
to be augmented.
Term frequency is ignored: documents containing more occurrences of a query term get no extra weight.
Positional information of terms in a document is not considered.
Returned results are not ranked, so documents cannot be ordered by degree of
relevance.
Q3. Discuss briefly the index construction algorithm used in Distributed Indexing with a suitable diagram.
[5 marks]
Q4. a. An IR system returns 8 relevant documents, and 10 non-relevant documents. There are a total of 20
relevant documents in the collection. What is the precision of the system on this search, and what is
its recall? [2 Marks]
Ans. Precision = 8/18 ≈ 0.44; Recall = 8/20 = 0.4
(1 mark each)
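The arithmetic above can be checked with a short sketch. The function name precision_recall and its parameter names are illustrative assumptions, not part of the original answer key.

```python
def precision_recall(relevant_retrieved, nonrelevant_retrieved, total_relevant):
    """Return (precision, recall) for a single query."""
    retrieved = relevant_retrieved + nonrelevant_retrieved
    precision = relevant_retrieved / retrieved    # fraction of retrieved docs that are relevant
    recall = relevant_retrieved / total_relevant  # fraction of relevant docs that were retrieved
    return precision, recall


# Numbers from Q4a: 8 relevant and 10 non-relevant returned, 20 relevant in total.
precision, recall = precision_recall(8, 10, 20)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.44, recall = 0.40
```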
Q6. Consider the following document: The universe contains many different universities
[1 + 2 + 3 + 2 = 8 Marks]
a. How many entries would a bigram index contain? Ans: 50 (1 mark)
b. If Boolean query answering is used on this index for the initial query uni*, which terms would
you search for in the permuterm index?
Ans. ( 2 marks)
c. How do you process queries such as univ*, uni*rse and uni*e*se using the permuterm index? Show
which terms you would search for and how.
Ans. (1 mark each for each query)
d. Use the 2-gram index and the 3-gram index to process the wildcard queries tol* and rea*.
Is "tool" a result for the wildcard query tol*? If the answer is yes, explain how to solve this problem.
Ans. ( 1 mark each for each part)
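The mechanics behind parts a to d can be sketched as follows. This is a minimal illustration, not the official marking scheme. It assumes that each term is padded with the boundary symbol $, that the count in part a is of k-gram occurrences rather than distinct k-grams (which reproduces the stated answer of 50), and that multi-wildcard queries are answered by rotating on the outer fragments and post-filtering. All function names, and the small vocabulary used for tol*, are hypothetical.

```python
from fnmatch import fnmatchcase

DOC = "The universe contains many different universities"


def kgrams(text, k=2):
    """All overlapping k-grams of a string (empty if the string is shorter than k)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def count_kgram_entries(document, k=2):
    """Part a: count k-gram occurrences over all terms, each padded as $term$."""
    return sum(len(kgrams(f"${term}$", k)) for term in document.lower().split())


def rotations(term):
    """All permuterm rotations of term$."""
    marked = term + "$"
    return [marked[i:] + marked[:i] for i in range(len(marked))]


def permuterm_prefix(query):
    """Parts b and c: rotate the query so a * ends up last; return the lookup prefix."""
    if "*" not in query:
        return query + "$"
    parts = (query + "$").split("*")
    # Only the first and last fragments form the prefix; inner ones are post-filtered.
    return parts[-1] + parts[0]


def permuterm_lookup(query, vocabulary):
    """Find terms with a rotation starting with the prefix, then post-filter."""
    prefix = permuterm_prefix(query)
    candidates = {t for t in vocabulary for r in rotations(t) if r.startswith(prefix)}
    return sorted(t for t in candidates if fnmatchcase(t, query))


def kgram_candidates(query, vocabulary, k):
    """Part d: terms containing every k-gram of the query, before post-filtering."""
    needed = {g for piece in f"${query}$".split("*") for g in kgrams(piece, k)}
    return sorted(t for t in vocabulary if needed <= set(kgrams(f"${t}$", k)))


vocab = DOC.lower().split()
print(count_kgram_entries(DOC))              # 50
print(permuterm_prefix("uni*"))              # $uni
print(permuterm_prefix("uni*rse"))           # rse$uni
print(permuterm_prefix("uni*e*se"))          # se$uni (then post-filter on the inner "e")
print(permuterm_lookup("uni*e*se", vocab))   # ['universe']

toy_vocab = ["tool", "told", "toll"]         # hypothetical vocabulary for tol*
print(kgram_candidates("tol*", toy_vocab, 2))  # ['told', 'toll', 'tool']
print(kgram_candidates("tol*", toy_vocab, 3))  # ['told', 'toll']
```

With the 2-gram index, "tool" is returned for tol* because it contains $t, to and ol, even though it does not match the pattern. Two ways to remove this false positive are a post-filtering step that checks each candidate against the original wildcard query, or a larger k, as the 3-gram run shows.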
Q7. Assume that simple term frequency weights are used (with no IDF factor), and that the stop words
"is", "am" and "are" are removed. Compute the cosine similarity of the following two documents. [Show
the term frequency matrix] [3 marks]
Doc1: Precision is very very high
Doc2: high precision is very very very important
Ans. (1 mark for matrix and 2 marks for calculations.)
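The calculation can be sketched as below, assuming lowercased whitespace tokenization and raw term counts as weights. The helper names tf_vector and cosine_similarity are illustrative. Under these assumptions the term-frequency vectors over (high, important, precision, very) are (1, 0, 1, 2) for Doc1 and (1, 1, 1, 3) for Doc2, so the similarity is 8 / (sqrt(6) * sqrt(12)) ≈ 0.943.

```python
import math
from collections import Counter

STOP_WORDS = {"is", "am", "are"}


def tf_vector(text):
    """Raw term-frequency vector after lowercasing and stop-word removal."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


doc1 = tf_vector("Precision is very very high")
doc2 = tf_vector("high precision is very very very important")

# Term-frequency matrix: one row per term, one column per document.
for term in sorted(doc1.keys() | doc2.keys()):
    print(f"{term:<10} {doc1[term]} {doc2[term]}")

print(round(cosine_similarity(doc1, doc2), 4))  # 0.9428
```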
***********