Monday, April 19, 2010

Relevance Feedback

In a document retrieval system, when a user issues an initial query, the list of documents the system returns often contains both relevant and irrelevant documents. After the user provides feedback on the relevance of the results, explicitly or implicitly, the system can formulate a new query and return a better list, with the most relevant documents ranked first and the irrelevant ones at the bottom.

The idea is to maximize the similarity between the query and the relevant documents while minimizing the similarity between the query and the irrelevant documents.

A new query can be generated in two ways: 1. add new terms. 2. reweight the existing query terms.




q_{opt} = \arg\max_{q} [sim(q, C_r) - sim(q, C_{nr})]
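
As a worked case of the objective above, here is a sketch under assumptions the post does not state: documents and queries are vectors, C_r and C_nr are the sets of relevant and irrelevant documents, sim(q, C) is the average cosine similarity between q and the length-normalized documents in C, and the query has fixed length. Under those assumptions the maximizer is, up to scaling, the difference of the two centroids (d_j denotes a document vector):

```latex
\vec{q}_{opt} = \frac{1}{|C_r|} \sum_{\vec{d}_j \in C_r} \vec{d}_j
              - \frac{1}{|C_{nr}|} \sum_{\vec{d}_j \in C_{nr}} \vec{d}_j
```

In practice the full sets C_r and C_nr are unknown, so only the documents the user has judged can stand in for them.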



The Rocchio algorithm is an example of incorporating feedback into a modified query. Its effect is to move the query towards the centroid of the relevant documents and away from the centroid of the irrelevant documents.
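
The Rocchio update can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sparse dict representation, the function names `centroid` and `rocchio`, the weights alpha, beta and gamma (set here to commonly cited textbook values), and the clipping of negative weights are all assumptions made for the example.

```python
from collections import defaultdict


def centroid(docs):
    """Average a list of sparse term-weight vectors (dicts of term -> weight)."""
    if not docs:
        return {}
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    alpha, beta, gamma are illustrative defaults (an assumption, tune as needed).
    Terms whose weight ends up non-positive are dropped, a common simplification.
    """
    c_r = centroid(relevant)
    c_nr = centroid(nonrelevant)
    new_query = {}
    for term in set(query) | set(c_r) | set(c_nr):
        weight = (alpha * query.get(term, 0.0)
                  + beta * c_r.get(term, 0.0)
                  - gamma * c_nr.get(term, 0.0))
        if weight > 0:
            new_query[term] = weight
    return new_query


# Hypothetical toy data: one query term, two relevant docs, one irrelevant doc.
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}, {"jaguar": 1.0, "habitat": 1.0}]
nonrelevant = [{"jaguar": 1.0, "car": 1.0}]

expanded = rocchio(query, relevant, nonrelevant)
print(sorted(expanded.items()))
```

In this toy run the modified query gains the terms "cat" and "habitat" from the relevant documents (adding new terms), the weight of "jaguar" changes (reweighting), and "car" is dropped because its weight would be negative.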

Users are often reluctant to provide feedback or to prolong the search interaction. Pseudo-feedback
