Sanderson, M. and Zobel, J. (2005) Information retrieval system evaluation: effort, sensitivity, and reliability. In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval. Annual ACM Conference on Research and Development in Information Retrieval, August 15-19, 2005, Salvador, Brazil. ACM, New York, pp. 162-169. ISBN 1-59593-034-5
The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests overestimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. This investigation shows that assessor effort would be better spent building test collections with more topics, each assessed in less detail.
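As a rough illustration of the kind of paired comparison the abstract describes, the sketch below computes a paired t statistic and an exact two-sided sign test over per-topic effectiveness scores for two systems. It is not the authors' code: the function names (`paired_t_statistic`, `sign_test_p_value`) and the score lists are hypothetical, invented only for illustration, and the scores are not taken from the paper. Only the Python standard library is assumed.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-topic scores.

    Hypothetical helper: each position in the two lists is assumed to
    hold the score of system A and system B on the same topic.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired scores; tied topics are dropped."""
    wins = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    losses = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = wins + losses
    k = min(wins, losses)
    # Probability of k or fewer "minority" outcomes under a fair coin.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up per-topic average precision values for two hypothetical systems.
system_a = [0.42, 0.31, 0.55, 0.20, 0.47, 0.38, 0.61, 0.29]
system_b = [0.35, 0.30, 0.50, 0.22, 0.40, 0.33, 0.52, 0.25]

t_value, dof = paired_t_statistic(system_a, system_b)
p_sign = sign_test_p_value(system_a, system_b)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
print(f"sign test p = {p_sign:.4f}")
```

Turning the t statistic into a p-value requires the Student t distribution, which the standard library does not provide. If SciPy is available, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` cover the t-test and the Wilcoxon test directly.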
|Copyright, Publisher and Additional Information:||Copyright 2005 ACM. This is an author produced version of a paper published in "Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval". Uploaded in accordance with the publisher's self-archiving policy.|
|Institution:||The University of Sheffield|
|Academic Units:||The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)|
|Date Deposited:||03 Sep 2008 12:06|
|Last Modified:||15 Sep 2014 01:27|