Document frequency and term specificity

Joho, H. and Sanderson, M. (2007) Document frequency and term specificity. In: Proceedings of the Recherche d'Information Assistée par Ordinateur Conference (RIAO). RIAO 2007 - 8th Conference, 30th May - 1st June 2007, Pittsburgh, PA, USA.


Document frequency is used in various applications in Information Retrieval and related fields. A frequent assumption is that a term's document frequency represents its level of specificity. However, empirical results to support this assumption are limited. Therefore, a large-scale experiment was carried out, using multiple corpora, to gain further insight into the relationship between document frequency and term specificity. The results show that the assumption holds only at the very specific levels that cover the majority of the vocabulary. The results also show that a larger corpus estimates specificity more accurately. However, co-occurrence information is shown to be effective for improving accuracy when only a small corpus is available.
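
For readers unfamiliar with the quantity discussed in the abstract, the following is a minimal sketch of how document frequency can be counted and turned into a simple specificity score. It is not the authors' experimental method: the toy corpus, the whitespace tokenisation, the function names, and the use of log(N/df) as the score are all assumptions made for illustration only.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        # A set ensures each term is counted once per document.
        df.update(set(doc.lower().split()))
    return df


def specificity_scores(df, n_docs):
    """Illustrative score log(N / df): rarer terms receive higher values."""
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Hypothetical three-document corpus, used only to show the calculation.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the retrieval system ranked the documents",
]

df = document_frequency(corpus)
scores = specificity_scores(df, len(corpus))

for term in ("the", "cat", "retrieval"):
    print(term, df[term], round(scores[term], 3))
```

Under this scoring, a term that appears in every document scores zero and a term that appears in a single document scores highest, which is the assumption the paper sets out to test empirically.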

Item Type: Proceedings Paper
Institution: The University of Sheffield
Academic Units: The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield)
Depositing User: Repository Officer
Date Deposited: 18 Sep 2008 10:26
Last Modified: 08 Feb 2013 16:56
Published Version: http://riao.free.fr/papers/29.pdf
Status: Published
URI: http://eprints.whiterose.ac.uk/id/eprint/4552
