Yang, H. orcid.org/0000-0002-3372-4801 and Zhang, M. (2003) A Language Modeling Approach to Search Distributed Text Databases. In: Gedeon, T.D. and Fung, L.C.C. (eds.) AI 2003: Advances in Artificial Intelligence. The 16th Australian Conference on Artificial Intelligence (AI'03), 03-05 Dec 2003, Perth, Australia. Lecture Notes in Computer Science (vol. 2903). Springer, Berlin, Heidelberg, pp. 196-207. ISBN 978-3-540-20646-0
Abstract
As the number and diversity of distributed information sources on the Internet increase exponentially, it becomes difficult for users to know which databases are appropriate to search. Given database language models that describe the content of each database, database selection services can help users locate the databases relevant to their information needs. In this paper, we propose a database selection approach based on statistical language modeling. The basic idea is that, for databases categorized into a topic hierarchy, individual language models are estimated at different search stages, and the databases are then ranked by their similarity to the query according to the estimated language models. Two-stage smoothed language models are presented to circumvent the inaccuracy caused by word sparseness. Experimental results demonstrate that this language modeling approach is competitive with current state-of-the-art database selection approaches.
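The abstract does not give the estimation formulas. Purely as an illustration, the sketch below ranks databases by smoothed query log-likelihood using the common two-stage scheme of Zhai and Lafferty: Dirichlet-prior smoothing followed by linear interpolation with a background model, p(w|D) = (1 - lam) * (c(w,D) + mu * p(w|R)) / (|D| + mu) + lam * p(w|B). This is not necessarily the authors' exact method. Several choices are assumptions of this sketch: the pooled topic category serves as the stage-1 reference model R, all databases pooled serve as the background model B, and the names `rank_databases`, `mu` and `lam` (with their defaults) are hypothetical. The toy databases are made up.

```python
import math
from collections import Counter


def unigram(counts):
    """Maximum-likelihood unigram model from a Counter of word counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def two_stage_prob(word, db_counts, db_len, ref_model, bg_model, mu, lam):
    # Stage 1: Dirichlet-prior smoothing against the reference (category) model.
    dirichlet = (db_counts[word] + mu * ref_model.get(word, 0.0)) / (db_len + mu)
    # Stage 2: linear interpolation with the background (all-databases) model.
    return (1 - lam) * dirichlet + lam * bg_model.get(word, 0.0)


def rank_databases(query, databases, mu=10.0, lam=0.2):
    """Rank databases by two-stage smoothed query log-likelihood.

    `databases` maps a name to (category, text). Using the category's pooled
    counts as the stage-1 reference and all databases as the stage-2
    background is an assumption of this sketch, as are mu and lam.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must be in (0, 1] so seen terms never get zero probability")
    db_counts = {name: Counter(text.split()) for name, (_, text) in databases.items()}
    cat_counts = {}
    for name, (cat, _) in databases.items():
        cat_counts.setdefault(cat, Counter()).update(db_counts[name])
    cat_models = {cat: unigram(c) for cat, c in cat_counts.items()}
    bg_model = unigram(sum(db_counts.values(), Counter()))

    scores = []
    for name, (cat, _) in databases.items():
        counts = db_counts[name]
        length = sum(counts.values())
        score = 0.0
        for w in query.split():
            if w not in bg_model:
                continue  # unseen in every database: would be zero for all, so skip
            score += math.log(two_stage_prob(w, counts, length,
                                             cat_models[cat], bg_model, mu, lam))
        scores.append((name, score))
    # Highest log-likelihood first.
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical toy databases, each tagged with a topic category.
databases = {
    "med_a": ("health", "cancer treatment drug trial cancer patient"),
    "med_b": ("health", "diet exercise heart patient"),
    "cs_a": ("computing", "database query index search engine query"),
}
print(rank_databases("cancer drug", databases))
```

With these toy inputs the order is med_a, med_b, cs_a. Database med_b never mentions "cancer" or "drug", yet it still outranks cs_a because its health category model supplies probability mass for those terms. This illustrates why a category-level reference model can help sparse databases.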
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Yang, H.; Zhang, M. |
Editors: | Gedeon, T.D.; Fung, L.C.C. |
Copyright, Publisher and Additional Information: | © 2003 Springer-Verlag Berlin Heidelberg |
Dates: | Published: 2003 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 14 Jul 2017 11:29 |
Last Modified: | 19 Dec 2022 13:35 |
Published Version: | https://link.springer.com/chapter/10.1007/978-3-54... |
Status: | Published |
Publisher: | Springer |
Series Name: | Lecture Notes in Computer Science |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:110491 |