Hodge, V J orcid.org/0000-0002-2469-0224 and Austin, J orcid.org/0000-0001-5762-8614 (2002) Hierarchical word clustering - automatic thesaurus generation. Neurocomputing. pp. 819-846. ISSN 0925-2312
Abstract
In this paper, we propose a hierarchical, lexical clustering neural network algorithm that automatically generates a thesaurus (synonym abstraction) using purely stochastic information derived from unstructured text corpora, and that requires no prior word classifications. The lexical hierarchy overcomes the Vocabulary Problem by accommodating paraphrasing through the use of synonym clusters, and overcomes Information Overload by focusing search within cohesive clusters. We describe existing word categorisation methodologies, identifying their respective strengths and weaknesses, and evaluate our proposed approach against an existing neural approach, using a benchmark statistical approach and a human-generated thesaurus for comparison. We also evaluate our word context vector generation methodology against two similar approaches to investigate the effect of word vector dimensionality, and of the number of words in the context window, on the quality of the word clusters produced. We demonstrate the effectiveness of our approach and its superiority to existing techniques. © 2002 Elsevier Science B.V. All rights reserved.
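To make the notions of a word context vector and a context window concrete, here is a minimal, generic sketch. It is not the authors' neural algorithm, because the abstract does not specify it. The sketch counts the words occurring within a fixed window around each word, compares words by cosine similarity, and builds a word hierarchy with plain average-linkage agglomerative clustering, swapped in here as a simple stand-in. The function names, the `window` parameter and the toy corpus are all hypothetical illustrations.

```python
from collections import Counter, defaultdict
import math


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- `window` positions."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[word][tokens[j]] += 1
    return dict(vecs)


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(vecs):
    """Average-linkage agglomerative clustering of words (generic stand-in, not the paper's method).

    Repeatedly merges the two clusters whose members have the highest mean
    pairwise cosine similarity, and returns the hierarchy as nested tuples.
    """
    words = sorted(vecs)
    if not words:
        return None
    sim = {(a, b): cosine(vecs[a], vecs[b]) for a in words for b in words}
    clusters = [(w, [w]) for w in words]  # (subtree, member words)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                score = sum(sim[a, b] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    # Hypothetical toy corpus: "car"/"automobile" and "cat"/"dog" share contexts.
    corpus = ("the car drove fast the automobile drove fast "
              "the cat sat down the dog sat down").split()
    vecs = context_vectors(corpus, window=1)
    print(round(cosine(vecs["car"], vecs["automobile"]), 3))  # 1.0
    print(agglomerate(vecs))
```

On this toy corpus, "car" and "automobile" have identical one-word contexts, so they merge into a single low-level cluster, as do "cat" and "dog". Widening `window` or restricting the vector vocabulary changes the vectors and hence the hierarchy. These are the two factors (context window size and vector dimensionality) whose effect the paper reports evaluating.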
Metadata
Item Type: | Article |
---|---|
Authors/Creators: | Hodge, V J (orcid.org/0000-0002-2469-0224); Austin, J (orcid.org/0000-0001-5762-8614) |
Copyright, Publisher and Additional Information: | Copyright © 2002 Elsevier Science B.V. This is an author-produced version of a paper published in Neurocomputing. This paper has been peer-reviewed but does not include the final publisher proof-corrections or journal pagination. |
Keywords: | neural network, hierarchical thesaurus, lexical, synonym clustering |
Institution: | The University of York |
Academic Units: | The University of York > Faculty of Sciences (York) > Computer Science (York) |
Depositing User: | Repository Officer |
Date Deposited: | 13 Dec 2005 |
Last Modified: | 26 Jan 2025 00:07 |
Published Version: | https://doi.org/10.1016/S0925-2312(01)00675-0 |
Status: | Published |
Refereed: | Yes |
Identification Number: | 10.1016/S0925-2312(01)00675-0 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:882 |