Hodge, V J and Austin, J (2002) Hierarchical word clustering - automatic thesaurus generation. Neurocomputing. pp. 819-846. ISSN 0925-2312
In this paper, we propose a hierarchical, lexical clustering neural network algorithm that automatically generates a thesaurus (synonym abstraction) using purely stochastic information derived from unstructured text corpora and requiring no prior word classifications. The lexical hierarchy overcomes the Vocabulary Problem by accommodating paraphrasing through synonym clusters and overcomes Information Overload by focusing search within cohesive clusters. We describe existing word categorisation methodologies, identifying their respective strengths and weaknesses, and evaluate our proposed approach against an existing neural approach, using a benchmark statistical approach and a human-generated thesaurus for comparison. We also evaluate our word context vector generation methodology against two similar approaches to investigate the effect of word vector dimensionality and the effect of the number of words in the context window on the quality of the word clusters produced. We demonstrate the effectiveness of our approach and its superiority to existing techniques. © 2002 Elsevier Science B.V. All rights reserved.
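For readers unfamiliar with the terms used in the abstract, the following is a minimal, generic sketch of what a word context vector built from a context window can look like. It is not the authors' method: the paper's own vector generation and neural clustering are not reproduced here. The function names `context_vectors` and `cosine`, the `window` parameter, and the toy sentence are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    Illustrative co-occurrence counting only (an assumption, not the paper's scheme).
    """
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus (hypothetical): "cat" and "dog" occur in similar contexts.
tokens = "the cat sat on the mat the dog sat on the rug".split()
vecs = context_vectors(tokens, window=2)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_on = cosine(vecs["cat"], vecs["on"])
```

In this toy example, words that share contexts ("cat" and "dog") score as more similar than words that do not ("cat" and "on"). A clustering step would then group words by such similarities; changing `window` changes how many neighbouring words contribute to each vector, which is one of the factors the abstract says the paper investigates.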
|Copyright, Publisher and Additional Information:||Copyright © 2002 Elsevier Science B.V. This is an author produced version of a paper published in Neurocomputing. This paper has been peer-reviewed but does not include the final publisher proof-corrections or journal pagination.|
|Keywords:||neural network, hierarchical thesaurus, lexical, synonym clustering|
|Academic Units:||The University of York > Computer Science (York)|
|Depositing User:||Repository Officer|
|Date Deposited:||13 Dec 2005|
|Last Modified:||17 Oct 2013 14:31|