Derczynski, L., Chester, S. and Bøgh, K.S. (2015) Tune your Brown clustering, please. In: International Conference Recent Advances in Natural Language Processing, RANLP. Recent Advances in Natural Language Processing, Sep 7–9 2015, Hissar, Bulgaria. Association for Computational Linguistics, pp. 110-117.
Abstract
Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone largely unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the interplay between the input corpus size, the chosen number of classes, and the quality of the resulting clusters, which has implications for any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
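As an illustration of the quantity the abstract refers to, the following minimal Python sketch computes the average mutual information between the classes of adjacent tokens, which is the objective that Brown clustering greedily tries to preserve as it merges classes. It is not the authors' code: the function name `average_mutual_information`, the toy corpus and the word-to-class mappings are assumptions made for this example only.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between adjacent token classes.

    `tokens` is a list of words; `word_to_class` maps each word to a class
    label. Both names are hypothetical and chosen for this sketch.
    """
    classes = [word_to_class[t] for t in tokens]
    # Count class bigrams (left class, right class) over adjacent tokens.
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    # Marginal counts of each class in left and right bigram position.
    left, right = Counter(), Counter()
    for (a, b), n in bigrams.items():
        left[a] += n
        right[b] += n
    ami = 0.0
    for (a, b), n in bigrams.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        ami += (n / total) * math.log2(n * total / (left[a] * right[b]))
    return ami


# Toy corpus (an assumption): two words that strictly alternate.
toy_tokens = ["a", "b", "a", "b", "a", "b"]
separate = average_mutual_information(toy_tokens, {"a": 0, "b": 1})
merged = average_mutual_information(toy_tokens, {"a": 0, "b": 0})
```

With each word in its own class, adjacent classes are fully predictable from one another and `separate` is positive; collapsing both words into a single class gives `merged == 0.0`. The number of classes at which merging stops is the hyper-parameter whose interaction with corpus size the paper examines.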
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Derczynski, L., Chester, S. and Bøgh, K.S. |
Copyright, Publisher and Additional Information: | Article licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (https://creativecommons.org/licenses/by-nc-sa/3.0/). Permission is granted to make copies for the purposes of teaching and research. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 03 Feb 2016 16:14 |
Last Modified: | 03 Feb 2016 16:14 |
Published Version: | http://www.aclweb.org/anthology/R/R15/R15-1.pdf#pa... |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:94052 |