Salle, A., Idiart, M. and Villavicencio, A. orcid.org/0000-0002-3731-9168 (2016) Matrix factorization using window sampling and negative sampling for improved word representations. In: Erk, K. and Smith, N.A. (eds.) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Short Papers). 54th Annual Meeting of the Association for Computational Linguistics, 07-12 Aug 2016, Berlin, Germany. Association for Computational Linguistics, pp. 419-424. ISBN 9781510827592
Abstract
In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information (PPMI) matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
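The general idea in the abstract, fitting low-rank word and context vectors to PPMI values by stochastic gradient descent, with extra updates toward sampled negative contexts, can be illustrated with a small NumPy sketch. This is not LexVec's released implementation, and its exact weighting, smoothing, and hyperparameters are not reproduced here. The function names `ppmi` and `factorize`, the toy count matrix, the 0.75 smoothing exponent for negative sampling, and every hyperparameter value are illustrative assumptions. Observed pairs are drawn in proportion to their counts as a simple stand-in for window sampling, so frequent co-occurrences receive more updates and their errors weigh more in the overall fit.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI, max(0, log(P(w,c) / (P(w) P(c)))), from a word-by-context count matrix."""
    total = counts.sum()
    p_wc = counts / total
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs (log 0) and empty rows/columns (0/0)
    return np.maximum(pmi, 0.0)


def factorize(counts, dim=2, epochs=200, lr=0.05, neg=2, seed=0):
    """Hypothetical SGD factorization W @ C.T ~ PPMI (illustrative, not LexVec's exact scheme)."""
    rng = np.random.default_rng(seed)
    target = ppmi(counts)
    n_w, n_c = counts.shape
    W = rng.normal(scale=0.1, size=(n_w, dim))  # word vectors
    C = rng.normal(scale=0.1, size=(n_c, dim))  # context vectors
    # Observed pairs are drawn in proportion to their counts, so frequent
    # co-occurrences receive more updates (a stand-in for window sampling).
    pair_dist = counts.ravel() / counts.sum()
    # Negative contexts come from a smoothed unigram distribution (assumed exponent 0.75).
    ctx_dist = counts.sum(axis=0) ** 0.75
    ctx_dist = ctx_dist / ctx_dist.sum()
    n_updates = int(counts.sum())
    for _ in range(epochs):
        for idx in rng.choice(pair_dist.size, size=n_updates, p=pair_dist):
            w, c = divmod(int(idx), n_c)
            negatives = rng.choice(n_c, size=neg, p=ctx_dist)
            # Squared-error step toward the PPMI cell for the observed pair
            # and for each sampled negative context (whose PPMI is often 0).
            for cc in (c, *negatives):
                err = W[w] @ C[cc] - target[w, cc]
                grad_w = err * C[cc]
                grad_c = err * W[w]
                W[w] -= lr * grad_w
                C[cc] -= lr * grad_c
    return W, C, target


if __name__ == "__main__":
    toy = np.array([[10, 0, 2], [0, 8, 1], [2, 1, 6]])
    W, C, target = factorize(toy)
    print("mean squared error vs PPMI:", np.mean((W @ C.T - target) ** 2))
```

Because negative contexts are pulled toward their (usually zero) PPMI values, unobserved or weakly associated pairs still influence the vectors, which is the role the abstract attributes to accounting for negative co-occurrence.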
Metadata
Item Type: | Proceedings Paper |
---|---|
Authors/Creators: | Salle, A., Idiart, M. and Villavicencio, A. |
Editors: | Erk, K. and Smith, N.A. |
Copyright, Publisher and Additional Information: | © 2016 Association for Computational Linguistics. |
Dates: | Published: 2016 |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 21 Nov 2019 14:28 |
Last Modified: | 21 Nov 2019 16:24 |
Status: | Published |
Publisher: | Association for Computational Linguistics |
Refereed: | Yes |
Identification Number: | 10.18653/v1/P16-2068 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:153565 |