Moon, G.E., Newman-Griffis, D. orcid.org/0000-0002-0473-4226, Kim, J. et al. (3 more authors) (2020) Parallel data-local training for optimizing Word2Vec embeddings for word and graph embeddings. In: 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC). 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), 18 Nov 2019, Denver, CO, USA. IEEE, pp. 44-55. ISBN 978-1-7281-5986-7
Abstract
The Word2Vec model is a neural network-based unsupervised word embedding technique widely used in applications such as natural language processing, bioinformatics, and graph mining. Because Word2Vec repeatedly performs Stochastic Gradient Descent (SGD) to minimize its objective function, it is very compute-intensive. However, existing methods for parallelizing Word2Vec are not sufficiently optimized for data locality to achieve high performance. In this paper, we develop a parallel, data-locality-enhanced Word2Vec algorithm based on Skip-gram, with a novel negative sampling method that decouples the loss calculation for positive and negative samples; this allows us to reformulate the negative-sample computations over each sentence as efficient matrix-matrix operations. Experimental results demonstrate that our parallel implementations on multi-core CPUs and GPUs achieve significant performance improvements over existing state-of-the-art parallel Word2Vec implementations while maintaining evaluation quality. We also demonstrate the utility of our Word2Vec implementation within the Node2Vec algorithm, accelerating embedding learning for large graphs.
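The key optimization described in the abstract is sharing one set of negative samples across a whole sentence, so the negative-sample work becomes matrix-matrix products instead of many vector-vector operations. The sketch below is a minimal NumPy illustration of that idea only, not the paper's actual implementation: the function name, shapes, and update details are assumptions, and the positive-sample (center/context) update, which the paper decouples from this step, is omitted.

```python
import numpy as np

def negative_sample_step(W_in, W_out, sentence, negatives, lr=0.025):
    """Hypothetical sketch: apply the shared-negative-sample update for
    one sentence as matrix-matrix products (GEMMs).

    W_in, W_out : (vocab_size, dim) input/output embedding matrices
    sentence    : 1-D array of word indices in the sentence
    negatives   : 1-D array of negative-sample indices shared by the
                  whole sentence (the reuse that enables GEMM)
    """
    X = W_in[sentence]        # (s, d) input vectors for sentence words
    N = W_out[negatives]      # (k, d) shared negative output vectors

    # Scores for every (word, negative) pair in one (s, d) x (d, k)
    # GEMM instead of s*k separate dot products.
    g = 1.0 / (1.0 + np.exp(-(X @ N.T)))   # (s, k) sigmoid; labels are 0

    # Gradients of the negative-sampling loss, again as GEMMs.
    grad_X = g @ N            # (s, d) w.r.t. sentence word vectors
    grad_N = g.T @ X          # (k, d) w.r.t. shared negative vectors

    # Scatter the updates back; np.add.at handles repeated indices.
    np.add.at(W_in, sentence, -lr * grad_X)
    np.add.at(W_out, negatives, -lr * grad_N)

# Toy usage with random data:
rng = np.random.default_rng(0)
W_in = rng.standard_normal((1000, 100)) * 0.01
W_out = rng.standard_normal((1000, 100)) * 0.01
negative_sample_step(W_in, W_out,
                     sentence=rng.integers(0, 1000, size=20),
                     negatives=rng.integers(0, 1000, size=5))
```

Batching the negatives per sentence turns many small, memory-bound vector updates into compute-bound GEMMs, which is the kind of data reuse the paper exploits on multi-core CPUs and GPUs.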
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. |
Keywords: | gradient methods; graph theory; matrix algebra; multiprocessing systems; neural nets; parallel algorithms; sampling methods; stochastic processes; text analysis; unsupervised learning |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Social Sciences (Sheffield) > Information School (Sheffield) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 16 Feb 2023 16:18 |
Last Modified: | 17 Feb 2023 14:43 |
Published Version: | http://dx.doi.org/10.1109/mlhpc49564.2019.00010 |
Status: | Published |
Publisher: | IEEE |
Refereed: | Yes |
Identification Number: | 10.1109/mlhpc49564.2019.00010 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:196489 |