Li, R., Peng, X., Lin, C. et al. (2 more authors) (2021) On the low-density latent regions of VAE-based language models. In: Bertinetto, L., Henriques, J.F., Albanie, S., Paganini, M. and Varol, G., (eds.) NeurIPS 2020 Workshop on Pre-registration in Machine Learning. NeurIPS 2020 Workshop on Pre-registration in Machine Learning, 11 Dec 2020, Virtual conference. Proceedings of Machine Learning Research (PMLR) (148). ML Research Press, pp. 343-357.
Abstract
By representing semantics in latent spaces, variational autoencoders (VAEs) have proven powerful in modelling and generating signals such as images and text, even without supervision. However, previous studies suggest that a learned latent space contains some low-density regions (a.k.a. holes), which could harm overall system performance. While existing studies focus on empirically mitigating these latent holes, how the holes are distributed and how they affect different components of a VAE remain unexplored. In addition, the hole issue in VAEs for language processing is rarely addressed. In this work, we propose to investigate these topics in depth for the first time by introducing a simple hole-detection algorithm based on the neighbour consistency between a VAE’s input, latent, and output semantic spaces. Comprehensive experiments, including automatic and human evaluation, imply that large-scale low-density latent holes may not exist in the latent space. In addition, various sentence encoding strategies are explored, and the native word embedding is found to be the most suitable strategy for VAEs in the language modelling task.
Metadata
Item Type: | Proceedings Paper |
---|---|
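Authors/Creators: | Li, R., Peng, X., Lin, C. et al. (2 more authors) |
Editors: | Bertinetto, L., Henriques, J.F., Albanie, S., Paganini, M. and Varol, G. |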
Copyright, Publisher and Additional Information: | © 2021 The Authors. For reuse permissions please contact the authors. |
Keywords: | Variational Autoencoder; Low-density Regions; Latent Holes |
Dates: | |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Engineering and Physical Sciences Research Council (grant number EP/P011829/1); Ningbo Natural Science Foundation (grant numbers 202003N4320; 202003N4321) |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 12 Aug 2021 09:14 |
Last Modified: | 12 Aug 2021 09:14 |
Published Version: | http://proceedings.mlr.press/v148/li21a.html |
Status: | Published |
Publisher: | ML Research Press |
Series Name: | Proceedings of Machine Learning Research (PMLR) |
Refereed: | Yes |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:177037 |