Madhyastha, P., Wang, J.K. orcid.org/0000-0003-0048-3893 and Specia, L. (2018) The role of image representations in vision to language tasks. Natural Language Engineering, 24 (3). pp. 415-439. ISSN 1351-3249
Abstract
Tasks that require modeling of both language and visual information, such as image captioning, have become very popular in recent years. Most state-of-the-art approaches use image representations obtained from a deep neural network, which end-to-end neural network-based models then use in a variety of ways to generate language. However, it is not clear how different image representations contribute to language generation tasks. In this paper, we probe the representational contribution of the image features in an end-to-end neural modeling framework and study the properties of different types of image representations. We focus on two popular vision-to-language problems: image captioning and multimodal machine translation. Our analysis provides insights into the representational properties and suggests that end-to-end approaches implicitly learn a visual-semantic subspace and exploit that subspace to generate captions.
Metadata
| Item Type: | Article |
|---|---|
| Authors/Creators: | Madhyastha, P.; Wang, J.K. (orcid.org/0000-0003-0048-3893); Specia, L. |
| Copyright, Publisher and Additional Information: | © 2018 Cambridge University Press. This is an author-produced version of a paper subsequently published in Natural Language Engineering. Article available under the terms of the CC-BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
| Institution: | The University of Sheffield |
| Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
| Funding Information: | Funder: European Commission - Horizon 2020; Grant number: 678017 |
| Depositing User: | Symplectic Sheffield |
| Date Deposited: | 23 Apr 2018 14:13 |
| Last Modified: | 17 Nov 2020 11:31 |
| Status: | Published |
| Publisher: | Cambridge University Press (CUP) |
| Refereed: | Yes |
| Identification Number: | 10.1017/S1351324918000116 |
| Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:129793 |