Wang, J.K. (orcid.org/0000-0003-0048-3893), Madhyastha, P. and Specia, L. (2018) Object counts! Bringing explicit detections back into image captioning. In: Walker, M., Ji, H. and Stent, A. (eds.) Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT), 01-06 Jun 2018, New Orleans, Louisiana. Association for Computational Linguistics (ACL), pp. 2180-2193. ISBN 9781948087278
Abstract
The use of explicit object detectors as an intermediate step to image captioning, which used to constitute an essential stage in early work, is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding. We argue that explicit detections provide rich semantic information and can thus be used as an interpretable representation to better understand why end-to-end image captioning systems work well. We provide an in-depth analysis of end-to-end image captioning by exploring a variety of cues that can be derived from such object detections. Our study reveals that end-to-end image captioning systems rely on matching image representations to generate captions, and that encodings of the frequency, size and position of objects are complementary and all play a role in forming a good image representation. It also reveals that different object categories contribute in different ways towards image captioning.
Metadata
Item Type: | Proceedings Paper |
---|---|
Copyright, Publisher and Additional Information: | © 2018 The Association for Computational Linguistics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
Institution: | The University of Sheffield |
Academic Units: | The University of Sheffield > Faculty of Engineering (Sheffield) > Department of Computer Science (Sheffield) |
Funding Information: | Funder: European Commission (Horizon 2020); grant number: 678017 |
Depositing User: | Symplectic Sheffield |
Date Deposited: | 19 Apr 2018 13:20 |
Last Modified: | 29 Jun 2020 13:25 |
Published Version: | https://www.aclweb.org/anthology/N18-1198 |
Status: | Published |
Publisher: | Association for Computational Linguistics (ACL) |
Refereed: | Yes |
Identification Number: | 10.18653/v1/N18-1198 |
Open Archives Initiative ID (OAI ID): | oai:eprints.whiterose.ac.uk:129790 |