Abstract
Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system's ability to produce captions with relevant and factually accurate geographic referencing.
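The abstract reports gains in BLEU, ROUGE, METEOR and CIDEr. As background on what such scores measure, here is a minimal, generic sketch of sentence-level BLEU (n-gram precision with a brevity penalty) in pure Python; the function name, add-one smoothing, and uniform weights are illustrative choices of ours, not the paper's evaluation code.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) times a brevity penalty. Hypothetical helper for
    illustration only; real evaluations use toolkit implementations."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clip each hypothesis n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty: punish hypotheses shorter than the reference
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

An identical hypothesis and reference score 1.0; a caption sharing no tokens with the reference scores near zero, which is why corpus-level averages of such metrics are used to compare captioning systems.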
Original language | English
---|---
Pages | 3143-3156
Number of pages | 14
Publication status | Published - 2020
Event | The 28th International Conference on Computational Linguistics (COLING), Online, 8 Dec 2020 → 13 Dec 2020, https://coling2020.org/
Conference
Conference | The 28th International Conference on Computational Linguistics (COLING)
---|---
Abbreviated title | COLING'2020
Period | 8/12/20 → 13/12/20
Internet address | https://coling2020.org/
Keywords
- image captioning
- caption generation
- knowledge integration
- geographic information
- contextualized language generation
- contextualization
- geographic context