Geo-Aware Image Caption Generation

Research output: Contribution to conference › Paper › Academic

Abstract

Standard image caption generation systems produce generic descriptions of images and do not utilize any contextual information or world knowledge. In particular, they are unable to generate captions that contain references to the geographic context of an image, for example, the location where a photograph is taken or relevant geographic objects around an image location. In this paper, we develop a geo-aware image caption generation system, which incorporates geographic contextual information into a standard image captioning pipeline. We propose a way to build an image-specific representation of the geographic context and adapt the caption generation network to produce appropriate geographic names in the image descriptions. We evaluate our system on a novel captioning dataset that contains contextualized captions and geographic metadata and achieve substantial improvements in BLEU, ROUGE, METEOR and CIDEr scores. We also introduce a new metric to assess generated geographic references directly and empirically demonstrate our system's ability to produce captions with relevant and factually accurate geographic referencing.
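The idea of an image-specific geographic context representation can be illustrated with a minimal sketch (not the authors' implementation; all names, dimensions, and the toy embedding are illustrative assumptions): place names near the photo location are embedded, pooled into a context vector, and concatenated with the image features before caption decoding.

```python
# Toy sketch of geo-aware feature fusion: embed nearby geographic
# object names, mean-pool them into a context vector, and
# concatenate that vector with the image features. A caption
# decoder would then condition on the fused vector. Dimensions
# and the hash-based embedding are illustrative only.
import hashlib

EMB_DIM = 8  # assumed size of the geographic name embedding


def embed_name(name: str) -> list[float]:
    """Deterministic toy embedding for a geographic name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:EMB_DIM]]


def geo_context(nearby_names: list[str]) -> list[float]:
    """Mean-pool embeddings of geographic objects around the image location."""
    if not nearby_names:
        return [0.0] * EMB_DIM
    embs = [embed_name(n) for n in nearby_names]
    return [sum(col) / len(embs) for col in zip(*embs)]


def fuse(image_feats: list[float], nearby_names: list[str]) -> list[float]:
    """Concatenate image features with the geographic context vector."""
    return image_feats + geo_context(nearby_names)


# Example: 4-dim image features plus two nearby geographic objects.
fused = fuse([0.1, 0.2, 0.3, 0.4], ["Golden Gate Bridge", "Presidio"])
```

In a real system the pooled context would come from a learned embedding over retrieved geographic metadata, and the decoder vocabulary would be adapted so that the relevant geographic names can be emitted in the caption.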
Original language: English
Pages: 3143-3156
Number of pages: 14
Publication status: Published - 2020
Event: The 28th International Conference on Computational Linguistics (COLING) - Online
Duration: 8 Dec 2020 - 13 Dec 2020
Internet address: https://coling2020.org/

Conference

Conference: The 28th International Conference on Computational Linguistics (COLING)
Abbreviated title: COLING'2020
Period: 8/12/20 - 13/12/20
Internet address: https://coling2020.org/

Keywords

  • image captioning
  • caption generation
  • knowledge integration
  • geographic information
  • contextualized language generation
  • contextualization
  • geographic context

