Face2Text revisited: Improved data set and baseline results

M Tanti, S Abdilla, A Muscat, C Borg, RA Farrugia, A Gatt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Current image description generation models do not transfer well to the task of describing human faces. To encourage the development of more human-focused descriptions, we developed a new data set of facial descriptions based on the CelebA image data set. We describe the properties of this data set and present results from a face description generator trained on it, exploring the feasibility of transfer learning from VGGFace/ResNet CNNs. Comparisons are drawn through both automated metrics and human evaluation by 76 English-speaking participants. The descriptions generated by the VGGFace-LSTM + Attention model are closest to the ground truth according to human evaluation, whilst the ResNet-LSTM + Attention model obtained the highest CIDEr and CIDEr-D results (1.252 and 0.686 respectively). Together, the new data set and these experimental results provide data and baselines for future work in this area.
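
    The pipeline the abstract describes (a pretrained CNN encoder whose features feed an LSTM decoder with attention) can be illustrated with a minimal sketch. This is not the authors' code: the ResNet-50 backbone, layer sizes, and vocabulary size below are illustrative assumptions standing in for the VGGFace/ResNet features used in the paper.

    # Minimal sketch of a CNN -> LSTM + Attention description generator.
    # All names and dimensions are illustrative assumptions, not the paper's model.
    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class AttentionDecoder(nn.Module):
        def __init__(self, vocab_size, feat_dim=2048, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            # Additive (Bahdanau-style) attention over spatial CNN features.
            self.att_feat = nn.Linear(feat_dim, hidden_dim)
            self.att_hid = nn.Linear(hidden_dim, hidden_dim)
            self.att_out = nn.Linear(hidden_dim, 1)
            self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
            self.fc = nn.Linear(hidden_dim, vocab_size)

        def forward(self, feats, tokens):
            # feats: (B, regions, feat_dim) spatial features; tokens: (B, T) word ids.
            B, T = tokens.shape
            h = feats.new_zeros(B, self.lstm.hidden_size)
            c = feats.new_zeros(B, self.lstm.hidden_size)
            logits = []
            for t in range(T):
                # Attention weights over image regions, conditioned on the hidden state.
                scores = self.att_out(torch.tanh(
                    self.att_feat(feats) + self.att_hid(h).unsqueeze(1))).squeeze(-1)
                context = (scores.softmax(dim=1).unsqueeze(-1) * feats).sum(dim=1)
                h, c = self.lstm(torch.cat([self.embed(tokens[:, t]), context], dim=1), (h, c))
                logits.append(self.fc(h))
            return torch.stack(logits, dim=1)  # (B, T, vocab_size)

    # Frozen CNN encoder for transfer learning: keep the convolutional features only.
    cnn = resnet50(weights="IMAGENET1K_V2")
    encoder = nn.Sequential(*list(cnn.children())[:-2]).eval()  # -> (B, 2048, 7, 7)
    for p in encoder.parameters():
        p.requires_grad = False

    images = torch.randn(2, 3, 224, 224)
    with torch.no_grad():
        fmap = encoder(images)                   # (2, 2048, 7, 7)
    feats = fmap.flatten(2).transpose(1, 2)      # (2, 49, 2048) region features
    decoder = AttentionDecoder(vocab_size=5000)
    out = decoder(feats, torch.randint(0, 5000, (2, 12)))
    print(out.shape)  # torch.Size([2, 12, 5000])

    In the paper's setting, the frozen ImageNet backbone above would be swapped for VGGFace or ResNet face features; freezing the encoder and training only the decoder is what makes this a transfer-learning setup.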
    Original language: English
    Title of host publication: Proceedings of the Second Workshop on People in Vision, Language and Mind @ LREC2022
    Publisher: European Language Resources Association (ELRA)
    Pages: 41-47
    Publication status: Published - 2022

