TY - GEN
T1 - VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
AU - Parcalabescu, L
AU - Cafagna, M
AU - Muradjan, L
AU - Frank, A
AU - Calixto, I
AU - Gatt, A
PY - 2022
Y1 - 2022
AB - We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena. VALSE offers a suite of six tests covering various linguistic constructs. Solving these requires models to ground linguistic phenomena in the visual modality, allowing more fine-grained evaluations than hitherto possible. We build VALSE using methods that support the construction of valid foils, and report results from evaluating five widely used V&L models. Our experiments suggest that current models have considerable difficulty addressing most phenomena. Hence, we expect VALSE to serve as an important benchmark to measure future progress of pretrained V&L models from a linguistic perspective, complementing the canonical task-centred V&L evaluations.
UR - https://aclanthology.org/2022.acl-long.567/
U2 - 10.18653/v1/2022.acl-long.567
DO - 10.18653/v1/2022.acl-long.567
M3 - Conference contribution
SP - 8253
EP - 8280
BT - Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL'22)
PB - Association for Computational Linguistics
ER -