TY - JOUR
T1 - About Time: Advances, Challenges, and Outlooks of Action Understanding
AU - Stergiou, Alexandros
AU - Poppe, Ronald
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/05/30
Y1 - 2025/05/30
AB - We have witnessed impressive advances in video action understanding. Increased dataset sizes, variability, and computation availability have enabled leaps in performance and task diversification. Current systems can provide coarse- and fine-grained descriptions of video scenes, extract segments corresponding to queries, synthesize unobserved parts of videos, and predict context across multiple modalities. This survey comprehensively reviews advances in uni- and multi-modal action understanding across a range of tasks. We focus on prevalent challenges, overview widely adopted datasets, and survey seminal works with an emphasis on recent advances. We broadly distinguish between three temporal scopes: (1) recognition tasks of actions observed in full, (2) prediction tasks for ongoing partially observed actions, and (3) forecasting tasks for subsequent unobserved action(s). This division allows us to identify specific action modeling and video representation challenges. Finally, we outline future directions to address current shortcomings.
KW - Action Anticipation
KW - Action Prediction
KW - Action Recognition
KW - Action Understanding
UR - https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=d7dz6a2i7wiom976oc9ff2iqvdhv8k5x&SrcAuth=WosAPI&KeyUT=WOS:001499210200001&DestLinkType=FullRecord&DestApp=WOS_CPL
U2 - 10.1007/s11263-025-02478-4
DO - 10.1007/s11263-025-02478-4
M3 - Article
SN - 0920-5691
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
M1 - 103406
ER -