LA-layer: General local attention layer for full attention networks

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Attention layers have contributed to state-of-the-art results on vision tasks. Still, they leave room for improvement: position information is used in a fixed manner, and the computation cost is typically high. To mitigate both issues, we propose a convolution-style local attention layer (LA-layer) as a replacement for traditional attention layers. LA-layers not only encode the position information of pixels in a convolutional manner, but also produce position offsets following a novel constrained rule, so that the keys deform and yield larger receptive fields. Queries and keys are processed by a novel aggregation function that outputs attention weights for the values. In our experiments with different types of ResNets, we replace convolutional layers with LA-layers and address image recognition, object detection and instance segmentation tasks. We consistently demonstrate performance gains while using fewer FLOPs and trainable parameters. Our code is available at: https://github.com/hotfinda/LA-layer.
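To make the idea of "convolution-style" local attention concrete, the sketch below implements a generic windowed self-attention over a 2D feature map: each pixel's query attends only to the keys in a k×k neighbourhood, and a learned relative-position embedding for each offset (dy, dx) is added to the key, so position is encoded per window location much like a convolution kernel. This is a minimal illustration under stated assumptions, not the LA-layer from the paper: the paper's constrained offset rule (deformable keys) and its novel aggregation function are not reproduced, and plain scaled dot-product attention is swapped in as a stand-in. All names (`local_attention`, `rel_pos`, `window`) are hypothetical, and the inputs are assumed to be already-projected query, key and value maps of equal channel width.

```python
import math
from typing import Dict, List, Tuple

FeatureMap = List[List[List[float]]]  # H x W x C, nested lists


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def local_attention(
    q: FeatureMap,
    k: FeatureMap,
    v: FeatureMap,
    rel_pos: Dict[Tuple[int, int], List[float]],
    window: int = 3,
) -> FeatureMap:
    """Generic windowed self-attention (NOT the paper's LA-layer).

    rel_pos maps each window offset (dy, dx) to a C-dim embedding that is
    added to the key at that offset (hypothetical position encoding).
    Scaled dot product is a stand-in for the paper's aggregation function.
    """
    h, w = len(q), len(q[0])
    c = len(v[0][0])
    scale = math.sqrt(len(q[0][0]))
    r = window // 2
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            scores, vals = [], []
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    y, x = i + dy, j + dx
                    # Neighbours outside the map are skipped (no zero padding).
                    if 0 <= y < h and 0 <= x < w:
                        key = [kk + pp for kk, pp in zip(k[y][x], rel_pos[(dy, dx)])]
                        scores.append(dot(q[i][j], key) / scale)
                        vals.append(v[y][x])
            weights = softmax(scores)
            # Output is a convex combination of the neighbouring values.
            row.append([sum(wt * val[ch] for wt, val in zip(weights, vals)) for ch in range(c)])
        out.append(row)
    return out


if __name__ == "__main__":
    H, W, C = 4, 4, 2
    fmap = [[[float(i + j), float(i - j)] for j in range(W)] for i in range(H)]
    zero_pos = {(dy, dx): [0.0] * C for dy in range(-1, 2) for dx in range(-1, 2)}
    out = local_attention(fmap, fmap, fmap, zero_pos, window=3)
    print(len(out), len(out[0]), len(out[0][0]))
```

Cost per pixel grows with window² × C rather than with the full image size, which is the usual motivation for local attention. A deformable variant along the lines the abstract describes would sample the keys at (y + offset_y, x + offset_x) with predicted offsets instead of on the fixed grid; that step is deliberately omitted here because the paper's constrained rule is not specified in this record.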

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Pages: 2057-2062
Number of pages: 6
ISBN (Electronic): 978-1-6654-6891-6
Publication status: Published, 25 Aug 2023

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2023-July
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Bibliographical note

Funding Information:
This work is supported in part by a scholarship from the China Scholarship Council (CSC) under Grant No. 202106290068.

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Local attention
  • CNN
  • Deformable Kernel
  • Convolutional neural network
