Conference papers

Relative Positional Encoding for Transformers with Linear Complexity

Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard

Affiliations:
  • ZENITH - Scientific Data Management, LIRMM - Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, CRISAM - Inria Sophia Antipolis - Méditerranée
  • S2A - Signal, Statistique et Apprentissage, LTCI - Laboratoire Traitement et Communication de l'Information
  • SIERRA - Statistical Machine Learning and Parsimony, DI-ENS - Département d'informatique de l'École normale supérieure, CNRS - Centre National de la Recherche Scientifique, Inria de Paris
Abstract: Recent advances in Transformer models allow for unprecedented sequence lengths, thanks to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers; it exploits lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE), a way to generate positional encodings that can be used as a replacement for the classical additive (sinusoidal) PE and provably behave like RPE. The main theoretical contribution is a connection between positional encoding and the cross-covariance structure of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
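
To give a rough feel for the cross-covariance idea mentioned in the abstract, the sketch below is a minimal, assumption-laden illustration (not the paper's exact algorithm): it uses random Fourier features with shared random frequencies and phases as stochastic positional codes, so that the product of the query and key codes depends only on the lag between positions rather than on the absolute positions. All names, shapes, and constants are illustrative.

# Minimal sketch of lag-dependent stochastic positional codes (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
length, R = 64, 4096                                  # sequence length, number of random features

freqs = rng.normal(0.0, 0.02, size=R)                 # random frequencies (Bochner / random Fourier features)
phases = rng.uniform(0.0, 2 * np.pi, size=R)          # random phases shared by query and key codes
pos = np.arange(length)[:, None]                      # positions 0 .. length-1, shape (length, 1)

# Stochastic positional codes for queries and keys (drawn identically here;
# the paper instead correlates them through a shared noise source).
Qbar = np.sqrt(2.0 / R) * np.cos(2 * np.pi * pos * freqs + phases)   # (length, R)
Kbar = np.sqrt(2.0 / R) * np.cos(2 * np.pi * pos * freqs + phases)   # (length, R)

P = Qbar @ Kbar.T                                     # (length, length) cross-covariance estimate

# Entries with the same lag are roughly equal, and the value decays as the lag grows:
print(round(P[0, 5], 2), round(P[10, 15], 2), round(P[20, 25], 2))   # all roughly 0.8 (lag 5)
print(round(P[0, 20], 2))                                            # much smaller (lag 20)

Because Qbar and Kbar are plain per-position feature vectors, they can be combined with the queries and keys of a linear-attention model without ever forming the full attention matrix, which is the point of generating relative behaviour this way.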

https://hal.telecom-paris.fr/hal-03256451
Contributor: Ondřej Cífka
Submitted on: Thursday, June 10, 2021 - 11:42:27 AM
Last modification on: Tuesday, October 19, 2021 - 11:16:45 AM
Long-term archiving on: Saturday, September 11, 2021 - 6:34:18 PM

File

spe.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-03256451, version 1
  • ARXIV: 2105.08399

Citation

Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, et al. Relative Positional Encoding for Transformers with Linear Complexity. ICML 2021 - 38th International Conference on Machine Learning, Jul 2021, Virtual Only, United States. ⟨hal-03256451⟩

Metrics

  • Record views: 2478
  • File downloads: 84