Personal Homepage

Personal Information


Associate Professor

Supervisor of Master's Candidates

E-Mail:

Date of Employment: 2025-05-21

School/Department: School of Software

Education Level: Doctoral (Ph.D.) student

Business Address: New Main Building (新主楼) C808, G517

Gender: Male

Contact Information: 18810578537

Degree: Doctorate

Status: Employed

Alma Mater: 北京伊人99

Discipline: Software Engineering; Computer Science and Technology

Junfan Chen


Paper

On Scalar Embedding of Relative Positions in Attention Models

Journal: Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), CCF-A
Abstract: Attention with positional encoding has been demonstrated as a powerful component in modern neural network models, such as transformers. However, why positional encoding works well in attention models remains largely unanswered. In this paper, we study the scalar relative positional encoding (SRPE) proposed in the T5 transformer. Such an encoding method has two features. First, it uses a scalar to embed relative positions. Second, the relative positions are bucketized using a fixed heuristic algorithm, and positions in the same bucket share the same embedding. In this work, we show that SRPE in attention has an elegant probabilistic interpretation. More specifically, the positional encoding serves to produce a prior distribution for the attended positions. The resulting attentive distribution can be viewed as a posterior distribution of the attended position given the observed input sequence. Furthermore, we propose a new SRPE (AT5) that adopts a learnable bucketization protocol and automatically adapts to the dependency range specific to the learning task. Empirical studies show that AT5 achieves superior performance over T5's SRPE.
Co-author: Junshuang Wu, Richong Zhang, Yongyi Mao, Junfan Chen
Indexed by: International academic conference
Page Number: 14050-14057
Translation or Not: no
Date of Publication: 2021-01-01
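
The abstract above describes T5-style scalar relative positional encoding (SRPE): relative offsets are mapped to buckets by a fixed heuristic, and each bucket owns a single learnable scalar that is added to the attention logits before the softmax, which is what supports the prior-over-positions interpretation. The following is a minimal sketch of that mechanism, not the paper's code; the single-head module, bucket counts, and function names are illustrative assumptions.

```python
# Illustrative sketch of T5-style scalar relative positional encoding (SRPE).
# Assumptions: single attention head, 32 buckets, max_distance of 128.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    # Fixed heuristic bucketization as in T5: small offsets each get their own
    # bucket, larger offsets share logarithmically spaced buckets.
    num_buckets //= 2
    bucket = (relative_position > 0).long() * num_buckets  # separate +/- directions
    rel = relative_position.abs()
    max_exact = num_buckets // 2
    is_small = rel < max_exact
    rel_large = max_exact + (
        torch.log(rel.float().clamp(min=1) / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    rel_large = torch.minimum(rel_large, torch.full_like(rel_large, num_buckets - 1))
    return bucket + torch.where(is_small, rel, rel_large)


class ScalarRelPosAttention(nn.Module):
    """Single-head attention with a learnable scalar bias per relative-position bucket."""

    def __init__(self, dim, num_buckets=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.rel_bias = nn.Embedding(num_buckets, 1)  # one scalar per bucket
        self.num_buckets = num_buckets

    def forward(self, x):
        # x: (batch, seq_len, dim)
        n, d = x.size(1), x.size(-1)
        q, k, v = self.q(x), self.k(x), self.v(x)
        logits = q @ k.transpose(-1, -2) / math.sqrt(d)
        pos = torch.arange(n, device=x.device)
        rel = pos[None, :] - pos[:, None]  # key index minus query index
        bias = self.rel_bias(relative_position_bucket(rel, self.num_buckets)).squeeze(-1)
        # The scalar bias shifts the logits before the softmax, i.e. it acts as a
        # log-prior over attended positions; the softmax output can then be read
        # as a posterior given the content-based scores.
        return F.softmax(logits + bias, dim=-1) @ v
```

The paper's AT5 variant replaces the fixed heuristic in relative_position_bucket with a learnable bucketization; that part is not sketched here.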