Transformer Align Model

In this paper [1], a transformer is trained to perform both the translation and alignment tasks.

Application scenarios of word alignments in NMT

  • Generating bilingual lexica from parallel corpora
  • External dictionary-assisted translation to improve the translation of low-frequency words
  • Trust, explanation, error analysis
  • Preserving style on webpages

Model design

The attention mechanism has long been motivated by word alignments in statistical machine translation, but to ensure alignment quality, additional supervision is needed.

The attention probabilities from the penultimate layer of a normally trained transformer MT model tend to correspond to word alignments. Therefore, one attention head (clever!) in the penultimate layer is trained as the alignment head. Selecting only a single attention head for alignment gives the model the freedom to choose whether to rely more on the alignment head or on the other attention heads.
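
As a minimal sketch of what this buys us at inference time (the tensor shape and the head index ALIGN_HEAD are my assumptions, not the paper's code): once an alignment head is designated, word alignments can be read off its attention matrix by linking each target token to the source position it attends to most.

```python
import torch

ALIGN_HEAD = 0  # hypothetical index of the designated alignment head

def extract_alignments(penultimate_attn: torch.Tensor) -> list[tuple[int, int]]:
    """Turn the alignment head's attention into hard word alignments by
    linking each target position to its most-attended source position."""
    align_probs = penultimate_attn[ALIGN_HEAD]   # (tgt_len, src_len)
    src_indices = align_probs.argmax(dim=-1)     # best source index per target token
    return [(int(src), tgt) for tgt, src in enumerate(src_indices.tolist())]

# Toy usage: 4 heads, 3 target tokens, 5 source tokens of random attention.
attn = torch.softmax(torch.randn(4, 3, 5), dim=-1)
print(extract_alignments(attn))  # list of (source, target) index pairs
```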

How to train the alignment head

Two approaches exist in the literature:

  • Label alignments beforehand and supervise the attention weights with a KL-divergence loss (see the sketch after this list).
  • Use the attentional vector to also predict either the target word or properties of the target tokens, such as POS tags.
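
A minimal sketch of the first, supervised approach, under assumed tensor shapes and names (this is not the paper's code): the labeled alignments are encoded as a distribution over source positions for each target token, and the alignment head's attention is pushed towards it with a cross-entropy loss, which equals the KL divergence up to a constant.

```python
import torch

def alignment_loss(attn_probs: torch.Tensor,
                   label_probs: torch.Tensor,
                   eps: float = 1e-9) -> torch.Tensor:
    """Cross-entropy between labeled alignment distributions and the
    alignment head's attention; minimizing it also minimizes the KL
    divergence, since the entropy of the labels is constant.

    attn_probs:  (tgt_len, src_len) attention of the alignment head
    label_probs: (tgt_len, src_len) labeled alignments, rows normalized
    """
    return -(label_probs * (attn_probs + eps).log()).sum(dim=-1).mean()

# Toy usage: target token 0 aligns to source 2; token 1 to sources 0 and 1.
labels = torch.tensor([[0.0, 0.0, 1.0],
                       [0.5, 0.5, 0.0]])
attn = torch.softmax(torch.randn(2, 3), dim=-1)
print(alignment_loss(attn, labels))
```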

In this work, an unsupervised approach is used to train the alignment head: a model is first trained on translation only, then its penultimate-layer attention weights are averaged across heads and used as weak alignment supervision for the translation (and alignment) model. The alignment model is trained in both directions.
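
A sketch of how such weak supervision targets could be built, assuming the per-head attention probabilities of the penultimate layer are available as a (num_heads, tgt_len, src_len) tensor (names and shapes are my assumptions):

```python
import torch

def weak_alignment_labels(penultimate_attn: torch.Tensor) -> torch.Tensor:
    """Average the per-head attention probabilities of the penultimate layer
    of a translation-only model; each row remains a valid distribution over
    source positions and can serve as a soft target for the alignment head.

    penultimate_attn: (num_heads, tgt_len, src_len)
    """
    return penultimate_attn.mean(dim=0)  # (tgt_len, src_len)

# Toy usage: 8 heads, 3 target tokens, 5 source tokens.
attn = torch.softmax(torch.randn(8, 3, 5), dim=-1)
labels = weak_alignment_labels(attn)
print(labels.sum(dim=-1))  # each target row still sums to 1
```

These soft targets would then be plugged into an alignment loss like the one sketched above.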

Previous work reported translation gains from introducing alignment supervision. In this paper, however, the alignment results are good while the translation results are only moderate.
