Sunday, June 12, 2016

Length bias in Encoder Decoder Models and a Case for Global Conditioning. (arXiv:1606.03402v1 [cs.AI])

Encoder-decoder networks are popular for probabilistic modeling of sequences in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables and are not subject to the label bias of locally conditioned models that assume partial conditional independence. However, in practice they exhibit a bias towards short sequences, even when using a beam search to find the optimal sequence. Surprisingly, accuracy sometimes even declines as the beam size increases.
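
To make the length bias concrete, here is a minimal toy sketch (the probabilities are invented purely for illustration and are not from the paper): under a locally normalized model, every additional token contributes another negative log-probability term, so a beam search that ranks completed hypotheses by total log-probability systematically prefers shorter ones, and enlarging the beam only surfaces more of those short candidates.

```python
import math

# Toy locally normalized model: at every step each continuation token gets
# probability 0.6 and the end-of-sequence token gets probability 0.4
# (made-up numbers, used only to illustrate the effect).
def step_log_probs():
    return {"tok": math.log(0.6), "</s>": math.log(0.4)}

def sequence_log_prob(length):
    # A hypothesis with `length` content tokens followed by </s>: every extra
    # token adds another negative term, so the total score only decreases.
    lp = step_log_probs()
    return length * lp["tok"] + lp["</s>"]

for n in (1, 5, 10, 20):
    print(n, round(sequence_log_prob(n), 2))
# Longer hypotheses get strictly lower total scores, so comparing completed
# hypotheses of different lengths by log-probability favors ending early.
```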

In this paper we show that these phenomena are due to a discrepancy between the full-sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences.
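
The distinction between the two margins can be written out as follows; the notation is a sketch under assumed definitions, not the paper's exact formulation.

```latex
% Locally conditioned factorization used by encoder-decoder models:
\[
  p(y \mid x) \;=\; \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)
\]
% Per-element margin that per-step training encourages, for each step $t$
% and each incorrect token $y'_t$:
\[
  \log p(y_t \mid y_{<t}, x) \;-\; \log p(y'_t \mid y_{<t}, x) \;\ge\; \gamma
\]
% Full-sequence margin that decoding actually relies on, against a competing
% sequence $y'$ of length $T'$:
\[
  \sum_{t=1}^{T} \log p(y_t \mid y_{<t}, x)
  \;-\; \sum_{t=1}^{T'} \log p(y'_t \mid y'_{<t}, x) \;\ge\; \Gamma
\]
% Per-step guarantees do not control the full-sequence difference uniformly in
% the length $T$, which is why long sequences suffer more from the mismatch.
```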

For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for beam search during inference, reducing it to an efficient dot-product-based search in a vector space.
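
As a rough illustration of what dot-product inference over a closed candidate set could look like (the encoder, the embeddings, and all names below are hypothetical placeholders, not the paper's architecture): the input is encoded once into a vector, every candidate sequence has a precomputed embedding, and prediction is a single matrix-vector product followed by an argmax, with no beam search.

```python
import numpy as np

# Hypothetical setup: `encode_input` stands in for a trained encoder mapping the
# source to a d-dimensional vector, and `candidate_embeddings` holds one
# precomputed d-dimensional vector per sequence in the closed candidate set.
rng = np.random.default_rng(0)
d, num_candidates = 128, 10_000
candidate_embeddings = rng.standard_normal((num_candidates, d))

def encode_input(x):
    # Placeholder encoder; in practice this would be the learned network.
    return rng.standard_normal(d)

def predict(x):
    # Score every candidate with a single dot product and return the best one;
    # there is no step-by-step decoding and hence no beam search.
    query = encode_input(x)
    scores = candidate_embeddings @ query
    return int(np.argmax(scores))

print(predict("example input"))
```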



from cs.AI updates on arXiv.org http://ift.tt/1Ym4FLz
via IFTTT