Patrick McGuire: Online Sequence-to-Sequence Reinforcement Learning for Open-Domain Conversational Agents. (arXiv:1612.03929v1 [cs.CL])

Tuesday, December 13, 2016

Online Sequence-to-Sequence Reinforcement Learning for Open-Domain Conversational Agents. (arXiv:1612.03929v1 [cs.CL])

We propose an online, end-to-end, deep reinforcement learning technique to develop generative conversational agents for open-domain dialogue. We use a unique combination of offline two-phase supervised learning and online reinforcement learning with human users to train our agent. While most existing research proposes hand-crafted and developer-defined reward functions for reinforcement, we devise a novel reward mechanism based on a variant of Beam Search and one-character user feedback at each step. Experiments show that our model, when trained on a small and shallow Seq2Seq network, successfully promotes the generation of meaningful, diverse and interesting responses, and can be used to train agents with customized personas and conversational styles.
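The abstract gives no details of the reward mechanism, so the following is only a rough sketch of the general idea: generate a few candidate replies with plain beam search, then turn a single character of user feedback into per-candidate rewards. Everything here is an assumption made for illustration, not the paper's method: the toy lookup-table "model", the function names, and the convention that a digit selects the preferred candidate while any other character rejects them all.

```python
import math

# Hypothetical toy "model": next-token log-probabilities given the last token.
# A real agent would query a trained Seq2Seq decoder here instead.
TOY_MODEL = {
    "<s>": {"hi": math.log(0.6), "hello": math.log(0.4)},
    "hi": {"there": math.log(0.7), "</s>": math.log(0.3)},
    "hello": {"friend": math.log(0.5), "</s>": math.log(0.5)},
    "there": {"</s>": 0.0},
    "friend": {"</s>": 0.0},
}


def beam_search(model, beam_width=3, max_len=5):
    """Return up to beam_width finished (tokens, log_prob) pairs, best first."""
    beams = [(["<s>"], 0.0)]
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by one token.
        candidates = []
        for tokens, score in beams:
            for tok, logp in model[tokens[-1]].items():
                candidates.append((tokens + [tok], score + logp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the top beam_width; set aside those that reached end-of-sequence.
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == "</s>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


def rewards_from_feedback(num_candidates, feedback_char):
    """Map one character of user feedback to a reward per candidate.

    Assumed convention (not from the paper): a digit 1..N marks the
    preferred candidate; any other character rejects all candidates.
    """
    if feedback_char.isascii() and feedback_char.isdigit():
        choice = int(feedback_char)
        if 1 <= choice <= num_candidates:
            return [1.0 if i == choice - 1 else 0.0
                    for i in range(num_candidates)]
    return [-1.0] * num_candidates


if __name__ == "__main__":
    replies = beam_search(TOY_MODEL)
    for i, (tokens, score) in enumerate(replies, start=1):
        print(i, " ".join(tokens[1:-1]), round(math.exp(score), 3))
    # Suppose the user types "2" to prefer the second candidate.
    print(rewards_from_feedback(len(replies), "2"))
```

In an online setting, rewards like these could feed a policy-gradient style update of the decoder after each turn; how the paper actually scores and applies the feedback is not stated in the abstract.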

from cs.AI updates on arXiv.org http://ift.tt/2hCSEDX
via IFTTT

