We propose an online, end-to-end, deep reinforcement learning technique to develop generative conversational agents for open-domain dialogue. We use a unique combination of offline two-phase supervised learning and online reinforcement learning with human users to train our agent. While most existing research proposes hand-crafted and developer-defined reward functions for reinforcement learning, we devise a novel reward mechanism based on a variant of Beam Search and one-character user feedback at each step. Experiments show that our model, when trained on a small and shallow Seq2Seq network, successfully promotes the generation of meaningful, diverse and interesting responses, and can be used to train agents with customized personas and conversational styles.
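The abstract does not spell out how one-character feedback becomes a reward, so the following is only a minimal sketch of one way such a mapping could work, not the authors' mechanism. It assumes the beam search shows the user a short numbered list of candidate responses and that the user replies with a single character. The function name `rewards_from_feedback`, the digit-selects-a-candidate convention, and the reward values 1.0, 0.0 and -1.0 are all hypothetical.

```python
from typing import List


def rewards_from_feedback(candidates: List[str], key: str) -> List[float]:
    """Map a one-character user reply to one reward per beam candidate.

    Hypothetical scheme (an assumption, not taken from the abstract):
    a digit '1'..'9' that indexes a candidate marks it as preferred
    (reward 1.0, all others 0.0); any other character rejects every
    candidate (reward -1.0 each).
    """
    if len(key) != 1:
        raise ValueError("feedback must be exactly one character")
    # Accept only ASCII digits that point at an existing candidate.
    if key in "123456789" and int(key) <= len(candidates):
        chosen = int(key) - 1
        return [1.0 if i == chosen else 0.0 for i in range(len(candidates))]
    return [-1.0] * len(candidates)


if __name__ == "__main__":
    beam = ["I like hiking.", "I don't know.", "Tell me more about that."]
    print(rewards_from_feedback(beam, "3"))
```

In an online training loop, per-candidate rewards of this kind could then weight a policy-gradient update for the response generator; that step depends on the model and is left out here.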
from cs.AI updates on arXiv.org http://ift.tt/2hCSEDX