Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning. Despite much encouraging empirical evidence showing the benefits of transfer, there has been very little theoretical analysis. In this paper, we study a class of lifelong reinforcement-learning problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each drawn from a finite set of MDPs that share the same state/action spaces but differ in their transition/reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online discovery problem and give an optimal learning algorithm to solve it. These results allow us to develop a new lifelong reinforcement-learning algorithm whose overall sample complexity over a sequence of tasks is, with high probability, much smaller than that of single-task learning, even if the sequence of tasks is generated by an adversary. The benefits of the algorithm are demonstrated in a simulated problem.
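To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of the lifelong RL problem the abstract describes: each task is a tabular MDP drawn from a small, fixed set of MDPs that share state/action spaces but differ in transitions and rewards, and an adversary chooses which task the agent faces next. All names (MDP, run_task, the toy dynamics) are hypothetical and chosen only for illustration.

import numpy as np

class MDP:
    # Tabular MDP: P[s, a, s'] gives transition probabilities, R[s, a] expected reward.
    def __init__(self, P, R):
        self.P, self.R = np.asarray(P), np.asarray(R)
        self.n_states, self.n_actions = self.R.shape

    def step(self, s, a, rng):
        s_next = rng.choice(self.n_states, p=self.P[s, a])
        return s_next, self.R[s, a]

def run_task(mdp, policy, horizon, rng):
    # Roll out a policy in one task for a fixed horizon; return the total reward.
    s, total = 0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s, r = mdp.step(s, a, rng)
        total += r
    return total

# Two toy MDPs with the same 2 states / 2 actions but different dynamics and rewards.
rng = np.random.default_rng(0)
mdp_set = [
    MDP(P=[[[0.9, 0.1], [0.1, 0.9]], [[0.5, 0.5], [0.5, 0.5]]],
        R=[[1.0, 0.0], [0.0, 1.0]]),
    MDP(P=[[[0.1, 0.9], [0.9, 0.1]], [[0.5, 0.5], [0.5, 0.5]]],
        R=[[0.0, 1.0], [1.0, 0.0]]),
]

# An adversary (here simply a fixed, arbitrary order) picks the next task from the set.
task_sequence = [0, 1, 1, 0, 1]
for t, idx in enumerate(task_sequence):
    ret = run_task(mdp_set[idx], policy=lambda s: int(rng.integers(2)), horizon=100, rng=rng)
    print(f"task {t}: MDP #{idx}, return {ret:.1f}")

A lifelong learner in this setting would replace the random policy above with one that identifies which MDP from the set it is currently facing and reuses knowledge from earlier tasks, which is where the paper's cross-task exploration analysis applies.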
from cs.AI updates on arXiv.org http://ift.tt/1F9CEsw