
Wednesday, February 10, 2016

Iterative Hierarchical Optimization for Misspecified Problems (IHOMP). (arXiv:1602.03348v1 [cs.LG])

Reinforcement Learning (RL) aims to learn an optimal policy for a Markov Decision Process (MDP). For complex, high-dimensional MDPs, it may only be feasible to represent the policy with function approximation. If the chosen representation cannot express good policies, the problem is misspecified and the learned policy may be far from optimal. We introduce IHOMP as an approach for solving misspecified problems. IHOMP iteratively refines a set of specialized policies, each built on the limited representation; we refer to these policies as policy threads. At the same time, IHOMP stitches the policy threads together in a hierarchical fashion to solve a problem that was otherwise misspecified. We prove that IHOMP enjoys theoretical convergence guarantees, and we extend it with Option Interruption (OI), enabling it to learn where policy threads can be reused. Our experiments demonstrate that IHOMP can find near-optimal solutions to otherwise misspecified problems and that OI can further improve those solutions.
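The abstract does not include an implementation, but the core idea is easy to sketch: partition the state space into regions, give each region its own simple policy (a policy thread), and iteratively refine one thread at a time while the hierarchy stitching the other threads together stays fixed. Below is a minimal, hypothetical Python illustration on a toy 1D MDP where a single constant-direction policy is misspecified (the goal sits in the middle, so the optimal action changes sign across states). The fixed partition, the per-thread search, and all names are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

np.random.seed(0)

# Toy 1D MDP on [0, 1]: the goal sits at x = 0.5, so the optimal action
# direction flips across the state space. A single constant action (the
# "limited representation") is therefore misspecified; region-specialized
# policy threads stitched together are not.
N_REGIONS = 4      # number of policy threads (an illustrative choice)
N_SWEEPS = 3       # outer refinement sweeps
STEP = 0.05        # move magnitude per step
GOAL = 0.5

def region_of(x):
    # Fixed partition of the state space; each region owns one thread.
    return min(int(x * N_REGIONS), N_REGIONS - 1)

def rollout(thread_actions, horizon=60):
    # Execute the stitched hierarchical policy: each state uses the
    # thread that owns its region.
    x, total = np.random.rand(), 0.0
    for _ in range(horizon):
        a = thread_actions[region_of(x)]
        x = float(np.clip(x + a * STEP + np.random.normal(0, 0.01), 0.0, 1.0))
        if abs(x - GOAL) < 0.05:
            return total + 10.0          # reached the goal
        total -= 1.0                     # per-step cost
    return total

def value(thread_actions, n=200):
    # Monte Carlo estimate of the stitched policy's expected return.
    return np.mean([rollout(thread_actions) for _ in range(n)])

# Each thread's limited representation: one constant action in {-1, +1}.
threads = np.random.choice([-1.0, 1.0], size=N_REGIONS)

# IHOMP-flavored loop: iteratively refine one thread at a time while the
# remaining threads (and the stitching) stay fixed.
for sweep in range(N_SWEEPS):
    for k in range(N_REGIONS):
        candidates = []
        for a in (-1.0, 1.0):            # tiny per-thread policy search
            cand = threads.copy()
            cand[k] = a
            candidates.append((value(cand), a))
        threads[k] = max(candidates)[1]  # keep the better action for thread k
    print(f"sweep {sweep}: threads = {threads}, mean return = {value(threads):.1f}")
```

After a few sweeps, the threads on the left half settle on +1 (move right) and those on the right half on -1 (move left), so the stitched hierarchy solves a task no single thread could. Loosely, Option Interruption would relax the fixed `region_of` partition, letting execution switch to a reusable thread before a region boundary when that thread performs better.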

from cs.AI updates on arXiv.org http://ift.tt/1KcDiiK
via IFTTT
