Sunday, March 12, 2017

Using Options for Long-Horizon Off-Policy Evaluation. (arXiv:1703.03453v1 [cs.AI])

Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy evaluation (OPE) algorithms use historical data collected from running a previous policy to evaluate a new policy, providing a way to assess it without ever deploying it. Importance sampling is a popular OPE method because it is robust to partial observability and works with continuous states and actions. However, we show that the amount of historical data required by importance sampling can scale exponentially with the horizon of the problem, that is, the number of sequential decisions that are made. We propose using policies over temporally extended actions, called options, to address this long-horizon problem. We show theoretically and experimentally that combining importance sampling with options-based policies can significantly improve performance for long-horizon problems.
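For concreteness, here is a minimal sketch of trajectory-wise importance sampling for OPE and an option-level counterpart, illustrating why fewer decision points mean fewer likelihood-ratio factors in the weight. This is not the paper's exact estimator; the function names and the policy interfaces (`pi_e`, `pi_b`, `mu_e`, `mu_b`) are illustrative assumptions.

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b):
    """Trajectory-wise importance sampling estimate of the evaluation policy's
    expected return, using data collected under the behavior policy.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e(s, a), pi_b(s, a): action probabilities under the evaluation and
    behavior policies. The weight is a product with one ratio per time step,
    which is why the data needed can grow exponentially with the horizon.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for s, a, r in traj:
            weight *= pi_e(s, a) / pi_b(s, a)  # one ratio per primitive action
            ret += r
        estimates.append(weight * ret)
    return float(np.mean(estimates))


def options_is_estimate(option_trajectories, mu_e, mu_b):
    """The same estimator applied at the option level: likelihood ratios are
    taken only when a new option is chosen, so a long trajectory contributes
    far fewer ratio factors when each option runs for many steps.

    option_trajectories: list of trajectories, each a list of
    (state, option, segment_return) tuples, one per option execution.
    mu_e(s, o), mu_b(s, o): option-selection probabilities of the two policies.
    """
    estimates = []
    for traj in option_trajectories:
        weight, ret = 1.0, 0.0
        for s, o, seg_return in traj:
            weight *= mu_e(s, o) / mu_b(s, o)  # one ratio per option choice
            ret += seg_return
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

In this sketch, a horizon of H primitive steps yields H ratios in `is_estimate`, but only roughly H/k ratios in `options_is_estimate` when options last about k steps each, which is the intuition behind the variance reduction the abstract describes.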



from cs.AI updates on arXiv.org http://ift.tt/2lRBaGy
via IFTTT