Patrick McGuire: A K-fold Method for Baseline Estimation in Policy Gradient Algorithms. (arXiv:1701.00867v1 [cs.AI])

Wednesday, January 4, 2017

The high variance of unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, fitting the baseline can itself suffer from underfitting or overfitting. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The hyperparameter K adjusts the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach with two state-of-the-art policy gradient algorithms on three MuJoCo locomotion control tasks.
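The abstract does not spell out the procedure, so the following is only a rough sketch of what a K-fold baseline could look like, assuming it works like K-fold cross-validation: the batch is split into K folds, and the baseline for each fold is predicted by a model fitted on the other K-1 folds. The ridge-regression baseline, the per-sample split (a real implementation might split by trajectory instead), and the names `fit_linear_baseline`, `kfold_baseline`, `features`, and `returns` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def fit_linear_baseline(features, returns, ridge=1e-3):
    """Fit a ridge-regularized linear baseline (an assumed model choice)."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ returns)


def kfold_baseline(features, returns, k, seed=0):
    """Predict a baseline for every sample from a model that never saw it.

    features: (n, d) array of state features (hypothetical input).
    returns:  (n,) array of empirical returns.
    k:        number of folds, 2 <= k <= n.
    """
    n = features.shape[0]
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of samples")

    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, k)

    baseline = np.empty(n)
    for held_out in folds:
        # Train on the K-1 folds that exclude the held-out samples.
        train = np.setdiff1d(perm, held_out)
        weights = fit_linear_baseline(features[train], returns[train])
        # Predict only on the fold the model was not fitted on.
        baseline[held_out] = features[held_out] @ weights
    return baseline


def advantages(features, returns, k):
    """Return-minus-baseline advantage estimates for the policy gradient."""
    return returns - kfold_baseline(features, returns, k)
```

In this sketch, a larger K means each baseline model is fitted on more of the batch, while a smaller K holds more data out of each fit. That is one way a single hyperparameter could shift the balance between underfitting and overfitting in the baseline.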

from cs.AI updates on arXiv.org http://ift.tt/2iRo2Mt
via IFTTT
