Patrick McGuire: Policy Gradient for Coherent Risk Measures. (arXiv:1502.03919v1 [cs.AI])

Sunday, February 15, 2015

We provide sampling-based algorithms for optimization under a coherent risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as the conditional value at risk (CVaR) and the mean-semi-deviation. Our approach is suitable for problems in which the tunable parameters control the distribution of the cost, such as reinforcement learning with a parameterized policy; such problems cannot be solved using previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient algorithms, while for dynamic risk measures it is actor-critic style.
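To make the idea of a sampling-based gradient for a risk objective concrete, here is a minimal sketch for the CVaR case only. It is not the paper's algorithm. It assumes a toy setting that does not appear in the abstract: a scalar cost drawn from N(theta, 1), where the tunable parameter theta shifts the cost distribution, and CVaR defined as the mean of the worst alpha-fraction of costs. The function names `empirical_cvar` and `cvar_gradient_gaussian` are hypothetical, and the estimator is the standard likelihood-ratio (score function) form.

```python
import math
import random


def empirical_cvar(costs, alpha):
    """Return (VaR, CVaR) estimated from samples.

    Assumed convention: CVaR is the mean of the worst (largest)
    alpha-fraction of the sampled costs.
    """
    ordered = sorted(costs)
    k = max(1, int(math.ceil(alpha * len(ordered))))  # number of tail samples
    tail = ordered[-k:]
    var = tail[0]  # empirical (1 - alpha)-quantile of the cost
    return var, sum(tail) / k


def cvar_gradient_gaussian(theta, alpha, n_samples, rng):
    """Likelihood-ratio estimate of d CVaR / d theta for cost ~ N(theta, 1).

    Toy assumption: theta only shifts the mean of the cost distribution.
    """
    costs = [rng.gauss(theta, 1.0) for _ in range(n_samples)]
    var, cvar = empirical_cvar(costs, alpha)
    # The score of N(theta, 1) with respect to theta is (x - theta).
    # Only samples in the tail (x >= VaR) contribute to the estimate.
    grad = sum((x - theta) * (x - var) for x in costs if x >= var)
    grad /= alpha * n_samples
    return cvar, grad


if __name__ == "__main__":
    rng = random.Random(0)
    cvar, grad = cvar_gradient_gaussian(theta=1.0, alpha=0.05,
                                        n_samples=200_000, rng=rng)
    print(f"CVaR estimate: {cvar:.3f}, gradient estimate: {grad:.3f}")
```

In this toy setting, shifting theta shifts every cost by the same amount, so the true gradient of the CVaR with respect to theta is 1. That gives a simple sanity check on the estimate.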

from cs.AI updates on arXiv.org http://ift.tt/17f2i5O

via IFTTT
