Thursday, December 1, 2016

Optimizing Quantiles in Preference-based Markov Decision Processes. (arXiv:1612.00094v1 [cs.AI])

In the Markov decision process model, policies are usually evaluated by their expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy that is optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally, we experimentally evaluate our approach on random MDPs and on a data center control problem.
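
To illustrate how the quantile criterion can rank policies differently from expected cumulative reward, here is a minimal sketch. It is not the algorithm from the paper: the two return distributions, the policy names `risky` and `safe`, and the helper functions are hypothetical, and the quantile is assumed to be the lower tau-quantile (the smallest outcome whose cumulative probability reaches tau).

```python
# Illustrative only: compares two hypothetical policies by the distribution of
# their cumulative rewards. This is NOT the algorithm proposed in the paper.

def expected_value(outcomes):
    """Expected value of a discrete distribution given as (value, probability) pairs."""
    return sum(p * x for x, p in outcomes)


def lower_quantile(outcomes, tau):
    """Smallest value x such that P(X <= x) >= tau (assumed quantile definition)."""
    cumulative = 0.0
    for x, p in sorted(outcomes):
        cumulative += p
        if cumulative >= tau - 1e-12:  # small tolerance for floating-point sums
            return x
    # Fallback if the probabilities sum to slightly less than tau
    return max(x for x, _ in outcomes)


# Hypothetical cumulative-reward distributions induced by two policies
risky = [(0.0, 0.6), (100.0, 0.4)]  # often nothing, sometimes a large reward
safe = [(30.0, 1.0)]                # always a moderate reward

print("expected:", expected_value(risky), expected_value(safe))
print("median:  ", lower_quantile(risky, 0.5), lower_quantile(safe, 0.5))
```

In this toy setting the expectation favors `risky`, while the median (tau = 0.5) favors `safe`, which is the kind of disagreement that motivates optimizing a quantile instead of an expectation.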



from cs.AI updates on arXiv.org http://ift.tt/2gMdqQt
via IFTTT