Sunday, November 29, 2015

Reinforcement Learning with Parameterized Actions. (arXiv:1509.01644v4 [cs.AI] UPDATED)

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step, the agent must select both which action to use and which parameters to use with that action. We introduce the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
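To make the idea of a parameterized action concrete, here is a minimal Python sketch of the two-part choice the abstract describes: first pick a discrete action, then pick continuous parameters for it. This is not the Q-PAMDP algorithm from the paper. The action names, the epsilon-greedy rule, the Gaussian parameter noise, and every identifier below are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterizedAction:
    """A discrete action paired with its continuous parameters."""
    name: str
    params: tuple


def select_action(q_values, param_means, epsilon=0.1, sigma=0.1, rng=random):
    """Pick a discrete action, then sample its continuous parameters.

    q_values:    dict mapping action name -> estimated value (hypothetical).
    param_means: dict mapping action name -> tuple of mean parameter values.
    epsilon:     probability of picking a random discrete action.
    sigma:       standard deviation of Gaussian noise around each mean.
    """
    # Step 1: choose WHICH action to use (epsilon-greedy, an assumed rule).
    if rng.random() < epsilon:
        name = rng.choice(sorted(q_values))
    else:
        name = max(q_values, key=q_values.get)

    # Step 2: choose WHICH parameters to use with that action.
    params = tuple(rng.gauss(mean, sigma) for mean in param_means[name])
    return ParameterizedAction(name, params)


if __name__ == "__main__":
    # Hypothetical action set: each action has its own parameter dimension.
    q = {"kick": 1.0, "run": 0.2, "jump": -0.5}
    means = {"kick": (0.5, 0.1), "run": (1.0,), "jump": (0.3,)}
    print(select_action(q, means))
```

Note that each discrete action carries its own parameter tuple, possibly of a different length, which is what distinguishes this setting from a purely discrete or purely continuous action space.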



from cs.AI updates on arXiv.org http://ift.tt/1hRt9KE
via IFTTT