We present, for the first time, an asymptotic convergence analysis of two-timescale stochastic approximation driven by controlled Markov noise. In particular, both the faster and the slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of this framework by relating it to limiting differential inclusions on both timescales, defined in terms of the invariant probability measures associated with the controlled Markov processes. Finally, we show how our results solve the off-policy convergence problem for temporal difference learning with linear function approximation.
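
For readers who want a concrete picture of the setup, coupled two-timescale recursions of this kind are typically written in the following generic form; the notation here (h, g, the step sizes a(n), b(n), the Markov processes Z and the martingale terms M) is an illustrative sketch and is not quoted from the paper itself:

\begin{align*}
x_{n+1} &= x_n + a(n)\,\Bigl[ h\bigl(x_n, y_n, Z^{(1)}_n\bigr) + M^{(1)}_{n+1} \Bigr], \\
y_{n+1} &= y_n + b(n)\,\Bigl[ g\bigl(x_n, y_n, Z^{(2)}_n\bigr) + M^{(2)}_{n+1} \Bigr],
\end{align*}

where x_n is the faster iterate and y_n the slower one, Z^{(1)}_n and Z^{(2)}_n are the controlled Markov (non-additive) noise processes, M^{(1)}_{n+1} and M^{(2)}_{n+1} are martingale difference terms, and the step sizes satisfy b(n)/a(n) -> 0 so that y_n evolves on the slower timescale.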
from cs.AI updates on arXiv.org http://ift.tt/1EA6KLi
via IFTTT