Wednesday, June 10, 2015

Contextual Bandits with Global Constraints and Objective. (arXiv:1506.03374v1 [cs.LG])

We consider the contextual version of a multi-armed bandit problem with global convex constraints and a concave objective function. In each round, the outcome of pulling an arm is a context-dependent vector, and the global constraints require the average of these vectors to lie in a certain convex set. The objective is a concave function of this average vector. The learning agent competes with an arbitrary set of context-dependent policies. This problem is a common generalization of problems considered by Badanidiyuru et al. (2014) and Agrawal and Devanur (2014), with important applications. We give computationally efficient algorithms with near-optimal regret, generalizing the approach of Agarwal et al. (2014) for the unconstrained version of the problem. For the special case of budget constraints, our regret bounds match those of Badanidiyuru et al. (2014), answering their main open question of obtaining a computationally efficient algorithm.
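To make the setting concrete, here is a minimal toy sketch of the problem setup described in the abstract, not of the paper's algorithm. All names, the box-shaped constraint set, the square-root objective, and the uniformly random arm choice are illustrative assumptions: in each round the learner sees a context, pulls an arm, and observes a vector outcome; the goal is to make a concave function of the average outcome large while keeping that average inside a convex set.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, d = 1000, 3, 2          # rounds, arms, outcome dimension (toy values)

def outcome(context, arm):
    """Context-dependent vector outcome of pulling `arm` (hypothetical toy model)."""
    mean = 0.5 + 0.3 * np.sin(context + arm)          # per-coordinate mean in (0, 1)
    return np.clip(mean + 0.1 * rng.standard_normal(d), 0.0, 1.0)

def f(avg):
    """Concave objective of the average outcome vector (illustrative choice)."""
    return np.sum(np.sqrt(avg))

def in_constraint_set(avg, lo=0.4, hi=0.9):
    """Convex constraint set S, here a simple box, on the average outcome vector."""
    return bool(np.all(avg >= lo) and np.all(avg <= hi))

total = np.zeros(d)
for t in range(T):
    context = rng.uniform(0, 2 * np.pi)
    arm = rng.integers(K)     # placeholder policy; the paper's algorithm would
                              # instead choose arms adaptively to control regret
    total += outcome(context, arm)

avg = total / T
print("average outcome:", avg)
print("objective f(avg):", f(avg))
print("constraint satisfied:", in_constraint_set(avg))
```

The random policy above is only a stand-in; the point of the paper is to choose arms so that, against the best context-dependent policy in a given class, both the objective value and the constraint violation incur only near-optimal regret.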



from cs.AI updates on arXiv.org http://ift.tt/1S61oeE
via IFTTT