In this paper we study an extension of the stochastic multi-armed bandit (MAB) framework, where in each round a player can play multiple actions and receive a stochastic reward that depends on the actions played. This problem is motivated by recommendation applications in which there are multiple populations of users, so that no single choice may be good for the entire population. We specifically look at bandit problems where we are allowed to make two choices in each round. We provide algorithms for this problem in both the noiseless and the noisy case. Our algorithms are computationally efficient and have provable sample complexity guarantees. In the process of establishing these guarantees, we prove new results about the Nyström method which may be of independent interest. We supplement our theoretical results with experimental comparisons.
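The abstract does not pin down the reward model, so the sketch below is purely illustrative. It assumes that when a pair of arms is played, the round's reward is the better arm's mean plus Gaussian noise (a natural fit for the two-populations motivation, where each user benefits from whichever of the two choices suits them), and it pairs that with a naive explore-then-commit baseline rather than the paper's own algorithms. All names (`pull_pair`, `explore_then_commit`, `arm_means`) and parameters here are hypothetical.

```python
import random

def pull_pair(arm_means, i, j, noise=0.1):
    """Simulate one round of the two-play bandit: play arms i and j,
    observe a noisy reward. The max-of-pair reward model is an
    assumption for illustration, not the paper's specification."""
    return max(arm_means[i], arm_means[j]) + random.gauss(0.0, noise)

def explore_then_commit(arm_means, rounds_per_pair=50):
    """Naive baseline: estimate every pair's mean reward by uniform
    sampling, then commit to the empirically best pair."""
    k = len(arm_means)
    best_pair, best_est = None, float("-inf")
    for i in range(k):
        for j in range(i + 1, k):
            est = sum(pull_pair(arm_means, i, j)
                      for _ in range(rounds_per_pair)) / rounds_per_pair
            if est > best_est:
                best_pair, best_est = (i, j), est
    return best_pair, best_est

if __name__ == "__main__":
    random.seed(0)
    # Two user populations: arms 0 and 3 each suit one population,
    # so the best single arm is worse than the best pair.
    means = [0.9, 0.2, 0.3, 0.8]
    pair, est = explore_then_commit(means)
    print(f"best pair: {pair}, estimated reward: {est:.2f}")
```

Note that this baseline enumerates all k(k-1)/2 pairs, so its sample cost grows quadratically in the number of arms; the computationally efficient algorithms and sample complexity bounds claimed in the abstract presumably improve on exactly this kind of brute-force pair enumeration.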
from cs.AI updates on arXiv.org http://ift.tt/1UcmJGB