The two most fundamental problems in computational game theory are computing a Nash equilibrium and learning to exploit opponents given observations of their play (aka opponent exploitation). The latter is perhaps even more important than the former: Nash equilibrium does not have a compelling theoretical justification in game classes other than two-player zero-sum, and furthermore for all games one can potentially do better by exploiting perceived weaknesses of the opponent than by following a static equilibrium strategy throughout the match. The natural setting for opponent exploitation is the Bayesian setting where we have a prior model that is integrated with observations to create a posterior opponent model that we respond to. The most natural, and a well-studied prior distribution is the Dirichlet distribution. An exact polynomial-time algorithm is known for best-responding to the posterior distribution for an opponent assuming a Dirichlet prior with multinomial sampling in the case of normal-form games; however, for the case of imperfect-information games the best known algorithm is a sampling algorithm based on approximating an infinite integral without theoretical guarantees. The main result is the first exact algorithm for accomplishing this in imperfect-information games. We also present an algorithm for another natural setting where the prior is the uniform distribution over a polyhedron.
from cs.AI updates on arXiv.org http://ift.tt/22ee9sx
via IFTTT
No comments:
Post a Comment