We consider the problem of predicting the next observation given a sequence of past observations. We show that for any distribution over observations, if the mutual information between past observations and future observations is upper bounded by $I$, then a simple Markov model over the most recent $I/\epsilon$ observations can obtain KL error $\epsilon$ with respect to the optimal predictor with access to the entire past. For a Hidden Markov Model with $n$ states, $I$ is bounded by $\log n$, a quantity that does not depend on the mixing time. We also demonstrate that the simple Markov model cannot really be improved upon. First, a window length of $I/\epsilon$ is information-theoretically necessary for KL error, and a window length of $I/\epsilon^2$ is necessary for $\ell_1$ error. Second, the $d^{\Theta(I/\epsilon)}$ samples required to accurately estimate the Markov model when observations are drawn from an alphabet of size $d$ are in fact necessary for any computationally tractable algorithm, assuming the hardness of strongly refuting a certain class of CSPs.
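To make the "simple Markov model over the most recent $I/\epsilon$ observations" concrete, here is a minimal Python sketch of a count-based predictor that conditions only on a fixed-length window. It is an illustration under stated assumptions, not the paper's algorithm: the function names (`window_length`, `fit_markov`, `predict`), the rounding of $I/\epsilon$ up to an integer, and the additive smoothing are all hypothetical choices made here, and the abstract does not specify the base of the logarithm, so $I$ and $\epsilon$ just need to share units.

```python
import math
from collections import Counter, defaultdict


def window_length(mutual_info, epsilon):
    """Window length I / epsilon, rounded up (the rounding is an assumption)."""
    return math.ceil(mutual_info / epsilon)


def fit_markov(sequence, order):
    """Count how often each symbol follows each length-`order` context."""
    counts = defaultdict(Counter)
    for t in range(order, len(sequence)):
        context = tuple(sequence[t - order:t])
        counts[context][sequence[t]] += 1
    return counts


def predict(counts, context, alphabet, smoothing=1.0):
    """Next-symbol distribution given the context.

    Additive smoothing is an illustrative choice so that unseen
    contexts still yield a valid distribution.
    """
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values()) + smoothing * len(alphabet)
    return {s: (seen[s] + smoothing) / total for s in alphabet}


if __name__ == "__main__":
    # Example: an HMM with n = 4 states has I <= log 4; take epsilon = 0.5 nats.
    order = window_length(math.log(4), 0.5)
    sequence = [0, 1, 1, 0] * 200  # toy periodic data, for illustration only
    model = fit_markov(sequence, order)
    print(order, predict(model, sequence[-order:], alphabet=[0, 1]))
```

The table of counts has up to $d^{\ell}$ contexts for alphabet size $d$ and window length $\ell$, which is where the $d^{\Theta(I/\epsilon)}$ sample requirement mentioned in the abstract comes from.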
from cs.AI updates on arXiv.org http://ift.tt/2hbc5CZ