Sparse databases with hundreds of variables are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes a new approach to dependency detection that combines probabilistic programming, information theory, and non-parametric Bayesian modeling. The key ideas are to (i) build an ensemble of joint probability models for the whole database via approximate posterior inference in CrossCat, a non-parametric factorial mixture; (ii) identify independencies by analyzing model structures; and (iii) report the distribution on conditional mutual information induced by posterior uncertainty over the ensemble of models. This paper presents experiments showing that the approach finds relationships that pairwise correlation misses, including context-specific independencies, on databases of mathematics exam scores and global indicators of macroeconomic development.
from cs.AI updates on arXiv.org http://ift.tt/2ePLlUt
via IFTTT
No comments:
Post a Comment