Patrick McGuire: Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes. (arXiv:1611.01708v1 [stat.ML])

Monday, November 7, 2016

Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes. (arXiv:1611.01708v1 [stat.ML])

Sparse databases with hundreds of variables are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes a new approach to dependency detection that combines probabilistic programming, information theory, and non-parametric Bayesian modeling. The key ideas are to (i) build an ensemble of joint probability models for the whole database via approximate posterior inference in CrossCat, a non-parametric factorial mixture; (ii) identify independencies by analyzing model structures; and (iii) report the distribution on conditional mutual information induced by posterior uncertainty over the ensemble of models. This paper presents experiments showing that the approach finds relationships that pairwise correlation misses, including context-specific independencies, on databases of mathematics exam scores and global indicators of macroeconomic development.

from cs.AI updates on arXiv.org http://ift.tt/2ePLlUt
via IFTTT

Patrick McGuire

Latest YouTube Video

Monday, November 7, 2016

Detecting Dependencies in High-Dimensional, Sparse Databases Using Probabilistic Programming and Non-parametric Bayes. (arXiv:1611.01708v1 [stat.ML])

No comments:

Click to Show Support

Click to Show Support