Thursday, June 30, 2016

A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness. (arXiv:1606.09632v1 [cs.LG])

The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and tasks to scale, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. Finally, we propose a computationally efficient method, called the OBI-WAN estimator, that is uniformly optimal over a class intermediate between the permutation-based and the Dawid-Skene models, and is uniformly consistent over the entire permutation-based model class. In contrast, the guarantees for estimators available in prior literature are sub-optimal over the original Dawid-Skene model.
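To make the setting concrete, here is a minimal sketch of the kind of data the abstract refers to. It assumes a binary, "one-coin" form of the Dawid-Skene model, in which each worker has a single accuracy that applies to every task; the abstract does not spell out this form, so treat it as an assumption. The aggregation shown is plain majority voting, used only as a familiar baseline. It is not the OBI-WAN estimator, and the fraction of mislabeled tasks computed here is not the paper's new error metric. All function names and accuracy values are hypothetical.

```python
import random


def simulate_labels(truth, accuracies, rng):
    """Assumed one-coin model: worker i reports the true label of each task
    with probability accuracies[i], and the flipped label otherwise."""
    return [[y if rng.random() < p else -y for y in truth] for p in accuracies]


def majority_vote(labels):
    """Baseline aggregation: sign of the per-task sum of labels (ties go to +1)."""
    n_tasks = len(labels[0])
    return [1 if sum(row[j] for row in labels) >= 0 else -1 for j in range(n_tasks)]


def hamming_error(estimate, truth):
    """Fraction of tasks whose estimated label differs from the true label."""
    return sum(e != t for e, t in zip(estimate, truth)) / len(truth)


rng = random.Random(0)                                # fixed seed for repeatability
truth = [rng.choice([-1, 1]) for _ in range(200)]     # hypothetical true labels
accuracies = [0.9, 0.8, 0.7, 0.6, 0.55]               # hypothetical worker accuracies
labels = simulate_labels(truth, accuracies, rng)      # one row of labels per worker
estimate = majority_vote(labels)
error = hamming_error(estimate, truth)
print(f"majority-vote error: {error:.3f}")
```

Under the permutation-based model described in the abstract, a worker's accuracy would no longer be a single number shared across tasks, so `simulate_labels` is where a richer model would differ from this sketch.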

from cs.AI updates on arXiv.org http://ift.tt/29hkVNl
via IFTTT
