We review the task of Sentence Pair Scoring, which appears in the literature in various forms: as Answer Sentence Selection, Paraphrasing, Semantic Text Scoring, Next Utterance Ranking, Recognizing Textual Entailment, or as a component of Memory Networks.
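
To make the shared framing concrete, here is a minimal sketch (our own illustration, not code from the paper): each task above reduces to a function that maps a sentence pair to a real-valued score, and the tasks differ mainly in how that score is used. The word-overlap scorer is a deliberately trivial stand-in for the models discussed below.

    # Trivial pair scorer (Jaccard word overlap); any model compared in the
    # paper exposes the same (s0, s1) -> float shape.
    def score(s0: str, s1: str) -> float:
        a, b = set(s0.lower().split()), set(s1.lower().split())
        return len(a & b) / len(a | b) if a | b else 0.0

    # Answer Sentence Selection: rank candidate answers against the question.
    candidates = ["Abraham Lincoln was born in 1809.",
                  "Lincoln delivered the Gettysburg Address."]
    ranked = sorted(candidates, key=lambda c: score("when was lincoln born", c),
                    reverse=True)

    # Paraphrasing or entailment: threshold or classify the very same score.
    paraphrase_score = score("a man is cooking", "a man cooks a meal")
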
We argue that all such tasks are similar from the model perspective and propose new baselines by comparing the performance of common IR metrics and of popular convolutional, recurrent and attention-based neural models across many Sentence Pair Scoring tasks and datasets. We discuss the problem of evaluating randomized models, propose a statistically grounded methodology, and attempt to improve comparisons by releasing new datasets that are much harder than some of the well-explored benchmarks in current use.
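
The evaluation issue is that two training runs of the same randomly initialized model can differ by more than the gap between competing models, so single-run comparisons mislead. Below is a minimal sketch of one statistically grounded remedy, a Student's t confidence interval over independent retrainings; the procedure and the numbers are illustrative assumptions on our part, not the paper's reported methodology or results.

    import numpy as np
    from scipy import stats

    def mean_with_ci(run_scores, confidence=0.95):
        """Mean test metric over independent runs, with a t-distribution CI."""
        x = np.asarray(run_scores, dtype=float)
        half = stats.sem(x) * stats.t.ppf((1 + confidence) / 2, len(x) - 1)
        return x.mean(), half

    # Hypothetical MRR values from retraining one model with 8 random seeds:
    runs = [0.712, 0.698, 0.731, 0.705, 0.719, 0.701, 0.724, 0.694]
    mean, half = mean_with_ci(runs)
    print(f"MRR = {mean:.3f} +- {half:.3f}")  # report the interval, not one run
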
To address the current research fragmentation in a future-proof way, we introduce a unified open-source software framework with easily pluggable models and tasks, allowing sentence models to be evaluated on a wide range of semantic natural language datasets. This lets us outline a path towards a universal learned semantic model for machine reading tasks. We support this plan with experiments that demonstrate the reusability of trained sentence models across tasks and corpora of very different natures.
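
As an illustration of what "easily pluggable models and tasks" can look like, here is a hypothetical registry sketch; the names and API below are our own assumptions, not the framework's actual interface.

    # Hypothetical plugin registries: models and tasks register under a name,
    # and an experiment is just a (model, task) pair.
    MODELS, TASKS = {}, {}

    def register(registry, name):
        def wrap(cls):
            registry[name] = cls
            return cls
        return wrap

    @register(MODELS, "overlap")
    class OverlapModel:
        """Stand-in scorer; a CNN/RNN/attention model plugs in the same way."""
        def score(self, s0, s1):
            a, b = set(s0.split()), set(s1.split())
            return len(a & b) / len(a | b) if a | b else 0.0

    @register(TASKS, "para")
    class ParaphraseTask:
        """Tiny in-memory dataset; a real task would load a corpus from disk."""
        pairs = [("a man cooks", "a man is cooking", 1),
                 ("a man cooks", "a dog barks", 0)]
        def evaluate(self, model):
            correct = sum((model.score(s0, s1) > 0.3) == bool(y)
                          for s0, s1, y in self.pairs)
            return correct / len(self.pairs)

    def run(model_name, task_name):
        return TASKS[task_name]().evaluate(MODELS[model_name]())

    print(run("overlap", "para"))  # swap either name to rerun the experiment

A trained model object could likewise be registered once and evaluated against several tasks, which is the kind of cross-task reuse the experiments test.
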
from cs.AI updates on arXiv.org http://ift.tt/1S1WM78
via IFTTT