Patrick McGuire: Authorship Attribution Using a Neural Network Language Model. (arXiv:1602.05292v1 [cs.CL])

Wednesday, February 17, 2016


In practice, training language models for individual authors is often expensive because of limited data resources. In such cases, Neural Network Language Models (NNLMs) generally outperform traditional non-parametric N-gram models. Here we investigate the performance of a feed-forward NNLM on an authorship attribution problem with a moderate author set size and relatively limited data. We also consider how text topics affect performance. Compared with a well-constructed N-gram baseline with Kneser-Ney smoothing, the proposed method achieves a nearly 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. The performance is very competitive with the state of the art in terms of accuracy and demand on test data. The source code, preprocessed datasets, and a detailed description of the methodology and results are available at http://ift.tt/1mICOp3.
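To make the idea of attributing authorship with per-author language models concrete, here is a minimal sketch in Python. It is not the paper's method: it swaps in a bigram model with add-one (Laplace) smoothing in place of both the feed-forward NNLM and the Kneser-Ney baseline, and it assumes that a test text is assigned to the author whose model gives it the lowest perplexity. The function names (`train_bigram`, `perplexity`, `attribute`) and the toy sentences are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Whitespace tokenization with sentence boundary markers (an assumption).
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Collect context (unigram) and bigram counts for one author."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        contexts.update(tokens[:-1])             # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams


def perplexity(model, sentences, vocab_size):
    """Perplexity of the sentences under an add-one smoothed bigram model."""
    contexts, bigrams = model
    log_prob, n_predictions = 0.0, 0
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothing: unseen bigrams still get non-zero probability.
            p = (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
            log_prob += math.log(p)
            n_predictions += 1
    return math.exp(-log_prob / n_predictions)


def attribute(models, sentences, vocab_size):
    """Return the author whose model assigns the lowest perplexity."""
    return min(models, key=lambda a: perplexity(models[a], sentences, vocab_size))


# Toy training data (hypothetical, not from the paper's datasets).
training = {
    "A": ["the cat sat on the mat", "the cat ate the fish"],
    "B": ["stocks fell sharply on monday", "markets fell on weak data"],
}
models = {author: train_bigram(sents) for author, sents in training.items()}

# Shared vocabulary size: distinct words across all authors, plus "</s>".
words = {w for sents in training.values() for s in sents for w in s.lower().split()}
vocab_size = len(words) + 1

print(attribute(models, ["the cat sat"], vocab_size))
```

Using one shared vocabulary size for every author keeps the perplexities comparable across models. A stronger smoothing scheme or a neural model would replace only the probability estimate inside `perplexity`.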


from cs.AI updates on arXiv.org http://ift.tt/1Oh5oUR
via IFTTT
