Monday, March 7, 2016

Fundamental differences between Dropout and Weight Decay in Deep Networks. (arXiv:1602.04484v2 [cs.LG] UPDATED)

We study dropout and weight decay applied to deep networks with rectified linear units and the quadratic loss. We show that dropout in this context can be viewed as adding a regularization penalty that grows exponentially with the depth of the network, whereas the more traditional weight decay penalty grows only polynomially. We then show how this difference affects the inductive bias of algorithms using one regularizer or the other: we describe a random source of data that dropout is unwilling to fit but that is compatible with the inductive bias of weight decay, and another source that is compatible with the inductive bias of dropout but not with that of weight decay. We also show that, in contrast with the case of generalized linear models, when used with deep networks with rectified linear units and the quadratic loss, the regularization penalty of dropout (a) is not just a function of the independent variables but also depends on the response variables, and (b) can be negative. Finally, the dropout penalty can drive a learning algorithm to use negative weights even when trained on monotone training data.
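As a rough illustration of the quantity the abstract discusses, the sketch below (not the paper's code; the inverted-dropout convention, the tiny one-hidden-layer ReLU network, and the toy weights and data are all assumptions made for illustration) estimates the dropout penalty by Monte Carlo as the gap between the expected quadratic loss under random dropout masks and the plain quadratic loss of the deterministic network, and prints it for several response values y to show that the penalty depends on the response rather than only on the inputs.

```python
# Minimal sketch (assumed conventions, not taken from the paper):
# empirically estimate the "dropout penalty" for a tiny one-hidden-layer
# ReLU network with quadratic loss, i.e. the gap between the expected
# loss under dropout and the loss of the deterministic network.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, W2, mask=None, p=1.0):
    """One-hidden-layer ReLU network; mask/p implement inverted dropout."""
    h = relu(W1 @ x)
    if mask is not None:
        h = h * mask / p          # drop hidden units, rescale survivors
    return W2 @ h

def dropout_penalty(x, y, W1, W2, p=0.5, n_samples=20000):
    """Monte Carlo estimate of E_mask[(y - f_drop(x))^2] - (y - f(x))^2."""
    plain_loss = (y - forward(x, W1, W2)) ** 2
    losses = []
    for _ in range(n_samples):
        mask = rng.random(W1.shape[0]) < p   # keep each hidden unit w.p. p
        losses.append((y - forward(x, W1, W2, mask=mask, p=p)) ** 2)
    return np.mean(losses) - plain_loss

# Toy data and weights: printing the estimate for several responses y
# shows that the penalty depends on y, not only on the inputs x.
x = np.array([1.0, -0.5])
W1 = rng.normal(size=(4, 2))
W2 = rng.normal(size=(4,))
for y in (0.0, 1.0, 5.0):
    print(f"y = {y}: estimated dropout penalty ~ {dropout_penalty(x, y, W1, W2):.4f}")
```

Because the penalty is a difference of two expected losses rather than an explicit norm of the weights, its sign is not constrained a priori, which is consistent with the abstract's observation that it can be negative.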

from cs.AI updates on arXiv.org http://ift.tt/1KSFax2
via IFTTT
