weights.py @ 507:b8e6de17eaa6

modifs to smallNorb

author:   James Bergstra <bergstrj@iro.umontreal.ca>
date:     Wed, 29 Oct 2008 18:06:49 -0400
parents:  4f3c66146f17
children: (none)
""" Routine to initialize weights. @note: We assume that numpy.random.seed() has already been performed. """ from math import pow, sqrt import numpy.random sqrt3 = sqrt(3.0) def random_weights(nin, nout, scale_by=1./sqrt3, power=0.5): """ Generate an initial weight matrix with nin inputs (rows) and nout outputs (cols). Each weight is chosen uniformly at random to be in range: [-scale_by*sqrt(3)/pow(nin,power), +scale_by*sqrt(3)/pow(nin,power)] @note: Play with scale_by, but reasonable values are <=1, maybe 1./sqrt3 power=0.5 is strongly recommanded (see below). Suppose these weights w are used in dot products as follows: output = w' input If w ~ Uniform(-r,r) and Var[input_i]=1 and x_i's are independent, then Var[w]=r2/3 Var[output] = Var[ sum_{i=1}^d w_i input_i] = d r2 / 3 To make sure that variance is not changed after the dot product, we therefore want Var[output]=1 and r = sqrt(3)/sqrt(d). This choice corresponds to the default values scale_by=sqrt(3) and power=0.5. More generally we see that Var[output] = Var[input] * scale_by. Now, if these are weights in a deep multi-layer neural network, we would like the top layers to be initially more linear, so as to let gradients flow back more easily (this is an explanation by Ronan Collobert). To achieve this we want scale_by smaller than 1. Ronan used scale_by=1/sqrt(3) (by mistake!) and got better results than scale_by=1 in the experiment of his ICML'2008 paper. Note that if we have a multi-layer network, ignoring the effect of the tanh non-linearity, the variance of the layer outputs would go down roughly by a factor 'scale_by' at each layer (making the layers more linear as we go up towards the output). """ return (numpy.random.rand(nin, nout) * 2.0 - 1) * scale_by * sqrt3 / pow(nin,power)