weights.py @ 490:4f3c66146f17

Moved weights.py out of sandbox

author | Joseph Turian <turian@gmail.com>
---|---
date | Tue, 28 Oct 2008 10:54:26 -0400
parents | sandbox/weights.py@3daabc7f94ff
children |
1 """ | |
2 Routine to initialize weights. | |
3 | |
4 @note: We assume that numpy.random.seed() has already been performed. | |
5 """ | |
6 | |
7 from math import pow, sqrt | |
8 import numpy.random | |
9 | |
10 sqrt3 = sqrt(3.0) | |
11 def random_weights(nin, nout, scale_by=1./sqrt3, power=0.5): | |
12 """ | |
13 Generate an initial weight matrix with nin inputs (rows) and nout | |
14 outputs (cols). | |
15 Each weight is chosen uniformly at random to be in range: | |
16 [-scale_by*sqrt(3)/pow(nin,power), +scale_by*sqrt(3)/pow(nin,power)] | |
17 @note: Play with scale_by, but reasonable values are <=1, maybe 1./sqrt3 | |
18 power=0.5 is strongly recommanded (see below). | |
19 | |
20 Suppose these weights w are used in dot products as follows: | |
21 output = w' input | |
22 If w ~ Uniform(-r,r) and Var[input_i]=1 and x_i's are independent, then | |
23 Var[w]=r2/3 | |
24 Var[output] = Var[ sum_{i=1}^d w_i input_i] = d r2 / 3 | |
25 To make sure that variance is not changed after the dot product, | |
26 we therefore want Var[output]=1 and r = sqrt(3)/sqrt(d). This choice | |
27 corresponds to the default values scale_by=sqrt(3) and power=0.5. | |
28 More generally we see that Var[output] = Var[input] * scale_by. | |
29 | |
30 Now, if these are weights in a deep multi-layer neural network, | |
31 we would like the top layers to be initially more linear, so as to let | |
32 gradients flow back more easily (this is an explanation by Ronan Collobert). | |
33 To achieve this we want scale_by smaller than 1. | |
34 Ronan used scale_by=1/sqrt(3) (by mistake!) and got better results than scale_by=1 | |
35 in the experiment of his ICML'2008 paper. | |
36 Note that if we have a multi-layer network, ignoring the effect of the tanh non-linearity, | |
37 the variance of the layer outputs would go down roughly by a factor 'scale_by' at each | |
38 layer (making the layers more linear as we go up towards the output). | |
39 """ | |
40 return (numpy.random.rand(nin, nout) * 2.0 - 1) * scale_by * sqrt3 / pow(nin,power) |
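
The quick check below is not part of the original module; it is a minimal sketch that assumes `weights.py` is importable from the working directory. It empirically verifies the two claims in the docstring: that with power=0.5 the dot product gives Var[output] ≈ scale_by^2 * Var[input], and that stacking layers (ignoring any non-linearity) shrinks the variance by a factor of scale_by^2 per layer.

```python
from math import sqrt

import numpy
import numpy.random

from weights import random_weights  # assumes weights.py is on the path

numpy.random.seed(0)  # the module expects the caller to seed the RNG

nin = nout = 500
scale_by = 1. / sqrt(3.0)  # the module's default

# Inputs with Var[input_i] = 1; each row is one input vector.
x = numpy.random.randn(2000, nin)

# Single dot product: variance should be multiplied by scale_by**2.
w = random_weights(nin, nout, scale_by=scale_by, power=0.5)
output = numpy.dot(x, w)
print(output.var())    # close to scale_by**2 = 1/3
print(scale_by ** 2)

# Stacked layers (purely linear, i.e. ignoring the tanh non-linearity):
# the variance shrinks by a factor of scale_by**2 at each layer.
h = x
for layer in range(3):
    w = random_weights(h.shape[1], nout, scale_by=scale_by, power=0.5)
    h = numpy.dot(h, w)
    print(layer, h.var())   # roughly (1/3)**1, (1/3)**2, (1/3)**3
```

With the variance-preserving choice scale_by=1, the printed variances stay near 1 at every layer instead of decaying.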