comparison weights.py @ 490:4f3c66146f17

Moved weights.py out of sandbox
author Joseph Turian <turian@gmail.com>
date Tue, 28 Oct 2008 10:54:26 -0400
parents sandbox/weights.py@3daabc7f94ff
children
1 """
2 Routine to initialize weights.
3
4 @note: We assume that numpy.random.seed() has already been performed.
5 """
6
7 from math import pow, sqrt
8 import numpy.random
9
sqrt3 = sqrt(3.0)

def random_weights(nin, nout, scale_by=1./sqrt3, power=0.5):
12 """
13 Generate an initial weight matrix with nin inputs (rows) and nout
14 outputs (cols).
15 Each weight is chosen uniformly at random to be in range:
16 [-scale_by*sqrt(3)/pow(nin,power), +scale_by*sqrt(3)/pow(nin,power)]
17 @note: Play with scale_by, but reasonable values are <=1, maybe 1./sqrt3
18 power=0.5 is strongly recommanded (see below).
19
    Suppose these weights w are used in dot products as follows:
        output = w' input
    If w ~ Uniform(-r, r), Var[input_i] = 1, and the input_i's are
    independent, then
        Var[w_i] = r^2 / 3
        Var[output] = Var[sum_{i=1}^d w_i input_i] = d r^2 / 3
    To make sure that the variance is unchanged by the dot product, we
    want Var[output] = 1, hence r = sqrt(3)/sqrt(d). This choice
    corresponds to scale_by=1 and power=0.5 (the default scale_by=1./sqrt3
    is deliberately smaller; see below). More generally, with power=0.5,
        Var[output] = Var[input] * scale_by^2.

    Now, if these are the weights of a deep multi-layer neural network,
    we would like the top layers to be initially more linear, so as to
    let gradients flow back more easily (this is an explanation due to
    Ronan Collobert). To achieve this we want scale_by smaller than 1.
    Ronan used scale_by=1/sqrt(3) (by mistake!) and got better results
    than with scale_by=1 in the experiments of his ICML 2008 paper.
    Note that in a multi-layer network, ignoring the effect of the tanh
    non-linearity, the variance of the layer outputs goes down roughly
    by a factor of scale_by^2 at each layer (making the layers more
    linear as we go up towards the output).
    """
    return (numpy.random.rand(nin, nout) * 2.0 - 1) * scale_by * sqrt3 / pow(nin, power)
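
The derivation in the docstring can be checked empirically: with unit-variance
inputs and the default power=0.5, the output variance should come out near
scale_by**2 = 1/3. The following is a minimal sketch, not part of the
changeset; it assumes this file is importable as weights.py:

import numpy
import numpy.random
from weights import random_weights

numpy.random.seed(0)  # the module assumes the caller seeds the RNG
nin, nout = 1000, 200
w = random_weights(nin, nout)  # defaults: scale_by=1./sqrt(3), power=0.5

x = numpy.random.randn(10000, nin)  # rows are inputs with Var[input_i] = 1
out = numpy.dot(x, w)               # batched version of output = w' input
print(out.var())                    # expect roughly scale_by**2 = 1/3

Each column of w is the weight vector of one output unit, so numpy.dot(x, w)
computes w' input for every row of x at once.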