Mercurial > pylearn
annotate sandbox/weights.py @ 484:3daabc7f94ff
Added Yoshua's explanation

author    Joseph Turian <turian@gmail.com>
date      Tue, 28 Oct 2008 01:33:27 -0400
parents   23221eefb70e
children
"""
Routine to initialize weights.

@note: We assume that numpy.random.seed() has already been performed.
"""

from math import pow, sqrt
import numpy.random

sqrt3 = sqrt(3.0)
def random_weights(nin, nout, scale_by=1./sqrt3, power=0.5):
    """
    Generate an initial weight matrix with nin inputs (rows) and nout
    outputs (cols).
    Each weight is chosen uniformly at random from the range:
        [-scale_by*sqrt(3)/pow(nin,power), +scale_by*sqrt(3)/pow(nin,power)]
    @note: Play with scale_by, but reasonable values are <=1, maybe 1./sqrt3.
    power=0.5 is strongly recommended (see below).

    Suppose these weights w are used in dot products as follows:
       output = w' input
    If w ~ Uniform(-r,r), Var[input_i]=1, and the input_i are independent, then
       Var[w] = r^2/3
       Var[output] = Var[sum_{i=1}^d w_i input_i] = d r^2/3
    To make sure that variance is not changed after the dot product,
    we therefore want Var[output]=1, i.e. r = sqrt(3)/sqrt(d). This choice
    corresponds to scale_by=1 and power=0.5 (the default scale_by=1./sqrt3
    is smaller; see below). More generally, with power=0.5,
    Var[output] = Var[input] * scale_by^2 (a numerical check of this
    follows the listing).

    Now, if these are weights in a deep multi-layer neural network,
    we would like the top layers to be initially more linear, so as to let
    gradients flow back more easily (this is an explanation by Ronan Collobert).
    To achieve this we want scale_by smaller than 1.
    Ronan used scale_by=1/sqrt(3) (by mistake!) and got better results than
    scale_by=1 in the experiments of his ICML'2008 paper.
    Note that in a multi-layer network, ignoring the effect of the tanh
    non-linearity, the variance of the layer outputs would go down roughly
    by a factor of scale_by^2 at each layer (making the layers more linear
    as we go up towards the output).
    """
    return (numpy.random.rand(nin, nout) * 2.0 - 1) * scale_by * sqrt3 / pow(nin, power)
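
A quick sanity check of the variance argument in the docstring: a minimal
sketch using random_weights from the listing above, where the seed, layer
sizes, and sample count are illustrative assumptions, not part of the module.
With scale_by=1 and power=0.5, a dot product with unit-variance independent
inputs should keep the output variance near 1; with the default
scale_by=1./sqrt3 it should shrink to about scale_by^2 = 1/3.

import numpy
import numpy.random

numpy.random.seed(0)        # the module assumes the caller seeds the RNG
nin, nout = 1000, 200       # illustrative layer sizes
x = numpy.random.randn(5000, nin)      # unit-variance, independent inputs

# scale_by=1, power=0.5 gives r = sqrt(3)/sqrt(nin), so Var[output] ~ 1
w = random_weights(nin, nout, scale_by=1.0, power=0.5)
print(numpy.dot(x, w).var())           # ~ 1.0

# the default scale_by=1./sqrt3 shrinks the variance by scale_by^2 = 1/3
w_default = random_weights(nin, nout)
print(numpy.dot(x, w_default).var())   # ~ 0.33

Stacking several such layers would repeat this shrinkage, which is the
per-layer decay by a factor of scale_by^2 described at the end of the
docstring.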